Poor performance on EC2

I’m wondering if anyone has any suggestions or pointers for troubleshooting poor performance on an EC2 instance.

Using top, I noticed that CPU usage was way up (e.g. 200%). That one was easy to fix: I had set a huge pool size for the Postgres connection pool. Once I brought that down and restarted the app, top showed everything as normal.

I started processing files – there are maybe 100 long txt files, each with maybe a quarter million lines. The script is mostly formatting each line and streaming to an output file using a pattern like this:

input_files = [
   # list of input files
]

Task.Supervisor.async_stream_nolink(
  TmpTaskSupervisor,
  input_files,
  fn input_file ->

    {:ok, output_file} = File.open(input_file <> ".parsed", [:append, {:delayed_write, 500, 200}])

    lines = File.stream!(input_file)

    lines
    |> Stream.with_index()
    |> Stream.each(fn {line, _index} ->

       {:ok, parsed} = parse(line)
       IO.binwrite(output_file, Jason.encode!(parsed) <> "\n")
       
    end)
    |> Stream.run()

    # close explicitly so the delayed_write buffer is flushed
    File.close(output_file)
  end,
  timeout: 100_000,
  max_concurrency: 25
)
|> Enum.to_list()

I’ve played around with max_concurrency, but the performance stays more or less the same.

On my local laptop, this completes in a couple hours. But on EC2, this takes days and days. The input and output files are on an attached EFS volume – I’m not sure if that makes any difference. I tried writing to the local volume where the app is running, but there was no change in the performance.

What is significant is that on my local laptop, top shows that BEAM is gobbling up CPU – it spikes up to 200%. But on the EC2 instance, top shows very little activity for CPU, e.g. maybe 30% or 40% tops.

Can anyone recommend some other ways to troubleshoot a performance issue like this?

Thanks in advance!

This thread may be of interest to you.

I would try out File.stream! and reading more lines at once:

Getting fast I/O stream processing in Elixir does require some non-obvious tweaks. Due to the way I/O works on the BEAM (there’s a special process that does I/O and passes results to your process as a message), you want the messages to be as long as possible to avoid overheads.
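As a rough sketch (the buffer sizes here are illustrative, not tuned numbers), you can pass a larger :read_ahead to File.stream!, or stream fixed-size binary chunks instead of lines, so each message from the I/O process carries a big block of the file:

# Line-based streaming, but with a large read-ahead buffer so the
# underlying reads happen in big blocks.
lines = File.stream!(input_file, read_ahead: 512 * 1024)

# Or: stream fixed-size 64 KiB binary chunks and split lines yourself;
# fewer, larger I/O messages, at the cost of handling lines that span
# chunk boundaries.
chunks = File.stream!(input_file, [], 64 * 1024)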

One thing I’ve been bitten by before with AWS services is how they scale bandwidth / IOPS relative to size - for instance, an RDS database’s write performance is directly tied to its allocated storage size.

There’s similar language in the EFS docs:

In Bursting Throughput mode, the base throughput is proportionate to the file system’s size in the EFS Standard storage class, at a rate of 50 KiBps per each GiB of storage.

(below the chart) Amazon EFS provides a metered throughput of 1 MiBps to all file systems, even if the baseline rate is lower.

You can tell for sure by watching the EFS metrics in CloudWatch while running this code. My prediction is that the writes eat all the credits and then you end up with floppy-disk-level IO performance.

You may get some benefit from bumping up the buffer size for :delayed_write, since EFS counts a 500-byte write request as 4 KB against your quota (likely aggravating the issue above):

Every Network File System (NFS) request is accounted for as 4 kilobyte (KB) of throughput, or its actual request and response size, whichever is larger.
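With the 500-byte buffer in the example, each flush can be metered as a full 4 KB request, so roughly eight times the bytes actually written count against your throughput. A sketch of a bigger buffer (64 KiB / 2 s are illustrative values, not tuned ones):

# Buffer up to 64 KiB (or 2 seconds) per write, so each NFS request
# sent to EFS is well above the 4 KB metering floor.
{:ok, output_file} =
  File.open(input_file <> ".parsed", [:append, {:delayed_write, 64 * 1024, 2_000}])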

Thank you, these are both helpful posts. I think I am getting dinged by the weird taxi-cab metering here. Some of the writes are buried more deeply in the code and would be difficult to consolidate (e.g. via Stream.chunk_every/2), but I did bump up the read_ahead values and skip a few extraneous writes, which improved the performance (admittedly only to maybe 2x floppy-disk-level IO). For a more thorough fix, I think I’d need to rope in some telemetry or other side effects so I can measure progress more effectively.
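For the record, the kind of consolidation I mean, applied to the simplified pipeline above, would look roughly like this (the chunk size is arbitrary, and the real code’s writes don’t fall out of a single stream this neatly):

# Batch 1_000 encoded lines per write so far fewer, much larger
# requests hit EFS.
input_file
|> File.stream!(read_ahead: 512 * 1024)
|> Stream.map(fn line ->
  {:ok, parsed} = parse(line)
  Jason.encode!(parsed) <> "\n"
end)
|> Stream.chunk_every(1_000)
|> Stream.each(fn batch -> IO.binwrite(output_file, batch) end)
|> Stream.run()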

Side note: IO.binwrite supports IO data, so a piece of low-hanging fruit that might speed things up would be replacing Jason.encode! with Jason.encode_to_iodata!:

IO.binwrite(output_file, [Jason.encode_to_iodata!(parsed), "\n"])

This function should be preferred to encode/2, if the generated JSON will be handed over to one of the IO functions or sent over the socket. The Erlang runtime is able to leverage vectorised writes and avoid allocating a continuous buffer for the whole resulting string, lowering memory use and increasing performance.

Ah, of course, that would help. Thank you.

However, the biggest problem remains the writing of millions of individual files (not shown in the example), and the difference between running this locally and running it on EC2 is jaw-dropping. I have yet to find a good AWS solution for dealing with millions of individual files.

Why would they need to be individual files?

The thing being built is a file manager: individual assets need to be tracked.

Not sure how much of an argument that is. S3 does rather well at tracking individual files and it’s not using a filesystem.

S3 failed horribly at this particular setup. If I write up a formal evaluation, I’ll post the numbers here.

I’m not trying to say S3 is perfect. My argument is that at a certain level the difference between a database and a filesystem becomes blurry, and if one implementation of that is not fast enough, another one might be more suitable.

The EFS docs explicitly suggest that “lots of small files” is a performance concern because of the protocol overhead.

What about writing to instance storage or EBS instead? Either of those should have closer-to-local performance, though IIRC there are similar bandwidth/size correlations for EBS.

I saw the same performance when writing to the local EC2 file system. Is that subject to the same limits as writing to EFS?