Performance with Explorer scales linearly with size, then suddenly degrades on large files

I was playing around with some real-life solutions for the 1BRC (One Billion Row Challenge) using Elixir and external libraries such as Flow and Explorer, and found that Explorer’s performance is great but degrades roughly 10x when processing 1 billion lines versus 500 million or fewer.

Here’s the code with some comments:

defmodule WithExplorer do
  # Results (rows: elapsed time):
  #   1_000_000_000: 675483.000 ms
  #     500_000_000:  58244.713 ms
  #     100_000_000:  10321.046 ms
  #      50_000_000:   5104.949 ms
  require Explorer.DataFrame
  alias Explorer.{DataFrame, Series}

  @filename "./data/measurements.txt"

  def run() do
    results =
      @filename
      |> DataFrame.from_csv!(header: false, delimiter: ";", eol_delimiter: "\n")
      |> DataFrame.group_by("column_1")
      |> DataFrame.summarise(
        min: Series.min(column_2),
        mean: Series.mean(column_2),
        max: Series.max(column_2)
      )
      |> DataFrame.arrange(column_1)

    # Formatting of the aggregated rows, kept out of the benchmark:
    # for idx <- 0..(Series.size(results["column_1"]) - 1) do
    #   "#{results["column_1"][idx]}=#{results["min"][idx]}/#{:erlang.float_to_binary(results["mean"][idx], decimals: 2)}/#{results["max"][idx]}"
    # end

    results
  end
end
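
For context, the timings in the comment are in milliseconds. A minimal way to capture them (an assumption on my part, not necessarily how they were measured) is Erlang’s :timer.tc/1:

# :timer.tc/1 runs the fun and returns {elapsed_microseconds, result}.
{micros, _df} = :timer.tc(&WithExplorer.run/0)
IO.puts("WithExplorer.run/0 took #{micros / 1_000} ms")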

What I observe is that the CPUs stay busy but are not fully utilized, and suddenly a lot of disk I/O shows up. I have some idea of what might be happening and wonder if there is a way to control this behavior from the high-level API, or by compiling Explorer with some Polars-specific options.


Probably the data no longer fits in memory and the operating system starts swapping to disk? If that’s the case, it is happening at the operating system level, so there isn’t much to control.

However, you can pass the :lazy option to from_csv and then call collect to perform the whole operation at once. It should go easier on memory usage.
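
For reference, a minimal sketch of that suggestion applied to the pipeline above (assuming lazy: true on from_csv! and a final DataFrame.collect/1, per the Explorer docs; untested here):

require Explorer.DataFrame
alias Explorer.{DataFrame, Series}

# The frame stays lazy until collect/1, so Polars can plan the whole
# query (scan, group, aggregate, sort) before materializing anything.
"./data/measurements.txt"
|> DataFrame.from_csv!(header: false, delimiter: ";", eol_delimiter: "\n", lazy: true)
|> DataFrame.group_by("column_1")
|> DataFrame.summarise(
  min: Series.min(column_2),
  mean: Series.mean(column_2),
  max: Series.max(column_2)
)
|> DataFrame.arrange(column_1)
|> DataFrame.collect()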


Thank you, José. Enabling :lazy cut the time in half. Your suggestion also made me read the docs more carefully, and I found that I could set the floats to f32 instead of the f64 that had been automatically inferred.
This made the computation light enough to fit in memory and run even faster, regardless of lazy mode.
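
For reference, a sketch of that dtype override (the :dtypes option to from_csv! is documented in Explorer; the exact dtype notation, {:f, 32} here, varies between Explorer versions, and "column_2" is the auto-generated name for the measurement column):

alias Explorer.DataFrame

# Read the measurement column as f32 instead of the inferred f64,
# halving the memory footprint of the numeric data.
df =
  DataFrame.from_csv!(
    "./data/measurements.txt",
    header: false,
    delimiter: ";",
    eol_delimiter: "\n",
    dtypes: [{"column_2", {:f, 32}}]
  )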

Results:

Reading and aggregating 1 billion lines with Explorer:
- Eager (f64): 675483.00 ms
- Lazy (f64):  389491.00 ms
- Lazy (f32):   53575.23 ms
- Eager (f32):  55091.87 ms

You’ve got to love a forum that you can just happen to stumble into and read a post about making “checks notes” … a billion-row CSV “checks notes again” … run faster.

I love this place :smiling_face_with_three_hearts:

Edit: Oh dang also welcome new user @betoparcus


Thanks! I’ve been a reader for several years, but this might indeed be my first post.

I love this place :smiling_face_with_three_hearts:

1 billion % agree.


What kind of hardware specs are getting you under a minute?

Me: “slaps hood” this baby can go from 0 to a billion in under a minute…

A good old gaming computer, but I’d love to try it on the M1 too.

OS Name:                         Microsoft Windows 11 Home (WSL)
System Model:                    X570 AORUS ELITE WIFI
Processor:                       AMD Ryzen 9 3900X, 3801 MHz, 12 cores, 24 logical processors
Installed Physical Memory (RAM): 32.0 GB

If you can provide the data and the code, I can benchmark it on M1 Max with 32GB RAM for you.


Thanks, Stefan. I just uploaded the code and generator with some comments: GitHub - rparcus/ex_1brc


36130.104 ms :slight_smile:

$ elixir -v                                                                                                                                                    
Erlang/OTP 26 [erts-14.2.1] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit]

Elixir 1.16.0 (compiled with Erlang/OTP 26)


45592.203 ms on a Mac Mini M2 Pro, but with only 16 GB of memory.
