Hello everyone,
I’m currently working on a project that involves using Elixir to perform data analysis tasks. As part of this work, I’ve been experimenting with different libraries and tools, including Explorer and Nx.
To get a better sense of how these tools perform, I’ve been benchmarking simple functions like mean, variance, and standard deviation. However, I’ve run into a strange issue when I combine the libraries: specifically, when I convert an Explorer.Series to an Nx tensor and then call Nx functions such as Nx.mean on it.
The combined operation is far slower than either step on its own, which seems counterintuitive. In the results below, the conversion alone averages about 13 μs and Nx.mean on an existing tensor between about 0.2 ms and 1.2 ms, yet converting and then taking the mean averages about 86 ms. I’m not sure what’s causing this, but I suspect inefficiencies in the conversion process, memory usage, or some other bottleneck in my code.
I’m reaching out to the community to see if anyone has experienced similar issues, or has any advice on how to improve the performance of this operation. I’d be grateful for any insights or suggestions you can offer.
Thank you in advance for your help!
defmodule Bench do
  import Nx.Defn

  deftransform mean_nx_series(series) do
    Explorer.Series.to_tensor(series)
    |> Bench.mean_nx()
  end

  defn mean_nx(tensor) do
    Nx.mean(tensor)
  end

  def mean_explorer(series) do
    Explorer.Series.mean(series)
  end
end
bench_means =
  Benchee.run(
    %{
      "explorer_mean" => fn -> Bench.mean_explorer(rand_series) end,
      "nx_mean_s64" => fn -> Bench.mean_nx(rand_tensor_s64) end,
      "nx_mean_s32" => fn -> Bench.mean_nx(rand_tensor_s32) end,
      "nx_mean_s16" => fn -> Bench.mean_nx(rand_tensor_s16) end,
      # The series is passed straight to the defn, with no explicit to_tensor call.
      "nx_mean_of_series" => fn -> Bench.mean_nx(rand_series) end,
      "nx_series_with_deftransform" => fn -> Bench.mean_nx_series(rand_series) end,
      "converting_series_to_nx" => fn -> Explorer.Series.to_tensor(rand_series) end,
      # The conversion runs inside the timed function, so its cost is included.
      "pre_converting_series_to_nx_nx_mean" => fn ->
        Explorer.Series.to_tensor(rand_series) |> Bench.mean_nx()
      end
    },
    warmup: 1,
    time: 2
  )
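One way to narrow this down is sketched below, under assumptions: the post doesn't show how rand_series was built, so the series here is a hypothetical stand-in, the Diag module name is made up, and the script assumes the latest nx, exla, explorer, and benchee packages resolve through Mix.install. It prints the type and backend of the converted tensor, then times the mean on a tensor converted once outside the timed function, with and without an explicit Nx.backend_copy to EXLA.Backend.

```elixir
# Sketch only. Assumes these packages install and that EXLA runs on this
# machine (CPU by default; a CUDA build needs extra environment setup).
Mix.install([:nx, :exla, :explorer, :benchee])

Nx.global_default_backend(EXLA.Backend)
Nx.Defn.global_default_options(compiler: EXLA)

defmodule Diag do
  import Nx.Defn

  # Same body as Bench.mean_nx/1 in the post.
  defn mean_nx(tensor), do: Nx.mean(tensor)
end

# Hypothetical stand-in for rand_series: 1 million random integers.
series =
  1..1_000_000
  |> Enum.map(fn _ -> :rand.uniform(1_000) end)
  |> Explorer.Series.from_list()

# Convert once, outside any timed function.
tensor = Explorer.Series.to_tensor(series)

# Check what the conversion produced before timing anything.
IO.inspect(Explorer.Series.dtype(series), label: "series dtype")
IO.inspect(Nx.type(tensor), label: "tensor type")
IO.inspect(tensor.data.__struct__, label: "tensor backend")

# Explicitly place a copy on the EXLA backend for comparison.
on_exla = Nx.backend_copy(tensor, EXLA.Backend)

Benchee.run(
  %{
    "mean_of_converted_tensor" => fn -> Diag.mean_nx(tensor) end,
    "mean_after_backend_copy" => fn -> Diag.mean_nx(on_exla) end
  },
  warmup: 1,
  time: 2
)
```

If the two scenarios differ widely, the cost probably lies in where the converted data lives rather than in Nx.mean itself. If both are slow, the printed tensor type is the next thing to compare against the hand-built s64, s32, and s16 tensors.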
Results using the EXLA CUDA backend. The series and the tensors each have 1 million elements.
Operating System: Linux
CPU Information: AMD Ryzen 9 3900X 12-Core Processor
Number of Available Cores: 24
Available memory: 31.24 GB
Elixir 1.14.2
Erlang 25.2
Benchmark suite executing with the following configuration:
warmup: 1 s
time: 2 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 24 s
Benchmarking converting_series_to_nx ...
Benchmarking explorer_mean ...
Benchmarking nx_mean_of_series ...
Benchmarking nx_mean_s16 ...
Benchmarking nx_mean_s32 ...
Benchmarking nx_mean_s64 ...
Benchmarking nx_series_with_deftransform ...
Benchmarking pre_converting_series_to_nx_nx_mean ...
Name                                  ips         average        deviation   median         99th %
converting_series_to_nx               73.64 K     13.58 μs       ±40.03%     13.57 μs       17.90 μs
nx_mean_s16                           5.56 K      179.73 μs      ±57.99%     156.00 μs      774.57 μs
nx_mean_s32                           4.86 K      205.92 μs      ±51.52%     187.21 μs      775.33 μs
nx_mean_s64                           4.20 K      238.37 μs      ±46.83%     216.76 μs      814.91 μs
explorer_mean                         1.30 K      770.52 μs      ±2.35%      765.21 μs      852.85 μs
pre_converting_series_to_nx_nx_mean   0.0116 K    86121.10 μs    ±14.73%     95941.01 μs    97585.60 μs
nx_series_with_deftransform           0.0112 K    89152.43 μs    ±11.92%     95158.30 μs    101969.55 μs
nx_mean_of_series                     0.0107 K    93060.98 μs    ±10.03%     94988.81 μs    107701.51 μs
Comparison:
converting_series_to_nx               73.64 K
nx_mean_s16                           5.56 K      - 13.23x slower +166.15 μs
nx_mean_s32                           4.86 K      - 15.16x slower +192.34 μs
nx_mean_s64                           4.20 K      - 17.55x slower +224.79 μs
explorer_mean                         1.30 K      - 56.74x slower +756.94 μs
pre_converting_series_to_nx_nx_mean   0.0116 K    - 6341.60x slower +86107.52 μs
nx_series_with_deftransform           0.0112 K    - 6564.81x slower +89138.85 μs
nx_mean_of_series                     0.0107 K    - 6852.62x slower +93047.40 μs
I ran the same benchmark with the EXLA CPU backend:
Operating System: Linux
CPU Information: AMD Ryzen 9 3900X 12-Core Processor
Number of Available Cores: 24
Available memory: 31.24 GB
Elixir 1.14.2
Erlang 25.2
Benchmark suite executing with the following configuration:
warmup: 1 s
time: 2 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 24 s
Benchmarking converting_series_to_nx ...
Benchmarking explorer_mean ...
Benchmarking nx_mean_of_series ...
Benchmarking nx_mean_s16 ...
Benchmarking nx_mean_s32 ...
Benchmarking nx_mean_s64 ...
Benchmarking nx_series_with_deftransform ...
Benchmarking pre_converting_series_to_nx_nx_mean ...
Name                                  ips         average        deviation   median         99th %
converting_series_to_nx               83012.68    0.0120 ms      ±30.03%     0.0118 ms      0.0164 ms
nx_mean_s64                           3780.90     0.26 ms        ±3.07%      0.26 ms        0.29 ms
explorer_mean                         1302.36     0.77 ms        ±1.84%      0.76 ms        0.85 ms
nx_mean_s16                           912.97      1.10 ms        ±9.29%      1.13 ms        1.30 ms
nx_mean_s32                           826.72      1.21 ms        ±9.29%      1.24 ms        1.43 ms
pre_converting_series_to_nx_nx_mean   11.47       87.19 ms       ±15.18%     97.30 ms       99.32 ms
nx_series_with_deftransform           11.31       88.45 ms       ±11.97%     95.49 ms       96.91 ms
nx_mean_of_series                     10.88       91.93 ms       ±9.95%      95.05 ms       98.16 ms
Comparison:
converting_series_to_nx               83012.68
nx_mean_s64                           3780.90     - 21.96x slower +0.25 ms
explorer_mean                         1302.36     - 63.74x slower +0.76 ms
nx_mean_s16                           912.97      - 90.93x slower +1.08 ms
nx_mean_s32                           826.72      - 100.41x slower +1.20 ms
pre_converting_series_to_nx_nx_mean   11.47       - 7237.96x slower +87.18 ms
nx_series_with_deftransform           11.31       - 7342.16x slower +88.43 ms
nx_mean_of_series                     10.88       - 7630.95x slower +91.91 ms
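
For scale, the averages reported above can be put side by side. This is a minimal sketch in plain Elixir that only restates numbers from the two runs. Pairing the conversion with nx_mean_s64 is an assumption, since the dtype of the converted series isn't stated, and the map keys are illustrative names.

```elixir
# Averages copied from the two Benchee runs above, expressed in microseconds.
# "combined" is the pre_converting_series_to_nx_nx_mean scenario.
runs = %{
  cuda: %{convert: 13.58, mean_s64: 238.37, combined: 86_121.10},
  cpu: %{convert: 12.0, mean_s64: 260.0, combined: 87_190.0}
}

# Assumption: if the two steps were independent, the combined cost would be
# close to their sum. The s64 mean is used as the reference tensor mean.
ratios =
  Map.new(runs, fn {backend, t} ->
    expected = t.convert + t.mean_s64
    {backend, t.combined / expected}
  end)

for {backend, ratio} <- ratios do
  IO.puts("#{backend}: combined is about #{round(ratio)}x the sum of its parts")
end
```

On both backends the combined call comes out at several hundred times the sum of the two steps, so the extra time is not explained by simply adding the conversion cost to the mean.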