While I agree with @akash-akya that making such a piece of code faster is not necessarily the most interesting aspect of Elixir, in this case there is room for improvement.

I think a good way to pin down where a piece of code spends its time is to form a hypothesis and benchmark it with Benchee.

My hypothesis is that the code is slow because of the list creations in count_alpha. On the reduce side of the code, the [encoding: :utf8] option passed to File.stream! might also be problematic.
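The original isn't reproduced here, so purely as an assumption for illustration, picture it having roughly this shape: each line decoded through an IO server, then materialised as a list of one-character strings before being counted:

```elixir
# Assumed shape of the original, NOT the actual code from the thread:
defmodule Alphamax do
  def count_alpha(line) do
    line
    |> String.graphemes()                  # allocates a list of one-grapheme binaries
    |> Enum.filter(&(&1 =~ ~r/^[a-zA-Z0-9]$/))
    |> Enum.frequencies()                  # walks that list again to build a count map
  end

  def process(file) do
    File.stream!(file, [encoding: :utf8])  # decodes line by line through an IO server
    |> Enum.reduce(%{}, fn line, acc ->
      Map.merge(acc, count_alpha(line), fn _k, a, b -> a + b end)
    end)
  end
end
```

Every line would then pay for UTF-8 decoding plus at least two intermediate lists.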
To test this hypothesis I wrote a quick and dirty version (probably not the fastest, and certainly not the most idiomatic or elegant):
```elixir
defmodule AlphamaxOptim do
  @valid_chars Enum.map(?a..?z, fn x -> <<x::utf8>> end) ++
                 Enum.map(?A..?Z, fn x -> <<x::utf8>> end) ++
                 Enum.map(?0..?9, fn x -> <<x::utf8>> end)

  @init_char_map Enum.map(@valid_chars, fn x -> {x, 0} end) |> Enum.into(%{})

  def count_alpha(string) do
    recursive_count(@init_char_map, string)
  end

  def recursive_count(acc, ""), do: acc

  # Generate one clause per counted character, each matching a literal
  # binary prefix and bumping that character's entry in the accumulator.
  for ch <- @valid_chars do
    def recursive_count(acc, unquote(ch) <> next) do
      recursive_count(%{acc | unquote(ch) => acc[unquote(ch)] + 1}, next)
    end
  end

  # Anything else: skip one codepoint and keep going.
  def recursive_count(acc, str) do
    case String.next_codepoint(str) do
      nil -> recursive_count(acc, "")
      {_, next} -> recursive_count(acc, next)
    end
  end

  def process(file) do
    # Raw 1 MB binary chunks, no :utf8 decoding on the way in.
    File.stream!(file, [], 1_000_000)
    |> Task.async_stream(fn x -> count_alpha(IO.iodata_to_binary(x)) end,
      max_concurrency: 6,
      ordered: false
    )
    |> Stream.map(fn {:ok, x} -> x end)
    # Merge the per-chunk count maps by summing values per key.
    |> Enum.reduce(fn a, b -> Map.merge(a, b, fn _k, v1, v2 -> v1 + v2 end) end)
    |> Enum.max_by(fn {_k, v} -> v end)
    |> IO.inspect()
  end
end
```
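The for comprehension over @valid_chars runs at compile time and unrolls into 62 recursive_count/2 clauses, one per counted character. The clause generated for ?a, for instance, expands to roughly:

```elixir
# Illustrative expansion of one generated clause:
def recursive_count(acc, "a" <> next) do
  recursive_count(%{acc | "a" => acc["a"] + 1}, next)
end
```

Matching a literal prefix like "a" <> next compiles to a cheap binary match, and the %{acc | ...} update syntax only works for keys that already exist, which is exactly what seeding the accumulator with @init_char_map guarantees.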
I believe it is correct, as it spits out the same result as the original. (Reading raw 1 MB chunks can split a multi-byte codepoint across a chunk boundary, but since every counted character is single-byte ASCII and the bytes of a multi-byte sequence are all ≥ 0x80, the stray bytes are simply skipped and the counts are unaffected.) Ideally it should be tested with something like PropEr, using the original as an anchor.
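A sketch of what that could look like, here with StreamData rather than PropEr (Alphamax.count_alpha/1 stands in for the original, and the two result shapes may need normalising before comparison):

```elixir
defmodule AlphamaxOptimTest do
  use ExUnit.Case
  use ExUnitProperties

  property "optimized count agrees with the original on random strings" do
    check all string <- StreamData.string(:printable) do
      # Hypothetical: assumes both functions return comparable count maps.
      assert AlphamaxOptim.count_alpha(string) == Alphamax.count_alpha(string)
    end
  end
end
```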
Then I benchmarked it against the original with a 100M file:

```
Benchmarking optimized100...
{"0", 8819253}
{"0", 8819253}
{"0", 8819253}
Benchmarking original100...
{"0", 8819253}
{"0", 8819253}

Name                   ips        average  deviation         median         99th %
optimized100          0.22         4.48 s     ±0.37%         4.48 s         4.49 s
original100         0.0183        54.79 s     ±0.00%        54.79 s        54.79 s

Comparison:
optimized100          0.22
original100         0.0183 - 12.23x slower +50.31 s
```
And with a 500M file (just to be sure):

```
Benchmarking optimized500...
{"0", 31722406}
{"0", 31722406}
Benchmarking original500...
{"0", 31722406}
{"0", 31722406}

Name                    ips        average  deviation         median         99th %
optimized500         0.0404       0.41 min     ±0.00%       0.41 min       0.41 min
original500         0.00359       4.64 min     ±0.00%       4.64 min       4.64 min

Comparison:
optimized500         0.0404
original500         0.00359 - 11.25x slower +4.23 min
```
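For reference, these numbers come from plain Benchee runs along these lines (file names are assumptions):

```elixir
Benchee.run(%{
  "original500"  => fn -> Alphamax.process("500M.txt") end,
  "optimized500" => fn -> AlphamaxOptim.process("500M.txt") end
})
```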
Then I wondered what would happen if we had mutability, only to remember that we do: :ets tables or :counters. I had never used :counters, so I tried it.
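The :counters API is tiny; a quick illustration (indices are 1-based):

```elixir
ref = :counters.new(3, [:write_concurrency]) # fixed-size array of 3 counters
:counters.add(ref, 1, 1)                     # atomically increment slot 1
:counters.get(ref, 1)                        #=> 1
```

With that, here is the mutable version: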
```elixir
defmodule AlphamaxOptimMutable do
  @valid_chars Enum.map(?a..?z, fn x -> <<x::utf8>> end) ++
                 Enum.map(?A..?Z, fn x -> <<x::utf8>> end) ++
                 Enum.map(?0..?9, fn x -> <<x::utf8>> end)

  # Map each character to its (1-based) slot in the counters array.
  @indexed_chars Enum.with_index(@valid_chars, 1) |> Enum.into(%{})

  def count_alpha(ref, string) do
    recursive_count(ref, string)
  end

  def recursive_count(ref, ""), do: ref

  # One clause per counted character, bumping its counter in place.
  for ch <- @valid_chars do
    def recursive_count(ref, unquote(ch) <> next) do
      :counters.add(ref, @indexed_chars[unquote(ch)], 1)
      recursive_count(ref, next)
    end
  end

  # Anything else: skip one codepoint and keep going.
  def recursive_count(ref, str) do
    case String.next_codepoint(str) do
      nil -> recursive_count(ref, "")
      {_, next} -> recursive_count(ref, next)
    end
  end

  def process(file) do
    ref = :counters.new(length(@valid_chars), [:write_concurrency])

    File.stream!(file, [], 1_000_000)
    |> Task.async_stream(fn x -> count_alpha(ref, IO.iodata_to_binary(x)) end,
      max_concurrency: 6,
      ordered: false
    )
    # Force the lazy stream; the tasks only mutate the shared counters.
    |> Stream.run()

    @indexed_chars
    |> Enum.map(fn {ch, ix} -> {ch, :counters.get(ref, ix)} end)
    |> Enum.max_by(fn {_k, v} -> v end)
    |> IO.inspect()
  end
end
```
Benchmarked it again:

```
Name                           ips        average  deviation         median         99th %
optimized_mutable100          0.27         3.65 s     ±0.55%         3.65 s         3.66 s
original100                 0.0184        54.32 s     ±0.00%        54.32 s        54.32 s

Comparison:
optimized_mutable100          0.27
original100                 0.0184 - 14.88x slower +50.67 s

Name                           ips        average  deviation         median         99th %
optimized_mutable500         0.0525       0.32 min     ±0.00%       0.32 min       0.32 min
original500                 0.00360       4.63 min     ±0.00%       4.63 min       4.63 min

Comparison:
optimized_mutable500         0.0525
original500                 0.00360 - 14.56x slower +4.31 min
```
There are probably much better ways to improve the original code performance-wise, but I think the methodology would be more or less the same.