Hi everyone, posting for the first time here, so please bear with me.
I came across a HN thread that mentioned a code-golf problem for high-throughput FizzBuzz. Link here: fastest code - High throughput Fizz Buzz - Code Golf Stack Exchange.
The naive single-threaded C implementation gets about 70 MiB/s of throughput on my Mac.
Here’s the C program for reference:
#include <stdio.h>

int main() {
    for (int i = 1; i < 1000000000; i++) {
        if ((i % 3 == 0) && (i % 5 == 0)) {
            printf("FizzBuzz\n");
        } else if (i % 3 == 0) {
            printf("Fizz\n");
        } else if (i % 5 == 0) {
            printf("Buzz\n");
        } else {
            printf("%d\n", i);
        }
    }
}
It is my understanding from learning Elixir so far that pure Elixir is not a good choice for CPU-intensive work. But since this is a simple problem (and mostly IO-bound, as I've come to believe), I thought I'd test the waters a bit. My approach is to divide the range 1..1_000_000_000 into System.schedulers_online() chunks and use Task.async_stream to process each chunk in its own task.
I made the following Fizzbuzz module:
defmodule Fizzbuzz do
  def fizzbuzz_with_io(enumerable) do
    # I am printing here because it's my understanding that it takes time
    # to move data from one process to another.
    enumerable
    |> Stream.map(&reply/1)
    |> Stream.chunk_every(5000)
    |> Enum.into(IO.stream())
  end

  def fizzbuzz_no_io(enumerable) do
    enumerable |> Stream.map(&reply/1) |> Stream.run()
  end

  def reply(n) do
    case {rem(n, 3), rem(n, 5)} do
      {0, 0} -> "FizzBuzz\n"
      {0, _} -> "Fizz\n"
      {_, 0} -> "Buzz\n"
      {_, _} -> "#{n}\n"
    end
  end
end
Here is my main CLI driver code (I acknowledge it is not 100% correct, and there may be some numbers I'm missing in my quest to divide the work, but it is good enough to roughly illustrate the problem):
defmodule Fizzbuzz.Cli do
  def main([lower, upper]) do
    {lower, upper} = {String.to_integer(lower), String.to_integer(upper)}
    chunk_size = div(upper - lower, System.schedulers_online())
    input_enumerable = get_input_ranges(lower, upper, chunk_size)
    IO.inspect(input_enumerable)

    input_enumerable
    |> Task.async_stream(Fizzbuzz, :fizzbuzz_with_io, [], timeout: :infinity, max_concurrency: 64)
    |> Stream.run()
  end

  def main(_), do: IO.puts("Usage: fizzbuzz 1 10000")

  defp get_input_ranges(lower, upper, chunk_size) do
    cond do
      chunk_size < 10 ->
        [lower..upper]

      lower >= upper ->
        []

      true ->
        chunk_end = min(lower + chunk_size, upper)
        [lower..chunk_end | get_input_ranges(chunk_end + 1, upper, chunk_size)]
    end
  end
end
Compiling the above with MIX_ENV=prod mix escript.build, I have observed the following:
- If I only do the computation with no IO at all (using :fizzbuzz_no_io), the program finishes in 31 seconds on my machine. That is much faster than the 3 minutes 5 seconds of single-threaded execution in Elixir, but still slower than single-threaded C's 7 seconds.
- If I use :fizzbuzz_with_io, the running time exceeds 5 minutes and I get a throughput of 2.5-2.7 MiB/s. Here, C gives a throughput of 70 MiB/s on my machine and a total running time of 1 minute 43 seconds.
UPDATE-1:
- I have updated the driver code to handle all cases properly.
- I have updated the fizzbuzz_with_io function to use Stream chunking, which has improved throughput to around 70 MiB/s on my system. That's a 35x improvement in performance. It was just a hunch that drove me to use this: I thought chunking might feed IO.stream small batches incrementally rather than overloading it with everything at once. Could somebody please explain it better?
- There are solutions in that thread with throughputs on the order of _ GiB/s. So, is there any further scope for improvement?
So, here’s my question: how do I improve the throughput of the Elixir version of the FizzBuzz program?
Thanks