Hello Elixir community,
I would like to share some results from my first two months of learning Elixir and ask an important question about the performance of the language itself.
I built a POC application and ran it on a production-like host (AWS c5.9xl - 96 cores, 70 GB of RAM), where it achieved about 3-3.5 times lower TPS than our current production metrics. To troubleshoot, I decided to run some very rudimentary tests of the main logic that is key to our business. The logic is very simple:
- iterate over files (compressed JSONs or plain CSVs),
- apply a map transformation to each line, and
- aggregate by summing the decimal value per string key
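To make the three steps concrete, here is a minimal sketch in Java (the language we currently use). It works on a few in-memory lines instead of a file. The class name, the `aggregate` helper, the `keyFields` parameter and the sample rows are all made up for illustration; this is not our production code.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the map + sum-by-key logic described above.
class AggregationSketch {

    // Key = the first `keyFields` columns joined by commas; value = the last column.
    static Map<String, BigDecimal> aggregate(List<String> lines, int keyFields) {
        Map<String, BigDecimal> totals = new TreeMap<>();
        for (String line : lines) {
            // A limit of -1 keeps trailing empty fields.
            String[] parts = line.split(",", -1);
            String key = String.join(",", Arrays.copyOfRange(parts, 0, keyFields));
            BigDecimal value = new BigDecimal(parts[parts.length - 1]);
            // Sum the values that share a key.
            totals.merge(key, value, BigDecimal::add);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up rows; the first and third share a key.
        List<String> lines = List.of(
            "001,abcStore,GetPar,10",
            "002,abcEC2,Hours,2.5",
            "001,abcStore,GetPar,0.25");
        aggregate(lines, 3)
            .forEach((k, v) -> System.out.println(k + "," + v.toPlainString()));
    }
}
```

Rows that share a key collapse into one output record holding the sum of their values; rows with a unique key pass through unchanged.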
Simple enough; however, the distinguishing feature of our business is the volume: I need to account for 1 quadrillion (10^15) records per month, which is ~380 * 10^6 TPS. We currently have applications handling this volume in production.
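For reference, the throughput figure is simple arithmetic. Assuming a 30-day month (an assumption made only for this back-of-the-envelope estimate):

```latex
\frac{10^{15}\ \text{records}}{30 \times 24 \times 3600\ \text{s}}
  = \frac{10^{15}}{2.592 \times 10^{6}}\ \text{records/s}
  \approx 3.86 \times 10^{8}\ \text{records/s}
```

This is in line with the rough ~380 * 10^6 TPS quoted above.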
To evaluate Elixir's capability, I took two plain CSV files that I need to aggregate (reduce with a sum function): a small one (~1,000 records) and a big one (11.7 million records).
Here is a snippet of such a file:
HeaderLength=8
DatasetName=xxxBillingHourly
CreationTimestamp=1561942800000
StartRangeMarker=2019-07-01-01-00-00
EndRangeMarker=2019-07-01-02-00-00
FieldSpec=field-1,field-2,field-3,field-4,field-5,field-6,field-7,field-8,field-9,startTime,endTime,value
KeySpec=field-1,field-2,field-3,field-4,field-5,field-6,field-7,field-8,field-9,startTime,endTime
DataSources=someS3bucket
007931482,abcStore,abcStore,GetPar,GetPar,,us-west-1,,,2018-07-01 01:00:00,2018-07-01 02:00:00,10
016299379,abcEC2,abcEC2,Hours,Gateway,,,arn:aws:someARN,,2018-07-01 01:00:00,2018-07-01 02:00:00,10
018870929,abcLambda,abcLambda,Second,Invoke,,,arn:aws:some-function,,2018-07-01 01:00:00,2018-07-01 02:00:00,59.35000000000002086
.
.
I used Elixir and Java programs to compare against each other (Java is the language we currently use).
Below are listings of the two programs, which do the same thing.
Java
@Test
public void konaAggregate() throws IOException {
    long start = System.currentTimeMillis();
    System.out.println("Start");
    String fName = "/home/temp/1/real_kona";
    Path totalFilePath = Paths.get(fName + "_java_aggregated");
    Stream<Entry<String, BigDecimal>> stream = Files
        .lines(Paths.get(fName))
        .skip(9)
        .map(l -> {
            String[] parts = l.split(",");
            return Tuple.of(String.join(",", parts[0], parts[1], parts[2], parts[3], parts[4]), new BigDecimal(parts[11]));
        })
        .collect(
            Collectors.groupingBy(
                Tuple2::_1,
                TreeMap::new,
                Collectors.mapping(Tuple2::_2, Collectors.reducing(BigDecimal.ZERO, BigDecimal::add))
            )
        )
        .entrySet()
        .stream();
    try (BufferedWriter writer = Files.newBufferedWriter(totalFilePath)) {
        stream.forEach(line -> {
            String record = line.getKey() + "," + line.getValue().toPlainString();
            try {
                writer.write(record);
                writer.newLine();
            } catch (IOException e) {
                // ignore
            }
        });
    }
    long end = System.currentTimeMillis();
    System.out.println("Total time: " + (end - start));
}
Elixir
def aggregate_kona() do
  IO.puts("Start")
  kona = "/home/temp/1/real_kona"
  output_path = kona <> "_elixir_aggregated"
  start = :os.system_time(:millisecond)
  kona
  |> File.stream!
  |> Stream.map(&String.split(&1, "\n"))
  |> Stream.drop(9)
  |> Stream.map(&(&1 |> hd))
  |> Stream.map(&parse_granular(&1))
  |> Enum.reduce(
    %{},
    fn %{record_key: key, record_value: value}, acc ->
      Map.update(acc, key, value, &Decimal.add(&1, value))
    end
  )
  |> Stream.map(&((elem(&1, 0) <> "," <> Decimal.to_string(elem(&1, 1))) <> "\n"))
  |> Stream.into(File.stream!(output_path, [:write, :utf8]))
  |> Stream.run()
  stop = :os.system_time(:millisecond)
  IO.puts("Total: #{stop - start}")
end

def parse_granular(record) do
  [p,
   pr,
   cpc,
   ut,
   op,
   _,
   _,
   _,
   _,
   _,
   _,
   value] =
    record |> String.split(",")
  %{record_key: p <> "," <> pr <> "," <> cpc <> "," <> ut <> "," <> op, record_value: Decimal.new(value)}
end
Here are the results:
For the small file (1K records) there are no issues: both programs complete with a latency of about 50 milliseconds (Elixir is faster).
But for the big file (11.7 million records, ~2 GB):
Java aggregates the file in 16 seconds.
Elixir ran for ~20 minutes without producing a result, so I terminated the iex session.
To get somewhere, I made the Elixir flow even simpler: I commented out the reduce step and ran it again. It took ~115 seconds. The Java version took ~13 seconds for the same work.
So my question to the Elixir community: could you please review my task and implementation and tell me whether I have made any obvious mistakes in the Elixir part, and why Elixir's performance here is so far from what I can accept?
Thank you.