I’ve mentioned this before: https://medium.com/@dmitriid/erlang-is-dead-long-live-e-885ccbcbc01f#.fyc9oiwv2
But I’ll mention this again: Java is becoming Erlang much faster than Erlang is becoming Java.
The following are delusions of grandeur that have no place in the real world:
- “Erlang/OTP being more well-suited for the task”
- “I still have the feeling that Elixir could kill in all these areas”
- “it seems that way because OTP has mostly solved the big and distributed part of the problem.”
- “Java is not the most suitable language”
- “Erlang or Elixir, which provides much more than what Hadoop or Spark gives you (fault tolerance distributed system with native map reduce)”
- etc.
It’s one thing to spawn a million green threads on your laptop. It’s quite another to run a couple of thousand distributed nodes with multiple tasks per node, spread across several data centres.
Guess what: Erlang/OTP is just as unsuited for this task as Java is. Erlang was never designed to handle thousands of nodes in heterogeneous environments. Its distribution protocol has several well-known limitations (for example, by default every node connects to every other node in a full mesh). There’s a reason the RELEASE project exists.
However, Java provides established libraries and approaches that solve these problems. It might not be easy, but you’re definitely not writing everything from scratch (or relying on Basho’s riak_* libraries).
Distribution? Distributed storage? Distributed databases? Streaming? Distributed map-reduce? Real-time analytics on streams?
All these problems have been solved multiple times over in other languages (primarily Java). Or they have been solved by AWS (stream your data into Kinesis/Kinesis Firehose, analyse in real time with Kinesis Analytics, dump into RDS for warehousing). Aaaand to use AWS you’ll undoubtedly use Python or Java, not Erlang.
There are multiple reasons for that, obviously. The main one, definitely, is that Java (or Python, or whatever else) is just so much more popular than Erlang.
The other important one is: Erlang has for too long prided itself on being oh so superior to other languages in everything related to parallel and distributed computing. So long that it completely missed other languages marching on and improving in the same areas. If not at the language level, then at the library and infrastructure level.
So, here is today’s reality. Unless someone is smart enough and has enough resources to build a library/infrastructure on par with what other languages have (a new Kafka/Hadoop/Kubernetes/Cassandra/Spark/Flink/…the list just goes on and on and on, doesn’t it?..), the only way to deal with BigData in Erlang/Elixir is to write a proper library to interface with these systems. Anything else is just delusions of grandeur.