What is the best BEAM server?

I’ve just read this article, which is of the opinion that there are some very powerful multi-core machines available these days: Use One Big Server - Speculative Branches

This server has 128 cores with 256 simultaneous threads. With all of the cores working together, this server is capable of 4 TFLOPs of peak double precision computing performance…above and below each CPU is the memory: 16 slots of DDR4-3200 RAM per socket. The largest capacity “cost effective” DIMMs today are 64 GB. Populated cost-efficiently, this server can hold 1 TB of memory. Populated with specialized high-capacity DIMMs (which are generally slower than the smaller DIMMs), this server supports up to 8 TB of memory total. At DDR4-3200, with a total of 16 memory channels, this server will likely see ~200 Gbps of memory throughput across all of its cores.

In terms of I/O, each CPU offers 64 PCIe gen 4 lanes. With 128 PCIe lanes total, this server is capable of supporting 30 NVMe SSDs plus a network card. Typical configurations you can buy will offer slots for around 16 SSDs or disks… and this server is equipped with a 50-100 Gbps network connection.
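As a rough sanity check of the quoted I/O numbers, here is a minimal sketch of the lane budget. The lane and SSD counts come from the quote; the x4 link per NVMe SSD and the PCIe 4.0 per-lane rate (16 GT/s with 128b/130b encoding) are assumptions added for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope check of the quoted PCIe figures.
# Assumptions (not from the article): each NVMe SSD uses a x4 link,
# and PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding.

CPUS = 2
LANES_PER_CPU = 64        # from the quote
NVME_SSDS = 30            # from the quote
LANES_PER_NVME = 4        # assumption: typical x4 NVMe link

PCIE4_GT_PER_S = 16.0     # assumption: PCIe 4.0 raw rate per lane
ENCODING = 128 / 130      # 128b/130b line encoding overhead


def lane_budget(cpus=CPUS, lanes_per_cpu=LANES_PER_CPU,
                ssds=NVME_SSDS, lanes_per_ssd=LANES_PER_NVME):
    """Return (total lanes, lanes used by SSDs, lanes left over)."""
    total = cpus * lanes_per_cpu
    used = ssds * lanes_per_ssd
    return total, used, total - used


def link_gbps(lanes):
    """Approximate one-direction bandwidth of a PCIe 4.0 link in Gbit/s."""
    return lanes * PCIE4_GT_PER_S * ENCODING


if __name__ == "__main__":
    total, used, spare = lane_budget()
    print(f"{total} lanes total, {used} used by SSDs, {spare} left for the NIC")
    print(f"x{spare} PCIe 4.0 link: ~{link_gbps(spare):.0f} Gbit/s")
```

Under those assumptions, 30 SSDs take 120 of the 128 lanes, and the remaining x8 link has roughly enough headroom for a network card in the 50-100 Gbps range mentioned in the quote.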

So my question is: would this machine be absolutely ideal for running a massive, high-traffic Elixir app?

2 Likes

I was surprised to learn that WhatsApp dropped beefy machines and now runs on servers capped at 32 GB of RAM:

They also no longer run FreeBSD:

It wasn’t for performance tho, so I’d be interested in hearing more about optimal server hardware and OS combos as well (tho from what WhatsApp have done, it’s probably safe to say you don’t ‘need’ beefy machines to run very high traffic services :D)

2 Likes

But the main reason they did so was to fit in with the Facebook DevOps model, not because it was the best option. In fact, to make it work they ended up making a lot of contributions to the Erlang core, and they still have some hacks that weren’t accepted into the core.

2 Likes

Yup… but the takeaway is that Erlang can run huge-scale apps on less powerful hardware too (which could be more cost effective than fewer beefier machines).

I’d still be interested in any tests people have done to come up with optimal hardware specs/set-ups.

1 Like

How can that be?

Core cost scales more or less linearly, and the same goes for memory, both on-prem and in the cloud. Bandwidth cost, however, will always increase the more nodes you have.

In that regard, fewer nodes with the same total cores and memory should always be cheaper if your work is parallelizable across all the cores, which on the BEAM is usually true.

1 Like

Memory cost only scales linearly up to a point in a single node - for instance 2x64GB DIMMs are somewhat cheaper than a single 128GB DIMM, and 4x64GB are much cheaper than a single 256GB. The bigger ones still sell, because there are only so many slots on a single server motherboard.

2 Likes

If you run your own datacenter or you are co-locating then bandwidth per box shouldn’t be an issue - or am I missing something? (I’m speaking purely of dedicated servers, not cloud based hosting.)

(Dedicated) Server cost goes up noticeably after the higher-mid range, hence I’d be interested in knowing what would be better for an Erlang/Elixir app - two hexa-core Coffee Lake servers with 64GB of RAM each vs a single 18-core Cascade Lake with 128GB.

There’s also redundancy with additional servers (depending on how they were being used and whether it would be one big vs multiple smaller).

I was talking about virtualized hosting on that part, but it’s also relevant for dedicated servers if you don’t want all your eggs in the same basket. More servers means more individual cross-chatter across the mesh, etc.
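To put a rough number on the cross-chatter point, here is a minimal sketch that counts node-to-node connections for different ways of splitting the same core count. It assumes a fully connected cluster, which is how distributed Erlang behaves by default (every node connects to every other node); it only counts links and does not model actual traffic volume. The function names and the 128-core example are hypothetical.

```python
def mesh_links(nodes):
    """Number of node-to-node connections in a fully connected mesh."""
    return nodes * (nodes - 1) // 2


def layouts(total_cores, cores_per_node_options):
    """List (nodes, cores per node, mesh links) for each way of splitting the cores.

    Options that do not divide total_cores evenly are skipped.
    """
    rows = []
    for cores_per_node in cores_per_node_options:
        nodes, remainder = divmod(total_cores, cores_per_node)
        if remainder:
            continue
        rows.append((nodes, cores_per_node, mesh_links(nodes)))
    return rows


if __name__ == "__main__":
    # Same 128 cores, split three different ways (illustrative figures only).
    for nodes, cores, links in layouts(128, [128, 32, 8]):
        print(f"{nodes:>3} node(s) x {cores:>3} cores -> {links:>3} mesh links")
```

The link count grows quadratically with the number of nodes, so splitting the same cores across 16 boxes gives 120 connections to keep alive where 4 boxes need only 6.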

I was just making a hypothetical point; the way you put your lego pieces of hosting together will rarely impact cost in a meaningful way :stuck_out_tongue:

1 Like

I’m not so sure Oliver :lol:

Even just with dedicated servers the difference between 1 large vs 2 smaller, or 2 large vs 3 or 4 smaller could be big enough to consider smaller (and again if you have a whole datacenter full of them).

But what’s more performant could influence the decision if the difference is more than just slight… so if anyone does do such tests, I for one would be interested in hearing your findings :smiley:

I wonder if @OvermindDL1 did any further experiments on that set-up^^?

3 Likes

That’s comparing lego with a sack of raw plastic material though :stuck_out_tongue:

1 Like

Since a many-core server needs huge memory and I/O bandwidth between all of its cores and/or chiplets, I’d wager that the best many-core-friendly setup is either an EPYC server with 32 / 64 / 128 cores or one of the ARM platforms with 80 cores (sorry, forgot the name :frowning:). They have insanely quick inter-bus communication, so even with 100+ Erlang schedulers (one per CPU thread) the parallel overhead would be minimal to non-existent.

Or you can get the Ryzen 5800X3D, since it has a humongous L3 cache, which probably helps a lot in avoiding constant CPU cache misses.

Answering OP directly: yep, that would be an awesome server for Erlang/Elixir. Although I think the CPU power will be kind of wasted there; you could probably get away with lower clock speeds and maybe even fewer cores. The rest looks ideal, especially inter-bus comms.

2 Likes