Is this a correct way to avoid wasting or reaching the atom limit?

Well, if you know for a fact that the number of possible names for :"node_name@DNS_name" is limited (which it probably is, since you don't really create an unlimited number of nodes, right?), then it should be safe enough to use.


Yes, that's what I thought… until Mr. Virding said that dynamically creating atoms (even occasionally) is BAD! :laughing:

Fortunately we can't have that many nodes in a distributed system. Yet. :wink:

OK, I will qualify that and add "unless you know there will only be a few new atoms". Recreating existing atoms is OK, as that just reuses them.


YES!!! Thank you, Robert!!!
I promise NOT to generate too many nodes! :smiley:
…Hmm, BTW, what do you consider "a few"? :thinking:
In one of my experiments, I currently use each node as a client/server entity that exchanges files with my other distributed nodes. So, inside one node, I have to "generate", at some point, the (atom) names of each node that this node communicates (i.e. exchanges files) with…
But perhaps I'm going about it the wrong way (I'm an Elixir noob, so…)
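One safe pattern for this situation (a sketch with made-up host names, not the poster's actual code) is to mint the bounded set of node-name atoms once at startup with String.to_atom/1, and only look atoms up at runtime with String.to_existing_atom/1, which raises instead of creating a new atom:

```elixir
# Assumed example: the set of peers is known up front (host names are made up).
known_hosts = ["alpha.example.com", "beta.example.com"]

# Run once at startup: creates a small, bounded set of atoms.
Enum.each(known_hosts, fn host -> _ = String.to_atom("file_server@" <> host) end)

# At runtime, only *reuse* atoms; an unknown name raises ArgumentError
# instead of leaking a new atom into the atom table.
node_name = fn host -> String.to_existing_atom("file_server@" <> host) end

node_name.("alpha.example.com")  # => :"file_server@alpha.example.com"
```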

1 million atoms is actually a metric shit ton of possible atoms; ask the folks working on Absinthe (that library creates a lot of atoms and wants us to create even more, with all the objects, fields, etc.). It is usually plenty and then some if you are building a regular system. If you create 100 atoms with that node thing, it's close to nothing; 10_000 would be a few; 100_000 would be a sizable amount, but still not enough to threaten your system.

Let's say you create 100 new nodes every day: at the default limit of 1,048,576 atoms, it would take you well over two decades to reach it (provided the system keeps running, there are no duplicates, and the rest of the system doesn't create an equally insane amount).
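To check the arithmetic, assuming the default atom table limit of 1,048,576 (it can be raised with the +t emulator flag):

```elixir
atom_limit = 1_048_576          # BEAM default; tunable with the +t flag
atoms_per_day = 100

days = div(atom_limit, atoms_per_day)
years = Float.round(days / 365.25, 1)

# days  => 10485
# years => 28.7
```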

But not at runtime, based on some user input. They are generated once, and after the system has loaded all of its modules, the atom count won't change anymore (at least not because of Absinthe).

Yes, don't get me wrong! I was not trying to suggest Absinthe is doing it wrong or anything; just saying that it requires you to create a lot of atoms, and we are still not even near the danger zone.

After asking my Project Manager: one of my node file servers (the big "supervisor" one) should be connected to a max of 2000 other nodes, but the other node file servers should be connected to an average of 5 to 10 others.
So I have to "generate" some 2000 atoms on one server (node) and only 5 to 10 on the others…

Is this using "distributed Erlang"? If yes, then you'll end up with 2000 × (2000 − 1) / 2 = 1,999,000 connections unless you are very careful in how you grow the cluster.
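A fully connected mesh of n nodes needs n(n − 1)/2 TCP connections, so the count grows quadratically with cluster size:

```elixir
# Number of links in a full mesh of n nodes: each pair is connected once.
mesh_connections = fn n -> div(n * (n - 1), 2) end

mesh_connections.(10)    # => 45
mesh_connections.(2000)  # => 1999000
```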

Given these numbers, I would not rely on distributed Erlang, but would connect to the other servers by other means that are easier to control. As a side effect, you probably won't rely on Erlang node names anymore, but on regular hostnames/IP addresses.


Yeah, as @NobbZ notes, "connect" can't mean "connect via distributed Erlang"; it simply can't support clusters that large. If it's a file server, I'd consider serving files via ordinary TCP/HTTP.

Thanks for the advice.
I'll give up on this "node" (distributed Erlang) option then, and focus on my other one (an Elixir SFTP server).

Interesting. I was under the impression that we are talking about nodes with static names.

The whole thing made me think about how Elixir/Erlang creates process IDs. Are they always unique, or could there have been a finished process with the same ID?

You could have 2000 nodes connecting to a "manager" if you make it a hidden node or use -connect_all false. The default in Erlang clustering is a fully connected mesh, but it's not mandatory, if you can live without :global and :pg2.
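For reference, these are the relevant emulator flags (shown here as a vm.args sketch; how you pass them depends on your release setup):

```
## Start this node hidden: it connects to peers without joining the mesh
-hidden

## Or: only connect to nodes you explicitly reach via Node.connect/1,
## instead of transitively connecting to the whole cluster
-connect_all false
```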

Yes, PIDs get reused.

Or you use :erlang.system_info(:atom_count).
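For example (the atom name below is made up, and the count can also drift if other code creates atoms concurrently):

```elixir
# Count interned atoms before and after creating one new atom.
before = :erlang.system_info(:atom_count)
_ = String.to_atom("some_definitely_new_atom")
after_count = :erlang.system_info(:atom_count)

after_count - before  # at least 1: one new atom was interned
```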


My current project loads hundreds of TOML files into a shallow tree of maps. The top-level keys are strings (relative path names), but the interior keys are all converted to atoms. For safety, this conversion uses String.to_existing_atom/1.

Before loading the tree, I load several “schema by example” files, using String.to_atom/1. This defines atoms for a few dozen legal keys. I believe that this is a safe approach, because the schema files are small and I’m in control of their content.
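As a sketch of that pattern (the key names here are illustrative, not the actual schema):

```elixir
# Trusted schema keys, loaded once at startup; String.to_atom/1 is safe
# here because the set is small and under our control.
schema_keys = ["title", "owner", "max_size"]
Enum.each(schema_keys, &String.to_atom/1)

# Untrusted data keys may only *reuse* existing atoms; an unknown key
# raises ArgumentError instead of growing the atom table.
atomize = fn key -> String.to_existing_atom(key) end

atomize.("owner")  # => :owner
```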


Anyway, even though I managed to read/save files between distributed nodes, I didn't find a way to stream files with this configuration.
This works (nearly) OK in my SFTP version…