I’m not really sure how this would be safer than just using String.to_atom(str). String.to_atom(str) does not create a new atom if one already exists for the given name. Atoms are basically entries in a table mapping a name to a number (which is what makes them so quick to compare, for example), so there can never be multiple entries for the same name. The problem is not duplicates, but simply too many names in said table. What you have above doesn’t guard against that at all.
Also, the atom limit is not per process; it’s global to the VM. The atom table is never cleaned up (and probably can’t be), and it simply takes memory. Once it grows enough, there might not be enough memory left for all the other things your VM does.
As others have pointed out, the limit is global, although configurable. It is generally not recommended to generate atoms dynamically, because if they reach the limit the entire BEAM VM (and consequently all the applications it is running) will crash.
The safest way to avoid reaching the limit (1,048,576 by default) would be to not create atoms dynamically. If this is not possible, however, then my recommendation would be to use erlang:system_info/1 to find out the size limit of your atom table and how many atoms it currently holds.
There are several ways to achieve this, but the one I prefer is the io:put_chars(erlang:system_info(info)). call. It gives you a ton of information, so you will need to go through it, but the relevant part should be this bit:
This will still make the system unable to operate at some point. It might generate “faulty” or otherwise “useless” atoms and fill the table until it eventually reaches the point where the table is “full” and no further atoms can be generated. When some “valid” atom that hasn’t been seen before then needs to be created, it can’t be, because the space is already exhausted. How do you deal with this scenario?
The trick is to not convert blindly, but to filter faulty and invalid data close to the entry points.
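To illustrate what filtering near the entry points could look like, here is a minimal sketch. The module name NodeNameFilter and the list of allowed names are hypothetical and not from this thread; the idea is only that untrusted strings are checked against a known set before String.to_atom/1 is ever called.

```elixir
defmodule NodeNameFilter do
  # Hypothetical whitelist; in a real system this would come from your own config.
  @allowed ~w(alpha beta gamma)

  # Returns {:ok, atom} for whitelisted input and :error for everything else,
  # so unknown strings never reach String.to_atom/1.
  def to_atom(str) when is_binary(str) do
    if str in @allowed, do: {:ok, String.to_atom(str)}, else: :error
  end

  # Anything that is not a binary is rejected outright.
  def to_atom(_other), do: :error
end
```

With this in place, the number of atoms that outside input can create is bounded by the size of the whitelist, no matter how much garbage arrives.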
I understand that you agree with me that dynamically creating atoms is dangerous.
But I do not understand where your input about validating invalid data comes from, when OP does not talk about valid/invalid data in his post.
It also makes me feel as if you are presenting my solution as being incorrect. Or perhaps I don’t understand what point you are trying to make. Can you explain to me how using erlang system info allows the system to reach a state of inoperability? If you check against the limit and usage, and you realize you are going to hit the limit, then you can avoid creating the atoms, thus keeping the system stable (this is what I defend in my solution).
You maintain a running system, but not operability. If your system dynamically generates atoms, it will reach a point where it hits a limit: either the one you set manually or the VM’s. Both mean you can no longer dynamically generate atoms, which quite probably means whatever created those atoms before can no longer do its job. The only difference is whether the system as a whole stays running or not.
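For reference, the check described here could be sketched roughly as follows. The AtomGuard module and the headroom value of 10_000 are assumptions made up for illustration; :erlang.system_info(:atom_count) and :erlang.system_info(:atom_limit) are the real calls (available since OTP 20). The check is not atomic, so other processes can still create atoms between the check and the conversion.

```elixir
defmodule AtomGuard do
  # Hypothetical safety margin: stop creating atoms once fewer slots than this remain.
  @headroom 10_000

  # Free slots left in the global atom table.
  def remaining do
    :erlang.system_info(:atom_limit) - :erlang.system_info(:atom_count)
  end

  # Reuses an existing atom when there is one; otherwise creates a new atom
  # only while the table still has more than `headroom` free slots.
  def to_atom(str, headroom \\ @headroom) when is_binary(str) do
    try do
      {:ok, String.to_existing_atom(str)}
    rescue
      ArgumentError ->
        if remaining() > headroom do
          {:ok, String.to_atom(str)}
        else
          {:error, :atom_table_nearly_full}
        end
    end
  end
end
```

Callers get {:error, :atom_table_nearly_full} instead of a VM crash once the margin is used up, and have to decide for themselves what to do with the rejected input.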
Thanks, this is what I wasn’t aware of (not very clear in the docs?).
So then I can remove all the useless “try rescue” stuff while only using String.to_atom(str).
BTW I don’t know what is the purpose of String.to_existing_atom(str) now…
PS: I need that to generate simple node names (atoms), and the goal was less to avoid reaching the limit (which is more than high enough) than to save some space…
String.to_atom creates a new atom if one with that name wasn’t used before, which is dangerous for dynamic/unknown data. String.to_existing_atom only allows you to convert a string to an atom when that atom already exists at that point because it was used somewhere else.
If you expect the atom to already exist, I’d always use String.to_existing_atom to prevent new ones from being created accidentally. It will raise an ArgumentError if you try to convert to an atom that does NOT yet exist.
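A small sketch of that behaviour, in case it helps. The "no_such_atom_" prefix is just a made-up string that is very unlikely to exist as an atom already; everything else is standard library.

```elixir
# :ok is certain to exist already, so this conversion succeeds.
existing = String.to_existing_atom("ok")

# A string that has (almost certainly) never been used as an atom.
unknown = "no_such_atom_" <> Integer.to_string(System.unique_integer([:positive]))

# Instead of growing the atom table, the conversion raises ArgumentError,
# which is rescued here and turned into a plain marker value.
result =
  try do
    String.to_existing_atom(unknown)
  rescue
    ArgumentError -> :not_an_existing_atom
  end
```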
I’m a bit confused now…
My only need is to generate an atom from a string:
If the atom already exists, then ok, I use the existing one (BTW if my node name already exists, that’s trapped later in my code).
If it doesn’t then create it.
Since you wrote that String.to_atom(str) does that and won’t ever try to recreate (accidentally) an existing atom, this function is sufficient for me and I don’t need to check with String.to_existing_atom.
From your replies, I think you have a slight misunderstanding of atoms. There can only ever be one atom with a given name. For example, there will ever only be one :foo. Once it has been created, all attempts at creating or referencing an atom with the same name will just reference the existing atom.
The danger of overflowing the atom table comes from an attacker using their input to generate :foo1, :foo2, :foo3, and so on, until the system has too many atoms and crashes. But even in that case there will only be one copy of any specific atom in the atom table.
If you expect strings that don’t yet exist as atoms, then you need String.to_atom. I think you’re now aware of the issues around it.
Just one last thing. There’s no such thing as “recreating an atom”. When an atom is created, it is registered once by being put into the atom table, and it continues to exist from that point onward. Using or converting to an already registered atom just reuses its value. Creating or converting to a new one triggers the registration in the atom table.
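You can watch this happen from a script or iex with something like the sketch below. The "node_" prefix and the variable names are made up for the example. The counts come from the global atom table, so the first increase can be larger than one if other code creates atoms at the same moment.

```elixir
# A name that is very unlikely to be registered as an atom yet.
name = "node_" <> Integer.to_string(System.unique_integer([:positive]))

count_before = :erlang.system_info(:atom_count)

# First conversion: the name is new, so it gets registered in the atom table.
first = String.to_atom(name)
count_after_first = :erlang.system_info(:atom_count)

# Second conversion: the name is already registered, so the same atom is reused.
second = String.to_atom(name)
count_after_second = :erlang.system_info(:atom_count)

IO.inspect(first == second, label: "same atom")
IO.inspect({count_before, count_after_first, count_after_second}, label: "atom counts")
```

The first conversion bumps the count, while the second one should normally leave it unchanged, and both calls return the very same atom.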