Is this a correct way to avoid wasting or reaching the atom limit?

As I have to generate several atoms from strings, and as atoms are not garbage collected and there’s a limit for each process, is this function a correct way to “save” atoms?

def string_to_atom(str) do
  try do
    String.to_existing_atom(str)
  rescue
    ArgumentError -> String.to_atom(str)
  end
end

Or do you know any smarter (or more “Elixir-like”) way to achieve this (perhaps without this ugly “try rescue”)?

I’m not really sure how this would be safer than just using String.to_atom(str). String.to_atom(str) does not create a new atom if one already exists for the given name. Atoms are basically stored in a table mapping a name to a number (which is what makes them so quick to compare, for example), so there can never be multiple entries for the same name. The problem is not duplicates, but simply too many distinct names in that table. What you have above doesn’t guard against that at all.
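
To make that concrete, here is a small sketch comparing the helper from the question with a plain String.to_atom/1 call. The module name AtomDemo and the sample string are made up for illustration; the only library calls are String.to_atom/1 and String.to_existing_atom/1.

```elixir
defmodule AtomDemo do
  # The helper from the question, with unchanged behaviour.
  def string_to_atom(str) do
    try do
      String.to_existing_atom(str)
    rescue
      ArgumentError -> String.to_atom(str)
    end
  end
end

name = "demo_node_name_for_illustration"

# If the atom does not exist yet, the rescue branch creates it.
a = AtomDemo.string_to_atom(name)

# Plain String.to_atom/1 finds the existing entry and reuses it.
b = String.to_atom(name)

# Both values are the very same atom; there is no second table entry.
IO.inspect(a === b)
```

Either way exactly one entry for that name ends up in the atom table, so the try/rescue version should not save anything.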

Also, the limit on atoms is not per process; it’s global to the VM. The atom table is never cleaned up (and probably can’t be), and it simply takes memory. Once it grows enough, there might not be enough memory left for all the other things your VM does.

The limit is global, though you can increase it with the +t flag: http://erlang.org/doc/man/erl.html#+t

But what are you doing, code-wise, that requires dynamic creation of atoms? Maybe there is a better way.

As others have pointed out, the limit is global, although configurable. It is generally not recommended to generate atoms dynamically, because if the atom count reaches the limit, the entire BEAM VM (and consequently every application running on it) will crash.

The safest way to avoid reaching the limit (which is around 1 million by default: 1,048,576) would be to not create atoms dynamically. If that is not possible, then my recommendation would be to use erlang:system_info to find out the size limit of your atom table and how many atoms it currently holds.

There are several ways to achieve this, but the one I prefer is the io:put_chars(erlang:system_info(info)). call. It gives you a ton of information, so you will need to go through it, but the relevant part should be this bit:

=index_table:atom_tab
size: 8192
limit: 1048576
entries: 7227

There is a blog that builds a helper library on this concept:

My recommendation here would be to check the current usage against the limit of the table each time you want to save a new atom. This way you avoid rescuing from an error in the first place.
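
A minimal sketch of that check, assuming OTP 20 or later, where :erlang.system_info(:atom_count) and :erlang.system_info(:atom_limit) are available. The module name AtomBudget, the 80% threshold and the error tuple are hypothetical choices for illustration, not part of any library.

```elixir
defmodule AtomBudget do
  # Hypothetical threshold: stop creating new atoms at 80% table usage.
  @max_usage 0.8

  # Fraction of the atom table currently in use, between 0.0 and 1.0.
  def usage do
    :erlang.system_info(:atom_count) / :erlang.system_info(:atom_limit)
  end

  # Existing atoms are always returned; new ones only while under budget.
  def to_atom(str) when is_binary(str) do
    {:ok, String.to_existing_atom(str)}
  rescue
    ArgumentError ->
      if usage() < @max_usage do
        {:ok, String.to_atom(str)}
      else
        {:error, :atom_table_nearly_full}
      end
  end
end

IO.inspect(AtomBudget.usage())
IO.inspect(AtomBudget.to_atom("budget_demo_atom"))
```

The caller still has to decide what to do with the error tuple once the budget is used up.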

As a rule of thumb, the best way to create atoms dynamically is to not do it at all…

The preferred way is to provide mapping functions for expected inputs, possibly depending on their type, e.g.:

defmodule User do
  @struct_data [:foo, :bar]
  defstruct @struct_data

  # Generate one clause per known key, in both directions, at compile time.
  # Iterating over @struct_data avoids relying on the attribute that
  # defstruct sets internally.
  Enum.each(@struct_data, fn atom ->
    def convert_key(unquote(atom)), do: unquote(to_string(atom))
    def convert_key(unquote(to_string(atom))), do: unquote(atom)
  end)
end

IO.inspect User.convert_key("foo")
IO.inspect User.convert_key(:foo)

This will still leave the system unable to operate at some point. It may generate “faulty” or otherwise “useless” atoms and fill the table until it is “full” and no further atoms can be created. When a “valid” atom that hasn’t been seen before then needs to be created, it can’t be, because the space is already exhausted. How do you deal with this scenario?

The trick is to not convert blindly, but to filter out faulty and invalid data close to the entry points.
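
One way to picture such filtering is an allow-list checked before any conversion happens. This is only a sketch: the module name NodeInput, the list of names and the return tuples are all hypothetical, and a real system would use whatever validation fits its inputs.

```elixir
defmodule NodeInput do
  # Hypothetical allow-list of names this system is willing to turn into atoms.
  @allowed ~w(worker_a worker_b worker_c)

  # Convert only known names; everything else is rejected before any atom is made.
  def to_node_atom(name) when name in @allowed, do: {:ok, String.to_atom(name)}
  def to_node_atom(_other), do: {:error, :unknown_name}
end

IO.inspect(NodeInput.to_node_atom("worker_a"))
IO.inspect(NodeInput.to_node_atom("definitely_not_expected"))
```

With this shape, unexpected input can never add an entry to the atom table, because the rejecting clause returns before String.to_atom/1 is reached.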

I understand that you agree with me that dynamically creating atoms is dangerous.
But I do not understand where your input about validating invalid data comes from, when OP does not talk about valid/invalid data in his post.

It also makes me feel as if you are presenting my solution as incorrect. Or perhaps I don’t understand the point you are trying to make. Can you explain to me how using erlang system info allows the system to reach a state of inoperability? If you check usage against the limit and realize you are about to hit it, then you can avoid creating the atoms, thus keeping the system stable (this is what I defend in my solution).

You maintain a running system, but not operability. If your system dynamically generates atoms, it will reach a point where it hits a limit: either the one you set manually or the VM’s. Both mean you can no longer dynamically generate atoms, which quite probably means that whatever created those atoms before can no longer do its job. The only difference is whether the system as a whole stays running or not.

Thanks, this is what I didn’t know (not very clear in the docs?).
So then I can remove all the useless “try rescue” stuff and just use String.to_atom(str).
BTW I don’t know what the purpose of String.to_existing_atom(str) is now…:face_with_hand_over_mouth:

PS: I need that to generate simple node names (atoms), and the goal was less to avoid reaching the limit (which is more than high enough) than to save some space…

String.to_atom creates a new atom if one with that name wasn’t used before, which is dangerous for dynamic/unknown data. String.to_existing_atom only allows you to convert a string to an atom when that atom already exists at that point because it has been used somewhere else.

iex(5)> str = "some_random_not_yet_used_atom"
"some_random_not_yet_used_atom"
iex(6)> String.to_existing_atom(str)
** (ArgumentError) argument error
    :erlang.binary_to_existing_atom("some_random_not_yet_used_atom", :utf8)
iex(6)> String.to_atom(str)
:some_random_not_yet_used_atom
iex(7)> String.to_existing_atom(str)
:some_random_not_yet_used_atom

Ok. So, in my case, I only need String.to_atom(str) as this call will not create a new atom in the table nor throw an error if the atom already exists.

If you expect the atom to already exist, I’d always use String.to_existing_atom to prevent new ones from being created accidentally. It will raise an ArgumentError if you try to convert to an atom that does NOT yet exist.
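
If raising is inconvenient at the call site, the same check can be wrapped so it returns a tagged result instead. This is a sketch only; SafeAtom and fetch_existing/1 are hypothetical names, and the second sample string is simply assumed to be unused as an atom anywhere in the VM.

```elixir
defmodule SafeAtom do
  # Returns {:ok, atom} for atoms that already exist, :error otherwise.
  # Never adds an entry to the atom table.
  def fetch_existing(str) when is_binary(str) do
    {:ok, String.to_existing_atom(str)}
  rescue
    ArgumentError -> :error
  end
end

IO.inspect(SafeAtom.fetch_existing("ok"))
IO.inspect(SafeAtom.fetch_existing("surely_nobody_used_this_atom_before_42"))
```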

I’m a bit confused now… :pensive:
My only need is to generate an atom from a string:
If the atom already exists, then ok, I use the existing one (BTW if my node name already exists, that’s trapped later in my code).
If it doesn’t then create it.

As you wrote that String.to_atom(str) does that and won’t ever try to recreate (accidentally) an existing atom, this function is sufficient for me and I don’t need to verify with String.to_existing_atom.

From your replies, I think you have a slight misunderstanding of atoms. There can only ever be one atom with a given name. For example, there will only ever be one :foo. Once it has been created, all attempts at creating or referencing an atom with the same name will just reference the existing atom.

The danger of overflowing the atom table comes from an attacker using their input to generate :foo1, :foo2, :foo3, and so on, until the system has too many atoms and crashes. But even in that case there will only be one copy of any specific atom in the atom table.
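
A small sketch of that difference, assuming OTP 20 or later for :erlang.system_info(:atom_count). The prefix and the count of 100 are arbitrary illustration values, kept small on purpose since these atoms stay in the table for the lifetime of the VM.

```elixir
# Unique prefix so the names below are certainly new in this VM.
prefix = "foo_demo_#{System.unique_integer([:positive])}_"

count_before = :erlang.system_info(:atom_count)

# 100 distinct names: every one needs a new slot in the atom table.
Enum.each(1..100, fn i -> String.to_atom(prefix <> Integer.to_string(i)) end)
count_after_new = :erlang.system_info(:atom_count)

# The same 100 names again: each call just finds the existing entry.
Enum.each(1..100, fn i -> String.to_atom(prefix <> Integer.to_string(i)) end)
count_after_repeat = :erlang.system_info(:atom_count)

IO.puts("new names added at least #{count_after_new - count_before} atoms")
IO.puts("repeating them added #{count_after_repeat - count_after_new}")
```

The second number should normally be 0, unless something else in the VM (code loading, for instance) happened to create atoms in between.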

If you expect strings that don’t yet exist as atoms, then you need String.to_atom. I think you’re now aware of the issues around it.

Just one last thing. There’s no such thing as “recreating an atom”. When an atom is created, it is registered once by being put into the atom table. It continues to exist from that point onward. Using or converting to an already registered atom just reuses its value. Creating or converting to a new one triggers the registration in the atom table.

As others have already pointed out there is only one truly safe way to handle dynamically creating atoms and that is DON’T. Even if you feel you must: don’t. Even if it is only occasionally: don’t.

Why, concretely, do you need to convert to atoms? Perhaps we can suggest an alternative.

Sir, Yes, Sir, Master !!!.. :smile:
… If I could catch the ones who decided to associate node names with atoms instead of GC strings ! :innocent:

Yep, that one’s just unfortunate :sweat_smile:

As I wrote: to generate several nodes, which are created using… atoms? :"node_name@DNS_name"
If you know some other simple way (not via) to create nodes…