Why do atoms need to be in the same module for String.to_existing_atom/1?

losvedir · June 14, 2024, 12:13pm

I’ve always used String.to_existing_atom/1 without issue after verifying that the atoms are in the codebase somewhere.

However, I noticed that starting with Elixir 1.14, the docs for the function carry this guideline:

Since Elixir is a compiled language, the atoms defined in a module will only exist after said module is loaded, which typically happens whenever a function in the module is executed. Therefore, it is generally recommended to call String.to_existing_atom/1 only to convert atoms defined within the module making the function call to to_existing_atom/1.

I’m trying to better understand how Elixir is compiled, loaded, and run. I don’t understand the limitation that the atoms need to be in the same module for this to be reliable. I found elixir-lang/elixir issue #4832 on GitHub (“Issue with loading existing atoms from other modules”), which was about inconsistent behavior where the atoms weren’t always available, so it seems to be a real problem.

Suppose my app is deployed as a mix release. If the String.to_existing_atom/1 function is looking at runtime for existing atoms, why wouldn’t any atom in the code work? I would expect all the code to have been compiled prior to the release running, so wouldn’t the lookup always work?

Maybe the guideline is around partial compilation in the test environment or something like that?

Or maybe my mental model of an “internal atom database” that gets queried is wrong. Maybe the code compilation essentially inlines a mapping lookup in the code right at the time that it’s compiled?

al2o3cr · June 14, 2024, 12:26pm

In Erlang’s “interactive mode”, modules aren’t loaded until they are referenced.

So if a function in Foo does a String.to_existing_atom expecting to find an atom defined in Bar, it could fail if nothing else has yet referenced Bar.
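
To make that failure concrete, here is a minimal sketch that can be run as a script. It does not load a real module: "Bar gets loaded" is simulated with String.to_atom/1, on the assumption that both simply add an entry to the VM-wide atom table. The AtomTableDemo module and the generated atom name are hypothetical.

```elixir
defmodule AtomTableDemo do
  # Build the name at runtime. A literal atom in this file would already
  # exist once the file is parsed, which would hide the failure.
  def fresh_name do
    "demo_atom_" <> Integer.to_string(System.unique_integer([:positive]))
  end

  # Returns {:ok, atom} if the atom is already in the atom table, else :error.
  def lookup(name) do
    {:ok, String.to_existing_atom(name)}
  rescue
    ArgumentError -> :error
  end
end

name = AtomTableDemo.fresh_name()

# Nothing has created this atom yet, so the lookup fails.
before_load = AtomTableDemo.lookup(name)

# Stand-in for "Bar gets loaded": the atom is added to the atom table.
atom = String.to_atom(name)

# The same lookup now succeeds.
after_load = AtomTableDemo.lookup(name)

IO.inspect({before_load, after_load == {:ok, atom}})
```

The point of the sketch is that the lookup is a query against runtime state, so its result depends on what has been loaded so far, not on what exists in the compiled codebase.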

IIRC releases run in the other mode (“embedded mode”) so that can’t happen.
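
If you want to check which mode a given node is in, Erlang's code server exposes it. A small sketch (the LoadModeDemo module name is hypothetical; :code.get_mode/0 and :code.is_loaded/1 are from Erlang's :code module):

```elixir
defmodule LoadModeDemo do
  # :interactive -> modules are loaded on first reference
  # :embedded    -> modules are expected to be loaded up front at boot
  def mode, do: :code.get_mode()

  # true if the module is already loaded; this check does not trigger a load.
  def loaded?(module), do: :code.is_loaded(module) != false
end

IO.inspect(LoadModeDemo.mode(), label: "code mode")
IO.inspect(LoadModeDemo.loaded?(Enum), label: "Enum loaded?")
```

Running this in iex or mix test and again inside a release console should show whether the two environments actually differ for your setup.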

gregvaughn · June 14, 2024, 5:13pm

Note “generally recommended”. Using atoms in the same module is always safe and doesn’t require any deeper knowledge of how the BEAM handles atoms. But it’s not necessary to do that.

If (building on @al2o3cr's example) Foo requires Bar directly or indirectly, then Bar’s atoms will be available even in interactive mode. However, that brings in additional concerns about compile-time dependencies that were out of scope for the simple recommendation in the docs.
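
A different way to get the same guarantee, not what the docs recommend and not the require approach above, is to load the owning module explicitly at runtime before the lookup. A sketch with hypothetical Foo and Bar modules, using Code.ensure_loaded/1:

```elixir
defmodule Bar do
  # Hypothetical module that owns the atoms we want to look up.
  def statuses, do: [:bar_status_open, :bar_status_closed]
end

defmodule Foo do
  # Returns {:ok, atom} if the string names an existing atom, else :error.
  def parse_status(string) do
    # Load Bar explicitly so its atoms are in the atom table, even in
    # interactive mode where nothing else may have referenced Bar yet.
    {:module, Bar} = Code.ensure_loaded(Bar)
    {:ok, String.to_existing_atom(string)}
  rescue
    ArgumentError -> :error
  end
end

IO.inspect(Foo.parse_status("bar_status_open"))
```

Note that this only guarantees Bar is loaded; it does not check that the resulting atom is one of Bar's statuses, so a membership check against Bar.statuses/0 would still be needed for validation.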