To atom or not to atom - that is the question :)

mmeyerlein · July 8, 2022, 9:58am

hello everyone!
i have a rather general question about maps.
i have a highly volatile, relatively large map (nested, 1000+ key/value pairs) as an object store, which is frequently stored in the database.
since i use postgres, i thought it was obvious to use the datatype map, but then i would have to change my whole app from the dot notation foo.bar to the ugly foo["bar"] notation…
since i don’t manipulate the saved map in the database, i could also save the map as “string”, but somehow my inner self resists.
also converting the loaded map back into atoms is not elegant for me.

my question now is, what is the best practice here? not the shortcuts

LostKobrakai · July 8, 2022, 10:16am

I’d be curious how your map can on the one hand be “highly volatile” (which I take to mean unknown and/or changing keys) while at the same time you can hardcode keys in your code like foo.bar?

mmeyerlein · July 8, 2022, 10:25am

it’s not the keys that are highly volatile.
the structure and the values are.

LostKobrakai · July 8, 2022, 10:30am

Then I don’t see why you’d have problems using atoms. If all your keys are already in your code you can use String.to_existing_atom or :erlang.binary_to_term(bin, [:safe]) to load the map.
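
A minimal sketch of the two options mentioned above, assuming the map is either stored as an Erlang term binary or comes back from the database with string keys. The module KeyAtoms and its atomize/1 function are hypothetical helpers for illustration, not part of Elixir or Ecto.

```elixir
defmodule KeyAtoms do
  # Hypothetical helper: recursively converts string keys to atoms that
  # already exist. Assumes plain JSON-like data (maps, lists, scalars).
  def atomize(map) when is_map(map) do
    Map.new(map, fn {key, value} -> {String.to_existing_atom(key), atomize(value)} end)
  end

  def atomize(list) when is_list(list), do: Enum.map(list, &atomize/1)
  def atomize(other), do: other
end

# The atoms :foo and :bar exist because the code refers to them.
original = %{foo: %{bar: [1, 2, 3]}}

# Option 1: round-trip through the external term format.
# The :safe option refuses to create atoms that do not exist yet.
binary = :erlang.term_to_binary(original)
from_binary = :erlang.binary_to_term(binary, [:safe])

# Option 2: a string-keyed map, as a JSON decoder would return it.
from_strings = KeyAtoms.atomize(%{"foo" => %{"bar" => [1, 2, 3]}})

IO.inspect(from_binary == original)
IO.inspect(from_strings == original)
```

Both approaches raise an ArgumentError when the input contains a key with no matching atom, so unexpected data fails loudly instead of growing the atom table.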

paulsullivanjr · July 8, 2022, 11:47am

:erlang.binary_to_term(bin, [:safe]) is great. I used it a couple of months ago as part of a migration project and it greatly simplified the process.

mmeyerlein · July 8, 2022, 4:16pm

but is this the intended way?
that was my original question: what is the “right” way? the many threads with very smart ideas for turning the strings back into atoms already show that there are many ways.

i thought there is a best practice for the chain (with atoms as keys):
map → ecto → postgres → ecto → map

something like this…
if you persist maps in the database, then:
a) please always use the postgres datatype map for maps, and make sure afterwards that either the keys are converted back to atoms, or use the non dot notation.
b) please always use the postgres datatype string for maps.
c) …?
d) …?

LostKobrakai · July 8, 2022, 4:25pm

There’s not. Postgres is external to the beam, therefore loading data out of it (without additional constraints) is not really much different to handling arbitrary user input – it might be bad to convert things to atoms. So you need additional constraints to safely convert things to atoms (or store them in postgres, so they stay atoms), but those constraints are project specific and depend very much on how said data is persisted to postgres.

joaoevangelista · July 8, 2022, 4:28pm

The right way is, first and foremost: don’t create atoms dynamically, e.g. by converting user input into atoms with String.to_atom.
Then, how are your keys set? Are they static and simple, and do they abide by atom naming rules? Are they known beforehand? Then use atoms. If they are dynamically created, or too convoluted to be used as keys, use strings.

If you expect them to be changed in the database and you need to access them dynamically on the map, then you will need to use strings. If your application will only understand what is already known, use atoms. The right way will be “forced” by your use case.

Always use the jsonb/map datatype to store it in postgres; this will allow you to leverage the database better, e.g. indexes and queries.

mmeyerlein · July 8, 2022, 5:31pm

ah ok, thank you very much for your effort!

since i will neither search the database for things in the map, nor manipulate the data (write once) i will probably use the postgres string type, since all the above restrictions do not apply.

zachallaun · July 8, 2022, 6:35pm

I have a followup question about String.to_atom/1 – will it always create a new atom, or will it reuse the existing one if it’s already present? For instance, I’m working on a small library that accepts certain configuration as atoms in order to take advantage of nicer keyword list syntax, but internally breaks them apart and uses the components later on. For instance, :"foo.bar" becomes {:foo, :bar}.

I knew that atoms weren’t GC’d, but since this is a fixed set of atoms, I didn’t worry about it. I don’t want to be responsible for some kind of slow memory leak, however.

cevado · July 8, 2022, 6:41pm

so, this is something on which I personally disagree with the elixir community in general: people really like the dot notation map.field and I personally don’t like it. It feels a lot like just some old habit from an OO background.
people usually claim that the dot notation is more assertive, and that’s not true. I prefer to use map[:field] and map["field"] over dot notation. On your topic, I think you should use whatever feels more comfortable and suited for your use case. in the end it’s just data, you should be able to shape and access it whichever way you want.

zachallaun · July 8, 2022, 6:57pm

There is also a semantic difference in Elixir – dot notation will raise if the key is not defined, whereas access notation will return nil.
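
To make that difference concrete, here is a small sketch. The Lookup module is a made-up name used only for this example.

```elixir
defmodule Lookup do
  # Dot notation: requires the key to exist.
  def dot(map), do: map.bar

  # Access notation: returns nil for a missing key.
  def access(map), do: map[:bar]
end

present = %{bar: 1}
missing = %{other: 2}

IO.inspect(Lookup.dot(present))
IO.inspect(Lookup.access(present))
IO.inspect(Lookup.access(missing))

# The same lookup with dot notation raises a KeyError.
outcome =
  try do
    Lookup.dot(missing)
  rescue
    error in KeyError -> {:key_error, error.key}
  end

IO.inspect(outcome)
```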

cevado · July 8, 2022, 7:02pm

map[:field] || default_value
but jokes apart, you can merge a map of default values, I usually have a private function that does that so the defaults are consistent.
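
A rough sketch of the defaults approach described above, assuming options arrive as a map. The Options module, its keys, and its default values are invented for illustration.

```elixir
defmodule Options do
  @defaults %{timeout: 5_000, verbose: true}

  def normalize(opts) when is_map(opts), do: with_defaults(opts)

  # Private so that every caller gets the same defaults.
  defp with_defaults(opts), do: Map.merge(@defaults, opts)
end

opts = Options.normalize(%{verbose: false})

# Keys given by the caller win; missing keys fall back to the defaults.
IO.inspect(opts.verbose)
IO.inspect(opts.timeout)

# The `||` fallback treats an explicit false (or nil) like a missing key.
fallback = %{verbose: false}[:verbose] || true
IO.inspect(fallback)
```

One reason to prefer merging over `||` is visible in the last lines: `||` replaces any falsy value, so an explicit false is lost, whereas the merge keeps it.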

msimonborg · July 8, 2022, 7:31pm

It will reuse existing atoms. Each unique atom in a running VM is stored once in the atom reference table and takes up one word of memory (reference documentation). If you have a fixed set that you’re working with then you likely shouldn’t worry! But, String.to_existing_atom/1 may still be the safer choice as it will raise an error if an unexpected value leaks in as input, and it’s more explicit that you expect only existing atoms.
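
A short sketch of both behaviours. The string passed in the failing case is a made-up name that is assumed not to exist as an atom anywhere in the running VM.

```elixir
# String.to_atom/1 returns the same atom every time for the same string.
first = String.to_atom("status")
second = String.to_atom("status")
IO.inspect(first == second and first == :status)

# String.to_existing_atom/1 succeeds for atoms that already exist...
existing = String.to_existing_atom("status")
IO.inspect(existing)

# ...and raises ArgumentError otherwise (assuming this name is unused).
outcome =
  try do
    String.to_existing_atom("surely_not_an_atom_in_this_vm_42")
  rescue
    ArgumentError -> :no_such_atom
  end

IO.inspect(outcome)
```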

What if you require some input/data for which there is no sensible default, and the only sensible thing to do is to crash? I’d rather get a KeyError upstream than a FunctionClauseError or MatchError downstream. The latter is as frustrating as Ruby’s no method on nil to me. Dot notation is explicit that you require the field to exist, especially for internal data structures, and can still be combined with defaults and merging when it makes sense. I’m more inclined to go with access for external data.

brightball · July 8, 2022, 7:40pm

FWIW, I agree with you. When I see the dot notation I assume I’m looking at a reference of an object or a function and not a key.

zachallaun · July 8, 2022, 8:13pm

Thanks for the clarification!

Agreed. To each their own, but I personally like the idea of access for maps and dot notation for structs. It helps to see at a glance at the call site whether you’re dealing with a generic collection or something more well-defined.

cevado · July 8, 2022, 10:08pm

if it’s user-facing stuff, it’s better to provide a proper error message and not raise on missing keys, so you’d be better covered using a schemaless changeset validating required fields. using the dot notation in this scenario is less explicit for the user.
if it’s something that is not user facing, it’s better to pattern match on the key and raise a FunctionClauseError because it defines an explicit contract of what is required for that function to work. if you accepted the parameter and raised for a missing key, you need to be familiar with the full body of the function, and not just the definition where the pattern match happens. using dot notation in this scenario is less explicit for the dev using your code.
dot notation is just convenient for the OO habit.
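
The two styles being compared here can be sketched as follows. Greeter and its functions are hypothetical, and the classify helper exists only to show which error each style raises.

```elixir
defmodule Greeter do
  # The required key is visible in the function head.
  def greet(%{name: name}), do: "hello " <> name

  # The required key is only visible in the function body.
  def greet_dot(user), do: "hello " <> user.name
end

incomplete = %{id: 1}

# Runs a function and reports which of the two errors it raised, if any.
classify = fn fun ->
  try do
    fun.()
  rescue
    FunctionClauseError -> :function_clause_error
    KeyError -> :key_error
  end
end

IO.inspect(Greeter.greet(%{name: "ada"}))
IO.inspect(classify.(fn -> Greeter.greet(incomplete) end))
IO.inspect(classify.(fn -> Greeter.greet_dot(incomplete) end))
```

Both versions fail on the incomplete map; they differ in where the requirement is written down and in which exception the caller sees.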

msimonborg · July 8, 2022, 10:42pm

I agree with everything you said and practice it in my code, except the assertion about dot notation as an OO habit. I validate input with embedded schemas at the boundaries and prefer to extract values with pattern matching in function heads (and function bodies if necessary). I still prefer dot notation over access brackets when working with maps w/ atom keys, unless I need to fetch the value conditionally. These maps tend to be internal or validated data anyway. To me it is more about semantics than syntax, if brackets asserted the presence of the key, I would use them the way I use dot notation. The association you’re making to OO seems more focused on the syntax if I’m not mistaken. Maybe for some it is true, but I don’t think it’s universal

sodapopcan · July 9, 2022, 5:41am

@cevado cc: @brightball

I’m not meaning to be alarmist or shaming, but the idea that “dot notation is OO-centric” is misguided. Obviously, you can program however you want, but I feel the need to respond since I don’t like the idea of newbies reading this and rolling with it.

The use of . for map access is not Elixir-specific (even Haskell recently added support for it) just as . for method call is not OO specific. Different OO languages have a variety of method call operators—PHP uses ->, Lua uses :, OCaml uses #, and Smalltalk uses a space. If you wanna get into the weeds, technically JavaScript “objects” are simply maps whose keys can point to any value, including functions (actually, Lua too!), so in a sense JS’ . operator is actually just a map lookup—adding () after the key is what makes it a function, er, method call (oh yeah, Python too!)

But all that is irrelevant in the face of what has already been mentioned which is that [] and . in Elixir do relevantly different things. A reader of your code who doesn’t carry your biases is going to see foo[:bar] and think “Ok, so :bar can be nil!”. If you’re not around to explain that this isn’t exactly true, said reader is going to have a bad time.

In short, your beef is not with the Elixir community but rather with the language itself, and the language rules are static: .foo means “foo always exists” and [:foo] means “foo might not exist”.

zachallaun · July 9, 2022, 1:32pm

Great explanation and completely agree!

The only minor nitpick I’d make here is that the difference between the two access methods isn’t about the value being defined, but about the key being defined. foo.bar tells me that bar has definitely been set already — it may be nil, but if it is nil, some piece of code somewhere has decided that it can be nil. This is a useful invariant in many cases. It doesn’t change that I may have to handle nils, but it saves me from silly mistakes like making a typo foo.baz.

The salient point being: you can choose not to like the syntax, but the two methods of lookup are not equivalent.
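
The key-versus-value distinction can be illustrated with a small sketch. The Profile module and the :nickname key are invented for this example.

```elixir
defmodule Profile do
  def nickname_dot(profile), do: profile.nickname
  def nickname_access(profile), do: profile[:nickname]
end

explicit_nil = %{name: "ada", nickname: nil}
never_set = %{name: "ada"}

# Access notation cannot tell the two maps apart: both lookups return nil.
IO.inspect(Profile.nickname_access(explicit_nil) == Profile.nickname_access(never_set))

# Dot notation succeeds only because the key was set, even though it is nil.
IO.inspect(Profile.nickname_dot(explicit_nil))

# Map.fetch/2 makes the same distinction without raising.
IO.inspect(Map.fetch(explicit_nil, :nickname))
IO.inspect(Map.fetch(never_set, :nickname))
```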