Is there any semantic meaning to atoms?

Crowdhailer · December 8, 2016, 4:29pm

The short version of this question is: what is the purpose of atoms?

I know that strings and atoms are represented differently at a low level, and that atoms allow quick comparison, amongst other benefits. However, if all they do is optimize the code, could it be argued that using them is a premature optimization?

There is a cost when translating code that uses atoms into code that uses strings, and vice versa. The main place this comes up for me is a web form that is decoded with string keys, which I want to map to a struct to pass to some domain code.

Ideally I would have the following:

sign_up_form = # pull from a request
sign_up = struct(MyApp.SignUp, sign_up_form)

However, because sign_up_form has string keys, I end up doing something more like:

sign_up = %MyApp.SignUp{
  password: sign_up_form["password"],
  username: sign_up_form["username"]
  # etc etc
}
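For context on why the "ideal" version does not just work: `Kernel.struct/2` does not raise on string keys, it silently discards every key that is not a struct field, and a string key never is one. A minimal sketch, assuming a `MyApp.SignUp` struct with only the two fields used in the question:

```elixir
defmodule MyApp.SignUp do
  # Assumed fields, taken from the example in the question.
  defstruct [:username, :password]
end

sign_up_form = %{"username" => "alice", "password" => "secret"}

# struct/2 only keeps keys that are already struct fields (atoms),
# so both string-keyed values are dropped and the fields stay nil.
sign_up = struct(MyApp.SignUp, sign_up_form)
IO.inspect(sign_up)
```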

dsissitka · December 8, 2016, 4:37pm

Have you seen http://stackoverflow.com/questions/34446221/atom-keys-vs-string-keys-in-phoenix?

Crowdhailer · December 8, 2016, 4:41pm

Yes. I am aware there is a maximum number of atoms. That is why I can't just iterate through the web form turning keys into atoms.
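One common workaround, sketched here rather than prescribed: `String.to_existing_atom/1` only returns atoms that already exist and raises `ArgumentError` otherwise, so untrusted input cannot grow the atom table. The module name `SafeKeys` is hypothetical, and the sketch assumes every key is a string.

```elixir
defmodule SafeKeys do
  # Converts string keys to atoms that already exist in the atom table.
  # Keys with no matching atom are dropped instead of being created.
  def atomize_existing(params) when is_map(params) do
    Enum.reduce(params, %{}, fn {key, value}, acc ->
      try do
        Map.put(acc, String.to_existing_atom(key), value)
      rescue
        ArgumentError -> acc
      end
    end)
  end
end
```

Note that "already exists" is not the same as "is an expected field": any atom the VM happens to know (`:__struct__`, for instance) would pass, so it is still worth restricting the result to the fields you actually expect.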

bbense · December 8, 2016, 4:46pm

Atoms only have the meaning you assign them. One reason to use atoms would be to map many input strings onto a single field (e.g. user, User, USER, etc.), or to reject input types that you don't know how to handle.
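A small sketch of both ideas at once: several spellings collapse onto one atom, and anything unrecognised is rejected at the edge. The `Role` module and its two roles are hypothetical.

```elixir
defmodule Role do
  # Hypothetical example: many input strings map onto one atom,
  # and unknown input becomes an error instead of a new atom.
  def parse(input) when is_binary(input) do
    case input |> String.trim() |> String.downcase() do
      "user" -> {:ok, :user}
      "admin" -> {:ok, :admin}
      _ -> {:error, :unknown_role}
    end
  end
end
```

Because the only atoms produced are the literals written in the code, the set of possible values is finite no matter what the client sends.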

Structs are essentially maps with atoms as keys; they are implemented this way so that the set of keys is unambiguous and finite. It also makes some of the syntactic sugar easier to implement.

However, if it makes more sense for your application to use a map with strings as keys, then use that. It is, however, much easier to shoot yourself in the foot that way. The general consensus is to use maps with strings as keys at the border and then map those into the internal structs that you know about.
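The outbound half of that border is the easy direction, since turning an atom into a string never touches the atom table. A sketch, with a hypothetical `Boundary.to_external/1` helper and an assumed two-field struct:

```elixir
defmodule MyApp.SignUp do
  # Assumed fields, for illustration only.
  defstruct [:username, :password]
end

defmodule Boundary do
  # Hypothetical helper: turns an internal struct back into a
  # string-keyed map, e.g. just before encoding it for a client.
  def to_external(%_{} = struct) do
    struct
    |> Map.from_struct()
    |> Map.new(fn {key, value} -> {Atom.to_string(key), value} end)
  end
end
```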

It is however a trade-off that may or may not make sense for your code. The rules are there so you’ll think about them before you break them, not so you’ll always follow them.

OvermindDL1 · December 8, 2016, 5:21pm

An atom is basically an integer, nothing more. The runtime keeps a mapping between these integers and their string values, but all comparisons are done on the integer values. (It literally is an integer if you look at how Erlang encodes atoms at runtime: an index into the VM's global atom table, which is also the table that functions like binary_to_atom look into.)
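The global atom table is visible from Elixir. A quick sketch, assuming OTP 20 or later (where `:erlang.system_info/1` accepts `:atom_count` and `:atom_limit`); since atoms are never garbage-collected, the count only ever grows towards the limit.

```elixir
# Every distinct atom is stored once in a VM-wide table; an atom value
# is just a small reference into that table.
count = :erlang.system_info(:atom_count)
limit = :erlang.system_info(:atom_limit)
IO.puts("#{count} of #{limit} atom slots in use")

# Converting the same text twice yields the very same atom, so
# comparing atoms never needs to look at their characters.
a = String.to_atom("meaning_of_life")
b = String.to_atom("meaning_of_life")
IO.inspect(a === b)
# prints true
```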

In C parlance it would be like a global:

typedef enum {
  List,
  All,
  Atoms,
  Here
} Atom;
const char *getStringOfAtom(Atom atom); /* looks the name up in a global table */

In C++ it would be more accurately a flyweight’d string:

#include <string>
#include <boost/flyweight.hpp>

typedef boost::flyweight<std::string> Atom;

Atom someAtom("SomeAtom");

// Use someAtom, compare it, etc. It is actually a handle into a global table of values,
// which in this case does get 'collected' when every handle to a value goes out of scope
// because of RAII, something BEAM/EVM does not do for efficiency. But you can get the
// string back from it, compare it fast, whatever...

Separately, when C++11 came out I made a library that I've been using ever since to get atom-like things in C++ without any of the runtime, GC, or memory costs. It limits me to a restricted set of characters and a maximum length, so it is certainly not as generic as Erlang's atoms, but hey, I can even switch on them (since they are just integers under the hood)! I have a copy of the code in my OverECS example project: https://github.com/OvermindDL1/OverECS/blob/master/StringAtom.hpp

using namespace OverLib::StringAtom;
Atom64 atom{}; // A default-constructed atom is just the empty string ""
atom = "SomeAtom"_atom64; // The "SomeAtom" string as an atom, this happens at compile-time
atom = atomize64("SomeAtom"); // The "SomeAtom" string as an atom, happens at run-time
std::string atomString = deatomize64(atom); // Get the string that the atom represents, this happens at run-time
// Yes this works!  And was the original motivating reason too.
switch(atom) {
case "AnAtom"_atom64: blah(); break;
case "SomeAtom"_atom64: blorp(); break;
default: bloop(); break;
}

I have used those to great effect in a lot of systems. It is just a simple 5- or 10-character-to-integer mapping via a mapping table, with optional loose (default) or tight encodings; the tight encoding gives you a few extra characters of length in exchange for forcing case-insensitivity. Usually I use flyweight strings for longer 'interned' strings, which allow fast pointer comparisons, but when that overhead is too much or I want to store things in less space, my atoms have been awesome. For example, I pass around events in some of my projects like this:

void handleEvent(VariantMap event) {
  float tick;
  // This mapping is done at compile-time so it just becomes a quick integer lookup:
  tick = event["DELTA"_atom64].get<float>(0.0f);
  // This mapping is done at run-time, but still fast:
  tick = event["DELTA"].get<float>(0.0f);
  // Though for generic things like a DELTA call I actually have a global helper type that does the casting/default/andAllElse:
  tick = event[DELTA];
}

That is also exposed to Lua, and using it inside a VariantMap makes it very easy to read and create events from Lua:

local function handleEvent(event)
  -- Dynamic access, still pretty fast actually
  local tick = event["DELTA"]
  -- Using a registered deserialization object, much faster, but does not really matter overall
  tick = event[DELTA]
end

And of course, OCaml has built-in ‘atoms’:

let someAtom = `GlobalAtom

let anotherAtom = `AnotherAtom

Except that in OCaml you can also attach additional data to its 'atom' (a polymorphic variant), basically like a tagged tuple in Erlang:

let something = `GlobalAtom 42

A tag without data and the same tag with data are different variants and will not match each other, for example:

`Ok (42, "string")
(* Does not match: *)
`Ok 42
(* Does not match: *)
`Ok

Just like:

{:ok, 42, "string"}
# Does not match:
{:ok, 42}
# Does not match:
:ok

Though, as in Erlang/Elixir, you can test that it 'is' a polymorphic variant, then refine on it depending on whether you want to get at the data or not.
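On the Elixir side, that refinement is just a set of function clauses: both the leading atom and the tuple size have to agree for a clause to match. A sketch with a hypothetical `Result.describe/1`:

```elixir
defmodule Result do
  # Each clause matches exactly one shape; {:ok, 42} will never
  # match the three-element clause, and :ok matches neither tuple.
  def describe({:ok, n, s}) when is_integer(n) and is_binary(s), do: "ok with #{n} and #{s}"
  def describe({:ok, n}), do: "ok with #{inspect(n)}"
  def describe(:ok), do: "bare ok"
  def describe(other), do: "not an ok result: #{inspect(other)}"
end
```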

So, in essence, an atom is whatever the context wants it to be.

sasajuric · December 8, 2016, 5:43pm

IMO they serve as symbolic constants, something akin to enums in C, if you will.

For example, instead of passing around a magic number like 42, we could pass around :meaning_of_life and convert it to an integer at the system boundaries, as @michalmuskala explained here just a few minutes ago.
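A minimal sketch of that idea: the magic number exists only in one boundary module, and the rest of the system passes the atom around. The `Answer` module and its codes are hypothetical.

```elixir
defmodule Answer do
  # Hypothetical mapping between an internal symbolic constant and the
  # number used on the wire; conversion happens only at this boundary.
  def encode(:meaning_of_life), do: 42

  def decode(42), do: {:ok, :meaning_of_life}
  def decode(_other), do: {:error, :unknown_code}
end
```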

Other examples include tagged tuples (e.g. {:ok, value}, {:error, reason}, reply tuples in behaviours, etc.). If atoms didn't exist, we'd have to pass around either magic integers or strings.
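For instance, a parser can return a tagged tuple instead of a sentinel such as -1 or nil, and the caller branches on the tag. A sketch with a hypothetical `Age.parse/1` built on `Integer.parse/1`:

```elixir
defmodule Age do
  # Returns {:ok, n} or {:error, reason} rather than a magic value.
  def parse(text) when is_binary(text) do
    case Integer.parse(text) do
      {n, ""} when n >= 0 -> {:ok, n}
      _ -> {:error, :invalid_age}
    end
  end
end

case Age.parse("42") do
  {:ok, age} -> IO.puts("age is #{age}")
  {:error, reason} -> IO.puts("rejected: #{reason}")
end
```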

Since atoms represent “well known” pieces of information, they are also used in structs or maps as keys representing fields we expect in them.

If you mean the cost of typing out the conversion, there was some helper library for that (I can't remember which one, though; maybe someone can step in). Otherwise it's not really hard to write a simple converter yourself.
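One way such a converter might look, as a sketch rather than any particular library's API: walk the struct's own field atoms (which already exist) and look each one up by its string name, so no atom is ever created from user input and unknown keys are ignored. `StructCast.cast/2` is hypothetical, and `MyApp.SignUp` is assumed to have the two fields from the question.

```elixir
defmodule MyApp.SignUp do
  # Assumed fields, taken from the question.
  defstruct [:username, :password]
end

defmodule StructCast do
  # Hypothetical helper: builds a struct from a string-keyed map by
  # iterating over the struct's known fields instead of the input keys.
  def cast(module, params) when is_atom(module) and is_map(params) do
    empty = module.__struct__()
    fields = empty |> Map.from_struct() |> Map.keys()

    Enum.reduce(fields, empty, fn field, acc ->
      case Map.fetch(params, Atom.to_string(field)) do
        {:ok, value} -> Map.put(acc, field, value)
        :error -> acc
      end
    end)
  end
end

sign_up_form = %{"username" => "alice", "password" => "secret", "admin" => "true"}
sign_up = StructCast.cast(MyApp.SignUp, sign_up_form)
IO.inspect(sign_up)
```

Because the loop is driven by the struct definition, the extra `"admin"` key is simply ignored rather than smuggled into the struct.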

Crowdhailer · December 18, 2016, 11:14am

Thanks, that link about conversion at the boundaries was really helpful.