Decoding nested JSON the right way in Elixir

This is basically my answer to the blog post "Decoding nested JSON the right way in Elixir" by DanPetrov. I'm posting it here in case it's helpful to others.

I know this could lead to a memory overflow if you keep receiving different keys. But if that's not a concern, it should be pretty safe.

```elixir
defmodule MapHelper do
  # Recursively convert string keys to atoms in nested maps and lists.
  def keys_to_atoms([]), do: []

  def keys_to_atoms([head | tail]) do
    [keys_to_atoms(head) | keys_to_atoms(tail)]
  end

  def keys_to_atoms(string_key_map) when is_map(string_key_map) do
    for {key, val} <- string_key_map, into: %{}, do: {String.to_atom(key), keys_to_atoms(val)}
  end

  # Leave every other value (strings, numbers, booleans, nil) untouched.
  def keys_to_atoms(value), do: value
end
```

It seems to me a pretty elegant solution that just works with all kinds of data: `Jason.decode!/1` the JSON, then run the result through `MapHelper.keys_to_atoms/1` to get a proper atom-keyed map.
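Putting it together (the module is repeated so the snippet is self-contained, and the map literal stands in for what `Jason.decode!/1` would return, so it runs without the Jason dependency):

```elixir
defmodule MapHelper do
  # Recursively convert string keys to atoms in nested maps and lists.
  def keys_to_atoms([]), do: []
  def keys_to_atoms([head | tail]), do: [keys_to_atoms(head) | keys_to_atoms(tail)]

  def keys_to_atoms(map) when is_map(map) do
    for {key, val} <- map, into: %{}, do: {String.to_atom(key), keys_to_atoms(val)}
  end

  def keys_to_atoms(value), do: value
end

# What Jason.decode!(~s({"user": {"name": "Ada", "roles": ["admin"]}})) returns:
decoded = %{"user" => %{"name" => "Ada", "roles" => ["admin"]}}

result = MapHelper.keys_to_atoms(decoded)
# => %{user: %{name: "Ada", roles: ["admin"]}}
```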

This is dangerous because it basically allows the client to DoS the server by sending new keys until the atom table overflows. It's not really a memory issue: the atom table doesn't take much space, and the default limit (1,048,576 atoms) is reached long before memory runs out. But the VM will crash if the table overflows, and atoms are never garbage collected.
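You can watch the atom table fill up yourself; both `:erlang.system_info/1` flags below are real BEAM introspection calls:

```elixir
IO.inspect(:erlang.system_info(:atom_limit), label: "atom limit")  # 1_048_576 by default
before = :erlang.system_info(:atom_count)

# Every unique string minted into an atom grows the table permanently:
for i <- 1..1_000, do: String.to_atom("key_#{i}_#{System.unique_integer()}")

diff = :erlang.system_info(:atom_count) - before
IO.inspect(diff, label: "new atoms")
```

A malicious client only has to vary the JSON keys to drive this counter toward the limit.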

You must use `String.to_existing_atom/1` if you want to do something like this, unless the sending side is also under your complete control (usually it isn't).
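A minimal variant of the helper along those lines (the module and function names here are my own, not from the post) that refuses to mint new atoms:

```elixir
defmodule SafeMapHelper do
  # Same recursion as MapHelper, but String.to_existing_atom/1 raises
  # ArgumentError for any key that was never created as an atom elsewhere.
  def keys_to_existing_atoms([]), do: []
  def keys_to_existing_atoms([h | t]), do: [keys_to_existing_atoms(h) | keys_to_existing_atoms(t)]

  def keys_to_existing_atoms(map) when is_map(map) do
    for {k, v} <- map, into: %{}, do: {String.to_existing_atom(k), keys_to_existing_atoms(v)}
  end

  def keys_to_existing_atoms(value), do: value
end

_ = :name  # the atom literal in the source guarantees it already exists
safe = SafeMapHelper.keys_to_existing_atoms(%{"name" => "Ada"})
# => %{name: "Ada"}

# An unknown key raises instead of growing the atom table:
# SafeMapHelper.keys_to_existing_atoms(%{"never_seen_key" => 1})  # ** (ArgumentError)
```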


That’s exactly what I meant by

> I know this could lead to a memory overflow if you keep receiving different keys. But if that's not a concern, it should be pretty safe.

Using `String.to_existing_atom/1` is a nice addition though, thank you. It could be used for external API requests, with the only hindrance being that all the usual keys need to be defined somewhere once.
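One way to "define the usual keys once" is simply to list them as atom literals in a module that ships with your app (module and field names below are hypothetical): compiling the module creates the atoms, so `String.to_existing_atom/1` succeeds for every documented key afterwards.

```elixir
defmodule API.Keys do
  # The atom literals here are created at compile time, which is all
  # String.to_existing_atom/1 needs in order to find them later.
  @known_keys [:id, :email, :created_at, :profile]
  def known_keys, do: @known_keys
end

key = String.to_existing_atom("email")
# => :email
```

Defining a struct per API model has the same effect, since struct field names are atoms too.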

Yep, I just wanted to be clear about it, because there are newbies who end up in these forum threads through search and may get the wrong impression (“oh I have lots of memory so this doesn’t concern me”).


Hey guys!
Yes, you are absolutely right, which is why, very soon after publishing, I added an update to the blog post about how to force reuse of existing atoms.

For context, my use-case was an HTTP client that receives JSON payloads from an API.
The main issue with `String.to_existing_atom/1` is that if you receive an unexpected key that isn't defined in any struct module or loaded anywhere else in your project, you get an `ArgumentError`. I hit exactly that: the API author hadn't properly documented all the keys being sent over the wire, so it was a matter of speccing out the models until the parser was happy.
If you receive arbitrary JSON payloads from client requests, this approach is arguably quite elegant: you are protecting yourself from a DoS, as you mentioned, and since you are on the BEAM, you don't really care if a bad payload fails the request. You can even do initial validation of part of the payload in a Plug, or on a reverse proxy (which is always a good idea).
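If you'd rather not fail the whole parse on one undocumented key, a middle ground (my own sketch, not from the blog post) is to fall back to keeping the string key, so known keys become atoms and unknown ones stay strings without ever growing the atom table:

```elixir
defmodule LenientMapHelper do
  def keys_to_atoms([]), do: []
  def keys_to_atoms([h | t]), do: [keys_to_atoms(h) | keys_to_atoms(t)]

  def keys_to_atoms(map) when is_map(map) do
    for {k, v} <- map, into: %{}, do: {to_known_atom(k), keys_to_atoms(v)}
  end

  def keys_to_atoms(value), do: value

  # Keep unknown keys as strings instead of raising or minting atoms.
  defp to_known_atom(key) do
    String.to_existing_atom(key)
  rescue
    ArgumentError -> key
  end
end

_ = :id  # make sure :id exists as an atom for the example
lenient = LenientMapHelper.keys_to_atoms(%{"id" => 1, "undocumented_field" => true})
# => %{:id => 1, "undocumented_field" => true}
```

The resulting mixed-key map is a bit awkward to pattern match on, but it makes undocumented fields easy to spot in logs instead of crashing the request.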