Jason decode strings and keys options, what do they do?

jeroenhouben · March 26, 2020, 9:50am

I’m reading about the options for Jason.decode, but the docs don’t explain them well enough for me to fully understand their pros and cons:

https://hexdocs.pm/jason/Jason.html#functions

I’m parsing a long list of JSON files into structs and storing those in a long-running Agent. Which options should I be using?

TIA,

Jeroen

lucaong · March 26, 2020, 10:14am

The keys option controls whether object keys are decoded into strings or atoms. Atoms are never garbage-collected, so if the JSON can contain arbitrary keys, you should decode keys as strings (the default). Otherwise a malicious user could send many different keys, creating many atoms and making your app leak memory until the VM hits its atom limit and crashes. If you expect the keys to belong to a known set of values, you could use :atoms!, which decodes keys to atoms but only allows atoms that already exist and never creates new ones, which prevents the case above.
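To make the difference concrete, here is a minimal sketch of the three built-in keys values. It assumes Jason is fetched ad hoc with Mix.install (Elixir 1.12 or later); the version requirement, the Person struct and the sample JSON are illustrative, not something from the question.

```elixir
# Assumption: Jason is pulled in ad hoc; in a real project it would be listed in mix.exs.
Mix.install([{:jason, "~> 1.4"}])

defmodule Person do
  # Hypothetical struct. Defining it creates the atoms :name and :age.
  defstruct [:name, :age]
end

json = ~s({"name": "Jeroen", "age": 42})

# Default (keys: :strings): keys stay binaries, no atoms are created.
string_keys = Jason.decode!(json)
# => %{"age" => 42, "name" => "Jeroen"}

# keys: :atoms! converts keys with String.to_existing_atom/1.
# It works here because :name and :age already exist as struct fields.
person = struct!(Person, Jason.decode!(json, keys: :atoms!))

# An unknown key is rejected instead of growing the atom table.
rejected =
  try do
    Jason.decode!(~s({"zz_unlikely_key_91827": 1}), keys: :atoms!)
  rescue
    ArgumentError -> :unknown_key
  end

# keys: :atoms converts with String.to_atom/1 and may create new atoms,
# so it is only reasonable for trusted input.
atom_keys = Jason.decode!(json, keys: :atoms)

IO.inspect({string_keys, person, rejected, atom_keys})
```

For struct-shaped data with a fixed set of fields, :atoms! combined with struct!/2 gives a cheap sanity check: unexpected keys fail loudly instead of being silently accepted.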

The strings option controls how string keys and values are built from the input. The decoded result is the same either way, but the performance trade-offs differ. The default, :reference, makes the decoded data reference the original binary when possible, instead of copying the strings. In other words, if the original JSON string was "{\"foo\": 123}", and the decoded data is %{"foo" => 123}, the "foo" key in the decoded data would not be a new string, but would “point” to the part of the original JSON string that contains foo.

This saves memory and time, as no copies of the strings need to be created, but it can cause memory leaks if you keep pieces of the decoded data in memory for a long time: since the decoded data might be referencing the original binary, the latter cannot be garbage-collected. The alternative, :copy, always copies the strings, so the original binary can be freed. If you don’t store the parsed data in memory, but rather just use it, you can safely use the default here.
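One way to see the effect is :binary.referenced_byte_size/1 from the Erlang standard library, which reports the size of the binary a term actually keeps alive. The sketch below is only an illustration: the payload size and field names are made up, and Jason is again assumed to be fetched with Mix.install.

```elixir
# Assumption: Jason is pulled in ad hoc; in a real project it would be listed in mix.exs.
Mix.install([{:jason, "~> 1.4"}])

# A large document from which only one small value is kept.
json = Jason.encode!(%{"id" => "abc", "payload" => String.duplicate("x", 100_000)})

# Default (strings: :reference): the decoded string may be a sub-binary of `json`.
%{"id" => referenced} = Jason.decode!(json)

# strings: :copy makes every decoded string an independent binary.
%{"id" => copied} = Jason.decode!(json, strings: :copy)

# Both values are equal as strings; they differ only in what they keep alive.
IO.inspect(:binary.referenced_byte_size(referenced), label: "reference")
IO.inspect(:binary.referenced_byte_size(copied), label: "copy")
```

With :copy the reported size matches the string itself, while with the default it is expected to be roughly the size of the whole input document.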

Is this clearer?

lpil · March 26, 2020, 10:15am

To add to the excellent info from @lucaong: if you don’t know specifically what you want, then the defaults are best.

jeroenhouben · March 26, 2020, 11:11am

Well first of all many thanks for this elaborate answer!

“If you don’t store the parsed data in memory, but rather just use it, you can safely use the default here.”

I read the JSON from files and then create structs from the parsed data, and I store those structs in memory… so I think I should use :copy to avoid memory leaks?

lucaong · March 26, 2020, 12:00pm

Then :copy might be better, especially if it is fast enough for your case, but you could check the memory usage first: in most cases it won’t make much difference. The worst-case scenario is when you have a huge JSON document but only keep a tiny part of the decoded data for a long time: the whole big JSON binary stays in memory, even though you only use a small part of it.
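For the setup described in this thread (JSON files decoded into structs that live in an Agent), a rough sketch could look like the following. The Item struct, the ItemStore module, the field names and the temporary file are all hypothetical, and whether :copy is actually worth it should be confirmed by measuring memory.

```elixir
# Assumption: Jason is pulled in ad hoc; in a real project it would be listed in mix.exs.
Mix.install([{:jason, "~> 1.4"}])

defmodule Item do
  # Hypothetical struct; the field names are illustrative.
  defstruct [:id, :title]
end

defmodule ItemStore do
  # Hypothetical Agent wrapper that holds the decoded structs for a long time.
  use Agent

  def start_link(paths) do
    Agent.start_link(fn -> Enum.map(paths, &load/1) end, name: __MODULE__)
  end

  def all, do: Agent.get(__MODULE__, & &1)

  defp load(path) do
    path
    |> File.read!()
    # :copy detaches the decoded strings from the file contents;
    # :atoms! only accepts keys that already exist as atoms (the struct fields).
    |> Jason.decode!(strings: :copy, keys: :atoms!)
    |> then(&struct!(Item, &1))
  end
end

# Write a small sample file so the sketch runs on its own.
path = Path.join(System.tmp_dir!(), "item_example.json")
File.write!(path, ~s({"id": 1, "title": "example"}))

{:ok, _pid} = ItemStore.start_link([path])
IO.inspect(ItemStore.all())
```

Because the structs outlive the file contents, :copy lets each file binary be collected once decoding is done; with the default, a struct holding even one referenced string could keep its whole source file in memory.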