Safe deserialisation of Elixir maps

Hi everyone,
Is there a safe way to save an Elixir map to a file and load it back? I’ve seen that, for example, mix.lock is loaded using Code.eval_quoted, but presumably that’s not a safe operation if you don’t trust the source string?

eg:

{:ok, quoted} = Code.string_to_quoted("%{a: \"foo\", b: 1+2}")
Code.eval_quoted(quoted)

gives:

{%{a: "foo", b: 3}, []}
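
To make the concern concrete, here is a small sketch of what evaluation allows. The input string and variable names are made up for illustration; the embedded call is a harmless send/2, but it could be any function call.

```elixir
# A string that looks like data but carries a function call.
untrusted = ~S|%{a: "foo", b: send(self(), :side_effect)}|

{:ok, quoted} = Code.string_to_quoted(untrusted)
{value, _bindings} = Code.eval_quoted(quoted)

# The map comes back, but the embedded call has already run:
# the message it sent is now sitting in this process's mailbox.
side_effect_ran =
  receive do
    :side_effect -> true
  after
    0 -> false
  end
```

Parsing with Code.string_to_quoted alone does not run anything; it is the eval step that executes the call.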

I guess what I’m looking for is an equivalent to JavaScript’s JSON or Clojure’s EDN: something that handles what JSON encoding doesn’t (atoms, map keys that aren’t strings, etc.) but doesn’t allow code execution, and maybe deals with structs / missing things?

My use case is kind of quick and hacky utility code to serialise stuff that’s also human readable and editable, then to load it back later.

Thanks for any suggestions!

defmodule Consultant do
  def file(path) do
    with {:ok, data} <- File.read(path), do: string(data)
  end

  def string(input) do
    with {:ok, quoted} <- Code.string_to_quoted(input), do: {:ok, parse(wrap(quoted))}
  catch
    {:error, _} = error -> error
  end

  defp wrap({:__block__, _, data}), do: data
  defp wrap(data), do: [data]

  defp parse(data) when is_number(data) when is_binary(data) when is_atom(data),
    do: data

  defp parse(list) when is_list(list) do
    Enum.map(list, fn
      {k, v} -> {parse(k), parse(v)}
      other -> parse(other)
    end)
  end

  defp parse({:%{}, _, data}) do
    for {key, value} <- data, into: %{}, do: {parse(key), parse(value)}
  end

  defp parse({:{}, _, data}) do
    data
    |> Enum.map(&parse/1)
    |> List.to_tuple()
  end

  defp parse({:__aliases__, _, names}), do: Module.concat(names)

  defp parse({:sigil_W, _meta, [{:<<>>, _, [string]}, mod]}), do: word_sigil(string, mod)

  defp parse({:sigil_R, _meta, [{:<<>>, _, [string]}, mod]}),
    do: Regex.compile!(string, List.to_string(mod))

  defp parse({sigil, meta, _data} = quoted) when sigil in ~w[sigil_w sigil_r]a do
    line = Keyword.get(meta, :line)
    throw({:error, {:illegal_sigil, line, quoted}})
  end

  defp parse({_, meta, _} = quoted) do
    line = Keyword.get(meta, :line)
    throw({:error, {:invalid, line, quoted}})
  end

  defp word_sigil(string, []), do: word_sigil(string, 's')

  defp word_sigil(string, [mod]) when mod in 'sac' do
    parts = String.split(string)

    case mod do
      ?s -> parts
      ?a -> Enum.map(parts, &String.to_atom/1)
      ?c -> Enum.map(parts, &String.to_charlist/1)
    end
  end
end

I need to turn it into a library or submit it to core (though it would probably be rejected in favour of the mentioned library).

Just beware that this will not support values like 1 + 2, as this format requires all values to be constants (however, adding support for some simple computations shouldn’t be that hard).
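
For anyone wondering why 1 + 2 is rejected, this sketch compares the quoted form of a constant map with one that contains a computation. The variable names are only illustrative.

```elixir
{:ok, constant} = Code.string_to_quoted(~S|%{a: "foo", b: 3}|)
{:ok, computed} = Code.string_to_quoted(~S|%{a: "foo", b: 1 + 2}|)

# Literal values appear in the AST as themselves...
{:%{}, _, [a: "foo", b: 3]} = constant

# ...while 1 + 2 stays an unevaluated call node of the form {:+, meta, [1, 2]}.
{:%{}, _, [a: "foo", b: {:+, _, [1, 2]}]} = computed
```

A parser that only accepts literals has no clause for the `{:+, meta, args}` node, so it reports the input as invalid instead of evaluating it.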

Thanks, that’s really useful!

Actually, enforcing that the file is totally static and not able to execute anything would be a benefit, avoiding the kind of security issues that affected YAML, for example. Then you can be sure that loading a file either fails or yields some safe subset of Elixir, maybe generated via inspect.
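
As a rough sketch of the "generated via inspect" idea: inspect/2 with its limits lifted should emit literal syntax for plain data, and Code.string_to_quoted/2 can read that text back as an AST without running it. The sample data is made up, and using `existing_atoms_only: true` is my own suggestion, not something from the code above.

```elixir
data = %{:a => "foo", {1, 2} => "bar", :list => [1, 2.5, :ok]}

# Without the limits lifted, long collections and strings would be
# truncated with "..." and the text could no longer be parsed back.
text = inspect(data, pretty: true, limit: :infinity, printable_limit: :infinity)

# Parsing only builds an AST; nothing is executed here.
# existing_atoms_only: true makes the parser return an error instead of
# creating atoms that do not already exist in the running system.
{:ok, quoted} = Code.string_to_quoted(text, existing_atoms_only: true)

# The top-level node is the map constructor; its third element holds the pairs.
{:%{}, _meta, pairs} = quoted
```

This only holds for plain data: pids, references, and functions are printed in a `#PID<...>` style form that is not valid syntax, so they will not survive the round trip.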

I’ll have a play around with it for a few types of input I’d like to use.

Credo has a very similar implementation. I agree that a canonical library (or even putting it in Elixir itself) would be very useful:

There’s the EON library that does exactly what you’re looking for

If you just want to store a raw map in a file and load it back into the system later, there is a much simpler and much faster way of doing this: use :erlang.term_to_binary/1 and :erlang.binary_to_term/1. These functions convert any Erlang/Elixir term to a binary in the Erlang external term format and back again. So you could do something like:

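# Serialise the term to the Erlang external term format and write it to disk.
def save_term(file, term) do
  bin = :erlang.term_to_binary(term)
  File.write(file, bin)
end

# Read the binary back and decode it into the original term.
def restore_term(file) do
  {:ok, bin} = File.read(file)
  :erlang.binary_to_term(bin)
end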

Note that this stores and returns the raw data structure.
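
One caveat if the file might come from an untrusted source: plain :erlang.binary_to_term/1 will create whatever atoms the binary mentions. Below is a minimal sketch of the same save/restore idea using the `:safe` option of :erlang.binary_to_term/2. The module name TermStore and the temporary file name are made up for illustration.

```elixir
defmodule TermStore do
  # Hypothetical module name, used only for this sketch.

  # Serialise any term to the Erlang external term format and write it out.
  def save_term(file, term) do
    File.write!(file, :erlang.term_to_binary(term))
  end

  # Read the file back. With :safe, decoding raises ArgumentError if the
  # binary would create atoms that do not already exist.
  def restore_term(file) do
    file
    |> File.read!()
    |> :erlang.binary_to_term([:safe])
  end
end

path = Path.join(System.tmp_dir!(), "term_store_example.bin")
original = %{:a => "foo", {1, 2} => "bar"}

:ok = TermStore.save_term(path, original)
restored = TermStore.restore_term(path)
File.rm!(path)
```

As far as I understand, `:safe` on its own does not stop anonymous functions from being decoded, so it is a mitigation rather than a complete answer for hostile input.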

aha thanks @massimo, that’s exactly it.

Was thinking the package name Exon wouldn’t quite be appropriate ;-). Eon could use a bit of SEO in hex.pm, eg for ‘serialise’ perhaps.

Also just a comment for @hauleth, I had an error parsing maps with a key that’s a 2 element tuple, like: %{:a => "foo", {1,2} => "bar"}, one of the reasons why I was looking for json alternatives in the first place.

Thanks everyone!

But it won’t be human readable, as the OP mentioned :)

Was this a requirement? If so I missed it and this would not be a solution. :)

Otherwise term_to_binary/binary_to_term would be a perfect solution.

aha thanks for your reply, I was using term_to_binary in the meantime :) But yep, it’s nice to be able to read the files for what I’m using it for.

It could probably be fixed quite easily. I will try to make it into a library and then provide a test suite so that such syntax is supported as well.
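
For reference, the likely cause is that two-element tuples are represented literally in the AST, so a tuple key such as {1, 2} arrives at the parser as a plain two-element tuple rather than a three-element node. Here is a reduced sketch of one possible fix. TupleKeySketch is a hypothetical, cut-down module rather than the Consultant code itself, and it leaves out aliases and sigils.

```elixir
defmodule TupleKeySketch do
  # Hypothetical module: a reduced literal-only parser that accepts
  # two-element tuples anywhere, including as map keys.

  def string(input) do
    with {:ok, quoted} <- Code.string_to_quoted(input), do: {:ok, parse(quoted)}
  catch
    {:error, _} = error -> error
  end

  defp parse(data) when is_number(data) or is_binary(data) or is_atom(data), do: data

  defp parse(list) when is_list(list), do: Enum.map(list, &parse/1)

  # Two-element tuples are their own quoted form, so they need a dedicated
  # clause. This also covers the {key, value} pairs of keyword lists.
  defp parse({left, right}), do: {parse(left), parse(right)}

  # Tuples of any other size are quoted as {:{}, meta, elements}.
  defp parse({:{}, _, items}), do: items |> Enum.map(&parse/1) |> List.to_tuple()

  defp parse({:%{}, _, pairs}) do
    Map.new(pairs, fn {key, value} -> {parse(key), parse(value)} end)
  end

  # Anything else is a call or a variable, which is rejected, never evaluated.
  defp parse({_, meta, _} = quoted) do
    throw({:error, {:invalid, Keyword.get(meta, :line), quoted}})
  end
end

result = TupleKeySketch.string(~S|%{:a => "foo", {1,2} => "bar"}|)
rejected = TupleKeySketch.string("%{a: 1 + 2}")
```

With the extra clause, the map with a tuple key from the report above parses, while input containing a computation is still returned as an `{:error, {:invalid, line, quoted}}` tuple.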

By the way, those two functions should really be available from Kernel without the :erlang. prefix. They’re such a great tool, but I feel many fellow developers do not know about them.

Also, https://hexdocs.pm/plug_crypto/Plug.Crypto.html#non_executable_binary_to_term/2 exists, if you are willing to relax your requirement about human-readable intermediate format.