Parsing a Struct String - Code.eval_string/3? Custom Parser? Other Options?

After a few (successful?) attempts at creating DSLs to get structured data from users, I realized that Elixir structs’ syntax fits perfectly what we’re looking for.

My idea is to allow users to type in text that’s valid Elixir for given structs (atom keys) and then read/parse that into internal data structures.

I’ve considered using Code.eval_string/3, a thin layer on Poison’s custom decoders, and building a parser combinator from scratch with NimbleParsec.

How would you suggest I proceed (and why, please)?

These are the factors I’m weighing in choosing an approach:

  • The security of the approach
  • The time and effort required to implement the approach
  • The work required for future maintenance and extensibility

Any help is greatly appreciated.

Can you give us an example of what your users would type?

If it’s something as simple as name: "James", phone: 123 then you can get away with just using String and Enum.
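
For flat input like that, a minimal sketch could look like the following. The FlatInput module and its regex are hypothetical and not from this thread; the sketch only understands double-quoted strings without escapes and non-negative integers, and it keeps keys as strings so that no atoms are created from user input.

```elixir
defmodule FlatInput do
  # Hypothetical helper: parses `key: "text"` and `key: 123` pairs only.
  def parse(text) do
    ~r/(\w+):\s*("[^"]*"|\d+)/
    |> Regex.scan(text, capture: :all_but_first)
    |> Enum.map(fn [key, raw] -> {key, decode(raw)} end)
  end

  # A quoted value: drop the surrounding quotes.
  defp decode("\"" <> rest), do: String.trim_trailing(rest, "\"")
  # Otherwise the regex guarantees a run of digits.
  defp decode(digits), do: String.to_integer(digits)
end

IO.inspect(FlatInput.parse(~s(name: "James", phone: 123)))
#=> [{"name", "James"}, {"phone", 123}]
```

Anything nested, escaped, or containing other value types would need a real parser rather than this kind of pattern matching.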

Code.eval_string/3 is not safe: the evaluated code has full access to the runtime system. See the warning in the documentation.
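
To make the difference concrete, here is a small sketch contrasting parsing with evaluating. The input string is only an illustration; a real attacker would pick something far less harmless than reading the working directory. Parsing untrusted input has its own caveats, but it does not run the code.

```elixir
input = "File.cwd!()"

# Parsing returns a data structure describing the code; File.cwd!/0 is not called.
{:ok, ast} = Code.string_to_quoted(input)
IO.inspect(ast)

# Evaluating actually runs the code inside the current VM, with everything
# the VM is allowed to do (file system, network, System.halt/0, ...).
{result, _binding} = Code.eval_string(input)
IO.inspect(result)
```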

My use case is more involved. I can have nested children up to 3 levels deep.

For example, this is possible:

%Parent{
    field: "Value",
    another_field: "Another value",
    child_list: [
        %Child{
            child_field: "Child 1 data",
            grandchildren_list: [%GrandChild{field: "Value"}, %GrandChild{field: "Some other value"}]
        },
        %Child{
            child_field: "Child 2 data"
        }
    ]
}

So your users basically type Elixir data structures?

Just use Code.string_to_quoted and traverse the AST. It is very easy to do.
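
A minimal sketch of such a traversal, assuming the three structs from the example above. The StructParser module, its allowlist, and the error tuples are hypothetical names chosen for illustration. Only struct literals, lists, strings, and plain numbers are accepted; every other AST node is rejected instead of being evaluated.

```elixir
defmodule Parent do
  defstruct [:field, :another_field, child_list: []]
end

defmodule Child do
  defstruct [:child_field, grandchildren_list: []]
end

defmodule GrandChild do
  defstruct [:field]
end

defmodule StructParser do
  # Hypothetical allowlist: only these aliases may appear in user input.
  @allowed %{[:Parent] => Parent, [:Child] => Child, [:GrandChild] => GrandChild}

  def parse(text) do
    # Parses only; nothing is evaluated. The parser can still create atoms for
    # unknown identifiers unless options such as :existing_atoms_only or
    # :static_atoms_encoder are passed to Code.string_to_quoted/2.
    case Code.string_to_quoted(text) do
      {:ok, ast} -> convert(ast)
      {:error, reason} -> {:error, {:syntax, reason}}
    end
  end

  # AST shape of a struct literal: %Alias{key: value, ...}
  defp convert({:%, _, [{:__aliases__, _, parts}, {:%{}, _, pairs}]}) do
    with {:ok, module} <- fetch_module(parts),
         {:ok, fields} <- convert_pairs(module, pairs) do
      {:ok, struct(module, fields)}
    end
  end

  defp convert(list) when is_list(list), do: convert_all(list)
  defp convert(value) when is_binary(value) or is_number(value), do: {:ok, value}
  defp convert(other), do: {:error, {:unsupported, other}}

  defp fetch_module(parts) do
    case Map.fetch(@allowed, parts) do
      {:ok, module} -> {:ok, module}
      :error -> {:error, {:unknown_struct, parts}}
    end
  end

  defp convert_all(asts) do
    result =
      Enum.reduce_while(asts, {:ok, []}, fn ast, {:ok, acc} ->
        case convert(ast) do
          {:ok, value} -> {:cont, {:ok, [value | acc]}}
          error -> {:halt, error}
        end
      end)

    with {:ok, values} <- result, do: {:ok, Enum.reverse(values)}
  end

  # Accepts only `key: value` pairs whose key is a field of the target struct.
  defp convert_pairs(module, pairs) do
    known = Map.keys(Map.from_struct(module.__struct__()))

    Enum.reduce_while(pairs, {:ok, []}, fn
      {key, ast}, {:ok, acc} when is_atom(key) ->
        with true <- key in known, {:ok, value} <- convert(ast) do
          {:cont, {:ok, [{key, value} | acc]}}
        else
          false -> {:halt, {:error, {:unknown_field, key}}}
          error -> {:halt, error}
        end

      other, _acc ->
        {:halt, {:error, {:unsupported, other}}}
    end)
  end
end

IO.inspect(StructParser.parse(~s|%Parent{field: "Value", child_list: [%Child{child_field: "Child 1 data"}]}|))
```

Because unknown aliases, unknown fields, and any function call fall through to an error tuple, input such as System.halt() is reported as unsupported rather than executed.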

Exactly

I’ll try this and get back to you.

Thank you

For the data you showed, perhaps just consider JSON as the user-facing format? It converts to maps/structs without evaluating anything and without much work on your part.

You’re right. The AST is easy to parse.

I might settle for this approach.

Thanks for pointing this out.

Thanks, @kip

JSON would have been my go-to approach.

However, there are two nuances I have to consider:

  1. Any of Parent, Child, and GrandChild in the example above can be the root node, and the system’s behavior adjusts accordingly.

  2. GrandChild can be of multiple types, and the keys for each are different (imagine %TypeOne{some_key: "Value",...}, %TypeTwo{another_key: "Value"...}…)

If I go with JSON, handling these nuances might mean tagging each (sub)object with {"type": "NodeNameHere", ...}.

For our largely non-technical user base, requiring this tag (along with requiring quotes around each JSON key) will be too verbose.

Atom-keyed Elixir structs feel more succinct and expressive.

# JSON
{
  "type": "Parent",
  "field": "Value",
  "another_field": "Another value",
  "child_list": [
      {
          "type": "Child",
          "child_field": "Child 1 data",
          "grandchildren_list": [
            {
              "type": "TypeOne",
              "field": "Value"
            },
            {
              "type": "TypeTwo",
              "another_field": "Some other value"
            }
        ]
      },
      {
          "type": "Child",
          "child_field": "Child 2 data"
      }
  ]
}

vs

# Struct
%Parent{
  field: "Value",
  another_field: "Another value",
  child_list: [
      %Child{
          child_field: "Child 1 data",
          grandchildren_list: [
            %TypeOne{field: "Value"},
            %TypeTwo{another_field: "Some other value"}
        ]
      },
      %Child{child_field: "Child 2 data"}
  ]
}

If there’s a workaround to reduce this verbosity, I’ll surely want to know 'cause parsing JSON will be an almost zero-stress approach for me.
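
For comparison, this is roughly what consuming the tagged JSON variant could look like once a JSON library has decoded it into string-keyed maps. It is a sketch under assumptions: the TaggedDecoder module and its type table are hypothetical, the struct definitions mirror the example above, and the input map is written by hand so the snippet does not depend on any JSON package.

```elixir
defmodule Parent do
  defstruct [:field, :another_field, :child_list]
end

defmodule Child do
  defstruct [:child_field, :grandchildren_list]
end

defmodule TypeOne do
  defstruct [:field]
end

defmodule TypeTwo do
  defstruct [:another_field]
end

defmodule TaggedDecoder do
  # Hypothetical table mapping the "type" tag to a struct module.
  @types %{"Parent" => Parent, "Child" => Child, "TypeOne" => TypeOne, "TypeTwo" => TypeTwo}

  # A map carrying a known "type" tag becomes the matching struct.
  def decode(%{"type" => type} = map) do
    case Map.fetch(@types, type) do
      {:ok, module} ->
        # Walk the struct's own fields, so no atoms are created from input.
        # Keys in the input that the struct does not define are dropped.
        fields =
          for key <- Map.keys(Map.from_struct(module.__struct__())),
              Map.has_key?(map, Atom.to_string(key)) do
            {key, decode(Map.fetch!(map, Atom.to_string(key)))}
          end

        struct(module, fields)

      :error ->
        raise ArgumentError, "unknown type tag: #{inspect(type)}"
    end
  end

  def decode(list) when is_list(list), do: Enum.map(list, &decode/1)
  def decode(other), do: other
end

decoded = %{
  "type" => "Parent",
  "field" => "Value",
  "child_list" => [
    %{
      "type" => "Child",
      "child_field" => "Child 1 data",
      "grandchildren_list" => [%{"type" => "TypeTwo", "another_field" => "Some other value"}]
    }
  ]
}

IO.inspect(TaggedDecoder.decode(decoded))
```

The decoding side stays small; the verbosity cost is entirely on the user, who has to type the tag and the quoted keys.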

Don’t forget to mark the solution

Another, more robust option is to use YAML (or equivalent JSON) structured like the Elixir definition above and use the Nestru library (bias warning: I’m the author) to turn the resulting map into nested structs, like the following:

Install dependencies

Mix.install(
  [
    {:nestru, "~> 0.3.2"},
    {:yaml_elixir, "~> 2.9"}
  ],
  consolidate_protocols: false
)

Define the payload

For simplicity, every node that has children stores them under the same key, child_list.

payload =
  """
  Parent:
    field: Value
    another_field: "Another value"
    child_list:
      - Child:
          child_field: Child 1 data
          child_list:
            - TypeOne:
                field: Value
            - TypeTwo:
                another_field: Some other value
      - Child:
          child_field: Child 2 data
  """
  |> YamlElixir.read_from_string!()
%{
  "Parent" => %{
    "another_field" => "Another value",
    "child_list" => [
      %{
        "Child" => %{
          "child_field" => "Child 1 data",
          "child_list" => [
            %{"TypeOne" => %{"field" => "Value"}},
            %{"TypeTwo" => %{"another_field" => "Some other value"}}
          ]
        }
      },
      %{"Child" => %{"child_field" => "Child 2 data"}}
    ],
    "field" => "Value"
  }
}

Define nodes

defmodule Parent do
  defstruct [:field, :another_field, :child_list]
end

defmodule Child do
  defstruct [:child_field, :child_list]
end

defmodule TypeOne do
  defstruct [:field]
end

defmodule TypeTwo do
  defstruct [:another_field]
end

Implement Nestru.Decoder for nodes

StructDetector.split_module_fields/1 returns a list of {struct module, fields} tuples from the given list of maps (or a single map). The decoder hint below calls it with the child_list value so that Nestru can build the list of nested structs according to the hint.

defmodule StructDetector do
  def split_module_fields(map) do
    list = List.wrap(map)

    try do
      {:ok,
       Enum.flat_map(list, fn a_map ->
         Enum.map(a_map, fn {module_string, fields} ->
           {Module.safe_concat([module_string]), fields}
         end)
       end)}
    rescue
      ArgumentError -> {:error, :nonexisting_struct}
    end
  end
end

require Protocol

defimpl Nestru.Decoder, for: [Parent, Child, TypeOne, TypeTwo] do
  def from_map_hint(_struct, _context, map) do
    if Map.has_key?(map, "child_list") do
      with {:ok, module_fields_list} <-
             StructDetector.split_module_fields(Map.fetch!(map, "child_list")) do
        {modules_list, fields_list} = Enum.unzip(module_fields_list)

        {:ok,
         %{
           child_list: fn _value -> Nestru.decode_from_list_of_maps(fields_list, modules_list) end
         }}
      end
    else
      # Empty hint, decode all keys as-is.
      {:ok, %{}}
    end
  end
end

Decode

{:ok, [{root_module, fields}]} = StructDetector.split_module_fields(payload)
Nestru.decode_from_map!(fields, root_module)
%Parent{
  field: "Value",
  another_field: "Another value",
  child_list: [
    %Child{
      child_field: "Child 1 data",
      child_list: [%TypeOne{field: "Value"}, %TypeTwo{another_field: "Some other value"}]
    },
    %Child{child_field: "Child 2 data", child_list: nil}
  ]
}

The payload can start from any node that implements the Nestru.Decoder protocol and can be nested to any depth.

And you can easily support new nodes by including their names in the list in defimpl.

Keep in mind that Code.string_to_quoted is as unsafe as String.to_atom. A new atom will be created for each key, identifier, etc. in your input, so it’s an invitation to denial of service. You can avoid this by using the :static_atoms_encoder option, but never, ever parse user input without taking this into account.
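
A minimal sketch of that option, assuming a fixed vocabulary of struct names and field names. The SafeQuoted module and its allowlist are hypothetical. The encoder receives each candidate atom as a string and either returns an atom or an error with a string reason, so tokens outside the list never reach the atom table.

```elixir
defmodule SafeQuoted do
  # Hypothetical allowlist: every alias and key users are permitted to write.
  @allowed ~w(Parent Child GrandChild field another_field child_field child_list grandchildren_list)

  def parse(text) do
    Code.string_to_quoted(text, static_atoms_encoder: &encode/2)
  end

  # Converting is bounded here: only tokens from the fixed list become atoms.
  defp encode(token, _meta) when token in @allowed, do: {:ok, String.to_atom(token)}
  defp encode(_token, _meta), do: {:error, "identifier is not allowed"}
end

IO.inspect(SafeQuoted.parse(~s|%Parent{field: "Value"}|))
IO.inspect(SafeQuoted.parse(":definitely_not_allowed_atom"))
```

Code.string_to_quoted/2 also documents an :existing_atoms_only option, which rejects atoms that do not already exist in the VM; an explicit allowlist is stricter because it does not depend on which modules happen to be loaded.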

Whoa!

Thanks a ton. I didn’t consider this.