How to decode a JSON into a struct safely?

stefanchrobot · May 23, 2018, 1:45pm

What’s the safe way to decode a JSON string into a struct? I want to avoid calling String.to_atom. Jason.decode can give me a map with string keys, but struct() expects atom keys.

paulsullivanjr · May 23, 2018, 2:03pm

Is using Poison an option?

joefractal · May 23, 2018, 3:00pm

Quick example

defmodule Foo do
  @derive [Poison.Encoder]
  defstruct [:bar]
end

defmodule Example do
  def test do
    s = "{\"bar\":\"baz\"}"
    Poison.decode!(s, as: %Foo{})
  end
end

iex(1)> Example.test
%Foo{bar: "baz"}

There are probably more modern ways to do it, but this is what I have been using. You can also nest structs this way.

The number of JSON encoders/decoders seems to grow every day. https://package-rank.com/wp/hex/poison/-vs-/hex/jason

LostKobrakai · May 23, 2018, 3:06pm

struct(Foo, Map.new(Map.from_struct(%Foo{}), fn {key, _} -> {key, json[Atom.to_string(key)]} end))
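
A minimal sketch of the same idea as a reusable function, using only the standard library. The module name SafeStruct and the function from_string_map/2 are made up for illustration, and the json map stands in for whatever a decoder returns with string keys. The struct's own keys are turned into strings for the lookup, so no atoms are ever created from the input.

```elixir
defmodule Foo do
  defstruct [:bar, :baz]
end

defmodule SafeStruct do
  # Hypothetical helper: builds `module`'s struct from a string-keyed map.
  # Only the struct's own keys are looked up, so no atoms are created
  # from the input, and unknown input keys are ignored.
  def from_string_map(module, json) when is_map(json) do
    defaults = Map.from_struct(struct(module))

    fields =
      Map.new(defaults, fn {key, default} ->
        {key, Map.get(json, Atom.to_string(key), default)}
      end)

    struct(module, fields)
  end
end

# Stand-in for the result of decoding JSON with string keys.
json = %{"bar" => "abc", "baz" => 42, "unexpected" => true}
IO.inspect(SafeStruct.from_string_map(Foo, json))
```

Keys missing from the input keep the struct's default value, which may or may not be what you want for required fields.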

axelson · May 23, 2018, 5:58pm

You can also always decode manually and completely explicitly. It is more typing but it also allows you to change the struct or the parameters independently. It is not always the best way but sometimes is.

defmodule Foo do
  defstruct [:bar, :baz]
end

defmodule Example do
  def test do
    s = ~s({"bar":"abc", "baz": 42})
    json = Poison.decode!(s)
    %Foo{
      bar: json["bar"],
      baz: json["baz"],
    }
  end
end

stefanchrobot · May 24, 2018, 7:12am

Thanks for the suggestions. I was considering switching from Poison to Jason, hence the question. Looks like I’ll stay with Poison. I’d consider the manual decoding if this was something bigger, but in this case it’s just a very simple one-to-one mapping.

Gee-Bee · March 28, 2020, 9:14pm

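As of Jason version 1.2.0, decode/2 supports the :keys option.

It’s worth mentioning, though, that keys: :atoms can lead to a DoS attack when the JSON data is user-controlled, because every previously unseen key creates a new atom and atoms are never garbage collected.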

namxam · October 25, 2021, 7:43am

I don’t know when it was introduced, but it seems relevant in case someone stumbles upon this thread: the :keys option also offers :atoms!, which only converts keys to already existing atoms, therefore mitigating the DoS risk.
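
Jason's documentation describes :atoms! as converting keys with String.to_existing_atom/1. The sketch below imitates that behaviour on an already decoded, string-keyed map using only the standard library, so it runs without Jason. The KnownKeys and Settings modules are made up for illustration.

```elixir
defmodule Settings do
  # Defining the struct is what makes :theme an existing atom.
  defstruct [:theme]
end

defmodule KnownKeys do
  # Hypothetical helper mimicking atoms!-style key conversion:
  # raises ArgumentError for any key that is not already an atom.
  def atomize!(map) when is_map(map) do
    Map.new(map, fn {key, value} -> {String.to_existing_atom(key), value} end)
  end
end

# Known key: converted to the existing atom.
IO.inspect(KnownKeys.atomize!(%{"theme" => "dark"}))

# Unknown key: rejected instead of growing the atom table.
result =
  try do
    KnownKeys.atomize!(%{"key_nobody_defined_a1b2c3" => 1})
  rescue
    ArgumentError -> :rejected
  end

IO.inspect(result)
```

Note that "existing" means any atom known to the VM, not only the struct's fields, so the result may still need to be checked against the struct's keys before calling struct/2.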

Sebb · October 25, 2021, 7:49am

that’s nice.

Say I have this typespec:

@type foo :: :bar | :baz

Is :bar an existing atom now? Yes, it is:

_a = :foo
String.to_existing_atom("foo")
String.to_existing_atom("bar")
String.to_existing_atom("non_exsisting")

 * 1st argument: not an already existing atom

    :erlang.binary_to_existing_atom("non_exsisting", :utf8)
    (temp_atomtest 0.1.0) lib/temp_atomtest.ex:10: TempAtomtest.test/0

Adzz · October 25, 2021, 1:22pm

You can also use an Ecto schema and then cast to the struct. You can also use helpers to make the casting more generic.

That means the atoms will always exist because they’ll be in the function definition, but it requires that you know the shape of the JSON ahead of time, which may or may not work for your use case.

Sebb · October 25, 2021, 6:13pm

Just tried that and noticed that Ecto does not know an :atom type: Ecto.Schema — Ecto v3.11.1

Also, I can’t explicitly set an :id field (:id is already set on schema), but each element in my JSON collection has an id.

Never really used Ecto, so I’m a little lost here.

But it seems like the right approach, e.g. I can load the JSON objects into a changeset and perform some extra checks that JSON Schema can’t, e.g. whether a reference is a dead link.

stefanchrobot · October 25, 2021, 6:17pm

I think that @primary_key false would do the job. By default, every schema has an :id primary key.

al2o3cr · October 25, 2021, 6:21pm

If you’re casting JSON into something like the @type foo :: :bar | :baz from a previous post, a specific Ecto.Type will be safer and clearer than the non-existing :atom field type.

Sebb · October 25, 2021, 7:12pm

Thanks guys, works like a charm.
Am I doing it right?

In my data I have static stuff (here: hobbies and jobs, which can be referenced in a list or as a single atom; this works). I also have data (here: person) that has an integer ID. These objects may

reference each other (here: friends)
reference other objects (say we’d have a pets field that references pets by [Pet.id_t()])

I see how I could first load all pets and then check in a person-changeset-validation if the referenced pets exist. But I can’t do that with persons referencing other persons, because they may not be loaded yet. So I’d need a second run, right?

defmodule Person do
  use TypedEctoSchema

  @type id_t() :: non_neg_integer()

  @primary_key false
  typed_schema "person" do
    field(:id, :integer)
    field(:name, :string, null: false)
    field(:age, :integer) :: non_neg_integer()
    field(:job, EctoAtom) :: Job.id_t()
    field(:hobbies, {:array, EctoAtom}) :: [Hobby.id_t()]
    field(:friends, {:array, :integer}) :: [Person.id_t()]
  end
end

defmodule EctoAtom do
  use Ecto.Type

  def type, do: :atom

  def cast(data), do: {:ok, String.to_atom(data)}
  def load(data), do: {:ok, String.to_atom(data)}
  def dump(atom), do: {:ok, Atom.to_string(atom)}
end

defmodule Job do
  @type id_t() :: :job_mechanic | :job_doc | :job_programmer
end

defmodule Hobby do
  @type id_t() :: :hobby_painting | :hobby_freeclimbing | :hobby_stampcollecting
end

iex> data = %{id: 1, name: "Bob", age: "18", job: "job_programmer", friends: [2, 4711], hobbies: ["hobby_freeclimbing", "hobby_painting"]}
...
iex> p = 
...>  Ecto.Changeset.cast(%Person{}, data, Map.keys(data)) 
...>  |> Ecto.Changeset.apply_changes()  
%Person{
  __meta__: #Ecto.Schema.Metadata<:built, "person">,
  age: 18,
  friends: [2, 4711],
  hobbies: [:hobby_freeclimbing, :hobby_painting],
  id: 1,
  job: :job_programmer,
  name: "Bob"
}

EDIT: I created a behaviour for the atom-types and I like it. I think I’ll use this.

moogle19 · October 25, 2021, 8:31pm

Quick question: Why not use Ecto.Enum for the job and {:array, Ecto.Enum} for the hobbies?

Sebb · October 25, 2021, 8:37pm

I need a real atom type (which Ecto does not offer). In another context, hobbies may be used as a single atom (like job).

al2o3cr · October 25, 2021, 10:36pm

Conceptually, you can’t determine if a field like friends has valid values in it without a larger context than the single Person.

When persisting things to the database, the database acts as that context.

If you aren’t persisting the data, then that context is the whole group of Person structs being decoded.

That’s not exactly a “second run”, but it’s similar:

decode each Person
pass the whole list to a function that checks friends for internal consistency: refers to people that exist, is properly reflexive (if desired)
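
The two steps above could be sketched like this. Plain maps stand in for already decoded Person structs, and the FriendCheck module is made up for illustration. Only the "refers to people that exist" check is shown; the reflexivity check is left out.

```elixir
defmodule FriendCheck do
  # Hypothetical helper: returns {:ok, people} when every friend id refers
  # to a person in the list, otherwise {:error, dangling}, where dangling
  # is a list of {person_id, missing_friend_id} pairs.
  def check(people) do
    ids = MapSet.new(people, & &1.id)

    dangling =
      for person <- people,
          friend_id <- person.friends,
          not MapSet.member?(ids, friend_id),
          do: {person.id, friend_id}

    case dangling do
      [] -> {:ok, people}
      _ -> {:error, dangling}
    end
  end
end

# Step 1 is assumed to have happened already: each person is decoded.
people = [
  %{id: 1, name: "Bob", friends: [2, 4711]},
  %{id: 2, name: "Alice", friends: [1]}
]

# Step 2: check the whole list for dangling references.
IO.inspect(FriendCheck.check(people))
```

Because the set of ids is built from the complete list first, the order in which the persons were decoded does not matter.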

Re: EctoAtom - String.to_atom still makes people pretty worried. What about specific types for the various things:

defmodule Job do
  @type id_t() :: :job_mechanic | :job_doc | :job_programmer

  use Ecto.Type

  def type, do: :atom

  def cast("job_mechanic"), do: {:ok, :job_mechanic}
  def cast("job_doc"), do: {:ok, :job_doc}
  def cast("job_programmer"), do: {:ok, :job_programmer}
  def cast(_), do: :error

  def load("job_mechanic"), do: {:ok, :job_mechanic}
  def load("job_doc"), do: {:ok, :job_doc}
  def load("job_programmer"), do: {:ok, :job_programmer}
  def load(_), do: :error

  def dump(:job_mechanic), do: {:ok, "job_mechanic"}
  def dump(:job_doc), do: {:ok, "job_doc"}
  def dump(:job_programmer), do: {:ok, "job_programmer"}
  def dump(_), do: :error
end

Then the fields are more specific about what they contain:

    field(:job, Job) :: Job.id_t()
    field(:hobbies, {:array, Hobby}) :: [Hobby.id_t()]

Sebb · October 26, 2021, 8:57am

I addressed that with the EctoAtomID-behaviour: Create a behaviour that uses Ecto.Type - #3 by Sebb.

Would be nice if I could generate the id-type:

@type id_t() :: :job_mechanic | :job_doc | :job_programmer

from the ids list in the macro also. I’ll give that a try.
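
A rough sketch of generating the function clauses from a single list, without a macro. The JobId module is made up for illustration and deliberately leaves out use Ecto.Type, type/0 and load/1 so that it runs without Ecto. The type is still written by hand here; only cast/1 and dump/1 are generated.

```elixir
defmodule JobId do
  # Hand-written type; only the function clauses below are generated.
  @type id_t() :: :job_mechanic | :job_doc | :job_programmer

  @ids [:job_mechanic, :job_doc, :job_programmer]

  def ids, do: @ids

  # One cast/1 clause per id, generated at compile time.
  for id <- @ids do
    def cast(unquote(Atom.to_string(id))), do: {:ok, unquote(id)}
  end

  # Anything not in the list is rejected.
  def cast(_), do: :error

  # One dump/1 clause per id, generated at compile time.
  for id <- @ids do
    def dump(unquote(id)), do: {:ok, unquote(Atom.to_string(id))}
  end

  def dump(_), do: :error
end

IO.inspect(JobId.cast("job_doc"))
IO.inspect(JobId.cast("job_pilot"))
IO.inspect(JobId.dump(:job_mechanic))
```

Since the atoms are literals in the module, no atom is ever created from input data.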

I’ll try to abstract the reference-integrity-checks also. Maybe I’ll just dump everything into a database to get better checks, but this seems weird because I don’t really need that, I just want a safe way to get some related collections from json into Elixir structs and then torture that data with some pipes.

Adzz · October 26, 2021, 9:11am

This is a great start!

If you keep what you have, I would change your EctoAtom module to use String.to_existing_atom, though, to avoid the risk of atom table exhaustion as an attack vector (especially relevant as we are dealing with JSON parsing).

As mentioned, rather than having an EctoAtom type, you could leverage the Ecto.Enum type like so:

field(:job, Ecto.Enum, values: [:job_mechanic, :job_doc, :job_programmer])
field(:hobbies, {:array, Ecto.Enum},
  values: [
    :hobby_painting,
    :hobby_freeclimbing,
    :hobby_stampcollecting
  ]
)

Possibly even calling out to a canonical list somewhere if you want a Hobby module:

field(:hobbies, {:array, Ecto.Enum}, values: Hobby.values())

This will successfully cast to atom values if given a string that matches, but will fail the casting if the value is not a string or atom in the given values list. It also means you don’t risk the whole “atom exhaustion” thing, because the atoms are declared up front as the enum values and incoming strings are only matched against that list.

Finally you might look at ecto_morph as sugar around casting too.

Sebb · October 26, 2021, 5:51pm

I didn’t know about values. Maybe I really should RTFM.

EDIT: OK, I did RTFM, but I can’t find the values option for field.