Pattern match on map keys that can be atoms or strings

darwin67 · May 24, 2024, 9:29am

Hi folks,

I’m building a library and I have an interesting case when it comes to pattern matching with maps.
I’d appreciate any help or pointers in the right direction.

Goal

I’d like a map that’s been deserialized from JSON over the wire to still be pattern-matchable with atom keys, if that’s how the user wrote the pattern.

Here’s a simplified snippet of the library, where run is the function the library exposes, and the anonymous function is provided by the user.

%{foo: value} = run("do something", fn ->
  %{foo: "bar"}
end)

As someone reading this, without knowing what run is doing, the anonymous function just returns a map %{foo: "bar"}, which will be the return value of run itself as well, and a user would want to pattern match against it.

Now here’s the interesting part. The reason run is a wrapper in the first place is that it does some extra things to make sure this function is idempotent, and it communicates elsewhere to store the state of the returned function.

So from my perspective as the library author, when I get the result of do something back over the wire and deserialize it, it comes out as %{"foo" => "bar"} instead, and I have no way of knowing beforehand what shape the user expects the data in.
That means the attempted pattern match will raise an error.

Things I’ve thought of

If the pattern match errored, catch it and use String.to_existing_atom to attempt again.
Might not be a big deal if the map is flat, but if the map is nested, doing that for each key iteration is likely going to slow things down unnecessarily.

And obviously I don’t want to just use String.to_atom, because we all know atoms are not GC’d. Preemptively converting all keys in a map to atoms is also likely to be a waste of CPU cycles if they’re never used.

Technically speaking, this can also apply to map values, since someone might want to declare a map that has atoms for both keys and values for some reason.
Then it’s even worse.
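
For concreteness, the String.to_existing_atom idea would look roughly like the sketch below. KeyAtomizer and its functions are hypothetical names made up for illustration, not part of any library. It walks nested maps and lists, converts a string key only when the atom already exists in the VM, and otherwise leaves the key as a string.

```elixir
defmodule KeyAtomizer do
  # Hypothetical helper: recursively converts string keys to atoms,
  # but only for atoms that already exist, so no new atoms are created.
  def atomize(%{} = map) when not is_struct(map) do
    Map.new(map, fn {key, value} -> {to_key(key), atomize(value)} end)
  end

  def atomize(list) when is_list(list), do: Enum.map(list, &atomize/1)
  def atomize(other), do: other

  # String.to_existing_atom/1 raises ArgumentError for unknown atoms;
  # in that case the key is kept as a string.
  defp to_key(key) when is_binary(key) do
    String.to_existing_atom(key)
  rescue
    ArgumentError -> key
  end

  defp to_key(key), do: key
end

# :foo exists because the pattern itself mentions it.
%{foo: value} = KeyAtomizer.atomize(%{"foo" => "bar"})
```

This still pays the cost of visiting every key, which is the overhead mentioned above for nested maps.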

So, rephrasing the question: how can I take a deserialized JSON map and still pattern match it against a user-defined map, regardless of whether the keys are strings or atoms?

Thanks in advance!

Note

I care less about other data types atm since they aren’t like String and :atom, where essentially they’re string literals in different presentations with some different characteristics, but are somehow not fully compatible with each other.

If you’ve used Ruby/Rails before, I basically want something like HashWithIndifferentAccess:

require "active_support/hash_with_indifferent_access"

framework = ActiveSupport::HashWithIndifferentAccess.new
framework[:name] = 'Ruby on Rails'

puts framework[:name]   # Ruby on Rails
puts framework['name']  # Ruby on Rails

And yes, I know it’s generally frowned upon in Elixir, but as a library author, I don’t have control over the data a user might be putting into it.

Hope this helps with the context.

mudasobwa · May 24, 2024, 9:44am

I am a bit lost about which part of the code above comes from the library, and which is expected to be written by the user.

If you want run/2 to return something that would happily match both %{foo: :bar} and %{"foo" => :bar}, it’s impossible (unless you override Kernel.=/2 which I would strongly rule out.)

Could you please clarify?

darwin67 · May 24, 2024, 9:49am

Only run/2 is the library function, everything else is written by the user.

mudasobwa · May 24, 2024, 9:50am

Then I am not sure how you were going to “if the pattern match errored, catch it and use.”

mudasobwa · May 24, 2024, 9:54am

Generally speaking, in such a case the approach would be to accept options in a call to run/3, as Jason does, for instance.

# def run(name, options \\ [keys: :strings], fun)

# the keyword list needs brackets because it is not the last argument
%{foo: value} = run("do something", [keys: :atoms], fn ->
  %{foo: "bar"}
end)

%{"foo" => value} = run("do something", [keys: :strings], fn ->
  %{foo: "bar"}
end)

# keys: :strings is the default, so this is equivalent to the call above
%{"foo" => value} = run("do something", fn ->
  %{foo: "bar"}
end)

Or simply document run/2 as returning binary keys always.
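
A minimal sketch of how such an option could be wired up, under stated assumptions: MyLib, simulate_wire/1 and convert_keys/2 are hypothetical names, and simulate_wire/1 only stands in for the real JSON round trip by turning every map key into a string. Atom conversion uses String.to_existing_atom/1, so no new atoms are created.

```elixir
defmodule MyLib do
  # Hypothetical sketch; `opts` sits in the middle so the fn stays last.
  def run(name, opts \\ [], fun) when is_binary(name) and is_function(fun, 0) do
    keys = Keyword.get(opts, :keys, :strings)

    fun.()
    |> simulate_wire()
    |> convert_keys(keys)
  end

  # Stand-in for the JSON round trip: all map keys come back as strings.
  defp simulate_wire(%{} = map),
    do: Map.new(map, fn {k, v} -> {to_string(k), simulate_wire(v)} end)

  defp simulate_wire(list) when is_list(list), do: Enum.map(list, &simulate_wire/1)
  defp simulate_wire(other), do: other

  # With keys: :strings the decoded data is returned untouched.
  defp convert_keys(term, :strings), do: term

  defp convert_keys(%{} = map, :atoms),
    do: Map.new(map, fn {k, v} -> {String.to_existing_atom(k), convert_keys(v, :atoms)} end)

  defp convert_keys(list, :atoms) when is_list(list),
    do: Enum.map(list, &convert_keys(&1, :atoms))

  defp convert_keys(other, :atoms), do: other
end

%{foo: atom_value} = MyLib.run("do something", [keys: :atoms], fn -> %{foo: "bar"} end)
%{"foo" => string_value} = MyLib.run("do something", fn -> %{foo: "bar"} end)
```

A library-wide default could replace the hard-coded :strings fallback in Keyword.get/3 if per-call overrides are not enough.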

fuelen · May 24, 2024, 9:54am

JSON is very limited and is not 100% compatible with Elixir terms.
If you want to use JSON, then you have to expose this internal detail to the end user, since you can’t convert {:ok, %{{:a, :b} => ~D[2021-01-01]}} to JSON and back without additional effort. There will be some limitations, and the user should know about them.
So I’d suggest adding an option to run/2 so the end user can specify which kind of keys they want, the way Jason.decode/2 does with its :keys option.

Probably the easiest way to get back exactly the data the user passes is to use :erlang.term_to_binary and :erlang.binary_to_term for serialization instead of JSON.
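
As a quick illustration (the term itself is just an example), the term from above survives a round trip through the external term format unchanged, including atom keys, the tuple key, and the Date struct:

```elixir
term = {:ok, %{{:a, :b} => ~D[2021-01-01], foo: :bar}}

# Serialize to a binary that can be stored or sent over the wire.
binary = :erlang.term_to_binary(term)

# Deserialize; the pin asserts we get back exactly the same term.
^term = :erlang.binary_to_term(binary)
```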

darwin67 · May 24, 2024, 9:59am

Huh, I didn’t think about the options path.
That’s a good idea.

Then it’s possible to have configurable library defaults + specific overrides as well.
I think that works.
I should’ve mentioned this earlier, but the key thing is that what’s returned is what the user expects, so as long as the user has control over what they get back, it works for me.

Thanks a lot.

al2o3cr · May 24, 2024, 4:39pm

+1 to @fuelen’s point about JSON. There are a LOT of terms that won’t cleanly round-trip through that process, so you’d be better off explicitly documenting what will - or switching to a serialization format that’s higher-fidelity.

D4no0 · May 24, 2024, 4:50pm

One warning is that if the data comes from an untrusted source, using binary_to_term is potentially dangerous as it is possible to define executable code inside of a data structure.
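
For reference, :erlang.binary_to_term/2 accepts a :safe option that refuses to create new atoms (among other things). The sketch below hand-crafts a binary for an atom that should not exist in the VM; the atom name and the variable names are made up for illustration. As far as I know, :safe alone does not reject anonymous functions, which is what Plug.Crypto.non_executable_binary_to_term/2 is meant for.

```elixir
# A normal round trip works with :safe because :foo already exists.
binary = :erlang.term_to_binary(%{foo: "bar"})
%{foo: "bar"} = :erlang.binary_to_term(binary, [:safe])

# Hand-crafted external term: version byte 131, tag 119 (small UTF-8 atom),
# then the length and the atom name. The name is only ever a string here,
# so the atom is never created by this code.
unknown = "never_seen_atom_x7q2"
crafted = <<131, 119, byte_size(unknown), unknown::binary>>

result =
  try do
    :erlang.binary_to_term(crafted, [:safe])
  rescue
    ArgumentError -> :rejected
  end
```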

darwin67 · May 24, 2024, 4:58pm

So now that I have a path forward with maps, I have a follow up question.
What if the returned object is a struct?

Syntax will look like this now.

%Foobar{foo: value} = run("do something", fn ->
  %Foobar{foo: "bar"}
end, opts)

This can be in any combination as well, since you can also do something like

%{foobar: %Foobar{foo: "bar"}}

In Go or Rust, you can specify the return type of the anonymous function passed to run/2 (or run/3) as something like map[string]Foobar or maybe just Foobar.
This would actually simplify things if I have a target type to deserialize the data into.

Sample Go code would look like

type Foobar struct {
  Foo string `json:"foo"` // exported so encoding/json can populate it
}

foobar := &Foobar{} // instantiate a struct
err := json.Unmarshal(byt, foobar) // deserialize the JSON bytes into the struct

Any ideas if there are ways to accomplish something similar in Elixir?
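
One possible shape for this, as a rough sketch only: if the caller names the target module, a string-keyed map can be turned into that struct with Kernel.struct/2. StructDecoder, to_struct/2 and Demo are hypothetical names for illustration; nested structs would need extra handling that is not shown.

```elixir
defmodule Foobar do
  defstruct [:foo]
end

defmodule StructDecoder do
  # Hypothetical helper: builds `module`'s struct from a string-keyed map,
  # copying only the keys that the struct actually defines.
  def to_struct(module, %{} = attrs) do
    fields =
      for {key, _default} <- Map.from_struct(struct(module)),
          Map.has_key?(attrs, Atom.to_string(key)) do
        {key, Map.fetch!(attrs, Atom.to_string(key))}
      end

    struct(module, fields)
  end
end

defmodule Demo do
  # Pattern matching on the struct works once the map has been converted.
  def value do
    %Foobar{foo: value} =
      StructDecoder.to_struct(Foobar, %{"foo" => "bar", "extra" => 1})

    value
  end
end
```

Because the field names come from the struct definition, no atoms are created from incoming data and unknown keys such as "extra" are simply dropped.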

Adzz · May 24, 2024, 5:44pm

If the user implements the anonymous function passed to run, are they not in charge of what is returned? In which case they will know whether the function returns a map with string keys or atom keys because they implemented the function to do that? Or am I misunderstanding?

darwin67 · May 24, 2024, 8:21pm

Yes they do. But

Now here’s the interesting part. The reason run is a wrapper in the first place is that it does some extra things to make sure this function is idempotent, and it communicates elsewhere to store the state of the returned function.

So run/2, run/3 will be doing stuff around the user provided function, which would cause a round trip over the wire.
That means when the data comes back as JSON and is deserialized, the type information is basically lost.

Note

I’m skipping a lot of internal details about why the round trip over the wire is needed, but it’s basically a requirement (at least at this point in time).

darwin67 · May 24, 2024, 8:30pm

The source of the data is trusted in this case

darwin67 · May 24, 2024, 8:31pm

The bin serialization is new to me. I’m gonna give it a try!
Thank you!

Adzz · May 24, 2024, 8:34pm

In which case you could use a JSON parser that decodes to atom keys in maps when the caller returns a map with atom keys, if you know that