IndifferentAccess - Adaptation of HashWithIndifferentAccess to Elixir Maps/Plug

benwilson512 · July 11, 2019, 12:09am

Oh the things people have to deal with when not using GraphQL APIs…

bglusman · July 11, 2019, 12:15am

Thanks! And just to see what it looked like/for anyone following along at home and still interested despite all the consensus against this, I went and added (in PR) a struct with access behavior not relying on the atom lookup hack… https://github.com/bglusman/indifferent_access/pull/2 (EDIT - and then for the hell of it, more config, merged, and new published version, because, I dunno, it took 5 mins and lets people play with it all… Occurs to me the approaches are hard to compare because the struct approach, at least currently, only changes the top-level hash in params, but I suppose when getting a value that’s a map it could wrap it in a new param struct, making it sort of lazily recursive/allowing get_in to work at > 1 depth… Maybe I’ll do that or make that another config just for fun!)

NobbZ · July 11, 2019, 5:18am

Exactly.

When I get input from the user, I usually validate it and type-convert it to Elixir types as necessary. Then it’s either used directly as a bare value in function calls, or to create some other data. For what I have done so far, this other data is about 90% some struct that already existed before I decided to get some user input.

And in the case of projects using Ecto, well, at least 50% of them have a schema and a changeset function, which makes validation/normalization very easy and straightforward.

Of course, sometimes other data structures make sense, then use it. And of course, if there are user provided relationships, just don’t use atom keys. But either way, deeper in your codebase nothing should access a map by a literal string, as either you know upfront what you expect or you don’t. And if you use a literal string, it feels as if you knew what you should get, but forgot to normalize. (And as always, there are exceptions to this “rule”, case by case)
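
For anyone following along, here is a minimal sketch of the "validate and normalize at the boundary" idea in plain Elixir, without Ecto. The `SignupInput` module and its `email`/`age` fields are made up for illustration; the point is only that string keys get touched once, at the edge, and everything deeper works with a struct.

```elixir
defmodule SignupInput do
  # Hypothetical struct and field names, for illustration only.
  defstruct [:email, :age]

  # String keys are matched exactly once, here at the boundary.
  def from_params(%{"email" => email, "age" => age})
      when is_binary(email) and is_binary(age) do
    case Integer.parse(age) do
      {n, ""} when n >= 0 ->
        {:ok, %__MODULE__{email: String.downcase(email), age: n}}

      _ ->
        {:error, :invalid_age}
    end
  end

  # Anything that does not have the expected shape is rejected up front.
  def from_params(_other), do: {:error, :missing_fields}
end
```

Code further in only ever sees `%SignupInput{}` with atom fields, so no literal string keys leak past this function.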

bglusman · July 11, 2019, 12:17pm

Also released iteration this morning (only took another 10-20 mins so why not) to add option/default for the recursive wrap support to the Params struct… Haven’t tried in my project yet but welcome feedback! This may be more what you had in mind @mhanberg and I think I agree this is a better direction but the old options are still there also.

OvermindDL1 · July 11, 2019, 6:32pm

Heh, it’s like Protocols in Elixir, but instead of working on a limited set of types it works on matchers, so it can be exceedingly complex. It supports @priority definitions for matching order, testing callbacks (so you can enforce that things that implement a behaviour follow your contracts at compile-time), etc… etc… Lots of useful features that I’ve kept adding for my own use and a couple by request of others.

If anything you’d want to keep the strings, not the atoms. Converting strings to atoms can be quite costly since they are never GC’d.

<soapbox>
PROPER STATIC TYPE SYSTEMS!
</soapbox>
/me coughs

^.^

I think it’s mostly a ‘can it be done’ kind of thing, not a ‘should’, a lot of my play projects are just ‘seeing if I can’ too after all. ^.^

ProtocolEx started out as that, just a ‘see if I can’ replacement of Protocol fixing all the faults that I saw in Protocol, it is one of the things that ended up being rather generically useful that I actually do use.

dimitarvp:

Structs are mostly enforced at compile time. They save you from a part of errors you’d otherwise get at runtime.

Structs blow up at runtime if you touch them the wrong way (as your colleague did). And they give you a very clear error message.

Structs convey the message that you really want a certain piece of data to be shaped this way or that way. A good programmer will stop and think “why do they disallow removing of keys from this thing?” and maybe go and ask their colleague in charge of that code – this fosters communication in the team which is always a good thing.

Structs will slap you on the wrists if you introduce a new field and make it mandatory (the @enforce_keys [:field_a, :field_b, ...] annotation just above the defstruct expression) and you’ll have an exact list of compile errors to work through, which accelerates your work (as opposed to getting paranoid and adding more and more tests which might not even catch the problems that can come into existence by introducing the new field).

This this, structs are as close as we get to static typing in Elixir.
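
A small sketch of the struct behavior described in the quote, for anyone who has not hit it yet. The `Account` module and its fields are invented for the example. It uses `struct!/2`, which performs these checks at runtime; with the literal `%Account{...}` syntax a missing enforced key is reported when the code is compiled.

```elixir
defmodule Account do
  # Hypothetical struct, for illustration only.
  @enforce_keys [:id, :email]
  defstruct [:id, :email, plan: :free]
end

# All enforced keys given: builds fine, defaults are filled in.
account = struct!(Account, id: 1, email: "ada@example.com")

# Leaving out an enforced key raises ArgumentError.
missing_key_error =
  try do
    struct!(Account, id: 1)
  rescue
    e in ArgumentError -> e
  end

# Setting a field the struct does not define raises KeyError.
unknown_key_error =
  try do
    struct!(account, emial: "typo@example.com")
  rescue
    e in KeyError -> e
  end

IO.inspect({account.plan, missing_key_error.__struct__, unknown_key_error.key})
```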

+1

I still say this is significantly false, at least for me it makes refactoring and feature development easier and faster as it is much easier to catch the places where things need to be updated.

Same thing in OCaml, it’s considered super fast and flexible to iterate/change while being safe about it. Rust is pretty easy to do this in as well, though its syntax slows that down a good bit in ways that OCaml’s doesn’t (but once you ‘get’ Rust it is nearly as fast, though ‘getting’ Rust can be a large hurdle for a lot of people due to how different it is from other things).

I really wish it did. I wish it was a struct of GET params, POST/field params, and arguments from the path, in addition to perhaps another field for plugs to inject things in to.

Soooo much this. Easily 95%+ of my errors in my Elixir apps are things that would have been caught by a static typing system before a compilation ever completed successfully and I wish it did. Sure tests catch things but you have to know ‘what’ to catch, and there is so much a typing system will catch just automatically, definitely not everything but easily that 95% of ‘stuff’ (at least my stuff).

Definitely for me. In C++, OCaml, Rust, etc… the number of times a compile failed and it’s something that Elixir or Python or so would have just let go through to crash at runtime is astounding. I don’t feel reliable coding with Elixir. I feel reliable on the BEAM in the ways it’s designed, but Erlang/Elixir-the-language I feel like I fight with way too much.

Uh, PHP has had a static type system for a while now too. It can give quite a performance boost in addition to the compile-time safety it adds. ^.^

/me still has to deal with PHP for people at times, it’s not near as bad as it used to be…

Yeah I would not be happy with that either. In OCaml to just ‘Get a feature in now’ kind of thing I can scatter failwith "blah" all over the place, that will make the type system happy (it returns ‘anything’, since it crashes instead of returning) and it is super easy to grep for (make the CI not allow it in production builds for example). Scala has the ??? operator for that, C++ has all kinds of ways to die, Rust has panic!(...), etc… There are ways to ‘make the type system happy’ but in a safe way that you can force not to be allowed in prod builds in every strongly typed language that I know that makes iterative development really fast and yet still significantly safer.

Eh, except right now phoenix is kind of ‘munging’ multiple parameter types together. Like just from mind if a field named id is passed in via a GET, a POST, and a router argument (and who knows from what else) or in any combination thereof, then what gets priority, which gets set in this mapping? This is why they should be distinct, they should not be ‘combined’ into one map, that was such a huge PHP error and vulnerability for such a long time that they finally got rid of in like PHP5 or so (though thankfully the routes themselves help ‘clean’ it a bit compared to the PHP horrors, lol).
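
To make the "keep them distinct" idea concrete, here is a rough sketch of what such a container could look like. `SeparatedParams` is a hypothetical struct written for this example, not something Plug or Phoenix ships under that name, and the precedence order in `merged/1` is just one arbitrary choice that the sketch makes explicit.

```elixir
defmodule SeparatedParams do
  # Hypothetical shape, for illustration only.
  defstruct query: %{}, body: %{}, path: %{}

  # If a merged view is wanted, the precedence is spelled out in one place:
  # path arguments win over body fields, which win over query-string values.
  def merged(%__MODULE__{query: query, body: body, path: path}) do
    query
    |> Map.merge(body)
    |> Map.merge(path)
  end
end

params =
  struct!(SeparatedParams,
    query: %{"id" => "from-query"},
    body: %{"id" => "from-body"},
    path: %{"id" => "from-path"}
  )

# Each source can still be asked individually, so nothing is silently shadowed.
IO.inspect({params.query["id"], params.body["id"], SeparatedParams.merged(params)["id"]})
```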

Likewise, it’s best to think through ideas by debating for each side of the debate. ^.^

I entirely agree with this, at least for me: static typing makes my most common class of bugs, by far, impossible just by virtue of how it works (assuming you don’t start stringly-typing everything or so ^.^;).

Eh, it’s still fun to make things just to make them though. ^.^

GraphQL and Absinthe are awesome. ^.^

You can make a new Access function that can be used in things like get_in and so forth to do indifferent access, I think one of the other libraries has one.
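
A rough sketch of what such an accessor could look like, assuming plain maps. `IndifferentKey.key/1` is a made-up name, not taken from any of the libraries mentioned in this thread. It relies on the fact that `get_in/2`, `put_in/3` and friends accept three-argument functions in the key path, and it only uses `String.to_existing_atom/1`, so no new atoms are created.

```elixir
defmodule IndifferentKey do
  # Hypothetical helper, for illustration only.
  # Returns an accessor function usable in get_in/put_in/update_in paths.
  def key(key) when is_atom(key) or is_binary(key) do
    fn
      :get, data, next when is_map(data) ->
        next.(Map.get(data, resolve(data, key)))

      :get, nil, next ->
        next.(nil)

      :get_and_update, data, next when is_map(data) ->
        actual = resolve(data, key)

        case next.(Map.get(data, actual)) do
          {get, update} -> {get, Map.put(data, actual, update)}
          :pop -> {Map.get(data, actual), Map.delete(data, actual)}
        end
    end
  end

  # Picks whichever form of the key is present in the map,
  # falling back to the key exactly as it was given.
  defp resolve(data, key) do
    Enum.find([key | alternates(key)], key, &Map.has_key?(data, &1))
  end

  defp alternates(key) when is_atom(key), do: [Atom.to_string(key)]

  defp alternates(key) when is_binary(key) do
    try do
      [String.to_existing_atom(key)]
    rescue
      ArgumentError -> []
    end
  end
end

params = %{"user" => %{"name" => "Ada"}}
IO.inspect(get_in(params, [IndifferentKey.key(:user), IndifferentKey.key(:name)]))
```

Updates go through the same functions, so `put_in/3` writes back under whichever key form the map already uses.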

“option/default”?

bglusman · July 11, 2019, 7:23pm

Yup! I’ve looked at it a few times when browsing your GitHub, just never found/remembered a good excuse to play with it yet.

Well, no, I never create atoms dynamically, as much as this may be for fun I wouldn’t want to create something that added actual attack vulnerability. I guess you didn’t look at the code, but it gets initialized with a map of all atoms in system keyed by string so it only ever replaces strings that already have a corresponding atom.

That’s the default behavior of the new version of the library, it converts params to an IndifferentAccess.Params struct with either recursive or static/flat behavior for maps pulled out as values. It also still supports two variants of the old behavior using the atoms_map. I doubt I’ll ever actually use it myself but I think the result is fun and maybe reasonable for some people to use! Maybe we can extend it with your idea for a separate area for different actions or something, I can’t quite visualize that but it sounds interesting, but it might not belong in the same library.

OvermindDL1 · July 11, 2019, 8:42pm

Don’t actually need to do that, String.to_existing_atom already handles that.

bglusman · July 11, 2019, 8:53pm

Well, yes but it errors if it doesn’t exist, I don’t want to rescue, I just leave the string in place in that case.

OvermindDL1 · July 11, 2019, 9:02pm

Heh, well it will use a lot less memory, plus it’s missing something big:

Atoms can be dynamically added, which is very common as modules are loaded into the running system, so when the atom list is captured may be very incomplete until ‘later’.

So rescue’ing is better, just wrap it in a wrapper function. ^.^
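
For reference, a wrapper along those lines might look like the sketch below. The module and function names are placeholders; the only real API involved is `String.to_existing_atom/1`, which raises `ArgumentError` when no such atom exists. The string is simply handed back in that case, matching the "leave the string in place" behavior described above.

```elixir
defmodule SafeAtom do
  # Hypothetical helper name, for illustration only.
  # Returns the existing atom for `key`, or `key` itself when no such atom exists.
  def to_existing_or_string(key) when is_binary(key) do
    String.to_existing_atom(key)
  rescue
    ArgumentError -> key
  end
end

IO.inspect(SafeAtom.to_existing_or_string("ok"))
```

Because the lookup happens at call time, atoms that were added after startup (for example by modules loaded later) are found as well.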

bglusman · July 11, 2019, 9:05pm

True, but that’s an acknowledged limitation… though I suppose that might be a good additional option that wouldn’t require the option_map, but with different time/space complexity tradeoffs: the overhead of rescuing when wrong is a lot higher than a nil map lookup, but I agree it’s more reliable and less memory intensive… nothing wrong with either, but there’s some overhead for using this and I didn’t want it to be too terrible, but maybe that’s the new third config option!

benwilson512 · July 11, 2019, 9:06pm

Sounds like it’s time to break out benchee

OvermindDL1 · July 11, 2019, 9:36pm

It can potentially be a lot of memory though, tens of thousands to even hundreds of thousands of atoms on some systems, plus you are copying that entire massive map on every call to Application.get_env since it’s not static (a persistent term would fix that though, but still eat memory, and there’s no point putting atoms in the persistent term registry directly as that’s just duplicating the atom registry).
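
For readers who have not used it, this is roughly what the persistent term variant looks like. It is a sketch with a tiny stand-in map and made-up module and key names, not the library's code. `:persistent_term.put/2` and `:persistent_term.get/1` are the real Erlang functions (available from OTP 21.2); reads do not copy the stored term onto the calling process heap, which is the difference from fetching the map out of the application environment.

```elixir
defmodule LookupCache do
  # Hypothetical module and key, for illustration only.
  @key {__MODULE__, :string_to_atom}

  # Store the map once, e.g. at application start.
  # Updating a persistent term is expensive, so this should happen rarely.
  def init(map) when is_map(map), do: :persistent_term.put(@key, map)

  # Reads are cheap; unknown strings are returned unchanged.
  def resolve(string) when is_binary(string) do
    Map.get(:persistent_term.get(@key), string, string)
  end
end

LookupCache.init(%{"ok" => :ok, "error" => :error})
IO.inspect({LookupCache.resolve("ok"), LookupCache.resolve("something_else")})
```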

Did you call? ^.^

╰─➤  tail -n 22 bench/atom_map_or_existing_bench.exs 

  def actions(_cla, _setup),
    do: %{
          "map_access" => fn key ->
            atom = atoms_map()[key]
            if atom, do: atom, else: key
          end,
          "map_get" => fn key ->
            Map.get(atoms_map(), key, key)
          end,
          "persistent_mapterm_access" => fn key ->
            Map.get(:persistent_term.get(:all_atoms_map), key, key)
          end,
          "to_existing" => fn key ->
            try do
              String.to_existing_atom(key)
            rescue ArgumentError ->
              key
            end
          end,
    }
end

╰─➤  mix bench atom_map_or_existing
Operating System: Linux
CPU Information: AMD Phenom(tm) II X6 1090T Processor
Number of Available Cores: 6
Available memory: 15.67 GB
Elixir 1.8.1
Erlang 21.2.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
parallel: 1
inputs: $blah, true
Estimated total run time: 48 s


Benchmarking map_access with input $blah...
Benchmarking map_access with input true...
Benchmarking map_get with input $blah...
Benchmarking map_get with input true...
Benchmarking persistent_mapterm_access with input $blah...
Benchmarking persistent_mapterm_access with input true...
Benchmarking to_existing with input $blah...
Benchmarking to_existing with input true...

##### With input $blah #####
Name                                ips        average  deviation         median         99th %
persistent_mapterm_access        7.98 M       0.125 μs    ±16.96%       0.130 μs        0.22 μs
to_existing                      4.45 M        0.22 μs   ±119.53%        0.21 μs        0.33 μs
map_get                       0.00118 M      846.05 μs    ±75.88%         478 μs     2972.98 μs
map_access                    0.00113 M      886.78 μs    ±87.96%         488 μs     3203.40 μs

Comparison: 
persistent_mapterm_access        7.98 M
to_existing                      4.45 M - 1.79x slower
map_get                       0.00118 M - 6753.50x slower
map_access                    0.00113 M - 7078.61x slower

Memory usage statistics:

Name                         Memory usage
persistent_mapterm_access           136 B
to_existing                         136 B - 1.00x memory usage
map_get                             192 B - 1.41x memory usage
map_access                          192 B - 1.41x memory usage

**All measurements for memory usage were the same**

##### With input true #####
Name                                ips        average  deviation         median         99th %
persistent_mapterm_access        7.29 M       0.137 μs    ±12.94%       0.130 μs        0.20 μs
to_existing                      7.28 M       0.137 μs    ±11.35%       0.130 μs        0.20 μs
map_access                    0.00116 M      859.70 μs    ±78.67%         474 μs     3089.70 μs
map_get                       0.00116 M      865.42 μs    ±77.40%         474 μs     3133.01 μs

Comparison: 
persistent_mapterm_access        7.29 M
to_existing                      7.28 M - 1.00x slower
map_access                    0.00116 M - 6265.69x slower
map_get                       0.00116 M - 6307.36x slower

Memory usage statistics:

Name                         Memory usage
persistent_mapterm_access           136 B
to_existing                         136 B - 1.00x memory usage
map_access                          192 B - 1.41x memory usage
map_get                             192 B - 1.41x memory usage

**All measurements for memory usage were the same**

(I do an assert to verify $blah doesn’t exist as an atom of course)

In all cases the map from the application environment is the slowest by monumental amounts.

When the atom does exist then the persistent mapterm and the to_existing_atom are about the same speed.

When the atom does NOT exist then the to_existing_atom is a bit better than half the speed of the persistent mapterm, which at those speeds (millions per second) is not worth caring about.

/me loves benchmarks

EDIT: The atoms_map() and the initialize_atoms_map() calls are just copy/pasted from this IndifferentAccess module and initialized only once on setup. ^.^

bglusman · July 13, 2019, 3:05am

Awesome! I’ll kill the map then and do a rescue! Thanks for running that @OvermindDL1 and thanks for suggesting @benwilson512!

bglusman · July 16, 2019, 7:27pm

Thanks again for the Benchee stats! Hex and GitHub are updated, no longer using atoms_map! Good reminder to always benchmark to test assumptions!

jtompl · March 15, 2023, 4:32pm

For anyone running into this problem, we’ve just published the map_with_indifferent_access library.

It gives you functions like MapWithIndifferentAccess.get/3 and MapWithIndifferentAccess.put/3 that mimic the Map API, but:

You need to use atom keys when calling the functions. (Even if the map uses string keys.)
If the map uses string keys, the key argument is converted to a string first, and only then passed to the respective Map function.

To guess whether the map uses string or atom keys, it looks only at the key of one random element. It does this the same way Ecto currently does.

We find it useful when working with Ecto.Changeset, as Ecto.Changeset.cast/4 forbids passing maps with mixed key types.

For example, we have a domain layer code (that uses ecto changesets underneath), that’s called either by a controller (with string keyed maps) or by some other domain layer code (with atom keyed maps).

MapWithIndifferentAccess allows us to interact with the map (e.g. get or put elements of it) without having to “guess” whether it currently uses string or atom keys.
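
To illustrate the general idea (this is a sketch written for this thread, not the library's actual implementation, and `IndifferentMapSketch` is a made-up name), the approach described above can be approximated like this: callers always pass atom keys, and the key is converted to a string only when a sampled element of the map has a string key.

```elixir
defmodule IndifferentMapSketch do
  # Hypothetical module, for illustration only.

  def get(map, key, default \\ nil) when is_map(map) and is_atom(key) do
    Map.get(map, cast_key(map, key), default)
  end

  def put(map, key, value) when is_map(map) and is_atom(key) do
    Map.put(map, cast_key(map, key), value)
  end

  # Looks at a single element to guess the key type of the whole map.
  # Empty or atom-keyed maps keep the atom key as given.
  defp cast_key(map, key) do
    case Enum.take(map, 1) do
      [{existing, _value}] when is_binary(existing) -> Atom.to_string(key)
      _ -> key
    end
  end
end

IO.inspect(IndifferentMapSketch.get(%{"name" => "Ada"}, :name))
IO.inspect(IndifferentMapSketch.put(%{name: "Ada"}, :age, 36))
```

Since only one element is sampled, a map that already mixes string and atom keys can still end up with the "wrong" key type; the sketch assumes the map is consistently keyed going in.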