Choosing between maps, structs, records etc

fcheung · September 27, 2016, 10:03pm

Hi,

As I mentioned in an earlier topic, I’m playing with building an Elixir client for the AWS APIs from the API definitions they provide. These APIs have lots of different return types, also described by these API definitions. I’m wondering what data types the Elixir implementations should return.

They could just return maps (with the values inside being further maps/lists/strings/numbers etc.), but I’m worried about this being error-prone from the user’s point of view. It’s very easy to typo a map key (I swap US and UK spellings all the time, for example).

I could define structs for all the various return types, but there are a lot of those (since there are lots of apis, and some of the return / input types are themselves made up of many types) and this seems to make compilation slow. To give an order of magnitude, when I tried this with just the EC2 api, I ended up with 536 structs and compiling a file that just defines those structs takes 18 seconds on my machine. (As an aside are there other costs / overheads associated with having hundreds of modules that just define a struct? Across all the aws apis there would be many thousands of these structs).

I read through José’s post on some of the rationale for structs, and for this use case a lot of it isn’t necessary - I just want some compile-time checks and easy discoverability of what these APIs return.

I’ve come across records via the post I linked earlier, but they don’t seem to get a lot of use (the Programming Elixir book doesn’t even mention them). Using either structs or records feels a bit like I might be trying too hard to recreate what I might do in ruby. Posts such as https://engineering.appcues.com/2016/02/02/too-many-dicts.html do seem to encourage the use of structs.

Lastly I’ve seen type specifications. “Programming Elixir” says “type specifications are not currently in wide use in the Elixir world” and José’s post does say that typespec support for maps is lacking. Both of these are from 18+ months ago though - the release notes for erlang 19 do say that dialyzer support for maps is “very much extended” & http://elixir-lang.org/docs/stable/elixir/typespecs.html certainly seems to list some of the features I’d want - required keys, optional keys etc.

The process of writing this nudged me slightly in the direction of maps + type specs, but I would love to hear more informed opinions!
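
For reference, a minimal sketch of what the maps + type specs option could look like. The module, type, and field names here are hypothetical, and the specs are only checked by Dialyzer, not by the compiler itself:

```elixir
defmodule KeyPairTypes do
  # Hypothetical module and field names, for illustration only.
  @typedoc "A key pair entry represented as a plain map with atom keys."
  @type key_pair :: %{
          required(:key_name) => String.t(),
          optional(:key_fingerprint) => String.t()
        }

  # Dialyzer can use this spec to flag callers that pass the wrong shape.
  @spec name(key_pair()) :: String.t()
  def name(%{key_name: name}), do: name
end
```

The `required(...)` and `optional(...)` forms are the map key annotations listed in the typespecs documentation linked above.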

OvermindDL1 · September 27, 2016, 10:42pm

As a long time erlang user I like Records, but maps/structs have supplanted their use.

Just note, a record is just a tuple with some compile-time names given. I’d have to know more about your use-case and why maps are slow for you, and given that I’m not sure records would be much faster. Can you give an abbreviated example of what you are trying to accomplish and why there would be so many storage things?
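
To illustrate the "tuple with compile-time names" point, here is a small sketch using the standard `Record` module. The record name and fields are made up for the example:

```elixir
defmodule KeyPairRecord do
  require Record
  # Hypothetical record; defines key_pair/0, key_pair/1 and key_pair/2 macros.
  Record.defrecord(:key_pair, key_name: nil, key_fingerprint: nil)
end

defmodule KeyPairRecordDemo do
  import KeyPairRecord

  # Expands at compile time to the tuple {:key_pair, name, nil}.
  def build(name), do: key_pair(key_name: name)

  # Expands to a positional element lookup on the tuple.
  def name(record), do: key_pair(record, :key_name)
end
```

Calling `KeyPairRecordDemo.build("gsg-keypair")` returns a plain tuple tagged with `:key_pair`, and a misspelled field name in any of the `key_pair` macro calls is rejected at compile time.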

epailty · September 28, 2016, 2:55am

AWS API responses are XML

E.g.

<DescribeKeyPairsResponse xmlns="http://ec2.amazonaws.com/doc/2015-10-01/">
  <requestId>7a62c49f-347e-4fc4-9331-6e8eEXAMPLE</requestId>
  <keySet>
    <item>
      <keyName>gsg-keypair</keyName>
      <keyFingerprint>
         00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
      </keyFingerprint>
    </item>
  </keySet>
</DescribeKeyPairsResponse>

Which I think could easily be expressed as structures.
Some good discussion about records and structs:
https://groups.google.com/forum/#!msg/elixir-lang-talk/6kn7J2XnFg8/I5poTNCEHwAJ
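
As a rough sketch, the response above might map onto structs like this. The module names and snake_case field names are assumptions for illustration, not something the AWS definitions dictate:

```elixir
defmodule KeyPair do
  # Hypothetical struct for one <item> in <keySet>.
  defstruct [:key_name, :key_fingerprint]
end

defmodule DescribeKeyPairsResponse do
  # Hypothetical struct for the top-level response element.
  defstruct [:request_id, key_set: []]
end

response = %DescribeKeyPairsResponse{
  request_id: "7a62c49f-347e-4fc4-9331-6e8eEXAMPLE",
  key_set: [
    %KeyPair{
      key_name: "gsg-keypair",
      key_fingerprint: "00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00"
    }
  ]
}

# Field access is by name; an unknown key in the %KeyPair{...} literal
# would be rejected when the code is compiled.
first_key_name = hd(response.key_set).key_name
```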

fcheung · September 28, 2016, 6:01am

The number of structures is dictated by the number of data types that the api description files define - 276 for the rds api , 536 for the ec2 one, 225 for IAM etc. Across 72 apis / api versions that adds up to a lot of structs.

I’m not sure about runtime slowness - I haven’t written enough of this API client for it to actually do anything yet. I’m just noticing that the large number of structs I’m creating results in slow compile times (like I said, 18 seconds for just one of the AWS APIs), which is annoying on its own, and possibly a warning that I’m not doing things right.

josevalim · September 28, 2016, 11:42am

The sheer number of data types seems to make the struct road unfeasible. I would go with maps because they are still typo-safe: map.foo will raise if the field foo does not exist, as will pattern matching on %{foo: foo}. Maps do suffer a bit on discoverability when compared to structs.
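
A small sketch of that behaviour, with a deliberately misspelled key (the module and key names are invented for the example). Note that both checks happen at runtime, and that bracket access is the one form that does not raise:

```elixir
defmodule TypoDemo do
  # Dot access on a missing atom key raises KeyError.
  def dot(map) do
    map.key_nmae
  rescue
    e in KeyError -> {:key_error, e.key}
  end

  # Pattern matching on a missing key raises MatchError.
  def match(map) do
    %{key_nmae: value} = map
    value
  rescue
    MatchError -> :match_error
  end

  # Access (bracket) syntax silently returns nil for a missing key.
  def bracket(map), do: map[:key_nmae]
end
```

So the typo safety holds as long as callers use `map.field` or pattern matching rather than `map[:field]`.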

fcheung · September 28, 2016, 11:44am

That makes sense. I can populate the maps with nils for fields that are optional in api responses.
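
Something along these lines, as a sketch. The field list here is hypothetical; a generated client would presumably derive it from the API definition:

```elixir
defmodule KeyPairDefaults do
  # Hypothetical list of every field the API definition declares for this type.
  @fields [:key_name, :key_fingerprint, :key_material]

  # Start from a map with every known field set to nil, then overlay
  # whatever the response actually contained (ignoring unknown keys).
  def build(attrs) do
    defaults = Map.new(@fields, &{&1, nil})
    Map.merge(defaults, Map.take(attrs, @fields))
  end
end
```

With every declared key always present, `map.key_fingerprint` returns nil for an absent optional field instead of raising, while a misspelled key still raises.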

fcheung · September 28, 2016, 12:59pm

One other thing, aside from the question of the correct interface: why does defining lots of structs cause slow compile times?

Is it because each struct is a module and there is a cost to pay for each module? Is this something to worry about normally, or are only edge cases with hundreds of modules per file affected?

josevalim · September 28, 2016, 1:12pm

Yes.

Only when defining hundreds.
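
To make the "each struct is a module" point concrete, a minimal sketch (the module name is invented for the example):

```elixir
defmodule StructCostDemo.KeyPair do
  # defstruct can only be used inside a module, so every struct
  # type means one more module for the compiler to produce.
  defstruct [:key_name, :key_fingerprint]
end

key_pair = %StructCostDemo.KeyPair{key_name: "gsg-keypair"}

# The struct value is an ordinary map tagged with the name of its module...
module = key_pair.__struct__

# ...and that name refers to a real compiled module exporting __struct__/0.
is_real_module = function_exported?(module, :__struct__, 0)
```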

jlevy · February 19, 2017, 1:47pm

It looks like Postgrex makes heavy use of records for a similar use case. Is there a distinction that I’m missing? Or would you recommend this be built differently if done today?

OvermindDL1 · February 19, 2017, 8:24pm

Maps are a fairly recent development in the BEAM world; most uses of Records are better served by Structs nowadays, and Postgrex has been around a long while.

Records are still technically better in some ways, like accessing a field in it is ‘slightly’ faster than a map, but inserting an update in a map ‘might’ be slightly faster than a record ‘if’ the record is decently sized.

josevalim · March 4, 2017, 9:26pm

Think of records as glorified tuples. When you need to handle multiple different tuples, which are private to a module, records work great. Maps would also work in the example above, but they wouldn’t give the compile time guarantee of records. Structs would be too wasteful though for those cases. Those data structures are never really “exported”, so using multiple modules for representing them is quite unnecessary.
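
A sketch of that pattern, assuming a made-up decoder module rather than Postgrex's actual code. `Record.defrecordp` defines the record macros as private, so the tuples never need a module of their own:

```elixir
defmodule MessageDecoder do
  require Record

  # Hypothetical private records; these macros are only visible in this module.
  Record.defrecordp(:row, columns: [], values: [])
  Record.defrecordp(:failure, code: nil, message: nil)

  def decode({:ok, columns, values}),
    do: summarize(row(columns: columns, values: values))

  def decode({:error, code, message}),
    do: summarize(failure(code: code, message: message))

  # Record macros also work in patterns; field names are checked at compile time.
  defp summarize(row(values: values)), do: {:row, length(values)}
  defp summarize(failure(code: code)), do: {:failure, code}
end
```

A misspelled field such as `row(vlaues: values)` fails to compile, which is the guarantee maps would not give here.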

jlevy · March 9, 2017, 10:36pm

Thanks, @josevalim. That’s helpful!