Help with structs benchmarks

jeanGeorge · May 5, 2022, 4:39pm

Hi to everyone!

I am looking for some Elixir benchmarks that use structs. I would appreciate it if you could share some with me.

Why? I’m working on a project (my undergraduate thesis) whose goal is to improve the performance of the map lookup operation in BEAM. Now, I’m at a stage where I need to perform tests to analyze the behavior of the current BEAM implementation when using Elixir structs.

al2o3cr · May 6, 2022, 2:12am

Here’s some prior discussion that seems like a good starting point:

A big change worth learning about for this specific case (Elixir structs) is the new_map_lit machinery that landed in OTP 21:

github.com/erlang/otp

Introduce new_map_lit operation in the loader

erlang:master ← michalmuskala:new_map_lit

opened 10:06AM - 03 Jul 17 UTC

michalmuskala

+180 -0

Optimise construction of maps when all keys are literals but building a literal …tuple in the loader. This benefits especially small maps since they basically consist of two tuples - one for the keys and one for the values. This change should benefit especially Elixir, because of the record replacement - structs are always built with all literal keys. The change effectively means significantly reducing memory use (almost by a half) of programs using maps to replace records.

josevalim · May 6, 2022, 5:31am

Thank you @al2o3cr!

In particular, we are looking for any function or library that makes heavy use of structs, especially under a loop.

So for example, if you have code or library that is traversing a large list and modifying a struct as it goes, that would be very helpful.

mat-hek · May 6, 2022, 7:56am

In Membrane, we use nested structs and (usually small) maps for element state, which is basically a GenServer state. There are rather no heavy loops over it, but we figured out that access to structs impacts performance significantly. So we replaced Elixir’s Access by macros that construct pattern matches in the compile time. That turned out to provide significant speedup in some of our benchmarks. Not sure if that’s what you’re looking for, but I guess speeding up structs/maps themselves could give Membrane another boost.

stefanluptak · May 6, 2022, 8:20am

I think rebuilding Commanded’s aggregate from a lot of events (when there’s no snapshot) could be a good example of that.
We’re doing that in our company too. I can provide you with some test code and data or maybe even a Benchee script to run, if you want.

josevalim · May 6, 2022, 8:58am

@stefanluptak that would be perfect! An Elixir script with Mix.install at the top that installs the required depsand then proceeds to aggregate would be perfect!

@mat-hek if that can be isolated, that would be perfect too! Especially knowing the code was a bottleneck in the past.

stefanluptak · May 6, 2022, 9:01am

OK, I will do my best. But since it’s Friday already, I will send you that on Monday.

Adzz · May 7, 2022, 8:41pm

Oh!

Here’s a thing that might help. I’ve been (experimenting) with XML parsing. One iteration I did was creating a Saxy handler that parses the XML into a struct, something like the following,

Given this xml:

<ParentNode price="1">
  <MyNode>Stuff</MyNode>
  <AnotherNode>
    <Child>Stuff</Child>
  </AnotherNode>
</ParentNode>

The Saxy handler returns:

 %XMLNode{
   attributes: [%XMLAttr{name: "price", value: "1"}],
   content: [
     %XMLNode{attributes: [], content: ["Stuff"], name: "MyNode"},
     %XMLNode{
       attributes: [],
       content: [%XMLNode{attributes: [], content: ["Stuff"], name: "Child"}],
       name: "AnotherNode"
     }
   ],
   name: "ParentNode"
 }

THEN we have a library called data_schema that queries into that representation to create a different struct from it. It does that by reducing through a schema and querying into the result from Saxy, so if you have large XML you’ll be doing lots and lots of Map.gets into the result of the Saxy handler to get the values for your new struct.

I have created a branch on the data_schema repo called xpath_experiment that has the Saxy handler and a data accessor that you can use to parse XML.
See this example from the tests:

Doing this will result in a lot of Map.fetch into a struct - 1 per node and attr.

If you think this will be useful to you I can get together a large XML (~15mb) example to use, along with schemas for that.

josevalim · May 7, 2022, 10:03pm

Yes, I would love to use this benchmark! However, note that we won’t be able to optimize Map.fetch but we should be able to optimize pattern matching:

case map do
  %{^key => value} -> ...
  %{} -> ...
end

Or map.field.

Adzz · May 8, 2022, 11:50am

Nice, yea no problem it is trivial to turn the Map.fetch! into a map.field

I will get something together today for you.

Adzz · May 9, 2022, 11:56am

Sorry for the delay, took a bit longer to get the schemas in order than I expected.

Here is a benchark that you can run with mix run bench.exs at the root of the project. It will parse a large XML in the manner I mentioned.

Let me know if you can’t access it for whatever reason.

Adz

stefanluptak · May 10, 2022, 11:49am

Sorry, it’s a bit later…

I realized that the core what we’re doing is working with a tree which is inspired by this GitHub - japplegame/avl_tree: Pure Elixir AVL tree implementation

But that one above is using tuples, so I changed it to use structs. Here’s the fork: GitHub - stefanluptak/avl_tree at use-structs-instead-of-tuples

There’s even a benchmark script already, so you can just run mix run bench/run.exs Just do not forget to use that particular branch on my fork.

I hope it will be helpful somehow.

mat-hek · May 18, 2022, 9:23am

It’s been some time, but the benchmark we had wasn’t straightforward to run, so I made another one: membrane_benchmarks/core_beamchmark at master · membraneframework-labs/membrane_benchmarks · GitHub

Hope it’s useful