Very slow compile times when frequently referencing large module attributes

Hi,

As a learning project I’m trying to write an AWS client in Elixir. I’m using the API definitions from the Ruby SDK to auto-generate Elixir source files. (I subsequently found aws-codegen, but half the fun at this point is figuring out how to do these things.)

Some of the source files I’m generating take a long time to compile: the one that prompted me to write this post took 20 minutes. Partly this is due to memory usage, which routinely reaches 9G and above, at which point the compiler swaps and may get killed. Smaller files which don’t induce swapping can still take 20-30s to compile (this is on a machine with 16G of RAM).

My issue persists if I cut back the generated code to something like this:

defmodule Aws.Dynamodb do
  @shapes %{
    "SomeInput" => %{type: "structure", required: ["Foo", "Bar"], members: %{"Foo" => %{"shape" => "AShapeName"}, "Bar" => "AnotherShapeName"}},
    "AnotherShapeName" => %{ ... },
    ... #many other shapes
  }

  def some_api_method(data) do
    @shapes["SomeInput"]
  end
 
  ... # more methods, each referencing @shapes
end

The API definition files provide a list of definitions of the inputs each API method expects. These shapes usually reference other shapes; for example, a structure shape gives the shapes of each of its members. My pathological sample has about 800 of these shapes (see gist 69e163947ef170c31388518b6616f334 on GitHub). The API methods will eventually check input data against the correct shape, make the HTTP request and then use another shape to decode the response. In the gist, however, all they do is reference @shapes.

If I delete all the methods that reference @shapes then the file compiles in under a second. For every method I add back, compilation gets slower: with 5 methods it takes 3 seconds, with 20 it takes about 14s. At some number of methods swapping kicks in and the compile times explode.

However, if I have a single get_shapes method that just returns @shapes, and all the other methods use get_shapes instead of @shapes directly, then the file compiles in 1s. I also noted that the generated .beam file is a lot smaller.

http://elixir-lang.org/getting-started/module-attributes.html#as-constants says module attributes are used as constants, and indeed that’s what I’m trying to do. It also says

Notice that reading an attribute inside a function takes a snapshot of its current value

Does this mean that each time I declare a function that references the attribute, Elixir is actually creating / storing a new copy of the attribute? This would explain the size difference in the .beam file when funnelling access through get_shapes, although I don’t understand why it makes compilation so much slower. Am I horribly misusing module attributes?

Thanks,

Fred


At a very basic level you can imagine module attributes as a “search’n’replace”, so your

  def some_api_method(data) do
    @shapes["SomeInput"]
  end

reads as

  def some_api_method(data) do
    %{
      "SomeInput" => %{type: "structure", required: ["Foo", "Bar"], members: %{"Foo" => %{"shape" => "AShapeName"}, "Bar" => "AnotherShapeName"}},
      "AnotherShapeName" => %{ ... },
      ... #many other shapes
    }["SomeInput"]
  end

So depending on the actual size of your attribute, you can see for yourself where the huge impact on memory usage comes from.


Edit: This is a very drastic simplification, but it should be valid for this case!


Indeed, and instead of using the @shapes attribute everywhere it might be better to put it in a function (even if that function itself uses @shapes) and then use that function call everywhere instead. That way it is only compiled in once and stored in memory only once, which should be significantly faster to compile (and, I’d imagine, faster in execution too, since the data doesn’t need to be loaded/reloaded all the time).
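A minimal sketch of that approach, assuming a tiny stand-in for the generated map (the module name ShapesDemo, the second method and the shape contents are hypothetical, only get_shapes and some_api_method come from the thread):

```elixir
defmodule ShapesDemo do
  # Hypothetical, much smaller stand-in for the generated shapes map.
  @shapes %{
    "SomeInput" => %{type: "structure", required: ["Foo", "Bar"]},
    "AnotherShapeName" => %{type: "string"}
  }

  # The only function that reads @shapes, so the map literal is
  # expanded into the module body a single time.
  def get_shapes, do: @shapes

  # Every other function goes through get_shapes/0 instead of @shapes.
  def some_api_method(_data), do: get_shapes()["SomeInput"]
  def another_api_method(_data), do: get_shapes()["AnotherShapeName"]
end
```

With this layout, adding more API methods adds only a function call per method rather than another copy of the map literal.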


Yes, that’s what my experiments showed. Doing a search & replace of @shapes with the actual map literal yielded results similar to the original, very slow to compile file, while wrapping access to @shapes in one function makes everything fast.

I guess it’s just a bit of a gotcha if you use them exactly as you would a constant in languages such as Ruby.

Thanks both of you for your input!


Also keep in mind that large binaries (whether in attributes or as literals) are slow to compile due to a compiler bug, which will be fixed in Erlang 20.

I have also changed the guides to make it clear that every time an attribute is read inside a function, a snapshot of its current value is taken.
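The snapshot behaviour can be seen in a small module. This is only an illustrative sketch; SnapshotDemo and its function names are made up for the example:

```elixir
defmodule SnapshotDemo do
  @value 1
  # Reads the attribute while it is 1; that value is baked into first/0.
  def first, do: @value

  # Re-setting the attribute does not affect functions defined above.
  @value 2
  def second, do: @value
end

SnapshotDemo.first()  # returns 1
SnapshotDemo.second() # returns 2
```

Each read copies the attribute's value at that point into the function body, which is why many reads of a large attribute multiply the amount of code the compiler has to process.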

Great discussion.


Brilliant - Thanks a lot.