Module with too many functions exhausts Erlang atoms

Background

We are currently testing a module with 10 million functions. This is an automatically generated module that we (pesky humans) can’t touch. Ever.

Problem

Upon compiling said module, the BEAM blows up, saying that we have gone over the atom limit for Erlang. This is surprising. Following is the code sample used to generate the automated module (simplified here), which will cause you the same problems:

defmodule PocManyClauses do
  @list (1..10000000) |> Enum.map(fn n -> :"fn_#{n}" end)

  Enum.map(@list, fn n ->
    def unquote(n)(), do: unquote(n)
  end)
end

Questions

Are function names considered atoms in Erlang?
Or is the example we are using malformed?

This line here (the @list module attribute) clouds what you’re trying to prove: it first generates a list of 10 million atoms, which will blow out the atom table long before you actually try to define any functions.

iex(1)> defmodule PocManyClauses do
...(1)>   @list (1..10000000) |> Enum.map(fn n -> :"fn_#{n}" end)
...(1)> end
no more index entries in atom_tab (max=1048576)

Crash dump is being written to: erl_crash.dump...done

Regardless, it still probably isn’t possible to generate a function with 10 million clauses even if you could get past the atom thing. In Absinthe we generate a LOT of functions, and compile time starts becoming an issue at orders of magnitude below 10 million.
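
If you want to watch the atom table while experimenting, the VM exposes counters for it (a small aside, not specific to your module; available on reasonably recent OTP releases):

# Atoms currently registered in the table
:erlang.system_info(:atom_count)

# Configured maximum: 1_048_576 by default, or whatever you pass via +t
:erlang.system_info(:atom_limit)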

9 Likes

Ahh, this is crucial. How long does it take to compile? 2 hours? 3 hours? Maybe a day?

If all goes well, we will only need to compile this file once and then use it as-is forever. What is your longest compile time and how many functions do you think you have? (An estimate would be great, nothing precise is needed.)

The default limit for atoms is 1,048,576 (see here). You could increase it by providing the +t n option, e.g. iex --erl "+t 20000000".

However, 10 million functions seems quite extreme, so I’m not sure if you’ll bump into some other problems. Perhaps considering other options, such as multi-clause functions or statically generated maps, would work better. In any case, I’m curious to read more about your experiences with it.

5 Likes

Absinthe experienced non-linear compilation time that made having a huge number of functions impossible. However, that version of Absinthe also had a lot more going on in the module body than just defining functions, and this seemed to have a large effect on compilation time.

Doing:

:observer.start()

defmodule PocManyClauses do
  @list 1..10_000_000 |> Enum.map(fn n -> :"fn_#{n}" end)

  Enum.map(@list, fn n ->
    def unquote(n)(), do: unquote(n)
  end)
end

and then compiling via elixir --erl "+t 20000000" huge.ex indicates that you will also run into a memory problem.

Notably, compilation does not seem to be generating atoms, as the atom count remains steady.

In any case, generating hard-coded modules as a lookup table has its uses, but for your situation (Complexity of a search withing a range in a ets? - #9 by benwilson512) it feels like a proper data structure or data storage engine (ETS) will be better.
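
For reference, a minimal sketch of what the ETS route could look like, assuming non-overlapping integer ranges keyed by their lower bound (all names here are made up):

defmodule RangeTable do
  # Sketch: non-overlapping integer ranges stored in an ordered_set,
  # keyed by the lower bound of each range.
  def new do
    :ets.new(:ranges, [:ordered_set, :named_table, read_concurrency: true])
  end

  def put(from, to, value) do
    :ets.insert(:ranges, {from, to, value})
  end

  # Find the range containing `n`: take the largest lower bound <= n,
  # then check the upper bound. One O(log n) tree walk per lookup.
  def lookup(n) do
    case :ets.prev(:ranges, n + 1) do
      :"$end_of_table" ->
        :error

      from ->
        case :ets.lookup(:ranges, from) do
          [{^from, to, value}] when n <= to -> {:ok, value}
          _ -> :error
        end
    end
  end
end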

3 Likes

Very accurate!

We are already running into memory problems - to “fix” this we are now compiling the code in a machine with over 30GB of RAM. So far it has stabilized around 25GB so we are waiting.

The problem is that using an interval tree will give a complexity of O(log n + m). With an ETS table we will have (best case scenario) O(log n).
But with a module with multiclause functions, we will have O(1).

We are talking about tens of thousands of requests per second, going up to hundreds of thousands during Christmas and similar events. For us, speed really does matter.

If we power through the problem of atoms and compilation time, what disadvantages would this approach have compared to a slower alternative? (such as ETS or interval trees).

Is this true? The O() behaviour of pattern matching depends on the complexity and kind of pattern. For integer literals and atom literals (which boil down to integers), it’s my understanding that the BEAM can produce O(1) lookup tables. You’re trying to generate range-check clauses though, and I’m pretty sure that the theoretical best for that is O(log(n)).

More to the point, the O here doesn’t really matter because you have a fixed-size lookup space. What you need to determine is how fast a given lookup is, and how many lookups the system can handle concurrently. If that meets your needs then the O doesn’t really matter. If it’s too slow then yeah, looking for something with a better O can be a good guide, but I’d make sure your problem space actually permits O(1).
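
To make the distinction concrete, here is roughly the difference between the two shapes (hypothetical modules, purely illustrative):

# Literal clauses: the compiler can dispatch on the integer value itself,
# which is where the O(1) jump-table intuition comes from.
defmodule LiteralDispatch do
  def lookup(1), do: :a
  def lookup(2), do: :b
  def lookup(3), do: :c
end

# Range-check clauses: each clause needs a guard test, so the theoretical
# best is O(log n) over the ranges, and the VM may simply try the clauses
# in order.
defmodule RangeDispatch do
  def lookup(n) when n >= 1 and n <= 100, do: :a
  def lookup(n) when n >= 101 and n <= 200, do: :b
  def lookup(n) when n >= 201 and n <= 300, do: :c
end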

8 Likes

Are your intervals regular? (Such that you could “round down”, then use a hash lookup?)
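
A minimal sketch of that idea, assuming (hypothetically) that every interval were exactly 256 wide:

defmodule RegularIntervals do
  # Hypothetical: if every interval were exactly 256 wide, rounding the key
  # down to its bucket start turns the range search into one hash lookup.
  @bucket 256
  @table %{0 => :range_a, 256 => :range_b, 512 => :range_c}

  def lookup(n), do: Map.fetch(@table, div(n, @bucket) * @bucket)
end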

The intervals are not regular afaik, but that would be a cool idea, I admit!

Do you really mean 10 million functions or one function with 10 million clauses?

I think it would be useful if you could provide an example more similar to what your real code would look like.

3 Likes

If you’re mapping a value to another value, you can consider building a compile-time map. Here’s a sketch:

defmodule Test do
  @squares 1..10_000_000
           |> Enum.map(&{&1, &1 * &1})
           |> Map.new()

  def square(x), do: Map.fetch!(@squares, x)
end

Testing on my machine, it takes about 2 minutes to compile this thing. Peak memory usage during compilation is about 10 GB, and the generated .beam file is about 200 MB, which is also roughly the memory overhead when the module is loaded at runtime. Invoking Test.square/1 takes around 5 microseconds (after the module has been loaded). This huge map should reside in a constant pool, so you should be able to safely access it from multiple processes without creating extra copies of it.

6 Likes

So unless you have the special case of a bounded range with known discrete values, I think you’re stuck with O(log(n)) no matter where the dispatch is done. If you do have that special case, you could build the complete hash table for all values. If not, something like a B-tree would be most efficient, but I think you’d need a NIF to do that well.

We tested two scenarios (sketched in code below):

  1. Module with 100_000 functions
  2. Module with 1 function that has 100_000 clauses
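
In code, the two shapes look roughly like this (simplified; the function names are made up):

# Scenario 1: one module, 100_000 separate functions.
# This also creates 100_000 atoms, well under the default limit.
defmodule ManyFunctions do
  for n <- 1..100_000 do
    def unquote(:"fn_#{n}")(), do: unquote(n)
  end
end

# Scenario 2: one module, one function with 100_000 clauses.
defmodule ManyClauses do
  for n <- 1..100_000 do
    def lookup(unquote(n)), do: unquote(n)
  end
end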

Since we decreased the order of magnitude, both samples worked. The second test (one function with many clauses) was faster, but the main point is that both work.

So, building on the results of the second test, we increased the order of magnitude by one (1_000_000). Instead of taking a minute or so to compile, it sets the RAM on fire and stays compiling forever.

It should have taken about 10 times longer to compile, but instead it seems to just get stuck. We are now testing the following approach: several modules, each with a function containing 100_000 clauses.

Will let you guys know how it goes.

Hi, we encountered a similar situation and the solution we went for was to define functions in dynamically generated modules (Foo_132).

The sweet spot for our use case was around 800 function clauses per module/file. At the same time we kept track of where each clause was defined, and created a central module that delegated to the dynamically generated ones, to give ourselves a sane interface. The final generated code looked something like this:

defmodule Foo do
  def foo(132), do: Foo_132.foo(132)
end
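
For completeness, a rough sketch of how the chunked modules and the delegating front module could be generated; everything below (names, the chunk size, the squares payload) is hypothetical, not our actual code:

# Split the clauses into chunks of ~800, create one module per chunk with
# Module.create/3, and build a front module that delegates to the right one.
entries = Enum.map(1..10_000, fn n -> {n, n * n} end)

delegating_clauses =
  entries
  |> Enum.chunk_every(800)
  |> Enum.with_index()
  |> Enum.flat_map(fn {chunk, i} ->
    submodule = Module.concat(Foo, "Chunk#{i}")

    clauses =
      for {key, value} <- chunk do
        quote do
          def foo(unquote(key)), do: unquote(value)
        end
      end

    # Wrap the generated defs in a block and compile the chunk module.
    Module.create(submodule, {:__block__, [], clauses}, Macro.Env.location(__ENV__))

    # One delegating clause per key, mirroring the Foo/Foo_132 pattern above.
    for {key, _value} <- chunk do
      quote do
        def foo(unquote(key)), do: unquote(submodule).foo(unquote(key))
      end
    end
  end)

Module.create(Foo, {:__block__, [], delegating_clauses}, Macro.Env.location(__ENV__))
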
4 Likes

I don’t really know how to answer this in a reasonable way, but why would modules with 10,000,000 functions be the appropriate tool for the job?

Particularly if you need special build machines just to accommodate potentially compiling the thing and keep running into limitations of the runtime environment you’re using, I would be led to believe that you need a different tool than the one that keeps breaking and failing to do what you want it to do.

Is there a different approach that you could take, or maybe you found a problem that’s just not a good fit for the VM? This sounds like a nightmare.

12 Likes

Yeah, it’s fair to ask why not a tree of anonymous functions. Two reasons I can think of: a belief that the compiler machinery will result in something that dispatches faster, or a need for fast startup times…

The compiler machinery does result in something that dispatches faster – but you are hitting its limits.

I’d go for what @bottlenecked did: use a tree with a quick dispatch mechanism. If you insist on using Erlang/Elixir for this task then that’s what’s going to work best. Have 500-1000 functions per module and you should be set.

2 Likes

I really second this.

If no “normal” data structure can satisfy your needs, maybe it’s time to investigate other tools?

This small component seems super critical from what you say. Have you looked at doing it in Rust or something else bare-metal that you could then call out to via Rustler/NIFs?
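
The Elixir side of a Rustler-backed NIF is pretty small; a hypothetical sketch, assuming a Rust crate named my_ranges exists (all names made up):

defmodule MyApp.Ranges do
  # Loads the compiled Rust NIF; the actual range lookup lives in the crate.
  use Rustler, otp_app: :my_app, crate: "my_ranges"

  # Stub that is replaced by the native function once the NIF is loaded.
  def lookup(_ip), do: :erlang.nif_error(:nif_not_loaded)
end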

1 Like

This is a good question. We are dealing with searching for IPs within given ranges, so we have a couple of options:

  1. Put the IPs into a DB and query that DB
  2. Have this information in another tool/binary and have Elixir interact with it

Solution 1 has long query times for our needs. Not only that, it also adds uncertainty (connections to the DB can fail) and an extra technology to our stack which we will have to maintain in the long run.

Solution 2 is better in that the query times can be dozens of times faster (no network traffic, no need to ask a DB engine to perform a query), but it would force us to use the Port module to communicate with the extra binary, and it would still force us to keep and update a different technology in the long run.

Having this in Elixir would offer the following benefits:

  1. faster access than previous solutions
  2. we keep the stack we are already using small and concise
  3. easy to maintain and use
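
Whichever variant we end up with, the lookup key would presumably be the IP converted to an integer; a minimal sketch of that normalization for IPv4 (module and function names are made up):

defmodule IpKey do
  import Bitwise

  # Convert a dotted-quad string into the integer key used for the range
  # lookup. Crashes on malformed input; IPv6 is out of scope for this sketch.
  def to_integer(ip) when is_binary(ip) do
    {:ok, {a, b, c, d}} = :inet.parse_ipv4_address(String.to_charlist(ip))
    (a <<< 24) ||| (b <<< 16) ||| (c <<< 8) ||| d
  end
end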

Yes, compilation is an issue, but we only need to do it once and then we just add this as a project dependency for whoever needs to use it (we have something like a private Hex for our libraries and packages, so we have some level of versioning and releases).

We are working on the assumption that several modules, each with thousands of functions, would give us faster results than using the appropriate data structure for the problem (interval trees). How do we know this is the case? We don’t; we simply assume, based on our current knowledge, that this will beat the complexity of an interval tree. We could be totally off, but we won’t know until we benchmark it.

And as I previously stated, we want to keep our technology stack small. This is a critical part of our system that we will have to maintain for years to come. Rust and NIFs look cool, but if we don’t have the personnel to keep them oiled and running we will run into issues. It is, in the end, a company decision, which I understand.

In the end, even the modules with their functions are just some kind of data structure that points to the right spot. One could probably build this by hand, or attempt the same thing in another language and see if it fares better at the task than Elixir.

That’s the key point here, IMO. Even for a smaller, more manageable number I’d advise writing benchmarks to see how different solutions fare.

I obviously don’t know your system; my assumption is that this is a small, performance-relevant component of it. If so, doing it in another language (be it via NIFs, ports, or HTTP calls microservice-style) shouldn’t be too much of a hassle. I also assume the code wouldn’t need to change too often. Rust, as an example, is very much focused on backwards compatibility, so it shouldn’t incur too much friction.

In the end it’s a trade-off, as always: would you rather have a thing in language X that works, or jump through a lot of hoops to maybe make it work on the BEAM?