tme_317

Very slow compiles loading large data as module attributes for escript

I have an escript CLI application where I am processing large lists (in the millions) of street addresses and matching them against government databases of valid zip codes, city names, counties, and states. To get great matching performance I’ve converted these databases into maps such as:

%{"47401" => ["BLOOMINGTON,IN", "WOODBRIDGE,IN"], "47402" => ["BLOOMINGTON,IN"], ...

and

%{"BLOOMINGTON,IN" => %{county: "MONROE", fips: "18105", lat: "39.165325", long: "-86.5263857"}, ...

Then saved the maps as .etf files in the /priv directory.
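
For readers unfamiliar with .etf files, here is a minimal sketch of the round trip, assuming the data is an ordinary Elixir map. The temp-file path and the sample entries are stand-ins, not the real project files.

```elixir
# Sample data in the same shape as the zip map shown above
zip_cities = %{
  "47401" => ["BLOOMINGTON,IN", "WOODBRIDGE,IN"],
  "47402" => ["BLOOMINGTON,IN"]
}

# Hypothetical output path; the thread keeps the real files under priv/
path = Path.join(System.tmp_dir!(), "zip_cities_example.etf")

# Serialize the map to Erlang External Term Format and write it to disk
File.write!(path, :erlang.term_to_binary(zip_cities))

# Read the file back and decode it into the original term
restored = path |> File.read!() |> :erlang.binary_to_term()

IO.inspect(restored == zip_cities, label: "round trip ok")
```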

When processing the millions of addresses I run a lot of code such as Map.get(zip_city_map(), "47401") to validate that the city name matches. It’s very fast and works perfectly!

I don’t mind loading them into memory for good runtime performance, at the expense of compile-time performance. Also, this is a command-line escript app, which can’t read /priv at runtime, so I think I have to load the lookup tables into compiled .beam files like this:

defmodule ZipCityData do
  @external_resource Path.join(__DIR__, "../../../priv/zip_cities.etf")
  @external_resource Path.join(__DIR__, "../../../priv/city_states.etf")
  @external_resource Path.join(__DIR__, "../../../priv/gnis_civil_pop.etf")

  # 725kb file
  @zip_cities File.read!(Path.join(__DIR__, "../../../priv/zip_cities.etf")) |> :erlang.binary_to_term()
  def zip_city_map(), do: @zip_cities

  # 308kb file
  @city_states File.read!(Path.join(__DIR__, "../../../priv/city_states.etf")) |> :erlang.binary_to_term()
  def usps_city_state_map(), do: @city_states

  # 8.4mb file
  @gnis_civil_pop File.read!(Path.join(__DIR__, "../../../priv/gnis_civil_pop.etf")) |> :erlang.binary_to_term()
  def gnis_city_state_map(), do: @gnis_civil_pop
end

The problem is that my compiles became very slow (from 2 secs to 30 secs). This would be no big deal if it only happened when I changed the ZipCityData module (very rarely), but it happens on every recompile, no matter which unrelated module in my project I edit. I’ve searched around and can’t find a better way to do it that works with compiled escripts.

Most Liked

LostKobrakai

If the data doesn’t have interdependencies, then I’d suggest using multiple modules (each in its own file). This way you can leverage the Elixir compiler, which can compile files in parallel. You could still have one central module that delegates to the actual implementations in the modules with the data compiled in.
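
A minimal sketch of that layout, assuming hypothetical submodule names (ZipCityData.ZipCities, ZipCityData.CityStates) and small inline maps standing in for the .etf data. In a real project each module would live in its own file so the files can be compiled in parallel.

```elixir
# Would live in its own file, e.g. lib/zip_city_data/zip_cities.ex (hypothetical path)
defmodule ZipCityData.ZipCities do
  # Stand-in data; the real module would load priv/zip_cities.etf at compile time
  @data %{"47401" => ["BLOOMINGTON,IN", "WOODBRIDGE,IN"], "47402" => ["BLOOMINGTON,IN"]}
  def map, do: @data
end

# Would live in its own file, e.g. lib/zip_city_data/city_states.ex (hypothetical path)
defmodule ZipCityData.CityStates do
  # Stand-in data; the real module would load priv/city_states.etf at compile time
  @data %{"BLOOMINGTON,IN" => %{county: "MONROE", fips: "18105"}}
  def map, do: @data
end

# Central module: keeps the original public function names and only delegates
defmodule ZipCityData do
  defdelegate zip_city_map(), to: ZipCityData.ZipCities, as: :map
  defdelegate usps_city_state_map(), to: ZipCityData.CityStates, as: :map
end

IO.inspect(Map.get(ZipCityData.zip_city_map(), "47401"))
```

Callers keep using ZipCityData.zip_city_map() as before, so the split stays invisible to the rest of the code.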

Another step I’d consider is not compiling the data into the modules as a blob but, if possible, compiling it into multiple functions with different function heads, which could lessen the runtime cost of iterating over big chunks of data again and again. You can look at ex_cldr for inspiration, which does exactly that with the CLDR database.

E.g. for your zips, compile into functions like this:

@external_resource Path.join(__DIR__, "../../../priv/zip_cities.etf")
zip_cities = File.read!(Path.join(__DIR__, "../../../priv/zip_cities.etf")) |> :erlang.binary_to_term()
for {zip, city} <- zip_cities do
  def zip_city(unquote(zip)), do: unquote(city)
end
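
The same idea can be sketched for the city map, whose values are maps rather than lists. This is only an illustration under assumptions: the module name CityLookup and the inline sample data are hypothetical, Macro.escape/1 is used to turn each map value into a valid quoted expression, and a catch-all clause is added so unknown keys return nil the way Map.get/2 does.

```elixir
defmodule CityLookup do
  # Stand-in for the decoded city_states.etf contents
  city_states = %{
    "BLOOMINGTON,IN" => %{county: "MONROE", fips: "18105"},
    "WOODBRIDGE,IN" => %{county: "MONROE", fips: "18105"}
  }

  # One function clause per key, generated at compile time
  for {city, info} <- city_states do
    def city_info(unquote(city)), do: unquote(Macro.escape(info))
  end

  # Fallback for keys that are not in the data
  def city_info(_city), do: nil
end

IO.inspect(CityLookup.city_info("BLOOMINGTON,IN"))
IO.inspect(CityLookup.city_info("NOWHERE,XX"))
```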
NobbZ

As far as I remember, mix escript.build has to strip the modules and compress them into the binary every time. Depending on the on-disk size of the module files, this can of course take a long time.

As a rule of thumb, building the escript will always take longer than tarring the _build/$env/lib/*/ebin folders.

tme_317

Thanks for the advice… very good idea to put each data file in a different module to leverage the parallel compiler!

I’ve seen threads on here about generating thousands of functions in the same module, which I thought were interesting, but I’ve never benchmarked that approach. If I get some time I can compare it to my Map.get-based approach. One of my maps has ~44,000 keys (zip codes in the US) and another has ~180,000 keys (unique city/state pairs in the US), and the Map.get performance is still quite good.
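
A rough sketch of how such a comparison could be set up, assuming synthetic data (2,000 made-up five-digit keys, not real zip codes) and a hypothetical module name LookupBench. It uses :timer.tc/1 for timing, which is crude; the numbers depend on the machine and data size, so treat the output as a starting point rather than a result.

```elixir
defmodule LookupBench do
  # Synthetic stand-in entries: {"00001", 1}, {"00002", 2}, ...
  entries =
    for n <- 1..2_000 do
      {String.pad_leading(Integer.to_string(n), 5, "0"), n}
    end

  @map Map.new(entries)

  # Approach 1: a single map stored as a module attribute
  def via_map(key), do: Map.get(@map, key)

  # Approach 2: one generated function clause per key
  for {key, value} <- entries do
    def via_clause(unquote(key)), do: unquote(value)
  end

  def via_clause(_key), do: nil

  def keys, do: Map.keys(@map)

  # Looks up every key `rounds` times and returns the elapsed microseconds
  def time(fun, rounds \\ 100) do
    keys = keys()

    {micros, _results} =
      :timer.tc(fn ->
        for _ <- 1..rounds, key <- keys, do: fun.(key)
      end)

    micros
  end
end

IO.puts("Map.get:          #{LookupBench.time(&LookupBench.via_map/1)} microseconds")
IO.puts("function clauses: #{LookupBench.time(&LookupBench.via_clause/1)} microseconds")
```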

I solved my initial problem by extracting ZipCityData into a separate umbrella app/library and including it as a dependency. It rarely changes, so my IEx recompiles and tests are super fast again when I’m not changing that module. Of course mix escript.build is still slow, but that’s OK as I don’t release new versions very often.
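
For anyone wanting to try the same split, the CLI app’s mix.exs might look roughly like this. It is a trimmed sketch under assumptions: the app names (:address_cli, :zip_city_data) and the AddressCli.CLI entry module are hypothetical, and the usual umbrella path settings are left out.

```elixir
# apps/address_cli/mix.exs (hypothetical app name and location)
defmodule AddressCli.MixProject do
  use Mix.Project

  def project do
    [
      app: :address_cli,
      version: "0.1.0",
      deps: deps(),
      # main_module names the module that defines main/1 for the escript
      escript: [main_module: AddressCli.CLI]
    ]
  end

  defp deps do
    [
      # Sibling app in the same umbrella that holds the compiled lookup data.
      # For a standalone library, a path dependency such as
      # {:zip_city_data, path: "../zip_city_data"} would play the same role.
      {:zip_city_data, in_umbrella: true}
    ]
  end
end
```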
