I have an escript CLI application where I am processing large lists (in the millions) of street addresses and matching them against government databases of valid zip codes, city names, counties, and states. To get great matching performance I’ve converted these databases into maps such as:
%{"47401" => ["BLOOMINGTON,IN", "WOODBRIDGE,IN"], "47402" => ["BLOOMINGTON,IN"], ...
and
%{"BLOOMINGTON,IN" => %{county: "MONROE", fips: "18105", lat: "39.165325", long: "-86.5263857"}
Then I saved the maps as .etf files in the /priv directory.
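For context, each .etf file is just the Erlang External Term Format dump of one of those maps, written with a one-off script roughly like this (the map contents here are abbreviated):

zip_cities = %{
  "47401" => ["BLOOMINGTON,IN", "WOODBRIDGE,IN"],
  "47402" => ["BLOOMINGTON,IN"]
}

# Serialize to Erlang External Term Format and write under priv/
File.write!("priv/zip_cities.etf", :erlang.term_to_binary(zip_cities))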
When processing the millions of addresses I run a lot of lookups such as Map.get(zip_city_map(), "47401") to validate that the city name matches. It’s very fast and works perfectly!
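The per-address check looks roughly like this (the field names and the city_matches?/1 helper are simplified for illustration):

def city_matches?(%{zip: zip, city: city, state: state}) do
  # Look up the list of "CITY,ST" strings served by this zip and check for membership
  case Map.get(zip_city_map(), zip) do
    nil -> false
    city_states -> "#{city},#{state}" in city_states
  end
end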
I don’t mind loading them into memory for good runtime performance (at the expense of compile-time performance). Also, since this is a command-line escript app (which can’t read /priv at runtime), I think I have to embed the lookup tables in the compiled .beam files like this:
defmodule ZipCityData do
  @external_resource Path.join(__DIR__, "../../../priv/zip_cities.etf")
  @external_resource Path.join(__DIR__, "../../../priv/city_states.etf")
  @external_resource Path.join(__DIR__, "../../../priv/gnis_civil_pop.etf")

  # 725kb file
  @zip_cities File.read!(Path.join(__DIR__, "../../../priv/zip_cities.etf")) |> :erlang.binary_to_term()
  def zip_city_map(), do: @zip_cities

  # 308kb file
  @city_states File.read!(Path.join(__DIR__, "../../../priv/city_states.etf")) |> :erlang.binary_to_term()
  def usps_city_state_map(), do: @city_states

  # 8.4mb file
  @gnis_civil_pop File.read!(Path.join(__DIR__, "../../../priv/gnis_civil_pop.etf")) |> :erlang.binary_to_term()
  def gnis_city_state_map(), do: @gnis_civil_pop
end
The problem is that my compiles have become very slow (from 2 seconds to 30 seconds). This would be no big deal if it only happened when I changed the ZipCityData module (which is very rare), but it happens on every recompile, no matter which unrelated module in my project I edit. I’ve searched around and can’t find a better way to do this that works with compiled escripts.