I can’t give you links, and I am not a computer scientist by education, but as a seasoned programmer I can offer you the following explanations.
— Eager loading —
Everything gets loaded in memory right away. Example 1:
# assuming something like: alias NimbleCSV.RFC4180, as: CSV
"~/Downloads/countries.csv"
|> Path.expand() # File.read!/1 does not expand ~ on its own
|> File.read!()
|> CSV.parse_string()
|> do_stuff_with_records()
File.read! loads the entire file into a single binary (string) and then passes it to CSV.parse_string, which also works with an entire binary. That’s eager loading: everything is there to be used in one go.
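If you want to see that eagerness for yourself, inspect what comes back (the path is just for illustration):
data = File.read!(Path.expand("~/Downloads/countries.csv"))
byte_size(data) # the whole file is already sitting in memory as one binary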
Example 2:
list_of_1000_elements
|> Enum.map(&(&1 * 3))
|> Enum.map(&div(&1, 7))
|> Enum.filter(&(rem(&1, 2) == 0))
Here you don’t just work with one list of 1000 elements; you work with four of them in total, since each Enum.map or Enum.filter produces a new list that’s also loaded entirely in memory. Meaning the memory for all four lists will be allocated and used until they are garbage collected.
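As an aside, if you want to stay eager but skip the intermediate lists, a for comprehension fuses the three steps into a single pass. A small sketch of the same pipeline:
for x <- list_of_1000_elements,
    y = div(x * 3, 7),
    rem(y, 2) == 0,
    do: y
Same result, but only one new list gets built.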
Something like 95% of the time you don’t care and it’s fine. But every now and then this is an awful idea, because you don’t know beforehand how many records you will have to process. Which brings us to…
— Lazy loading —
Stuff that you work with gets loaded in memory in chunks / batches. You never load the entire thing in memory.
Let’s take Example 1 from above and turn it into lazy-loading code.
"~/Downloads/countries.json"
|> File.stream!()
|> CSV.parse_stream()
|> do_stuff_with_records()
Notice how we replaced File.read! with File.stream!, and CSV.parse_string with CSV.parse_stream. You should read up on these functions, but basically they operate on an Enumerable that allows them to pull data on demand (in batches).
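Here’s a tiny self-contained way to see that pull-on-demand behavior: the mapping function only runs for the elements that are actually requested.
1..1_000_000
|> Stream.map(fn x ->
  IO.puts("producing #{x}")
  x * 2
end)
|> Enum.take(3)
# prints "producing 1" through "producing 3" and stops;
# the other 999_997 elements are never computed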
OK, maybe this is not the best example because you have to defer to the documentation of an external library, so let’s go to Example 2:
list_of_1000_elements
|> Stream.map(&(&1 * 3))
|> Stream.map(&div(&1, 7))
|> Stream.filter(&(rem(&1, 2) == 0))
|> Enum.to_list()
The Stream functions are usually identical to the ones with the same names in Enum and do the same thing, only they never operate on the entire list at once. In this case you only work with two lists in total: the original one and the resulting one, which is produced by feeding the stream to Enum.to_list. (NOTE: you can merge the last two steps into a single Enum.filter(&(rem(&1, 2) == 0)) call and it will have the same effect, but I opted for slightly longer code for illustrative purposes; the merged version is shown below.)
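For reference, this is the merged version from the note, and it behaves identically:
list_of_1000_elements
|> Stream.map(&(&1 * 3))
|> Stream.map(&div(&1, 7))
|> Enum.filter(&(rem(&1, 2) == 0))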
The very good thing about this approach is that the original list doesn’t even have to be loaded into memory at all. Example 3, and this one is much closer to real-life scenarios:
{:ok, list_of_results} =
  Repo.transaction(
    fn ->
      an_ecto_query
      |> Repo.stream(max_rows: 1000)
      |> Stream.map(...)
      |> Stream.filter(...)
      |> Stream.flat_map(...)
      # etc. processing steps for each record
      |> Enum.to_list()
    end,
    timeout: :infinity
  )
I and many others have successfully used code like the above to process tens of millions of DB records, while the code never loads more than 1000 of them at a time.
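One variation worth knowing: if you only need side effects per record and not an accumulated result, you can swap Enum.to_list for Stream.run so nothing accumulates at all. A sketch, where process_record/1 is a hypothetical stand-in for your own per-record logic:
Repo.transaction(
  fn ->
    an_ecto_query
    |> Repo.stream(max_rows: 1000)
    # process_record/1 is hypothetical; put your own per-record work here
    |> Stream.each(&process_record/1)
    |> Stream.run()
  end,
  timeout: :infinity
)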
Now, this is not super formal or strictly adhering to the scientific definitions, surely, but it’s more of an answer to the question: “What does eager / lazy loading mean when programming [in Elixir]?”
TL;DR – it’s usually a protection from bursty memory loads. And it can sometimes be slower than the equivalent Enum implementation if you go too micro (on my machines I never use Stream unless I have to operate on more than 3000-4000 records at a time).
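If you’re curious where the crossover is on your machine, a rough :timer.tc comparison is enough (the exact numbers will vary with list size and per-element work):
small = Enum.to_list(1..1_000)

{enum_us, _} =
  :timer.tc(fn ->
    small |> Enum.map(&(&1 * 3)) |> Enum.filter(&(rem(&1, 2) == 0))
  end)

{stream_us, _} =
  :timer.tc(fn ->
    small
    |> Stream.map(&(&1 * 3))
    |> Stream.filter(&(rem(&1, 2) == 0))
    |> Enum.to_list()
  end)

IO.puts("Enum: #{enum_us}µs, Stream: #{stream_us}µs")
For a list this small the Stream version will usually lose because of the per-element indirection.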