Eager vs Lazy Loading? (Elixir getting started guide)

In the Enumerable chapter of the Elixir getting started guide there is a section called Lazy vs Eager. However, it never explains what Lazy or Eager means in the Elixir context.

There is an external mini-explanation on educative.io:

Enum, being eager, produces a whole list of numbers after each operation in the script until the result is reached. Conversely, Stream, being lazy, creates a stream that represents a function without executing it straight away.

A better explanation with the trade-offs, common use cases, and links to further resources would be nice to include in the guide.


I don’t know about links, and I am not a computer scientist by education, but as a seasoned programmer I can offer the following explanations.

— Eager loading —

Everything gets loaded in memory right away. Example 1:

"~/Downloads/countries.csv"
|> File.read!()
|> CSV.parse_string()
|> do_stuff_with_records()

File.read! loads the entire file into a single binary (string) and then passes it to CSV.parse_string which also works with an entire binary. That’s eager loading – everything is there to be used in one go.
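To make "everything in one go" concrete, here is a tiny self-contained sketch (the file path and contents are made up for the demo): File.read! always returns the whole file as a single binary, no matter its size.

```elixir
# Hypothetical demo: write a tiny file, then read it back eagerly.
path = Path.join(System.tmp_dir!(), "eager_demo.csv")
File.write!(path, "name,code\nFrance,FR\n")

# File.read! hands you the ENTIRE file as one binary, all at once.
contents = File.read!(path)

File.rm!(path)

is_binary(contents)       # true — a single 20-byte binary in memory
```

With a 10 GB file, that single binary would be 10 GB of memory.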

Example 2:

list_of_1000_elements
|> Enum.map(&(&1 * 3))
|> Enum.map(&div(&1, 7))
|> Enum.filter(&(rem(&1, 2) == 0))

Here you don’t work with just one list of 1000 elements; you work with four lists in total, since each Enum.map or Enum.filter produces a new list that is also held entirely in memory. The memory for all four lists stays allocated until they are garbage collected.
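You can see those intermediate lists for yourself by sticking IO.inspect between the steps (shown on a 5-element list instead of 1000 for brevity; the numbers are just illustrative):

```elixir
# Each Enum step returns a brand-new, fully realized list.
final =
  [1, 2, 3, 4, 5]
  |> Enum.map(&(&1 * 3))
  |> IO.inspect(label: "after first map")    # [3, 6, 9, 12, 15]
  |> Enum.map(&div(&1, 7))
  |> IO.inspect(label: "after second map")   # [0, 0, 1, 1, 2]
  |> Enum.filter(&(rem(&1, 2) == 0))         # [0, 0, 2]
```

Every labeled line prints a complete list, proving each stage materialized fully before the next one started.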

Something like 95% of the time you don’t care and it’s fine. But every now and then this is an awful idea, because you don’t know beforehand how many records you will have to process. Which brings us to…

— Lazy loading —

The data you work with gets loaded into memory in chunks / batches. You never load the entire thing into memory at once.

Let’s take Example 1 from above and turn it into lazy-loading code.

"~/Downloads/countries.csv"
|> File.stream!()
|> CSV.parse_stream()
|> do_stuff_with_records()

Notice how we replaced File.read! with File.stream! and CSV.parse_string with CSV.parse_stream. You should read up on these functions, but basically they operate on an Enumerable that lets them pull data on demand (in batches). OK, maybe not the best example, because you have to defer to the documentation of an external library, so let’s go to Example 2:
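If you want a dependency-free version of the same idea, here is a sketch that uses String.split as a stand-in for a real CSV parser (the file path and contents are invented for the demo):

```elixir
# Hypothetical demo: create a small CSV file, then process it lazily.
path = Path.join(System.tmp_dir!(), "countries_demo.csv")
File.write!(path, "name,code\nFrance,FR\nJapan,JP\n")

rows =
  path
  |> File.stream!()                      # lazy: yields one line at a time
  |> Stream.map(&String.trim_trailing/1) # strip the trailing newline
  |> Stream.map(&String.split(&1, ","))  # naive CSV split, demo only
  |> Stream.drop(1)                      # skip the header row
  |> Enum.to_list()                      # only here is the file read

File.rm!(path)

rows                                     # [["France", "FR"], ["Japan", "JP"]]
```

Until Enum.to_list runs, nothing has touched the disk; each earlier step only describes a transformation.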

list_of_1000_elements
|> Stream.map(&(&1 * 3))
|> Stream.map(&div(&1, 7))
|> Stream.filter(&(rem(&1, 2) == 0))
|> Enum.to_list()

The Stream functions are usually identical in name and behavior to their Enum counterparts; they just never operate on the entire list at once. In this case you only work with two lists in total: the original one and the resulting one, which is produced by feeding the stream to Enum.to_list. (NOTE: you can merge the last two steps into a single Enum.filter(&(rem(&1, 2) == 0)) call with the same effect, but I opted for slightly longer code for illustrative purposes.)
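A quick way to convince yourself that a Stream is only a recipe: put a side effect inside the mapper and watch that nothing runs until an Enum function forces the pipeline. This small demo logs each map call into the current process mailbox:

```elixir
stream =
  1..3
  |> Stream.map(fn x ->
    send(self(), {:mapped, x})   # observable side effect per element
    x * 10
  end)

# Building the stream executed NO mapper calls yet:
{:message_queue_len, queued_before} =
  :erlang.process_info(self(), :message_queue_len)
# queued_before is 0

result = Enum.to_list(stream)    # forces the computation
# result is [10, 20, 30]

# Only now are the three messages in the mailbox:
{:message_queue_len, queued_after} =
  :erlang.process_info(self(), :message_queue_len)
# queued_after is 3
```

This is the essence of laziness: defining the pipeline and running it are two separate moments.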

The very good thing about this approach is that the original collection doesn’t even have to be fully loaded into memory in the first place. Example 3, which is much closer to real-life scenarios:

{:ok, list_of_results} =
  Repo.transaction(
    fn ->
      an_ecto_query
      |> Repo.stream(max_rows: 1000)
      |> Stream.map(...)
      |> Stream.filter(...)
      |> Stream.flat_map(...)
      # etc. processing steps for each record
      |> Enum.to_list()
    end,
    timeout: :infinity
  )

I and many others have successfully used code like the above to process tens of millions of DB records, while the code never holds more than 1000 of them in memory at the same time.
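The extreme case of "the source never fits in memory" is an infinite stream, which makes a nice stand-in for the database example above without needing Ecto (a minimal sketch, not the Repo.stream API itself):

```elixir
# An eager Enum pipeline could never materialize this source;
# the lazy one only computes the elements that are demanded.
first_even_squares =
  Stream.iterate(0, &(&1 + 1))        # 0, 1, 2, 3, … forever
  |> Stream.map(&(&1 * &1))           # squares: 0, 1, 4, 9, 16, …
  |> Stream.filter(&(rem(&1, 2) == 0))
  |> Enum.take(3)                     # stop after 3 results

first_even_squares                    # [0, 4, 16]
```

Repo.stream plays the same role as Stream.iterate here: an on-demand source, except its batches come from the database 1000 rows at a time.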


Now, this is not super formal or strictly adhering to the scientific definitions, surely, but it’s more of an answer to the question: “What does eager / lazy loading mean when programming [in Elixir]?”.

TL;DR – it’s usually a protection from bursty memory loads. Note that a Stream pipeline can be slower than the equivalent Enum one if you go too micro (on my machines I never use Stream unless I have to operate on more than 3000-4000 records at a time).


Your description is great. Can you please submit a PR to add it to the guides?


Swamped with work, and this post was a bit of anxious procrastination, admittedly. :icon_redface:

I promise I’ll find a time slot in the next several days and will PR this – do you mind references to external libraries, or are you OK with them?


Apologies for the confusion, your description was great. However, I was eyeing @miguelszerman’s summary, because it is small and therefore a perfect fit for an introductory guide. :slight_smile: Good news is that it is less work on your plate!


Hahaha. I got greatness-blocked! :003:
