Reusing compile-time code

Background

I have to read a CSV file, and currently this happens at compile time because the pipeline is evaluated inside a module attribute:

# Imagine this csv file has 3 columns "sport, country, league"
@csv_sports_data 
  :my_app
  |> :code.priv_dir()
  |> Path.join("awesome_csv.csv")
  |> File.stream!()
  |> CSV.decode!(headers: false, separator: ?;)
  |> Stream.map(&List.to_tuple/1)
  |> Enum.uniq()

So now, because this runs at compile time (iirc), I have a module attribute holding the data I need in tuple format. So far so good.

Problem

The problem comes when I need to do the same thing, multiple times, with small variations:

# The duplication, IT BURNS !!!

# Imagine this csv file has 3 columns "sport, country, league"
@csv_sports_data 
  :my_app
  |> :code.priv_dir()
  |> Path.join("awesome_csv.csv")
  |> File.stream!()
  |> CSV.decode!(headers: false, separator: ?;)
  |> Stream.map(&List.to_tuple/1)
  |> Enum.uniq()

@sports 
    :my_app
    |> :code.priv_dir()
    |> Path.join("awesome_csv.csv")
    |> File.stream!()
    |> CSV.decode!(headers: false, separator: ?;)
    |> Stream.map(&List.to_tuple/1)
    |> Stream.uniq()
    # Always trim data from pesky users!
    |> Stream.map(
      fn {sport, country, league} ->
        {String.trim(sport), String.trim(country), String.trim(league)} 
      end)
    |> Stream.map(fn {sport, _country, _league} -> sport end)
    #No empty sports!
    |> Enum.filter(fn sport -> sport != "" end) 

  @countries 
    :my_app
    |> :code.priv_dir()
    |> Path.join("awesome_csv.csv")
    |> File.stream!()
    |> CSV.decode!(headers: false, separator: ?;)
    |> Stream.map(&List.to_tuple/1)
    |> Stream.uniq()
    # Always trim data from pesky users!
    |> Stream.map(
      fn {sport, country, league} ->
         # Always trim data from pesky users!
        {String.trim(sport), String.trim(country), String.trim(league)} 
      end)
    |> Stream.map(fn {_sport, country, _league} -> country end) 
    # We allow empty countries to make the example interesting

As you can see, I have a lot of duplicated code. At the very least I could place

:my_app
    |> :code.priv_dir()
    |> Path.join("awesome_csv.csv")
    |> File.stream!()
    |> CSV.decode!(headers: false, separator: ?;)
    |> Stream.map(&List.to_tuple/1)
    |> Stream.uniq()

into a function or variable and then re-use it in @sports and @countries. The trimming function is another candidate. And then there are the small differences between @sports and @countries, where I select only the values I want.

Things I tried

So, my first try was to use the @csv_sports_data inside the @sports and @countries attributes. Obviously this didn’t work, as I can’t use something that was not yet compiled into an attribute that is itself being generated at compile time.

# This won't work
@sports 
   @csv_sports_data
    # Always trim data from pesky users!
    |> Stream.map(
      fn {sport, country, league} ->
        {String.trim(sport), String.trim(country), String.trim(league)} 
      end)
    |> Stream.map(fn {sport, _country, _league} -> sport end)
    #No empty sports!
    |> Enum.filter(fn sport -> sport != "" end) 

My second try was to consider macros. As I understand it, I could create a macro that reads the CSV file at compile time and then have @sports and @countries use it. However, I personally am a believer in the saying:

“The first rule about Macros - don’t use Macros”

And I feel the usage of a Macro for this specific situation would be quite overkill. So I would like to avoid it.

And then there is also the trim function:

Stream.map(
      fn {sport, country, league} ->
         # Always trim data from pesky users!
        {String.trim(sport), String.trim(country), String.trim(league)} 
      end)

Which I cannot place inside a def or defp for the sake of reuse.

What now?

Surely I am missing something. Perhaps the solution I was given to work with the CSV is flawed, or perhaps I am forgetting some mechanism that would reduce the amount of duplicated code I have.

  • How can I remove all the duplication?

You can operate on values (not module attributes) in a module body, do some computations and only then assign them to attributes:

csv_sports_data = :my_app
  |> :code.priv_dir()
  |> Path.join("awesome_csv.csv")
  |> File.stream!()
  |> CSV.decode!(headers: false, separator: ?;)
  |> Stream.map(&List.to_tuple/1)
  |> Stream.uniq()
  |> Enum.map(fn {sport, country, league} ->
    {String.trim(sport), String.trim(country), String.trim(league)}
  end)

  @csv_sports_data csv_sports_data

And re-use the already computed value to declare other module attributes:

  @sports csv_sports_data
  |> Enum.map(fn {sport, _country, _league} -> sport end)
  |> Enum.filter(fn sport -> sport != "" end)

  @countries csv_sports_data
  |> Enum.map(fn {_sport, country, _league} -> country end)
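
To try the idea without the CSV file or the CSV library, here is a minimal sketch. The module name SportsData and the inline `raw` string are assumptions made only for this illustration; they stand in for the file in priv. The point is that `rows` is a plain variable evaluated once while the module compiles, and each attribute receives its value on the same line as its name.

```elixir
defmodule SportsData do
  # Hypothetical inline data standing in for priv/awesome_csv.csv,
  # so the sketch runs without a file or the CSV library.
  raw = """
  football ;England;Premier League
  football;Spain ;La Liga
  ;Spain;Unknown
  """

  # A plain variable in the module body: evaluated once, during compilation.
  rows =
    raw
    |> String.split("\n", trim: true)
    |> Enum.map(fn line ->
      [sport, country, league] = String.split(line, ";")
      {String.trim(sport), String.trim(country), String.trim(league)}
    end)
    |> Enum.uniq()

  # Each attribute is derived from the already computed variable.
  @rows rows

  @sports rows
          |> Enum.map(fn {sport, _country, _league} -> sport end)
          |> Enum.filter(fn sport -> sport != "" end)

  @countries Enum.map(rows, fn {_sport, country, _league} -> country end)

  def rows, do: @rows
  def sports, do: @sports
  def countries, do: @countries
end
```

Calling SportsData.sports() at runtime just returns the list that was computed during compilation.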

You can also move the logic to another module. It becomes a compile-time dependency, so it gets compiled earlier, and there you can split your logic into functions however you like. For example:

defmodule SportsCsvReader do
  def read_sports_data() do
    :my_app
    |> :code.priv_dir()
    |> Path.join("awesome_csv.csv")
    |> File.stream!()
    |> CSV.decode!(headers: false, separator: ?;)
    |> Stream.map(&List.to_tuple/1)
    |> Stream.uniq()
    |> Enum.map(fn {sport, country, league} ->
      {String.trim(sport), String.trim(country), String.trim(league)}
    end)
  end

  def extract_sports(sports_data) do
    sports_data
    |> Enum.map(fn {sport, _country, _league} -> sport end)
    |> Enum.filter(fn sport -> sport != "" end)
  end

  def extract_countries(sports_data) do
    sports_data
    |> Enum.map(fn {_sport, country, _league} -> country end)
  end
end

and then you’ll be able to use it directly in your other module:

  csv_sports_data = SportsCsvReader.read_sports_data()
  @csv_sports_data csv_sports_data
  @sports SportsCsvReader.extract_sports(csv_sports_data)
  @countries SportsCsvReader.extract_countries(csv_sports_data)

edit: or re-use this logic in any other module
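
A self-contained sketch of the same split, for experimenting in a single file. CsvHelper, Leagues, parse/1 and column/2 are hypothetical names, and the raw text is passed in as a string so that no file or CSV dependency is needed. Because CsvHelper is defined first, its functions (including the private parse_line/1) are available while the body of Leagues is evaluated.

```elixir
defmodule CsvHelper do
  # Hypothetical helper: takes the raw text instead of reading a file.
  def parse(raw) do
    raw
    |> String.split("\n", trim: true)
    |> Enum.map(&parse_line/1)
    |> Enum.uniq()
  end

  # Picks one column (0-based) out of every row tuple.
  def column(rows, index), do: Enum.map(rows, &elem(&1, index))

  # Private functions are fine here: they belong to an already compiled module.
  defp parse_line(line) do
    line
    |> String.split(";")
    |> Enum.map(&String.trim/1)
    |> List.to_tuple()
  end
end

defmodule Leagues do
  # CsvHelper is compiled by now, so it can be called at compile time.
  rows = CsvHelper.parse("tennis;France;Roland Garros\nrugby ; Wales ;Six Nations\n")

  @rows rows
  @sports CsvHelper.column(rows, 0)
  @countries CsvHelper.column(rows, 1)

  def rows, do: @rows
  def sports, do: @sports
  def countries, do: @countries
end
```

In a Mix project the two modules would normally live in separate files; the compiler works out that Leagues depends on CsvHelper.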

I think what you are trying to achieve by defining attributes is something you should be doing by defining macros instead.

Using a macro to optimize things that can be calculated at compile time is a perfectly valid use for a macro.

You just need to figure out what is available at compile time and what is not, and move the rest into functions you can reuse.

In addition, if calculating countries and sports is an expensive operation, build a really simple caching system with ETS.
I don’t think you can get any faster than this :wink:
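
For reference, a really simple ETS cache could look roughly like the sketch below. SportsCache, init/0 and fetch/2 are hypothetical names, not something from this thread, and the sketch assumes init/0 is called once from a long-lived process (the table is deleted when the process that created it exits).

```elixir
defmodule SportsCache do
  @table :sports_cache

  # Creates the named table. Call once, e.g. when the application starts.
  def init do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
    :ok
  end

  # Returns the cached value for `key`, computing and storing it on a miss.
  def fetch(key, compute) when is_function(compute, 0) do
    case :ets.lookup(@table, key) do
      [{^key, value}] ->
        value

      [] ->
        value = compute.()
        :ets.insert(@table, {key, value})
        value
    end
  end
end
```

The second call for the same key never runs the function again; it only does the lookup.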

Exactly! But I would convert read_sports_data/0 into a macro.

That's not really needed. Macros would only make things more complex, as the AST does not have to be modified at any point.

Does this work for Elixir 1.5?

Currently this code is not working:

  csv_data =
    :my_app
    |> :code.priv_dir()
    |> Path.join("awesome_csv.csv")
    |> File.stream!
    |> CSV.decode!(headers: false, separator: ?;)
    |> Stream.map(&List.to_tuple/1)
    |> Stream.uniq
    # Always trim data from pesky users!
    |> Enum.map(fn {sport, country, league} ->
      {String.trim(sport), String.trim(country), String.trim(league)}
    end)

  @compiled_csv_data csv_data

  @sports
    @compiled_csv_data
    |> Stream.map(fn {sport, _country, _league} -> sport end)
    |> Enum.filter(fn sport -> sport != "" end)

  # I suck at sports, so I need to know which ones I can play!
  def hard_sport?(sport) when sport in @sports, do: false

Error

== Compilation error in file lib/my_app/csv_reader.ex ==
** (ArgumentError) invalid args for operator "in", it expects a compile-time list or compile-time range on the right side when used in guard expressions, got: nil

I believe this happens because of csv_data, which is not executed at compile time (I think).

You must remove the = here. This is a common mistake that I make all the time :smiley:

I agree with @LostKobrakai , you don’t need macros here as you are not creating code by generating AST.

You should create a helper module specialized in reading your CSVs with all the required variations and parameters.
Those functions will be available at runtime, obviously. Then, in your main module you would require this helper module, so you can also call the functions at compile time.

Yes, as mentioned by @lud you need to fix the module attribute declaration. The error message is not intuitive, though!

I believe you don’t need to require a module to use its functions at compile time :slight_smile:

Sorry, that was a mistake on my side when creating the reply, the code itself does not suffer from this ailment.
(meaning the error you see is the error I get without the =)

Is the only option here to create a helper module ?

You can use anonymous functions as well, but a module cannot use its own functions at compile time, as those functions are not available before the complete module is compiled. It’s a classic chicken-and-egg problem.
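
A small sketch of the anonymous-function route, which also covers the trimming step from the question. TrimmedRows and the hard-coded `raw_rows` are assumptions for illustration only. The function bound to `trim_row` is an ordinary value in the module body, so it can be called while the module is still compiling, unlike a def or defp of the same module.

```elixir
defmodule TrimmedRows do
  # An anonymous function bound in the module body is just a value,
  # so it is usable before the module has finished compiling.
  trim_row = fn {sport, country, league} ->
    {String.trim(sport), String.trim(country), String.trim(league)}
  end

  # Hypothetical rows standing in for the decoded CSV.
  raw_rows = [{" chess", "Norway ", "Elite"}, {"go", " Japan", " Kisei "}]

  rows = Enum.map(raw_rows, trim_row)

  @rows rows
  @sports Enum.map(rows, fn {sport, _country, _league} -> sport end)

  def rows, do: @rows
  def sports, do: @sports
end
```

Note that `trim_row` only exists during compilation; it is not callable from the module's functions at runtime.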

@Fl4m3Ph03n1x Another issue in your code is that you can’t do

@sports
  @compiled_csv_data
  |> ...

you need to do

@sports
  csv_data
  |> ...

And it turns out that in Elixir 1.5, to be able to use the @sports module attribute in guards, you also need to declare it in two steps. This works:

  sports =
    csv_data
    |> Stream.map(fn {sport, _country, _league} -> sport end)
    |> Enum.filter(fn sport -> sport != "" end)

  @sports sports

  def hard_sport?(sport) when sport in @sports, do: false
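
A runnable version of the two-step declaration, plus a possible explanation that nobody in the thread spells out, so treat it as an assumption: a call without parentheses has to have its argument start on the same line, so when the value begins on the line after `@sports`, the parser sees a read of the attribute followed by a separate expression. That would leave @sports unset and fit the `got: nil` in the error. The Playable module, the hard-coded rows, and the fallback clause of hard_sport?/1 are made up for this sketch.

```elixir
defmodule Playable do
  # Hypothetical stand-in for the trimmed CSV rows.
  csv_data = [{"darts", "England", "PDC"}, {"", "Spain", "Unknown"}]

  sports =
    csv_data
    |> Enum.map(fn {sport, _country, _league} -> sport end)
    |> Enum.filter(fn sport -> sport != "" end)

  # Name and value share a line, so this sets the attribute.
  @sports sports

  def hard_sport?(sport) when sport in @sports, do: false
  # Fallback clause added for the sketch; not part of the original code.
  def hard_sport?(_sport), do: true
end

# Parsing only, nothing is evaluated: with the value on the next line,
# `@sports` becomes its own expression (a read of the attribute).
same_line = Code.string_to_quoted!("@sports [1, 2]")
next_line = Code.string_to_quoted!("@sports\n  [1, 2]")
```

Inspecting `same_line` shows a single `@` call carrying the list as its argument, while `next_line` is a block of two expressions.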

Yes I checked, it is indeed only needed for macros.

Sorry, you are right.
It is absolutely not needed: you can get away with storing the value in an attribute at compile time and returning it from a function.

Something like this, based on Michal’s code:

defmodule SportsCsvReader do
  @sports_data :my_app
    |> :code.priv_dir()
    |> Path.join("awesome_csv.csv")
    |> File.stream!()
    |> CSV.decode!(headers: false, separator: ?;)
    |> Stream.map(&List.to_tuple/1)
    |> Stream.uniq()
    |> Enum.map(fn {sport, country, league} ->
      {String.trim(sport), String.trim(country), String.trim(league)}
    end)

  def sports_data(), do: @sports_data

  def extract_sports(sports_data) do
    sports_data
    |> Enum.map(fn {sport, _country, _league} -> sport end)
    |> Enum.filter(fn sport -> sport != "" end)
  end

  def extract_countries(sports_data) do
    sports_data
    |> Enum.map(fn {_sport, country, _league} -> country end)
  end
end

My point is that you only need to parse the CSV once, and that is at compile time.
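
One way to convince yourself that the work happens only once is the sketch below. CompiledOnce is a hypothetical module, and the tuple with System.unique_integer/1 merely stands in for the CSV pipeline: the integer is generated when the attribute is evaluated during compilation, so every later call returns the same stamp.

```elixir
defmodule CompiledOnce do
  # Hypothetical stand-in for the CSV parsing pipeline. The unique integer
  # is produced when the attribute is evaluated, i.e. during compilation.
  @sports_data {System.unique_integer([:positive]), [{"golf", "Scotland", "Open"}]}

  # The attribute's value is inlined into the function body as a literal.
  def sports_data, do: @sports_data
end

# Two runtime calls: the stamp does not change, so nothing was recomputed.
{stamp_a, rows} = CompiledOnce.sports_data()
{stamp_b, ^rows} = CompiledOnce.sports_data()
```

If the pipeline ran on every call, the two stamps would differ.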
