How Would You Separate Pure and Impure Code?

Hi Folks,

Part of writing “good” functional code is separating pure code from impure code. I’ve heard this many times, most recently in Tomasz Kowal’s talk at ElixirLive 2016.

Many libraries, such as Ecto, do exactly this. We feed data - i.e., a changeset, a struct, etc. - into our Repo, and it is the Repo that performs the side effects with that data.
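For concreteness, here is a minimal sketch of the Ecto split being described. The schema, field names, and MyApp.Repo are all hypothetical, and this assumes a working Ecto application - the point is only where the side effect lives:

```elixir
defmodule MyApp.Employee do
  use Ecto.Schema
  import Ecto.Changeset

  schema "employees" do
    field :name, :string
    field :employee_number, :string
  end

  # Pure: building and validating the changeset is just a data
  # transformation with no side effects.
  def changeset(employee, params) do
    employee
    |> cast(params, [:name, :employee_number])
    |> validate_required([:employee_number])
  end
end

# Impure: only this call actually touches the database.
# MyApp.Repo.insert(MyApp.Employee.changeset(%MyApp.Employee{}, params))
```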

I tried following that idea in an application I am writing, but I ended up with side-effects in different areas of my code.

For example, in one use case, my end user uploads a CSV file to the application. The requirements are for the application to parse the CSV file, filter out the improper records, and then do something with the remaining records. However, if a record is filtered out, I want the application to log that fact. Glossing over the parsing of the file, the code basically looks like this:

defmodule CSV.Parser do
  require Logger

  def process_csv_file(%{path: file}) do
    file
    # ... (parsing steps elided) ...
    |> Enum.filter(&ensure_consistent_data/1)
    # ...
    |> do_stuff_with_good_records()
  end

  def ensure_consistent_data(%{employee_number: ""} = data) do
    Logger.info "Employee lacks employee_number: #{inspect data}"
    false
  end

  def ensure_consistent_data(%{employee_type: "doctor", license_number: ""} = data) do
    Logger.info "Doctor lacks license number: #{inspect data}"
    false
  end

  def ensure_consistent_data(_), do: true
end

(This is not my actual domain, but the use case is the same.) The key point is that the ensure_consistent_data/1 function has a side effect, while the other transformations of the data (not shown) do not. So I’m mixing my pure code and my impure code.

How would you folks with more experience change this code in order to separate out the pure and impure parts? Would you even do it in a case like this?

My Naive Solution

Off the cuff, I would probably have ensure_consistent_data return a map, or some other collection, with one field being a list of errors that I would like logged and another field being a list of the “good” rows. Then, after going through all the side-effect-free functions in the pipeline, I’d take that list of errors and invoke another function, maybe in another module to make the separation clearer, to perform the side effects. Because I don’t like changing the type of the data in the middle of a pipeline, I’d probably end up not extracting the file variable from the map, and instead change all the functions in the pipeline to accept a map. This starts feeling a lot like a Plug.Conn, where our entire universe of what we need is in one data structure. I guess this goes a little into the data-modeling aspect of things (as in the shape of the data, nothing to do with the database yet).
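That idea could be sketched like this (the module and function names are my own, not from the original code): a pure function partitions the rows into good rows and error messages, and a separate module performs the logging side effect afterwards.

```elixir
defmodule CSV.Validator do
  # Pure: returns {good_rows, error_messages} and causes no side effects.
  def partition(rows) do
    {good, errors} =
      Enum.reduce(rows, {[], []}, fn row, {good, errors} ->
        case check(row) do
          :ok -> {[row | good], errors}
          {:error, msg} -> {good, [msg | errors]}
        end
      end)

    {Enum.reverse(good), Enum.reverse(errors)}
  end

  defp check(%{employee_number: ""} = data),
    do: {:error, "Employee lacks employee_number: #{inspect(data)}"}

  defp check(%{employee_type: "doctor", license_number: ""} = data),
    do: {:error, "Doctor lacks license number: #{inspect(data)}"}

  defp check(_), do: :ok
end

defmodule CSV.ErrorReporter do
  require Logger

  # Impure: the only place that touches Logger.
  def log_errors(errors), do: Enum.each(errors, &Logger.info/1)
end
```

The pipeline would then call CSV.ErrorReporter.log_errors(errors) once, after all the pure transformations have run.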


Seeing this example, I am glad that it is possible to mix pure and impure code like this.

In a world like Haskell’s, you would be forced to do the scanning and logging in some corner of your code that is totally unrelated to the actual processing of the list.

Your naive solution has a big gotcha… You are collecting your messages at a certain stage of the pipeline, during a filter operation. If the last item makes your process crash, all previously collected entries to log are lost. So you’d need to log the collected messages directly after the collecting operation, and then recreate a list of items that you can process in the next step of the pipeline.


Here is my naive observation.

Given the facilities that Elixir (Erlang) possesses, why not simply stick the logger in its own process and let it take the “responsibility and risks of logging” on its own terms? Technically, sending that message is a “side effect”, but in the Clojure community, for example, it is emphasized that the benefits of functional programming aren’t realized via monads but because functional programming allows you to “push [mutable state and complexity] to the edge of the system” (twitter).
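A minimal sketch of that idea (LogSink is a hypothetical name): a GenServer owns the logging side effect, and the processing pipeline only sends it messages.

```elixir
defmodule LogSink do
  use GenServer
  require Logger

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # From the caller's point of view this is just a message send;
  # the side effect happens inside the LogSink process.
  def log(msg), do: GenServer.cast(__MODULE__, {:log, msg})

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_cast({:log, msg}, state) do
    Logger.info(msg)
    {:noreply, state}
  end
end
```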


NO! Do not use Logger from a centralised process! Logger has some magic to get rid of unneeded calls at compile time, but when you send messages to a centralised process and then call Logger from therein, you will send that message anyway, even if that logging level would usually have been purged at compile time.
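For reference, the compile-time purging mentioned here is configured in config.exs. A sketch (the :compile_time_purge_matching option is from Elixir 1.7+; older versions used :compile_time_purge_level):

```elixir
# config/config.exs
use Mix.Config  # `import Config` on Elixir >= 1.9

config :logger,
  compile_time_purge_matching: [
    [level_lower_than: :info]
  ]

# With this setting, calls like Logger.debug("...") are removed at compile
# time. But GenServer.cast(some_log_process, {:log, msg}) is an ordinary
# message send that the compiler cannot purge, so that cost is paid
# regardless of the configured level.
```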


That’s a valid point - but you are basically highlighting that conditional compilation and logging are separate concerns that should be available and usable separately, not conflated into one single capability in a distributed system.


@NobbZ / @peerreynders

Thank you both for the replies. At the end of the day, I think that the main takeaway that I’m getting is that the current code is not a red flag.

I’ll be reading up on the material presented in this thread and this thread - thanks for pointing me in that direction.


Just a personal comment, not trying to be dogmatic or anything - but if this were my own code, the manner in which Logger is managed as a dependency would take some kind of explicit justification. You have to look at the source code to know that the module is using the Logger dependency. If process_csv_file took an explicit, generic logging function as a parameter (the module is free to decorate that function to its heart’s content), it would be more explicit about the dependency and less coupled to what Logger actually is or how it operates.
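A sketch of what that could look like (the module name, the default no-op, and the helper names are my own): the check takes the logging function as an argument, so Logger never appears inside the module.

```elixir
defmodule CSV.Checks do
  # The function head supplies a no-op default, so callers that don't
  # care about logging can omit the argument entirely.
  def ensure_consistent_data(data, log \\ fn _msg -> :ok end)

  def ensure_consistent_data(%{employee_number: ""} = data, log) do
    log.("Employee lacks employee_number: #{inspect(data)}")
    false
  end

  def ensure_consistent_data(%{employee_type: "doctor", license_number: ""} = data, log) do
    log.("Doctor lacks license number: #{inspect(data)}")
    false
  end

  def ensure_consistent_data(_, _log), do: true
end

# In production: CSV.Checks.ensure_consistent_data(row, &Logger.info/1)
# In tests: pass fn msg -> send(self(), {:logged, msg}) end and assert on
# the messages the test process receives.
```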
