Hi Folks,
Part of writing “good” functional code is to separate pure code from impure code. I’ve heard this many times, the latest being in Tomasz Kowal’s talk at ElixirLive2016.
Many libraries, such as Ecto, do exactly this. We feed data - i.e., a changeset, a struct, etc. - into our Repo, and it is the Repo that performs side effects with that data.
I tried following that idea in an application I am writing, but I ended up with side-effects in different areas of my code.
For example, in one use case, my end user uploads a CSV file to the application. The requirements are for the application to parse the CSV file, filter out the improper records, and then do something with the remaining records. However, whenever a record is filtered out, I want to log that fact. Glossing over the parsing of the file, the code basically looks like this:
defmodule CSV.Parser do
  require Logger

  def process_csv_file(%{path: file}) do
    file
    # ...
    |> Enum.filter(&ensure_consistent_data/1)
    # ...
    |> do_stuff_with_good_records()
  end

  def ensure_consistent_data(%{employee_number: ""} = data) do
    Logger.info "Employee lacks employee_number: #{inspect data}"
    false
  end

  def ensure_consistent_data(%{employee_type: "doctor", license_number: ""} = data) do
    Logger.info "Doctor lacks license number: #{inspect data}"
    false
  end

  def ensure_consistent_data(_), do: true
end
(This is not my actual domain, but the use case is the same.) The key point is that the ensure_consistent_data/1 function has a side effect, while the other transformations of the data (not shown) do not. So I'm mixing my pure code and my impure code.
How would you folks with more experience change this code in order to separate out the pure and impure parts? Would you even do it in a case like this?
My Naive Solution

Off the cuff, I would probably have ensure_consistent_data return a map, or some other collection, with one field being a list of errors that I would like logged and another field being a list of the "good" rows. Then, after going through all the pure functions in the pipeline, I'd take that list of errors and invoke another function, maybe in another module to make the separation clearer, to perform the side effects. Because I don't like changing the type of the data in the middle of a pipeline, I'd probably end up not extracting the file variable from the map, and instead change all the functions in the pipeline to accept a map. This starts feeling a lot like a Plug.Conn, where our entire universe of what we need is in one data structure. I guess this goes a little into the data-modeling aspect of things (as in the shape of the data, nothing to do with the database yet).
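To make the idea concrete, here is a minimal sketch of what I have in mind (module names like CSV.PureParser and CSV.Reporter, and the validate/1 and partition/1 helpers, are hypothetical, not from my actual code): each validation clause returns a tagged tuple instead of logging, the split into good rows and error messages stays pure, and all the logging is pushed into one impure function at the edge.

```elixir
defmodule CSV.PureParser do
  # Pure: classify a row, returning the reason as data instead of logging it.
  def validate(%{employee_number: ""} = data),
    do: {:error, "Employee lacks employee_number: #{inspect(data)}"}

  def validate(%{employee_type: "doctor", license_number: ""} = data),
    do: {:error, "Doctor lacks license number: #{inspect(data)}"}

  def validate(data), do: {:ok, data}

  # Pure: turn a list of rows into a map of good rows plus error messages.
  def partition(rows) do
    {oks, errors} =
      rows
      |> Enum.map(&validate/1)
      |> Enum.split_with(&match?({:ok, _}, &1))

    %{
      good: Enum.map(oks, fn {:ok, data} -> data end),
      errors: Enum.map(errors, fn {:error, msg} -> msg end)
    }
  end
end

defmodule CSV.Reporter do
  require Logger

  # Impure: the only place that performs side effects.
  def log_errors(%{errors: errors} = result) do
    Enum.each(errors, &Logger.info/1)
    result
  end
end
```

The pipeline would then look like rows |> CSV.PureParser.partition() |> CSV.Reporter.log_errors(), with everything before log_errors/1 being a pure transformation that is trivial to unit-test.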