Hi all,
I often find it tedious to parse CSV files. There are often complex rules for validating and transforming the data, matching columns by their indexes or header values, etc. Sometimes there is also a need to map one row in the CSV file into multiple elements in the output list, or to skip some rows or fields entirely based on their values. On top of that, the resulting structure may have to be a nested map (in case of associations).
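For comparison, here is roughly what that looks like when done by hand in plain Elixir, without DataQuacker. It's a minimal sketch under some assumptions: rows are already split into lists of strings (no CSV library involved), and the `name`/`subjects` columns, the `;` separator and the `ManualCsv` module are all made up for illustration.

```elixir
defmodule ManualCsv do
  # Hypothetical hand-rolled mapper, not part of DataQuacker.
  # Expects the header row first, followed by data rows (lists of strings).
  def map_rows([headers | rows]) do
    # Build a header name -> column index lookup once.
    index = headers |> Enum.with_index() |> Map.new()

    # flat_map lets each source row produce zero, one or many output elements.
    Enum.flat_map(rows, fn row ->
      # Map.fetch!/2 raises KeyError if a required header is missing.
      get = fn name -> Enum.at(row, Map.fetch!(index, name)) end

      case get.("subjects") do
        # Skip rows with no subjects at all.
        "" ->
          []

        # One output element per subject, with the student nested under :student.
        subjects ->
          for subject <- String.split(subjects, ";") do
            %{student: %{name: get.("name")}, subject: subject}
          end
      end
    end)
  end
end
```

Even this toy version mixes header lookup, row skipping, row multiplication and nesting in one function, which is the kind of code that gets hard to maintain as the rules grow.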
That’s why I created DataQuacker, an Elixir library featuring a simple DSL that describes the output structure, how it should be mapped from the source, and the validation and transformation rules for each field or row.
You can find the library on GitHub and the docs on HexDocs.
To get a glimpse of the DSL, take a look at this relatively simple schema example from the docs:
```elixir
defmodule StudentsSchema do
  use DataQuacker.Schema

  schema :students do
    field :first_name do
      source("first name")
    end

    field :last_name do
      source("last name")
    end

    field :age do
      transform(fn age ->
        case Integer.parse(age) do
          {age_int, _} -> {:ok, age_int}
          :error -> {:error, "Invalid value #{age} given"}
        end
      end)

      source("age")
    end

    field :favourite_subject do
      validate(fn subj -> subj in ["Maths", "Physics", "Programming"] end)
      source("favourite subject")
    end
  end
end
```
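The `transform` and `validate` callbacks in that schema are ordinary anonymous functions, so they can be tried out on their own. This sketch copies them out of the schema; `parse_age_strict` is a hypothetical stricter variant of my own, not something from the docs. It's worth knowing that `Integer.parse/1` returns the unparsed remainder, so the original function accepts `"21 years"` as `21`.

```elixir
# Same transform function as in the schema above.
parse_age = fn age ->
  case Integer.parse(age) do
    # The remainder is ignored, so "21 years" also parses to 21.
    {age_int, _} -> {:ok, age_int}
    :error -> {:error, "Invalid value #{age} given"}
  end
end

# Hypothetical stricter variant: trims whitespace, then only accepts input
# that is entirely an integer (empty remainder).
parse_age_strict = fn age ->
  case Integer.parse(String.trim(age)) do
    {age_int, ""} -> {:ok, age_int}
    _ -> {:error, "Invalid value #{age} given"}
  end
end

# Same validate function as in the schema above: returns a boolean.
valid_subject? = fn subj -> subj in ["Maths", "Physics", "Programming"] end
```

Which variant is right depends on how messy the source data is expected to be.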
There are many more features, like arbitrarily nesting fields, matching columns with regexes and custom functions, skipping rows, outputting multiple rows from one source row, injecting support data into validators and transformers, using metadata to give error messages with detailed information about where an error occurred, and so on.
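To illustrate what matching a column by an exact string, a regex or a custom function means in general, here is a plain-Elixir sketch. `HeaderMatch` and its functions are hypothetical and are not DataQuacker's API; the actual syntax for this is in the docs.

```elixir
defmodule HeaderMatch do
  # Hypothetical helper: a column rule may be an exact header string,
  # a regex, or a one-argument function returning a boolean.
  def matches?(rule, header) when is_binary(rule), do: rule == header
  def matches?(%Regex{} = rule, header), do: Regex.match?(rule, header)
  def matches?(rule, header) when is_function(rule, 1), do: rule.(header) == true

  # Returns the index of the first header satisfying the rule, or nil.
  def find_column(headers, rule), do: Enum.find_index(headers, &matches?(rule, &1))
end
```

Regexes and functions help when the same logical column shows up with slightly different headers across files (different casing, extra units, and so on).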
All of those are described in the docs along with examples and detailed explanations.
Another nice thing about the DSL is that it gives helpful errors at compile time if a schema is defined incorrectly.
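As a rough illustration of how a DSL can report mistakes at compile time, a macro can inspect its arguments while it is being expanded and raise. This is a general Elixir macro technique, not DataQuacker's actual implementation; `SchemaCheck` and `GoodSchema` are hypothetical names.

```elixir
defmodule SchemaCheck do
  # Hypothetical DSL macro. The check runs while the *calling* code is being
  # compiled, so a bad argument stops compilation instead of failing at runtime.
  defmacro source(name) do
    if not is_binary(name) do
      raise ArgumentError,
            "source/1 expects a string literal, got: #{Macro.to_string(name)}"
    end

    # A string literal is already valid quoted code, so it is returned as-is.
    name
  end
end

defmodule GoodSchema do
  require SchemaCheck

  # Compiles fine because "age" is a string literal.
  def age_column, do: SchemaCheck.source("age")
end
```

Writing `SchemaCheck.source(42)` in a module would make compilation of that module fail with the `ArgumentError` message above.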
For now, only CSV files and Elixir data are supported as sources out of the box, but anyone can write an adapter for another source, e.g. Google Sheets. Writing one is easy; to learn more, take a look at the Adapter behaviour in the docs.
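For anyone curious what an adapter boils down to, here is the general shape of an Elixir behaviour plus one implementation. `RowSource`, its `load/2` callback and `InMemorySource` are hypothetical; DataQuacker's real Adapter behaviour defines its own callbacks, which are described in the docs.

```elixir
defmodule RowSource do
  # Hypothetical source-adapter behaviour, for illustration only.
  # An adapter turns some source into a header row plus data rows.
  @callback load(source :: term(), opts :: keyword()) ::
              {:ok, %{headers: [String.t()], rows: [[String.t()]]}} | {:error, term()}
end

defmodule InMemorySource do
  @behaviour RowSource

  # Treats a list of lists as the source: the first list is the header row.
  @impl true
  def load([headers | rows], _opts), do: {:ok, %{headers: headers, rows: rows}}
  def load(_other, _opts), do: {:error, :empty_source}
end
```

A Google Sheets adapter would follow the same pattern, fetching the sheet inside the callback and returning the same shape.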
Any feedback is greatly appreciated. Please tell me what you think, especially if you find any bugs, missing functionality or unclear / incomplete documentation.