How to parse CSV into typed terms?

Hi!

I’d like to parse CSV rows into typed elements without knowing the schema (column order) of the CSV in advance.

In Ruby, the CSV library will parse strings into types. In Elixir, both the csv and nimble_csv libraries produce string values, and the caller is responsible for casting them into other types (integers, decimals, date-times).

For integer and decimal columns, I can detect if they contain just numbers and decimal point – with regexp or maybe binary matching?
For dates I have no idea how to proceed. I’m aware that the date-time format is not specified by RFC 4180; the files I am parsing have dates in this format: “2016-12-01 10:36:29.772554”.

What would you recommend? I am surprised this seemingly simple problem doesn’t have an obvious solution!

m.

Probably because it’s only “seemingly simple”. I can remember cursing at PHP’s strtotime a few times, because parsing dates without knowing the format is hardly more than guesswork and hoping for the best. If you know the format, though, you should be able to use Timex to parse it.

iex> Timex.parse!("2016-12-01 10:36:29.772554", "%Y-%m-%d %H:%M:%S.%f", :strftime)
~N[2016-12-01 10:36:29.772554]

Not sure if you are receiving datetimes formatted in other ways, but this should sort you out as a start.

Am I missing something?

iex(1)> NaiveDateTime.from_iso8601("2016-12-01 10:36:29.772554")
{:ok, ~N[2016-12-01 10:36:29.772554]}

That date can then be “enhanced” with DateTime.from_naive/3.
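A minimal sketch of that step, assuming the timestamps in the file are meant as UTC (the CSV itself carries no zone information, so this is a guess about the data):

```elixir
# Parse the naive timestamp with the standard library only.
{:ok, naive} = NaiveDateTime.from_iso8601("2016-12-01 10:36:29.772554")

# Assumption: the values are in UTC. "Etc/UTC" is the only zone the
# default time zone database understands without extra dependencies.
{:ok, utc} = DateTime.from_naive(naive, "Etc/UTC")

DateTime.to_iso8601(utc)
# => "2016-12-01T10:36:29.772554Z"
```

Any other zone would need a time zone database configured or passed as the third argument.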

Nope, you aren’t missing anything. I simply prefer being explicit – plus I can never remember what passes for which standard.

I guess that makes sense if you already need Timex for something else.

I’d be willing to massage the string a bit to be able to use the standard functionality and ditch a dependency - and ISO8601 has pretty good coverage (am/pm could be a bit awkward).
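To tie this back to the original question, here is a rough sketch of guessing a field's type with the standard library only. `CsvCast` is a hypothetical helper, not part of csv or nimble_csv, and it assumes decimals are acceptable as floats and that dates look like the sample above:

```elixir
defmodule CsvCast do
  @moduledoc "Hypothetical helper: guesses the type of a single CSV field."

  # Try each parser in turn; fall back to the original string.
  def cast(field) when is_binary(field) do
    with :error <- cast_integer(field),
         :error <- cast_float(field),
         :error <- cast_datetime(field) do
      field
    else
      {:ok, value} -> value
    end
  end

  # Only accept the parse if the whole field was consumed.
  defp cast_integer(field) do
    case Integer.parse(field) do
      {int, ""} -> {:ok, int}
      _ -> :error
    end
  end

  defp cast_float(field) do
    case Float.parse(field) do
      {float, ""} -> {:ok, float}
      _ -> :error
    end
  end

  defp cast_datetime(field) do
    case NaiveDateTime.from_iso8601(field) do
      {:ok, ndt} -> {:ok, ndt}
      {:error, _} -> :error
    end
  end
end

CsvCast.cast("42")
CsvCast.cast("3.14")
CsvCast.cast("2016-12-01 10:36:29.772554")
CsvCast.cast("hello")
```

This is still guesswork per field: "007" becomes the integer 7 and "1e3" becomes a float, so inferring a type per column (every row must agree) is probably safer than per cell. Exact decimals would need something other than floats.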