NimbleCSV - a small and fast CSV parsing and dumping library for Elixir

Hello everyone,

We have just released NimbleCSV, a small and fast CSV parsing library for Elixir. It lets developers define their own parsers, so the generated code can rely on binary patterns for efficiency. It also supports dumping and data streaming. We hope it will be an excellent companion alongside the efforts we have put into GenStage (and GenStage.Flow):

Docs: https://hexdocs.pm/nimble_csv
Source: https://github.com/plataformatec/nimble_csv
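For anyone skimming, here is a minimal sketch of the define/parse/dump/stream flow described above. The module name `MyApp.TSV`, the sample data, and the `Mix.install` line with its version requirement are assumptions for illustration (it presumes a recent Elixir and a recent nimble_csv release); check the docs for the options your version supports.

```elixir
# Assumption: run as a standalone script on Elixir 1.12+ with network access.
Mix.install([{:nimble_csv, "~> 1.2"}])

# Define a tab-separated parser; MyApp.TSV is a hypothetical module name.
NimbleCSV.define(MyApp.TSV, separator: "\t", escape: "\"")

# Parsing a string. By default the first (header) row is skipped.
rows = MyApp.TSV.parse_string("name\tage\njohn\t27\n")
# rows == [["john", "27"]]

# Dumping rows back out. The result is iodata, so convert it if you need a binary.
dumped =
  [["name", "age"], ["mary", "28"]]
  |> MyApp.TSV.dump_to_iodata()
  |> IO.iodata_to_binary()
# dumped == "name\tage\nmary\t28\n"

# Streaming: any stream of lines works, for example File.stream!/1 on a real file.
streamed =
  ["name\tage\n", "john\t27\n"]
  |> Stream.map(& &1)
  |> MyApp.TSV.parse_stream()
  |> Enum.to_list()
```

The streaming variant is the one that should pair naturally with GenStage, since rows are produced lazily rather than all at once.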

Amazing, how do you find the time to do everything? :slight_smile:

Initial results indicate that NimbleCSV is 10x faster than CSV when parsing from a file stream, and 10x faster than ExCSV when parsing a literal string (ExCSV doesn’t do streams, CSV doesn’t do raw strings).

Will post full results in a bit, along with CSV writing benchmarks.

Oh that is great timing! I’m just about to need to parse CSV files from another system. :slight_smile:

I tried measuring what the performance difference would be if the separators weren’t bound at compile time. The difference looks to be roughly 1.6x, which is quite surprising; I thought it would be much worse.

The code is here: https://github.com/michalmuskala/nimble_csv/tree/no-macro-magic

While the difference for decoding is significant, the difference for encoding seems to be completely negligible.
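To illustrate what "bound at compile time" means here, a toy sketch follows. It is not the code from NimbleCSV or from the branch linked above; the module names `Sketch.CompileTime` and `Sketch.Runtime` are hypothetical, and the splitter ignores escaping entirely. The first module has the separator as a literal inside the binary pattern, while the second receives it as an argument and has to compare each byte in a guard.

```elixir
defmodule Sketch.CompileTime do
  # The separator is a literal in the pattern, so it is known when the
  # module is compiled.
  def split(line), do: split(line, "", [])

  defp split(<<",", rest::binary>>, field, acc), do: split(rest, "", [field | acc])
  defp split(<<char, rest::binary>>, field, acc), do: split(rest, <<field::binary, char>>, acc)
  defp split(<<>>, field, acc), do: Enum.reverse([field | acc])
end

defmodule Sketch.Runtime do
  # The separator is a single byte passed in at runtime (for example ?,),
  # so every byte is checked against it in a guard.
  def split(line, sep) when is_integer(sep), do: split(line, sep, "", [])

  defp split(<<char, rest::binary>>, sep, field, acc) when char == sep,
    do: split(rest, sep, "", [field | acc])

  defp split(<<char, rest::binary>>, sep, field, acc),
    do: split(rest, sep, <<field::binary, char>>, acc)

  defp split(<<>>, _sep, field, acc), do: Enum.reverse([field | acc])
end

Sketch.CompileTime.split("a,b,,c")
# => ["a", "b", "", "c"]
Sketch.Runtime.split("a,b,,c", ?,)
# => ["a", "b", "", "c"]
```

Both return the same fields; the only difference is where the separator comes from, which is the variable the benchmark above is isolating.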

Just looking at how to use it, it seems you build a module via: https://github.com/plataformatec/nimble_csv/blob/master/lib/nimble_csv.ex#L98

I am curious whether there are plans for a way to use it inside an existing module, turning that module into a parser. That would make it possible to keep helper functions and the parsing functions in one module, without having to delegate functions.

Yeah, the BEAM VM has a lot of interesting things like that. About 6 years ago I wrote a math module that got pretty fast after a lot of testing; a lot of weird things ended up being fast, and compile-time generation of structures was absolutely necessary for speed.

It seems like a relatively bad idea. Each time you would use it you’d generate all those functions leading to code bloat and long compilation times.

True, though how many projects have many different styles of CSV parsing all at the same time?

But that’s what the library does now anyway, it’s just hidden behind another macro. Using use would improve composability and you wouldn’t have to pass in the moduledocs as an option.

I think I misunderstood the initial question. Yes, I think it might make sense to provide a “more traditional” use-based interface.

You have no idea… You’ve never worked much with Perl devs, have you?

Sadly I have, but still, the kinds of separators in use tend to be limited to a few, not hundreds. ^.^

Can I dynamically define a parser per client in a multi-tenant setup? Each client can have the same or a different separator, escape character, etc.

@pkrawat1: Yes, you can!
I suggest you write something like:

defmodule MyApp do
  def parser_for(name) do
    camelized_name = Macro.camelize(name)
    Module.concat(MyApp.Parsers, camelized_name)
  end
end

# ...

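# List of {name, separator, escape} tuples, one per client
parsers = [] # ...
for {name, separator, escape} <- parsers do
  parser = MyApp.parser_for(name)
  # Use each client's own separator and escape rather than hard-coded values
  NimbleCSV.define(parser, separator: separator, escape: escape)
end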

# ...

parser_name = "parser_name"
parser = MyApp.parser_for(parser_name)
parser.parse_string("...")