The Ruby parser is implemented in C, so it’s more like comparing C to Elixir; obviously C is going to be faster. We’re actually not doing that badly. It would be interesting to compare with Jason compiled with HiPE; in my benchmarks that makes it at least twice as fast.
The data is also quite different from what you’d face in a regular HTTP app: this JSON is pretty-printed, while most JSON flying over the wire is not. That can make a significant difference in actual performance, so depending on what you want to learn from this, it can be important.
Don’t suppose you’d be up for tweaking the Techempower benchmarks for Elixir to use Jason with HiPE? They’re the basis for so many of those results that I think it would go a long way with the report.
Let’s have some fun, here is how they compare to C++/rapidjson:
# jq
$ time jq '.' data/10mb.json > /dev/null
real 0m0.607s
user 0m0.603s
sys 0m0.004s
$ time jq '.' data/citylots.json > /dev/null
real 0m14.348s
user 0m13.839s
sys 0m0.504s
# Elixir
## 10mb.json
Time taken [Jason]: 521.655ms
Time taken [Poison]: 1358.531ms
## citylots.json
Time taken [Jason]: 12224.44ms
Time taken [Poison]: 33350.239ms
# Ruby
## 10mb.json
$ time ruby app.rb
real 0m0.350s
user 0m0.250s
sys 0m0.020s
## citylots.json
$ time ruby app.rb
real 0m5.632s
user 0m5.393s
sys 0m0.236s
# C++
$ time ./rapidjson-testing < ../../benchmark-large-json-parsing/data/10mb.json
real 0m0.035s
user 0m0.034s
sys 0m0.000s
$ time ./rapidjson-testing < ../../benchmark-large-json-parsing/data/citylots.json
real 0m0.531s
user 0m0.503s
sys 0m0.028s
Admittedly this benchmark is flawed: jq is writing its output to stdout while the Elixir/Ruby/C++ versions just black-hole the data after it is parsed, so jq is artificially handicapped here. In addition, the Elixir version is actually instantiating a tree to hold the whole structure, which is wasted work as well (unsure about Ruby). The C++ version is fully parsing and performing callbacks for every parse event (standard SAX parsing).
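To make the SAX point concrete, here is a toy, standalone sketch of the idea (my own illustration, not rapidjson’s actual API): a scanner over a flat JSON array of integers that fires a callback per value instead of materializing anything, so peak memory stays constant no matter how big the input is.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// SAX-style handler: each parse event triggers a callback; nothing is stored.
struct Handler {
    std::size_t count = 0;
    long long sum = 0;
    void on_int(long long v) { ++count; sum += v; }  // called once per value
};

// Parse a flat JSON array of non-negative integers, e.g. "[1, 2, 3]".
// Returns false on malformed input. Whitespace between tokens is skipped.
bool parse_int_array(const std::string& json, Handler& h) {
    std::size_t i = 0;
    auto skip_ws = [&] {
        while (i < json.size() && std::isspace((unsigned char)json[i])) ++i;
    };
    skip_ws();
    if (i >= json.size() || json[i] != '[') return false;
    ++i;
    skip_ws();
    if (i < json.size() && json[i] == ']') return true;  // empty array
    while (true) {
        skip_ws();
        if (i >= json.size() || !std::isdigit((unsigned char)json[i])) return false;
        long long v = 0;
        while (i < json.size() && std::isdigit((unsigned char)json[i]))
            v = v * 10 + (json[i++] - '0');
        h.on_int(v);  // SAX event: value seen, nothing materialized
        skip_ws();
        if (i < json.size() && json[i] == ',') { ++i; continue; }
        if (i < json.size() && json[i] == ']') return true;
        return false;
    }
}
```

The structure-building (“DOM”) style differs only in that the callbacks would append into a tree instead of just accumulating, which is exactly the wasted work mentioned above.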
I can PR the C++ one if you want; it only needs a normal C++ compiler and CMake installed, nothing else (not even rapidjson, it fetches that itself).
Just to make sure, here is the C++ version as both a SAX parser and as an Elixir-style structure-building document parser (yay, eating memory):
$ time ./rapidjson-sax < ../../benchmark-large-json-parsing/data/10mb.json
real 0m0.038s
user 0m0.034s
sys 0m0.004s
$ time ./rapidjson-sax < ../../benchmark-large-json-parsing/data/citylots.json
real 0m0.529s
user 0m0.485s
sys 0m0.044s
$ time ./rapidjson-structure < ../../benchmark-large-json-parsing/data/10mb.json
real 0m0.037s
user 0m0.036s
sys 0m0.000s
$ time ./rapidjson-structure < ../../benchmark-large-json-parsing/data/citylots.json
real 0m0.533s
user 0m0.504s
sys 0m0.028s
Not much of a difference; honestly the C++ compiler is so good that the work is probably being optimized out, hmm…
EDIT: I added some code to print out details about the structure to ensure it is parsed in full, and it somehow got a few milliseconds faster… so yeah, those numbers are accurate; C++ is just fast, as always…
I could foresee a Rust one outperforming a C++ one, to be honest, but I doubt the current pure-Rust libraries would at this time (though they’re still plenty fast).
Sure, I’ll clean it up and PR it into a cpp/rapidjson directory or something.
Do you want a readme.md or INSTALL file in that directory, or do you want me to edit the root readme to add instructions on how to compile/run it?
I was thinking of adding a statistical benchmarker to the C++ version instead of just using time; do you want me to do that pre-PR?
It will take longer to run (it performs many iterations to reach statistical accuracy), but it would be more detailed, if you don’t mind it potentially taking many minutes or more. I leave it up to you: it would be substantially more accurate but also substantially slower, and since nothing else here uses a statistical benchmarker it seems kind of pointless right now. ^.^;
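For the curious, the sort of harness I mean is roughly this (a hypothetical sketch, not the actual code I’d PR): time the workload over many iterations and report mean and standard deviation rather than a single `time` reading.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal statistical timing harness: repeat the workload `iterations` times
// and summarize the per-run wall-clock samples.
struct TimingStats {
    double mean_ms;
    double stddev_ms;
};

template <typename F>
TimingStats benchmark(F&& work, std::size_t iterations) {
    std::vector<double> samples;
    samples.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto end = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= samples.size();
    double var = 0.0;
    for (double s : samples) var += (s - mean) * (s - mean);
    var /= samples.size();
    return {mean, std::sqrt(var)};
}
```

A real harness would also want warm-up runs and outlier handling, which is exactly why it takes so much longer than a single `time` invocation.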
Should probably leave it out for now, at least unless a real parsing benchmark harness is set up across all the languages or something.
Does rapidjson validate UTF-8 by default when decoding? I know it has an option, but I don’t think it’s on by default. This should have significant performance implications; without it, we’re comparing apples to oranges.
I’m using the UTF<> argument, so I’d hope so? Let me check the docs… Hmm, I’m unsure whether it’s on by default for parsing, but I found where to set the flag to force it on regardless. Results now:
$ time _builds/rapidjson-sax < ../../data/citylots.json
real 0m0.549s
user 0m0.533s
sys 0m0.016s
$ time _builds/rapidjson-structure < ../../data/citylots.json
real 0m0.542s
user 0m0.509s
sys 0m0.032s
Not seeing much of a difference, so it probably is on by default?
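For reference, the flag I set is rapidjson’s `kParseValidateEncodingFlag`; the per-byte work it adds looks roughly like this standalone sketch (my own toy validator, not rapidjson’s code, and it skips the overlong/surrogate checks a full validator also needs):

```cpp
#include <cstddef>
#include <string>

// Toy UTF-8 validity check: verifies lead bytes and that each multi-byte
// sequence has the right number of 0b10xxxxxx continuation bytes.
bool valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80) extra = 0;                 // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 4-byte sequence
        else return false;                       // stray continuation/invalid lead
        if (i + extra >= s.size()) return false; // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;
        i += extra + 1;
    }
    return true;
}
```

Even this cut-down check is a branch or two per byte on top of the parse itself, which is why skipping validation can matter for benchmarks.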
/me has never used rapidjson before, so feel free to check the code, PR incoming in a minute…
You know, it would help if I didn’t compile the SAX version under both names… >.>
Fixed; the results make more sense now:
$ time _builds/rapidjson-sax < ../../data/10mb.json
real 0m0.039s
user 0m0.035s
sys 0m0.004s
$ time _builds/rapidjson-structure < ../../data/10mb.json
real 0m0.048s
user 0m0.040s
sys 0m0.008s
$ time _builds/rapidjson-sax < ../../data/citylots.json
real 0m0.545s
user 0m0.516s
sys 0m0.028s
$ time _builds/rapidjson-structure < ../../data/citylots.json
real 0m0.742s
user 0m0.657s
sys 0m0.084s
There it goes: the structure form should be slower than the SAX form. That makes MUCH more sense!