The Ruby parser is implemented in C, so it’s more like comparing C to Elixir. Obviously C is going to be faster. We’re not doing that badly, actually. It would be interesting to compare with Jason compiled with HiPE. In my benchmarks this makes it at least twice as fast.
The data is also quite different from what you’d face in a regular HTTP app: the JSON is pretty-printed, while most JSON flying over the wire is not. This can make a significant difference to the actual performance. Depending on what you want to learn from this, it can be important.
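To get a feel for how much extra input pretty-printing adds, here is a minimal Ruby sketch using only the standard `json` library. The `sample_document` helper and its contents are made up for illustration; they are not the benchmark's data files, so the ratio on the real files will differ.

```ruby
require "json"

# Hypothetical sample document (an assumption for illustration only).
def sample_document
  {
    "type" => "FeatureCollection",
    "features" => Array.new(1_000) { |i|
      { "id" => i, "properties" => { "name" => "lot-#{i}" }, "coordinates" => [i * 0.5, -i * 0.25] }
    }
  }
end

# Returns the byte sizes of the pretty-printed and compact encodings of one value.
def encoded_sizes(value)
  {
    pretty: JSON.pretty_generate(value).bytesize,
    compact: JSON.generate(value).bytesize
  }
end

sizes = encoded_sizes(sample_document)
puts "pretty:  #{sizes[:pretty]} bytes"
puts "compact: #{sizes[:compact]} bytes"
```

Both encodings decode to the same value; the pretty one just gives the parser more whitespace to skip.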
Don’t suppose you’d be up for tweaking the TechEmpower benchmarks for Elixir to use Jason with HiPE? It’s the basis for the results of so many of those tests that I think it would go a long way with the report.
Let’s have some fun, here is how they compare to C++/rapidjson:
$ time jq '.' data/10mb.json > /dev/null
$ time jq '.' data/citylots.json > /dev/null
Time taken [Jason]: 521.655ms
Time taken [Poison]: 1358.531ms
Time taken [Jason]: 12224.44ms
Time taken [Poison]: 33350.239ms
$ time ruby app.rb
$ time ruby app.rb
$ time ./rapidjson-testing < ../../benchmark-large-json-parsing/data/10mb.json
$ time ./rapidjson-testing < ../../benchmark-large-json-parsing/data/citylots.json
Admittedly this benchmark is flawed: jq is outputting to stdout, while Elixir/Ruby/C++ are just kind of blackholing the data after it is parsed, so jq/go is artificially limited here. In addition, the Elixir version is actually instantiating a tree to hold the whole structure, which is wasted work as well (unsure about Ruby). The C++ version is fully parsing and performing callbacks for every parse event (standard sax parsing).
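The difference between "parse and discard" and "parse and print" can be sketched in Ruby as below. This is only an approximation of what `jq '.' > /dev/null` does, the helper names are made up for illustration, and the generated input is a stand-in for the real data files.

```ruby
require "json"

# Wall-clock time of the block in milliseconds.
def elapsed_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
end

# Parse and throw the result away (what the Elixir/Ruby/C++ runs do).
def parse_only(text)
  JSON.parse(text)
  nil
end

# Parse, then pretty-print to the given IO (roughly the extra work jq does).
def parse_and_print(text, io)
  io.write(JSON.pretty_generate(JSON.parse(text)))
end

# Stand-in input, not the benchmark data.
text = JSON.generate(Array.new(20_000) { |i| { "id" => i, "tags" => %w[a b c] } })

puts format("parse only:      %.1f ms", elapsed_ms { parse_only(text) })
File.open(File::NULL, "w") do |null|
  puts format("parse and print: %.1f ms", elapsed_ms { parse_and_print(text, null) })
end
```

Even when the output goes to the null device, the serialization step still has to run, which is the handicap described above.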
I can PR the C++ one into it if you want. It only needs a normal C++ compiler and CMake installed, nothing else (not even rapidjson, it acquires that itself).
Just to make sure, here is the C++ version as both a sax parser and an Elixir-style, structure-building document parser (yay eating memory):
$ time ./rapidjson-sax < ../../benchmark-large-json-parsing/data/10mb.json
$ time ./rapidjson-sax < ../../benchmark-large-json-parsing/data/citylots.json
$ time ./rapidjson-structure < ../../benchmark-large-json-parsing/data/10mb.json
$ time ./rapidjson-structure < ../../benchmark-large-json-parsing/data/citylots.json
Not much of a difference. Honestly, the C++ compiler is so good that the work is probably being optimized out, hmm…
EDIT: And I added some code to print out some details about the structure to ensure it is compiled and parsed in full and it somehow got a few milliseconds faster… so yeah those are accurate, C++ is just fast as always…
It will take longer to benchmark (it performs lots of runs to get statistical accuracy), but it would be more detailed, if you don’t mind it taking potentially many minutes (or more) to run. I leave it up to you: it will take substantially longer but would also be substantially more accurate. Nothing else is using a statistical benchmarker, though, so it seems kind of useless right now. ^.^;
Should probably leave it out for now, at least unless a real parsing benchmarker were set up across all the languages or something.
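For reference, the basic idea behind a statistical benchmarker (many timed runs after a warm-up, then summary statistics) can be sketched in a few lines of Ruby. This is a hand-rolled illustration, not the tool discussed above; `sample_times` and `summarize` are made-up names and the payload is a stand-in for the real data.

```ruby
require "json"

# Runs the block `warmup` times untimed, then `runs` times timed, and
# returns the per-run wall-clock times in milliseconds.
def sample_times(runs: 20, warmup: 3)
  warmup.times { yield }
  Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
  end
end

# Summarises samples as mean, sample standard deviation, minimum and maximum.
# Needs at least two samples.
def summarize(samples)
  mean = samples.sum / samples.size
  variance = samples.sum { |t| (t - mean)**2 } / (samples.size - 1)
  { mean: mean, stddev: Math.sqrt(variance), min: samples.min, max: samples.max }
end

# Stand-in payload, not the benchmark data.
payload = JSON.generate(Array.new(5_000) { |i| { "id" => i, "name" => "item-#{i}" } })

stats = summarize(sample_times { JSON.parse(payload) })
puts format("mean %.3f ms, stddev %.3f ms, min %.3f ms, max %.3f ms",
            stats[:mean], stats[:stddev], stats[:min], stats[:max])
```

A single `time` invocation gives one sample; reporting the spread as well makes it easier to tell a real difference from noise.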
Does rapidjson validate UTF-8 by default when decoding? I know it has an option, but I don’t think it does it by default. This should have significant performance implications: if it does not validate, it’s comparing apples to oranges.
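To make concrete what "validating UTF-8" means here, a small Ruby sketch: the check has to look at every byte of every string, which is why skipping it can make a parser look faster. `valid_utf8?` is a made-up helper built on Ruby's `String#valid_encoding?`; it says nothing about how rapidjson or Jason implement the check.

```ruby
# Returns true when the given bytes form well-formed UTF-8.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# "ü" correctly encoded as UTF-8 (two bytes).
valid = "{\"city\": \"Z\u00FCrich\"}"

# The same text with a lone 0xFC byte (Latin-1 style), which is not valid UTF-8.
invalid = "{\"city\": \"Z\xFCrich\"}".b

puts "valid input passes:   #{valid_utf8?(valid)}"
puts "invalid input passes: #{valid_utf8?(invalid)}"
```

A decoder that validates must reject the second input; one that does not will happily pass the bad byte through.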
You know it would help if I didn’t compile the sax version for both names… >.>
Fixed, results make more sense now:
$ time _builds/rapidjson-sax < ../../data/10mb.json
$ time _builds/rapidjson-structure < ../../data/10mb.json
$ time _builds/rapidjson-sax < ../../data/citylots.json
$ time _builds/rapidjson-structure < ../../data/citylots.json
There it goes, the structure form should be slower than the sax form, that makes MUCH more sense!