Slow json decoding for postgres field

In our project I have entity MeasurementData and it has json column in Postgres. In ecto schema:

schema "measurement_data" do
    field :data, :map
    ...
end

This field takes ~ 15kb in each record. Then I want to load 60 entities of this records and in ecto it takes ~400ms. Rails does it for ~16ms. I almost sure that this difference refers to parsing of json. But why it so slow in Elixir and are there ways to improve this time? (e.g. if I completely know all fields in this map).

P.S I’m using Jason library for decoding and I also tried erlang jiffy library for it.

@glebzer Any time you’re saying something takes a certain amount of time, please show how you’re determining that. How are you loading these? Can you show any debug SQL logs that show an SQL query and timing information?

What other fields are on the schema?

2 Likes

Sure, no problem. I select only data in query
rails:


elixir:

Well, Ecto gives you the final data and fetches everything from the database.

Rails tries to be smart here, and only fetches an iterator which returns each item when actually accessed. At least this was the case back in the days when I used rails.

I think you are talking about find_each which operates with batches. But this is not my case, cause rails gave me all 71 records too

Have you seen them printed out? Or accessed their values?

Another thing that might kick in here: Ruby can defer parsing the JSON until values in the JSON are accessed or inspected, ecto has to do this eagerly, before handing the data back to you.

1 Like

nice catch. Actually rails allow to see the first pack as a json and … then. When I measure query itself with iteration over collection in rails it shows ~120ms what comes near elixir’s 170ms

t = Time.now
collection = measurement.measurement_datum.where("measured_at BETWEEN ? AND  ?", started_date, finished_date);
collection.map { |md| md.data.keys };
p Time.now - t
p "benchmark"
# 0.120287658

Nope, he isn’t:

Ecto doesn’t do streaming by default as it tends to put more pressure on the messaging system, rather it tries to get everything in bulk (you only want to stream if you know that you will have a LOT of entries) and send it at once, this may slightly up the initial response time by a few milliseconds (as you see) but elixir can also be doing many many other things at the same time, handling other connections, receiving other data from the DB, rendering templates, etc… Thus that makes it overall throughput and scaling far far better. :slight_smile:

1 Like