Yes, fetching 17k rows can be a problem. Question is if you need to fetch them all at once? What do you do with them after that?
You mean to select using filter on date but not include date in the resulting data set?
Try using this query:
from m in MerchantTransaction,
where: m.trans_date > ^date,
select: struct(m, [:merchant_id, :transaction_id, :usd_amount, :trans_amount, :trans_currency_code])
and let us know if it goes faster
edit: sorry I don’t think the original query I gave you will run
well, it’s definitely 2-6x faster:
def transactions_after_date(date) do
from m in MerchantTransaction,
where: m.trans_date > ^date,
select: [:merchant_id, :transaction_id, :usd_amount, :trans_amount, :trans_currency_code]
end
iex(20)> :timer.tc(fn -> transactions = MerchantTransaction.transactions_after_date(str_hour_ago) |> Markers.Repo.all end) |> elem(0)
[debug] QUERY OK source="merchant_transactions" db=21.5ms decode=81.0ms
SELECT m0."merchant_id", m0."transaction_id", m0."usd_amount", m0."trans_amount", m0."trans_currency_code" FROM "merchant_transactions" AS m0 WHERE (m0."trans_date" > $1) [{{2018, 2, 20}, {12, 29, 39, 482418}}]
112730
iex(21)> :timer.tc(fn -> transactions = MerchantTransaction.transactions_after_date(str_hour_ago) |> Markers.Repo.all end) |> elem(0)
[debug] QUERY OK source="merchant_transactions" db=21.1ms decode=31.4ms
SELECT m0."merchant_id", m0."transaction_id", m0."usd_amount", m0."trans_amount", m0."trans_currency_code" FROM "merchant_transactions" AS m0 WHERE (m0."trans_date" > $1) [{{2018, 2, 20}, {12, 29, 39, 482418}}]
54239
Seems the dates interpolation is taking too much time!
Thanks for the clue!
we have separate tables for transactions and their locations and we do grouping and aggregating of transaction amounts in code. generally, it should be done with joins
and SUM()
in one DB query if I get your point, and we should definitely do it this way! But so far we were prototyping and 0.5 sec results versus ms in psql put us into doubts.
Thanks!
Yeah, so as you can see from the Ecto logs, the original data reports around 250ms for database and 100-300ms for decode. The db
as reported by ecto also includes the network roundtrip. If your database is on another host, you need to transfer all of that data - with more data it takes longer. This time is not reported by psql - it only reports the internal database time.
The second measurement - the decode
time is how long it takes for ecto to go from the bytes on the wire into the elixir data structures. The more data you have, the more it takes.
It seems that the issue is just the amount of data you’re pushing over the network or then decoding on the elixir side.
Also - what versions of ecto and postgrex are you using?
ecto 2.2.10
postgrex 0.13.5
interestingly, I have almost same results for
from m in MerchantTransaction,
where: m.trans_date > ^date,
select: [:merchant_id, :transaction_id, :usd_amount, :trans_amount, :trans_currency_code]
[debug] QUERY OK source="merchant_transactions" db=26.5ms decode=69.8ms
SELECT m0."merchant_id", m0."transaction_id", m0."usd_amount", m0."trans_amount", m0."trans_currency_code" FROM "merchant_transactions" AS m0 WHERE (m0."trans_date" > $1) [{{2018, 2, 21}, {14, 49, 47, 275557}}]
159374
and
from m in MerchantTransaction,
where: m.trans_date > ^date,
select: [:merchant_id, :transaction_id, :usd_amount, :trans_amount, :trans_currency_code, :trans_date]
[debug] QUERY OK source="merchant_transactions" db=80.9ms decode=144.9ms
SELECT m0."merchant_id", m0."transaction_id", m0."usd_amount", m0."trans_amount", m0."trans_currency_code", m0."trans_date" FROM "merchant_transactions" AS m0 WHERE (m0."trans_date" > $1) [{{2018, 2, 21}, {14, 49, 47, 275557}}]
230789
Even all fields in select is faster than no select
key:
from m in MerchantTransaction,
where: m.trans_date > ^date,
select: [:merchant_id, :transaction_id, :usd_amount, :trans_amount, :trans_currency_code, :trans_date, :trans_created_at, :trans_updated_at]
gives
:timer.tc(fn -> transactions = MerchantTransaction.transactions_after_date(str_hour_ago) |> Markers.Repo.all end) |> elem(0)
[debug] QUERY OK source="merchant_transactions" db=142.2ms decode=291.2ms
SELECT m0."merchant_id", m0."transaction_id", m0."usd_amount", m0."trans_amount", m0."trans_currency_code", m0."trans_date", m0."trans_created_at", m0."trans_updated_at" FROM "merchant_transactions" AS m0 WHERE (m0."trans_date" > $1) [{{2018, 2, 21}, {14, 25, 45, 460508}}]
441323
Three fields are added if I don’t specify select
: id
, inserted_at
, updated_at
. They add the delay.
@peerreynders Calecto
doesn’t help too much. it gives results between 69-126 ms while Timex
is between 78-143 ms.
Ecto 3 should offer much better decode performance for the datetime fields with the native types - once they are decoded in the driver they are not shuffled anymore.
In Ecto 2 the drivers would return the Erlang-style tuples and then Ecto would parse them into structs. In Ecto 3 everything is using the built-in structs.
db
is significantly more than network round trip time, it will include decoding from bytes to elixir terms in postgrex. decode
time is the time to go from rows (list of lists) to ecto preprocessed result. Ecto will also do post processing not including in this time. Therefore should think of db
more like time using the connection.
Ecto 3:
from m in MerchantTransaction,
where: m.trans_date > ^date,
select: [:merchant_id, :transaction_id, :usd_amount, :trans_amount, :trans_currency_code, :trans_date]
[debug] QUERY OK source="merchant_transactions" db=96.3ms decode=1.6ms
SELECT m0."merchant_id", m0."transaction_id", m0."usd_amount", m0."trans_amount", m0."trans_currency_code", m0."trans_date" FROM "merchant_transactions" AS m0 WHERE (m0."trans_date" > $1) [#DateTime<2018-02-24 15:26:32Z>]
147457
decoding is significantly faster!