Performance issue on a phoenix + absinthe + ecto + postgresql web app

setop · July 23, 2021, 9:49am

Hi all,

I have a classical phoenix / absinthe / ecto web app, providing GraphQL API to the front-end.
I use PostgreSQL as a database.

The main table has 200k rows and the API is providing search feature on it.
The API call are quite slow : 7 seconds.

Since the app is composed of elixir code and pg db, I tried to identify how time spreads among them.

Thanks to incendium, I profiled the resolver responsible for the search :

I think I get what append : the main process launch two async tasks : one for the result page (select where order by offset limit) and one for the count. When tasks are over, results are serialized back to the caller.

From this flamegraph, it seems obvious that elixir code is doing nothing but waiting for the DB. So I activated the duration report for long queries in postgres.

I see that both queries are taking around 2" each. Even if they were done in sequence (hard to tell if it is the case or not), it tops to 4". So there is still 3" that I don’t understand where they are spent. I checked in the web browser, network part is only few ms.

Since I was not pleased with the time taken in the DB, I added some indexes. The time is now “1 for the page 2” for the count. The overall time decrease accordingly but I still have one to two sec that I don’t know where they are spent.

Any advise on how to further investigate ?

Regards

LostKobrakai · July 23, 2021, 10:10am

Times as seen on the elixir side are not the same as seen from postgres, given there is a network between them. The query needs to be sent to postgres and more importantly any results need to be sent back to elixir. I’m not sure though this is (all) you’re looking for.

benwilson512 · July 24, 2021, 7:20pm

How many rows are you returning to the front end?

setop · July 25, 2021, 8:26am

10 elements per call (1 page), that is about 4KB.

al2o3cr · July 25, 2021, 4:08pm

2s for a COUNT against a table with 200k rows seems very high; no amount of Elixir optimization is going to correct for misbehaving SQL. For instance, I once found a query in production that used a subquery that joined a large table to itself (creating N^2 intermediate rows) then immediately filtered out 99.999% of those - moving the filter into the subquery caused a 100x performance boost!

For a next step, I recommend extracting the SQL that that COUNT is running and running EXPLAIN on it.