Why is my Phoenix project using 100% CPU on a DigitalOcean droplet?

Sure!

I’m assuming you’re running this with MIX_ENV=prod. If so, add this to your prod.exs file:

config :logger, level: :debug

You may already have a config :logger entry, in which case just change the level to :debug.

Thanks. Additionally, I will drop all parallel inserts in favour of insert_all. That will take me a day or two, perhaps. Then I will use the query-time analysis skills that I learned from you today and see if my problem persists. I will say goodbye for now, but I can’t thank you enough.
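For reference, a minimal sketch of that refactor, assuming a hypothetical MyApp.Repo and made-up field names (the real schemas differ):

```elixir
# Build all rows in memory, then issue a single multi-row INSERT.
# utc_now/0 is truncated because :naive_datetime columns reject
# microsecond precision by default.
now = NaiveDateTime.truncate(NaiveDateTime.utc_now(), :second)

rows =
  for quote <- quotes do
    %{
      coin: quote.coin,
      bid_difference: quote.bid_difference,
      ask_difference: quote.ask_difference,
      inserted_at: now,
      updated_at: now
    }
  end

# One INSERT ... VALUES (...),(...),... instead of N parallel inserts
MyApp.Repo.insert_all("dashboard", rows)
```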

You could also slightly change the query that @benwilson512 recommended, to get more output:

SELECT pid, state, age(clock_timestamp(), query_start), usename, query 
FROM pg_stat_activity 
ORDER BY query_start desc;

It really seems that something is blocking the parallel inserts (table lock?), and they all then complete in milliseconds after this lock is released.

Are you running something periodically every minute or so? Or maybe the delete_all right before the inserts triggers vacuuming or something? It still should not cause such a delay, but TRUNCATE is a lot more efficient than DELETE when deleting all rows from a table, especially if complex indexes are involved (not sure, maybe Ecto already issues a TRUNCATE when using delete_all with no conditions).

Noted. Yes, in the post I did mention I am calling a few APIs and then inserting the results into tables every 2 minutes. I don’t think Ecto has any truncate function, so I am using delete_all.

:wave:

How do you truncate a table in Ecto? Might be useful.
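There’s no dedicated truncate function in Ecto, but you can issue the SQL directly; a sketch, with a made-up repo and table name:

```elixir
# TRUNCATE via raw SQL; RESTART IDENTITY also resets any sequences on the table
Ecto.Adapters.SQL.query!(MyApp.Repo, "TRUNCATE dashboard RESTART IDENTITY", [])
```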


Thanks, but is it better than Repo.delete_all?

It is, but for so few rows this shouldn’t be an issue. Were you ever able to get logs from Phoenix?

What’s interesting is I was going to reply by linking to the source code of Repo.delete_all, since it’s linked in the docs, because I was curious to see what it did under the hood too.

But the link to the source code led to a callback and many other levels of abstraction, to the point where I couldn’t determine what query is even run when Repo.delete_all is executed.

Do you (or anyone) have any tips on how to trace the source code to quickly get to the “real” code being run?

Not really. Ecto.Repo isn’t the implementation; MyApp.Repo is, and its code is put there by a bunch of macro logic in use Ecto.Repo.
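One shortcut that avoids tracing the macros entirely: Ecto can hand you the generated SQL via Ecto.Adapters.SQL.to_sql/3 (repo and schema names here are illustrative):

```elixir
# Returns the {sql, params} tuple the adapter would execute for this operation
{sql, params} = Ecto.Adapters.SQL.to_sql(:delete_all, MyApp.Repo, MyApp.Dashboard)
IO.puts(sql)
```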

Yes, I did. The log files are quite long, so I’m pasting the last few lines.

DELETE FROM "dashboard" AS d0 []
13:15:45.385 [debug] QUERY OK source="coinbasebitbns" db=0.5ms idle=22.1ms
SELECT c0."id", c0."coin", c0."coinbase_quote", c0."bitbns_quote", c0."coinbase_price", c0."bitbns_bid_price", c0."bitbns_ask_price", c0."bid_difference", c0."ask_difference", c0."bitbns_volume", c0."inserted$
13:15:45.386 [debug] QUERY OK source="coinbasewazirx" db=0.4ms idle=22.2ms
SELECT c0."id", c0."coin", c0."coinbase_quote", c0."wazirx_quote", c0."coinbase_price", c0."wazirx_bid_price", c0."wazirx_ask_price", c0."bid_difference", c0."ask_difference", c0."wazirx_volume", c0."inserted$
13:15:45.387 [debug] QUERY OK source="coinbasecoindcx" db=0.4ms idle=21.1ms
SELECT c0."id", c0."coin", c0."coinbase_quote", c0."coindcx_quote", c0."coinbase_price", c0."coindcx_bid_price", c0."coindcx_ask_price", c0."bid_difference", c0."ask_difference", c0."coindcx_volume", c0."inse$
13:15:45.388 [debug] QUERY OK source="coinbasezebpay" db=0.3ms queue=0.1ms idle=20.2ms
SELECT c0."id", c0."coin", c0."coinbase_quote", c0."zebpay_quote", c0."coinbase_price", c0."zebpay_bid_price", c0."zebpay_ask_price", c0."bid_difference", c0."ask_difference", c0."zebpay_volume", c0."inserted$
13:15:45.389 [debug] QUERY OK source="binancebitbns" db=0.9ms idle=19.3ms
SELECT b0."id", b0."coin", b0."binance_quote", b0."bitbns_quote", b0."binance_price", b0."bitbns_bid_price", b0."bitbns_ask_price", b0."bid_difference", b0."ask_difference", b0."bitbns_volume", b0."inserted_a$
13:15:45.403 [debug] QUERY OK db=10.2ms idle=22.4ms
INSERT INTO "dashboard" ("ask_difference","bid_difference","coin","ex1","ex1_price","ex1_quote","ex2","ex2_ask_price","ex2_bid_price","ex2_quote","ex2_volume","inserted_at","updated_at") VALUES ($1,$2,$3,$4,$$

As far as I can see, the DB operations are happening in a few milliseconds. Also, your psql query results are:

  pid  | state  |       age       | usename  | query
 ------+--------+-----------------+----------+------------------------------------------------------------------------
   912 |        |                 | postgres |
   910 |        |                 |          |
   908 |        |                 |          |
   907 |        |                 |          |
   909 |        |                 |          |
  5054 | active | 00:00:00.000853 | postgres | SELECT pid, state, age(clock_timestamp(), query_start), usename, query
       |        |                 |          | FROM pg_stat_activity
       |        |                 |          | ORDER BY query_start desc;
  5024 | idle   | 00:00:00.067315 | tac      | /*pga4dash*/
       |        |                 |          | SELECT 'session_stats' AS chart_name, row_to_json(t) AS chart_data
       |        |                 |          | FROM (SELECT
       |        |                 |          |    (SELECT count(*) FROM pg_stat_activity WHERE datname = (SELECT datname FROM pg_database WHERE oid = 16386)) AS "Total",
       |        |                 |          |    (SELECT count(*) FROM pg_stat_activity WHERE state = 'active' AND datname = (SELECT datname FROM pg_database WHERE oid = 16386)) AS "Active",
       |        |                 |          |    (SELECT count(*) FROM pg_stat_activity WHERE state = 'idle' AND datname = (SELECT datname FROM pg_database WHERE oid = 16386)) AS "Idle"
       |        |                 |          | ) t
       |        |                 |          | UNION ALL
       |        |                 |          | SELECT 'tps_stats' AS chart_name, row_to_json(t) AS chart_data
       |        |                 |          | FROM (SELECT
       |        |                 |          |    (SELECT sum(xact_commit) + sum(xact_rollback) FROM pg_stat_database WHERE datname = (SELECT datname FROM pg_database WHERE oid = 16386)) AS "Transactions",
       |        |                 |          |    (SELECT sum(xact_commit) FROM pg_stat_database WHERE datname = (SELECT datname FROM pg_database WHERE oid = 16386)) AS "Commits",
       |        |                 |          |    (SELECT sum(xact_rollback) FROM pg_stat_database WHERE datname = (SELECT datname FROM pg_database WHERE oid = 16386)) AS "Rollbacks"
       |        |                 |          | )
  2206 | idle   | 00:00:04.780014 | tac      | SELECT c0."id", c0."amount", c0."pair", c0."inserted_at", c0."updated_at" FROM "currencies" AS c0 WHERE (c0."pair" = $1)
  2205 | idle   | 00:00:04.783775 | tac      | DELETE FROM "coinbase" AS c0
  2203 | idle   | 00:00:04.786723 | tac      | INSERT INTO "currencies" ("amount","pair","inserted_at","updated_at") VALUES ($1,$2,$3,$4) ON CONFLICT ("pair") DO UPDATE SET "amount" = EXCLUDED."amount","updated_at" = EXCLUDED."updated_at" RETURNING "id"
  2200 | idle   | 00:02:35.142986 | tac      | SELECT b0."id", b0."coin", b0."binance_quote", b0."bitbns_quote", b0."binance_price", b0."bitbns_bid_price", b0."bitbns_ask_price", b0."bid_difference", b0."ask_difference", b0."bitbns_volume", b0."inserted_at", b0."updated_at" FROM "binancebitbns" AS b0
  2202 | idle   | 00:02:35.144024 | tac      | SELECT c0."id", c0."coin", c0."coinbase_quote", c0."zebpay_quote", c0."coinbase_price", c0."zebpay_bid_price", c0."zebpay_ask_price", c0."bid_difference", c0."ask_difference", c0."zebpay_volume", c0."inserted_at", c0."updated_at" FROM "coinbasezebpay" AS c0
  2201 | idle   | 00:02:35.144974 | tac      | SELECT c0."id", c0."coin", c0."coinbase_quote", c0."coindcx_quote", c0."coinbase_price", c0."coindcx_bid_price", c0."coindcx_ask_price", c0."bid_difference", c0."ask_difference", c0."coindcx_volume", c0."inserted_at", c0."updated_at" FROM "coinbasecoindcx" AS c0
  2199 | idle   | 00:02:35.146087 | tac      | SELECT c0."id", c0."coin", c0."coinbase_quote", c0."wazirx_quote", c0."coinbase_price", c0."wazirx_bid_price", c0."wazirx_ask_price", c0."bid_difference", c0."ask_difference", c0."wazirx_volume", c0."inserted_at", c0."updated_at" FROM "coinbasewazirx" AS c0
  2197 | idle   | 00:02:35.147666 | tac      | SELECT c0."id", c0."coin", c0."coinbase_quote", c0."bitbns_quote", c0."coinbase_price", c0."bitbns_bid_price", c0."bitbns_ask_price", c0."bid_difference", c0."ask_difference", c0."bitbns_volume", c0."inserted_at", c0."updated_at" FROM "coinbasebitbns" AS c0
  2198 | idle   | 00:02:35.150304 | tac      | DELETE FROM "dashboard" AS d0
  2204 | idle   | 00:02:35.15318  | tac      | INSERT INTO "binancebitbns" ("ask_difference","bid_difference","binance_price","binance_quote","bitbns_ask_price","bitbns_bid_price","bitbns_quote","bitbns_volume","coin","inserted_at","updated_at") VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11),($12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22),($23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33),($34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44),($45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55),($56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66),($67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77),($78,$79,$80,$81,$82,$83,$84,$85,$86,$87,$88),($89,$90,$91,$92,$93,$94,$95,$96,$97,$98,$99),($100,$101,$102,$103,$104,$105,$106,$107,$108,$109,$110),($111,$112,$113,$114,$115,$116,$117,$118,$119,$120,$121),($122,$123,$124,$125,$126,$127,$128,$129,$130,$131,$132),($133,$134,$135,$136,$137,$138,$139,$140,$141,$142,$143),($144,$145,$146,$147,$148,$149,$150,$151,$152,$153,$154),($155,$156,$157,$158,$159,$160,$161,$162,$163,$164,$165),($166,$167,$168,$169,$170,$171,$172,$173,$174,$175,$176),($177,$178,$1
  5026 | idle   | 00:07:53.968991 | tac      | SELECT n.nspname, r.relname
       |        |                 |          | FROM pg_class r
       |        |                 |          |     LEFT JOIN pg_namespace n ON (r.relnamespace = n.oid)
       |        |                 |          | WHERE r.oid = 11750;
  5023 | idle   | 00:08:05.233892 | tac      | SELECT
       |        |                 |          |     db.oid as did, db.datname as name, ta.spcname as spcname, db.datallowconn,
       |        |                 |          |     has_database_privilege(db.oid, 'CREATE') as cancreate, datdba as owner
       |        |                 |          | FROM
       |        |                 |          |     pg_database db
       |        |                 |          |     LEFT OUTER JOIN pg_tablespace ta ON db.dattablespace = ta.oid
       |        |                 |          | WHERE db.oid > 13090::OID
       |        |                 |          | ORDER BY datname;
(19 rows)

What I can make out from the above is that all the DB operations are completing quickly, because they are shown as idle. Right?

I also replaced all parallel inserts with insert_all, but to no avail. Still at 100% CPU.

Why do you think age(clock_timestamp(), query_start) means the time taken to complete the transaction? I think it shows the time passed since the query started. I checked the logs, and all my queries get done in a few milliseconds.

I think the query I gave you is too broad, as now you see all the idle queries too, which are not very informative. Maybe something like:

SELECT pid, state, age(clock_timestamp(), query_start), xact_start, waiting, query
FROM pg_stat_activity
WHERE state IN ('active', 'idle in transaction')
ORDER BY query_start desc;

This should show you queries that are either active or stuck in a transaction. It also now shows whether the query is waiting on a lock.

That said, you should also investigate in other directions, as your code could be optimized, but should not cause such a big CPU utilization given the numbers. You said only you are using the app. Do you see other queries being issued to the database, apart from the ones coming from the app? Is there anything else connecting to the same database?
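To check lock waits directly, PostgreSQL 9.6+ also provides pg_blocking_pids; something like this should list any blocked sessions together with the PIDs blocking them:

```sql
-- List sessions currently blocked on a lock, and who blocks them
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```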

Ran your query. Nothing is stuck in a transaction:

 pid  | state  |       age       |          xact_start           |                                   query
------+--------+-----------------+-------------------------------+---------------------------------------------------------------------------
 5054 | active | 00:00:00.000806 | 2020-04-17 14:04:19.192267+00 | SELECT pid, state, age(clock_timestamp(), query_start), xact_start, query+
      |        |                 |                               | FROM pg_stat_activity                                                    +
      |        |                 |                               | WHERE state IN ('active', 'idle in transaction')                         +
      |        |                 |                               | ORDER BY query_start desc;
(1 row)

My app is a pretty simple one: it calls a few APIs and saves the results in the DB (all of this happens in milliseconds), then waits for 2 minutes and repeats. It is the sole app running in my droplet, so this is the only app talking to the DB. Apart from pgAdmin on my client, nothing else is connected.

Also the two postgres PIDs that are consuming 100% CPU are nowhere to be seen in pg_stat_activity

Try again a few times, maybe also adding the waiting column in the SELECT, as I did in my previous post (I edited it), to see if you catch anything. The other day it was reporting the INSERT queries taking a long time, but we did not know if they were waiting on a lock. Also, we were not including queries idle in transaction.

Another question: your first screenshot shows two processes from the postgres user. What are they? Can you check with ps aux | grep postgres (PostgreSQL modifies the command as shown by ps in order to make it easier to identify which process is what)?

Column waiting doesn’t exist in the pg_stat_activity table.
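For reference, PostgreSQL 9.6 removed the boolean `waiting` column in favour of `wait_event_type` and `wait_event`, so an equivalent query on PostgreSQL 11 should be something like:

```sql
SELECT pid, state, wait_event_type, wait_event,
       age(clock_timestamp(), query_start), query
FROM pg_stat_activity
WHERE state IN ('active', 'idle in transaction')
ORDER BY query_start DESC;
```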

I myself don’t know what these two processes are. They are the culprits taking 100% CPU.

And this is the result for ps aux | grep postgres

postgres   902  0.0  2.6 320860 27192 ?        S    10:55   0:00 /usr/lib/postgresql/11/bin/postgres -D /var/lib/postgresql/11/main -c config_file=/etc/postgresql/11/main/postgresql.conf
postgres   907  0.0  1.1 321008 11348 ?        Ss   10:55   0:00 postgres: 11/main: checkpointer
postgres   908  0.0  0.7 320860  7624 ?        Ss   10:55   0:00 postgres: 11/main: background writer
postgres   909  0.0  0.8 320860  8992 ?        Ss   10:55   0:01 postgres: 11/main: walwriter
postgres   910  0.0  0.7 321396  7472 ?        Ss   10:55   0:00 postgres: 11/main: autovacuum launcher
postgres   911  0.0  0.4 176044  4488 ?        Ss   10:55   0:02 postgres: 11/main: stats collector
postgres   912  0.0  0.6 321260  6548 ?        Ss   10:55   0:00 postgres: 11/main: logical replication launcher
postgres  2197  0.0  2.5 327760 26100 ?        Ss   11:01   0:00 postgres: 11/main: tac arbit_prod 127.0.0.1(38499) idle
postgres  2198  0.0  2.6 328856 26684 ?        Ss   11:01   0:00 postgres: 11/main: tac arbit_prod 127.0.0.1(39825) idle
postgres  2199  0.0  2.4 327404 24628 ?        Ss   11:01   0:00 postgres: 11/main: tac arbit_prod 127.0.0.1(43823) idle
postgres  2200  0.0  2.4 327448 25044 ?        Ss   11:01   0:00 postgres: 11/main: tac arbit_prod 127.0.0.1(57623) idle
postgres  2201  0.0  2.5 327596 25592 ?        Ss   11:01   0:00 postgres: 11/main: tac arbit_prod 127.0.0.1(34135) idle
postgres  2202  0.0  2.5 327520 25888 ?        Ss   11:01   0:00 postgres: 11/main: tac arbit_prod 127.0.0.1(50855) idle
postgres  2203  0.0  2.4 327764 25060 ?        Ss   11:01   0:00 postgres: 11/main: tac arbit_prod 127.0.0.1(38189) idle
postgres  2204  0.0  2.5 327608 25328 ?        Ss   11:01   0:00 postgres: 11/main: tac arbit_prod 127.0.0.1(59471) idle
postgres  2205  0.0  2.5 327844 25372 ?        Ss   11:01   0:00 postgres: 11/main: tac arbit_prod 127.0.0.1(60961) idle
postgres  2206  0.0  2.5 327688 25728 ?        Ss   11:01   0:00 postgres: 11/main: tac arbit_prod 127.0.0.1(39093) idle
postgres  2614  0.0  0.0   2256    76 ?        Ss   11:06   0:00 Y5kubC
postgres  3099 92.8 26.4 305032 267012 ?       Ssl  11:25 172:18 OHQ6mh
postgres  5023  0.0  1.6 322132 16820 ?        Ss   13:13   0:00 postgres: 11/main: tac postgres 122.163.228.61(60374) idle
postgres  5024  0.6  2.4 332404 25212 ?        Ss   13:13   0:29 postgres: 11/main: tac arbit_prod 122.163.228.61(9897) idle
postgres  5026  0.0  2.0 324128 20632 ?        Ss   13:13   0:00 postgres: 11/main: tac arbit_prod 122.163.228.61(60435) idle
root      5033  0.0  0.4  64208  4332 pts/0    S    13:13   0:00 sudo -i -u postgres
postgres  5036  0.0  0.4  23116  4844 pts/0    S    13:13   0:00 -bash
postgres  5046  0.0  0.9 118636  9272 pts/0    S+   13:14   0:00 /usr/lib/postgresql/11/bin/psql arbit_prod
postgres  5054  0.0  1.9 329352 19452 ?        Ss   13:14   0:00 postgres: 11/main: postgres arbit_prod [local] idle
tac       6277  0.0  0.1  14856  1144 pts/2    S+   14:30   0:00 grep --color=auto postgres

I can see PID 3099 is present in both, but I don’t know what to make of it.

You can see that the process with PID 3099 is the one using all the CPU. It is not the Postgres master, and also not one of the processes handling client connections from your app (those are the ones with postgres: 11/main: tac arbit_prod). You need to figure out what that process is (the command field set to some random characters, OHQ6mh, is definitely weird).

At this point though, it is very unlikely that the problem is in your Phoenix app logic.

Thanks for stating that my Phoenix app is not the issue. I knew it couldn’t be, but a second opinion does make me feel better. Yes, that weird command doesn’t return any Google results either. I don’t know how I can find out anything about it. When I restart my droplet, CPU is at 1%; then after a few minutes this process appears and takes my CPU to 100%. This happens every time. Should I try reinstalling Postgres and see if it changes anything?

In your investigation, consider that ps or htop is only telling us that this process executes as the postgres user. That does not necessarily imply it is actually related to PostgreSQL. The strange command field makes me suspect it is something else, running as the postgres user.
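A few quick checks that help identify an unknown process via /proc (sketched here against the current shell's own PID; substitute 3099 on the droplet):

```shell
pid=$$   # substitute the suspicious PID, e.g. pid=3099
# Where does the binary live? Miners often run from /tmp or show "(deleted)".
ls -l "/proc/$pid/exe"
# The argv the process reports (NUL-separated, so translate to spaces;
# note a malicious process can forge this, as the random name suggests)
tr '\0' ' ' < "/proc/$pid/cmdline"; echo
# Also worth checking the postgres user's crontab for persistence:
#   crontab -u postgres -l   (as root)
```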

Interesting. Noted. I always create a new user whenever I install postgres anywhere. Standard practice. I never do anything with the postgres user. Anyway, thanks for your insights.