Calculating Length: Postgres or Elixir?

Codball · November 28, 2019, 4:01am

If I want to count the records in a table which is more efficient as records scale? ie:

count = from(p in Post, select: count(p.id))

vs

posts = from(p in Post, select: p.id)
count = length(posts)

Allowing postgres to do the calculations or elixir? Is there any notable difference?

benwilson512 · November 28, 2019, 4:03am

It will be more efficient to do the count in postgres. If you do the length evaluation in Elixir you have to transfer all those ids over, if you do the count on postgres you don’t.

Codball · November 28, 2019, 4:04am

At what point would it begin to matter? Thousands of records?

kwando · November 28, 2019, 4:11am

Doing it in Postgres has to be faster, but COUNT is not a cheap operation in Postgres.

benwilson512 · November 28, 2019, 4:14am

I mean, it isn’t worse than SELECT over the same range of data.

You’ll need to benchmark your use case because what matters depends on the use case. For less than a thousand records performance doesn’t tend to matter, for more than a thousand it can start to.

kwando · November 28, 2019, 4:14am

I would suggest you try and measure it yourself.

kwando · November 28, 2019, 4:17am

Right, it is not worse than SELECT

Fl4m3Ph03n1x · November 28, 2019, 10:13am

If you want to consider benchmarking, I would recommend you consider the following library:

Here is a tutorial on how to use it:

hauleth · November 28, 2019, 11:59am

Let do a simple experiment. Assume that you have 1 Gbps connection between your DB and application, row has BIGSERIAL id (8 bytes). Each row in the Postgres wire protocol has 7 + 4x(fields count) bytes overhead, which mean that each row in our example will be 7 + 4 + 8 = 51 bytes (aka 408 bits) long on the wire. This in the end mean, that we will be able to send at most 2631720 rows per second. To that add the fact that BEAM need to traverse whole list to count number of the elements in the table, so in the end, you are traversing list of all rows twice (once in the DB to fetch and send them) and then second time in your application to count number of elements. So in the end, COUNT(*) in the DB will always be faster, especially if you have WHERE somewhere in your query which will allow Postgres to use index scan instead of table scan.

dimitarvp · December 1, 2019, 9:36am

Postgres will always be more efficient but I thought this was the recommended way of counting in Ecto:

Repo.aggregate(Post, :count, :id)

hauleth · December 1, 2019, 1:48pm

Just important note - do not run COUNT(*) queries without WHERE as this requires table scan.