Calculating Length: Postgres or Elixir?

If I want to count the records in a table which is more efficient as records scale? ie:

count = from(p in Post, select: count(p.id))

vs

posts = from(p in Post, select: p.id)
count = length(posts)

Allowing postgres to do the calculations or elixir? Is there any notable difference?

It will be more efficient to do the count in postgres. If you do the length evaluation in Elixir you have to transfer all those ids over, if you do the count on postgres you don’t.

6 Likes

At what point would it begin to matter? Thousands of records?

Doing it in Postgres has to be faster, but COUNT is not a cheap operation in Postgres.

3 Likes

I mean, it isn’t worse than SELECT over the same range of data.

You’ll need to benchmark your use case because what matters depends on the use case. For less than a thousand records performance doesn’t tend to matter, for more than a thousand it can start to.

5 Likes

I would suggest you try and measure it yourself.

Right, it is not worse than SELECT :slight_smile:

If you want to consider benchmarking, I would recommend you consider the following library:

Here is a tutorial on how to use it:

2 Likes

Let do a simple experiment. Assume that you have 1 Gbps connection between your DB and application, row has BIGSERIAL id (8 bytes). Each row in the Postgres wire protocol has 7 + 4x(fields count) bytes overhead, which mean that each row in our example will be 7 + 4 + 8 = 51 bytes (aka 408 bits) long on the wire. This in the end mean, that we will be able to send at most 2631720 rows per second. To that add the fact that BEAM need to traverse whole list to count number of the elements in the table, so in the end, you are traversing list of all rows twice (once in the DB to fetch and send them) and then second time in your application to count number of elements. So in the end, COUNT(*) in the DB will always be faster, especially if you have WHERE somewhere in your query which will allow Postgres to use index scan instead of table scan.

4 Likes

Postgres will always be more efficient but I thought this was the recommended way of counting in Ecto:

Repo.aggregate(Post, :count, :id)
2 Likes

Just important note - do not run COUNT(*) queries without WHERE as this requires table scan.

3 Likes