Programming Phoenix 1.4, Writing Queries with Keyword Syntax

WestKeys · March 8, 2021, 2:42pm

In Chapter 7 - Diving Deeper into Ecto Queries - Writing Queries with Keyword Syntax, we have:

Repo.one from u in User,
         select: count(u.id),
         where: ilike(u.username, "j%") or
                ilike(u.username, "c%")

Is there any reason in particular we are using count(u.id) rather than count(u)?

cenotaph · March 8, 2021, 3:41pm

basically 1:1 translation from SQL

When you work with multiple tables you need to shorthand them to specify the table if you don’t want to write the full name every time.

select u.* from users u;

translated here with really useful scopes

edit: let me get into details count(u) would ask to count instances of users table in sql, such statement doesn’t exist in SQL of course.

WestKeys · March 8, 2021, 3:58pm

Sorry I’m not following. I think you’re answering the question "what’s the purpose of u" ? I’m asking why count ids rather than just the rows

cenotaph · March 8, 2021, 4:01pm

u is not the row in that instance, as I have said it is the table if you had to map.

when you execute select count(*) you can but you should not in raw sql.

WestKeys · March 8, 2021, 4:11pm

Ok, so this is misleading?

Programming Phoenix 1.4 - Chapter 7 - Associating Videos and Categories:

iex> query = from c in Category,
…> select: c.name
iex> Repo.all query

First, we create a query. In this case:

from is a macro that builds a query.
c in Category means we’re pulling rows (labeled c) from the Category schema.
select: c.name means we’re going to return only the name field.

Why not?

cenotaph · March 8, 2021, 4:20pm

That is the author’s description and metaphor, more fitting for an ActiveRecord ORM… I don’t think I am gonna further comment on that.

because you might have BLOBs TEXTS in your columns or have 1000s of columns in a table, why load all into memory to count 1 of it?

you want to use the primary key to count the instance as quick as possible with minimal footprint.

WestKeys · March 8, 2021, 4:24pm

Pretty sure the author is the author of ecto

Makes perfect sense (Edit : see below)

WestKeys · March 8, 2021, 4:27pm

Although I am surprised mapping layers or even db engines aren’t intelligent enough to infer that a count on a table does not require loading entire row into memory…

WestKeys · March 8, 2021, 5:04pm

Decided to do some quick research and it appears count(*) does not, in fact, load all the columns into memory

PostgreSQL Link 1

The “*” in SELECT * FROM ... is expanded to all columns. Consequently, many people think that using count(*) is inefficient and should be written count(id) or count(1) instead.

But the “*” in count(*) is quite different, it just means “row” and is not expanded at all (actually, that is a “zero-argument aggregate”). Writing count(1) or count(id) are actually slower than count(*), because they have to test if the argument IS NULL or not (count, like most aggregates, ignores NULL arguments).

PostgreSQL Link 2

A note about count(1) vs count(*). One might think that count(1) would be faster because count(*) appears to consult the data for a whole row. However the opposite is true. The star symbol is meaningless here, unlike its use in SELECT *. PostgreSQL parses The expression count(*) as a special case taking no arguments. (Historically the expression ought to have been defined as count().) On the other hand count(1) takes an argument and PostgreSQL has to check at every row to see that ts argument, 1, is indeed still not NULL.

SQL Server

For the other two queries, SQL Server would use the following rule. To perform a query like SELECT COUNT(*), SQL Server will use the narrowest non-clustered index to count the rows. If the table does not have any non-clustered index, it will have to scan the table.

MySQL

InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index.

I’m def no db expert, just trying to understand the rationale behind count(u.id) rather than count(u)

cenotaph · March 8, 2021, 5:20pm

aha wow all dbs have taken precautions against our errors, that’s great to know

WestKeys · March 8, 2021, 5:40pm

Not really, why would a db engine load every column in a row to count number of rows in a table? I’d be surprised if count(*) was ever implemented this way