Select rows which are not present in another table

I have two tables, “questions” and “test_questions”:

schema "questions" do
  has_one    :test_question,  TestQuestion
  field      :question_type,  :string, null: false
  field      :body,           :string, null: false
end

schema "test_questions" do
  belongs_to :test,           Test
  belongs_to :question,       Question
  field      :position,       :integer, null: false # question no. in the test
end

Is there any efficient way to retrieve all questions that are not test questions? I tried something like the following, according to a StackOverflow answer (https://stackoverflow.com/a/19364694):

Repo.all(from q in Question,
  where: q.question_type == "long question"
  and not ^Repo.exists?(from tq in TestQuestion, where: tq.question_id == q.id),
  limit: 20
)

But I got the compilation error that q is an unbounded variable. I also tried to change q.id to ^q.id, but the compiler complained that q/0 is an undefined function.

Edit. It seems that the following query works:

Repo.all(from q in Question,
  left_join: tq in assoc(q, :test_question),
  where: q.question_type == "long"
  and is_nil(tq.id),
  limit: 20
)

but I’m not sure if this is really a correct or good way to solve the problem.

This can be accomplished by a RIGHT JOIN if I’m not mistaken.

1 Like

The issue with your implementation is that it will execute the subquery for every question. This quickly gets very slow when you start scaling.

Currently on mobile so its hard to write code, but your best bet is probably to do an outer join from question to test_question based on question_id. After that outer join you can use a WHERE to check which rows have tq.id IS NULL.

Hopefully thats clear :slight_smile:

Edit: Sfusato’s version is definitely the way to go

Thanks, but I don’t understand what A.Key and B.Key should be. The “test_questions” table essentially contains pointers to the test and the question. Its schema has only three entries:

schema "test_questions" do
  belongs_to :test,     Test
  belongs_to :question, Question
  field      :position, :integer, null: false # question no. in the test, not question ID
end

The only relevant field that can be used as the key is question_id, which is never null.

The “test_questions” table references the “questions” table, but not vice versa. The “test_questions” table essentially contains pointers to tests and questions. At present, I can retrieve all test questions first, and perform another query to obtain all questions whose IDs do not appear on the test questions list:

ids = Repo.all(from tq in TestQuestion, select: tq.question_id)

questions = Repo.all(from q in Question,
  where: q.question_type == "long question"
  and q.id not in ^ids,
  limit: 20
)

but this is too inefficient.

A is your test_questions table
B is questions table

When you right join the tables, if there’s a question that doesn’t appear in test_questions, then test_questions.question_id in the “computed join table” will be null. This is the whole idea.

Try this:

from tq in TestQuestion,
  right_join: q in Question,
  on: tq.question_id == q.id,
  where: is_nil(tq.question_id)

ps: w3schools has some examples and you can also play with the queries.

3 Likes