Here is the query I’m using to fetch news with some tags:
query =
from(n in News,
join: t in assoc(n, :tags),
where: like(t.name, ^"%#{tag}%"),
order_by: [desc: n.published_at],
limit: ^limit,
preload: [:tags]
)
Repo.all(query)
I don’t know if I’m doing something wrong or not. But the result contains duplicate result. I’m using binary IDs and checking result of query, the duplicates are really duplicates with the same ID.
I also googled, and found some issues on Github but they seemed unrelated to my quite simple query.
Any help is appreciated.
EDIT: I just noticed that some result are repeated 3 times.
EDIT2: If you are here because you have the same problem, read until the end of topic. It contains some good insights.
When this happens to me, I usually assume that something fishy is going on with a join somewhere… If you’ve got multiple matches (eg. a news article matches multiple tags), you’ll get a row per match.
If that’s NOT the case… Well, then it’s harder to say without seeing more of the code
This is because you are selecting all new and it seems that every news row can have multiple tags.
This means that if one news article has 3 tags, ot will be 3 rows.
So, I think it might be as simple as remove the join and just use preload. That should spawn 2 queries. One for getting all the news, and one for getting all tags belonging to the list of news ids.
I can confirm that this works. Just tested. However, I don’t like it since the query, if you run it in a sql-terminal returns duplicate rows.
I would have gone with an EXISTS-subquery
This surely was the solution to dedup results, but another thing happens. The limit in query is not applied as one would think.
The query will return 10 rows for example, but then Ecto dedups them and makes the number smaller.
Also, when doing preload: [:tags] Ecto performs two queries, while using preload: [tags: t] results in one query.
I think I need to refactor my query.
That was what I meant with that I don’t like the solution. You can’t use it with limit and offsets (pagination) since Ecto does magic behind the scenes. I would instead to:
where: fragment("EXISTS(SELECT 1 FROM tags WHERE tags.news_id = n0.id AND tags.name = ?)", ^"%#{tag}%")
Right I mean this is how the raw SQL works too. Limit 10 wont’ do 10 tags per news item, it limits the total result set to 10 rows. Ecto’s querying is basically 1:1 with the underlying SQL.
And that is why you should avoid join on a has_many relationship. That is almost never the outcome you want. In that case, you would query from tags and join to news.
If I understand correctly, you mean to perform two separate queries, first find tags with some criteria, and then use their IDs to select the news in another query.
This would make 3 queries in my case (many-to-many).
I am currently trying to implement the exists with an Ecto subquery, but it is failing (maybe obviously).
Yes, thanks. I already got that working. Here is what is working for me now.
query =
from(n in News,
where: fragment("EXISTS(select id from tags as t join news_tag nt on t.id = nt.tag_id where nt.news_id = n0.id and t.name like ?)", ^"%#{tag}%"),
order_by: [desc: n.published_at],
limit: ^limit,
preload: [:tags]
)
I already marked the solution, the conversation showed me that I was doing something wrong. I think the marked solution is the answer of the topic. I also added a note to first post to show that this topic contains more information.