Help filtering many-to-many associations with Ecto

I have an Entry with a many-to-many relationship to Tag. Like so:

defmodule Tag do
  ...
  schema "tags" do
    field :name, :string
  end
end

defmodule Entry do
  ...
  schema "links" do
    ...
    many_to_many :tags, Tag, join_through: "entries_tags",
  end
end

Join table :entries_tags, has
  :entry_id, references(:entries)
  :tag_id, references(:tags)

Now, I want to filter based on a list of tags. I’m using this query

from entry in Entry,
  preload: [:tags],
  distinct: entry.id,
  join: tag in assoc(entry, :tags),
  where: tag.name in ^tags #tags is a list of strings passed as a param

The the query above gives me all the entries that have at least
one tag in the filter list. So for example, if I have 3 entries like so:

  1. Entry.tags = [“a”, “b”, “c”]
  2. Entry.tags = [“a”, “d”]
  3. Entry.tags = [“d”]

and I filter with tags [“a”, “b”], it will match both #1 and #2.

My problem is that I want only to get the entries that have the filter tags as a subset of
their associated tags, meaning that using the same values as in the example above, the
returned list should only contain the entry #1

Any idea how can I construct such a query?

4 Likes

I’d start with Tag and not the entry, select needed tags with OR and then join their entries

2 Likes

Thanks, I tried your approach I started by getting the tags that include the filters:

tags = from tag in Tag, where: tag.name in ^filter_tags

but now I get stuck on how to join with the entries.

Using once more the example from my post, the query above with filters [“a”, “b”] gives me 2 Tags:

Tag.name = "a" (related to entry #1) 
Tag.name = "b" (related to entries #1 and #2)

I cannot just join and get all the related entries to those 2 tags since it would return entries #1 and #2. It should instead only return entry #1 as is the one that belongs to all the filter tags.
Not sure how to check if the entries related to those tags contain all the filter_tags

2 Likes

ok it is indeed somewhat trickier then I thought at first. Here is how it might work, you might have to tweak this logic - I don’t have anything with ecto and many to many, so no code:

  • fetch tags as stated above, you’d have their ids from entries_tags
  • select all entries and join entries_tags
  • for each of the tag ids from the first query add a separate where clause that checks entries_tags for the given tag id

this way you’d only get entries that have all the tags. This thing might also have a simpler solution that I can’t think of at the moment :slight_smile:

2 Likes

Thanks for the help! sorry about the delayed answer, did not get to work on this yesterday (side learning project).

I wasn’t able to make it work, I’m clearly missing something out. I tried doing the following

tags = from tag in Tag, where: tag.name in ^filter_tags

from entry in Entry,
  join: tag in subquery(tags),
  join: entries_tags in "entries_tags", on: entries_tags.link_id == link.id and entries_tags.tag_id == tag.id,
  select: link

but this effectively selects all the entries that have one of those tags. I’m missing a where clause that checks if all the tags are a subset of entry.tags, but I found nothing that would help with this in the Ecto.Query docs. Any idea how would I be able to filter like that?

2 Likes

you’d have to (at least at first) use separate where clauses, they work with and by default. Something like this

from entry in Entry,    
join: entries_tags ...,
# you might reduce over the query to add these dynamically
where: entries_tags.tag_id == ^tag1_id,
where: entries_tags.tag_id == ^tag2_id
select: link
1 Like

Quite probably I misunderstood, but I don’t think that would work, since entries_tags has a reference to a unique tag. So there won’t be any entries_tags that has a reference to both tag1 and tag2 at the same time.

1 Like

I’d probably just do this (I don’t use invisible many-to-many joins, I like explicit tables, so I’ll imagine your entries_tags join table exposed as EntriesTags, I’ll also imagine you have a unique index field as a composite of the entry and tag foreign keys as I always do for many-to-many joins of this form, I would also add a virtual _rank field on Entry, I add that field to almost all my models just for purposes like this since Ecto has no good way to convert a struct return table to a map while adding a column (*hint*hint*, we need that)):

tags = ["a", "b"]

# Grab all entries that have these tags
squery =
  from entrytag in EntriesTags,
  join: tag in Tag, on: entrytag.tag_id == tag.id and tag.name in ^tags,
  join: entry in Entry, on: entrytag.entry_id == entry.id,
  select: %{entry | _rank: fragment("rank() OVER (PARTITION BY ?)", entrytag.tag_id)}

query =
  from s in subquery(squery),
  where: s._rank == ^length(tag), # Then only grab entries that have 'all' the listed tag
  join: tag in assoc(entry, :tags), # Now let's preload the tags in the same query, assuming you want them, or leave out this and the next line
  preload: [:tags]

results = Repo.all(query)

Another alternative is to just aggregate all the tags into an array via array_agg, however that would require adding yet another virtual field on your schema or to manually specify everything you want returned in a map, assuming you do the latter:

tags = ["a", "b"]

# Aggregate all the tags on an entry
squery =
  from entry in Entry,
  join: tag in assoc(entry, :tags),
  group_by: entry.id,
  select: %{
    blah: entry.blah,
    others: entry.others,
    tag_names: fragment("arrray_agg(?)", tag.name),
  }

# Then query that list to grab what you want:
query =
  from s in subquery(squery),
  where: fragment("? <@ ?", ^tags, s.tag_names)
  
  results = Repo.all(query)

Or something like that…

4 Likes

Quite probably I misunderstood, but I don’t think that would work, since entries_tags has a reference to a unique tag. So there won’t be any entries_tags that has a reference to both tag1 and tag2 at the same time.

entries_tags is a join table, right? You have entries with id, tags with id and entries_tags that handles many to many relation? In this case your entries_tags will have unique pairs of ids, eg

entry_id: 1, tag_id: 1
entry_id: 1, tag_id: 2
entry_id: 2, tag_id: 1
...

to fetch entries that have both tags 1 and 2 you’d be able to use multiple where clauses

also what @OvermindDL1 said, if you have EntriesTags model for your join table you can use it directly

1 Like

Thank you @OvermindDL1 ! your second suggestion is what ultimately made it work.
I was trying to avoid to use raw sql as I was sure that there would be a way to accomplish this with Ecto’s DSL, but I guess the option it is there to deal with things that the DSL cannot cover.

My final query looks like so

squery =
  from entry in Entry,
    join: tag in assoc(entry, :tags),
    group_by: entry.id,
    select: %{id: entry.id, tag_names: fragment("array_agg(?)", tag.name)}

from sq in subquery(squery),
  join: entry, on: entry.id == sq.id,
  where: fragment("? <@ ?", ^tags, sq.tag_names),
  select: entry

I had to add the join in the final query to be able to return the entry based on the schema.
Selecting the fields manually as you suggested also worked but it seems tedious and would fail to add any eventual new field, although it probably is faster

I tried to avoid the join by using your suggestion of a virtual field in Entry, and making the query like so:

field :tag_names, {:array, :string}, virtual: true # Entry schema

squery =
  from entry in query,
    join: tag in assoc(entry, :tags),
    group_by: entry.id,
    select: %{entry | tag_names: fragment("array_agg(?)", tag.name)}

from sq in subquery(squery),
  where: fragment("? <@ ?", ^tags, sq.tag_names)

but that raised an error (originated from the second query)
** (Ecto.SubQueryError) the following exception happened when compiling a subquery.

     ** (Ecto.SubQueryError) the following exception happened when compiling a subquery.
     
         ** (FunctionClauseError) no function clause matching in anonymous fn/1 in Ecto.Query.Planner.subquery_fields/2
 ...

Not sure what that error means, I’m guessing due to the related tags field.

3 Likes

Hmm, fantastic question. Can @michalmuskala or so assist? I’ve been hitting a lot of things that Ecto gives that a similar (though not exact) error at this location and I am really curious as to at least the reason for this one…

1 Like

I think something like this should work (not tested):

from entry in Entry,
  preload: [:tags],
  join: tag in assoc(entry, :tags),
  group_by: entry.id,
  having: fragment("? <@ array_agg(?)", ^tags, tag.name)
9 Likes

Your solution worked perfectly as well @michalmuskala but with a much more cleaner query, thank you very much!

2 Likes

@OvermindDL1 I’m trying to do something similar with a many to many relationship but instead of filtering, I want to calculate the virtual field based on other relationships.

I have many to many relationships of User-Transactions-Merchant, where the user has many Merchants through Transactions and the Merchant has many customers through Transactions. Each Transaction has a value, and the sum of transactions associated between a specific user and a specific merchant is the balance for that users relationship with that merchant.

I use this query to calculate the balance to for user based on the merchant

def customer_balance(user_id: user_id, merchant_id: merchant_id) do
  q = from t in Transaction,
    select: fragment("SUM(CASE WHEN ? = 'credit' THEN (?) ELSE - (?) END)", t.type, t.amount, t.amount),
    where: t.user_id == ^user_id and t.merchant_id == ^merchant_id
  balance = Repo.one(q) || 0
  do_balance(balance, "asset")
   |> Money.new(:USD)
end

I’ve been trying to do something like this but I haven’t had success.

squery =
  from u in User,
    join: m in assoc(merchant, :merchants),

from sq in subquery(squery),
  select: %{ sq | balance: fragment("SUM(CASE WHEN ? = 'credit' THEN (?) ELSE - (?) END)", t.type, t.amount, t.amount)},
    where: sq.user_id == ^user_id and sq.merchant_id == ^merchant_id

I think I may may need to preload the transactions prior to the outer query. When I look at just raw SQL nested subqueries, it looks right but I think I’m over thinking it.

In the end I should have a list of users for a specific merchant with the users balance calculated for that merchant. Am I going the right direction here?

What issues or errors? Have you tried to formulate your SQL in raw SQL before converting it to Ecto?

Looks right on a cursory look so far yep?

@OvermindDL1 - @mbuhot answered my question on StackOverflow a few weeks ago. (Thanks @mbuhot !!)

Here is the response:

Move the fragment into a macro to keep the code clear:

  defmacro balance_amount(transaction) do
    quote do
      fragment("CASE WHEN ? = 'credit' THEN (?) ELSE - (?) END",
        unquote(transaction).type, unquote(transaction).amount, unquote(transaction).amount)
    end
  end

Create a subquery with %{user_id, merchant_id, balance}

  def user_merchant_balance do
    from t in Transaction,
    select: %{user_id: t.user_id, merchant_id: t.merchant_id, balance: sum(balance_amount(t))},
    group_by: [t.user_id, t.merchant_id]
  end

Join to the subquery from the main query, use the map update syntax %{|} to populate the virtual field:

  def merchant_customers(merchant_id) do
    from u in User,
    join: b in subquery(user_merchant_balance()), on: u.id == b.user_id,
    where: b.merchant_id == ^merchant_id,
    select: %{u | balance: b.balance}
  end

This is the final code I ended up using:

  @doc """
  Retrieves all customers with balances for a merchant
  """
  @spec find_merchant_customers_with_balance(struct) :: [%MyApp.User{}]
  def find_merchant_customers_with_balance(%{"merchant_id" => id}) do
    q = merchant_customers(id)
    Repo.all(q)
  end

  @doc """
  User balance calcuation macro
  """
  defmacro balance_amount(transaction) do
    quote do
      fragment("CASE WHEN ? = 'debit' THEN (?) ELSE - (?) END",
        unquote(transaction).type, unquote(transaction).amount, unquote(transaction).amount)
    end
  end

  @doc """
  Query for retrieving user balance for merchant
  """
  @spec user_merchant_balance :: %Ecto.Query{}
  def user_merchant_balance do
    from t in Transaction,
    select: %{user_id: t.user_id, merchant_id: t.merchant_id, balance: sum(balance_amount(t))},
    group_by: [t.user_id, t.merchant_id]
  end

  @doc """
  Query for retrieving all the merchants for a customer with abalance
  """
  @spec merchant_customers(number) :: %Ecto.Query{}
  def merchant_customers(merchant_id) do
    from u in User,
    join: b in subquery(user_merchant_balance()), on: u.id == b.user_id,
    where: b.merchant_id == ^merchant_id,
    select: %{u | balance: b.balance}
  end

I still need to figure out if I can get the balance attribute in the returned query returned as a Money.Ecto struct instead of a raw number. Any ideas on how I might be able to do that?

On the customer_balance function I transform it afterward into Money, but that doesn’t return as a User

def customer_balance(user_id: user_id, merchant_id: merchant_id) do
  q = from t in Transaction,
    select: fragment("SUM(CASE WHEN ? = 'credit' THEN (?) ELSE - (?) END)", t.type, t.amount, t.amount),
    where: t.user_id == ^user_id and t.merchant_id == ^merchant_id
  balance = Repo.one(q) || 0
  do_balance(balance, "asset")
   |> Money.new(:USD)
end
2 Likes

Looks like you can populate a virtual field with a custom type in ecto 2.2:

Edit: Not quite, can’t use type and sum together, but another layer of subquery does the trick:

@spec user_merchant_balance :: %Ecto.Query{}
def user_merchant_balance do
  from record in subquery(
    from t in Transaction,
    select: %{user_id: t.user_id, merchant_id: t.merchant_id, balance: sum(balance_amount(t)},
    group_by: [t.user_id, t.merchant_id])
  select: %{user_id: record.user_id, merchant_id: record.merchant_id, balance: type(record.balance, Money.Ecto.Type)}
end