How to bulk_update a table with dynamic values

Hello all,

Recently I created a migration that adds a new association to a table. So to ensure that all the data I already have(without that association) has that column correctly filled, I want to do a bulk update.

This is the query I have so far:

from(f in Folder,
      join: w in assoc(f, :workspace),
      where:
        is_nil(f.workspace_id) == true and f.organization_id == w.organization_id and w.default == true,
      update: [set: [workspace_id: w.id]]
    )

So what I want to do, is update all my rows in the folder table, where its workspace_id column is nil. And I want to update it with the id from the matching row in the workspace table, where both have the same organization_id and the column default in the workspace table is set to true.
I have looked through the documentation and a bunch of posts but I don’t know what I’m missing.

Thanks in advance for any help.

Are you aware of this resource?

2 Likes

I’m not sure either - the query you posted looks like it would do what you’re describing. What happens when you run it? Does Ecto generate the correct SQL?

1 Like

Hi,

shouldn’t we find a left_join instead of join on your code?

Also, assoc won’t work (I mean the join condition - f.workspace_id == w.id - will be false - because f.workspace_id are currently null) if you try to set the very same foreign key on which it relies on. You’ll need to explicit the join with:

from(
  f in Folder,
  join: w in Workspace,
    on: f.organization_id == w.organization_id,
    on: w.default,
  where: is_nil(f.workspace_id),
  update: [
    set: [
      workspace_id: w.id,
    ]
  ]
)

?

1 Like

Thanks a lot for your answer. My query was wrong. But now if I do it that way, it won’t actually update the data I want.

Repo.update_all(from(
      f in Folder,
      left_join: w in Workspace,
      on: f.organization_id == w.organization_id and w.default,
      where: is_nil(f.workspace_id)), [set: [workspace_id: w.id]])

If I try it that way, it will complain that w does not exist and is being expanded to w()

I was not aware of that resource. Lots of great information! Thanks a lot for sharing

Re-punctuating your query can help clarify the situation:

query =
  from(
    f in Folder,
    left_join: w in Workspace,
    on: f.organization_id == w.organization_id and w.default,
    where: is_nil(f.workspace_id)
  )

Repo.update_all(query, [set: [workspace_id: w.id]])

This makes the error message about w clearer: the binding is out-of-scope.

This is a situation where Ecto.Query.update helps:

query =
  from(
    f in Folder,
    left_join: w in Workspace,
    on: f.organization_id == w.organization_id and w.default,
    where: is_nil(f.workspace_id),
    update: [set: [workspace_id: w.id]]
  )

Repo.update_all(query, [])
2 Likes

Thanks again for your answer. I got this error with your solution:
** (Ecto.QueryError) PostgreSQL supports only inner joins on update_all, got: left in query

I’ll study to see how I can make this query with inner join and post the solution here.

You can do the inner join on organization_id and add the condition and is_nil(f.workspace_id).

If I understand your requirement correctly then you actually want an inner join because you’re only updating when there is a match on organization_id.

1 Like

al2o3cr, cmo, joey_the_snake, t0t0
Thanks a lot for all your help. The code bellow solved my problem:

query =
      from(
        f in Folder,
        inner_join: w in Workspace,
        on: f.organization_id == w.organization_id and w.default,
        where: is_nil(f.workspace_id),
        update: [set: [workspace_id: w.id]]
      )

    Repo.update_all(query, [])

I still have one doubt, is there any difference between where and on? Because the query I got back from Postgresql was the same for both.

There’s no major difference between where and on for inner joins - the DB might do different optimizations, but the resulting rows are ultimately the same.

This isn’t the case for left/outer joins. For instance, imagine you have a schema where Post has many Comments and each comment has an author_id. This query:

from(
  p in Post,
  left_join: c in assoc(p, :comments),
  where: c.author_id = 123
)

will have a result row for every Post that has a comment with the specified author_id. Posts that don’t have a comment from that author will be filtered out.

from(
  p in Post,
  left_join: c in assoc(p, :comments),
  on: c.author_id = 123
)

This produces an ON clause with an AND, as in your example. This produces at least row in the results for every Post row; posts that don’t have comments by the specified author have all NULLs in the columns from comments.

Combining a left join, on and where can be useful to produce “nonexistence” queries. To get only posts that don’t have a comment from a specified author, you could use this query:

from(
  p in Post,
  left_join: c in assoc(p, :comments),
  on: c.author_id = 123
  where: is_nil(c.id)
)
3 Likes