Looking for reviewers: a howto guide on associations

peerreynders · June 26, 2019, 11:51am

Is at the core of the problem:

Enum.at(Regex.run(~r{(.*)\.([^\.]*)}, p.name), 2)

isn’t something that Ecto can “translate” so that the database can do it (i.e. the database can’t run Elixir code).

PostgreSQL does have regex functionality

=# SELECT substring('2018.0044.2' from '\d+$');
 substring 
-----------
 2
(1 row)
=# SELECT substring('2018.0044.2' from '^\d+\.\d+');
 substring 
-----------
 2018.0044
(1 row)

set expects a value. I don’t know if set will accept a fragment (example).

But the idea is to give Ecto the information it needs so that it can generate a single update statement the database can understand that ends up being something like:

UPDATE plants SET
code = substring(name from '\d+$'),
accession_id = id;

Another option is to just execute the raw SQL.

iex(1)> query =
...(1)>   from(p in "plants",
...(1)>     update: [set: [code: fragment("substring(? from \'\d+$\')", p.name)]],
...(1)>     update: [set: [accession_id: p.id]]
...(1)>   )
#Ecto.Query<from p0 in "plants",
 update: [set: [code: fragment("substring(? from '\d+$')", p0.name)]],
 update: [set: [accession_id: p0.id]]>
iex(2)> Ecto.Adapters.SQL.to_sql(:update_all, Repo, query)
{"UPDATE \"plants\" AS p0 SET \"code\" = substring(p0.\"name\" from '\d+$'), \"accession_id\" = p0.\"id\"",
 []}
iex(3)

Filling the accessions table probably looks something like this

INSERT INTO accessions (id, bought_on, bought_from, code)
(SELECT p.id, p.bought_on, p.bought_from, substring(p.name from '^\d+\.\d+') FROM plants AS p
INNER JOIN (SELECT min(t.id) AS id FROM plants AS t GROUP BY substring(t.name from '^\d+\.\d+')) AS u ON p.id = u.id);

I’m not aware of a way to “build” this type of query from Ecto, so execute and query/4 may be options.

Alternately you could use a query to create the structures you need to run in a separate insert:

iex(1)> sub = from(t in "plants", select: %{id: min(t.id)}, group_by: fragment("substring(? from \'^\d+\.\d+\')", t.name))
#Ecto.Query<from p0 in "plants",
 group_by: [fragment("substring(? from '^\d+.\d+')", p0.name)],
 select: %{id: min(p0.id)}>
iex(2)> query =
...(2)>   from(p in "plants",
...(2)>     select: %{id: p.id, bought_on: p.bought_on, bought_from: p.bought_from, code: fragment("substring(? from \'^\d+\.\d+\')", p.name)},
...(2)>     join: u in subquery(sub), on: u.id == p.id 
...(2)> )
#Ecto.Query<from p0 in "plants",
 join: p1 in subquery(from p0 in "plants",
  group_by: [fragment("substring(? from '^\d+.\d+')", p0.name)],
  select: %{id: min(p0.id)}),
 on: p1.id == p0.id,
 select: %{id: p0.id, bought_on: p0.bought_on, bought_from: p0.bought_from, code: fragment("substring(? from '^\d+.\d+')", p0.name)}>
iex(3)> Ecto.Adapters.SQL.to_sql(:all, Repo, query)
{"SELECT p0.\"id\", p0.\"bought_on\", p0.\"bought_from\", substring(p0.\"name\" from '^\d+.\d+') FROM \"plants\" AS p0 INNER JOIN (SELECT min(p0.\"id\") AS \"id\" FROM \"plants\" AS p0 GROUP BY substring(p0.\"name\" from '^\d+.\d+')) AS s1 ON s1.\"id\" = p0.\"id\"",
 []}
iex(4)>

As such your migration plan still doesn’t quite work because of the accession_id. You should probably

First generate the accessions records from the plants table, auto generating the ids in the accessions table
Populate the plants.accession_id by performing a join between accessions.code and substring(plants.name from '^\d+\.\d+'))
Populate plants.code from substring(plants.name from '\d+$'))