Ecto: Incorrect prefix handling in belongs_to (Postgres schemas)

david_ex · September 15, 2019, 6:41pm

I’m having issues with using Ecto prefixes. The example below doesn’t quite reproduce the exact issue I’m having, but it should be close enough for them to be related.

Let’s say that we want to have some employee records with confidential data in a separate Postgres schema so that it’s only accessible to people with elevated access rights. We’ll use schema prefixes in Ecto to help us deal with this.

Let’s use the following migration:

defmodule EctoDemo.Repo.Migrations.AddEmployeeTables do
  use Ecto.Migration

  def change do
    create table("employees") do
      add(:name, :string, null: false)
    end

    create(index("employees", :name, unique: true))

    execute("CREATE SCHEMA gdpr")

    create table("employees", prefix: "gdpr") do
      add(:employee_record_id, references("employees"))
      add(:social_security_number, :string, null: false)
    end
  end
end

And let’s implement the following models to go with them:

defmodule EctoDemo.Gdpr.Employee do
  use Ecto.Schema

  @schema_prefix "gdpr"

  schema "employees" do
    belongs_to(:main_record, EctoDemo.Employee, foreign_key: :employee_record_id)

    field(:social_security_number, :string)
  end
end



defmodule EctoDemo.Employee do
  use Ecto.Schema

  schema "employees" do
    has_many(:gdpr_employees, EctoDemo.Gdpr.Employee, foreign_key: :employee_record_id)

    field(:name, :string)
  end
end

Let’s now play around in IEx:

iex> alias EctoDemo.Employee
iex> alias EctoDemo.Gdpr.Employee, as: GdprEmployee

iex(5)> Repo.insert_all(Employee, [[name: "John Doe"], [name: "Jane Smith"]])

19:56:29.581 [debug] QUERY OK db=4.3ms decode=5.0ms queue=1.6ms
INSERT INTO "employees" ("name") VALUES ($1),($2) ["John Doe", "Jane Smith"]
{2, nil}
iex(6)> Repo.insert_all(GdprEmployee, [[social_security_number: "111"], [social_security_number: "222"]])       

19:57:50.873 [debug] QUERY OK db=11.4ms queue=2.2ms
INSERT INTO "gdpr"."employees" ("social_security_number") VALUES ($1),($2) ["111", "222"]
{2, nil}

iex(7)> import Ecto.{Changeset, Query}

iex(14)> e = Repo.one(from e in GdprEmployee, where: e.id == 1, preload: :main_record)

20:01:17.982 [debug] QUERY OK source="employees" db=0.9ms queue=0.1ms
SELECT e0."id", e0."employee_record_id", e0."social_security_number" FROM "gdpr"."employees" AS e0 WHERE (e0."id" = 1) []
%EctoDemo.Gdpr.Employee{
  __meta__: #Ecto.Schema.Metadata<:loaded, "gdpr", "employees">,
  employee_record_id: nil,
  id: 1,
  main_record: nil,
  social_security_number: "111"
}
iex(15)> c = change(e, %{})                                                           
#Ecto.Changeset<action: nil, changes: %{}, errors: [],
 data: #EctoDemo.Gdpr.Employee<>, valid?: true>
iex(16)> empl = Repo.one(from e in Employee, where: e.id == 2)    

20:01:58.187 [debug] QUERY OK source="employees" db=1.6ms queue=1.6ms
SELECT e0."id", e0."name" FROM "employees" AS e0 WHERE (e0."id" = 2) []
%EctoDemo.Employee{
  __meta__: #Ecto.Schema.Metadata<:loaded, "employees">,
  gdpr_employees: #Ecto.Association.NotLoaded<association :gdpr_employees is not loaded>,
  id: 2,
  name: "Jane Smith"
}
iex(17)> c = put_assoc(c, :main_record, empl)                 
#Ecto.Changeset<
  action: nil,
  changes: %{
    main_record: #Ecto.Changeset<action: :update, changes: %{}, errors: [],
     data: #EctoDemo.Employee<>, valid?: true>
  },
  errors: [],
  data: #EctoDemo.Gdpr.Employee<>,
  valid?: true
> 
iex(18)> Repo.update!(c)

20:02:32.982 [debug] QUERY OK db=0.8ms queue=0.1ms
begin []
 
20:02:32.992 [debug] QUERY OK db=2.5ms
UPDATE "gdpr"."employees" SET "employee_record_id" = $1 WHERE "id" = $2 [2, 1]
 
20:02:32.997 [debug] QUERY OK db=4.9ms
commit []
%EctoDemo.Gdpr.Employee{
  __meta__: #Ecto.Schema.Metadata<:loaded, "gdpr", "employees">,
  employee_record_id: 2,
  id: 1,
  main_record: %EctoDemo.Employee{
    __meta__: #Ecto.Schema.Metadata<:loaded, "gdpr", "employees">,
    gdpr_employees: #Ecto.Association.NotLoaded<association :gdpr_employees is not loaded>,
    id: 2,
    name: "Jane Smith"
  },
  social_security_number: "111"
}

And here’s the unexpected behavior… This works as expected:

iex(26)> Repo.one(from e in Employee, where: e.id == 2, preload: :gdpr_employees)

20:37:18.966 [debug] QUERY OK source="employees" db=3.4ms queue=0.1ms
SELECT e0."id", e0."name" FROM "employees" AS e0 WHERE (e0."id" = 2) []
 
20:37:18.967 [debug] QUERY OK source="employees" db=0.7ms
SELECT e0."id", e0."employee_record_id", e0."social_security_number", e0."employee_record_id" FROM "gdpr"."employees" AS e0 WHERE (e0."employee_record_id" = $1) ORDER BY e0."employee_record_id" [2]
%EctoDemo.Employee{
  __meta__: #Ecto.Schema.Metadata<:loaded, "employees">,
  gdpr_employees: [
    %EctoDemo.Gdpr.Employee{
      __meta__: #Ecto.Schema.Metadata<:loaded, "gdpr", "employees">,
      employee_record_id: 2,
      id: 1,
      main_record: #Ecto.Association.NotLoaded<association :main_record is not loaded>,
      social_security_number: "111"
    }
  ],
  id: 2,
  name: "Jane Smith" 
}

But this doesn’t:

iex(24)> Repo.one(from g in GdprEmployee, where: g.id == 1, preload: :main_record)

20:11:00.710 [debug] QUERY OK source="employees" db=1.9ms queue=0.1ms
SELECT e0."id", e0."employee_record_id", e0."social_security_number" FROM "gdpr"."employees" AS e0 WHERE (e0."id" = 1) []
 
20:11:00.712 [debug] QUERY ERROR source="employees" db=0.0ms queue=1.7ms
SELECT e0."id", e0."name", e0."id" FROM "gdpr"."employees" AS e0 WHERE (e0."id" = $1) [2]
** (Postgrex.Error) ERROR 42703 (undefined_column) column e0.name does not exist

    query: SELECT e0."id", e0."name", e0."id" FROM "gdpr"."employees" AS e0 WHERE (e0."id" = $1)
    (ecto_sql) lib/ecto/adapters/sql.ex:629: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql) lib/ecto/adapters/sql.ex:562: Ecto.Adapters.SQL.execute/5
    (ecto) lib/ecto/repo/queryable.ex:177: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
    (elixir) lib/enum.ex:1336: Enum."-map/2-lists^map/1-0-"/2

It appears that Ecto isn’t navigating the relationship properly, and is trying to incorrectly load the parent record from the “gdpr” prefix/schema (2nd query).

Is this an Ecto bug? Or am I doing something wrong?

aseigo · September 16, 2019, 8:49am

From the docs:

@schema_prefix - configures the schema prefix. Defaults to nil, which generates structs and queries without prefix. When set, the prefix will be used by every built struct and on queries whenever the schema is used in a from or a join. In PostgreSQL, the prefix is called “SCHEMA” (typically set via Postgres’ search_path). In MySQL the prefix points to databases.

So it will use that prefix when used in a query at runtime as well as generating structs at compile time. I wonder if you explicitly set the @schema_prefix in the EctoDemo.Employee module to “public” if that would force it to look in the correct place even when used from EctoDemo.Employee.GDPR? I don’t know (would have to look into the code) if the prefixes “stack” during queries, with the default nil value causing the schema to be “inherited” in the query, or not, but it would be something simple to try?

david_ex · September 17, 2019, 1:03pm

Unfortunately, I tried that but it doesn’t appear to change anything.

In the mean time, I’ve also discovered that there’s unexpected behavior when handling foreign key constraints across schema prefixes. I’ve opened an issue in Ecto for this (https://github.com/elixir-ecto/ecto/issues/3112) and have create a small project reproducing the problem (https://github.com/davidsulc/ecto_demo). Feel free to check out the project’s readme if this intrigues you