Odd binary id mismatch

revati · June 17, 2018, 9:19am

Hello, i have fallowing code in my tests:

source = %MyModule{} |> Repo.insert!()
new = Repo.get!(MyModule, source.id)

assert new.id == source.id

but oddly enough it throws an error, as you can see ids are slightly different.

 Assertion with == failed
 code:  assert new.id() == source.id()
 left:  "3f3fd9a2-3f69-463e-3f3f-153f34003f3f"
 right: "85c9d9a2-a969-463e-a8d5-15f834009ebc"

MyModule schema uses following base schema

defmodule Stats.Schema do
  defmacro __using__(_) do
    quote do
      use Ecto.Schema
      @primary_key {:id, :binary_id, autogenerate: true}
      @foreign_key_type :binary_id
    end
  end
end

How come ecto can find the row in DB by id, but later when comparing those ids, they mismatch?
is it some binary encoding problem?

Im use ecto: 2.2.10, mariaex: 0.8.4, mysql: 5.7.20

Thanks

jeremyjh · June 17, 2018, 1:58pm

Can you please provide the actual schema definition and migration file?

peerreynders · June 17, 2018, 3:17pm

Just an aside; possibly relevant:

1:

MySQL does not support UUID types. Ecto emulates them by using binary(16).

2:

Because MySQL does not support RETURNING clauses in INSERT and UPDATE, it does not support the :read_after_writes option of Ecto.Schema.field/3.

3: TIL: Ecto reading after writes:

by default, Ecto doesn’t read data back from the database, after writing new or updated data.

5.7.1 was released 2013-04-23. According to GUID/UUID Performance Breakthrough

This blog is mostly eliminated in MySQL 8.0

i.e. pre-8.0 UUIDs could be problematic/high maintenance.

See also:

+1

revati · June 18, 2018, 8:53am

Migration:

defmodule Stats.Repo.Migrations.CreateSourcesTable do
  use Ecto.Migration

  def change do
    create table(:sources) do
      add :nth, :integer, null: false
      add :url, :string, null: false
      add :contents, :text
    end

    create index("sources", [:nth], unique: true)
    create index("sources", [:url], unique: true)
  end
end

Schema:

defmodule Stats.Source do
  use Stats.Schema
  import Ecto.Changeset

  schema "sources" do
    has_many(:meetings, Stats.Meeting)

    field(:nth, :integer)
    field(:url, :string)
    field(:contents, :string)
  end
end

peerreynders · June 18, 2018, 1:40pm

revati:

defmodule Stats.Repo.Migrations.CreateSourcesTable do
  use Ecto.Migration

  def change do
    create table(:sources) do
      add :nth, :integer, null: false
      add :url, :string, null: false
      add :contents, :text
    end

    create index("sources", [:nth], unique: true)
    create index("sources", [:url], unique: true)
  end
end

I’m guessing that the above simply, implicitly adds a

field :id, :id, primary_key: true

In the SQL shell, do a DESCRIBE sources; and see what the type of the id column actually is. Given Ecto’s documentation I would expect a binary(16) for a UUID column - a standard :id will be some kind of integer.

If the id column is the wrong type, I’d start with

defmodule Stats.Repo.Migrations.CreateSourcesTable do
  use Ecto.Migration

  def change do
    create table(:sources, primary_key: false) do
      add :id, :binary_id, primary_key: true
      add :nth, :integer, null: false
      add :url, :string, null: false
      add :contents, :text
    end

    create index("sources", [:nth], unique: true)
    create index("sources", [:url], unique: true)
  end
end

and go from there - i.e. see if it works or if it simply changes the problem.

defmodule Stats.Schema do
  defmacro __using__(_) do
    quote do
      use Ecto.Schema
      @primary_key {:id, :binary_id, autogenerate: true}
      @foreign_key_type :binary_id
    end
  end
end

only has an effect on Schemas that use it - it doesn’t impact the migration at all - so there needs to be some kind of hint in the migration that the table is dealing with a :binary_id instead of a :id type.

revati · June 18, 2018, 1:54pm

i completely forget to add mysql schema. sorry.

CREATE TABLE `sources` (
  `id` binary(16) NOT NULL,
  `nth` int(11) NOT NULL,
  `url` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `contents` text COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`id`),
  UNIQUE KEY `sources_nth_index` (`nth`),
  UNIQUE KEY `sources_url_index` (`url`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

in my config i also have:

config :stats, Stats.Repo, migration_primary_key: [id: :uuid, type: :binary_id]

peerreynders · June 18, 2018, 2:35pm

Have your tried:

  source = %MyModule{} |> Repo.insert!()
  new = Repo.get!(MyModule, source.id)
  new2 = Repo.get!(MyModule, new.id)

  assert new2.id == new.id

What is not clear is whether new actually originates from the same record that was created by the previous Repo.insert!. If there are other records in the table, new could be an entirely different record.

Some experiments with iEX and the SQL shell could shed some light on that:

after the insert - is there a record with the new UUID (e.g. 3f3fd9a2-3f69-463e-3f3f-153f34003f3f)
after the retrieve - is there a record with the different UUID (e.g. 85c9d9a2-a969-463e-a8d5-15f834009ebc)
does the remainder of the retrieved record match the former or the latter record content?

This was never resolved:

revati · June 21, 2018, 4:05pm

What is not clear is whether new actually originates from the same record that was created by the previous Repo.insert!. If there are other records in the table, new could be an entirely different record.

Didn’t quite catch. Its internal Ecto function Repo.get! which retrieves a new record by id. It isn’t in any way tied to the previous source record. The table is empty before that and id column is a primary index (must be unique), so in either case, there couldn’t be duplicate ids in the table.

source = %MyModule{} |> Repo.insert!()
new = Repo.get!(MyModule, source.id)
new2 = Repo.get!(MyModule, new.id)

assert new2.id == new.id

tried and this works fine

revati · June 21, 2018, 4:06pm

I updated MySQL to 8.0.11 version, but there still is a problem

revati · June 22, 2018, 10:18am

github.com/elixir-ecto/ecto

binary id odd bug

opened 10:20AM - 22 Jun 18 UTC

closed 12:22PM - 11 Jul 18 UTC

revati

### Environment **Elixir version (elixir -v):** Erlang/OTP 20 [erts-9.2] [so…urce] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false] [dtrace] Elixir 1.6.1 (compiled with OTP 20) **Database and version (PostgreSQL 9.4, MongoDB 3.2, etc.):** MySQL 5.7.22 and 8.0.11 **Ecto version (mix deps):** ecto 2.2.10 (Hex package) (mix) locked at 2.2.10 (ecto) e7366dc8 **Database adapter and version (mix deps):** mariaex 0.8.4 (Hex package) (mix) locked at 0.8.4 (mariaex) 5dd42a60 **Operating system:** Mac OS 10.13.5 (17F77) ### Current behavior There are 2 problems: * Ids mismatch when inserting a new row * I can fetch single row by id using two different ids. ```elixir entity = %MyModule{} |> Repo.insert! entity.id # 38b128b1-ca17-4570-a027-01c81a595db4 new = Repo.get! MyModule, entity.id new.id # 383f283f-3f17-4570-3f27-013f1a595d3f entity.id == new.id # false # I can even do id = Ecto.UUID.generate # 2065b985-6b29-4511-b826-91c5a365f254 entity = %MyModule{id: id} |> Repo.insert! entity.id # 2065b985-6b29-4511-b826-91c5a365f254 new = Repo.get! MyModule, id new.id # 383f283f-3f17-4570-3f27-013f1a595d3f entity.id == new.id # false entity.id == id # true new.id == id # false new2 = Repo.get! MyModule, new.id new2.id == new.id # true ``` Somehow when I insert a new row, id in mysql table is switched to different one. Somehow I still am able to query that row by both ids: initial one and switched one. in all examples `entity`, `new` and `new2` is the same row in DB, as only one row in DB exists. ### Expected behavior ```elixir entity = %MyModule{} |> Repo.insert! new = Repo.get! MyModule, entity.id entity.id == new.id # true ``` I expect when I insert new row either with or without id after `entity = %MyModule{} |> Repo.insert!` `entity` will hold actuall id, that is in db. if this uuid is passed to me from other system (api consumer) i must save row with that id. Other thing i expect is that only one uuid can be used to fetch row from db by id.

peerreynders · June 22, 2018, 1:11pm

please let us know the result of Repo.all

I’d go even further. Once you insert a record in iEX, use the MySQL shell to show you what value in the id column actually is. Something like:

SELECT HEX(id), nth, url, contents FROM sources;

That really is the only way to know what was actually written to the database as the UUID data can just as easily be compromised on the Repo.all return trip.

The table is empty before that and id column is a primary index (must be unique), so in either case, there couldn’t be duplicate ids in the table.

The fundamental issue here is that you have shown that in your environment the result of the query cannot necessarily be trusted. Given that there is only one record in the table, multiple distinct IDs seem to be returning the same record which can only have one ID. So there could be a whole set of ID values capable of selecting this one record.

You write the record with ID A and you retrieve with ID A something with ID B.

If the query works then ID A is in the database and ID B is a version of ID A that is somehow corrupted on the way back from the database.
If the query works - ID A could be corrupted before being written as ID B to the database. ID A could be corrupted in a similar fashion during the query becoming ID B and then retrieving the data of the ID B record.
If the query doesn’t work it simply returns the only row in the table. While ID B is different from ID A it’s not clear whether ID A was corrupted before writing it to the database or after reading it from the database.

So at this point superficially it looks like ID B is in the database but it could be corrupted on read, so ID A could still exist in the database.

Now you retrieve with ID B and get something with ID B.

If you believe the query works and take the previous observations into account then you have to believe the whatever corrupted ID A into ID B doesn’t change the value of ID B - allowing the query to actually seem to work the way it should if ID B is actually in the database.
If you are open to the possibility that the query simply returned the only record in the table then you can’t have any confidence in the ID value being retrieved as it is clearly being corrupted somewhere - before writing, after reading, both?

Given that it seems that the ID value is being corrupted somewhere, there is the possibility that sometimes ID related queries will not work (depending on the nature of the corruption).

To narrow this down you should create a minimal ecto-only project with a single table in a new database with an UUID column and a second (string) column for information that lets you uniquely identify each record created for inspection.

Set up a unit test similar to the one you have. You may have to run the test repeatedly until it fails (or some max is reached) - sometimes these type of defects are intermittent.

Once you have a setup that consistently fails in your environment, put the project up on GitHub and reference the repository in the issue. That way somebody else can easily try it in their (MySQL) environment to see if the problem behaviour persists.

revati · June 22, 2018, 3:19pm

As you suggested, I created separate mix project with ecto and minimal migrations and model to replicate this issue: https://github.com/revati/ecto_mysql_bug

eljorge · July 24, 2018, 10:44pm

this link helps gives some explanation on how to insert an elixir UUID:

eljorge · July 25, 2018, 6:07pm

we have also seen success with using the utf8 character set

revati · July 26, 2018, 10:17am

this isn’t exactly the same as this example. As shown in https://github.com/elixir-ecto/ecto/issues/2602

# insert query
INSERT INTO `my_modules` (`id`) VALUES (?) [<<246, 200, 234, 122, 29, 144, 70, 72, 176, 30, 56, 120, 54, 0, 135, 84>>]

# select query
SELECT m0.`id` FROM `my_modules` AS m0 WHERE (m0.`id` = ?) [<<246, 200, 234, 122, 29, 144, 70, 72, 176, 30, 56, 120, 54, 0, 135, 84>>]

ecto doesn’t use MySQL functions to convert to binary and vice versa.

eljorge · July 30, 2018, 4:51pm

We changed the collation set of our DB from latin1_swedish (defaulton mysql 5.7) to ut8 and it solved our problem