Odd binary id mismatch

Hello, i have fallowing code in my tests:

source = %MyModule{} |> Repo.insert!()
new = Repo.get!(MyModule, source.id)

assert new.id == source.id

but oddly enough it throws an error, as you can see ids are slightly different.

 Assertion with == failed
 code:  assert new.id() == source.id()
 left:  "3f3fd9a2-3f69-463e-3f3f-153f34003f3f"
 right: "85c9d9a2-a969-463e-a8d5-15f834009ebc"

MyModule schema uses following base schema

defmodule Stats.Schema do
  defmacro __using__(_) do
    quote do
      use Ecto.Schema
      @primary_key {:id, :binary_id, autogenerate: true}
      @foreign_key_type :binary_id
    end
  end
end

How come ecto can find the row in DB by id, but later when comparing those ids, they mismatch?
is it some binary encoding problem?

Im use ecto: 2.2.10, mariaex: 0.8.4, mysql: 5.7.20

Thanks

1 Like

Can you please provide the actual schema definition and migration file?

Just an aside; possibly relevant:

1:

MySQL does not support UUID types. Ecto emulates them by using binary(16).

2:

Because MySQL does not support RETURNING clauses in INSERT and UPDATE, it does not support the :read_after_writes option of Ecto.Schema.field/3.

3: TIL: Ecto reading after writes:

by default, Ecto doesn’t read data back from the database, after writing new or updated data.

5.7.1 was released 2013-04-23. According to GUID/UUID Performance Breakthrough

This blog is mostly eliminated in MySQL 8.0

i.e. pre-8.0 UUIDs could be problematic/high maintenance.

See also:

+1

4 Likes

Migration:

defmodule Stats.Repo.Migrations.CreateSourcesTable do
  use Ecto.Migration

  def change do
    create table(:sources) do
      add :nth, :integer, null: false
      add :url, :string, null: false
      add :contents, :text
    end

    create index("sources", [:nth], unique: true)
    create index("sources", [:url], unique: true)
  end
end

Schema:

defmodule Stats.Source do
  use Stats.Schema
  import Ecto.Changeset

  schema "sources" do
    has_many(:meetings, Stats.Meeting)

    field(:nth, :integer)
    field(:url, :string)
    field(:contents, :string)
  end
end

I’m guessing that the above simply, implicitly adds a

field :id, :id, primary_key: true

In the SQL shell, do a DESCRIBE sources; and see what the type of the id column actually is. Given Ecto’s documentation I would expect a binary(16) for a UUID column - a standard :id will be some kind of integer.

If the id column is the wrong type, I’d start with

defmodule Stats.Repo.Migrations.CreateSourcesTable do
  use Ecto.Migration

  def change do
    create table(:sources, primary_key: false) do
      add :id, :binary_id, primary_key: true
      add :nth, :integer, null: false
      add :url, :string, null: false
      add :contents, :text
    end

    create index("sources", [:nth], unique: true)
    create index("sources", [:url], unique: true)
  end
end

and go from there - i.e. see if it works or if it simply changes the problem.


defmodule Stats.Schema do
  defmacro __using__(_) do
    quote do
      use Ecto.Schema
      @primary_key {:id, :binary_id, autogenerate: true}
      @foreign_key_type :binary_id
    end
  end
end

only has an effect on Schemas that use it - it doesn’t impact the migration at all - so there needs to be some kind of hint in the migration that the table is dealing with a :binary_id instead of a :id type.

1 Like

i completely forget to add mysql schema. sorry.

CREATE TABLE `sources` (
  `id` binary(16) NOT NULL,
  `nth` int(11) NOT NULL,
  `url` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `contents` text COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`id`),
  UNIQUE KEY `sources_nth_index` (`nth`),
  UNIQUE KEY `sources_url_index` (`url`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

in my config i also have:

config :stats, Stats.Repo, migration_primary_key: [id: :uuid, type: :binary_id]

Have your tried:

  source = %MyModule{} |> Repo.insert!()
  new = Repo.get!(MyModule, source.id)
  new2 = Repo.get!(MyModule, new.id)

  assert new2.id == new.id

What is not clear is whether new actually originates from the same record that was created by the previous Repo.insert!. If there are other records in the table, new could be an entirely different record.

Some experiments with iEX and the SQL shell could shed some light on that:

  • after the insert - is there a record with the new UUID (e.g. 3f3fd9a2-3f69-463e-3f3f-153f34003f3f)
  • after the retrieve - is there a record with the different UUID (e.g. 85c9d9a2-a969-463e-a8d5-15f834009ebc)
  • does the remainder of the retrieved record match the former or the latter record content?

This was never resolved:

1 Like

What is not clear is whether new actually originates from the same record that was created by the previous Repo.insert!. If there are other records in the table, new could be an entirely different record.

Didn’t quite catch. Its internal Ecto function Repo.get! which retrieves a new record by id. It isn’t in any way tied to the previous source record. The table is empty before that and id column is a primary index (must be unique), so in either case, there couldn’t be duplicate ids in the table.

source = %MyModule{} |> Repo.insert!()
new = Repo.get!(MyModule, source.id)
new2 = Repo.get!(MyModule, new.id)

assert new2.id == new.id

tried and this works fine

I updated MySQL to 8.0.11 version, but there still is a problem :sweat:

please let us know the result of Repo.all

I’d go even further. Once you insert a record in iEX, use the MySQL shell to show you what value in the id column actually is. Something like:

SELECT HEX(id), nth, url, contents FROM sources;

That really is the only way to know what was actually written to the database as the UUID data can just as easily be compromised on the Repo.all return trip.

The table is empty before that and id column is a primary index (must be unique), so in either case, there couldn’t be duplicate ids in the table.

The fundamental issue here is that you have shown that in your environment the result of the query cannot necessarily be trusted. Given that there is only one record in the table, multiple distinct IDs seem to be returning the same record which can only have one ID. So there could be a whole set of ID values capable of selecting this one record.


You write the record with ID A and you retrieve with ID A something with ID B.

  • If the query works then ID A is in the database and ID B is a version of ID A that is somehow corrupted on the way back from the database.
  • If the query works - ID A could be corrupted before being written as ID B to the database. ID A could be corrupted in a similar fashion during the query becoming ID B and then retrieving the data of the ID B record.
  • If the query doesn’t work it simply returns the only row in the table. While ID B is different from ID A it’s not clear whether ID A was corrupted before writing it to the database or after reading it from the database.

So at this point superficially it looks like ID B is in the database but it could be corrupted on read, so ID A could still exist in the database.

Now you retrieve with ID B and get something with ID B.

  • If you believe the query works and take the previous observations into account then you have to believe the whatever corrupted ID A into ID B doesn’t change the value of ID B - allowing the query to actually seem to work the way it should if ID B is actually in the database.
  • If you are open to the possibility that the query simply returned the only record in the table then you can’t have any confidence in the ID value being retrieved as it is clearly being corrupted somewhere - before writing, after reading, both?

Given that it seems that the ID value is being corrupted somewhere, there is the possibility that sometimes ID related queries will not work (depending on the nature of the corruption).


To narrow this down you should create a minimal ecto-only project with a single table in a new database with an UUID column and a second (string) column for information that lets you uniquely identify each record created for inspection.

Set up a unit test similar to the one you have. You may have to run the test repeatedly until it fails (or some max is reached) - sometimes these type of defects are intermittent.

Once you have a setup that consistently fails in your environment, put the project up on GitHub and reference the repository in the issue. That way somebody else can easily try it in their (MySQL) environment to see if the problem behaviour persists.

1 Like

As you suggested, I created separate mix project with ecto and minimal migrations and model to replicate this issue: https://github.com/revati/ecto_mysql_bug

1 Like

this link helps gives some explanation on how to insert an elixir UUID:

we have also seen success with using the utf8 character set

this isn’t exactly the same as this example. As shown in https://github.com/elixir-ecto/ecto/issues/2602

# insert query
INSERT INTO `my_modules` (`id`) VALUES (?) [<<246, 200, 234, 122, 29, 144, 70, 72, 176, 30, 56, 120, 54, 0, 135, 84>>]

# select query
SELECT m0.`id` FROM `my_modules` AS m0 WHERE (m0.`id` = ?) [<<246, 200, 234, 122, 29, 144, 70, 72, 176, 30, 56, 120, 54, 0, 135, 84>>]

ecto doesn’t use MySQL functions to convert to binary and vice versa.

We changed the collation set of our DB from latin1_swedish (defaulton mysql 5.7) to ut8 and it solved our problem

2 Likes