I’m working my way through Programming Phoenix at the moment, and I got to the chapter where the Ecto Schema+Migration is set up. They do something like this:
def change do
  create table(:users) do
    add :name, :string
    add :username, :string, null: false
    add :password_hash, :string
  end

  create unique_index(:users, [:username])
end
I was wondering: if you’re going to put a unique index on username anyway, why even use the Ecto :id field and not use :username as the primary key for the table?
This discussion really is about the use of “natural” vs artificial ids in databases. There are several reasons why a dedicated
:id column is generally preferred, even when usernames are considered unique. A big part of it boils down to “foreign keys”. That is, when you have other tables, and those tables want to reference your
users table by its primary key, using an integer
:id is both more efficient and more resilient to change. If you use the username column as the key, then when you change a user’s username you not only have to change it in the users table, you have to change it all over your database to update every row that points to that user by username.
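As a sketch of what that foreign-key relationship looks like in a migration (the posts table here is hypothetical, just for illustration):

```elixir
def change do
  create table(:posts) do
    add :title, :string
    # references/2 creates an integer posts.user_id column pointing at
    # users.id; renaming a username never touches this column.
    add :user_id, references(:users, on_delete: :delete_all)

    timestamps()
  end

  create index(:posts, [:user_id])
end
```

Because the reference is by integer id, a username change is a single UPDATE on one row of users.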
Thanks, Ben, that makes a lot of sense!
Thanks. That helped.
But does it also apply to my problem? Which is:
I have a list of users and a list of texts. Each user has written one or more texts (has_many). Every text has the user_name, but does not know the id of the user. That id is only created during the insertion.
What would I do here? Create an id beforehand from a hash?
The user exists before the text the user writes, so the
users.id column exists before you need to populate the
texts.user_id column. QED
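In code, that ordering might look something like this (a sketch, assuming User and Text schemas and a standard Repo; the field names are illustrative):

```elixir
# Insert the user first; Repo.insert returns the struct with its
# database-generated id filled in.
{:ok, user} = Repo.insert(%User{name: "Ada", username: "ada"})

# The id now exists, so the text can reference it.
{:ok, _text} = Repo.insert(%Text{user_id: user.id, body: "Hello"})
```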
Well, imagine you are doing a mass import from different sources. You have users and their user_names, and perhaps you even have user_ids when these are your own databases, but those ids collide because they were autogenerated with an iterator. What to do then?
@sehHeiden when doing imports you disregard id values provided from other sources. Instead, bulk insert your users
returning: [:id] and then map those results to the other table.
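A sketch of that approach, assuming the imported texts carry a user_name field (the variable names are illustrative):

```elixir
# Bulk insert, asking the database to return the generated ids
# along with the usernames so we can build a lookup map.
{_count, rows} =
  Repo.insert_all("users", users_attrs, returning: [:id, :username])

# Map each imported username to its freshly generated id...
id_by_username = Map.new(rows, fn row -> {row.username, row.id} end)

# ...then rewrite the texts to point at the new ids before inserting.
texts_attrs =
  Enum.map(texts_attrs, fn text ->
    text
    |> Map.put(:user_id, Map.fetch!(id_by_username, text.user_name))
    |> Map.delete(:user_name)
  end)

Repo.insert_all("texts", texts_attrs)
```

The ids from the old sources are simply never written; only the usernames are used to stitch the two tables back together.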
Generally if I already have some records in a DB with auto-gen IDs, and I’m importing a bunch of new ones then I’ll create an offset and add that to the new records, so say the highest ID I have for my existing records is 1234 and they grow at like 10 per day, then when I import I’ll add something like 5000 to each ID. And at the end of the import I’ll increment the sequence (assumes postgres, but there’s some equivalent in other DBs) with
Repo.query "ALTER SEQUENCE foos_id_seq RESTART WITH 10000"
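To sketch the offset idea above (the 5000 offset and the sequence name follow the example; the import shape is illustrative):

```elixir
# Shift every imported id past the range the live table will grow
# into while the import runs (highest existing id was ~1234).
offset = 5_000

users_attrs =
  Enum.map(imported_users, fn user ->
    %{user | id: user.id + offset}
  end)

Repo.insert_all("users", users_attrs)

# Afterwards, bump the sequence past the highest imported id so
# future auto-generated ids cannot collide with the imported ones.
Repo.query!("ALTER SEQUENCE users_id_seq RESTART WITH 10000")
```

The trade-off is that you have to pick an offset large enough to outrun the live table’s growth for the duration of the import.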