I want to store books details in a PostgreSQL table. There will be a field with unique constraint holding book’s ISBN-13 number (example formatted as string as:
978-0-545-01022-1). This field can be the primary key field and would be indexed.
My question, should I store it as:
978-0-545-01022-1 i.e. string
9780545010221 i.e. integer ?
Performance wise and/or due to other considerations, what do you think? Thank you.
I think the answer partly depends on whether you care about the sub parts of of the ISBN-13, for example searching on books in a certain country or publisher. it also depends on whether you need to support ISBN-10 which allows for the character
X as the check digit.
If you store as a string then I would suggest you remove any non-digits to formatting separate from validation.
There is a GTIN validation lib you might find useful too since an ISBN-13 is a subset of a GTIN.
As ISBN numbers can start with 0 the only proper way to store it is to use string or array of digits.
Seems true for old ISBN i.e. ISBN-10 but for new version ISBN-13, maybe cannit start with zero. Actually, I will convert all to ISBN-13 before storing in DB.
But is there a performance impact between storing it as integer or string? When searching the DB? or making joins?
An isbn is an identifier not a number. You’ll never want to perform arithmetic with isbns. Therefore you probably should use a string column and a canonical format for the isbn. I see many identifiers more like names, which just happen to be only comprised of digits.
More important than the type, is probably the index. If you do not have that column indexed, its slow…
I will store all as ISBN13, but yet there is some not programming related issue here:
An ISBN13 looks like this:
All fine, there are 5 groups within the number and they don’t have fixed size. The second group can be 1 or 2 digits. So, it is logical that the dashes should be stored as part of the identifying. Yet, all books I have tested have bar-codes storing the ISBN13 as digits only (integer) with no dashes.
The whole point is to read with a bar-code scanner the ISBN13 and look up book’s info in the database. As such, storing it as an integer, the way the barcodes are printed on the books is the way to go.
And of course, all can be changed later, depending on the needs.
No, bardcodes do not store them as integer, barcodes do “store” them as individual digits.
Barcodes have a very limited alphabet available, it does only know about the digits 0 through 9. And maybe one or two extra “characters”, but thats basically it.
You could do it the hard way and make a Composite Type or a Domain…
Or do it the easy way and use the
isn module, supplied with Postgres, which probably has what you need.
Good news: Amazon supports
isn on RDS.
You might need deal with low-level Ecto / Postgrex primitives to use
isn with Ecto, but if that is a core concern of your application it could be worth the effort.
evadne=# select '978-0-545-01022-1'::isbn13;
evadne=# select '9780545010221'::isbn13;
- In certain cases you can cheese it by specifying a field as string in Ecto and actually using a more detailed type in Postgres.