Citext extension and making case-insensitive columns in PostGresQL

Building off of this older post:

What I need to do is to create a few case-insensitive columns in PostGres. I’ve been following this post: Case insensitive column in Phoenix | Adam Niedzielski but somehow I always seem to find a way to take something off the rails.

In particular, it seems that any citext column cannot specify a maximum length.

Here is my schema that works, but note that for all the citext fields, I can’t specify a length.

defmodule :"Elixir.Languages.Repo.Migrations.Languages" do
  use Ecto.Migration

  def change do
    execute "CREATE EXTENSION citext", "DROP EXTENSION citext"

    create table(:languages, primary_key: false) do
      add :id, :citext, null: false, primary_key: true  # 2-character ISO 639_1
      add :code_639_2t, :citext, null: false
      add :code_639_2b, :citext, null: false
      add :name, :string, size: 80, null: false
    end

    # Ensure case insensitivity for ISO identifiers
    create index(:languages, ["lower(id)"], name: :languages_index, unique: true)
    create index(:languages, ["lower(code_639_2t)"], name: :languages_639_2t_index, unique: true)
    create index(:languages, ["lower(code_639_2b)"], name: :languages_639_2b_index, unique: true)

  end
end

Before I tried the citext, I used strings with specific character limits:

add :id, :string, size: 2, null: false, primary_key: true  # 2-character ISO 639_1
add :code_639_2t, :string, size: 3, null: false
add :code_639_2b, :string, size: 3, null: false
add :name, :string, size: 80, null: false

So I guess what really bothers me about citext is that it’s essentially a TEXT field, not a simple String or VARCHAR type field. In MySQL, the text fields could be a source of inefficiencies and they seem overblown for storing 3-character strings. Does anyone have thoughts or reassurances for dealing with this?

In PostgreSQL text data type is preferred over VARCHAR and in practice more performant, but you should not leave a text or citext field for end users without a constraint on maximum value size.

2 Likes

Interesting. Thanks for the tip! In my case, my Ecto schema has some validations that restrict the input size, so I think I’m good. I am just so used to having the database be the ultimate source on all limits and constraints that it feels weird to rely on the application layer for this, even for something as trivial as size limits.

Character limits on varchar columns are implemented as a check constraint on a text column. If you want to guarantee a limit in the database you can do the same thing with a length check on a citext type instead.

2 Likes

The plot thickened a bit when I tried to build this into Docker. When I have
execute "CREATE EXTENSION citext", "DROP EXTENSION citext"
in my migrations, bringing up the docker-compose stack produces this error:

Evaluation failed with: ERROR 42710 (duplicate_object) extension "citext" already exists

Solution is simple enough:

execute "CREATE EXTENSION IF NOT EXISTS citext", "DROP EXTENSION citext"

2 Likes