Enum, Joins, or Strings

This is just a quick database design discussion. When you encounter a field in a database that can take on one of a set number of values, how do you handle storing them?

Let me give an example: A database that has a table for cars stores the condition the car is in. Say the options are: “New”, “Excellent”, “Good”, “Average”, and “Bad”.

The way I see it there are 3 options here in storing these conditions:

  1. Create a separate “condition” table with a relationship between that and the car table. This means you can associate more information with the condition.
  2. Store the condition in an enumerable type with the ecto enum library. Postgres handles the validation and such for you.
  3. Store the condition as a string and do all validation and such on the server side of things. This might be the simplest, but it’s a lot less flexible.

What do you all think is the best approach and why? Under what conditions would you choose one approach over another?

:wave:

I usually go with 1.

# migration
create table(:car_conditions, primary_key: false) do
  add :name, :string, primary_key: true
end

flush()

Repo.insert_all("car_conditions", [
  %{name: "new"},
  %{name: "excellent"},
  # ...
])

create table(:cars) do
  # ...
  add :contidion_name,
          references(:car_conditions,
            column: :name,
            type: :string,
            on_delete: :nilify_all,
            on_update: :update_all
          ),
          default: "good"
  # ...
end

# schema
schema "cars" do
  # ...
  field :condition_name, :string # or a custom ecto type like CarCondition
  # or field :condition, :string, source: :condition_name
  # or belongs_to if you have a schema for the condition type
  # ...
end

What do you all think is the best approach and why? Under what conditions would you choose one approach over another?

I think all approaches are more or less equivalent … I’ve found 1. a bit easier to manage than enums.

Depends a lot on the use case. For things that will very likely not change enums are useful.

For example if you have a discount schema and one of the fields is value_type which can either be fixed or percent then an enum fits here very well. The ecto enum lib is quite nice for this because you can reference your values as :fixed or :percent in the code base, so it retains readability.

1 Like

It’s possible with approach 1. as well by providing a custom ecto type for the field

schema "cars" do
  field :condition_name, CarCondition
end

The code for CarCondition would be the same as for an enum except for type/0 callback (:string instead of enum name), so ecto enum could probably be used as well.

Some questions that help me decide tables vs enums:

  • Does the code care about the values?

    • a type like “Car Condition” might only be used mostly for filtering / sorting. Code generally only interacts with values of this type in aggregate (“show a list of all conditions”) or from user input ("show cars with the car condition in params[:condition]"). Tables work great for this.

    • on the other hand, a type like “Order Status” that represents an order’s flow through a series of processes is used more specifically. Some code still uses values of this type for filtering and sorting, but code also refers to specific values of the type ("do X if the order status is new"). An enum makes a lot of sense here.

  • What does adding new values to the list mean?

    • Adding a new “Car Condition” value - “Shiny”, say - might not require any code changes at all
    • adding a new “Order Status” is likely to require code changes to do anything meaningful
  • What values need to be available during tests?

    • Tests might only need a non-empty set of available “Car Conditions”
    • Code will not work without a full set of “Order Statuses”

As with everything, there’s a lot of fuzziness - for instance, what if there’s a need to calculate an “average” car condition?

Approach 3 can be useful for short periods - for instance, if you’re prototyping a system and discovering which values should exist. Long-term, it can be hazardous to legacy data integrity.

Other approaches worth considering:

  • map the values to an integer for storage; this requires some discipline to not reuse values long-term. If you pick the right order - “New” => 5, “Excellent” => 4, etc in the car example - things like “average condition” are readily computable. Making an Ecto type should wrap that up neatly.

  • in the enum case, consider a table of “extra data” with a primary key of the enum type - that would allow using a standard belongs_to to fetch the “extra data” for a record

2 Likes

Then I use an enumeration type, none of those 3 choices/options are required then.