How to create a hierarchical taxonomy (association) in Phoenix

I want to categorize the content in my application by terms like this:

  • Education (parent category)
    – Universities
    – Colleges
    – High Schools

  • Stores (parent category)
    – Jewelry
    – Clothes

My question, what is the best way to model such taxonomy trees in the models and database (I use PostgreSQL). How to store the data about parents and children filter results by either parent level term or child level terms? How to store the parent-child relationship in the database?

There might be a situation where the tree might extend to 3 levels, not only two.

1 Like

I usually do this with a parent ID referencing the same table. Perhaps something like:

| ID  | Name             | Parent ID |
| --- | ---------------- | --------- |
| 1   | Fake University  | NULL      |
| 2   | Fake College     | 1         |
| 3   | Fake High School | 2         |

This could be nested infinitely. The benefit here is that records could be “moved” (changing parents) without modifying all children.

Another option would be to use the LTREE type in PostgreSQL, which I believe could be made to work in Ecto, though I do not have experience with this. Using an LTREE would mean that children would have to get modified when you move a record, but it should make it easier to query child records without using things like recursive queries.

My use cases tend to not require that much in the way of querying, so that’s why I go with the parent IDs similar to what’s listed above.

2 Likes

In SQL terms there are a few main strategies:

  • Adjacency List typically models the parent/child relationship using an id and parent_id column. You can model different taxonomies in the same or separate tables. In the same table you would just create a “top_of_tree” node with children “education” and “stores”. This model is quick to implement, easy to reason about - but not great if you want to “get all the children including all their children” type queries.

  • Nested Set models the data using, you guessed it, nested sets. It it great for “get all the children including all their children” type queries but it a little harder to reason about the implementation and can be slower to update the hierarchy.

  • For Postgres, the ltree uses materialized path has the advantage that the hierarchy of a given row is stored in the row. But updating the hierarchy can be a challenge since all rows may have to be updated.

This article might be helpful for you in making choices. As a starting point you might consider this ltree package on hex since it requires minimal changes to your database schema to get going and leverages inbuilt Postgres capabilities. For adjacency lists you might consider this package.

7 Likes

But this can be solved by associating an item in table B with the parent and the child in table A. But then I am not sure how to store that reference (I mean of parent and the child).

This is usually done with left and right fields associated with records…

Reading is speed up because You can query for all children and subchildren of a node with one query.

Writing is slowed down because adding a node, or moving a node in the tree requires updating all records.

For example (using lft and rgt)…

      def insert(%__MODULE__{} = resource, attrs, %__MODULE__{rgt: rgt} = parent) do
        new_resource =
          resource
          |> child_changeset(parent, attrs)
          |> put_assoc(:parent, parent)

        update_rgt_query = from(i in __MODULE__, where: i.rgt >= ^rgt, update: [inc: [rgt: 2]])
        update_lft_query = from(i in __MODULE__, where: i.lft > ^rgt, update: [inc: [lft: 2]])

        Multi.new()
        |> Multi.update_all(:update_rgt, update_rgt_query, [])
        |> Multi.update_all(:update_lft, update_lft_query, [])
        |> Multi.insert(:insert_resource, new_resource)
      end

The move query is even harder :slight_smile:

1 Like

I was unaware of the arbor package. Looks like good stuff!