Partitioning Postgres tables by timestamp-based UUIDs

Recently we partitioned a table with 28 million rows by its ULID to improve query speeds and reduce query timeouts. To do this I ended up pulling together various bits of information from across the internet, so I thought it might be worth compressing it all into a guide for future reference and discussion of the areas I might have missed!

The problem

We had a very large table (28 million rows) where queries on anything other than the primary key were timing out. No matter what indexes we threw at the table, we couldn’t get queries to complete in an acceptable time for use on the front end of the application.

Constraints

The table in question was where we stored and processed payloads from our customers, so it needed to be very quick for individual lookups (to update each payload’s state as it was processed into the application) and fast for broader retrieval to display on the front end of our application.

Problems with partitioning

Partitioning had always been an option, and something we had attempted previously. Our original attempt to speed things up had been to partition the table on the insertion time of the record, but, as you will know if you’ve ever partitioned a table before, you cannot have a globally unique index on a partitioned table.

To create a unique or primary key constraint on a partitioned table, the partition keys must not include any expressions or function calls and the constraint’s columns must include all of the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions.

Without a globally unique index on the payload’s id, queries for a specific payload had to check every partition. This made the individual processing of a given payload too slow for this to be a workable scenario.
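
To see the limitation concretely, here’s a minimal sketch (table and column names are illustrative, and the exact error text may vary by Postgres version):

```sql
-- A primary key that doesn't include the partition key is rejected:
CREATE TABLE events (
  id uuid NOT NULL,
  inserted_at timestamptz NOT NULL,
  PRIMARY KEY (id)
) PARTITION BY RANGE (inserted_at);
-- ERROR: unique constraint on partitioned table must include all partitioning columns
```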

Partitioning with ULIDs

The solution, then, was to have the id of the payload be a ULID based on its inserted_at, and to partition the table on the timestamp component of that ULID. This meant we could retain a global index for quick individual row lookups and have a way to ignore irrelevant partitions based on the id.

This took a few different steps…

Preparing the database

Postgres doesn’t work with ULIDs directly, but between Ecto and Postgres they are cast to a UUID in the database. We needed a way for Postgres to take a ULID and convert it to the corresponding UUID that we can work with in SQL.

We added the functions here to a migration to make them available on the database.

While you’re on the database, it’s worth ensuring that enable_partition_pruning is enabled, as this is how Postgres is able to ignore irrelevant partitions.
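
A quick way to verify the setting (it defaults to on in modern Postgres, but it’s cheap to check):

```sql
-- Check the current value of the pruning setting:
SHOW enable_partition_pruning;
--  enable_partition_pruning
-- --------------------------
--  on

-- If it's off for some reason (session-level; persist via postgresql.conf or ALTER SYSTEM):
SET enable_partition_pruning = on;
```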

Functions for manipulating ULIDs

To create the partitions we needed to be able to extract the time portion of a ULID and derive the bounds of all possible ids within that time window.

This required slicing the time portion of the ULID and padding it out with either “0” or “Z”, depending on which bound we needed.

  @doc """
  Slices the time portion of a ULID
  """
  def ulid_time(ulid), do: String.slice(ulid, 0..9)

  @doc """
  Returns the time based start of a ULID
  """
  def ulid_start_time(ulid), do: ulid_time(ulid) |> String.pad_trailing(26, "0")

  @doc """
  Returns the time based end of a ULID
  """
  def ulid_end_time(ulid), do: ulid_time(ulid) |> String.pad_trailing(26, "Z")

  @doc """
  Returns the time based ULID with bounds from a Date or DateTime
  """
  def datetime_to_ulid(datetime, bound \\ nil)

  def datetime_to_ulid(date = %Date{}, bound),
    do: DateTime.new!(date, Time.new!(0, 0, 0)) |> datetime_to_ulid(bound)

  def datetime_to_ulid(datetime = %DateTime{}, bound) do
    case bound do
      nil -> DateTime.to_unix(datetime, :millisecond) |> Ecto.ULID.generate()
      :start -> datetime_to_ulid(datetime) |> ulid_start_time()
      :end -> datetime_to_ulid(datetime) |> ulid_end_time()
    end
  end

  def datetime_to_ulid(naive_datetime = %NaiveDateTime{}, bound),
    do: DateTime.from_naive!(naive_datetime, "Etc/UTC") |> datetime_to_ulid(bound)
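
As a sanity check, here is what these functions produce for an example ULID (the 26-character value below is just illustrative):

```elixir
iex> ulid = "01HV3XKQ4Z9F8G7H6J5K4M3N2P"
iex> ulid_time(ulid)
"01HV3XKQ4Z"
iex> ulid_start_time(ulid)
"01HV3XKQ4Z0000000000000000"
iex> ulid_end_time(ulid)
"01HV3XKQ4ZZZZZZZZZZZZZZZZZ"
```

Because ULIDs sort lexicographically, every id generated in that millisecond window falls between the “0”-padded and “Z”-padded bounds.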

Ensuring that the timestamp and the id’s timestamp match

This is not strictly necessary for tables where the insertion time is close enough to the autogeneration of the id that you won’t get much drift. But if there could be drift, or you want to partition on a timestamp that won’t necessarily correlate with the id, here is how. The following is all performed on the schema.

Start by turning off the autogeneration of the id.

@primary_key {:id, ULID, autogenerate: false}

Check whether the changeset has an id and, if it doesn’t, add one based on the timestamp you want to use. In the following example I have used occurred_at.

  defp create_id_if_not_exists(changeset) do
    case get_change(changeset, :id) do
      nil ->
        occurred_at = get_change(changeset, :occurred_at)
        put_change(changeset, :id, ULIDUtils.datetime_to_ulid(occurred_at))

      _ ->
        changeset
    end
  end

Finally, add it as part of your changeset.

  def changeset(changeset, attrs) do
    changeset
    # ...
    |> create_id_if_not_exists()
    # ... other steps
  end

Database migration

I cannot take full credit for this, as it’s adapted from Matthew Johnston’s work on partitioning, changed to work with ids.

We need to start with a couple of helper functions to partition by month. You might want something different, so feel free to modify them to fit your requirements.

  defp beginning_of_month(date) do
    if date.day == 1 do
      date
    else
      Date.add(date, -(date.day - 1))
    end
  end

  defp calculate_next_month(date, 0), do: date

  defp calculate_next_month(date, months) do
    next = Date.add(date, Date.days_in_month(date))
    calculate_next_month(next, months - 1)
  end

  def months_from(date, num_months) do
    start_date = beginning_of_month(date)

    for months <- 0..num_months do
      calculate_next_month(start_date, months)
    end
  end
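
For example, months_from returns the first day of each month, starting from the month of the given date:

```elixir
iex> months_from(~D[2022-01-15], 2)
[~D[2022-01-01], ~D[2022-02-01], ~D[2022-03-01]]
```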

We use the functions above, together with our earlier ULID functions, to build a function that generates the query to create each partition.

  def create_partition_query(table, date) do
    start_date = date

    stop_date =
      date
      |> Date.add(Date.days_in_month(date))
      |> Date.add(-1)

    month =
      start_date.month
      |> Integer.to_string()
      |> String.pad_leading(2, "0")

    stop_datetime = DateTime.new!(stop_date, Time.new!(23, 59, 59, 999_999))
    end_ulid = datetime_to_ulid(stop_datetime, :end)

    start_ulid = datetime_to_ulid(start_date, :start)

    """
    CREATE TABLE #{table}_p#{start_date.year}#{month}
    PARTITION OF #{table} FOR VALUES
    FROM (ulid_to_uuid('#{start_ulid}'))
    TO (ulid_to_uuid('#{end_ulid}'))
    """
  end
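
For reference, calling this for January 2022 produces SQL of the following shape (the ULID bounds below are placeholders, not real computed values):

```sql
CREATE TABLE payloads_p202201
PARTITION OF payloads FOR VALUES
FROM (ulid_to_uuid('XXXXXXXXXX0000000000000000'))
TO (ulid_to_uuid('XXXXXXXXXXZZZZZZZZZZZZZZZZ'))
```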

Finally, we can tie it all together with our table creation!

    create table(:payloads, primary_key: false, options: "PARTITION BY RANGE(id)") do
      add(:id, :binary_id, null: false, primary_key: true)

      # ... other fields and references
    end

    # Create the default partition that will ingest entries that don't match the ranges.
    # This is not strictly necessary, but it means that ids that aren't in the ranges we
    # declared don't cause an error.
    execute("CREATE TABLE payloads_default PARTITION OF payloads DEFAULT")

    # Create 48 months worth of partitions starting from 2022-01-01
    for month <- ULIDUtils.months_from(~D[2022-01-01], 48) do
      query = ULIDUtils.create_partition_query("payloads", month)
      execute(query)
    end

That’s it: you’ve just created your partitioned table based on timestamp-based UUIDs!

Improving your query speeds

Now your Ecto queries can use the id to query only the partitions that contain the data they are looking for, skipping partitions that don’t hold any relevant data. We use a combination of the database functions and the ULID functions we created earlier in the where call.

    query
    |> where(
      [p],
      fragment("? >= ulid_to_uuid(?)", p.id, ^ULIDUtils.datetime_to_ulid(from, :start))
    )
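
If you query on a time range, you can bound both ends so that pruning excludes partitions on either side. A sketch, assuming from and to are DateTime values and the ulid_to_uuid database function from earlier:

```elixir
def payloads_between(query, from, to) do
  import Ecto.Query

  query
  # Lower bound: first possible id in the window
  |> where([p], fragment("? >= ulid_to_uuid(?)", p.id, ^ULIDUtils.datetime_to_ulid(from, :start)))
  # Upper bound: last possible id in the window
  |> where([p], fragment("? <= ulid_to_uuid(?)", p.id, ^ULIDUtils.datetime_to_ulid(to, :end)))
end
```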

Conclusion

So we went from queries taking multiple minutes to sub-second, with zero timeouts, and we were also able to retain very quick lookups on individual rows!

I hope that this helps others after me, and if you’ve got any suggestions on where you might do something slightly differently let me know! (I’ve only been working with Elixir for a little over a year now and I suspect there are some code smells etc in the above)

28 Likes

Wow, such a brilliant solution to the problem. I am currently facing something really similar and I was wondering how I could keep some sort of global unique index on tables partitioned by date range.

Thanks for sharing all these resources, I will definitely look into it!

1 Like

Did you consider using the TimescaleDB extension to auto-partition? Besides automatic partitioning it provides a lot of other useful enhancements.

GitHub extension page here:

2 Likes

Unfortunately, under the hood TimescaleDB uses time-based partitioning and runs into the same issue as our first attempt, in that you cannot have a globally unique index that doesn’t include the time column used to create the hypertable.

It’s still totally possible to take this approach and have a unique composite index of the (time + id)! It just means that any lookups on a specific id would also need to include the time column as part of the where condition to take advantage of the partition pruning of the Postgres’ query optimizer.

The difference in the solution above is you can say “gimme this id” and the time and id logic is taken care of for you!
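
For comparison, the composite-index route on a time-partitioned table might look like this (a sketch; table and column names are illustrative):

```sql
-- Unique only in combination with the partition key:
CREATE UNIQUE INDEX payloads_time_id_idx ON payloads (inserted_at, id);

-- Lookups must supply the time column to benefit from partition pruning:
SELECT * FROM payloads
WHERE inserted_at >= '2024-07-01' AND inserted_at < '2024-08-01'
  AND id = '01906b97-5c00-0000-0000-000000000000';
```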

Great post! Thank you for sharing!

Do you think that opting for uuid v7 would have made this easier?

3 Likes

I was unfamiliar with ULID before reading this post and the official spec. Neat!

It sounds like ULID isn’t completely supported by Postgres quite yet? Or are the extensions pretty solid at this point?

I found three Hex packages that might be relevant:

But I assume those aren’t strictly necessary if you’re going the fragment route? :sweat_smile:

2 Likes

Addendum

Because it’s easier to do in one post?

Absolutely!

We had already settled on using ULIDs when we hit the problem and migrating over to UUIDv7 would have been quite a bit more time consuming. The benefit of using a UUID library is not needing the Postgres ULID functions and not needing to convert the ULIDs each time you use them in a query.

If anyone wants to go ahead with UUIDv7 instead, you can use the above solution, swapping the ULID manipulation section for one that manipulates UUIDs.

  • @Nezteb - But I assume those aren’t strictly necessary if you’re going the fragment route?

So we are using ecto_ulid_next, @woylie’s actively maintained fork of the ecto_ulid Hex package.

We’ve never run into any issues in production with the library, so pretty happy to give this a solid thumbs up!

Or, you could look into any upcoming UUIDv7 libraries if you wanted to drop the overhead of ULID manipulation on the database.

2 Likes

@ob1 did you see any speed improvements on JOINs as well after partitioning your tables by ULID?

Say another table orders references your payloads table using a payload_id foreign key containing ULIDs of your payloads.

I imagine that when executing a query like the following one, Postgres would be able to quickly satisfy the JOIN by only looking at the relevant payloads partitions.

SELECT * FROM orders
LEFT JOIN payloads on orders.payload_id = payloads.id
WHERE ...

Would that be a correct assumption?

1 Like

Not as much as you would think.

The reason for this was that the table originally had a unique index on the id of the payload and traversing the index btree for unique values tends to be very quick.

Functionally, the switch to separate tables didn’t really change the speed much, because there is still roughly the same globally unique index to traverse! Arguments could be made for retrieving all entries from a particular partition, but that didn’t tend to be the usage pattern!

Hello ob1,

Thanks for a fascinating walkthrough! Extremely useful on all fronts. I’m totally going to steal the idea :see_no_evil:

One thing I want to clarify, though. PostgreSQL partitioning Overview mentions this:

(Recall that adjacent partitions can share a bound value, since range upper bounds are treated as exclusive bounds.)

Here’s an example from the documentation:

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

Where 2006-03-01 is a shared bound value between two partitions.

I could be wrong here, but given that upper bounds are treated as exclusive bounds, the create_partition_query function can be simplified.

Instead of using DateTime.new!(stop_date, Time.new!(23, 59, 59, 999_999)), create_partition_query can use the first date of the next month.


Following the rule that upper bounds are treated as exclusive, it seems that ULIDs generated by datetime_to_ulid(stop_datetime, :end) would be stored in the payloads_default partition instead of their month partition.

Let’s derive the start and end values:

Mix.install([:ecto_ulid])

# "01906b97-5c00-0000-0000-000000000000"
~U[2024-07-01 00:00:00.000Z] 
|> DateTime.to_unix(:millisecond) 
|> Ecto.ULID.generate() 
|> String.slice(0..9) 
|> String.pad_trailing(26, "0") 
|> Ecto.ULID.dump()
|> elem(1) 
|> Ecto.UUID.load()
|> elem(1)

# "01906b97-5c00-ffff-ffff-ffffffffffff"
~U[2024-07-31 23:59:59.999Z] 
|> DateTime.to_unix(:millisecond) 
|> Ecto.ULID.generate() 
|> String.slice(0..9) 
|> String.pad_trailing(26, "Z") 
|> Ecto.ULID.dump()
|> elem(1) 
|> Ecto.UUID.load()
|> elem(1)

Here’s a quick snippet that takes them for a spin:

CREATE TABLE payloads (
id uuid not null,
PRIMARY KEY (id)
) PARTITION BY RANGE (id);

CREATE TABLE payloads_default PARTITION OF payloads DEFAULT;

CREATE TABLE payloads_2024_07 
PARTITION OF payloads
FOR VALUES FROM ('01906b97-5c00-0000-0000-000000000000') TO ('01910b3c-7fff-ffff-ffff-ffffffffffff');

SELECT * FROM payloads_2024_07;

-- | id |
-- | ------------------------------------ |
-- | 01906b97-5c00-0000-0000-000000000000 |

SELECT * FROM payloads_default;

-- | id |
-- | ------------------------------------ |
-- | 01910b3c-7fff-ffff-ffff-ffffffffffff |

As you can see, payloads_default ends up containing the upper bound value of the payloads_2024_07 partition.

Using ~U[2024-08-01 00:00:00.000Z] for the upper bound stores 01910b3c-7fff-ffff-ffff-ffffffffffff in the payloads_2024_07 partition:


That said, the probability of generating exact boundary values is quite low. And sharing the upper bound simplifies partition creation.

2 Likes