Pg_large_objects - Dealing with PostgreSQL Large Objects

An application wishing to store large amounts of data typically has two options:

  1. A new column can be introduced on some table; Postgres features a bytea type for this purpose. This is easy to implement, but it requires holding the complete data in memory when reading or writing, which may not be viable beyond a few dozen megabytes. Efficient streaming or random-access operations are not practical.
  2. A separate cloud storage service (e.g. AWS S3) could be used. This permits streaming but complicates the tech stack by adding a dependency on a new service. Bridging the two systems (e.g. ‘Delete all uploads for a given user ID’) requires glue code on the Elixir side.

PostgreSQL features a ‘large objects’ facility which enables efficient streaming access to large (up to 4TB) files. This solves both problems:

  • Unlike values in table columns, large objects can be streamed into/out of the database and permit random access operations.
  • Unlike e.g. S3, no new technology is needed. Large objects live side by side with the tables referencing them, and operations like ‘Delete all uploads for a given user ID’ boil down to a single SELECT statement (see the sketch below).
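
For illustration, such a cleanup could be issued straight through Ecto. This is only a minimal sketch: the uploads table and its user_id/object_id columns are hypothetical, while lo_unlink/1 is PostgreSQL’s built-in function for removing a large object by its OID.

    # Delete all uploads belonging to one user by unlinking the referenced
    # large objects. Table and column names are hypothetical.
    Repo.query!(
      "SELECT lo_unlink(object_id) FROM uploads WHERE user_id = $1",
      [user_id]
    )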

The pg_large_objects package on Hex makes it easy to work with large objects.

The high-level API permits importing or exporting data to/from the database in a convenient and memory-efficient manner, e.g.

    # Stream data into large object
    {:ok, object_id} =
      "/tmp/recording.mov"
      |> File.stream!()
      |> Repo.import_large_object()

A lower-level API makes it easy to work with parts of large objects by exposing common random-access operations, e.g.

    alias PgLargeObjects.LargeObject

    {:ok, object} = LargeObject.open(object_id)
    LargeObject.seek(object, 1024)
    {:ok, sixteen_bytes} = LargeObject.read(object, 16)
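
Building on these primitives, an object can also be streamed back out in fixed-size chunks. The following is only a sketch; it assumes that LargeObject.read/2 returns {:ok, binary} and yields an empty binary once the end of the object has been reached.

    # Stream the object out in 16 KiB chunks using only seek/2 and read/2.
    # Assumption: read/2 returns an empty binary at end-of-object.
    chunk_size = 16 * 1024

    Stream.unfold(0, fn offset ->
      LargeObject.seek(object, offset)

      case LargeObject.read(object, chunk_size) do
        {:ok, ""} -> nil
        {:ok, chunk} -> {chunk, offset + byte_size(chunk)}
      end
    end)
    |> Stream.into(File.stream!("/tmp/recording-copy.mov"))
    |> Stream.run()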

See the documentation for PgLargeObjects v0.1.0 on HexDocs for more.


I just released PgLargeObjects v0.2.0.

This release features a couple of documentation fixes, but most notably a ready-made implementation of the Phoenix.LiveView.UploadWriter behaviour: PgLargeObjects.UploadWriter. This makes it easy to consume file uploads in LiveView applications by streaming them straight to the database, using constant memory!

First, use the :writer option to Phoenix.LiveView.allow_upload/3 to specify the custom writer. Make sure to pass the :repo option in the tuple to indicate which repository the object should be stored in:

    socket
    |> allow_upload(:avatar,
      accept: :any,
      writer: fn _name, _entry, _socket ->
        {PgLargeObjects.UploadWriter, repo: MyApp.Repo}
      end
    )

Next, when consuming upload entries, extract the object ID referencing the upload and store it so that the data can be referenced later:

    consume_uploaded_entries(socket, :avatar, fn meta, _entry ->
      %{object_id: object_id} = meta

      # Store `object_id` in database to retain handle to uploaded data.

      {:ok, nil}
    end)
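
By way of illustration, the ‘store object_id’ step inside the callback above might persist the handle on a regular Ecto schema. MyApp.Avatar, its object_id field and MyApp.Repo are hypothetical names standing in for the application’s own schema and repo.

    # Hypothetical: keep the handle in an `avatars` table so the uploaded
    # data can be located again later.
    {:ok, _avatar} =
      %MyApp.Avatar{}
      |> Ecto.Changeset.change(object_id: object_id)
      |> MyApp.Repo.insert()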