An application wishing to store larger amounts of data typically has two options:

- A new column can be added to some table; Postgres offers the `bytea` type for this purpose. This is easy to implement, but it requires holding the complete data in memory when reading or writing, which may not be viable beyond a few dozen megabytes. Efficient streaming or random-access operations are not practical.
- Separate cloud storage (e.g. AWS S3) could be used. This permits streaming, but it complicates the tech stack by depending on a new service, and bridging the two systems (e.g. ‘Delete all uploads for a given user ID’) requires explicit support in Elixir.
PostgreSQL features a ‘large objects’ facility which enables efficient streaming access to large (up to 4TB) files. This solves both problems:
- Unlike values in table columns, large objects can be streamed into/out of the database and permit random-access operations.
- Unlike e.g. S3, no new technology is needed. Large objects live side-by-side with the tables referencing them, and operations like ‘Delete all uploads for a given user ID’ are just one `SELECT` statement (see the sketch below).
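To illustrate the last point: PostgreSQL's built-in `lo_unlink` function deletes a large object, so a per-user cleanup can be expressed in a single statement. A minimal sketch, assuming a hypothetical `uploads` table that maps a `user_id` to the `object_id` of its large object:

```elixir
# Hypothetical cleanup: the `uploads` table and its columns are assumptions;
# lo_unlink/1 is PostgreSQL's built-in function for deleting a large object.
Repo.query!(
  "SELECT lo_unlink(object_id) FROM uploads WHERE user_id = $1",
  [user_id]
)
```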
The `pg_large_objects` package on Hex makes it easy to work with large objects.
The high-level API permits importing or exporting data to/from the database in a convenient and memory-efficient manner, e.g.
```elixir
# Stream data into large object
{:ok, object_id} =
  "/tmp/recording.mov"
  |> File.stream!()
  |> Repo.import_large_object()
```
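The returned `object_id` is a plain value which can be stored alongside the application's own data. A minimal sketch, assuming a hypothetical `uploads` schema that associates large objects with users:

```elixir
defmodule MyApp.Upload do
  use Ecto.Schema

  # Hypothetical schema: records which large object belongs to which user.
  schema "uploads" do
    field :user_id, :integer
    field :object_id, :integer
    timestamps()
  end
end

# `user_id` and `object_id` are assumed to be bound in the surrounding code.
Repo.insert!(%MyApp.Upload{user_id: user_id, object_id: object_id})
```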
A lower-level API makes it easy to work with parts of large objects by exposing common random-access operations, e.g.
```elixir
alias PgLargeObjects.LargeObject

{:ok, object} = LargeObject.open(object_id)
LargeObject.seek(object, 1024)
{:ok, sixteen_bytes} = LargeObject.read(object, 16)
```
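Note that PostgreSQL keeps a large object descriptor open only for the duration of the enclosing transaction, so low-level calls like these would typically run inside a transaction (whether and how the package manages this for you is worth checking in its documentation). A minimal sketch using `Repo.transaction/1`:

```elixir
alias PgLargeObjects.LargeObject

# Minimal sketch: run the low-level calls inside a single transaction so
# the underlying large object descriptor stays valid throughout.
{:ok, sixteen_bytes} =
  Repo.transaction(fn ->
    {:ok, object} = LargeObject.open(object_id)
    LargeObject.seek(object, 1024)
    {:ok, bytes} = LargeObject.read(object, 16)
    bytes
  end)
```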
See the PgLargeObjects v0.1.0 documentation for more.