Postgres and bytea - do you think it's ok to store small images in the DB as byte arrays?

sodapopcan · May 10, 2023, 12:50am

I apologize for the not-really-Elixir-specific post, but I’m currently working solo and y’all are the only programming community I engage with, so I’m hoping this is ok and promise I won’t make it a habit.

For the first time in my life I’m considering storing some images in Postgres as byte arrays (bytea type). I’ve done my reading and it seems ok, but having dealt with images a lot in my career and having it pretty hammered into my head to store file paths in the db (or bucket paths), I just wanted to say this out loud to other devs:

The images are quite small, generally under 300k and even as small as < 1k! They are assets used to generate more complex images as well as print files (I will not be storing these generated images in Postgres). There are not a lot of them—if business continues to go well then in the next 5 years we will have maybe accumulated around 3_000 of them. The only time they are ever shown to a user is in backoffice pages that, other than when the records are initially being created, have a low chance of ever being looked at ever again. They are otherwise sent, along with other data from the same query, to the image processing service which will happen with some regularity (batches of them will accessed daily).

So this is all feeling pretty good to me as it greatly simplifies the implementation. Is there anything I’m missing? Is this still somehow a bad idea anyway?

Thanks for reading!

benwilson512 · May 10, 2023, 1:10am

From what you have said: Nah, seems fine. The closest thing from my experience is PDFs. For certain small services I’ve run I’ve stored generated PDFs as binary and it’s worked fine.

My only advice is to probably segregate it to a dedicated table where your general model holds a foreign key to it. While Ecto makes it pretty easy to query it even if you put it as a regular column on your table it is often annoying to integrate with other postgres clients, and if your overall row count is as low as you say the cost of having it in another table shouldn’t really matter.

mindok · May 10, 2023, 1:27am

I’m all for it for small images (e.g. thumbnails). I’ve done quite a lot of extracting and databasing 128x128 images into SQLite, SQL Server & Postgres recently - admittedly the images are relatively small - 5-15k. Everything is self-contained and performance is great for LiveView front ends - you can push the images to LiveViews over web sockets (Base64 encoded first) which reduces chatter over the network connection (at the expense of slightly larger payloads).

As usual, though, there are tradeoffs to consider. Traditional web-serving of the images will be a little trickier from db rather than disk (not much harder, but you would need to write a controller rather than just use Plug.Static), and you won’t be able to edit the images in a normal editor without a bit of fiddling around.

AstonJ · May 10, 2023, 3:14am

My only experience of storing images in the DB goes back about 20 years - when we stored (tiny) avatars in the DB. However on further reading I learned that this was only really an option intended for people who could not set the appropriate permissions on the file system (usually because of the type of (more restrictive) hosting they might have had) and that the file system was preferred. Sure enough switching to the file system made everything feel much snappier, even on forums with just a couple thousand members.

Not sure how far DBs have come, or whether the possible bottleneck might effect you in your situation but I always remember what @joeerl said whenever I hear someone talk about storing images in a DB:

On top of that you have to consider things like back-ups, DB size, what happens if the DB becomes corrupted, etc, so personally I would avoid storing images in the DB if at all possible.

^{But you may be ok in your use-case if you take into account what @benwilson512 and @mindok have said above.}

mindok · May 10, 2023, 6:20am

All valid points - there are always trade-offs. It is worth noting that bigger, non-traditional RDBMS data is getting stored in databases more and more. For example, Postgres has a point cloud extension (generally a lot bigger than images), and PostGIS has spatially registered raster support where the rasters are stored in the database. But yes, database backups etc become a bit more complex.

For my use cases, the original images (large - 16000 x 2000px) are stored on disk, but small crops and thumbnails get saved in the database.

lkuty · May 10, 2023, 7:06am

A few other considerations related to the benefits of using a DB are worth considering:

Do your binaries have to reside the closest possible to your data ? If the binaries (image, PDF, …) are meaningless without the accompanying data or the other way around, the data is meaningless without the binary, then it might be necessary to store the binary in the DB, close to the data. For example, operators might write reports on the field and take pictures to augment it, which means the report may be unusable if the pictures are not present. The data and the binaries cannot be unlinked contrary to what might happen with a file outside the DB.
Data and binary are backed up together. It is due to point 1.
You benefit from ACID properties due to using DB transactions.

sodapopcan · May 10, 2023, 2:00pm

These are all really great responses, thanks! And @AstonJ, that Joe link set me down a rabbit hole of interesting discussion before my time! I’ve read before where he talks about using the FS for users and that discussion answered a lot of questions I’d had about it.

Like other than Ecto? There are definitely no plans for that and these would never be offered through an API or anything. I think I’ll be doing this regardless so that I can cleanly store different blobs along with their mime types. Most of them are actually SVGs so technically those could just be a text column, but there are other formats as well.

I could have definitely been more clear about my use-case: we’re an e-comm business and we produce our products in-house—at our core we’re a print shop. The assets in question are used to create mockups for the retail shop (these will be stored on S3) and print files to produce print-files on demand (these are processed on the fly and sent to a local box). I do have a flare for hyperbole but the “great simplification” I’d be getting out of this is that I can write tests for this rather important piece of the system just as it would work in production, ie, without having to use a mock for S3 or introducing volume storage just for this. Otherwise these data are mostly just being accessed by jobs and will never be shown to customers—the only time they would be shown on a webpage is in the admin area which only a few people would have access to and even then rarely look at them, pretty only when configuring a new product.

These replies did give me food for thought around the database becoming a bottleneck. Even though 300k is small I’m not usually sending that much per row. Things aren’t crazy at the moment, though, and even it becomes a thing I could always store the most commonly ordered products in ETS. I think I’m gonna go ahead with it—it’s not hard to back out of if it fails.

Thanks so much for the replies! It’s much appreciated.

kwando · May 10, 2023, 2:13pm

You will be fine putting them in the DB, it is not a lot of images and it is not in the hot path

derek-zhou · May 10, 2023, 3:39pm

for un-compressable image data, you may want to disable compression:

sodapopcan · May 10, 2023, 7:57pm

Interesting. I came across this but wasn’t sure if I needed to care. Will read more thoroughly, thanks!

jc00ke · May 17, 2023, 2:01am

One of the apps I built (runs on Heroku) pulls bank statements from an API and stores them in the database for review later. Way simpler than object storage!