Capsule - a modular file upload and storage utility

tfwright · August 22, 2020, 9:58pm

Today, I’m releasing what I’m calling a “preview” version of a new file upload utility for Elixir (with optional Ecto integration). If you want to get straight to the code, the source README should outline the basics. If you’d like to hear more about why, the rationale is below.

Last week I needed to add a pretty simple upload feature to an app I’m working on. Since this isn’t something I’ve done before in Elixir/Phoenix, I posted here asking what others were doing. I didn’t get much of a response, which leads me to believe people were either using Arc, Waffle (the fork of Arc) or just creating their own solution. I wasn’t super excited about either lib, and my use case was pretty simple, so I decided to implement my own custom solution. But after a few hours of playing with it I decided it might be worth separating out and offering my own library.

On my last big Rails project, which involved a good amount of file upload and processing, we used Shrine. Despite the usual gripes with dealing with opaque behavior and callback mazes one has with a lot of Ruby projects that tightly couple with ActiveRecord (and Shrine isn’t even that coupled), I was really impressed with its power and simple API, and the plugin system that gave you access to a kind of “middleware” layer between the upload and the storage phases (although in practice they ended up somewhat entangled in sometimes unexpected and confusing ways). Almost immediately I could see that Elixir offered some really exciting possibilities to build a similar design, but with even more clarity and simplicity.

From the beginning I knew I wanted the following:

Modular system so I could use just what I needed and nothing more
Flexible interface so I could build custom use cases on top of that with minimal hassle
Clean, sensible DSL so I could avoid the cognitive load of FS details when I just want to move a file from A to B

What excited me was not only how easy it was to accomplish (I think) these goals in a relatively short period of time, but the increasing conviction that the stuff I wasn’t planning to bother to do, ostensibly because other libraries already did it fairly well, I didn’t need to do and in fact shouldn’t do. This is probably the most opinionated aspect of this library and the reason I’m sharing it here. Versions, “convert” commands, backgrounding: these things are certainly possible to do with Elixir. But once you are adding those kinds of features to file upload in an app, two things are true: 1) there are some details about what you need that no library will be able to predict, and so can’t support well, and 2) if you have the three things above, the extra work to add these on top is vanishingly small.

As I worked on this, I’ve been thinking back to one of my first impressions of (what I take to be) the “Elixir Way” (every language has its Way, right?) when I started learning Ecto syntax a few years ago, one I’ve since seen repeated many times by other new learners. Where is my Model.last? Do I really have to type MyApp.Context.Model |> Ecto.Query.last |> MyApp.Repo.one? The point I started really getting into Elixir is when I realized this wasn’t a defect in the design, or even an oversight, but an intentional decision built deep into Ecto, Elixir itself, and I assume at least in part, the underlying VM. And that was another impression–how much the people I could tell really understood Elixir talked about Erlang! I remember thinking, “isn’t this an Elixir forum??” How often do you catch a Rubyist waxing eloquent about C? Not often…

So, as usual, I’d welcome any feedback the community has about my approach here. Specifically I’d love to have a conversation around the following questions:

Does the model make sense? Are there common (or even uncommon) file upload patterns that would be difficult to implement with the current design?
Is the DSL sensible? Should I choose different terminology for any of the API?
(And because I’m arguably somewhat out of my depth as a web dev here) Are there especially serious security/performance issues I should consider before moving forward?

Thanks!

tfwright · August 23, 2020, 2:24pm

I was asked a question via DM about handling file cleanup that prompted me to add a section to the capsule_ecto README (it applies equally well without Ecto, although the need might not be quite as obvious).

WFransen · August 23, 2020, 4:13pm

Thank you!

tfwright · October 4, 2020, 7:38pm

Kicking off Hacktoberfest with a new release.

Aside from various bugfixes, it contains 2 significant, non-backwards-compatible changes:

The Storage.move callback has been removed in favor of copy. If you were using this API, you will need to replace it with a combination of copy and delete.
All built in Storage and Upload implementations have been relocated to a new, separate repo: https://github.com/elixir-capsule/supplement

The latter comes with a new Storage and Upload implementation as well: S3 and Plug.Upload respectively.
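For anyone replacing move, here is a minimal sketch of composing it from copy and delete. Everything in it is an assumption for illustration: MemoryStorageSketch is a hypothetical Agent-backed stand-in, not a Capsule storage, and its function signatures are not Capsule's callback signatures. The only point is the ordering: delete the source only after the copy has succeeded.

```elixir
defmodule MemoryStorageSketch do
  # Hypothetical stand-in for a storage module; NOT Capsule's actual API.
  # Files live in an Agent as a map of id => binary contents.

  def start, do: Agent.start_link(fn -> %{} end)

  def put(pid, id, contents) do
    Agent.update(pid, &Map.put(&1, id, contents))
    {:ok, id}
  end

  def read(pid, id) do
    case Agent.get(pid, &Map.fetch(&1, id)) do
      {:ok, contents} -> {:ok, contents}
      :error -> {:error, :not_found}
    end
  end

  def copy(pid, id, new_id) do
    # Fails with {:error, :not_found} if the source id does not exist.
    with {:ok, contents} <- read(pid, id) do
      put(pid, new_id, contents)
    end
  end

  def delete(pid, id) do
    Agent.update(pid, &Map.delete(&1, id))
    :ok
  end
end

defmodule MoveSketch do
  # "move" expressed as copy followed by delete. The source is deleted
  # only when the copy succeeded; any error from copy is returned as-is.
  def move(storage, pid, id, new_id) do
    with {:ok, copied_id} <- storage.copy(pid, id, new_id),
         :ok <- storage.delete(pid, id) do
      {:ok, copied_id}
    end
  end
end
```

Calling MoveSketch.move(MemoryStorageSketch, pid, "a.txt", "b.txt") leaves the contents readable under "b.txt" only. If the copy fails, nothing is deleted.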

I’ll be looking to add more, so if you want to get your t-shirt (or tree), feel free to pitch in.

tfwright · October 6, 2020, 4:55am

0.7 is a small release that extends options support to all storage callbacks, which is used in the latest version of the S3 and Disk storages to allow overriding configs (bucket and root directory) in specific calls.
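To make the per-call override idea concrete, here is a rough sketch. DiskSketch, the :root_dir option name, and the default directory are all hypothetical and not taken from the Supplement Disk storage. It only shows the general pattern of letting a keyword option win over a configured default.

```elixir
defmodule DiskSketch do
  # Hypothetical module; NOT the Supplement Disk storage.
  # The :root_dir option name is an assumption made for this example.

  def put(id, contents, opts \\ []) do
    path = Path.join(root(opts), id)

    with :ok <- File.mkdir_p(Path.dirname(path)),
         :ok <- File.write(path, contents) do
      {:ok, id}
    end
  end

  def read(id, opts \\ []) do
    opts |> root() |> Path.join(id) |> File.read()
  end

  # A per-call option takes precedence over the default root directory.
  defp root(opts), do: Keyword.get(opts, :root_dir, default_root())

  # Stand-in for whatever application config would normally provide.
  defp default_root, do: Path.join(System.tmp_dir!(), "disk_sketch_default")
end
```

With this shape, DiskSketch.put("note.txt", "hi", root_dir: "/some/other/dir") writes under the overriding directory for that one call, while calls without the option keep using the default.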

tfwright · December 28, 2020, 2:27am

Another small maintenance release!

0.8 contains a breaking change to the API. Capsule.Storage.open is now Capsule.Storage.read.

Supplement also now includes a RAM storage for putting files in memory (via StringIO). After running into a confusing dependency conflict issue, I also published a versioned release on hex: https://hex.pm/packages/capsule_supplement
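If you haven't used StringIO before, this is roughly what holding a file in memory with it looks like. RamSketch and its function names are hypothetical and are not the Supplement RAM storage. Only the StringIO calls are real standard library functions.

```elixir
defmodule RamSketch do
  # Hypothetical helper; NOT the Supplement RAM storage.
  # Only the StringIO functions used here come from Elixir's standard library.

  # Starts an in-memory IO device holding the contents; returns {:ok, pid}.
  def put(contents) when is_binary(contents) do
    StringIO.open(contents)
  end

  # StringIO.contents/1 returns {remaining_input, output_written_so_far}
  # without consuming the input, so repeated reads see the same data.
  def read(device) do
    {input, _output} = StringIO.contents(device)
    {:ok, input}
  end

  # Closing the device stops the process and frees the memory.
  def delete(device) do
    {:ok, _contents} = StringIO.close(device)
    :ok
  end
end
```

The "file id" in this sketch is just the device pid, which is one reason an in-memory storage like this is mostly useful for tests and short-lived data.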

I haven’t been running into many issues thus far, so the first stable release should be coming soon. So, if you have been trying it out and have pain points, or want to try it out, speak your feedback now or forever hold your peace…or at least until the next major release.

tfwright · February 9, 2023, 12:19pm

Latest release contains significant refactors with breaking changes to the API.

“Encapsulation” is now “Locator,” which is less cumbersome and more expressive. Naming things is hard.
Storage.put now returns only the file id, and read and delete take only an id rather than the whole Encapsulation (now Locator), increasing simplicity and separation between those modules.
Locator.new added to facilitate converting DB/map data to structs.

I’ve finally been able to use this more extensively in production code so there will likely be more changes coming soon.

tfwright · November 11, 2023, 10:37am

And we’re back with a new feature: uploaders!

Most file upload libraries come with something like this built in, but I put off adding it since they come with a disproportionate amount of complexity and I wanted to see if they were really necessary. Well, after dealing with several bugs related to slight variations in metadata/file locations, I found that I was adding something very like uploaders to the main live project I am using Capsule in, and felt the time had come to abstract some of that back out into the library.

Check the README for details, but the concept should be familiar to anyone who has used general purpose uploading libraries before (especially Shrine, which Capsule is loosely based on). Essentially it provides an API for linking specific file storages to sets of options (including file locations) and metadata. For example, if you want to store the mime type every time a file is stored somewhere new (and possibly converted in between), you can implement the build_metadata function on an uploader and use uploader.store(some_upload, :your_storage) and Capsule will take care of it for you.
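As a rough illustration of the concept (not the library code), the sketch below wires storage aliases to metadata building in plain Elixir. UploaderSketch, AvatarUploaderSketch, the shape of the upload map, the arity of build_metadata, and the fake storages that only compute an id are all assumptions for this example. The README is the reference for the real callbacks.

```elixir
defmodule UploaderSketch do
  # Hypothetical illustration of the uploader idea; NOT Capsule's implementation.
  # An "uploader" here is any module exposing storages/0 and build_metadata/2.
  def store(uploader, upload, storage_alias) do
    # Look up the storage function registered under the alias.
    put = Map.fetch!(uploader.storages(), storage_alias)

    with {:ok, id} <- put.(upload) do
      {:ok,
       %{
         id: id,
         storage: storage_alias,
         metadata: uploader.build_metadata(upload, storage_alias)
       }}
    end
  end
end

defmodule AvatarUploaderSketch do
  # Fake storages: they only compute an id and do not persist anything.
  def storages do
    %{
      cache: fn upload -> {:ok, "cache/" <> upload.name} end,
      permanent: fn upload -> {:ok, "permanent/" <> upload.name} end
    }
  end

  # Matching on the alias lets metadata differ per storage.
  def build_metadata(upload, :cache), do: %{mime: mime(upload.name), temporary: true}
  def build_metadata(upload, _alias), do: %{mime: mime(upload.name)}

  # Naive extension-based lookup, only for the sake of the example.
  defp mime(name) do
    case Path.extname(name) do
      ".png" -> "image/png"
      ".jpg" -> "image/jpeg"
      _ -> "application/octet-stream"
    end
  end
end
```

UploaderSketch.store(AvatarUploaderSketch, %{name: "me.png"}, :cache) returns the id, the alias, and the metadata built for that alias in one map, which is the kind of bookkeeping the uploader is meant to take off your hands.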

Some notable design choices:

Uploaders are not linked with any “type” of file being uploaded by default, only storages. Although generally speaking different files will need different metadata/locations and thus separate uploaders, Capsule doesn’t impose that (and so allows you to even use a single uploader to generate different sets of metadata using options if that’s what you prefer).
Uploaders do not provide any built in utilities for implementing file processing (type conversion, mime extraction, etc.), but they are certainly a convenient place to organize that logic, if you choose, using Plain Old Elixir Functions.
Although uploaders allow you to set aliases for storages (“cache” meaning Disk, for example), they don’t make any assumptions about the order or priority of storages. Feel free to have as many “permanent” storage locations as you want, and by matching on the alias you can modify metadata and other options when copying the file from one storage to the next.
As usual nothing is deleted automatically.

The other change is additional validation for the Locator initializer, as well as a bang version. This helps prevent accidentally persisting invalid file data, especially when used in conjunction with the new Ecto helper release.

This is a small change but it has major repercussions: storing a plain map will no longer work. This encourages the use of the initializer to ensure that a valid Locator is being passed to Ecto. Allowing maps to be dumped was a major design error that I decided was important to correct despite the fact that it may break many existing uses! If you try to update and find that is the case, the easiest fix is to simply convert the maps to Capsule.Locator structs. This won’t give you the validation, however, so I would encourage moving to the Locator.new function, which at least checks that all required keys are present. This is especially important if you are handling the initial upload on the client side!
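To show the difference between building a struct directly and going through a validating initializer, here is a small sketch. LocatorSketch is hypothetical: the field names (id, storage, metadata), the error shape, and the handling of string keys are assumptions for illustration, not the actual Capsule.Locator code.

```elixir
defmodule LocatorSketch do
  # Hypothetical struct; field names and return shapes are assumptions,
  # NOT confirmed against Capsule.Locator.
  @enforce_keys [:id, :storage]
  defstruct [:id, :storage, metadata: %{}]

  @required [:id, :storage]
  @known [:id, :storage, :metadata]

  # Accepts atom keys as well as the string keys typical of data
  # loaded from a JSON column or sent by a client.
  def new(attrs) when is_map(attrs) do
    fields =
      Enum.reduce(@known, %{}, fn key, acc ->
        case Map.get(attrs, key) || Map.get(attrs, Atom.to_string(key)) do
          nil -> acc
          value -> Map.put(acc, key, value)
        end
      end)

    case Enum.reject(@required, &Map.has_key?(fields, &1)) do
      [] -> {:ok, struct!(__MODULE__, fields)}
      missing -> {:error, {:missing_keys, missing}}
    end
  end

  # Bang version: returns the struct or raises instead of returning a tuple.
  def new!(attrs) do
    case new(attrs) do
      {:ok, locator} -> locator
      {:error, reason} -> raise ArgumentError, "invalid locator: #{inspect(reason)}"
    end
  end
end
```

In this sketch, LocatorSketch.new(%{"id" => "a/b.png"}) comes back as an error tuple naming the missing :storage key, so a bad payload is caught before it gets anywhere near the database.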