Belt - A Flexible File-Storage Library

@abitdodgy If you are using a recent version of Plug/Phoenix, you shouldn’t run into out-of-memory problems anymore, since the bug behind them was fixed last month. The only limiting factor for uploads is then the disk space available for temporary files, which I realize could still end up being a problem.
Are you already using Belt? If so, what would be a reasonable API for adding direct upload links that can work across different providers?

@wmnnd I’m not using Belt yet. But in Refile it works like this:

When you add files to the form, Refile makes a request to the app to get a signed token. Using this token, the files are uploaded directly to AWS and the results are written to a JSON field that stores the metadata. This is great because I can have multiple uploads running concurrently, then submit the form and save the record without much overhead or IO blocking.

Where Refile falls short, however, is that it processes images on the fly: requested images are fetched from AWS, processed, and served through a Rack app. This is a major headache on PaaS platforms like Heroku, even with a CDN, because a page load with lots of avatars causes timeouts as the Rack app struggles to process and serve each image.

Ideally, you would upload directly to AWS and request the files directly from AWS or via a CDN, without the app getting involved. Any processing can happen in the background.

In Refile, much of this direct uploading is powered by about 100 lines of JavaScript.

Getting the signed upload URL isn’t much of a problem, but the JS and form-building part is what worries me here. Right now, Belt is completely independent of Phoenix and/or Plug, even though they play together very nicely.

A belt_phoenix or belt_plug module that offers a provider-agnostic way of getting direct or indirect upload links would probably be nice, but it’s not at the top of my to-do list right now.
I’d imagine the API to be relatively low-level and something along the lines of Belt.upload_link(config, opts). What do you think about this idea?
Installing a more high-level module would also be a little more complicated than installing Refile, since Phoenix doesn’t have a global asset pipeline that lets you add global helper functions and/or JavaScript.

For now, I suggest simply trying out whether Belt works for you even without direct S3 uploads (which are also not supported by Arc, by the way). If your application isn’t, say, a file hoster, this should not pose a major problem :slight_smile: Otherwise you’ll have to build something yourself with ExAws.

“I’d imagine the API to be relatively low-level and something along the lines of Belt.upload_link(config, opts). What do you think about this idea?”

I think that would be great. We could then build our own JS implementations around the lib.

Do I have to do something special when using Belt.Provider.S3 in a test case? It just blows up with this error all the time :confused:

(I have ex_aws and sweet_xml in my mix.exs)

16:59:18.658 [error] GenServer Belt terminating
** (RuntimeError) provider Belt.Provider.S3 not registered with Belt
    (belt) lib/belt.ex:284: anonymous fn/1 in Belt.init/1
    (gen_stage) lib/gen_stage/partition_dispatcher.ex:214: anonymous fn/3 in GenStage.PartitionDispatcher.dispatch/3
    (elixir) lib/enum.ex:1811: Enum."-reduce/3-lists^foldl/2-0-"/3
    (gen_stage) lib/gen_stage/partition_dispatcher.ex:213: GenStage.PartitionDispatcher.dispatch/3
    (gen_stage) lib/gen_stage.ex:2309: GenStage.dispatch_events/3
    (gen_stage) lib/gen_stage.ex:1960: GenStage.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.519.0>): {:store, [%Belt.Provider.S3.Config{access_key_id: "<redacted>", base_url: "https://<redacted>", bucket: "demo-data", host: "static.bemt.eu", https: true, port: 443, provider: Belt.Provider.S3, region: "us-west-2", secret_access_key: "<redacted>"}, "/Users/kwando/projects/

work/SMS/setup.sh", [key: "840277a243d9f9769a71d56285d2569489fb21747f0d32b4db302c77a561b10c3842c579ffc20c28"]]}
16:59:18.662 [error] GenServer Belt.Provider.Filesystem.Supervisor terminating
** (RuntimeError) provider Belt.Provider.S3 not registered with Belt
    (belt) lib/belt.ex:284: anonymous fn/1 in Belt.init/1
    (gen_stage) lib/gen_stage/partition_dispatcher.ex:214: anonymous fn/3 in GenStage.PartitionDispatcher.dispatch/3
    (elixir) lib/enum.ex:1811: Enum."-reduce/3-lists^foldl/2-0-"/3
    (gen_stage) lib/gen_stage/partition_dispatcher.ex:213: GenStage.PartitionDispatcher.dispatch/3
    (gen_stage) lib/gen_stage.ex:2309: GenStage.dispatch_events/3
    (gen_stage) lib/gen_stage.ex:1960: GenStage.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:DOWN, #Reference<0.611478655.3527409665.27844>, :process, #PID<0.311.0>, {%RuntimeError{message: "provider Belt.Provider.S3 not registered with Belt"}, [{Belt, :"-init/1-fun-0-", 1, [file: 'lib/belt.ex', line: 284]}, {GenStage.PartitionDispatcher, :"-dispatch/3-fun-0-", 3, [file: 'lib/gen_stage/partition_dispatcher.ex', line: 214]}, {Enum, :"-reduce/3-lists^foldl/2-0-", 3, [file: 'lib/enum.ex', line: 1811]}, {GenStage.PartitionDispatcher, :dispatch, 3, [file: 'lib/gen_stage/partition_dispatcher.ex', line: 213]}, {GenStage, :dispatch_events, 3, [file: 'lib/gen_stage.ex', line: 2309]}, {GenStage, :handle_call, 3, [file: 'lib/gen_stage.ex', line: 1960]}, {:gen_server, :try_handle_call, 4, [file: 'gen_server.erl', line: 636]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 665]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}

@kwando Hey there, thanks for giving Belt a try! You need to register the providers that you want to use in your mix configuration.
Here is the example from the Getting Started guide:

config :belt,
  providers: [Belt.Provider.Filesystem, Belt.Provider.SFTP, Belt.Provider.S3] 
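Since the error in the test run comes from a provider missing from this list, a quick sanity check in a test helper can confirm what the application environment actually contains. This is only a sketch: ProviderCheck and ensure_registered/1 are hypothetical names that are not part of Belt, and the check merely reads the :providers key of the :belt application environment shown in the config; it does not register anything itself.

```elixir
defmodule ProviderCheck do
  @moduledoc """
  Hypothetical helper (not part of Belt): checks that the providers a test
  needs are listed under `config :belt, providers: [...]`.
  """

  # Returns :ok when every required provider is configured, otherwise an
  # error tuple naming the ones that are missing from the list.
  def ensure_registered(required) when is_list(required) do
    registered = Application.get_env(:belt, :providers, [])

    case required -- registered do
      [] -> :ok
      missing -> {:error, {:providers_not_registered, missing}}
    end
  end
end
```

Calling ProviderCheck.ensure_registered([Belt.Provider.S3]) from test/test_helper.exs would surface a missing entry before the first upload is attempted. Config files can be environment-specific, so the list has to be visible in the test environment as well.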

Small SFTP update 0.2.0

Version 0.2.0 of Belt brings improved SFTP transfer performance (as much as 10 times faster transfer speed) thanks to asynchronous writing as well as better support for different SFTP servers.

What’s next?

Belt 0.x will continue to receive updates and soon I’ll add a feature to test configurations without the need to upload something or use one of the other commands first.
A rewrite is coming for version 1.x which will bring new Ecto-style configuration and direct uploads - both for S3 and by proxy for SFTP as well.

It would be awesome if this brought a delete feature to S3. That’s the very problem I’m having right now with Arc and Arc_ecto.

Which delete feature are you referring to? Belt.delete/3?

Nice library, thanks!

Is there a way to store a file without reading it from disk? In my app I want to store a file on SFTP, but its contents don’t come from an existing file; they’re generated in memory. Is there a way to upload those directly, or should I write them to disk first?

This depends on the SFTP lib you use.

I mean, using Belt. :slight_smile:

With Belt, you can’t currently upload data from memory, but it’s an interesting idea. So for now I suggest writing the data to a file first and then using Belt.store/3.

How would you imagine the API for storing data to work? Should it take a stream (e.g. Belt.store_stream(config, stream, opts)), or would you find it more convenient to store data directly (e.g. Belt.store_data(config, data, opts))?

Personally I think the following options would be nice to have:

  • Upload from a file that physically exists on disk
  • Upload in-memory content (a binary) as-is
  • Upload from a stream (chunked by size or by line)
  • Upload from messages, in a way that the receiver can be passed in to HTTPotion’s :stream_to option.

Thanks! All the suggestions by @NobbZ seem OK to me; for my current needs I’d need the proposed store_data, so I could pass it a file_name and contents as binaries.

I’ll think about how to implement this. For now, I recommend you create a temporary file and then specify the :key option in order to set the file name (file names are referred to as “keys” and directories as “scopes” in order to provide a more abstract concept that also applies to object storage).
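The temporary-file workaround could be wrapped roughly as follows. This is a sketch under stated assumptions, not Belt's API: TempUpload and store_data/3 are hypothetical names, and the actual upload is passed in as store_fun so that the snippet does not depend on Belt. With Belt, store_fun would be something like fn path, opts -> Belt.store(config, path, opts) end. The temporary file is removed as soon as store_fun returns, so this assumes the call blocks until the upload has finished.

```elixir
defmodule TempUpload do
  @moduledoc """
  Hypothetical helper (not part of Belt): writes in-memory data to a
  temporary file and hands that file to an upload function.
  """

  # `source` is a binary or an enumerable of binaries (e.g. a stream).
  # `store_fun` receives the temp file path and `[key: file_name]`.
  def store_data(source, file_name, store_fun)
      when is_binary(file_name) and is_function(store_fun, 2) do
    tmp_path =
      Path.join(System.tmp_dir!(), "upload-#{System.unique_integer([:positive])}")

    try do
      write_source(tmp_path, source)
      store_fun.(tmp_path, key: file_name)
    after
      # Clean up even if writing or uploading raised.
      File.rm(tmp_path)
    end
  end

  # A plain binary is written in one go.
  defp write_source(path, data) when is_binary(data), do: File.write!(path, data)

  # Anything else is treated as an enumerable of binary chunks.
  defp write_source(path, chunks), do: Enum.into(chunks, File.stream!(path))
end
```

Because the name under which the file is stored comes from the :key option, the random temporary file name never shows up at the destination.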

Cool, I got it working this way. The only thing now is that it’s super slow: a single file takes about 10 seconds to upload to SFTP (whereas it’s fast if I do it directly from my terminal). I’ll try to investigate this tomorrow.

Are you already using the latest version of Belt (0.2.0)?

Yes, Belt 0.2.0, Elixir 1.5, Erlang/OTP 20.

Did you have the previous version installed? Maybe you changed your mix.exs but didn’t actually update the installed version with mix deps.update? I also experienced very slow uploads with the previous version; 0.2.0 then introduced a different mechanism for SFTP uploads.