Depot - Filesystem Abstraction

Depot is a filesystem abstraction inspired by the great Flysystem PHP package.

This allows interacting with many different filesystems (cloud based or not) via a common, sensible, and in some parts simplified API. You can change e.g. cloud providers just by switching out the filesystem setup, leaving any business logic as is. It supports both “module based” systems and systems set up purely at runtime (thanks to @michalmuskala for insisting on it ages ago).

defmodule LocalFileSystem do
  use Depot.Filesystem,
    adapter: Depot.Adapter.Local,
    prefix: "/some/path/to/storage"
end

LocalFileSystem.write("test.txt", "Hello World")
{:ok, "Hello World"} = LocalFileSystem.read("test.txt")

# or

filesystem = Depot.Adapter.Local.configure(prefix: "/some/other/path/to/storage")
Depot.write(filesystem, "test.txt", "Hello World")
{:ok, "Hello World"} = Depot.read(filesystem, "test.txt")

The following parts are done:

  • Basic API
  • Stream based API (with the help of @mcrumm)
  • Portable visibility (:public vs. :private only)
  • Adapters can support more complex visibility handling
  • Copy between different filesystems with
    • Copy directly on the filesystem (e.g. S3 bucket_01 -> bucket_02) if supported.
      Must be same adapter for both filesystems.
    • Copy using streaming via the current machine.
    • Drop to synchronous read/write when streaming is not supported.
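
The fallback order for copying between filesystems can be pictured roughly like this. It is only a sketch: the module name, the capability flags, and the plain maps standing in for configured filesystems are all made up for illustration and are not Depot's actual API.

```elixir
defmodule CopyStrategySketch do
  # Hypothetical helper: each filesystem is a plain map with an :adapter
  # and boolean capability flags. Depot's real structures may differ.

  # Picks the cheapest copy strategy both sides can support.
  def choose(source, destination) do
    cond do
      # Same adapter and the backend can copy on its own (e.g. bucket to bucket).
      source.adapter == destination.adapter and source.native_copy ->
        :native

      # Otherwise stream through the current machine when both sides can stream.
      source.read_stream and destination.write_stream ->
        :stream

      # Last resort: read the whole file, then write it.
      true ->
        :read_write
    end
  end
end
```

The point of the ordering is that each step down costs more: a native copy never moves data through the current machine, streaming moves it without holding it all in memory, and the synchronous fallback holds the whole file at once.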

Outstanding additions:

  • Further common metadata: Size, Type, …

GitHub:

The abstraction for the different underlying filesystems is done using adapters:

  • InMemory (Included)
  • Local (Included)
  • S3 (Currently not up to date with latest depot behaviour)
  • Google (Currently not up to date with latest depot behaviour; by @jayjun)

Does this help you with your problem with backslashed paths that you shared about some time ago?

It was the reason I brought it up. One would think that using Path.type to enforce relative-only paths would work, but its result is OS-dependent. I ended up just copying the code and modifying it to fit the needs of the module.
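
To illustrate what an OS-independent check could look like, here is a minimal sketch. The module name and the error atoms are hypothetical, and this is not the code Depot ended up with; it only assumes that stored paths should always use forward slashes and never escape the prefix.

```elixir
defmodule RelativePathSketch do
  # Hypothetical helper, not part of Depot. It applies the same rules on
  # every OS instead of relying on Path.type/1, whose answer depends on
  # the host platform.
  def validate(path) when is_binary(path) do
    cond do
      # Unix-style absolute path.
      String.starts_with?(path, "/") -> {:error, :absolute}
      # Backslashes are rejected outright rather than treated as separators.
      String.contains?(path, "\\") -> {:error, :backslash}
      # Windows drive letter such as "C:".
      String.match?(path, ~r/^[A-Za-z]:/) -> {:error, :drive_letter}
      # Any ".." segment could climb out of the storage prefix.
      ".." in String.split(path, "/") -> {:error, :traversal}
      true -> {:ok, path}
    end
  end
end
```

With this, "docs/test.txt" is accepted, while "/etc/passwd", "docs\\test.txt", "C:/temp" and "a/../b" are all rejected regardless of where the code runs.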

Out of interest, why use an Agent in the InMemory adapter instead of ETS? I recently used ETS for this kind of behaviour (I should have published that before you published Depot), with two tables:

  • “paths” index in form of {path(), hash()}
  • file contents indexed by hash {hash(), content()}

That way file copies are “cheap” and I do not need to store the content twice (in my case I couldn’t rely on Erlang immutability, as I was reading file content from tarballs, so each binary would be “different” from the viewpoint of the VM).
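
A rough sketch of that two-table layout, to make the idea concrete. The module name, the choice of SHA-256 as the hash, and the :enoent error are assumptions for illustration; this is neither my unpublished code nor anything in Depot.

```elixir
defmodule DedupStoreSketch do
  # Hypothetical in-memory store: one table maps path -> hash,
  # the other maps hash -> content.

  def new do
    %{
      paths: :ets.new(:paths, [:set, :public]),
      blobs: :ets.new(:blobs, [:set, :public])
    }
  end

  def write(store, path, content) when is_binary(content) do
    hash = :crypto.hash(:sha256, content)
    # insert_new/2 leaves an existing entry alone, so identical content
    # is stored only once.
    :ets.insert_new(store.blobs, {hash, content})
    :ets.insert(store.paths, {path, hash})
    :ok
  end

  def read(store, path) do
    with [{^path, hash}] <- :ets.lookup(store.paths, path),
         [{^hash, content}] <- :ets.lookup(store.blobs, hash) do
      {:ok, content}
    else
      _ -> {:error, :enoent}
    end
  end

  # Copying only adds a second path entry pointing at the same hash.
  def copy(store, source, destination) do
    case :ets.lookup(store.paths, source) do
      [{^source, hash}] ->
        :ets.insert(store.paths, {destination, hash})
        :ok

      [] ->
        {:error, :enoent}
    end
  end
end
```

The sketch leaves out deletion: removing a path would need some way to tell whether other paths still reference the same hash before dropping the content.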

I am also preparing a PR with a “blackhole” adapter, which I often find useful for testing.

BTW

I have reviewed the code and I think it would be useful to make some callbacks optional, as not all adapters will support them, and that would make it easier to implement them for situations when some features aren’t available.

Mostly because it was simple and I needed something to test against, which required “starting” as opposed to the local filesystem, which is just “there”.

Would you be willing to add an issue about that? Streaming is already optional, but I’d like to know which ones specifically you’re talking about.

Well, it doesn’t seem so. I should also say that I am a big fan of not calling callbacks outside of the module that defines them. I think this is a nice lint in Elvis that should be ported to Credo.

It’s currently implemented the same way as in the Enumerable protocol where returning {:error, __MODULE__} is considered “not implemented”. @mcrumm actually introduced that part, so I’m not sure how exactly this compares to using optional callbacks on the behaviour.

Given the current API, we could make the *stream callbacks optional and defoverridable, and have them return {:error, __MODULE__} by default.

I’m honestly not sure that’s better than requiring the adapters to do so explicitly, but it would keep the errors consistent.
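
For comparison, this is roughly what the optional-plus-default variant could look like. The behaviour below is a made-up stand-in with just two callbacks, not Depot's real adapter contract, and the module names are hypothetical.

```elixir
defmodule AdapterSketch do
  # Hypothetical behaviour for illustration only.
  @callback read(config :: term, path :: String.t()) ::
              {:ok, binary} | {:error, term}
  @callback read_stream(config :: term, path :: String.t()) ::
              {:ok, Enumerable.t()} | {:error, term}

  # Adapters are not required to define read_stream/2.
  @optional_callbacks read_stream: 2

  defmacro __using__(_opts) do
    quote do
      @behaviour AdapterSketch

      # Default: report "not implemented" the same way Enumerable does.
      # __MODULE__ here resolves to the adapter that calls `use`.
      @impl AdapterSketch
      def read_stream(_config, _path), do: {:error, __MODULE__}

      defoverridable read_stream: 2
    end
  end
end

defmodule NoStreamAdapter do
  use AdapterSketch

  @impl AdapterSketch
  def read(_config, _path), do: {:ok, "Hello World"}
end
```

Here NoStreamAdapter.read_stream(nil, "test.txt") returns {:error, NoStreamAdapter} without the adapter defining it. The catch is that the default only exists for adapters that call `use AdapterSketch`; one that declares `@behaviour AdapterSketch` directly and skips the callback has no function at all, so callers would still need a function_exported?/3 check for that case.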

I’ve picked up work on this again and am currently trying to get the API and the S3 adapter finalized. Today I bumped into the reality that MinIO (which I’m using in dev/test) doesn’t support object-level ACLs like AWS S3 does; the docs seem to suggest using bucket policies instead. I don’t have much experience with those policies, so I’m wondering if anyone here does and would like to help me out or collaborate on the visibility handling part of the adapter. I’d really like the adapter to support more than just AWS.
