Event-based library integration?

I’ve been hacking some spare hours into an event-driven File Management library designed to avoid assumptions about the persistence and query implementations. It’s been going well so far, but I’m unsure how to proceed with configuration around a stateful library like this.

Most File Management solutions (Active Storage, Arc, etc.) require a tighter coupling to a relational database than I’d like. I have use cases where, for a given file upload, multiple systems need to listen in and handle an event in their own way. The idea has been to run a state machine around uploads and define a contract/behaviour around query, persistence, and storage provider (e.g. S3, Google Cloud Storage, etc.) concerns. This, along with some tag management, enables features like live monitoring of upload progress, clean-up of cancelled or errored uploads, and arbitrary use of storage providers.

Responsibilities might be split as follows:

file_manager_core - Business logic, OTP runtime, and Contract definitions.
file_manager_S3 - Storage Provider implementation passed as an option during a request_upload or similar command.
file_manager_ecto - Persistence and Query implementation for Ecto.
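
To make the split more concrete, here is a minimal sketch of how the storage provider contract could look as an Elixir behaviour. Every name in it (FileManager.StorageProvider, init_upload/2, abort_upload/1, the :storage option, and the in-memory provider) is hypothetical and only there for illustration; it is not what the library currently exposes.

```elixir
defmodule FileManager.StorageProvider do
  @moduledoc "Hypothetical contract that a storage provider package would implement."

  @callback init_upload(key :: String.t(), opts :: keyword()) ::
              {:ok, upload_ref :: term()} | {:error, reason :: term()}

  @callback abort_upload(upload_ref :: term()) :: :ok | {:error, reason :: term()}
end

defmodule FileManager.InMemoryStorage do
  @moduledoc "Toy provider standing in for something like an S3 implementation."
  @behaviour FileManager.StorageProvider

  @impl true
  def init_upload(key, _opts), do: {:ok, {:in_memory, key}}

  @impl true
  def abort_upload(_upload_ref), do: :ok
end

defmodule FileManager do
  @moduledoc "Hypothetical core entry point; the provider module is passed per request."

  # The caller picks the provider, so the core never depends on a concrete one.
  def request_upload(key, opts) do
    provider = Keyword.fetch!(opts, :storage)
    provider.init_upload(key, opts)
  end
end

# Usage: the core only ever talks to the behaviour's callbacks.
{:ok, _upload_ref} = FileManager.request_upload("avatar.png", storage: FileManager.InMemoryStorage)
```

The persistence and query contracts would presumably follow the same pattern, with file_manager_ecto providing one implementation.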

Some questions I had:

  1. Is what I’ve described even a sane approach? Maybe someone more experienced can foresee unknown-unknowns I’ve missed?

  2. I figured the best way to support multiple event listeners would be a pub-sub mechanism. How would you implement pub-sub in an OTP-way that doesn’t make assumptions about a consuming system’s architecture? I’d expect some kind of stateful process to keep track of subscribers. Would publishing then just be a cast to each subscriber in the list?

  3. Are there changes you’d make, or additional concerns you’d have, for multi-node/distributed systems?

  4. My current implementation of the upload state machine is a GenServer spawned by a DynamicSupervisor. What sort of configuration would you put in place to prevent the DynamicSupervisor from spawning too many upload processes? Some kind of pooling library? Is there a way to dynamically set a pool size based on available system resources?

  5. Can I claim a Hex package name without publishing immediately?
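
For question 2, this is roughly the naive version I have in mind: a single GenServer that keeps a map of subscriber pids, monitors them so dead subscribers get dropped, and fans events out as plain messages. The module name, the function names, and the {:file_manager_event, event} message shape are all made up for the sketch, and it only covers a single node.

```elixir
defmodule FileManager.Events do
  @moduledoc "Hypothetical single-node pub-sub process; a sketch, not the real implementation."
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Subscribes the given pid (the caller by default) to all events.
  def subscribe(server, pid \\ self()), do: GenServer.call(server, {:subscribe, pid})

  # Fire-and-forget: the publisher does not wait for subscribers.
  def publish(server, event), do: GenServer.cast(server, {:publish, event})

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:subscribe, pid}, _from, subs) do
    if Map.has_key?(subs, pid) do
      {:reply, :ok, subs}
    else
      # Monitor the subscriber so it can be removed when it exits.
      ref = Process.monitor(pid)
      {:reply, :ok, Map.put(subs, pid, ref)}
    end
  end

  @impl true
  def handle_cast({:publish, event}, subs) do
    # Plain send/2 means subscribers can be any kind of process.
    for {pid, _ref} <- subs, do: send(pid, {:file_manager_event, event})
    {:noreply, subs}
  end

  @impl true
  def handle_info({:DOWN, _ref, :process, pid, _reason}, subs) do
    {:noreply, Map.delete(subs, pid)}
  end
end

# Usage: any process can subscribe and then receive events in its mailbox.
{:ok, events} = FileManager.Events.start_link()
:ok = FileManager.Events.subscribe(events)
:ok = FileManager.Events.publish(events, {:upload_completed, "avatar.png"})

receive do
  {:file_manager_event, event} -> IO.inspect(event, label: "received")
after
  1_000 -> IO.puts("no event received")
end
```

Sending plain messages rather than casts is deliberate here: a cast would assume every subscriber is a GenServer, whereas a message can be handled by any process. Whether a single process like this is the idiomatic answer is exactly what I’m unsure about.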