A multi tenancy database solution for customer-owned data

lud · September 9, 2022, 7:29am

This is not directly related to Elixir (though the app using the database will be Elixir) but I know this forum gives sound advice!

We would like to build an application where users can create projects, and each of those projects can have its down data space to store and cache stuff.
The app main point is to run jobs within a project : fetch data from this table, compute some more data, store it into another table, download some files, send some notifications, monitor an API uptime, store that too, etc.
The users configure the data schemas in the projects, not us, so the name of the tables and columns are not know upfront. Also those tables can be used to cache some Google Spreadsheets or Airtable data.

Jobs are run concurrently so the database must support transactions. (or at least locking and queuing transactional blocks)

Each one of this projects could have a simple SQLite file, which is what I think I will do first. But now imagine that the jobs are heavy (CPU intensive tasks, delegated to a local process, for instance a ffmpeg command) so we want to distribute the application on multiple computers. In this case, SQLite is not intended for this I think.

Do you have any solution in mind for that?

My main points are ease of use (create new projects-db on the fly, drop them easily) and deployment ease. Performance is not the main point because the data should be small (say 10 tables of 10 000 rows).

Thank you

joey_the_snake · September 9, 2022, 2:28pm

For your distributed scenario I would personally use a hosted Postgres instance, if you want ease of deployment. Remote calls to SQLite are not really where it shines, but you can read some ideas here: SQLite Over a Network, Caveats and Considerations. Keep in mind this is their own advice:

Choose the technology that is right for you and your customers. If your data lives on a different machine from your application, then you should consider a client/server database. SQLite is designed for situations where the data and application coexist on the same machine. SQLite can still be made to work in many remote database situations, but a client/server solution will usually work better in that scenario.

lud · September 9, 2022, 6:51pm

Hi thanks, the problem with postgres is that I should be able to create and delete databases dynamically, and handle proper permissions. It could be done but it does not seem very practical.

joey_the_snake · September 9, 2022, 6:55pm

You could look at using schemas as well: PostgreSQL: Documentation: 14: 5.9. Schemas

cjbottaro · September 10, 2022, 8:00pm

Agreed about Pg schemas. That’s the way to go when doing multi tenancy with Postgres. Caching is done per connection, and connections are per database. So you want long running connections that just switch between schemas.

It’s not that bad. We have a template schema that we dump/load on new tenant creation.

We did have to make our own multi tenanted Ecto migration mix tasks though.

kanishka · September 11, 2022, 4:08am

Essentially you are building a new database system, which may be able to reuse existing database systems in the implementation.

If the schemas are simple and fairly similar, you could restrict the freedom and define all the allowed tables up front and allow some small level of customization through user defined extensions fields on some tables.

If each tenant needs the full power to define their own schema, then I would either direct your users to use a cloud provider that provides hosted database creation and then add value by creating command line utilities to help them run specific tasks using the cloud providers batch or short lived procedure system, or pick one cloud provider and provide command line utilities to make it easier to create databases as well as run tasks.

lud · September 11, 2022, 1:03pm

@cjbottaro @kanishka Yes the schemas will not be know when creating user databases. We really need number/text/date data types, nothing fancy. But that can be emulated on top of postgres large datatypes (bigint, clob, datetime).

I am not comfortable having 100 or 1000 schemas in my database, but that is because I don’t know postgres very well. I will look into it.

Essentially you are building a new database system

My goal is to avoid that. This is why on a single machine, SQLite is perfect: no need to build your own database system, you just use the existing, the only thing you have to know is the file to use for a given customer.

Something like Couchbase could do the work but there is a limit of 1000 collections if I read well.