How to convert an existing app to a multi tenancy architecture?

zimt28 · June 26, 2020, 7:10pm

I want to convert an existing app to a multi tenancy architecture and will be using Postgres schemas for this. One account will be a tenant = one Postgres schema.

So far I’ve had a single accounts table which held all accounts and their information:

public (schema)
└── accounts
    ├── contacts
    ├── orders
    ├── settings
    └── users

Now the account information should be stored somewhere within the schema boundaries, but having just one account table to store one single entry feels weird. It would look like this:

tenant_1 (schema)
├── account
├── contacts
├── orders
├── settings
└── users

tenant_2 (schema)
...

I cannot think of a good alternative and couldn’t find any examples online, not even for Rails or Laravel. Is this the way to go or am I missing something?

egze · June 26, 2020, 7:21pm

I’ve never built multi-tenant apps, but could an accounts table be located in the public schema? You need to know somehow which accounts exist, who is paid/free/blocked/etc. You could put there the schema name associated with the account. Just an idea

bigbassroller · June 26, 2020, 9:13pm

They way I am doing it is have just have a users table on public (along with billing invoice tables) and have another users table in the subdomain schema for that tenants users.

thiagomajesk · June 27, 2020, 2:25am

I’m also working with schema-based multi-tenancy in a new project. I don’t know if it helps, but this is my checklist so far:

1- Triplex to do the “heavy lifting”
2- “Tenant specific” routes scoped with the tenant name (slug)
3 - All tenant-related data is inside the tenant’s schema; the rest is considered “global” scoped and goes inside the public schema (platform configuration, views, etc)
4- Users access only one tenant “area” (~/tenant-name-slug/resource) at a time
5- We use the :prefix option in our queries, to get tenant-related data

I don’t know exactly what you are building, but I guess you could still store tenant-specific information in the public schema - depending on what you consider part of your global scope. There’s certainly no right answers.

Also, take into consideration that if you need to implement something like search or report, there are pros and cons of both approaches. For instance: queries may be faster inside the tenant’s scope because the schema will have fewer rows in total to be queried against; but if you need to collect global information about all tenants it’ll be more expensive since you’ll have to query all schemas at once.

There are a lot of approaches around these scenarios and all depends on what you want to achieve. In the end it depends on how you are separating the scopes of your application.

thojanssens1 · June 27, 2020, 2:52am

Most likely you did, just table-based multi-tenancy (use of id in the table to distinguish user), not schema-based as mentioned here.

egze · June 27, 2020, 7:37am

True that’s exactly what I meant.

yurko · June 27, 2020, 8:13am

We have a multitenant app that uses schemas, here are few basic points.

Triplex is a low level library that abstracts away few things that you will need and gets out of the way otherwise. It will help with migrations, prefix validation etc.

Similar to what others mentioned we have global stuff in public schema and a schema per tenant, these tenants are managed on the global level - public schema has a table with tenants and their prefixes.

We use plugs in tenant space to load and validate current prefix (schema) as well as to ensure the user only uses their own schema.

Most of the tenant context functions get an extra parameter for prefix which goes to repo calls (Ecto basically supports it out of the box).

We also have global users in public schema that with enough permissions can access any tenant.

All of that seems to work nicely, one thing you might want to know - it’s good if you have few big tenants, but not too many. After about 100 of them (somewhere between 100 and 900) you might experience problems running migrations.

onkara · December 6, 2020, 2:42pm

Using either Triplex or some other library is it possible to achieve MultiTenancy without using separate schema or separate database for each tenant? i.e. just one Customer table whose foreign_key is in every child table and all queries are scoped to that tenant only? This is something easily implemented in Ruby gem acts_as_tenant

mathieuprog · December 6, 2020, 3:09pm

See Multi tenancy with foreign keys

chasers · December 9, 2020, 1:36pm

This is a very important caveat to this per table tenant design. I would probably stay away from this design unless there is a very compelling reason you need to put each tenant in their own table.

wigglybacon · April 18, 2023, 2:25am

Do you have an example of this setup? Trying to do the same here.

yurko · April 18, 2023, 7:42am

Nope, that’s proprietary software so I can’t share the code, sorry.

Though I can say we now have two applications running like that, one has been in production since 2018 and is being used actively, everything seems to be working well.

wigglybacon · April 18, 2023, 3:38pm

Ok, thanks.

spacebat · April 19, 2023, 11:13pm

Something to keep in mind, as the complexity of your DB schema grows, that’s a multiplying factor of how much memory ends up being used per connection on the database, for connections that end up visiting a majority of tenant schemas. Things like table statistics get cached in the connections worker process. I’ve seen this cache approach 1GB per connection with a complicated schema and less than 100 tenants.

wigglybacon · April 21, 2023, 12:21am

Interesting point. Kinda scary. Are there any workarounds?

wigglybacon · April 22, 2023, 3:24pm

Hi:
Would you be up for doing this again from scratch?
Cheers,
Dave

spacebat · April 25, 2023, 5:49am

Have a simpler schema - every table, column and index will have statistics that contributes to the amount allocated per connection in the postgresql worker process. This means that partitioning is a multiplier. Have fewer schemas. Reduce the number of schemas that a connection visits - either by biasing a connections in your connection pool towards schemas it has already visited, or by being disconnected and a new connection replacing it in your application connection pool.
I think it’s a good idea to retain the tenant ID in your tables anyway, so that you can choose whether N tenants will share a schema, or if a tenant will have a schema to itself.