Bulletproof way to enforce tenants with a foreign key?

Turns out this is a tricky problem.

What I’d like to do is have tenancy at the table level with a tenant_id. That’s easy enough, just add the column and an index, and you’re done!

But the real problem is enforcing it correctly. If you ever forget to scope a query to the correct tenant, you're going to accidentally leak customer data, and that simply cannot happen.

Then there’s also the case of wanting certain tables not to be scoped by a tenant.

Is there any way to do this with Ecto to protect all forms of reads and writes? One that ensures any query gets modified to scope the tenant in without having to add it to every query you run.

I know there's https://hexdocs.pm/ecto/Ecto.Repo.html#c:prepare_query/3 but from the looks of it, it might not be bulletproof: how would you account for scoping the tenant in joins or preloads?
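To make it concrete, this is roughly what I imagine a prepare_query based approach would look like (the tenant_id field and the option names are just my guesses, following the pattern sketched in the callback docs), but it only touches the root binding of the query, so I'm not sure how joins or preloads would be handled:

defmodule MyApp.Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Postgres

  import Ecto.Query

  @impl true
  def prepare_query(_operation, query, opts) do
    cond do
      # Escape hatch for tables that are not tenant scoped,
      # and for Ecto's own schema_migrations queries
      opts[:skip_tenant_id] || opts[:schema_migration] ->
        {query, opts}

      tenant_id = opts[:tenant_id] ->
        # Only filters the first (root) binding of the query
        {where(query, tenant_id: ^tenant_id), opts}

      true ->
        raise "expected :tenant_id or :skip_tenant_id to be set on the repo call"
    end
  end
end

You'd then call it as Repo.all(Order, tenant_id: current_tenant.id), which is already one more thing to forget on every call.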

I know postgres has schemas and there’s also using separate databases for each tenant but neither of these are strategies I want to use.

Rails solves this with acts_as_tenant but I haven’t found anything comparable in Elixir.

2 Likes

There are two parts of the puzzle if you’re looking for something like acts_as_tenant.

One is changing the actual queries, though I've no idea how complex that would be. Ecto.Query structs themselves are quite a pain to inspect, but it might be possible.

The other is the "in this context" part, which in Elixir you can do with the process dictionary, as long as you stay on a single process or explicitly hand the data over to collaborating processes. It's similar to how the Ecto sandbox tells processes which DB connection to use.
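As a minimal sketch of that idea (the module name and key are made up), something like this, set once per request in a plug and read wherever the query is built; note it does not follow you into Tasks or GenServers unless you copy the value over:

defmodule MyApp.Tenancy do
  # Remembers the current tenant for this process only
  @key :current_tenant_id

  def put(tenant_id), do: Process.put(@key, tenant_id)

  def get, do: Process.get(@key)

  def get! do
    Process.get(@key) || raise "no tenant set for #{inspect(self())}"
  end
end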

Neither really feels like a functional approach to the problem. That would be more in line with passing the tenant to whatever query needs it, applying the filtering to the query there, and enforcing it through tests.

Pretty sure using schemas with Postgres and leveraging Ecto prefixes is the closest to bulletproof you'll get…
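Roughly, with prefixes the tenant lives in the query options rather than in a column (schema module and prefix name here are invented), so a query that forgets the prefix hits the wrong schema and fails instead of silently returning another tenant's rows:

# Per-call prefix
MyApp.Repo.all(MyApp.Order, prefix: "tenant_acme")

# Or baked into the query itself
import Ecto.Query

from(o in MyApp.Order)
|> put_query_prefix("tenant_acme")
|> MyApp.Repo.all()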

If you make any mistake using just a column, you risk leaking data. If it were my data, I'd rather not take that risk…

1 Like

I think this is what prepare_query lets you do, but the docs don't include any examples of how to apply this to preloads and joins.

Right, I was afraid of this. It still depends on humans not making critical mistakes, and on writing a massive amount of extra tests. You end up having to duplicate tenancy-enforcement checks on every query you write, instead of writing one very good test suite for the bit of functionality that handles tenancy enforcement.

Yep, but using schemas introduces a lot of operational complexity and potential runtime performance issues down the line. I'm avoiding them mainly for the operational complexity.

2 Likes

Yea that was also my concern…

Was actually pretty easy to break it out into tenant / public migrations.

I have a public tenants table; that's where I record the existence of, and metadata for, each tenant. The function used to create a tenant includes the logic to create the tenant's schema and to execute the migrations + seeds for it.

On the delete side, I have a trigger on the public tenants table that, when a tenant row is deleted, drops the corresponding schema.
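In case it's useful, a sketch of roughly what the provisioning flow looks like; the module names and paths here are simplified and illustrative rather than my exact code:

defmodule MyApp.Tenants do
  alias MyApp.Repo

  @tenant_migrations "priv/repo/tenant_migrations"

  def create_tenant!(slug) do
    # slug is assumed to be a validated, safe identifier
    prefix = "tenant_#{slug}"

    # Record the tenant in the public schema and create its own schema
    Repo.insert!(%MyApp.Tenant{slug: slug})
    Repo.query!(~s(CREATE SCHEMA "#{prefix}"))

    # Run the tenant migrations (and seeds) against the new schema
    Ecto.Migrator.run(Repo, @tenant_migrations, :up, all: true, prefix: prefix)

    prefix
  end
end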

Is that the operational complexity you're referring to?

I would rather have a more explicit, and not 'easy', solution that guarantees this mistake is not made… as the repercussions of getting it wrong just once could mean the end. I might be confident in my own ability today to ensure the tenancy is enforced all the way down… but my future self, or a different developer, might slip up…

Maybe give this a try:

1 Like

I didn't know Ecto actually gave in and added a callback mechanism. I had Ecto.Changeset.prepare_changes in mind when reading your initial post.

To be fair, I would expect it to be even harder to write bulletproof tenancy enforcement on top of Ecto queries.

Even just dealing with the many ways of creating bindings for Ecto queries is tricky:

  • from a in "table_name"
  • from a in Schema
  • from a in {"table_name", Schema}
  • join: b in fragment("SELECT * FROM wherever")

or even

@raw_sql_category_tree """
SELECT * FROM categories AS c WHERE c.parent_id IS NULL
UNION ALL
SELECT * FROM categories AS c, category_tree AS ct WHERE ct.id = c.parent_id
"""

Product
|> recursive_ctes(true)
|> with_cte("category_tree", as: fragment(@raw_sql_category_tree))
|> join(:inner, [p], c in "category_tree", on: c.id == p.category_id)

The biggest problem is that Ecto.Query.t is essentially an opaque type, so even getting all the bindings is currently only possible through private API. That's why I'm actually quite surprised prepare_query was added, because the current state of query-inspection possibilities makes it not really useful.

Comparing this to acts_as_tenant isn't really useful no matter what, though. Working on the ActiveRecord model layer is likely a good bit simpler than dealing with the complexity of more involved Ecto queries.

2 Likes

Might be a radical thought, but what's preventing you from deploying one database per tenant in today's cloud-crazy world?

2 Likes

Dealing with multiple migrations and prefixes. I want to try and avoid that. Running migrations against 1 database / schema is complicated enough!

I found that too. It only warns you when you occasionally forget to scope something. If it covered more use cases it would have been a nice solution but as is I can’t see myself using it.

Yep, but it’s just 1 more thing added to the list of things other frameworks can do but we can’t.

Running 1 DB is hard enough. Adding separate database servers isn’t really something I’m up for doing. It wouldn’t be cost effective and the operational complexity is too high.

In other web frameworks, slapping on and enforcing a tenant_id was usually a nice combination of being easy to implement, not drastically changing how a core feature of your framework works (multiple migrations and DB connections), and scaling very well for most use cases.

Of course we can do it here too, but it’s sounding like we won’t have any way to ensure guarantees on the tenant which makes building this app with Phoenix / Ecto a questionable decision at the moment.

1 Like

For what it's worth, I'm dealing with a similar issue right now where I'd like to avoid prefixes. I've read several articles about people facing scaling issues with their migrations when they have a lot of tenants. Not to mention prefixes don't really fit my use case: only some tables are tenant aware, and I don't want to run a server per tenant since they are all pretty small. I'm not sure how to reconcile those two constraints with prefixes.

So far the way I'm handling it is by having a query module for each schema, with a private function to build the base query in each one. All the other queries are built from that base query, which requires a tenant if the schema is tenant aware itself. I'm still in the early stages, so no idea if it's going to work out yet; I have yet to deal with very complex queries.
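A stripped-down sketch of one of those query modules, in case the pattern isn't clear (all module and field names here are invented):

defmodule MyApp.Orders.Query do
  import Ecto.Query

  # Every public query is built on top of base/1, and base/1 cannot be
  # called without a tenant, so forgetting the scope fails loudly.
  defp base(%MyApp.Tenant{id: tenant_id}) do
    from o in MyApp.Order, where: o.tenant_id == ^tenant_id
  end

  def open(%MyApp.Tenant{} = tenant) do
    from o in base(tenant), where: o.status == "open"
  end

  def with_items(%MyApp.Tenant{} = tenant) do
    from o in base(tenant),
      join: i in assoc(o, :items),
      preload: [items: i]
  end
end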

2 Likes

Either route has pretty significant tradeoffs. Seems no answer can be bulletproof, sadly…

Pretty sure Ecto/Phoenix should not prescribe either option for that reason, way too many nuances…

Wonder if new versions of the engines (pg, MySQL) are ironing out some of the pitfalls

Have you considered enforcing it at the DB level with row level security policies?

Once configured, your queries will return no rows by default, reducing the impact of human errors.
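A sketch of what that can look like from a migration, assuming a tenant_id column and a custom app.current_tenant setting (the table, policy and setting names are illustrative):

defmodule MyApp.Repo.Migrations.EnableOrdersRls do
  use Ecto.Migration

  def up do
    execute "ALTER TABLE orders ENABLE ROW LEVEL SECURITY"
    # The table owner bypasses RLS unless it is forced
    # (or you connect with a less privileged role)
    execute "ALTER TABLE orders FORCE ROW LEVEL SECURITY"

    execute """
    CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::bigint)
    """
  end

  def down do
    execute "DROP POLICY tenant_isolation ON orders"
    execute "ALTER TABLE orders NO FORCE ROW LEVEL SECURITY"
    execute "ALTER TABLE orders DISABLE ROW LEVEL SECURITY"
  end
end

Per request you then have to set app.current_tenant on the connection, e.g. via select set_config('app.current_tenant', ..., true) inside a transaction, so the value is scoped to that transaction and doesn't leak across pooled connections.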

3 Likes

Yep. I looked at it briefly, but there aren't enough practical examples or Ecto-specific examples floating around for me to feel comfortable using it. This isn't something I want to pioneer or spend ages struggling with instead of building the app. One day I could see this being the norm, though.

That's why an FK tenant_id is so appealing for now. It's everything we've been doing before, plus an extra indexed where clause.

1 Like

This is comparing apples to oranges. Ecto is not a replacement for ActiveRecord and it doesn't want to be.

Ecto has all the means for someone to build a different querying layer on top of it. Ecto.Queryable is a protocol, so one could create, e.g., a library which builds up custom queries. Those could have means of inspecting the query and altering it in various fashions, or some means of callbacks hooking into their creation. It could limit the means of query composition, so there's not as much of an explosion of possibilities as with plain Ecto.Query. Only when such a custom query is actually passed to a Repo function does it need to be converted to an actual Ecto.Query, by the protocol implementation.

Therefore it's less a matter of "can't do" and more a matter of nobody having built that higher-level abstraction on top of Ecto yet. You just seem to be expecting this higher-level abstraction to exist already.

"Raw" Ecto.Query has to work at a lower level, as that's what DB drivers deal with. There are no models, there's no "query builder" to hook into; they're just ad hoc queries. And those queries are way more powerful (but as a result also more complex) than what ActiveRecord can do, because they're super close to actual SQL. Try writing tenant enforcement for actual SQL and it'll be a similarly hard problem, plus the additional need for an SQL parser.

2 Likes

Hi, this isn't as bulletproof as getting Ecto to modify/enforce queries, or using Postgres schemas, or separate databases for that matter, but I've had success designing a multi-tenant system using tenant_id (I called it org_id) foreign keys and adding extra foreign key constraints on all the tables that use org_id, e.g.:

create table(:orders) do
  add :org_id, references(:orgs), primary_key: true, null: false
  [...etc...]
end

create table(:order_items) do
  add :org_id, references(:orgs), primary_key: true, null: false
  add :order_id, :integer # references(:orders)
  add :item_no, :integer
  add :product_id, :integer
  [etc....]
end

execute "alter table order_items add foreign key (org_id, order_id) references orders (org_id, id) match full;"
execute "alter table order_items add foreign key (org_id, product_id) references products (org_id, id) match full;"

Most tables have a primary key of an item-specific autonumber field, plus the org_id as well. The extra foreign key constraints mean that when you reference, for example, order_id in the order_items table, the org_ids must match too. Then in joins you must remember to include org_id in the join condition, as in the sketch below.
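For example, roughly like this in Ecto (the schema modules are simplified):

import Ecto.Query

# org_id in both the join's :on and the where keeps rows from other orgs out
from o in Order,
  join: i in OrderItem,
  on: i.order_id == o.id and i.org_id == o.org_id,
  where: o.org_id == ^org_id,
  select: {o, i}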

You can use “match simple” if you have an optional field in a table that must refer to something in your org, or be null.

This at least stops you 'crossing the wires' by adding an order for a product from another org's products.

This is for a small one developer SaaS type app, so I don’t anticipate thousands of orgs, users, gigabytes of data etc, plus I can trust myself to try and write the queries with the right joins/where clauses etc.

I used this approach on a previous app using Node.js + Postgres.

Anyway just my 2 cents worth.

5 Likes

I think it’s apples to apples because end developers like us are developing web applications and we choose things like Rails, Laravel, Phoenix or something else based on what the language and framework provide us.

If an underlying database library in one framework isn't suited to providing a feature that's common in a ton of SaaS apps, but it's available in other frameworks, that's directly saying "ok, Phoenix is not catered towards creating this type of application while other frameworks are".

You’re technically correct in that you can’t compare Ecto to AR, but in practice it doesn’t matter. It really boils down to “Rails can do this but Phoenix cannot”.

Yep this is the route I planned to take. It’s just risky because it’s so easy to forget to scope something, and even with tests there’s a human risk element.

Good tip about "match" though, I never knew that existed. Thanks!

2 Likes

What you're asking for is imo not a feature. Nothing prevents you from writing a web application with tenancy in Phoenix. You're asking for a specific kind of solution, which just happens to be not as simple to integrate with a database library that is marketed as "not an ORM" basically everywhere. Imo this is exactly the "tradeoffs" programmers always speak of: you get some, you lose some. It just feels like you've yet to find the upsides in your decision to use Phoenix.

4 Likes

All your posts I've seen on this forum somehow end up at your conclusion that Phoenix/Elixir are bad and cannot be used in production. I'm beginning to think you are a detractor who just fakes interest to praise his own solutions.

1 Like

There's no need to accuse anyone of being a bad actor just because they're hitting the not-so-shiny spots of our stack of choice.

7 Likes

Are you aware of any potential issues/implications of using match full that lead to this statement? I’m probably going to need to deal with multi-tenancy at some point and this seems like a nice extra protective measure.

1 Like