Phoenix for SAAS application

Hi everyone we are thinking of using phoenix for our SAAS application. use case is pretty tricky
we need separate database for user
eg :- if user xyz registers they will have seperate subdomain xyz.app.com ==> pointing to same phoenix application but using separate database. (xyz_app_db) how we use dynamic db in phoenix? is there better alternative? thanks in advance

4 Likes

As far as I understand, there is no easy way to do exactly that in Phoenix/Ecto. But one could argue that the multiple databases approach is not best for the scenario either.

Postgres comes with a concept of logically separating the sets of data using schemas: https://www.postgresql.org/docs/9.1/static/ddl-schemas.html

and Ecto does support some of that natively, but has also a nice addon called apartmentex that makes working with schemas easier.

https://github.com/Dania02525/apartmentex

I think this is preferred approach rather than dynamically specifying connection parameters/databae URL per tenant.

5 Likes

It’s a perfectly fine pattern to use for SaaS but you will have to use Postgrex directly without Ecto given your use case it might be a good idea in any case.

3 Likes

People generally use multiple DBs when the approach your outline would not be enough e.g. we had a service that would have 300GB tables for a single table and a single client the whole DB for one client could be close to 1TB so using a single server was not an option.

2 Likes

thanks thinking about same

1 Like

I use elixir for exactly the same approach (subdomains and separated organizations). We however just use one database, and all resources reference an organization or an organization_user (in our case the higher level user can join multiple organizations). When scaling, we can easily shard databases, but we do not have these scaling issues yet.

What is the reason you would like to have multiple databases?

2 Likes

I’ve built quite a few SaaS systems and we did the multi-database approach too. This decision has been biting my behind ever since in one of the projects. At some point you will need to query across the tenants to do some reporting, maybe some validation, maybe enforce some other business rules. And with multiple databases it is quite painful to do so.

PostgreSQL can do sharding/clustering quite nicely. It does have some nice modules that can be really helpful too, one example is https://www.postgresql.org/docs/9.3/static/postgres-fdw.html
which basically allow you to set up a “proxy database” that will pull some tables from external data postgres servers.

Having said that, you first need to figure out if you will have a problem. I believe a lot of SAAS systems do run just fine on one database. Maybe there is no problem and we’re “solving imaginary scaling issue”.

3 Likes

Regardless of whether multiple databases is the way to go or not, I believe that Ecto has an overly rigid interface. Preferably, I’d like to be able to start a repo without needing to define a module, or provide App env. Something like the following would be nice:

Ecto.Repo.start_link(connection_opts, gen_server_opts)

And then to use it, do something like:

Ecto.Repo.one(pool_name, query)

Perhaps that’s already possible today, but it’s not obvious how we can do it.

In any case, I feel such interface is much more flexible, and could accommodate various scenarios, including dynamic addition/removal of repositories at runtime. The current interface (with a dedicated module and app config) could still exist as the icing on the cake on top of the plain functional interface.

8 Likes

initially i was curious about security issue if we use single database later i found it will be hard to maintain and single database can scale pretty good.

It compares all them them. can you share some of your experience how it is working for you and why you choose single database for all tenants?

1 Like

Unfortunately it’s not possible right now, and I agree it has a potential of simplifying a lot of dynamic uses. Unfortunately that would probably mean maintaining a whole other set of APIs. Also - currently transactions are handled through pdict, you’d need to explicitly handle the connection for this case, which would be troublesome for things like Ecto.Multi that don’t expect this.

I’m not saying it’s not possible to do that, but it would be quite a lot of work to change ecto (at least internally) to passing all this state around.

Michał.

4 Likes

Through pdict of which process? If it’s a repo process, then note that I’m not suggesting we should drop that. I’m just proposing such process should be powered by the generic module, rather than by particular module.

Of course, I’m not familiar with internals, so no idea how complex this would be, and what are the trade-offs.

I mentioned recently elsewhere on this forum that I’m not happy with how library authors tend to sometimes reach for app env and even require concrete modules, where a generic module that takes some parameters would suffice. So my issue with the current approach of Ecto is not just about supporting dynamic use-cases (which are admittedly esoteric and rare), but also about setting a proper example. Both Phoenix and Ecto require app envs for setting parameters, and I wonder to what extent does this influence other library authors.

2 Likes

Honestly for it was an very easy decision. Needing a DB per tenant is a pretty exotic situation. As we plan more than 5-10 tenants that didn’t make sense. For scaling I don’t see a problem until we hit massive scale, and then I can hire people that are smarter than me :wink:
If you have a use case that for some reason needs a db per tenant, why don’t you also deploy a separate app server per tenant, then you don’t need ecto support.
Easy support in ecto for multiple databases would be welcome although mainly for other use cases like connecting to a analytics database.

2 Likes

When it comes to security I wrote a rule based authorization library, that authorizes every action on the db. First check is always denying other organizations.

1 Like

That is so awesome, that is how I bring in an Oracle database through PostgreSQL. ^.^

yes, the foreign data interface is quite awesome. You can hook up a file, a CSV, oracle database etc. and make it look like it was a local table/tables inside postgresql.

https://wiki.postgresql.org/wiki/Foreign_data_wrappers

I am curious, do you use Ecto with Oracle this way? It should work…

2 Likes

I do, yes. :slight_smile:

2 Likes

Better than having a single database per user, I recommend sharding your PostgreSQL, specially using PostgreSQL schemas. You can use my library Ecto.InstaShard, which allows you to use an arbritary number of physical and logical (schemas) databases, following Instagram’s pattern: https://github.com/alfredbaudisch/ecto_instashard, while still using Ecto Repositories and Schemas (the library dynamically creates Ecto repositories).

Instagram presentation: http://media.postgresql.org/sfpug/instagram_sfpug.pdf – this is how Instagram scaled to billions of posts using only PostgreSQL.

The library is rough around the edges (it was my first Elixir piece of code around 8 months ago and I open sourced it recently), but we at https://fredbots.com are using it to store roughly 50,000 msgs/seconds (note that we save in batches, but messages are distributed across only 2 physical databases and 2048 logical shards).

4 Likes

For what it’s worth, the reporting use case is when you’d want to use something like RedShift. Transfer data into it, run all your reports, tear it down.

Is the library you wrote for authentication per organisation open source? If not, I’m considering rolling my own…

Hi @themarlzy, sure. I have published it as authorize. Let me know what you think!

1 Like