Different apps, shared database, different schema... a good idea?

abitdodgy · October 17, 2019, 12:51am

We would like to use a shared database for a number of separate applications. These apps are completely separate (separate repo, deploys, etc…). Many of these apps have interdependencies between their data. For example, an application for creating and sharing project estimates would need access to product data in a catalog app that furnishes an API. Also, all of these apps would need authentication, which would be handled by an identity app that keeps user data.

There are several approaches we can take:

Integrate the apps over APIs. I don’t like this idea because it might mean recreating tables and schemas in two separate databases and then worrying about keeping data in sync. For example, we might need to create a user and product tables and their schemas in both apps. Although, an API serves as a contract and could make the integrations more robust.
Keep all apps in an umbrella or monolith. I don’t like this option because these apps deal with different business concerns, and while they might share underlying data, they implement this data in completely different ways.
Separate apps, shared database. I don’t like this because, unless you are mad, you would need a dedicated app to manage your repo and migrations.
Separate apps, shared database, separate database schemas. This is my preferred option.

I prefer option four because it’s easier to work on and maintain separate applications. Separate schemas ensure that each app can manage its own database. But a shared database makes it easier reuse data, setup joins, and database constraints or maintain referential integrity. It’s as easy as joining a different prefix.

For example, an estimates.cart (schema.table) could have an owner_id foreign key that points to identity.user. And estimates.line_items can have a product_id column that could point to catalog.products.

Does this sound like a good idea? Has anyone done this type of architecture? What disadvantages (other than some schema data being shared between migrations) do you see?

Would you go down this route, or would you recreate data where needed? For example, multiple user schemas and so on…

anthonator · October 17, 2019, 2:49am

We’ve done a combination of 2 and 4.

We have an umbrella app that uses a shared database and the data layer of each app is separated by Postgres schemas.

Our approach used the onion architecture. We have 3 different layers within our application.

infrastructure - This layer is where different infrastructural applications live. Our repo lives in this layer. An event bus. Etc.
context - We keep all of our business logic in this layer. These applications expose their functionality through a public interfaces that encapsulate the business processes for each application.
service - These applications are how the outside world interacts with our application. This is where our REST and GraphQL API’s live.

The dependency flow within our application looks like this.

[service] —> [context] —> [infrastructure]

This means that our service applications can directly interact with our context and infrastructure applications but they cannot directly interact with other service applications. Context applications can only directly interact with infrastructure applications and infrastructure applications don’t directly interact with anything. If information needs to be be passed to applications in the same layer or the layer above then we dispatch events using a bus. Applications in the same layer or the layer above will never depend on each other. This helps make the dependency chain and information flow between the various applications easier to reason about.

The context layer is where all the database querying and mutating happens. Each context application’s data is “isolated” from the other. I put isolated in quotes because they are separated by Postgres schemas. So it’s possible to query between the schemas but we have strict rules preventing that. Each application is responsible for it’s own data. Hard stop. If some piece of logic needs information that typically comes from another context application’s database then it’s queried for in a service application and passed in as an argument to a function. We essentially pretend that each Postgres schema is a separate physical database.

We’ve been using this approach for about 2 years and it’s worked very well for us. It’s made it easier to manage the application as it’s grown. It also promotes loose coupling which makes the application easier to change as well.

hauleth · October 17, 2019, 9:17am

What do you mean by the “shared database”? Is this “shared DB installation” or “shared DB within Postgres with separate Postgres Schemas”? If first one then I would say that this is almost no different from different DB instances on different machines.

abitdodgy · October 17, 2019, 12:13pm

I mean one database where each application has its own Postgresql schema.

Don’t you think that’s is a little different? You can query and join in between schemas, maintain foreign keys, and avoid having to duplicate data.

I can reach into the identity schema and grab a user, or join on its id column, instead of having to make an API request between two apps, grab the user data, and store it in a second db.

abitdodgy · October 17, 2019, 12:16pm

Do you declare the schema for each application using @schema_prefix? As far as I know, you can’t do:

  use Ecto.Repo,
    otp_app: :foo,
    schema_prefix: "some_app",
    adapter: Ecto.Adapters.Postgres

So how do you declare the schema within each app? Query time, or module attribute? I’m wondering it it’s worth opening an issue in Ecto to request a schema_prefix option in the Ecto.Repo macro.

al2o3cr · October 17, 2019, 1:11pm

These two statements are in direct conflict: if you’re pulling records from a schema that belongs to another “separate app” you’ve created a coupling between the code that reads that data and the code that writes it. For instance, renaming a column in identity will have to be handled with coordinated deploys of every caller that uses the renamed field.

IMO this is a good use case for an umbrella app containing multiple Ecto.Repo applications.

abitdodgy · October 17, 2019, 1:25pm

That’s true. I alluded to that in my original post. There would have to be some mechanism to ensure that these changes in coupled apps don’t create problems.

My problem with using an umbrella app is that we would have to build several web projects into the same application. We have to worry about assets, whether shared at the umbrella level or for each app, deploying apps on different ports, websockets, and other complexities that could arise, etc…

Another issue I have is that, while undoubtedly some of the data is shared, much of it is only relevant to the context of a single application. For example, we have a shipping api that needs user management, but none of its data is in any way relevant to the other applications. It receives weight, distance, and store, and spits out delivery methods with prices.

Great point, though.

LostKobrakai · October 17, 2019, 1:47pm

In an umbrella project you can build releases with just a subset of applications included. So you can have all your different projects and their shared dependencies in an umbrella and still have their releases only include the parts they actually need.

anthonator · October 17, 2019, 2:02pm

The repo is shared between all applications so the prefix is declared in each schema.

We share the repo so we can use the same connection pool between each application.

abitdodgy · October 17, 2019, 2:35pm

How do you handle joins between schemas in different applications then? Or do you just not do that? It’s easy to couple everything with has_many and has_one and belongs_to.

anthonator · October 17, 2019, 2:39pm

We essentially pretend that each schema is a separate physical database. So we don’t do joins between schemas.

abitdodgy · October 17, 2019, 2:43pm

What do you do when you need data from two schemas in this case? For example, say you want to get a product from a catalog schema and its creator or editor from a users shema. Things could get complicate fast. One of the things that joins and associations do well is sorting, linking, and organising data.

anthonator · October 17, 2019, 2:53pm

What’s hard with this approach is drawing your boundaries. If you have two tables that rely on each other then they may belong within the same application. If you feel like you need to create a join between two applications that could be a signal that your boundary is incorrect.

abitdodgy · October 17, 2019, 8:18pm

I think it’s an impossible task to draw your boundaries objectively since there is no single criteria that can tell you right from wrong. For example, I work for a multichannel commerce company. We have several business systems that could be thought of as coupled or independent. This is an extremely oversimplified example:

A PIM (product information management) system for registering and distributing product data to a number of other applications.
An e-commerce website.
User management for said applications.

The PIM is used to create and enrich product data. From poster descriptions, to label information, all the way to web descriptions and digital assets. The PIM is responsible for a hundred or so attributes and only a couple of dozen of those are used on the e-commerce website.

But, fundamentally, they both make use of the same data. Products and prices, along with other tables specific to each system. They also share the same users.

If you separate them, you would have to duplicate and then synchronise shared data. If you couple them, then you would be writing two fundamentally different systems that happen to share a subset of data.

Whether you decide to decouple them entirely or within an umbrella doesn’t matter. An umbrella merely brings some conveniences. You said you treat schemas as separate databases. So, in this case, your approach would be to have a products table in both apps, a user table in both apps, etc… you might run into issues of session management, which user to authenticate, track, and and all sorts of other stuff, but I don’t know if these drawbacks are that serious.

There isn’t an easy answer here. Coupling the schemas could reduce duplication, but perhaps it’s not worth the hassle.

PS: We actually have an e-commerce, estimates, internal sales, and telesales systems. We also have an ERP connector, users, BI automation, marketing communication with suppliers, shipping app and api, customer registration app, and more. Sometimes you can say it’s all related. E-commerce needs customers and products. So do internal sales systems. So does an estimates app.

Deciding whether all this stuff belongs to the same app or in separate apps and databases is tricky.

chouzar · October 21, 2019, 6:14am

Sounds like a current design I’m dealing with (just a toy project), basically I have an app that prints documents and a CRUD app which manages users. I need some of the user information to “stamp” the document and print it, so the document app contains both document and user schemas.

User App		Document App
		Document schema
User schema	—>	Sign Schema

Yes, this means that data is duplicated and inconsistent between apps; this is fine, the document app will retrieve user info and transform it into a “sign schema” which is later stored into its own document database.

For practical purposes all schema relationships will be kept exclusively to the bounds of the app, my only SQL joins will be between the “document” and “sign” tables. It may be a naive strategy for the medium or long term but I like how this design is settling down.