Two different Ecto repos sharing the same database?

Would it be problematic for us to use the same postgres database with two different Ecto Repos?

# This Repo belongs to our service. We manage its schema and migrations.
config :my_app, MyApp.Repo,
  database: "db",
...
# This Repo is used by a package our service depends on.
# Its schema and migration cadence are outside of our control.
config :journey, Journey.Repo,
  database: "db",
...

Things seem to work fine in my tests, but I wonder if we might run into problems down the road.

Thank you!

1 Like

I think there are 2 management problems:

  1. you have to keep all migrations in one repo.
  2. you have to keep the schemas in both repos compatible.
3 Likes

There seems to be a lot to unpack in this question. Why are there different, unrelated sets of migrations happening on a single database? How do you avoid changes that disrupt each other?

There’s no fundamental reason you can’t do this; it just seems like a design with a lot of practical pitfalls.

5 Likes

One option that would provide more safety would be to use a different prefix for each repo. This would provide separation while still not needing to manage the infrastructure of two databases. I could imagine this being useful if you have e.g. an umbrella or just very clearly separated application components and you want to fully decouple the data layer between them and ensure that neither ever “reaches in” to the other.
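For illustration, a rough sketch of what that could look like, assuming the Postgres schemas themselves already exist and that Journey runs its migrations through its repo config (all prefix and module names here are illustrative, not Journey’s actual ones):

```elixir
# config/config.exs -- a sketch, not a drop-in configuration.
# Each repo keeps its tables (and its migration bookkeeping) inside its
# own Postgres schema, i.e. an Ecto "prefix".
config :my_app, MyApp.Repo,
  database: "db",
  migration_default_prefix: "my_app"

config :journey, Journey.Repo,
  database: "db",
  migration_default_prefix: "journey"
```

On the query side, an Ecto schema can be pinned to a prefix, or the prefix can be passed per call:

```elixir
defmodule MyApp.Customer do
  use Ecto.Schema

  # All queries for this schema go to the "my_app" Postgres schema by default.
  @schema_prefix "my_app"

  schema "customers" do
    field :name, :string
    timestamps()
  end
end

# Per-call override:
# MyApp.Repo.all(MyApp.Customer, prefix: "my_app")
```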

2 Likes

As @zachallaun suggested, perhaps using a different prefix for each repo would provide separation.

That way, the package won’t interfere with the main db. However, I have zero experience with this scenario.

P.S. Never thought an application itself would become multi-tenant within its own database! :sweat_smile:

2 Likes

The biggest hazard with a setup like this is if users call a function inside a SomeApp.Repo transaction that successfully performs an operation on Journey.Repo, and the outer transaction then rolls back, “leaking” the committed Journey operation.

I’ve encountered similar behavior with job-queueing on a separate DB connection. That manifested as jobs appearing that contained IDs of never-created records! :ghost:
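A contrived sketch of that failure mode (Journey.start_workflow and MyApp.Customer are made-up stand-ins, not the package’s actual API):

```elixir
# Journey.Repo uses its own connection pool, so its writes are not part
# of the MyApp.Repo transaction below.
MyApp.Repo.transaction(fn ->
  {:ok, customer} = MyApp.Repo.insert(%MyApp.Customer{name: "Ada"})

  # Hypothetical call that inserts rows through Journey.Repo.
  {:ok, _workflow} = Journey.start_workflow(reference: customer.id)

  # If anything later fails and we roll back...
  MyApp.Repo.rollback(:something_went_wrong)
  # ...the customer insert is undone, but the Journey rows referencing
  # customer.id have already been committed on the other connection.
end)
```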

3 Likes

To expand on this, that’s a hazard when dealing with any external API. You’d want to treat this second repo as if you’re dealing with a totally separate entity, which means you have to do the cleanup manually. If you have a long-running transaction and shoot off a random POST request to some API in the middle of it, you of course wouldn’t expect them to know if your transaction fails.

3 Likes

Why are there different, unrelated sets of migrations happening on a single database? How do you avoid changes that disrupt each other?

Ah yes, these are great questions.

For a bit more context, the external package (Journey) manages persistent workflows, and we use it to orchestrate the “lifetime” of customers of our service (e.g. conditionally sending each of our customers a daily report).

Journey.Repo contains Journey’s internal data, and we don’t really care what happens there. We are currently configuring Journey.Repo to use its own database (let’s call it journey_db), separate from the service’s own db (let’s call it my_service_db).

In the process of operating the system, we noticed that

  1. the schema of journey_db is quite simple (again, none of our business, but…), and

  2. when troubleshooting things, we sometimes get a quick glimpse at the state of a customer’s workflow by running a query on journey_db.

    This query might look similar to: SELECT state FROM journey_db.wf_steps WHERE step_name='daily_report' and reference='CUSTOMERID123'. In this example CUSTOMERID123 is a concept from the MyService / my_service_db world, and it is, essentially, just a “reference number” in Journey’s world.

    Obviously, we are reaching into the bowels / internal data structures of another component, so we are not architecting anything around this; it is just a time-saving hack for when a human is investigating / troubleshooting things.

So we started wondering

  • if pointing Journey.Repo at my_service_db, and having just one database would give us simpler operational ergonomics (fewer db’s to operate / backup / restore, simpler “cross-db” troubleshooting queries for humans), and
  • what would be the downside (besides, perhaps, potential conflicts in table names and migration versions across the two Ecto repos, and less resistance against the temptation of coupling with another component’s internal data) ;).
2 Likes

If you’re looking to combine data across these, then I think distinct schemas would work pretty well here to make querying easy and make it hard to have conflicts.

Another option is to use a foreign data wrapper, postgres_fdw. In this way, you have two distinct databases that can even be on different database servers, but you can basically expose the journey_db inside your main db as a schema. It’s pretty useful for cases like this, but takes a bit of setup.
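For anyone curious, roughly what that setup involves, as a sketch only: the server name, credentials, and schema names below are placeholders, and it needs the right Postgres privileges.

```elixir
defmodule MyApp.Repo.Migrations.ExposeJourneyDbViaFdw do
  use Ecto.Migration

  def up do
    execute "CREATE EXTENSION IF NOT EXISTS postgres_fdw"

    # Point a foreign server at the other database (could be on another host).
    execute """
    CREATE SERVER journey_server
      FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'localhost', dbname 'journey_db')
    """

    execute """
    CREATE USER MAPPING FOR CURRENT_USER
      SERVER journey_server
      OPTIONS (user 'journey_user', password 'journey_password')
    """

    # Mirror journey_db's public schema as a local "journey" schema.
    execute "CREATE SCHEMA IF NOT EXISTS journey"

    execute """
    IMPORT FOREIGN SCHEMA public
      FROM SERVER journey_server
      INTO journey
    """
  end

  def down do
    execute "DROP SCHEMA IF EXISTS journey CASCADE"
    execute "DROP USER MAPPING IF EXISTS FOR CURRENT_USER SERVER journey_server"
    execute "DROP SERVER IF EXISTS journey_server CASCADE"
  end
end
```

With something like that in place, the earlier troubleshooting query could run against journey.wf_steps from the main database’s connection.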

4 Likes

Ok, thank you, and thank you everyone for your insights, and for chiming in!

Yes, thank you for mentioning postgres_fdw. It is a fine option, but the complexity / overhead it adds is hard to justify for our use case.

So far, my takeaway is that it doesn’t sound like there are any fundamental / obvious reasons why placing two Ecto repos onto the same Postgres db wouldn’t work. To remove the risk of table name / migration version conflicts, we could maybe look into migration_source and migration_default_prefix.
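For example, something along these lines (table names are illustrative) would at least keep the two repos’ migration bookkeeping out of each other’s way:

```elixir
config :my_app, MyApp.Repo,
  database: "db",
  # Each repo records its migrations in its own separately named table,
  # instead of both writing to the default "schema_migrations".
  migration_source: "my_app_schema_migrations"

config :journey, Journey.Repo,
  database: "db",
  migration_source: "journey_schema_migrations"
```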

I don’t know if we will actually do this, but it’s great to know that we have the option.

Thank you, everyone!

1 Like