Best approach to use a project with its own database/Repo as a dependency in another project

antoniosb · July 8, 2019, 9:13pm

I have an Elixir project called Tenjin that does some crawling and saves the crawled data to a Postgres database.

Tenjin contains all the logic to perform the requests, parse the responses, validate the data and insert it into the database using its own entities.

Now I want to create a GraphQL API that will serve the data crawled by Tenjin. As I may use Tenjin in other projects, I don’t want to create the API in the same project, so that the crawling stays decoupled from the API.

Let’s say the API project is called Omoikane. I want Omoikane to be able to call Tenjin modules and functions in order to serve the data that is available in Tenjin.

But the database configuration and crawling settings are all defined in Tenjin. How can I build Omoikane as a completely separate project, perhaps with its own repo and database, while still using Tenjin as a dependency or something similar?

Thanks in advance <3

idi527 · July 8, 2019, 9:33pm

Maybe you can separate the repo into its own app and make both other apps depend on it.

apps/
- repo
- tenjin (depends on repo)
- omoikane (depends on repo)

You’d still be able to package them separately since the common component is extracted from both of them.
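
As a rough sketch of that layout, assuming an umbrella project and a shared app named `:repo` (the app name and the `Omoikane.MixProject` module are illustrative, not taken from the actual projects), the `mix.exs` of a child app could declare the dependency like this. An umbrella is only one way to wire it up; a `path:` or `git:` dependency on the shared app would serve the same purpose.

```elixir
# apps/omoikane/mix.exs (hypothetical child app inside an umbrella)
defmodule Omoikane.MixProject do
  use Mix.Project

  def project do
    [
      app: :omoikane,
      version: "0.1.0",
      # Umbrella children share build output, config, deps and lockfile
      build_path: "../../_build",
      config_path: "../../config/config.exs",
      deps_path: "../../deps",
      lockfile: "../../mix.lock",
      deps: deps()
    ]
  end

  def application do
    [extra_applications: [:logger]]
  end

  defp deps do
    [
      # :repo is the assumed name of the shared app from the layout above
      {:repo, in_umbrella: true}
    ]
  end
end
```

`apps/tenjin/mix.exs` would list the same `{:repo, in_umbrella: true}` entry, so neither app depends on the other.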

antoniosb · July 8, 2019, 9:52pm

So it would need to be an umbrella app?

axelson · July 8, 2019, 10:42pm

Can’t you just make Tenjin a dependency of Omoikane?

antoniosb · July 8, 2019, 11:03pm

@axelson, this is what I am considering, but how would this work in production, for example?

Tenjin will need to be a separately deployable application, and I will need a way for Omoikane, which is another deployable application, to depend on it. Does that make sense?

idi527 · July 9, 2019, 3:07am

So it would need to be an umbrella app?

Not necessarily. However, if you are OK with accessing the database in Omoikane via Tenjin, there isn’t really a problem with making Tenjin a direct dependency of Omoikane.
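
A minimal sketch of what that could look like outside an umbrella, assuming Tenjin's Ecto repo module is named `Tenjin.Repo` and its OTP app is `:tenjin` (both names are guesses, as are the connection values). Omoikane would list something like `{:tenjin, path: "../tenjin"}` (or a `git:` option) in its `deps`. Mix does not load the config files of dependencies, so Omoikane would also have to supply the settings Tenjin expects in its own config:

```elixir
# omoikane/config/config.exs (hypothetical)
import Config # on Elixir versions before 1.9, `use Mix.Config` instead

# Tenjin's own config/*.exs files are not read when it is used as a
# dependency, so the host application configures Tenjin's repo here.
config :tenjin, Tenjin.Repo,
  database: "tenjin_dev",
  username: "postgres",
  password: "postgres",
  hostname: "localhost"
```

Any crawling settings that Tenjin reads from its application environment would need the same treatment.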

al2o3cr · July 9, 2019, 5:27pm

These two desires are directly in conflict with each other: if you’re sharing code and a database, the projects are tightly coupled. Putting them in separate repos doesn’t remove that coupling, it just makes it more complicated.

You have a few options:

- as mentioned by others, put both applications in an umbrella and have Omoikane depend on Tenjin
- extract the relevant modules and functions to a third place and have both the Omoikane repo and Tenjin repo use that as a dependency. This approach STILL tightly couples the two together via their expectations about the schema, but their code is separated
- create a public API (HTTP? Protocol Buffers? gRPC?) as part of Tenjin that Omoikane can interact with

Going down the list, each of these options requires writing progressively more code to implement but they also provide progressively more isolation between the applications.

antoniosb · July 11, 2019, 3:24pm

@al2o3cr thanks for your detailed answer!

I agree on that conflict, and I confess I’m having a hard time figuring out how these two things would be feasible.

But the way you laid it out above cleared up the path for me, thank you!

I’m more inclined to go with the umbrella approach for now, since I have nothing in the short term that would justify the burden of creating a communication layer in Tenjin that could be used by multiple projects.

And I understand that, when that time comes, it would do little harm to extract Tenjin from the umbrella and replace its touching points at Omoikane (since these interfaces are well defined there).

Do you think this is a good path to take?

peerreynders · July 11, 2019, 5:13pm

That sentence points towards this solution:

- Clearly Tenjin has been implemented as a single project
- Omoikane calls some Tenjin modules
- The functionality currently covered by Tenjin falls into two broad categories: 1. crawling 2. data management
- It’s unlikely that Omoikane will need access to “crawling”. Furthermore, it may not even need access to “inserting” or “updating” data.
- “the touching points at Omoikane” is identifying a third service, common to both Tenjin and Omoikane. That service should be as narrow as possible in order to minimize Omoikane’s coupling to Tenjin.
- Ideally you should be able to deploy DB + “the service” + Omoikane without the rest of Tenjin and still have a functioning API. The fact that it is never going to be deployed this way is irrelevant: the point is to minimize coupling and to make it very clear what functionality is common, so that when Tenjin is maintained it is clear which changes will affect Omoikane.
- From a very high level “the service” looks a lot like a Repository. This is not to be confused with the Ecto repo; the Ecto repo would be part of “the repository”.
- Given that Omoikane may only require read-only access, you may also have Command Query Separation going on
  - I’m not talking about a full-blown CQRS system. I’m talking about one module (and possibly multiple support modules) that is only used by Tenjin for inserting/updating data. Then a second module (and possibly multiple separate support modules) that is used by both Tenjin and Omoikane for fetching data.
  - Omoikane may not require the detailed data access that Tenjin needs, so there can be an argument for two versions of the second module, one for Tenjin, the other (“less detailed”) one for Omoikane. Both versions would likely share a significant number of the support modules. Again, the idea is to couple Omoikane only to the significant parts, not to the parts that only Tenjin needs.
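
To make the module-level split concrete, here is a small runnable sketch. Every name in it (`Repository.Store`, `Repository.Commands`, `Repository.Queries`, the `url`/`title` fields) is made up for illustration, and an in-memory `Agent` stands in for the Ecto repo so the example runs without a database. In a real project the `Store` calls would presumably be Ecto calls against the shared schema.

```elixir
defmodule Repository.Store do
  # In-memory stand-in for the Ecto repo (an assumption made so the
  # sketch needs no database). Pages are keyed by URL.
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  def put(url, page), do: Agent.update(__MODULE__, &Map.put(&1, url, page))
  def all, do: Agent.get(__MODULE__, &Map.values/1)
  def fetch(url), do: Agent.get(__MODULE__, &Map.fetch(&1, url))
end

defmodule Repository.Commands do
  # Write side: intended to be called only by the crawler (Tenjin).
  alias Repository.Store

  def save_page(%{url: url, title: title} = page)
      when is_binary(url) and is_binary(title) do
    Store.put(url, page)
    {:ok, page}
  end

  def save_page(_other), do: {:error, :invalid_page}
end

defmodule Repository.Queries do
  # Read side: shared by Tenjin and the API (Omoikane).
  alias Repository.Store

  # Returns all stored pages, sorted by URL.
  def list_pages do
    Store.all() |> Enum.sort_by(& &1.url)
  end

  def get_page(url) do
    case Store.fetch(url) do
      {:ok, page} -> {:ok, page}
      :error -> {:error, :not_found}
    end
  end
end
```

After `Repository.Store.start_link()`, the crawler would call `Repository.Commands.save_page/1`, while the API would only ever touch `Repository.Queries`. That keeps the surface Omoikane is coupled to visible in one module.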

You want to enable a deployment where

- One node runs the DB
- A second node runs “the service” + Omoikane
- A third node runs “the service” + Tenjin

while designing it so that initially you can run

DB + “the service” + Omoikane + Tenjin on a single node

Bottom line: “touching points at Omoikane” identifies a commonality within Tenjin that should be factored out for the sake of future maintainability. There is value in identifying that commonality and factoring it out now: it marks a boundary that is worth exposing/segregating, at least at the module level of a possible repository application. What you are currently calling Tenjin is actually Tenjin + repository.