Umbrella project: inter-application communication, direct function calls or message passing?

I’m working on a project that involves a Phoenix frontend, an Ecto backend, and scrapers.

The project was generated with the Phoenix 1.3 umbrella generator, so the frontend calls functions in the backend directly and lists the backend in its deps with the in_umbrella option. I can see that this is a good solution as long as both run on the same node.

Now I’m adding my scrapers. They should be able to interact with the backend, but they won’t be running on the same node: I need to be able to move them to different nodes, for example when their IP address gets banned.

So if I understand correctly, I can’t add the backend to the scrapers’ deps, or the backend app would be started on the scraper nodes. (Actually, I’m not even sure that would be a problem…)

I’m wondering whether I’m meant to create a GenServer in the backend that would act as a sort of proxy for the nodes that can’t call its functions directly? (And then add the backend to the deps with the app: false option.)

This also raises the question of what to do if you want the Phoenix frontend and the backend on different nodes.

A rule of thumb: use modules and functions to separate your concerns. Only use processes where concurrent execution is useful.

Saša Jurić’s article “To spawn, or not to spawn?” goes into more detail on this.

However, if you have scrapers on external nodes and you want to be able to control which node’s scraper is called, then indeed, processes and message passing are what you are looking for.
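As a minimal sketch of what addressing a scraper on a particular node can look like, the following assumes one registered GenServer per scraper node. The module name `ScraperWorker`, the `scrape/2` function, and the reply shape are hypothetical and only serve the illustration.

```elixir
defmodule ScraperWorker do
  # Hypothetical scraper process; one instance is assumed to run on each scraper node.
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Ask the scraper registered under this module's name on `target_node` to handle `url`.
  def scrape(target_node, url) do
    GenServer.call({__MODULE__, target_node}, {:scrape, url})
  end

  @impl true
  def init(opts), do: {:ok, opts}

  @impl true
  def handle_call({:scrape, url}, _from, state) do
    # Placeholder: real code would fetch and parse the page here.
    {:reply, {:ok, {node(), url}}, state}
  end
end
```

Calling `ScraperWorker.scrape(:"scraper1@host", url)` would then send the request to that node's process; passing `node()` targets the local one, which is handy for trying it out in a single IEx session.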

Thanks for the article Qqwy.

My question is more like: “If I’m forced to split my apps across nodes, what is the best thing to do in my case?”

And my case is as follows:

I have 3 apps in an umbrella project.

- backend
- backend_web
- scrapper_ikea

I know that I need to be able to run scrapper_ikea on separate nodes, mostly to have multiple instances running in parallel with different IPs.

So now I’m wondering if I should have {:backend, in_umbrella: true} in the scrapper_ikea app?
If I understand correctly, this would start the backend app too, which (as of now) is only my Ecto repo. It doesn’t seem like too much of a problem to have multiple repo instances running on multiple nodes, right?

Another option would be to have {:backend, in_umbrella: true, app: false} in the scrapper_ikea app.
Then the backend app is not started, but if I got it right, that means I won’t be able to interact with the database, and I’d only be able to call functions that don’t rely on a particular process being present.
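For reference, the two dependency declarations being compared would sit in the scraper app's mix.exs roughly as follows. This is a sketch, not the project's actual file. Note that the Mix documentation describes `runtime: false` as the option that keeps a dependency out of the runtime applications, while `app: false` is documented as telling Mix not to read the dependency's app file, so the second variant is written with `runtime: false` here, as an assumption about what is intended.

```elixir
# apps/scrapper_ikea/mix.exs (fragment, illustrative only)
defp deps do
  [
    # Variant 1: backend is a regular runtime dependency,
    # so it is started alongside the scraper.
    {:backend, in_umbrella: true}

    # Variant 2 (assumed intent): backend code is compiled and callable,
    # but the backend application itself is not started on this node.
    # {:backend, in_umbrella: true, runtime: false}
  ]
end
```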
Which leads me to think that maybe I should have a GenServer started with the backend application that is only there to proxy calls to the repo, for use by the scraper instances.
This way, I’d still be able to interact with the repo from the scraper apps, but only through the GenServer API. That is kind of strange, as I’d have to depend on an app where most of the code is not usable… so maybe make a fourth app that only holds this glue/proxy code?
As you can see, I’m quite deeply confused as to how to do all this.


It’s fine to run Ecto on many nodes, you just need to be careful not to go over your DB’s max connections limit if applicable. You can tweak the DB pool size in the app config.
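As a rough illustration, the pool size is set per repo in the application config. The repo module name `Backend.Repo` and the value 2 are assumptions made for the example; projects from the Phoenix 1.3 era would start the file with `use Mix.Config` instead of `import Config`.

```elixir
# config/config.exs as used on a scraper node (fragment)
import Config

# Each node that starts the repo opens its own pool, so the total is roughly
# (number of nodes) * pool_size and must stay below the database's limit.
config :backend, Backend.Repo,
  pool_size: 2
```

With, say, 10 scraper nodes at a pool size of 2, the scrapers alone would hold about 20 connections, on top of whatever the web node uses.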

If you do want to do RPC between nodes, the basic mechanisms are covered in the Getting Started guide.

In the end I did something that I’m sure is not “idiomatic Elixir”, but for now I’m happy with it, so I guess that’ll do :stuck_out_tongue:

I just created a GenServer in the backend app that exposes higher-level functions on top of the low-level Ecto DB calls.

And because I didn’t feel like having anything else accessible in my scraper apps, I created an additional app that only has wrapper functions around the aforementioned GenServer’s functions.

The code is here on GitHub, btw.

Why not use distributed tasks like in the link I gave above? The GenServer is a massive bottleneck if you expand to multiple scrapers. Any DB operation will block it for multiple ms.
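A minimal sketch of the distributed-task approach, assuming a named `Task.Supervisor` runs on the backend node. The name `Backend.TaskSupervisor` is hypothetical, and `Enum.sum/1` stands in for whatever backend function would actually be called. Each call runs in its own task process, so concurrent requests are not serialized through a single GenServer.

```elixir
# On the backend node: a named Task.Supervisor, normally started from the
# application's supervision tree rather than by hand as done here.
{:ok, _sup} = Task.Supervisor.start_link(name: Backend.TaskSupervisor)

# On a scraper node: run a function on the backend node and wait for the result.
# node() stands in for the real backend node name (e.g. :"backend@host") so that
# the snippet also runs in a single, non-distributed IEx session.
backend_node = node()

task =
  Task.Supervisor.async(
    {Backend.TaskSupervisor, backend_node},
    Enum,
    :sum,
    [[1, 2, 3]]
  )

result = Task.await(task)
```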

Hi @dom, nothing rational really, it’s just that I have a huge respect for my noobness when I’m learning something new. I just felt like I wanted to use a GenServer, just for the sake of it. You know, demystification’n’stuff…

I have no doubt that, as this project (and others) goes on, I’ll come back to this and refactor the whole thing a thousand times, and I’d still come back to it two weeks later thinking “who the fck wrote this sht ?!”.

Anyway, thanks for the good (re)read guys :)

Hey @dom, ended up doing it with :rpc, obviously much better :slight_smile:

Now there is something I’m not sure about. I’ve made a module to wrap the :rpc calls: https://github.com/CaPasse/backend/blob/master/apps/scrapping/lib/scrapping.ex
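For readers without access to the repository, a wrapper of this kind might look roughly like the sketch below. It is not the code from the linked file: the module name `Scraping.Remote`, the `call/5` function, and the `{:ok, _}` / `{:error, _}` return shape are all assumptions chosen for the illustration.

```elixir
defmodule Scraping.Remote do
  # Hypothetical thin wrapper around :rpc.call/5.

  # Runs module.function(args...) on `node` and turns the :rpc result
  # into an {:ok, result} | {:error, reason} tuple.
  def call(node, module, function, args, timeout \\ 5_000) do
    case :rpc.call(node, module, function, args, timeout) do
      {:badrpc, reason} -> {:error, reason}
      result -> {:ok, result}
    end
  end
end
```

Because the wrapper only passes module and function names as atoms, the app that contains it does not need a compile-time dependency on the backend.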

The thing is, I put it in a separate app, which all my scrapers will depend on.
This is to avoid having the scrapers depend on the backend app directly, because then they would depend on an app where most functions would crash when called (the dep would be declared with the app: false option, since I don’t want the whole backend app to start on every scraping node).
Do you think this is a legitimate concern?