Hi,
I’ve never done any Elixir programming – just reading a bunch of stuff about it.
However, Elixir always draws my attention when I’m working on concurrent stuff in other languages; I mean… the ability to spawn 1000 of processes? That sounds just cool. And, of course, the pipe operator…
Anyway, I’m working on a quite interesting project in Python and I think, Elixir will be a great fit for it…
… and I like your opinion on a “proper” architecture.
Basically, it’s a crawler which crawls certain sites, extracting a list of entries… crawls the website of each entry and stores the result in a database.
At the moment (the Python solution), I crawl the listing site, store each entry in a list, loop through all the entries and crawling them 1-by-1.
Here are my thoughts on structuring the whole application in Elixir:
First, I start the crawling process of the listing site. Each entry found is “transferred” to another process which crawls this entry for certain things.
I love the idea of spinning up a separate crawling process for each entry found because they are not related to each other. Is this feasible?
I plan to create an umbrella project and want to separate the crawling of the listing site, the crawling of each individual entry sites and the persistence separated as much as possible.
So the crawling of the listing site would be an otp application which sends a message for each entry found with the entry as the payload.
“Someone” listens for that message (the supervisor?) and spawns a process to crawl the entry when receiving such a message.
As I plan to create another otp application for the crawling of the entries, is spawning a process the right term here? In fact it’s a whole application and not just a function that needs to get started.
Last, I send another message when an entry is crawled which is then stored via Ecto by my 3rd application.
As I said in the beginning, I’ve never programmed in Elixir so my structure could be ridiculous…
Any thoughts on this?
Thanks!
- R