Supervision with infinite restarts

I am building a small web-application that aggregates certain kinds of art from around the internet.

It thus consists of a web-facing user interface component, a business domain component that handles searching, and finally a set of crawlers that call some APIs / perform remote requests.

These parts exist in different branches of the supervision tree. However, it has happened a couple of times that one of the crawlers, or the API it was accessing, had a serious problem, causing one of the crawler GenServers to restart over and over again and thereby bringing down its supervisor. Since the underlying problem was not fixed, the supervisor above that one and finally the whole app went down as well.

Crawling should of course never interfere with the user experience. How can I further insulate the crawlers? It would be totally acceptable if they quit at some point, as long as they do not bring down the rest of the app with them.
Is this something that is possible using an umbrella structure? Or is it somehow possible to tell the crawler supervisors to just restart the crawlers ad infinitum? Or is there another, even better way?

Things that come to mind:

  • Increase the Supervisor's restart limits (see the sketch after this list)
  • Use a fixed worker pool and re-crawl failed locations repeatedly with an exponential back-off; this should take pressure off the Supervisors as well
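
For the first point, a Supervisor accepts :max_restarts and :max_seconds options, and the crawler branch can additionally be started as a :temporary child of the top-level supervisor, so that once it exhausts its limit it is simply not restarted instead of escalating further. A rough sketch of what that could look like (all module names here are made up):

    defmodule MyApp.CrawlerSupervisor do
      use Supervisor

      def start_link(arg) do
        Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
      end

      @impl true
      def init(_arg) do
        children = [
          # hypothetical crawler GenServers
          MyApp.Crawler.One,
          MyApp.Crawler.Two
        ]

        # tolerate more crashes before this supervisor itself gives up
        Supervisor.init(children, strategy: :one_for_one, max_restarts: 10, max_seconds: 60)
      end
    end

    # In the top-level application supervisor:
    children = [
      MyAppWeb.Endpoint,
      MyApp.Domain,
      # :temporary - if the crawler branch exceeds its restart limit and dies,
      # it is not restarted and the rest of the tree keeps running
      Supervisor.child_spec(MyApp.CrawlerSupervisor, restart: :temporary)
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)

Since a :temporary child is never restarted, the crawler supervisor dying should not count against the top-level supervisor's restart intensity, so the web-facing and domain branches stay up.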

We crawl in async tasks, which makes sense because we have many URLs, but also for the sake of isolation. Do something like this and you should be fine:

    urls
    |> Enum.map(fn url -> Task.async(fn -> visit(url) end) end)
    |> Enum.map(fn task ->
      try do
        # give each crawl up to 20 seconds
        Task.await(task, 20_000)
      catch
        _kind, _reason ->
          # the await failed (e.g. timed out); shut the task down and move on
          Task.shutdown(task, 500)
          :error
      end
    end)
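
One caveat worth noting: Task.async/1 links the task to the caller, so the try/catch above mainly covers Task.await/2 timeouts; a task that crashes outright will still take the calling process down with it. If the crawls should be fully insulated, a Task.Supervisor with async_stream_nolink turns crashes and timeouts into values instead. A sketch, assuming a visit/1 function and a supervisor started elsewhere in the tree as {Task.Supervisor, name: MyApp.CrawlerTaskSupervisor} (the name is made up):

    MyApp.CrawlerTaskSupervisor
    |> Task.Supervisor.async_stream_nolink(urls, &visit/1,
      max_concurrency: 10,
      timeout: 20_000,
      # kill the slow task instead of exiting the caller on timeout
      on_timeout: :kill_task
    )
    |> Enum.map(fn
      {:ok, result} -> result
      # a crashed or timed-out crawl becomes a value, not an exit
      {:exit, _reason} -> :error
    end)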