I am looking for a good article that discusses the change in mindset when approaching error handling for async background jobs, for people who have only worked on sync web requests.
You mention both background jobs and web requests in the subject. What sort of problem are you trying to figure out how to test idiomatically?
Web requests are tested in isolation from each other. If you do background jobs using, say, Oban — it provides helpers and a nice testing guide: Introduction to Testing — Oban v2.14.0.
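As a rough sketch of what those helpers look like (the worker module, repo, and args below are hypothetical placeholders, but `perform_job/3` is part of `Oban.Testing`):

```elixir
defmodule MyApp.WorkerTest do
  use ExUnit.Case, async: true
  # Brings in perform_job/3, assert_enqueued/1, etc.
  # MyApp.Repo stands in for your Ecto repo.
  use Oban.Testing, repo: MyApp.Repo

  test "runs the job inline, without touching the queue" do
    # perform_job/3 builds a full %Oban.Job{} and calls
    # MyApp.Worker.perform/1 synchronously, so the test stays
    # isolated from other processes.
    assert :ok = perform_job(MyApp.Worker, %{"id" => 123})
  end
end
```

The point is that job tests stay as isolated as request tests: the job runs inline in the test process rather than through the queue.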
Also keep in mind that each process by itself is “single-threaded”, so in isolation it’s straightforward to test. When the unit being tested involves other processes, two things come to my mind:
- Use Tasks whenever possible, because Task automatically provides Ancestor and Caller Tracking. Some popular libraries make use of this to help with testing (e.g. Ecto and Mox)
- Pass messages between processes to synchronize them for tests. That might require some thought put into architecting GenServers upfront. @TylerAYoung gave a talk on that: ElixirConf 2021 - Tyler Young - Architecting GenServers for Testability - YouTube
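A minimal illustration of the message-passing approach (the module and function names here are made up): the process under test is handed the test's pid and reports back when it finishes, so the test can block on a message instead of sleeping.

```elixir
defmodule Notifier do
  # Hypothetical module: runs some work in a separate process and,
  # when given a subscriber pid, sends the result back as a message.
  def run_async(work_fun, subscriber \\ nil) do
    spawn(fn ->
      result = work_fun.()
      if subscriber, do: send(subscriber, {:done, result})
    end)
  end
end

# In an ExUnit test you would pass self() and use assert_receive:
#   Notifier.run_async(fn -> 1 + 1 end, self())
#   assert_receive {:done, 2}
#
# Outside ExUnit, a plain receive shows the same synchronization:
Notifier.run_async(fn -> 1 + 1 end, self())

receive do
  {:done, result} -> IO.puts("got #{result}")
after
  1_000 -> IO.puts("timed out")
end
```

The design choice is that the worker never knows it is under test; it just accepts an optional subscriber, which also works fine in production for progress reporting.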
My concern currently is not about testing background jobs, although I enjoy that topic.
I am finding that when developers implement their first background job after working exclusively on web requests for a couple of years, they don’t focus on the retry behavior or consider how the job will behave when it encounters failures. I am trying to find articles that highlight the failure modes to keep in mind and expect as developers implement their first background job. In a web request setting, ignoring failures has been sufficient for their work: they assume the backend will forward an error to the frontend and the user will retry when they observe it. I don’t think the same approach will work out okay for background jobs, where there is no user watching and no one to press retry.
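To make that mindset shift concrete, here is a hedged sketch of an Oban worker (the module name and the `MyApp.External.sync/1` call are hypothetical) where the return value, not a user pressing refresh, decides what happens to a failure:

```elixir
defmodule MyApp.SyncWorker do
  # Hypothetical worker. max_attempts caps Oban's built-in retries;
  # after the fifth failed attempt the job is discarded rather than
  # retried forever, and that outcome has to be a deliberate choice.
  use Oban.Worker, queue: :default, max_attempts: 5

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"user_id" => user_id}}) do
    # MyApp.External.sync/1 is a placeholder for the real work.
    case MyApp.External.sync(user_id) do
      {:ok, _result} ->
        :ok

      {:error, :not_found} ->
        # Permanent failure: retrying cannot help, so cancel the job
        # instead of burning attempts on it.
        {:cancel, :not_found}

      {:error, reason} ->
        # Transient failure: returning {:error, reason} marks the
        # attempt as failed and Oban retries it with backoff.
        {:error, reason}
    end
  end
end
```

This is exactly the decision a web-request developer never had to make: which failures are worth retrying, which are permanent, and what happens when the retries run out.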