Retrying an operation, but how?

Hi there,

I have a long-running operation (about 4 seconds) that is invoked by a Phoenix controller. I need to support retries: if the action fails after some seconds, I have to retry it. Right now I use a bad style: I throw an error when something goes wrong, catch it at the bottom, and recursively retry (a couple of times).

That works, but it feels wrong. I think I would prefer to put the operation inside a new process and, if the process fails, start a new one. I could think of something like the Task module, but with a retry option. Is there something like that?

Cheers
Marcus

I know you can start Tasks through Task.Supervisor with restart: :transient, which will effectively retry whatever the Task was doing, but I’m not sure how you’d limit the number of retries.
I got the idea from http://blog.danielberkompas.com/2016/04/05/background-jobs-in-phoenix.html

With the :max_restarts and :max_seconds options: http://elixir-lang.org/docs/stable/elixir/Supervisor.Spec.html#supervise/2
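
As a rough sketch of where those two options go, assuming a recent Elixir version in which Task.Supervisor can be listed as a child with a tuple spec: RetryDemo.TaskSupervisor and RetryDemo.Supervisor are made-up names, and the limits are arbitrary.

```elixir
# Hypothetical names: RetryDemo.TaskSupervisor and RetryDemo.Supervisor.
children = [
  # Allow at most 3 restarts within any 5-second window. Beyond that the
  # Task.Supervisor itself exits and its parent supervisor takes over.
  {Task.Supervisor, name: RetryDemo.TaskSupervisor, max_restarts: 3, max_seconds: 5}
]

# The Task.Supervisor sits in the tree like any other supervisor.
{:ok, _pid} =
  Supervisor.start_link(children, strategy: :one_for_one, name: RetryDemo.Supervisor)
```

The limits apply to the Task.Supervisor as a whole, not to a single task, so they are shared by every task started under it.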

Thanks for the pointers. I will give it a try.

The Task.Supervisor is started in the supervision tree like any other Supervisor, right?

Cheers

I tried to get a retry working but failed. I used this little example:

{:ok, sup} = Task.Supervisor.start_link(restart: :transient, max_restarts: 3)
fun = fn ->
  IO.puts "enter fun"
  :timer.sleep(500)
  raise "Bang"
end
Task.Supervisor.async_nolink(sup, fun)
|> Task.await

I would expect to see “enter fun” three times before it fails. But I get a single “enter fun” and then it fails. What am I doing wrong?

Cheers

You have to use Task.Supervisor.start_child/2 instead of Task.Supervisor.async_nolink/2.

Swapping async_nolink out for start_child yielded 3 bangs in my console.
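
A minimal sketch of that swap, assuming a recent Elixir version where the restart strategy is passed to start_child/3 rather than to start_link. The counter Agent and the "fail twice, then succeed" behaviour are only there to make the restarts visible; they are not part of the original example.

```elixir
# Sketch only: the failing work is simulated with a counter.
{:ok, sup} = Task.Supervisor.start_link(max_restarts: 3, max_seconds: 5)
{:ok, counter} = Agent.start_link(fn -> 0 end)
parent = self()

fun = fn ->
  attempt = Agent.get_and_update(counter, fn n -> {n + 1, n + 1} end)
  IO.puts("enter fun (attempt #{attempt})")
  # Fail on the first two attempts, succeed on the third.
  if attempt < 3, do: raise("Bang")
  send(parent, {:done, attempt})
end

# :transient restarts the task only after an abnormal exit.
{:ok, _pid} = Task.Supervisor.start_child(sup, fun, restart: :transient)

# start_child/3 returns no %Task{} struct, so the result comes back as a message.
attempts =
  receive do
    {:done, n} -> n
  after
    5_000 -> :timeout
  end
```

Because there is nothing to await, the task has to report back on its own, here with a plain send to the calling process.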

I tried that and got 3 bangs as well. But then I cannot await the result: I don’t get a %Task{} struct back that I could pass to await.

It is unclear how the action might fail, but if it’s something you expect might happen (for example a request to an external service), then I’d say rescuing the expected exception and retrying, perhaps with some delay, would be the way to go.

Supervisors are more appropriate for recovering from unexpected bugs, and I’d say they make more sense for server processes (GenServer and friends). Such processes are more like internal services which respond to various requests. Due to some bug, they might occasionally fail, but after restarting they will probably work again.

In contrast, what you describe is more of a one-off job. It takes some input, does some processing, produces the output and stops. Hence, if there’s a bug, restarting won’t really help you because you’ll start with the same input which will lead you to the same failure.

However, as I said, there might exist some expected failures, such as database or some other external service not responding because of a brief network outage or overload of the other service. By rescuing the expected error, you can explicitly retry and even implement growing retry delays.
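
A minimal sketch of such an explicit retry, assuming the expected failure surfaces as a dedicated exception. ServiceUnavailable and RetrySketch are hypothetical names, not part of Elixir or Phoenix, and the delays are arbitrary.

```elixir
defmodule ServiceUnavailable do
  # Hypothetical exception standing in for an expected, transient failure.
  defexception message: "service unavailable"
end

defmodule RetrySketch do
  # Runs fun, making at most `attempts` attempts in total.
  # Only the expected exception is rescued; anything else still crashes.
  def run(fun, attempts \\ 3, delay_ms \\ 100) when attempts > 0 do
    {:ok, fun.()}
  rescue
    error in ServiceUnavailable ->
      if attempts > 1 do
        Process.sleep(delay_ms)
        # Double the delay before each further attempt.
        run(fun, attempts - 1, delay_ms * 2)
      else
        {:error, error}
      end
  end
end
```

Unexpected errors are deliberately not rescued here, so a genuine bug still crashes the calling process instead of being retried with the same input.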

It’s also unclear whether the Phoenix controller needs to wait for the result of the job. If yes, then I’d just run the job in the same process. Otherwise, I’d start a Task under some supervisor and immediately return the response (e.g. status: :queued) from the controller action.

Sasa,
thanks for taking the time for the detailed analysis. The action uses other services that fail on a regular basis. Retrying actually does help, since the failing requests are not caused by invalid input but by overload or something similar.

For my main error group, which affects about 25% of my requests, I do a manual retry. For the other causes, which happen seldom, I let the process die and start a new one. Being a total Elixir newbie, I tried to follow what I had read: don’t handle exceptions, let processes die.

Right now the service is synchronous, but I will go async later. Maybe then it will be better to go with a supervision tree.

I ended up writing a RetryTask module that does the job without a supervisor, using more or less the interface of Task. I am not happy about writing such infrastructure code. In my opinion, Task and Task.Supervisor should support retry in combination with await. But it works, and I hope to use standard components later.
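
The RetryTask module itself is not shown in this thread. One way such a wrapper could look, purely as a sketch under my own assumptions: a single outer Task that the caller awaits, which runs each attempt in a separate monitored process and starts a fresh one when an attempt dies. RetryTaskSketch and its argument names are hypothetical.

```elixir
defmodule RetryTaskSketch do
  # Hypothetical module, not the RetryTask mentioned above.
  # Returns a regular %Task{}, so Task.await/2 works on it.
  def async(fun, max_attempts \\ 3) when max_attempts > 0 do
    Task.async(fn -> attempt(fun, max_attempts) end)
  end

  defp attempt(fun, attempts_left) do
    owner = self()
    tag = make_ref()

    # Each attempt runs in its own unlinked, monitored process, so a crash
    # there does not take down the task the caller is awaiting.
    {pid, monitor} = spawn_monitor(fn -> send(owner, {tag, fun.()}) end)

    receive do
      {^tag, result} ->
        Process.demonitor(monitor, [:flush])
        {:ok, result}

      {:DOWN, ^monitor, :process, ^pid, _reason} when attempts_left > 1 ->
        attempt(fun, attempts_left - 1)

      {:DOWN, ^monitor, :process, ^pid, reason} ->
        {:error, reason}
    end
  end
end
```

With this shape, Task.await(RetryTaskSketch.async(fun)) returns {:ok, result} or {:error, reason}. The caller only ever monitors the outer task, so its reference stays valid however often the inner attempt is replaced.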

I’m using GenRetry. It works perfectly. https://github.com/appcues/gen_retry

This issue was raised on the Elixir GitHub project.

It is not possible to await a restarted task, because the caller cannot reuse the monitor reference or detect the new pid.

But can’t we achieve this behavior with Registry?
So basically instead of a pid, you could get a ref under which the task would register itself?
