How to handle background jobs with Elixir/Phoenix

Hello, I am working on a task that requires parsing a CSV sent from a React frontend. When the backend receives it, it should make an API call to a web service. But the CSV is now large, and this approach no longer works. My idea is to receive the CSV, answer the frontend, and then process the CSV and the API calls in the background, but I have no idea where to start. Can someone point me to a tutorial or something on how to start? If there is a way to avoid RabbitMQ or Redis, that would be better, if possible.
Thank you very much

Depending on your specific use case, you can start simple with tasks:

https://hexdocs.pm/elixir/Task.html

task = Task.async(fn -> do_some_work() end)
res = do_some_other_work()
res + Task.await(task)


Looks like you need a background job processing library.

Take a look at rihanna, which uses Postgres as the store for jobs.

Other solutions use Redis: exq, verk, toniq.

que is backed by Mnesia.

More libs:


If you do it like this (without a job processing library, which is fine), keep in mind the things described in this article from Chris McCord, https://dockyard.com/blog/2016/05/02/phoenix-tips-and-tricks, under the title “Avoid Task.async if you don’t plan to Task.await”. The summary is that if you want to use a task to process your CSV file, you should hook it into its own Task.Supervisor so that it is isolated from the controller.
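A minimal sketch of that idea, runnable as a plain script. The names `CsvWorker` and `MyApp.TaskSupervisor` are made up for illustration, and the `send`/`receive` pair exists only so the script can observe the result; in a Phoenix app the supervisor would normally be a child of the application's supervision tree rather than started by hand.

```elixir
# Hypothetical stand-in for the real CSV processing and API calls.
defmodule CsvWorker do
  def process(rows), do: Enum.map(rows, &String.upcase/1)
end

# Assumption: in a real app this would be listed in the supervision tree,
# e.g. {Task.Supervisor, name: MyApp.TaskSupervisor}.
{:ok, _sup} = Task.Supervisor.start_link(name: MyApp.TaskSupervisor)

parent = self()

# The task runs under the Task.Supervisor and is not linked to the caller,
# so a crash inside it does not take the calling process down.
{:ok, _pid} =
  Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
    result = CsvWorker.process(["a", "b"])
    # Demonstration only: report back so the script can see the result.
    send(parent, {:csv_done, result})
  end)

receive do
  {:csv_done, result} -> IO.inspect(result, label: "processed")
after
  1_000 -> IO.puts("timed out")
end
```

In a controller, the `Task.Supervisor.start_child/2` call would come right before sending the response, so the request returns while the work continues.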


As @santif mentioned, using a library is better, but if you want to start quickly, you can start like this:

Task.start(fn ->
  process_something()
  |> notify_phoenix_channel()
end)

There are many ways to do this, and you might get a different approach with each answer.

To start with, the communication:

  • Initiate the command/request client side using channels.
  • The channel answers with "request received".
  • The channel triggers a long pipeline of transformations.
  • When the pipeline is done, the channel gets the async result (for example via a handle_info message) and notifies the frontend of the result via WebSocket.

Now you have a large choice for the pipeline: GenStage, a queue… or just a process.

The only asynchronous part is the API call; all the rest can be done with simple functions.

For this call, I would use a simple Task, supervised or not.
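The flow described above can be sketched without Phoenix by letting a plain GenServer play the role of the channel. Everything named here (`UploadHandler`, `Demo.TaskSupervisor`, the trivial `pipeline/1`) is hypothetical; a real channel would push to the socket where this sketch sends a message to `notify_pid`.

```elixir
defmodule UploadHandler do
  use GenServer

  def start_link(notify_pid), do: GenServer.start_link(__MODULE__, notify_pid)

  def upload(server, rows), do: GenServer.call(server, {:upload, rows})

  @impl true
  def init(notify_pid), do: {:ok, notify_pid}

  @impl true
  def handle_call({:upload, rows}, _from, notify_pid) do
    # Start the long-running work and answer immediately.
    Task.Supervisor.async_nolink(Demo.TaskSupervisor, fn -> pipeline(rows) end)
    {:reply, :request_received, notify_pid}
  end

  @impl true
  def handle_info({ref, result}, notify_pid) when is_reference(ref) do
    # The task finished; drop its monitor and forward the result.
    Process.demonitor(ref, [:flush])
    send(notify_pid, {:done, result})
    {:noreply, notify_pid}
  end

  def handle_info({:DOWN, _ref, :process, _pid, reason}, notify_pid) do
    # The task crashed before replying.
    send(notify_pid, {:failed, reason})
    {:noreply, notify_pid}
  end

  # Stand-in for the real transformation pipeline.
  defp pipeline(rows), do: rows |> Enum.map(&String.trim/1) |> Enum.reject(&(&1 == ""))
end

{:ok, _sup} = Task.Supervisor.start_link(name: Demo.TaskSupervisor)
{:ok, server} = UploadHandler.start_link(self())

:request_received = UploadHandler.upload(server, [" a ", "", "b"])

receive do
  {:done, result} -> IO.inspect(result, label: "pipeline result")
  {:failed, reason} -> IO.inspect(reason, label: "pipeline failed")
after
  1_000 -> IO.puts("timed out")
end
```

`Task.Supervisor.async_nolink/2` is used so that a crashing task arrives as a `:DOWN` message instead of bringing the handler process down.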

Anyway, you can make use of any of the libraries mentioned, depending on the situation :)


Here is another thread that might help @beto:

Best regards,


When I first came to Elixir, this was one of the biggest questions I had.

I think it would be really beneficial if there were a few blog posts and open source projects that went over a few production-ready solutions to handle cases like this. I think it would really help people using Elixir who are coming over from Rails or other stacks with the “web app + worker + redis” mentality.

Some of the use cases could be:

  • You don’t care about the response
  • You do care about the response
  • You need it to persist across beam reloads
  • Handle rate limiting (X per minute style)
  • Handle retries (constant or exponential back-offs) and hooks for custom things to happen on error
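As one illustration of the retry case in the list above, here is a rough sketch of retrying with exponential back-off using only the standard library. The `Retry` module and its `{:ok, _}` / `{:error, _}` convention are assumptions made for this example, not part of any library mentioned in this thread, and a production solution would also need persistence and error hooks.

```elixir
defmodule Retry do
  # Calls `fun` until it returns {:ok, value} or the attempts run out,
  # doubling the delay between attempts.
  def with_backoff(fun, attempts \\ 3, delay_ms \\ 100)

  # Last attempt: return whatever the function returns.
  def with_backoff(fun, 1, _delay_ms), do: fun.()

  def with_backoff(fun, attempts, delay_ms) when attempts > 1 do
    case fun.() do
      {:ok, value} ->
        {:ok, value}

      {:error, _reason} ->
        Process.sleep(delay_ms)
        with_backoff(fun, attempts - 1, delay_ms * 2)
    end
  end
end

# Usage: a hypothetical call that fails twice before succeeding.
{:ok, counter} = Agent.start_link(fn -> 0 end)

flaky = fn ->
  n = Agent.get_and_update(counter, fn n -> {n + 1, n + 1} end)
  if n < 3, do: {:error, :unavailable}, else: {:ok, n}
end

IO.inspect(Retry.with_backoff(flaky, 5, 10))
```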

Hello! I appreciate your comments a lot. I came up with this solution; it is still not working, but I would love to know if I am on the right track:

def call_external_api(csv_file) do
  csv_file
  |> parse_csv()
  |> filter_list()
  |> map_list()
  |> prepare_tasks()
end

def prepare_tasks(list) do
  Enum.each(list, fn element -> create_task(element) end)
end

def create_task(record) do
  Task.start_link(ExternalAPI.notify(record))
end

For starters, you have to pass a function, rather than invoking the function immediately and passing its result:

def create_task(record) do
  Task.start_link(&(ExternalAPI.notify(record)))
end

EDIT: As pointed out by @axelson, this is what would compile (also see the comment below mine):

def create_task(record) do
  Task.start_link(fn -> ExternalAPI.notify(record) end)
end

Actually that won’t compile because you can’t use the shorthand anonymous function syntax (using &) to define an anonymous function with 0 arguments. So you need to define the anonymous function with the explicit anonymous function syntax:

def create_task(record) do
  Task.start_link(fn -> ExternalAPI.notify(record) end)
end
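For a large CSV, starting one linked task per record may be more than the external API (or the caller) can take. As a possible alternative, not what the posts above use, `Task.Supervisor.async_stream_nolink/4` caps how many calls run at once. `FakeAPI` is a hypothetical stand-in for `ExternalAPI`, and `Demo.ApiTasks` is a made-up supervisor name.

```elixir
defmodule FakeAPI do
  # Hypothetical stand-in for ExternalAPI.notify/1.
  def notify(record), do: {:notified, record}
end

{:ok, _sup} = Task.Supervisor.start_link(name: Demo.ApiTasks)

records = ["r1", "r2", "r3"]

results =
  Demo.ApiTasks
  # &FakeAPI.notify/1 is a valid capture because the function takes one
  # argument; the stream passes each record to it.
  |> Task.Supervisor.async_stream_nolink(records, &FakeAPI.notify/1,
    max_concurrency: 2,
    timeout: 5_000
  )
  |> Enum.to_list()

IO.inspect(results)
```

Each element of `results` is an `{:ok, value}` tuple for a call that completed, in the same order as the input records.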

I think the nicest way to do this is probably to notify the user via a WebSocket connection once the job is done.
That means starting the job in a background task using the functions in the Task module. I think what @rjk proposes sounds the most sensible: the final step sends the result to the WebSocket channel, and possibly also persists a ‘success’ result, so that a user who closed the browser tab in the meantime can see what happened when they come back to the application.

I actually think that many of the background-task libraries that currently exist on hexpm were written by people who:

  • either attempted to re-create something they knew from another programming context while they were still new to Elixir,
  • or wanted to interface with an external system (like RabbitMQ, Redis or Celery) that is part of their pre-existing (Ruby/Python/NodeJS etc.) application.