Function executed via Erlport stops responding

kaquadu · October 12, 2020, 7:05pm

Hello!
I am writing my thesis application. I need linear programming, but my app is written in Elixir, which is really not the language for such operations. That is why I decided to use ErlPort, an Elixir dependency that can connect Python code with Elixir. I’m also using PuLP as the Python library for the optimization.

Elixir version: 1.10.4,
Erlport version: 0.10.1,
Python version: 3.8.5,
PuLP version: 2.3

I’ve written the following module for Elixir-Python communication, which uses a GenServer as the main ‘communication hub’ between Elixir and Python:

defmodule MyApp.PythonHub do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  def init(_opts) do
    path = [:code.priv_dir(:feed), "python"]
          |> Path.join() |> to_charlist()

    {:ok, pid} = :python.start([{ :python_path, path }, { :python, 'python3' }])

    {:ok, pid}
  end

  def handle_call({:call_function, module, function_name, arguments}, _sender, pid) do
    result = :python.call(pid, module, function_name, arguments)
    {:reply, result, pid}
  end

  def call_python_function(file_name, function_name, arguments) do
    GenServer.call(__MODULE__, {:call_function, file_name, function_name, arguments}, 10_000)
  end

end

The GenServer calls a Python file that contains this function:

def calculate_meal_4(products_json, diet_json, lower_boundary, upper_boundary, enhance):
  from pulp import LpMinimize, LpProblem, LpStatus, lpSum, LpVariable, value
  import json
  products_dictionary = json.loads(products_json)
  print(products_dictionary)
  diets_dictionary = json.loads(diet_json)
  print(diets_dictionary)

  model = LpProblem(name="diet-minimization", sense=LpMinimize)

  # ... products setup ...

  x = LpVariable("prod_1_100g", lower_boundary, upper_boundary)
  y = LpVariable("prod_2_100g", lower_boundary, upper_boundary)
  z = LpVariable("prod_3_100g", lower_boundary, upper_boundary)
  w = LpVariable("prod_4_100g", lower_boundary, upper_boundary)

  optimization_function = # ... optimization function setup ...

  model += # ... optimization boundary function setup ...

  model += optimization_function

  print(model)

  solved_model = model.solve()

  print(value(model.objective))

  return [value(x), value(y), value(z), value(w)]

The call to the GenServer itself looks like this:

PythonHub.call_python_function(:diets, python_function, [products_json, meal_statistics_json, @min_portion, @max_portion, @macro_enhancement])

where python_function is :calculate_meal_4, and products_json and meal_statistics_json are JSON strings containing the required data.

Calling calculate_meal_4 directly with python3 diets.py (which runs the script above on sample, but real, data taken from the app) works fine: I get the minimized result almost instantly. The problem only occurs when the script is called from Elixir via ErlPort. Judging from the printed output, everything seems to work until

solved_model = model.solve()

is called. Then the script seems to freeze, and the GenServer.call eventually hits its timeout.

I’ve also tested the call with a simple Python test file:

def pass_var(a):
  print(a)
  return [a, a, a]

and it worked fine.

That is why I am really puzzled right now, and I would appreciate any advice. Unfortunately, I haven’t found anything so far.

kaquadu · October 13, 2020, 6:54am

With some help from Stack Overflow, I’ve managed to solve this problem by making the .py file executable and calling it via System.cmd. More info: Stack Overflow thread
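A minimal sketch of the System.cmd approach, for readers who land here with the same problem. The `MyApp.PythonCmd` module name, the script path and passing the JSON strings as command-line arguments are all assumptions for illustration, not details taken from the linked thread.

```elixir
defmodule MyApp.PythonCmd do
  # Hypothetical helper: runs an external program and returns its trimmed
  # stdout when the exit status is 0, or an error tuple with the status.
  def run(executable, args) do
    case System.cmd(executable, args) do
      {output, 0} -> {:ok, String.trim(output)}
      {output, status} -> {:error, status, output}
    end
  end
end

# Hypothetical usage: diets.py would read its inputs from sys.argv and
# print only the final result (e.g. as JSON) to stdout.
# MyApp.PythonCmd.run("python3", [script_path, products_json, meal_statistics_json])
```

Since everything the script prints ends up in the captured output, the debug print() calls in calculate_meal_4 would need to be removed (or sent to stderr) for the result to be parseable.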

aseigo · October 13, 2020, 9:48am

There are a couple of ways to deal with this:

a) call is sync … but can wait forever. Instead of using the default timeout, pass :infinity as the timeout argument to GenServer.call and then just … wait.

b) call is sync … so don’t use it. Use cast instead and make the Elixir side of the code properly async around this: wrap the Python call in a Task and use message passing to get the results back. However, this is not really a great option in this case, because the Python process is stateful and synchronous.

c) call is sync … but the GenServer can hold on to the caller’s reference (the from argument) instead of answering right away, and send the actual answer later on. This pushes the async’ness into the GenServer (easy path: wrap the Python call in a Task) and is accomplished by returning {:noreply, new_state} from the handle_call implementation and, when the Python call finishes, returning the result with GenServer.reply/2. It suffers from the same issue as (b).

d) Don’t use a GenServer in these sorts of cases! Just use a regular module and call its functions as needed. This means having a set of setup, use, and cleanup functions that callers would need to use. Not as pretty at the call sites, but it gets rid of the GenServer business. Not an amazing solution if you are making a lot of calls into Python, as the Python setup time can be expensive.
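Options (a) and (c) can be combined: the caller waits with an :infinity timeout, while the GenServer hands the slow work to a Task and answers later with GenServer.reply/2. The sketch below assumes a hypothetical `DeferredHub` module and uses `Process.sleep/1` as a stand-in for `:python.call/4`, so it runs without ErlPort.

```elixir
defmodule DeferredHub do
  use GenServer

  def start_link(work_ms), do: GenServer.start_link(__MODULE__, work_ms)

  # Option (a): the caller is willing to wait as long as the work takes.
  def solve(pid, input), do: GenServer.call(pid, {:solve, input}, :infinity)

  @impl true
  def init(work_ms), do: {:ok, work_ms}

  # Option (c): run the slow work in a separate process and reply later,
  # so this GenServer stays free to handle other messages meanwhile.
  @impl true
  def handle_call({:solve, input}, from, work_ms) do
    Task.start(fn ->
      # Hypothetical stand-in for :python.call(pid, module, function, args)
      Process.sleep(work_ms)
      GenServer.reply(from, {:ok, input * 2})
    end)

    {:noreply, work_ms}
  end
end
```

In the PythonHub module from the question, option (a) alone amounts to passing :infinity instead of 10_000 in call_python_function. Note that with a real ErlPort instance, all the Tasks would still share one stateful Python process, which is the caveat mentioned for (b) and (c); also, if a Task crashes before replying, a caller using :infinity waits forever.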

You can also consider using a pool of Python-instance GenServers to service calls. With a pool of, e.g., 10 Python instances, a simple API for Python calls could be provided that checks out an available server (and either waits until one is available, potentially for quite a while, or returns with a timeout if the pool stays depleted for an extended period of time, if that makes sense in your application), runs the command, and returns when the results are available. This has the benefit of giving your application some concurrency for Python calls by spreading them across multiple environments … but it assumes that each call is separate and does not rely on state being held in the Python process between subsequent calls.
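A minimal, stdlib-only sketch of such a checkout pool, assuming a hypothetical `TinyPool` module that is handed a list of worker pids (for example, one unnamed GenServer per Python instance; the PythonHub module above registers itself under a fixed name, so it would need to drop the name: option first). It only implements the "wait until one is available" variant: there is no timeout and no handling of callers that crash while waiting. In practice, a library such as poolboy or NimblePool covers those cases.

```elixir
defmodule TinyPool do
  use GenServer

  # workers: list of pids, e.g. one GenServer per Python instance.
  def start_link(workers), do: GenServer.start_link(__MODULE__, workers)

  # Blocks until a worker is free, runs fun with it, then returns it to the pool.
  def transaction(pool, fun) do
    worker = GenServer.call(pool, :checkout, :infinity)

    try do
      fun.(worker)
    after
      GenServer.cast(pool, {:checkin, worker})
    end
  end

  @impl true
  def init(workers), do: {:ok, %{idle: workers, waiting: :queue.new()}}

  @impl true
  def handle_call(:checkout, _from, %{idle: [worker | rest]} = state) do
    {:reply, worker, %{state | idle: rest}}
  end

  def handle_call(:checkout, from, %{idle: []} = state) do
    # No free worker: park the caller and reply once a worker is checked in.
    {:noreply, %{state | waiting: :queue.in(from, state.waiting)}}
  end

  @impl true
  def handle_cast({:checkin, worker}, state) do
    case :queue.out(state.waiting) do
      {{:value, from}, waiting} ->
        # Hand the worker straight to the longest-waiting caller.
        GenServer.reply(from, worker)
        {:noreply, %{state | waiting: waiting}}

      {:empty, _} ->
        {:noreply, %{state | idle: [worker | state.idle]}}
    end
  end
end
```

A call site would then look like `TinyPool.transaction(pool, fn py -> :python.call(py, :diets, :calculate_meal_4, args) end)`, with the worker pids assumed to be ErlPort instances.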

(I’ve used ErlPort to perform long-running (and stateful, even…) ML workloads via Python from distributed Elixir applications before, so this all sounded rather familiar.)