Venomous - Erlport wrapper for managing concurrent python processes with ease

RustySnek · June 11, 2024, 4:19pm

Venomous aims to simplify the concurrent use of erlport Python ports, focusing on dynamic extensibility such as spawning, reusing, and terminating processes on demand. It also handles unused processes by killing them once they pass their configured inactive TTL. Venomous core functions ensure that whenever an :EXIT signal appears, the Python process dies without further execution, by (brutally) killing its OS process.

This is my first attempt at creating an Elixir library. The idea stemmed from the challenge of properly exiting Python processes. Even after closing the Python port, execution would persist until the end of a function or iteration. My goal is to handle these exits effectively while also enabling process reuse, thus avoiding the constant spawning and stopping of new ones.

Any feedback would be greatly appreciated

https://hexdocs.pm/venomous/Venomous.html

RustySnek · June 13, 2024, 9:02pm

I’ve released version 0.3.0 of Venomous:

Fixed many major issues with how Python processes were handled, which caused zombie processes and weird behaviour.
Multiple tasks run on a single SnakeWorker will now stack instead of timing out.
Added option keywords.

Changelog: Release v0.3.0 · RustySnek/Venomous · GitHub

kip · June 13, 2024, 9:24pm

Congrats on your first Elixir library. I like the delicious irony too that pythons aren’t venomous

RustySnek · June 13, 2024, 9:40pm

They must have drank some kind of toxic MIXture

jackalcooper · June 14, 2024, 3:49am

thanks so much! I am using NimblePool to wrap python process running models. Your implementation to manage python process is way more sophisticated than mine so I guess I’m going to use it

xsteve · June 14, 2024, 6:35pm

Thanks for providing that library.
However, I am unsure how to use it.
The documentation shows how to call a single python function.
Would you please explain how the Python interpreter is started? How do I import some Python modules and declare the function I want to call?

RustySnek · June 14, 2024, 6:49pm

Hey, so basically as of now, all of the Python modules (files) are loaded from the PYTHONPATH env variable. So if you put your Python modules inside a python/ directory, you have to add that directory to PYTHONPATH.
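To make that concrete, here is a minimal sketch of a module that could live in such a directory. The file name greeter.py and the function greet are hypothetical, picked only for illustration. It assumes erlport's usual Python 3 mapping, where an Elixir binary arrives as bytes unless a decoder has already turned it into str, so the function accepts both.

```python
# python/greeter.py (hypothetical module; the python/ directory must be on PYTHONPATH)


def greet(name):
    """Return a greeting as UTF-8 bytes so Elixir receives a binary, not a charlist."""
    # Elixir binaries normally arrive as bytes; a configured decoder may
    # already have converted them to str, so handle both cases.
    if isinstance(name, bytes):
        name = name.decode("utf-8")
    return f"Hello, {name}!".encode("utf-8")
```

Following the pattern used elsewhere in this thread, the Elixir side would then be something like SnakeArgs.from_params(:greeter, :greet, ["World"]) |> Venomous.python().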

There is also a config for Venomous python processes to load the type encoder/decoder of erlport. I made a quick guide to this here: Quick guide on erlport Python API — Venomous v0.3.0

As for calling multiple Python instances, you have to do that yourself. For example:

args = SnakeArgs.from_params(:time, :sleep, [0.1])

1..100
|> Enum.map(fn _ ->
  Task.async(fn -> python!(args) end)
end)
|> Task.await_many(:infinity)

Here Venomous will spawn as many processes as it’s allowed to via the max_children configuration of SnakeSupervisor, up to 100. If it can’t spawn any more, it will just wait and reuse the already spawned ones once they are done with their tasks.
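The same "bounded pool, queue and reuse" behaviour can be illustrated in plain Python. This is only an analogy built on ThreadPoolExecutor, not how Venomous is implemented: MAX_WORKERS stands in for max_children, and its value is an arbitrary choice for the sketch.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # plays the role of max_children (illustrative value)


def task(_):
    """Simulate a short job and report which worker ran it."""
    time.sleep(0.01)
    return threading.current_thread().name


# Submit more tasks than there are workers: the extra tasks wait in a
# queue and are picked up by workers as they become free.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    worker_names = list(pool.map(task, range(20)))

distinct_workers = set(worker_names)
print(len(worker_names), "tasks ran on", len(distinct_workers), "workers")
```

All 20 tasks complete, but never on more than MAX_WORKERS distinct workers.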

Feel free to ask if you need any more help.

xsteve · June 14, 2024, 8:39pm

Thanks for the explanation. I guess I understand it now.
Perhaps it would be nice to also put the Python code “time.sleep(0.1)” somewhere in the documentation, and to mention that PYTHONPATH has to be set when calling your own Python module (which will be the case most of the time).

RustySnek · June 14, 2024, 9:30pm

Yeah, I’ll probably extend the documentation and add a few examples in the next release.

RustySnek · June 16, 2024, 1:09pm

I’ve released version 0.4.0 of Venomous:

Included support for erlport Python options, e.g. module_paths, python_executable, packet_bytes…
Added named processes, separate from the regular SnakeManager ones
Fixed an issue with the lib breaking whenever a Python process was killed on exception…
Quicker exits whenever processes are spammed
Included examples in docs

Changelog: Release v0.4.0 · RustySnek/Venomous · GitHub

RustySnek · August 3, 2024, 3:41pm

I’ve released version 0.5.1 of Venomous, which adds optional hot reloading for Python modules.

To enable the hot reloading:

Install the Python watchdog dependency using mix venomous.watchdog install

Enable serpent_watcher in your dev config:

config :venomous, :serpent_watcher, enable: true

Add your module paths in snake_manager config:

config :venomous, :snake_manager, %{
  python_opts: [
    module_paths: ["my_python_modules/", ...]
  ]
}

Now all modules inside the configured module_paths should reload on edit.

Kallee · August 22, 2024, 12:20pm

I have built an ETL in Python that I want to call from Elixir. I wanted to ask if you have any suggestions on how to best pass maps/dictionaries between Elixir and Python. Some of the maps/dicts will contain simple structures, but others will contain more complex structures, such as pandas DataFrames.

I have looked into both serialization via JSON and writing custom functions on each side. I assume this will be quite a common use case for Venomous, so I wanted to ask your opinion on this.

Thanks for a great library with good documentation; it’s been a great introduction to the world of Elixir!

RustySnek · August 22, 2024, 3:40pm

Hey, for simple classes that can be easily serialized with .__dict__, you can just handle that recursively for basic data types. venomous.py provides a function that handles such cases and encodes all strings into ‘utf-8’ so they won’t appear as charlists on Elixir’s side.

def encode_basic_type_strings(data: Any):
    """
    encodes str into utf-8 bytes
    handles VenomousTrait classes into structs
    converts non VenomousTrait classes into .__dict__
    """
    if isinstance(data, str):
        return data.encode("utf-8")
    elif isinstance(data, (list, tuple, set)):
        return type(data)(encode_basic_type_strings(item) for item in data)
    elif isinstance(data, dict):
        return {
            encode_basic_type_strings(key): encode_basic_type_strings(value)
            for key, value in data.items()
        }
    elif isinstance(data, VenomousTrait):
        return data.into_erl()

    elif (_dic := getattr(data, "__dict__", None)) != None:
        return encode_basic_type_strings(_dic)
    else:
        return data
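To show what this does to nested data, here is a simplified stand-in for the helper above. It is a sketch, not the library function: the VenomousTrait branch is left out so the example runs without Venomous installed, and the names encode_strings and Point are made up for illustration.

```python
from typing import Any


def encode_strings(data: Any) -> Any:
    """Simplified stand-in: encode every str to UTF-8 bytes, recursively."""
    if isinstance(data, str):
        return data.encode("utf-8")
    if isinstance(data, (list, tuple, set)):
        # Rebuild the same container type with encoded items.
        return type(data)(encode_strings(item) for item in data)
    if isinstance(data, dict):
        return {encode_strings(k): encode_strings(v) for k, v in data.items()}
    attrs = getattr(data, "__dict__", None)
    if attrs is not None:
        # Plain objects are flattened into their attribute dict.
        return encode_strings(attrs)
    return data


class Point:  # hypothetical example class
    def __init__(self, label: str, x: int, y: int):
        self.label = label
        self.x = x
        self.y = y


encoded = encode_strings({"points": [Point("origin", 0, 0)], "unit": "cm"})
print(encoded)
# {b'points': [{b'label': b'origin', b'x': 0, b'y': 0}], b'unit': b'cm'}
```

Every key and string value becomes bytes, which is what keeps them from showing up as charlists on the Elixir side, while numbers pass through unchanged.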

If you want to maintain the structs/classes between Elixir and Python, you can experiment with the VenomousTrait class, although I haven’t documented it very well yet.
As for the more complex structures, you have to handle them individually. For example, a DataFrame provides a to_dict() method which returns a clean dict with the data. All of the conversion logic should be put inside the encoder/decoder functions of erlport. So for the DataFrame you could do:

```python
# encoder.py
from typing import Any
from erlport.erlang import set_decoder, set_encoder
from erlport.erlterms import Atom
from pandas import DataFrame
from venomous import decode_basic_types_strings, encode_basic_type_strings


def handle_types():
    set_encoder(encoder)
    set_decoder(decoder)
    return Atom("ok".encode("utf-8"))


def encoder(value: Any):
    if isinstance(value, DataFrame):
        return encode_basic_type_strings(value.to_dict())
    return encode_basic_type_strings(value)


def decoder(value: Any):
    return decode_basic_types_strings(value)

# data_frames.py
import pandas as pd

def data_frames(dict):
    df = pd.DataFrame(dict)
    return df
```

iex(16)> df = %{
...(16)>   "Age" => %{0 => 25, 1 => 30, 2 => 35, 3 => 40},
...(16)>   "City" => %{0 => "New York", 1 => "London", 2 => "Paris", 3 => "Tokyo"},
...(16)>   "Name" => %{0 => "John", 1 => "Jane", 2 => "Bob", 3 => "Alice"}
...(16)> }
%{
  "Age" => %{0 => 25, 1 => 30, 2 => 35, 3 => 40},
  "City" => %{0 => "New York", 1 => "London", 2 => "Paris", 3 => "Tokyo"},
  "Name" => %{0 => "John", 1 => "Jane", 2 => "Bob", 3 => "Alice"}
}
iex(17)> Venomous.SnakeArgs.from_params(:data_frames, :data_frames, [df]) |> Venomous.python() 
%{
  "Age" => %{0 => 25, 1 => 30, 2 => 35, 3 => 40},
  "City" => %{0 => "New York", 1 => "London", 2 => "Paris", 3 => "Tokyo"},
  "Name" => %{0 => "John", 1 => "Jane", 2 => "Bob", 3 => "Alice"}
}

vtno · December 26, 2024, 7:30pm

Awesome library!

I’m building a PoC of a machine learning API that uses Elixir to manage Python processes for NLP tasks.

There is a small cold start when the method is invoked for the first time with a Venomous.python call. I wonder if there is an easy way to pre-start some workers so there is no cold start time when running the program?

I’m looking at the SnakeWorker/Supervisor but not sure if it’s the correct place.

RustySnek · December 28, 2024, 12:26am

Hey, I’m happy you found the library helpful! ^^

I have added Venomous.preload_snakes/1 in the 0.7.5 release, which starts a given number of processes in the :ready state. So you can start workers at the start of your program with:

:ok = Venomous.preload_snakes(10) # Starts 10 workers
{:retrieve_error, :max_children} = Venomous.preload_snakes(-1) # Starts all available workers

lmk if it helped!

vtno · December 28, 2024, 9:25am

That should work! Another nice thing to add would be shutting down the workers during termination.

I noticed that you already have the terminate hook in the worker

RustySnek/Venomous/blob/master/lib/snake_worker.ex#L133

            {:SNAKE_ERROR, error_message}
        end

      send(origin, data)
      :done
    end)

    {:reply, :ok, pypid}
  end

  def terminate(_reason, pypid) do
    GenServer.call(SnakeManager, {:remove_snake, self()})
    :python.stop(pypid)
  end

  defp get_os_pid(pypid) do
    {_, _, _, port, _, _} = :sys.get_state(pypid)
    info = port |> Port.info()
    info[:os_pid]
  end
end

but I’m not sure why there are hanging erlport processes after my supervisor exits via Application.stop.

I need to add my own list_alive_snake and slay them manually in my terminate hook.

RustySnek · December 30, 2024, 2:12am

Hey, I don’t encounter such a problem when I do Application.stop(:venomous). However, you mentioned that you exit a different supervisor, so perhaps you would have to link them so they terminate alongside each other? Calling stop on :venomous is also an option.

vtno · December 30, 2024, 8:55am

Actually, it just took a while for the processes to be removed. After I waited a bit, ps -aux | grep erlport no longer shows any running workers, so all good!

I’ve been using the preload as well and it works perfectly. I’m a little curious why the return value when using -1 is {:retrieve_error, :max_children} instead of :ok when it’s successful?

RustySnek · December 30, 2024, 12:18pm

It wasn’t really well thought out. If you supply the function with -1, it will keep on spawning workers until it encounters an error, which in this case will be :max_children. It’s kind of a way of signaling that you have reached the limit. I might change it later on so it makes a little more sense, as it’s not really an error if everything worked as intended.