Pods - Babashka Pods support for Elixir

This is a simple proof of concept. The idea is to use Babashka Pods (GitHub - babashka/pods: Pods support for JVM and babashka).

Babashka Pods enable using external services that can be written in any programming language. The external program isn't required to have a CLI, since a small script is created that talks to the internal SDK. Babashka Pods are standalone programs that can expose namespaces with vars to Elixir.

With this simple proof of concept I can say:

  • Elixir can benefit from the ecosystems of other technologies, even if they don’t have a CLI; an SDK is enough to create a Babashka Pod.
  • This is better than calling System.cmd or using Erlang Ports directly, since the process is started once and then listens to commands, saving resources.
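For context, the protocol a pod speaks is small: the client exchanges bencode-framed maps over the pod's stdin/stdout, each with an `op` key (`describe`, `invoke`, `shutdown`), and payloads such as call arguments travel as JSON strings inside those maps. A rough sketch of the two core message shapes as plain Python dicts (the exact keys follow the babashka pods docs; treat this as an approximation, not the full spec):

```python
import json

# "describe" asks the pod what it exposes; the pod answers with its
# payload format and the namespaces/vars it provides.
describe_request = {"op": "describe"}
describe_response = {
    "format": "json",  # how args/values are encoded inside messages
    "namespaces": [
        {"name": "pod.lispyclouds.sqlite",
         "vars": [{"name": "execute!"}]},
    ],
}

# "invoke" calls one exposed var; note that args travel as a JSON
# string, not as a nested structure.
invoke_request = {
    "op": "invoke",
    "id": "1",                                 # correlates the reply
    "var": "pod.lispyclouds.sqlite/execute!",  # namespace/var to call
    "args": json.dumps(["select * from foo"]),
}
invoke_response = {
    "id": "1",
    "value": json.dumps([[1], [2]]),  # result, JSON-encoded
    "status": ["done"],               # "done" closes the request
}

print(json.loads(invoke_request["args"]))  # ['select * from foo']
```

On the wire, each of these maps is additionally bencode-encoded, which is what keeps the framing simple over raw stdio.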
6 Likes

are you planning on moving forward with this beyond the proof of concept?
i’d love to see that being a full featured library, like rustler and other tools.

1 Like

Thanks. Yeah, maybe a good idea, but some pods require the transit+json format instead of plain JSON. I found a library for that in erlang (github isaiah/transit-erlang). I have to learn a little more to implement a valid babashka pod client.

1 Like

there are a few libraries implementing edn in elixir/erlang too. maybe that can help with good transit support.

do you have a list of functionalities that need to be supported to be a full-featured pods client? maybe that can help achieve the goal of full support one day.
i’d like to help but i’m short on time :sweat:

Doesn’t that mean that you have to create pod code on both sides? If yes, how is that different from creating a CLI, and if not, how does that work?

Erlang ports are designed for this kind of thing, maybe you wanted to say NIFs?

you just need the pod implementation to be done. from the pod client side you could do everything on the fly if you want.

it differs from the CLI approach just as any other RPC approach does. with CLIs you usually scope the interface and the communication ad hoc to your usage, while with RPC you’re exposing access to code and don’t need to narrow your scope that much.

at least in my experience, CLI solutions are kept very narrow to avoid complexity, and when you need more functionality exposed you end up having to choose between a very complex CLI or adding more narrow CLIs that can’t even be composed in a simple way.

This is true, but another question is: is it different from already existing solutions like grpc?

From what I understand you also have to define and keep a contract between parties?

there are at least 3 main differences:

  • protobuf is way more strict than json or edn.
  • grpc is somewhat complex to implement on the server side; pods are very simple.
  • grpc requires a network interface, while pods are expected to run locally over the pod’s stdin/stdout.

the difference there lies in the strictness of the contract: with grpc you need to map a lot of stuff with protobuf, while with a pod everything goes through stdin/stdout and the pod exposes its entire namespace without you needing to keep updating a “contract layer”.

Erlang ports are not sufficient for this kind of use case. I’ll cite the muontrap author (GitHub - fhunleth/muontrap: Keep your ports contained):

The Erlang VM’s port interface lets Elixir applications run external programs. This is important since it’s not practical to rewrite everything in Elixir. Plus, if the program is long running like a daemon or a server, you use Elixir to supervise it and restart it on crashes. The catch is that the Erlang VM expects port processes to be well-behaved. As you’d expect, many useful programs don’t quite meet the Erlang VM’s expectations.

Currently this POC is using GitHub - saleyn/erlexec: Execute and control OS processes from Erlang/OTP for properly handling the external processes.

The pod protocol must be defined by each pod (Pod Service) and can be implemented in any language that can encode bencode and JSON. The elixir client is just a simple wrapper for the communication.

Example:

defmodule Pods.LispyClouds.SQLite do
  # the directory of the pod
  @namespace "pod-lispyclouds-sqlite"

  # the script that will be run; it must have execute permission (e.g. chmod 755)
  @script "#{@namespace}.py"

  # the prefix for the commands that the script expects
  # example command: pod.lispyclouds.sqlite/execute!
  @prefix "pod.lispyclouds.sqlite"

  require Logger

  def start(callback \\ nil, opts \\ []) do
    Logger.info("Starting #{__MODULE__} Pod")

    Pods.load(
      __MODULE__,
      @namespace,
      @prefix,
      @script,
      callback ||
        fn response ->
          response
          |> IO.inspect()
        end,
      opts
    )
  end

  def describe(pods) do
    Logger.debug("describe")
    Pods.call(pods, @namespace, "describe")
    pods
  end

  def invoke(pods, command, args \\ []) do
    Logger.debug(command)
    Pods.call(pods, @namespace, "invoke", command, args)
    pods
  end

  def execute!(pods, args \\ []) do
    invoke(pods, "execute!", args)
  end
end
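For comparison, here is what the other side of that conversation could look like. This is a hypothetical, stripped-down pod service in Python that handles already-decoded request maps (on the wire each map is bencode-framed over stdin/stdout, which is omitted here); the `handle` function, the toy `ROWS` store, and the single `execute!` var are illustrative, not part of any spec:

```python
import json

NAMESPACE = "pod.lispyclouds.sqlite"

# Toy backing store so invoke has something to return; a real pod
# would call into sqlite3 here.
ROWS = [[1], [2]]

def handle(msg):
    """Turn one decoded request map into a response map."""
    op = msg.get("op")
    if op == "describe":
        # Advertise the payload format and the vars this pod exposes.
        return {
            "format": "json",
            "namespaces": [
                {"name": NAMESPACE, "vars": [{"name": "execute!"}]},
            ],
        }
    if op == "invoke":
        args = json.loads(msg["args"])  # args arrive as a JSON string
        # Dispatch on the fully-qualified var name.
        if msg["var"] == f"{NAMESPACE}/execute!":
            value = ROWS if args and "select" in args[0] else []
            return {"id": msg["id"],
                    "value": json.dumps(value),
                    "status": ["done"]}
    # Unknown op/var: report an error status back to the client.
    return {"id": msg.get("id", ""),
            "ex-message": f"unknown request: {msg}",
            "status": ["done", "error"]}

resp = handle({"op": "invoke", "id": "1",
               "var": f"{NAMESPACE}/execute!",
               "args": json.dumps(["select * from foo"])})
print(resp["status"])  # ['done']
```

The Elixir module above is essentially the mirror image: it builds the `describe`/`invoke` maps and hands the responses to a callback.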

1 Like

Erlang Ports, since the process is started once and then listens to commands, saving resources.

Erlang ports can be long running and listen for commands or data over stdin or TCP sockets even.

That being said this interface is quite nice, but one problem I’ve run into with erlexec is that heavy I/O becomes a bottleneck fast due to the exec-port process marshaling all communication. Does Pods specify a Pod-management mechanism that could avoid this?

Erlang ports provide a wonderful tool for CLI processes, but fail due to small quirks in non-Erlang tools. For example ping, as shown in the muontrap docs:

Elixir did indeed terminate both the process and the port, but that didn’t stop ping. The reason for this is that ping doesn’t pay attention to stdin and doesn’t notice the Erlang VM closing it to signal that it should exit.

Imagine now that the process was supervised and it restarts. If this happens regularly, you could be running dozens of ping commands.

This was just a weekend POC; maybe some details can be ironed out with more time. But at least the Pods communication is encoded with bencode (the format used by bittorrent) for message passing between the process and Elixir, so it’s a little more lightweight than raw stdio.
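To make the “lightweight” point concrete: bencode has only four value types (byte strings, integers, lists, dicts), all length-prefixed or delimiter-terminated, so no quoting or escaping is ever needed. A minimal encoder sketch, enough for the small map messages a pod exchanges (not a complete or validated implementation):

```python
def bencode(value):
    """Encode str/bytes/int/list/dict into bencode bytes."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # byte string: <length>:<bytes>
        return str(len(value)).encode() + b":" + value
    if isinstance(value, int):
        # integer: i<digits>e
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # dict keys must be byte strings and appear in sorted order
        items = sorted((k.encode() if isinstance(k, str) else k, v)
                       for k, v in value.items())
        return (b"d"
                + b"".join(bencode(k) + bencode(v) for k, v in items)
                + b"e")
    raise TypeError(f"cannot bencode {type(value)}")

print(bencode({"op": "describe"}))  # b'd2:op8:describee'
```

A whole `describe` request is 16 bytes on the wire, and because every value is self-delimiting the reader always knows where one message ends and the next begins.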

1 Like

OK, let’s say we replace it with a generic HTTP API; these are just implementation details.

Yeah, this is more of a marketing argument; there are more things involved around this topic, like the tooling around it, and in that case grpc wins by a lot.

Not really, the general specification of the protocol is not tied to HTTP, and even when HTTP is used it can run over unix sockets on a local linux system, so this is once again an implementation detail.

This is a fact of life; there is no library in any language that will fix potential problems with zombie processes and other misbehaving stuff, and nobody said you have to use ports without any kind of supervision over the processes you create.

I have mixed feelings about this technology. While it is good that there is an attempt to make communication between different languages easier, the fact that the elixir version also manages the process magically makes me think twice about using it; I would personally prefer that part to be separated.

I would also be more inclined to choose it if it offered tools for visibility and debugging, as that is in a lot of cases why companies opt to use an external system for this kind of communication.

These two libraries can do a good job on managing zombie processes.

It can be separated; maybe the library can just be the pod client, and it’s up to the developer to choose the pod process management mechanism.

I would need an example of this kind of “feature” so I can better grok the request. Debugging for each pod depends on the programming language chosen to implement the protocol.

Yes that would be much better.

Let’s say I want to see the throughput and, for example, debug the issue mentioned by @mpope. The simplest example is using an HTTP API for RPC calls: you can track the number of requests, payload size, response time, etc.

While this could be achieved at the elixir level rather easily by adding some taps and sending telemetry events, it would be great for this functionality to be part of the overall spec of the protocol. This most probably is not possible because each language implements its own version of the client/server, but it is a mandatory feature for systems that lean on this kind of functionality a lot.

The communication is stdin and stdout. On Unix systems you can access them using cat /proc/<pid>/fd/1 (1 = stdout, 2 = stderr).

Maybe some “hooks” can be added to trigger some functions in certain lifecycle events.

A lot of things could be done, but this technology promises a turnkey solution (correct me if I’m wrong), so if the only thing it offers is a wrapper over two protocols, then I might as well go with a more widely used solution, as there will be more support.

At least this is my general thinking about why me and other users would consider using it.

nice… good choice. @clsource just showed a proof of concept of something that he likes, and I was interested in it too.
in this topic no one is advocating for using it over grpc or any other solution.

for me particularly it’s way better to use something like this to wrap a java sdk to deal with brazilian government and bank stuff instead of writing a fully featured microservice just to expose the sdk.

different problems, different approaches. the more options, the better, not worse.

2 Likes

At first this was just an experiment to see if something like Babashka Pods could be implemented in Elixir.
A proper library would need to consider edge cases and other production-ready features like the ones you mention.

So my vision for a future library would be:

  1. Pods tooling (pod installation and registry) handled by babashka, since a pod would work in babashka and elixir (and any other client that implements the protocol).
  2. Pod Core: handles input/output to a pod service (encoding and decoding messages, parsing the protocol responses)
  3. Pod Process Manager (optional): starts the Pod services and provides the IO mechanism for pods.

I don’t know if this would be a “turnkey” solution. The steps required to use pods would maybe be reduced to:

  1. Install the pods in the pods directory.
  2. Set the initial configuration for Pod Core and Pod Process Manager.
  3. Call the pod functions defined in each pod ex file.
2 Likes

Ok, so I was able to improve the proof of concept. It now has some awesome goodies:

  • Pods are just mix projects, no need for other tools than mix for installing pods.
  • Separated the encoder, decoder, handler and process manager from the core.

Pods.Core.start(
  # Available Pods List
  [Pod.LispyClouds.SQLite],
  # Pod Manager
  Pods.ProcessManager,
  # Message Encoder
  PodsExampleProject.Encoder,
  # Message Decoder
  PodsExampleProject.Decoder,
  # stdout and stderr handler
  PodsExampleProject.Handler
)
|> Pod.LispyClouds.SQLite.execute!("create table if not exists foo ( int foo )")
|> Pod.LispyClouds.SQLite.execute!("delete from foo")
|> Pod.LispyClouds.SQLite.execute!("insert into foo values (1), (2)")
|> Pod.LispyClouds.SQLite.execute!("select * from foo")

yeey

2 Likes

Ok, so today I managed to trigger an installation pipeline and improved the stdout handler a little more.

Running the command mix pod.babashka.sqlite3.install installs the desired artifact inside the pods directory so it can be run as a pod (as long as you have babashka in $PATH).

This is a good example for when you need more complex artifacts that may require extra steps.

With this I learned these things:

  1. Babashka Pods are tailored to clojure, so many of them will not be compatible with Elixir pods. It’s best to consider Babashka Pods as an inspiration for the project rather than a resource to use.
  2. Reading and writing stdio is a little more difficult than expected, since the process manager uses Streams; a command that outputs a lot of text would mangle the parsing. For now the solution was to create a temp file and try to decode it with bencode. If it decodes successfully, the handlers are called. Maybe not the best solution, but at least it works for now.
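One alternative to the temp-file workaround, assuming the stream really is a sequence of bencode values: since bencode is self-delimiting, you can keep a growing buffer and repeatedly try to decode one complete value from its front, consuming bytes only once a full message has arrived. A hypothetical sketch (the `try_decode` helper is illustrative and handles only the subset of bencode used here):

```python
def try_decode(buf, pos=0):
    """Try to decode one bencode value from buf at pos.
    Returns (value, next_pos), or None if the value is incomplete."""
    if pos >= len(buf):
        return None
    ch = buf[pos:pos + 1]
    if ch == b"i":                      # integer: i<digits>e
        end = buf.find(b"e", pos)
        if end == -1:
            return None
        return int(buf[pos + 1:end]), end + 1
    if ch in (b"l", b"d"):              # list / dict
        items, pos = [], pos + 1
        while buf[pos:pos + 1] != b"e":
            out = try_decode(buf, pos)
            if out is None:
                return None             # nested value incomplete
            value, pos = out
            items.append(value)
        if ch == b"d":
            return dict(zip(items[::2], items[1::2])), pos + 1
        return items, pos + 1
    # byte string: <len>:<bytes>
    colon = buf.find(b":", pos)
    if colon == -1:
        return None
    n = int(buf[pos:colon])
    if colon + 1 + n > len(buf):
        return None                     # not all bytes arrived yet
    return buf[colon + 1:colon + 1 + n], colon + 1 + n

# Feed the decoder chunk by chunk, as a process-manager stream would.
buffer = b""
messages = []
for chunk in [b"d2:op8:desc", b"ribee", b"d2:op6:invokee"]:
    buffer += chunk
    while (out := try_decode(buffer)) is not None:
        msg, consumed = out
        messages.append(msg)
        buffer = buffer[consumed:]

print(messages)  # [{b'op': b'describe'}, {b'op': b'invoke'}]
```

With this shape of loop, a message split across several stdout chunks just stays in the buffer until its last bytes arrive, so large outputs can’t mangle the parsing.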

For the current state of this experiment, I think it demonstrates how Elixir Pods can be implemented in a project. More than a library, it can serve as an example solution for when you need something similar and an RPC, NIF or CLI is not desired.

3 Likes