Hi.
I’ve started experimenting with adding distributed support to a project of mine:
You can see a demo running on fly.io here:
https://git.limo/redrabbit/git-limo
For a little bit of context:
- I have a NIF wrapper around libgit2 written in C.
- Most functions take and return wrapped C pointers (see erl_nif - Resource objects).
- I have implemented a GenServer in order to safely (thread-safety) manipulate a repository from multiple processes simultaneously.
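For readers unfamiliar with the pattern, here is a minimal sketch of the single-owner idea. It is not the actual GitAgent code: `OwnerSketch` and `fake_open/1` are hypothetical names, and a plain reference from `make_ref/0` stands in for the libgit2 handle.

```elixir
defmodule OwnerSketch do
  use GenServer

  # Client API: callers only ever hold the server PID.
  def start_link(path), do: GenServer.start_link(__MODULE__, path)
  def path(server), do: GenServer.call(server, :path)
  def owner(server), do: GenServer.call(server, :owner)

  @impl true
  def init(path) do
    # Hypothetical stand-in for the NIF call that opens the repository.
    {:ok, %{handle: fake_open(path), path: path}}
  end

  @impl true
  def handle_call(:path, _from, state) do
    # Calls are handled one at a time, so the handle is never used concurrently.
    {:reply, {:ok, state.path}, state}
  end

  def handle_call(:owner, _from, state) do
    # Reports which process actually touches the handle.
    {:reply, self(), state}
  end

  # Hypothetical placeholder; a real wrapper would call into the NIF here.
  defp fake_open(_path), do: make_ref()
end

{:ok, server} = OwnerSketch.start_link("tmp/my-repo.git")
{:ok, "tmp/my-repo.git"} = OwnerSketch.path(server)
```

Because every request goes through `GenServer.call/2`, the handle is only ever dereferenced by the owning process, whichever process made the request.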
Here’s a brief example of the API:
alias GitRekt.GitAgent
# load repository
{:ok, agent} = GitAgent.start_link("tmp/my-repo.git")
# fetch master branch
{:ok, branch} = GitAgent.branch(agent, "master")
# fetch commit pointed by master
{:ok, commit} = GitAgent.peel(agent, branch)
# fetch commit author & message
{:ok, author} = GitAgent.commit_author(agent, commit)
{:ok, message} = GitAgent.commit_message(agent, commit)
IO.puts "Last commit by #{author.name} <#{author.email}>:"
IO.puts message
In this example:
- agent is a PID,
- branch is a struct in the form of %GitRef{oid: <<...>>, name: "master", type: :branch},
- commit is a struct in the form of %GitCommit{oid: <<...>>, __ref__: c_ptr},
- author is a map in the form of %{name: "...", email: "..."},
- message is a binary.
I’m not very familiar with the internals of the BEAM and I’m not sure what really happens when NIF resources are passed between processes. In the example above, commit holds a reference to an actual C pointer (the :__ref__ field).
I assume that because the calling process only holds the reference, thread-safety is not compromised (the C pointer itself is always manipulated in the dedicated GitAgent process).
So here are my first questions:
- Is a NIF resource object a simple safe-pointer that can be passed around processes?
- Do any copy operations happen?
- Is it garbage collected like everything else?
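One way to probe the first two questions without writing any C is to experiment with a standard-library handle. The sketch below uses `:atomics` purely as a stand-in: its handle is an opaque reference to memory that lives outside the process heap, but it is not a user-defined NIF resource, so treating it as representative of the wrapped libgit2 pointers is an assumption.

```elixir
# :atomics is only a stand-in for a NIF resource handle (assumption).
handle = :atomics.new(1, [])
true = is_reference(handle)

# Hand the handle to another process on the same node and mutate through it.
task = Task.async(fn -> :atomics.add(handle, 1, 5) end)
:ok = Task.await(task)

# The parent observes the update, so the other process worked on the same
# underlying memory rather than on a private copy of it.
5 = :atomics.get(handle, 1)
```

This only shows that the handle, not the data behind it, is what moves between local processes. It says nothing about garbage collection or about other nodes.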
Now GitAgent provides a subset of libgit2 functions. In order to implement more complex things on top, it provides GitAgent.transaction/2:
def resolve_commit_all(agent, commit) do
  GitAgent.transaction(agent, fn repo ->
    with {:ok, author} <- GitAgent.commit_author(repo, commit),
         {:ok, committer} <- GitAgent.commit_committer(repo, commit),
         {:ok, message} <- GitAgent.commit_message(repo, commit),
         {:ok, parents} <- GitAgent.commit_parents(repo, commit),
         {:ok, timestamp} <- GitAgent.commit_timestamp(repo, commit),
         {:ok, gpg_sig} <- GitAgent.commit_gpg_signature(repo, commit) do
      {:ok, %{
        oid: commit.oid,
        author: author,
        committer: committer,
        message: message,
        parents: Enum.to_list(parents),
        timestamp: timestamp,
        gpg_sig: gpg_sig
      }}
    end
  end)
end
- The anonymous function is executed by the agent process,
- repo is a NIF resource object handle (wraps git_repository),
- commit is not explicitly passed to it but is captured from the enclosing scope.
Again, I assume that things just work™ because my NIF resources (repo and commit) are always manipulated from the same process, so nothing is violated.
So, in my adventure of running my project in a distributed environment, I have a few things I’d like to understand.
- Is it OK to share NIF resources across different nodes with my current approach? Basically, the NIF safe-pointers are used like simple references and are only manipulated from a unique dedicated process.
- How does the garbage collector handle NIF resources that are shared between multiple processes? What about a process running on a different node?
- How are captured variables handled when running anonymous functions on different nodes? What about NIF resources?
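For the question about captured variables, the closure's environment can at least be inspected locally. The sketch below uses `:erlang.fun_info/2` to list what a fun has captured and round-trips the fun through the external term format on a single node. `ClosureSketch` and the `commit` map are hypothetical stand-ins, and the example deliberately avoids real NIF resources, so it does not show whether a resource handle remains usable after such a trip.

```elixir
defmodule ClosureSketch do
  # Returns a closure that captures `value` from the enclosing scope.
  def capture(value), do: fn -> value end
end

# Hypothetical stand-in for a commit struct carrying an opaque reference.
commit = %{oid: "abc123", ref: make_ref()}
fun = ClosureSketch.capture(commit)

# The captured variables travel with the fun as part of its environment.
{:env, env} = :erlang.fun_info(fun, :env)
true = commit in env

# Serialize and deserialize the closure on the same node.
copy = fun |> :erlang.term_to_binary() |> :erlang.binary_to_term()
true = copy.() == commit
```

Calling such a fun on another node would additionally require the defining module to be loaded there in the same version.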
I would also be very interested in any good papers or docs on the internals of the BEAM covering this kind of thing. I’ve come across a few interesting topics in the forum.
Any tips on how to debug and profile code (memory leaks, garbage collector, etc.) when working with NIFs are also very welcome.