Using a compile-time mutable registry?

Let me start by saying that while I have done some basic coding in Elixir and messed with Ecto and Phoenix, I am coming from a mostly Ruby/class-based OOP background, and I can tell I have a lot of brain-training to do. The main issue is dealing with immutable structures. It’s been smooth sailing thus far, actually, but… in a project I’m messing around with I’ve hit an issue.

Let’s say I have a struct/module named Thing. A Thing can be related to other Things via some kind of value; let’s say it just has a key named relationships. The problem comes when thing1 and thing2 relate back and forth. In OOP, whether with a class or even a simpler structure like a hash, thing1 would be created first, then thing2 would be created pointing back to thing1, and finally you’d mutate thing1’s relationships to point to thing2. Since I cannot mutate thing1, however, changing its relationships produces a new struct that points to thing2, which still points to the old thing1.
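A minimal sketch of that problem, using plain maps in place of the Thing struct (the module name `CycleDemo` is made up for illustration): once thing2 embeds thing1, “updating” thing1 only builds a new value, and thing2 keeps the old copy.

```elixir
defmodule CycleDemo do
  # Plain maps stand in for %Thing{} structs.
  def build do
    thing1 = %{id: "thing1", relationships: []}
    # thing2 embeds a copy of the current thing1 value.
    thing2 = %{id: "thing2", relationships: [thing1]}
    # This rebinds the name thing1 to a new map; thing2 is untouched.
    thing1 = %{thing1 | relationships: [thing2]}
    {thing1, thing2}
  end
end

{thing1, thing2} = CycleDemo.build()
# thing1 points at thing2, but thing2 still holds the old thing1,
# whose relationships list is empty.
[%{relationships: []}] = thing2.relationships
[%{id: "thing2"}] = thing1.relationships
```

Storing names (or ids) instead of the structs themselves sidesteps this, which is where the lookup ideas below come in.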

At runtime it seems like you can accomplish this with a GenServer. All the Things would know to ask a GenServer to resolve a related name. In this case, thing1 would hold a reference to the name “thing2” that, at some point in the future, would resolve to the actual thing2 once it is defined.
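A rough sketch of that runtime approach, assuming an `Agent` (a thin wrapper around GenServer) is acceptable as the process; `NameResolver`, `register/2` and `resolve/1` are hypothetical names. thing1 stores only the name “thing2”, which can be registered later.

```elixir
defmodule NameResolver do
  # Agent holds a single state value: a map from names to things.
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  def register(name, thing), do: Agent.update(__MODULE__, &Map.put(&1, name, thing))

  # Returns nil until something has been registered under `name`.
  def resolve(name), do: Agent.get(__MODULE__, &Map.get(&1, name))
end

{:ok, _pid} = NameResolver.start_link()
# thing1 refers to "thing2" by name before thing2 exists.
NameResolver.register("thing1", %{id: "thing1", relationships: ["thing2"]})
NameResolver.register("thing2", %{id: "thing2", relationships: ["thing1"]})

%{relationships: [related]} = NameResolver.resolve("thing1")
NameResolver.resolve(related)
```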

What I really want is a global map available at runtime but generated dynamically at compile time. In terms of OOP, these structs are just “instances” of Thing. Better still would be a module named ThingsRegistry with a function called get that takes a string or atom and returns the named struct. Unlike a GenServer, it would not be a process; the map that get reads from would be set at compile time, perhaps with a register step that replaces the name map as entries are added.

Can a macro take an existing function, get the return data, and then overwrite it?

I understand that I might be thinking about this all wrong and if that’s the issue, please feel free to let me know.

I am not sure I understand the requirements properly, but here is a bare-bones implementation of a module holding a map built at compile time:

defmodule AllInOne.Thing do
  defstruct params: %{}
end

defmodule AllInOne.Registry do
  @things %{
    foo: %AllInOne.Thing{params: %{id: 1}},
    bar: %AllInOne.Thing{params: %{id: 2}}
  }

  @spec things :: map()
  def things, do: @things

  @spec thing(atom()) :: %AllInOne.Thing{} | nil
  def thing(name), do: Map.get(@things, name)
end

AllInOne.Registry.thing(:foo)
# ⇒ %AllInOne.Thing{params: %{id: 1}}

Here I built the map statically, but one could just as easily call other Elixir code there to build it dynamically. The only requirement is that the called code must already be compiled: it may be in the same project, but it cannot be in the same module, because a module is compiled “all at once”.
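A minimal sketch of the dynamic variant, with hypothetical module names: the helper lives in its own module, so it is already compiled when the registry’s module body evaluates `@things` at compile time.

```elixir
defmodule Dyn.Thing do
  defstruct params: %{}
end

defmodule Dyn.Source do
  # Hypothetical builder; it must be a separate, already compiled module.
  def build do
    for {name, id} <- [foo: 1, bar: 2], into: %{} do
      {name, %Dyn.Thing{params: %{id: id}}}
    end
  end
end

defmodule Dyn.Registry do
  # Evaluated once, while Dyn.Registry compiles; the result is embedded
  # in the compiled functions as a literal.
  @things Dyn.Source.build()

  def things, do: @things
  def thing(name), do: Map.get(@things, name)
end

Dyn.Registry.thing(:foo)
# => %Dyn.Thing{params: %{id: 1}}
```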

Can a macro take an existing function, get the return data, and then overwrite it?

Sure it can. My Telemetría library has an example that grabs a function and wraps the call. You might want to explore the library’s source, because it also modifies code a lot at compile time.
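One standard mechanism for this (a sketch, not necessarily what Telemetría does) is `defoverridable` plus `super`: a `__using__` macro injects a function, and the using module redefines it while still calling the original. `ThingDefaults` and `WrappedThings` are hypothetical names.

```elixir
defmodule ThingDefaults do
  defmacro __using__(_opts) do
    quote do
      # A default all/0 injected into the using module...
      def all, do: %{"parent" => %{id: "parent"}}
      # ...marked overridable so the module may redefine it.
      defoverridable all: 0
    end
  end
end

defmodule WrappedThings do
  use ThingDefaults

  # Redefines all/0; super() calls the injected version, so its return
  # value can be extended before being returned.
  def all do
    Map.put(super(), "child", %{id: "child"})
  end
end

WrappedThings.all()
# => %{"child" => %{id: "child"}, "parent" => %{id: "parent"}}
```

Note that `super()` runs on every call at runtime; to bake a result in at compile time, compute it in a module attribute as in the registry example above.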

As always, it depends what you’re trying to do…

If you are looking at maintaining relationships between Things, probably the easiest way to look at it is to separate the entities from their relationships (i.e. model it as a digraph with nodes and edges - you could take a look at https://github.com/bitwalker/libgraph for inspiration).
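A minimal sketch of separating entities from relationships using only immutable data (this is not libgraph’s API; `ThingGraph` and its functions are hypothetical): nodes are keyed by id, edges are `{from, to}` tuples, and no struct embeds another.

```elixir
defmodule ThingGraph do
  # Entities and relationships kept apart.
  defstruct nodes: %{}, edges: MapSet.new()

  def add_node(graph, id, data), do: %{graph | nodes: Map.put(graph.nodes, id, data)}

  # Each edge is stored once; related/2 walks it in either direction.
  def relate(graph, a, b), do: %{graph | edges: MapSet.put(graph.edges, {a, b})}

  def related(graph, id) do
    for {a, b} <- graph.edges, a == id or b == id, do: if(a == id, do: b, else: a)
  end
end

graph =
  %ThingGraph{}
  |> ThingGraph.add_node("parent", %{name: "Parent"})
  |> ThingGraph.add_node("child", %{name: "Child"})
  |> ThingGraph.relate("parent", "child")

ThingGraph.related(graph, "child")
# => ["parent"]
```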

Other than that, looking up other entities by some kind of key is a pretty common approach. How that is implemented depends on what you are trying to achieve (in detail). For example, https://github.com/lau/tzdata provides a timezone database for Elixir: it builds the data and relationships in ETS on startup (and periodically updates on a schedule when timezone databases update) and provides data back to consumers via lookups on the ETS tables. Incidentally, it builds links between entities as part of the database build process to allow alias names for timezones.
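A rough sketch of key lookups with alias links in ETS. The row layout and the `AliasLookup` module are assumptions for illustration, not tzdata’s actual schema.

```elixir
defmodule AliasLookup do
  # Hypothetical row layout:
  #   {key, {:entity, data}}      a real entry
  #   {key, {:link, target_key}}  an alias pointing at another key
  def new do
    table = :ets.new(:things, [:set, :public, read_concurrency: true])
    :ets.insert(table, {"parent", {:entity, %{id: "parent"}}})
    :ets.insert(table, {"mom", {:link, "parent"}})
    table
  end

  # Follows links until it reaches a real entity; assumes links never
  # form a cycle.
  def lookup(table, key) do
    case :ets.lookup(table, key) do
      [{^key, {:entity, data}}] -> {:ok, data}
      [{^key, {:link, target}}] -> lookup(table, target)
      [] -> :error
    end
  end
end

table = AliasLookup.new()
AliasLookup.lookup(table, "mom")
# => {:ok, %{id: "parent"}}
```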


What you’re describing sounds a lot like ETS except for the compile-time stuff.

The stdlib uses ETS to store graphs (see :digraph and :digraph_utils) because it’s a mutable key-value store that can be shared with other processes.
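For comparison, a small sketch with Erlang’s `:digraph` (the `DigraphDemo` wrapper is hypothetical). The graph is mutated in place, and a single directed edge can be followed from either end.

```elixir
defmodule DigraphDemo do
  def build do
    # The vertex and edge tables are ETS tables owned by the calling process.
    graph = :digraph.new()
    :digraph.add_vertex(graph, "parent", %{name: "Parent"})
    :digraph.add_vertex(graph, "child", %{name: "Child"})
    # One directed edge, labelled :parent_of.
    :digraph.add_edge(graph, "parent", "child", :parent_of)
    graph
  end
end

graph = DigraphDemo.build()
:digraph.out_neighbours(graph, "parent")  # => ["child"]
:digraph.in_neighbours(graph, "child")    # => ["parent"]
```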

For a simpler example, here’s an Advent of Code solution from the 2018 problems that uses an ETS table as a doubly-linked circular list.

Each element in the list has an identifying “value” and the values of its left and right neighbors; the list is traversed by repeatedly looking up entries by the first value and then recursively following the links.

For that problem, the structure was really convenient: each update was a fixed, small number of steps away from the previous update and inserted or removed a single element. That meant changing three entries in ETS, versus rewriting every following entry in a map or list.
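A rough sketch of that structure (not the linked solution; `EtsRing` and its functions are hypothetical). Each row is `{value, left, right}`, so an insert or removal touches only the neighbouring rows.

```elixir
defmodule EtsRing do
  # Each row is {value, left_neighbour, right_neighbour}; values must be unique.
  def new(first) do
    table = :ets.new(:ring, [:set, :public])
    # A one-element ring is its own left and right neighbour.
    :ets.insert(table, {first, first, first})
    table
  end

  # Inserting to the right of `at` rewrites three rows: the new one,
  # `at` (new right pointer) and its old right neighbour (new left pointer).
  def insert_after(table, at, value) do
    [{^at, _left, right}] = :ets.lookup(table, at)
    :ets.insert(table, {value, at, right})
    :ets.update_element(table, at, {3, value})
    :ets.update_element(table, right, {2, value})
    table
  end

  # Removing points the two neighbours at each other, then deletes the row.
  def remove(table, value) do
    [{^value, left, right}] = :ets.lookup(table, value)
    :ets.update_element(table, left, {3, right})
    :ets.update_element(table, right, {2, left})
    :ets.delete(table, value)
    table
  end

  # Walks right from `start` until the ring wraps around.
  def to_list(table, start), do: walk(table, start, start, [])

  defp walk(table, current, start, acc) do
    [{^current, _left, right}] = :ets.lookup(table, current)
    acc = [current | acc]
    if right == start, do: Enum.reverse(acc), else: walk(table, right, start, acc)
  end
end

ring = EtsRing.new(0) |> EtsRing.insert_after(0, 1) |> EtsRing.insert_after(1, 2)
EtsRing.to_list(ring, 0)
# => [0, 1, 2]
```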


I do not know why my brain said “no” to ETS. It’s local storage and doesn’t use a network either. For whatever reason I think I had tossed GenServers and ETS in the same mental bucket which, obviously, is wrong. I think I am actually going to start there because it appears to be the simplest.

With that said, I’m definitely going to look deeper at the other examples even if for my own edification. I was a user (and for many years a downright abuser) of metaprogramming in Ruby. I find the Elixir approach to be really fascinating and will likely read Chris McCord’s book on the subject but I’m actively avoiding it right now as a matter of fighting some bad habits.

Thanks everyone.

Nothing is stopping you from starting and using GenServers at compile-time.

However, there is probably a better way to do what you want to accomplish here.
Structs that relate back-and-forth are not usually things we like to do in an immutable/functional context.

In the few situations where we do want to refer back-and-forth between structs, a common technique is to keep track of things in a couple of arrays, and then you can store the indices of the elements in each-other’s structs, and use these as “poor man’s pointers”. But again, this is for highly specialized situations; it almost always is better to (re)structure your code so that back-and-forth references are not necessary.
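A minimal sketch of those “poor man’s pointers”, with a tuple standing in for the array and hypothetical names: each thing stores the integer positions of its related things instead of the things themselves.

```elixir
defmodule IndexRefs do
  # Things live in a tuple used as an array; `related` holds positions.
  def build do
    {
      %{name: "parent", related: [1]},
      %{name: "child", related: [0]}
    }
  end

  # Dereferences the stored indices against the same tuple.
  def related(things, index) do
    things
    |> elem(index)
    |> Map.fetch!(:related)
    |> Enum.map(&elem(things, &1))
  end
end

IndexRefs.related(IndexRefs.build(), 0)
# => [%{name: "child", related: [0]}]
```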

It’s not terribly wrong - they’re both things you interact with by using either a name or an opaque identifier (pids for processes, tid for ETS tables).

You’ll frequently see them used together, where a GenServer will be the “owner” of a long-lived ETS table.

It took a long time to break the “ETS is a database, databases are heavy” thought-pattern for me; it’s a very different set of tradeoffs than anything I’d used before.

@Qqwy

In this case, there is no way to restructure around what I want. When you have a parent and a child, or spouses, or whatever, you want these things to be “aware” of each other, especially if they can be accessed in isolation. If something looks at the parent, it might need to get to the children, and vice versa.

I don’t think the structs themselves need a “hard” reference to the related structs (terms?) so much as a place to look them up. So if a struct is named parent and another is named child I can know ahead of time to tell the parent "you’re going to be related to a thing named child" and tell the child “you’re going to be related to a thing named parent.” After they are both created I just need a map that says “if you’re looking for a struct named parent, here’s the actual term.”

You can obviously hand code the map in a function in your own code.

Here’s a super contrived example, but it’s what I would want the resulting code to look like:

defmodule Thing do
  @enforce_keys [:id, :relationships]
  defstruct @enforce_keys
end

defmodule Thing.Registry do
  def all() do
    %{
      "child" => %Thing{id: "child", relationships: ["parent"]},
      "parent" => %Thing{id: "parent", relationships: ["child"]}
    }
  end
  
  def get(key), do: all() |> Map.get(key)
end

Functions elsewhere would know how to stitch everything together if they use relationships.

While just statically creating this stuff works, I don’t think it’s ideal at all. For structs that are significantly more complex, it’s really ugly. I’d like to be able to define all my Thing terms in a single module with a wee bit of magic from macros and, once all is said and done, have that all() function generated automatically. Using a GenServer, ETS, or whatever temporarily would be fine too: a macro creates a Thing, the Thing registers itself in ETS or a GenServer, and then the collected data is converted into a literal that goes into all().

So something like this:

defmodule OtherThingRegistry do
  use Thing.RegistryMaker
  
  create(%Thing{id: "child", relationships: ["parent"]})
  create(id: "parent", relationships: ["child"])
end

Would effectively result in this:

defmodule OtherThingRegistry do
  def all() do
    %{
      "child" => %Thing{id: "child", relationships: ["parent"]},
      "parent" => %Thing{id: "parent", relationships: ["child"]}
    }
  end
  
  def get(key), do: all() |> Map.get(key)
end
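One way to get exactly that result, sketched under the assumption that no process or ETS is needed at all: an accumulating module attribute collects each `create/1` call, and `@before_compile` turns the final list into a literal map inside `all/0`. `Thing.RegistryMaker` and `create/1` are the hypothetical names from the example above; the implementation is a sketch, not an existing library.

```elixir
defmodule Thing do
  @enforce_keys [:id, :relationships]
  defstruct @enforce_keys
end

defmodule Thing.RegistryMaker do
  defmacro __using__(_opts) do
    quote do
      import Thing.RegistryMaker, only: [create: 1]
      # Each create/1 call appends one {id, thing} pair to @things.
      Module.register_attribute(__MODULE__, :things, accumulate: true)
      @before_compile Thing.RegistryMaker
    end
  end

  # Accepts either a %Thing{} literal or a keyword list of its fields.
  defmacro create(thing_ast) do
    quote do
      thing = Thing.RegistryMaker.to_thing(unquote(thing_ast))
      @things {thing.id, thing}
    end
  end

  def to_thing(%Thing{} = thing), do: thing
  def to_thing(fields) when is_list(fields), do: struct!(Thing, fields)

  # Runs after the module body: turns the accumulated pairs into a map
  # and embeds it as a literal.
  defmacro __before_compile__(env) do
    things = env.module |> Module.get_attribute(:things) |> Map.new()

    quote do
      def all, do: unquote(Macro.escape(things))
      def get(key), do: Map.get(all(), key)
    end
  end
end

defmodule OtherThingRegistry do
  use Thing.RegistryMaker

  create(%Thing{id: "child", relationships: ["parent"]})
  create(id: "parent", relationships: ["child"])
end

OtherThingRegistry.get("parent")
# => %Thing{id: "parent", relationships: ["child"]}
```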

Right after saying, “Okay, ETS will work” (and it would, actually), it still seems like the wrong tool here. I think this article showed me what I needed. It also linked FastGlobal, which could certainly solve the problem in some way while I’m mucking about.

@al2o3cr I just meant that they’re not the same thing.

Because these are static literals by the time compilation happens, I don’t think the overhead of anything else makes a ton of sense. I probably should have started the topic with the examples above instead of attempting to be more abstract about it. (I feel this way every time I post any programming question, lol.)

While I originally stated I was trying to avoid metaprogramming, the reality is, the way to achieve what I want without it is… a lot of hand coding which is exactly what metaprogramming is there to solve, haha.


I think we might have a case of the XY problem here: you are looking for help with your supposed solution, but do not give us the full problem, which makes it difficult to reason about the solution space. We’re slowly getting closer, as we now know that what you actually want to do has very little to do with ‘compile-time registries’.

You think you want to use bidirectional relationships between e.g. parents and children. However, what would you use these relationships for?
In object- and class-oriented languages where we deal a lot with inheritance, (bi-directional) relationships see a lot of use, especially since pass by (mutable) reference is very common, which is what gave rise to Joe Armstrong’s famous quote: “You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.”

In an immutable functional language such as Elixir, we’d usually model the problem in a very different way, where these mutable references are not needed. In these situations we often pick only one direction to go in (such as only parent → child), and that is how we fill in our structs. This is, for example, what Ecto’s associations do.
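A small sketch of “pick one direction” (hypothetical module and data): only parent → children is stored, and the reverse direction is derived from it when a query needs it.

```elixir
defmodule OneWay do
  # Only the parent -> children direction is stored.
  def edges, do: %{"parent" => ["child_a", "child_b"]}

  # The child -> parents direction is computed on demand, not stored.
  def invert(edges) do
    for {from, tos} <- edges, to <- tos, reduce: %{} do
      acc -> Map.update(acc, to, [from], &[from | &1])
    end
  end
end

OneWay.invert(OneWay.edges())
# => %{"child_a" => ["parent"], "child_b" => ["parent"]}
```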

But say we are indeed trying to model the ancestry of a family of humans, which is one of the few cases where indeed relationships in both directions might be important. For instance, we want to figure out who all the nieces and nephews of a particular family member are.
In that case, we use a dedicated graph data structure (which can be implemented under the hood in a number of ways on an immutable system; I won’t go into that here, as it isn’t important when using one). One example is libgraph, but there are other libraries as well.


I’m not going for an ancestry of humans per se, it’s just a very convenient and easy to recognize model. It could be artists and albums. It could be any number of things that are, in some ways, bi-directional. In terms of an OOP example, Thing is more like a class than an instance. So if I create an “artist” thing and an “album” thing, the things understand that conceptually they’re related in some way.

An Ecto schema does define bi-directional relationships (has_many and belongs_to) because database relationships are bi-directional. For example:

defmodule Friends.Movie do
  use Ecto.Schema

  schema "movies" do
    field :title, :string
    field :tagline, :string
    has_many :characters, Friends.Character
  end
end

defmodule Friends.Character do
  use Ecto.Schema

  schema "characters" do
    field :name, :string
    belongs_to :movie, Friends.Movie
  end
end

Unless I grossly misunderstand something, what’s happening here is Friends.Movie knows :characters is somehow associated with Friends.Character and Friends.Character knows that :movie is associated with Friends.Movie. In neither case does it seem to know or care about anything else at compile time. This is most definitely the functionality I want in what I am working on. I want to define a thing at compile time that knows how to get to some other thing defined at compile time.

The difference between what I want and what I believe Ecto is doing is:

  1. Ecto doesn’t create a registry because it’s using modules themselves as the registry (at least, that’s what I believe here). We could add something like Friends.Actor or Friends.Genre and associate away. We can tell each schema, “here’s where to find the related thing.” I can even turn a string into an existing atom to handle this dynamically if I want. I want that sort of functionality here, but I’m curious if there is a way to do it without creating a new module per “thing.” I’d rather just have the map as described above.

  2. Not only is Ecto using modules as the registry (again, so far as I can tell), but it’s creating a brand new kind of struct which requires its own module space. For what I want to do, the Thing describes what a thing is and how it relates to other things. I could certainly create a few macros that let me define a new thing per module that would achieve what I am going for here but that seems like the wrong tool to me.
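A minimal sketch of “modules as the registry” without Ecto, using the artists/albums idea; `Music.Artist`, `Music.Album`, `relationships/0` and `ModuleRegistry` are hypothetical. A stored string is turned back into an existing module atom with `String.to_existing_atom/1`, which raises rather than creating atoms for unknown names.

```elixir
defmodule Music.Artist do
  # Plain module standing in for a schema.
  def relationships, do: %{albums: "Music.Album"}
end

defmodule Music.Album do
  def relationships, do: %{artist: "Music.Artist"}
end

defmodule ModuleRegistry do
  # Module names are atoms prefixed with "Elixir.".
  def resolve(name) when is_binary(name), do: String.to_existing_atom("Elixir." <> name)

  def related(module, key), do: module.relationships() |> Map.fetch!(key) |> resolve()
end

ModuleRegistry.related(Music.Artist, :albums)
# => Music.Album
```

The trade-off raised in the list above still applies: each thing costs a module.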

Am I completely wrong about what Ecto is doing here?

To be clear, bi-directional relationships are a requirement here. I want them because part of the point of what I am working on is being able to look at a thing and get it to describe itself and its relationships. Since these relationships can be both circular and self-referential, I need a way to make references that does not require mutating the things themselves. There’s no inheritance here, though. Things are purely related.

I can achieve that via ETS, GenServers, modules, or any other way of saying, “Hey, at runtime, this reference will result in the thing I need.” I would rather, mainly because it just seems more appropriate, create some kind of registry as in my example. Hand coding it will absolutely give me what I want, but is there a good way to package this up in a prettier DSL? (The answer is actually “yes” to that.)

I see the flow being:

A module has some kind of empty map at the start. Every time a new thing is created via a macro, that map is updated (or, more specifically, replaced with a new map) with the thing and some kind of key (like the id) that other things can reference. Then something like __before_compile__ can be used to create the all() function, which just returns whatever the final version of the map was. I’m just having trouble understanding where to keep the maps as I am adding things.

I’m looking for a prettier, easier-to-use way to do something I can already do by hand coding, which seems like the whole point of metaprogramming. I don’t even see this as being anti-functional, since all() produces the same state every time and, for the purposes of what I’m building, is all the state I need at runtime.

So, to make the issue a little clearer: how do I keep running state in my macros that can ultimately be expressed in a function that’s dynamically generated by something like __before_compile__? (Or is there a better way to produce this?)
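A sketch of the flow described above, assuming the running state can simply live in a module attribute of the module being compiled: `@registry` starts as `%{}`, each `register/2` call replaces it with a new map, and `__before_compile__` reads the final value. `MapRegistry`, `register/2` and `PeopleRegistry` are hypothetical names.

```elixir
defmodule MapRegistry do
  defmacro __using__(_opts) do
    quote do
      import MapRegistry, only: [register: 2]
      # The running state: an empty map that each register/2 replaces.
      @registry %{}
      @before_compile MapRegistry
    end
  end

  # Expands inside the caller's module body, so @registry here is the
  # caller's attribute, read and rewritten at compile time.
  defmacro register(key, value) do
    quote do
      @registry Map.put(@registry, unquote(key), unquote(value))
    end
  end

  # By now @registry holds the final map; it is baked into all/0.
  defmacro __before_compile__(_env) do
    quote do
      def all, do: @registry
      def get(key), do: Map.get(@registry, key)
    end
  end
end

defmodule PeopleRegistry do
  use MapRegistry

  register("parent", %{id: "parent", relationships: ["child"]})
  register("child", %{id: "child", relationships: ["parent"]})
end

PeopleRegistry.get("child")
# => %{id: "child", relationships: ["parent"]}
```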

After sleeping on the issue I realized a couple of things. Making the registry without metaprogramming was actually not complicated. The fact that it could be hand coded, even if awkwardly, made the approach clear enough. I just created a new Registry struct that keeps the map of the things as they are added and has a couple of functions for associating things, and it’s really, really simple, which is what I was going for.

I can carry on with no metaprogramming for now.

The only thing I needed to change was that some of the functions that focus on Thing need the whole registry in some cases: basically anything that needs to traverse relationships, which is a pretty small subset.
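A minimal sketch of such a registry struct, under the assumption that things are maps with an `:id`. It is named `ThingRegistry` here (hypothetical) to avoid clashing with Elixir’s built-in `Registry` module. Relationships are stored as ids, and any function that traverses them takes the registry explicitly.

```elixir
defmodule ThingRegistry do
  # Plain data: no processes or macros, just things keyed by id.
  defstruct things: %{}

  def add(%__MODULE__{} = registry, %{id: id} = thing) do
    %{registry | things: Map.put(registry.things, id, thing)}
  end

  # Records the relationship on both sides, by id only.
  def associate(registry, a, b) do
    registry
    |> add_relationship(a, b)
    |> add_relationship(b, a)
  end

  # Traversal needs the whole registry, passed in explicitly.
  def related(registry, id) do
    registry.things
    |> Map.fetch!(id)
    |> Map.get(:relationships, [])
    |> Enum.map(&Map.fetch!(registry.things, &1))
  end

  defp add_relationship(registry, from, to) do
    things =
      Map.update!(registry.things, from, fn thing ->
        Map.update(thing, :relationships, [to], &Enum.uniq([to | &1]))
      end)

    %{registry | things: things}
  end
end

registry =
  %ThingRegistry{}
  |> ThingRegistry.add(%{id: "artist"})
  |> ThingRegistry.add(%{id: "album"})
  |> ThingRegistry.associate("artist", "album")

ThingRegistry.related(registry, "album")
# => [%{id: "artist", relationships: ["album"]}]
```

Because the registry is an ordinary value, a test can build one from scratch without starting anything.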

I think one of the things that @Qqwy hit on, even if not directly, with Joe Armstrong’s quote about the gorilla and his jungle wasn’t so much whether I want the banana but, in this case, I actually want the gorilla and his banana stash! If I want the gorilla (or the state he represents in what is becoming an increasingly absurd analogy) I need my functions to accept that state as opposed to looking for it implicitly—which is, obviously, one of the biggest gremlins in class-based OOP. (The gorilla cares about the bananas. The bananas do not care about the gorilla.)

Whether I ultimately store it in an ETS, a GenServer, or have some default hanging around in Mix.Config or FastGlobal is an implementation detail. In my tests, I can just create a Registry struct and make sure everything works.

Another added benefit of this approach is I can add other configuration/customization details to the registry that I was beginning to put into Mix.Config.

Also, with my goofy focus yesterday, I kept mentioning Ecto. As it turns out, there’s source code I can look at, lol. I start by saying “no metaprogramming” and then evolve to looking for a solution in that regard. Old habits die hard.

Thanks for your input everyone.