Representing the world with processes

lud · July 14, 2022, 2:31pm

You can register multiple keys in a registry for the same process. Each process could register itself as {:city, "Paris"}, {:country, "USA"}, {:continent, "Asia"}. Then you can query the registry if you want to dispatch a message to all processes of a continent, a country, or to all cities that start with "A".

(Sorry this was in reply to @thiagomajesk )
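A minimal sketch of the multi-key idea, assuming a registry started with `keys: :duplicate`. The names `WorldRegistry` and `City`, the message shapes, and the sample cities are made up for illustration and do not come from the thread.

```elixir
defmodule City do
  # Spawns a plain process that registers itself under several keys,
  # then reports every {:notify, text} message back to `reply_to`.
  def start(city, country, continent, reply_to) do
    spawn_link(fn ->
      for key <- [{:city, city}, {:country, country}, {:continent, continent}] do
        {:ok, _} = Registry.register(WorldRegistry, key, nil)
      end

      send(reply_to, {:ready, city})
      loop(city, reply_to)
    end)
  end

  defp loop(city, reply_to) do
    receive do
      {:notify, text} ->
        send(reply_to, {:got, city, text})
        loop(city, reply_to)
    end
  end
end

# :duplicate keys allow many processes to share one key.
{:ok, _} = Registry.start_link(keys: :duplicate, name: WorldRegistry)

City.start("Amsterdam", "Netherlands", "Europe", self())
City.start("Athens", "Greece", "Europe", self())
City.start("Tokyo", "Japan", "Asia", self())

# Wait until all three processes have registered their keys.
for _ <- 1..3 do
  receive do
    {:ready, _city} -> :ok
  end
end

# Send a message to every process registered under one continent.
Registry.dispatch(WorldRegistry, {:continent, "Europe"}, fn entries ->
  for {pid, _value} <- entries, do: send(pid, {:notify, "hello Europe"})
end)

# Collect all registered city names, then filter in plain Elixir.
a_cities =
  WorldRegistry
  |> Registry.select([{{{:city, :"$1"}, :_, :_}, [], [:"$1"]}])
  |> Enum.filter(&String.starts_with?(&1, "A"))
```

Filtering after `Registry.select/2` keeps the match spec simple. A guard inside the match spec could do the prefix check instead, but that is harder to read.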

APB9785 · July 14, 2022, 2:31pm

I think the idea is to use a single process with state like:

%{
  "North America" => %{
    data: %{...},
    countries: %{
      "Canada" => %{
        data: %{...},
        regions: %{
          "Ontario" => %{
            data: %{...},
            cities: %{...}
          }
        }
      }
    }
  }
}

derek-zhou · July 14, 2022, 3:01pm

Agreed. Registry is the best way to look up processes.

mpope · July 14, 2022, 3:30pm

An interesting exercise could be running each continent on differently sized VMs as their own nodes. In theory that should add a ‘travel time’ for messages sent between continents, which might fit the real world a bit better. For example, Greenland could be a single-core vCPU. In this scenario I think a WorldSupervisor would be best put on a separate node from all the continents, and could launch remote procs for each country on the appropriate continent.

ityonemo · July 14, 2022, 3:47pm

You should use Registry, and the metadata value should be a path, so Berlin would be ~w(Europe Germany Brandenburg Berlin), for example.
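A rough sketch of the path-as-value idea, assuming a `:unique` registry. The names `PlaceRegistry` and `Places.under/2` and the sample places are made up for illustration. The paths use the same shape as the Berlin example.

```elixir
defmodule Places do
  # Returns the keys of all entries whose path value starts with `prefix`.
  def under(registry, prefix) do
    registry
    |> Registry.select([{{:"$1", :_, :"$2"}, [], [{{:"$1", :"$2"}}]}])
    |> Enum.filter(fn {_key, path} -> List.starts_with?(path, prefix) end)
    |> Enum.map(fn {key, _path} -> key end)
    |> Enum.sort()
  end
end

{:ok, _} = Registry.start_link(keys: :unique, name: PlaceRegistry)

# For brevity the calling process registers every place itself;
# in a real system each place would register from its own process.
{:ok, _} = Registry.register(PlaceRegistry, "Potsdam", ~w(Europe Germany Brandenburg Potsdam))
{:ok, _} = Registry.register(PlaceRegistry, "Munich", ~w(Europe Germany Bavaria Munich))
{:ok, _} = Registry.register(PlaceRegistry, "Toronto", ["North America", "Canada", "Ontario", "Toronto"])

german_places = Places.under(PlaceRegistry, ~w(Europe Germany))
```

Because the path is an ordinary list, any level of the hierarchy can be queried with a list prefix, without a separate key per level.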

hubertlepicki · July 15, 2022, 7:05am

But then, you could make some tweaks to make it way more realistic. Let me give you some ideas. Let’s say you are building an “Earth monitoring system” based on satellites. You have a finite number of satellites going around the Earth.

The first problem to solve would be controlling these satellites. Program it in a way that you have a finite number of them, going around the planet, with some random drift that you have to correct so they stay on their designated paths. Program each satellite as an outside program that communicates with your BEAM instance by sending and receiving messages that report and correct telemetry.

Then, you can improve on this by having the satellites send photos that you have to process, analyze and store. You have limited resources on your servers to do so, so introduce some bottlenecks, e.g. you can only process 4 images at the same time.

The above exercise is hard but also somewhat more realistic.
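One way to model the "only 4 images at the same time" bottleneck is `Task.async_stream/3` with `max_concurrency: 4`. This is only a sketch: `ImagePipeline`, `process_image/1`, and the file names are hypothetical stand-ins, and the sleep simulates real analysis work.

```elixir
defmodule ImagePipeline do
  # Stand-in for real image analysis; the sleep simulates the work.
  def process_image(image) do
    Process.sleep(50)
    {:processed, image}
  end

  # Processes all images, but never more than 4 at the same time.
  def run(images) do
    images
    |> Task.async_stream(&process_image/1, max_concurrency: 4, timeout: 5_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

results = ImagePipeline.run(for n <- 1..10, do: "photo-#{n}.png")
```

Here the processes exist because of a runtime concern (bounded parallel work), not because each photo is a "thing" in the model.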

kac · July 15, 2022, 10:24am

World is a process https://youtu.be/NvLlpY9vd9E?t=2219

thiagomajesk · July 15, 2022, 1:38pm

It would be a bit hard because we first organize the data, typically separated by location, in a single source. But I see where you are going with this. The problem to me with this approach is that we’d have a lot of duplicated information. For instance, for all cities in a country on a continent, we’d have to repeat the same information about both countries and continents, and keeping this in sync would be hell.

This could be a very cool experiment: actually trying to represent this in a physical way using distribution. Unfortunately, I don’t think I’d have the resources to actually try this, but it's a very cool concept nonetheless.

@hubertlepicki the only thing missing from the original premise is that we don’t have a hierarchy to represent. So let’s build on top of your previous example: imagine that we have those same satellites for all planets in our solar system and each cluster of satellites is controlled by a different space agency, which reports directly to the UN.

APB9785 · July 15, 2022, 2:10pm

I don’t understand - where do you think there would be repeated information? If you’re changing all cities in a country on a continent, you would update the value of the country, mapping over each associated city with the changes to be made.
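A small sketch of that kind of update on a single nested map, which could live as the state of one Agent or GenServer. The `World` module, the `update_cities/4` helper, and the `:alert` field are made up for illustration.

```elixir
defmodule World do
  # Applies `fun` to the data of every city in one country.
  def update_cities(world, continent, country, fun) do
    update_in(world, [continent, :countries, country, :regions], fn regions ->
      Map.new(regions, fn {region_name, region} ->
        cities = Map.new(region.cities, fn {city, data} -> {city, fun.(data)} end)
        {region_name, %{region | cities: cities}}
      end)
    end)
  end
end

world = %{
  "North America" => %{
    data: %{},
    countries: %{
      "Canada" => %{
        data: %{},
        regions: %{
          "Ontario" => %{
            data: %{},
            cities: %{"Toronto" => %{alert: false}, "Ottawa" => %{alert: false}}
          }
        }
      }
    }
  }
}

updated = World.update_cities(world, "North America", "Canada", &Map.put(&1, :alert, true))
```

Country-level data is stored once on the country, so nothing has to be copied into each city and kept in sync.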

thiagomajesk · July 15, 2022, 2:16pm

I think perhaps I misunderstood your idea. You’re talking about having a single process to hold all information and not a process per city (for instance), right? If that’s the case, please disregard it.

hubertlepicki · July 15, 2022, 2:20pm

I don’t know if introducing another level of supervisors would be warranted even then. Most likely not. It’s still the same runtime requirements, and you can identify the clusters of processes by using Registry or some other way if you have to. This is still my point: you are trying to represent real-life concepts as processes instead of representing processes as processes.

APB9785 · July 15, 2022, 2:26pm

You know one process per city goes against the most fundamental principles of OTP?

The Erlangelist - To spawn, or not to spawn? was already linked for you…

mpope · July 15, 2022, 6:11pm

Yes, it could be a bit expensive. Using a cluster of local nodes would be pretty trivial to set up though. Less ‘interesting’ for sure, given the lack of network latency between the nodes.

Anyways, a single DynamicSupervisor is probably the simplest way to set this up. Have that supervisor allocate a continent; that continent can then tell the supervisor to start child processes and link them to the ‘parent’ continent. You can do that recursively for each hierarchical level. Combined with a hierarchical naming scheme, this could create a clean way to track which children need to be allocated using a single flat map.

For example, you can create a named process called NorthAmerica, then have its linked child called NorthAmerica.Canada. It would be trivial to use these names as keys in a map of which children each process should spawn:

%{
   NorthAmerica => [Canada, UnitedStates, Mexico],
   NorthAmerica.Canada => [Quebec, Saskatchewan, Alberta],
   NorthAmerica.Canada.Quebec => [...]
}

With this hierarchy it could be simple to dynamically add more countries, cities, etc. If you track all continents, then you can update this map and iterate through all named continents to update their children by traversing the named process tree. This is more of a naming hierarchy: the process links establish a loose process hierarchy without introducing a large tree of supervisors. Adding new levels would be more complex. This could be combined with Registry if named processes are too ‘primitive’, but it seems like overkill to me.

While it would require some work, this could be expanded with a node-naming scheme that represents each continent.
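A minimal sketch of the flat-map idea, under some simplifying assumptions: each place is a bare Agent, the `Place` and `WorldTree` modules are made up for illustration, and the elided `NorthAmerica.Canada.Quebec` entry is left out. Children are started directly by the one DynamicSupervisor; linking them to their ‘parent’ process is not shown here.

```elixir
defmodule Place do
  use Agent

  # Each place is a named Agent holding its own (empty) data map.
  def start_link(name), do: Agent.start_link(fn -> %{} end, name: name)
end

defmodule WorldTree do
  # Flat map from a place's full name to the short names of its children.
  # Places without an entry are treated as leaves.
  @children %{
    NorthAmerica => [Canada, UnitedStates, Mexico],
    NorthAmerica.Canada => [Quebec, Saskatchewan, Alberta]
  }

  # Starts `name` under the single DynamicSupervisor, then recurses
  # into its children, building each full name with Module.concat/2.
  def start(sup, name) do
    {:ok, _pid} = DynamicSupervisor.start_child(sup, {Place, name})

    for child <- Map.get(@children, name, []) do
      start(sup, Module.concat(name, child))
    end

    :ok
  end
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
:ok = WorldTree.start(sup, NorthAmerica)

# Any place can now be addressed by its hierarchical name.
quebec = Process.whereis(NorthAmerica.Canada.Quebec)
```

The supervision tree stays flat (one supervisor, seven workers here) while the names alone carry the hierarchy.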

thiagomajesk · July 15, 2022, 10:02pm

Just to be clear, I’m not that concerned about the (subjective) “best way” as much as toying around with the overall concept. BTW, I’m not making any claims about what should or shouldn’t be done. However, the premise of the question is well stated in the topic, so feel free to make a suggestion of what you think is “right” for this scenario.

Yes, someone already posted this link before, but I’m curious why you think that this use-case “goes against the most fundamental principles of OTP”. The post you linked also says: “Use processes to separate runtime concerns”, which seems appropriate.

I might be completely wrong here, but I have read and watched people comparing processes in Elixir/Erlang with objects (in the traditional sense) many times. So, imagine me trying to make sense of “this goes against the most fundamental OTP principles”. There’s this other old topic over here, with a video of Joe Armstrong talking about processes in terms of “objects” as well: Objects vs Processes.

PS.: I’m not saying that processes should be used as objects from OOPL (I’m also not saying that we shouldn’t either), but I think you already got that this is not exactly the case. I’m sure folks already have strong opinions on what the “best practices” are.

Very cool idea @mpope! I think this hierarchical naming scheme makes it very obvious how we can track the processes and easily understand what the “dependencies” are, even in a flat structure. It also makes the process of mapping those dependencies trivial if we rely on this map for lookups - very cool! I’d toy around with this idea right now if I had the time, but I think I’ll have to wait for the weekend.