Are nested structs an anti-pattern?

Dusty · July 5, 2020, 2:01am

In the course of working on a tree-like data structure, I have designed a struct for each node in the tree. Since there are only 5 fields in each node, and all nodes have the same 5 fields, using a struct made sense to me.

However, one of the %Node{} struct fields contains the children of the node, which is a list of %Node{} structs, each of which can have its own children, and so on.

Given the challenges of accessing nested structs, I am beginning to question my original decision to use a struct instead of a map. I see that there are a number of posts already on how to access nested structs, but I’m still left asking: What does a struct really offer me here? Basically, it offers me the ability to limit the keys for each node. But is that a worthwhile tradeoff, given that the number of nested levels could be very high? Is it really a problem to just use a map?

The nature of the code is such that reaching down many levels at once is unlikely to be needed. For the most part, the tree will be walked one level at a time, but it will frequently be necessary to match on or modify fields in the node’s children (1 level down). It seems that the risk of unintended fields on the struct is perhaps not worth the complexity of using a struct in the first place.

Thoughts?

MrDoops · July 5, 2020, 5:14am

Depends - a couple I’d things to try:

You could try wrapping/adapting an existing graph data structure like libgraph or digraph maybe using your existing %Node{} struct as a vertex. There are functions you can you use to access and travel the graph.

But you might get somewhere useful quicker with Kernel.get_in/2.

NobbZ · July 5, 2020, 7:38am

I don’t see how accessing nested structs is different from nested maps, as structs are just maps with some additional field from this point of view.

Structs, nested or not, enable you to have at least limited compiletime checking of fieldnames.

Using bare maps you won’t have that.

Dusty · July 5, 2020, 8:43am

I guess my concern was mostly about behaviours that are missing from structs, like Enumerable and Access. Access seems like the issue for deeply nested structs, as it normally allows arbitrary nesting.

Dusty · July 5, 2020, 8:49am

Thanks for your thoughts. I had a good look at libgraph a while back, and I decided that my use case is sufficiently specific that I wouldn’t use any of the provided API out of the box.

I don’t understand why Kernel.get_in/2 would be an option for structs. The docs say that get_in uses the Access module, which I understood not to be an implemented behaviour for structs. What am I missing?

NobbZ · July 5, 2020, 8:58am

Not beeing able to use Access out of the box for your Node structs is actually a feature in my opinion. This enables you to come up with an implementation of Access that would make even more sense for a tree.

dimitarvp · July 5, 2020, 9:14am

You can draw inspiration from my extremely amateurish Trie library which was basically my first ever Elixir code somewhere back in 2016, here: https://github.com/dimitarvp/trie/blob/master/lib/trie.ex#L202. This demonstrates a basic Access.get implementation. Once your struct has that you could easily use get_in with instances of the struct as parameters.

Also check out Access docs. You need only implement a few functions to be able to use get_in and put_in.

domvas · July 5, 2020, 9:46am

Your struct is a good idea. It’s in fact better than a bare map because you need the field where children are stored. Without it, some of your code can break.
Regarding your node data / properties, you can have anything you want if you specify a filed for them like this:

@type t :: %Node{
  children: [Node.t], # This is mandatory for each node
  id: integer,        # This is an example data that must be defined for each node
  properties: map     # Other data specific to the node
}

Regarding the missing Enumerable, you have two solutions:

Implement it for your struct
or the easy way: Map.from_struct/1

But on in all, it can be interesting to code your own traversal to have more control on what it’s done. Something like Node.update(depth, filter_fn, update_fn) could be nice for example.

Additionnally, if you want to go back and forth in your structure while updating it may be useful to investigate Zippers.
Hard to code but pure delight afterwards when using it.

Dusty · July 5, 2020, 11:32pm

In the beginning, I took a serious look at just using your library, Dimi, but I decided in the end that I really need to match at any point in the string. Don’t call it amateurish, I’ve been treating it like the gold standard!

Dusty · July 5, 2020, 11:55pm

Using Map.from_struct/1 as a panic button seems like a good option. I will probably only need to implement Access, if that. Thanks for your thoughts.

dimitarvp · July 6, 2020, 8:53am

Thanks for the kind words but I still don’t deserve that much praise for pretty much my first Elixir project.

I know I’ve seen one much better than mine recently here, casually linked by someone in a thread (which I now can’t find).

In any case, I did my best to utilise best coding practices and tidy it up a bit some months ago and I think that for its very basic use case it does the job well. Glad you found it useful!

bottlenecked · July 6, 2020, 9:38am

You could also draw inspiration from erlang’s :gb_trees implementation where each node is a {key, value, left_subtree, right_subtree} tuple which looks quite cheap/easy to update and probably cost less memory to keep around than a struct or map

Dusty · July 9, 2020, 6:37am

So I have been working a bit on the struct for each node in the tree. Each node needs to hold a pointer to its parent, as well as a list of children. What is the best way for the two nodes to point to each other in memory? I am coding in circles a bit. For example, if I start with this abbreviated module:

defmodule SuffixTree.Node do
  alias __MODULE__
  use Puid

  @enforce_keys [:id, :children]

  defstruct id: nil,
            parent: nil,
            label: nil,
            leaves: [],
            children: [],
            link: nil

  @type t :: %Node{
          id: String.t(),
          parent: Node.t(),
          label: {String.t(), Range.t()},
          leaves: [{String.t(), integer()}],
          children: [Node.t()],
          link: Node.t()
        }

  def new_node() do
    %Node{
      id: generate(),
      parent: nil,
      label: nil,
      leaves: [],
      children: [],
      link: nil
    }
  end

  def add_child(%{children: children} = parent, child) do
    {:ok, child} = add_parent(parent, child)
    children = [child | children] |> Enum.sort(Node)
    {:ok, %{parent | children: children}}
  end

  def add_parent(parent, child) do
    {:ok, %{child | parent: parent}}
  end

end

And then in IEx I load the module, and do something like:

node1 = new_node()
node2 = new_node()
{:ok, node1} = add_child(node1, node2)
node1

I get back something like this:

%SuffixTree.Node{
  children: [
    %SuffixTree.Node{
      children: [],
      id: "NJ_fgObmA_gBgU9GB7iy-Q",
      label: nil,
      leaves: [],
      link: nil,
      parent: %SuffixTree.Node{
        children: [],
        id: "rd2Y7laSMSDmaeT9JbLFeA",
        label: nil,
        leaves: [],
        link: nil,
        parent: nil
      }
    }
  ],
  id: "rd2Y7laSMSDmaeT9JbLFeA",
  label: nil,
  leaves: [],
  link: nil,
  parent: nil
}

I see that node1's id matches in both places (in itself, and in the parent field of its child). But the Node listed in the parent field has no children, and we know that node1 now has a child. It doesn’t take much thought for me to realize that there’s no way to display this kind of referential circle, because you would infinitely print the child inside the parent inside the child inside the parent…

So my question is: Am I pointing at 2 different memory locations? Are these actually 2 different node1's?

The obvious answer seems to be to point only to the id of the node, rather than the node itself, but that means I need a lookup table/map containing every node by id.

Am I thinking about this wrong? How would you approach it?

NobbZ · July 9, 2020, 6:59am

You can’t have circular references in immutable data structures.

You can of course simulate them using a flat datastructure and mentioning IDs.