Immutability, dereferencing, and complex data structures

What’s on my mind is when a small part of a complex data structure is modified, one must traverse from the root to that part to update the whole.

For the children of the part being modified, one can effectively dereference. Why can’t we dereference the direct parent of that part, saving us the trouble of traversing the rest of the data structure?

The implementation would be adding a reference to its parent on the part. Then, its immediate parent could be accessed (without the need to traverse from the root to the part to be modified), and that parent could mutably dereference to the modified child.

Because a reference was changed and not any independent data, the benefits of immutability would be preserved–as one cannot concurrently modify a single parent. AND, one can concurrently modify its children, and then dereference to the parent. In this sense, adding mutable dereferencing actually makes concurrency more powerful in an immutable language… we should add it to Elixir!

How could this be implemented? Have I missed any pitfalls?

And just to recap, I am highlighting the distinction between mutating data and mutating references in relation to concurrency. The latter is okay because the datum depends on the other, meaning it can’t be modified concurrently anyways!

I’m not sure if I understand you correctly, but given a list [1, 2, 3], you want to alter the second element that the list looks like [1, -2, 3] and change the pointer from the first element to the second as well?

Such that we get this:

old = 1 --->  2 ---> 3 ---> []

garbage =     2
new = 1 ---> -2 ---> 3 ---> []

This would brake my assumption, that I were able to use old as it was before. Also it would brake assumptions of the garbage collector about the structure of the heap (old items never reference newer ones).

If though I got you wron, can you please elaborate?

1 Like

In your example, what I am suggesting is:

[head | tail] = [ 1 | [2,3] ]

[head2 | tail2] = [ 2 | [3] ]

new_list = [-2] ++ tail2

parent(new_list, head) # => [1, -2, 3]

Under the hood, head's reference to tail is dereferenced to new_list

However, dereferencing a middle node within linked list isn’t a good use case.

list = [1,2,3,4,5,6,7,8,9,10]

def dereference([head | [remove | keep]], 0, value), 
  do: parent([value] ++ keep, head)

def dereference([head | tail], index, value), 
  do: dereference(tail, index - 1, value)

dereference(list, 6, -7) # => [6, -7, 8, 9, 10]

list # => [1,2,3,4,5,6,-7,8,9,10]

This isn’t the API we want.

The API we want is dereference(...) # => [1,2,3,4,5,6,-7,8,9,10] a brand new list.

Changing a middle node is a bad use case because the traversal is necessary anyways so mutable dereferencing is unnecessary and mutates the data.

The better example would be appending to a linked list, which could now be done in constant time.

list = [1,2,3,4,5,6]

new_list = 
  [-7,8,9,10]
  |> dereference(list)
  # => [1,2,3,4,5,6,-7,8,9,10]

list # => [1,2,3,4,5,6]

list
|> tl
|> tl
|> tl
|> tl
|> tl
|> tl
# => []


new_list
|> tl
|> tl
|> tl
|> tl
|> tl
|> tl
# => [-7,8,9,10]

Under the hood, parent/2 copies list… okay, that’s a linear operation.

So this functionality isn’t the domain of a list because a list already has syntactic sugar and the way it is used, mutative dereferencing would be very counterintuitive.

As a result, a singly linked list would not qualify as a complex data structure, especially given that each parent has only one child.

The use case I have in mind is for a graph. I guess you’d lose the old copy, but if we modify @spec dereference(child, parent)'s API to return the dereferenced subtree, we can store that and still access it if we need it. In the meantime, we DO mutate the parent, meaning we can only access the old version by dereferencing again.

Given that I’m thinking about this functionality for what in Elixir are abstract data types, I no longer think this functionality need be added to Elixir. Rather, dereferencing can be implemented using an NIF function.

I think the thing to be very aware of is that all data, ALL data, in Erlang/Elixir is immutable. This is not just a language feature but implemented/enforced all the way down in the BEAM. This means that when mutating a structure and rebuilding “on the way up” you never have to make copies of the unchanged parts of the structure only the actual path down to the new parts. No one else can change them so you are always safe. So yes when appending a new element to the end of a list you need to rebuild the list cells but there is no need to copy the actual elements in the list.

This means that there is actually no copy instruction/call in the machine. Well, there is one for binaries but that exists for one special purpose.

Writing NIFs for accessing parts of deep structures would be fine but I am not certain that they would be very much faster than writing code instead.

7 Likes

I’m not sure I get what you’re suggesting, but newer data can’t point to older data in Erlang, it would break the GC. So you can’t have a child node keep a reference to a parent.

https://www.erlang-solutions.com/blog/erlang-garbage-collector.html

1 Like

It’s interesting… The more I think about all the cases where people highlight the benefits of mutability, the more I realize that those cases almost always involve a traversal, in which case an immutable approach is just as performant.

And if one wants to do multiple things to that piece of data, one can just add an optional list of operator functions to pass in and call, passing in that piece to each, e.g. with tail recursion.

At this point, I think my initial concern has been resolved. Mutable implementations are usually just as time-efficient as immutable ones, and the Erlang VM allows them to be just about as space-efficient in most cases.

With the benefits immutability has for concurrent programming, I think in time immutable implementations will replace mutable equivalents.

Another misconception debunked. Thanks!