Is there a way to test reference equivalence?

blatyo · October 13, 2019, 7:38pm

Is there a way to tell if two variables reference the same data?

For example:

x = "thing"
y = x
z = "thing"

In this example x and y would have the same reference. However x and z would have different references, even though they have the same value. In ruby for example, you can check this by comparing object_id for both values. In Java, this is done with ==.

lpil · October 13, 2019, 8:03pm

In Elixir there is no meaningful difference between values and the pointers to them as data is immutable. Could you share what you want this functionality for?

blatyo · October 13, 2019, 8:09pm

It’s useful for testing optimizations that the language, runtime, or macros may provide in optimizing memory. For example, if you ran Enum.map/2 on an empty list, one implementation of that might return the same list, whereas another returns a new empty list.

blatyo · October 13, 2019, 8:36pm

This is a better example than the one I gave: https://github.com/erlang/otp/commit/6007bf0f822014e0bbd0cef675848d18ae99b69e

Having a way to check reference equality would make it easy to test a situation like this.

LostKobrakai · October 13, 2019, 8:46pm

I feel like you’re at the wrong level of abstraction. Either you shouldn’t be concerned with optimisations (they’re supposed to be opaque in terms of logical correctness) or you’re the implementor testing the implementation, but you’d then write the test at whatever level the implementation/optimisation is done in and not the language being implemented. Your example already shows this as the commit is all about C code.

If you’re concerned with memory usage of your code, that can be a concern even at a higher level of abstraction, but even then I’d not try to test for a certain implementation detail (same “pointer” – which is a concept not really present in elixir), but test against actual memory usage.

blatyo · October 13, 2019, 9:09pm

I agree that you’d want to fix it at the level it occurs at. I don’t agree that you necessarily want to start there. Suppose I notice that a particular process is using more memory than I expect. I start looking at that process and see that I’m creating a lot of maps. I then look to see where those maps are coming from. I see that updating the map with the same value creates a new map with reference equality checks. I’m able to fix it at the level of my application. I can then submit a patch to Erlang that fixes it at that level and remove my fix when that makes it to a stable release.

Knowing there is a memory problem isn’t the same as knowing why there is a memory problem. Do you have a suggestion of some other means to discover why?

LostKobrakai · October 13, 2019, 9:29pm

Memory management of data on the beam is a black box to users of beam languages. It’s handled completely by the vm. Detecting problems with it is somethings you’d do it like for any other problem, which is caused in a black box. Form a hypothesis, change the code in a way aligned with said hypothesis and by observing the results see it supported or not. If you want a more straight forward way you need to open the black box and look into it.

Take for example the optimisation that small maps are stored as tuples by the vm. The only way to observe that from elixir is that keys keep their order in certain scenarios. Those can be found if you look hard for them. You might be able to infer the connection to tuples based on keys keeping the order, which is not how maps usually work. But there’s no way in elixir to access the “tuple” form of the map or test for it being a tuple specifically. If you want to ensure the optimisation is using a tuple you’ll need to drop to lower levels.

blatyo · October 13, 2019, 9:48pm

I did find a way that works from this stackoverflow thread. Note that in some instances where you might expect a different value, Erlang may have optimized it by using the constant pool or by assigning the same reference to two different variables.

Partially reproduced here:

defmodule Test do
  def same?(a, b) when a != b, do: false

  def same?(a, b) do
    :erts_debug.size({a,b}) == :erts_debug.size({a, a})
  end
end

x = "foo"
y = "foo"

Test.same?(x, x) #=> true
Test.same?(x, y) #=> false

x = %{x: 1}
y = %{x | x: 1}
Test.same?(x, y) #=> true

Note: I made a small modification to not hardcode the tuple size.

EDIT: Only compare size of tuple for things with equivalent value

christhekeele · October 14, 2019, 1:34am

I think the key observation here is that those only make sense for object-oriented languages. The way I think about this is that generally speaking, objects are runtime constructs that combine:

identity (via object ID in the Ruby case)
data (via the object’s state in instance variables in the Ruby case)
behaviour (via methods and the object’s inheritance hierarchy in the Ruby case)

Functional programming separates data from behaviour. And many of its optimizations, and nice concurrency properties that allow moving data between systems (like from node to node in the BEAM), are predicated on identity being defined in terms of data in the FP world: identical data must have the same identity, sans intrinsic internal id. This lets us ship data between nodes, processes, and functions without worrying about stateful thread-safety, or serializing behaviour as well as data.

Correct me if I’m wrong, but this exact language implementation optimization wouldn’t be possible if the language made it possible to access underlying reference information? If we leaked reference information, the optimization might have broken an end user’s expectations that updated maps would have different references from the original pre-mutation?

This is why I’m inclined to agree with @LostKobrakai that this should, and in fact must be a blackbox property of the BEAM: it’s required by FP semantics to provide optimizations when building an immutable language atop a mutable world to stay fast. Such testing must be done at the blackbox VM level rather than language level for the VM to make any sense.

blatyo · October 14, 2019, 2:13am

What you’re describing is referential transparency and I agree that in a running system it is a useful property to have. The place where I’m disagreeing, is that it is necessary to keep referential integrity when you are debugging.

If you were relying on it in your runtime system, correct. However, :erts_debug.size/1 already leaks that information. I think the name of that module is important, in that it specifically calls out the debugging use case.

Erlang and Elixir are not purely functional for pragmatic reasons. I don’t think they need to be purely referentially transparent for pragmatic reasons as well.

LostKobrakai · October 14, 2019, 5:40am

It doesn’t actually leak the information. Your example code does heuristically determine that a and b are the same if their size is the same and they compare to be equal at runtime. In theory the beam could copy a term, store it in another low level representation, which just happens to have the same size than the old term and your code would determine it’s the same even though there was actually a copy involved. It’s very unlikely this will actually ever happen, which makes the heuristic in your code quite useful, but you need to acknowledge that it’s just heuristic and not a leak of information. You still have no access to the actual pointers, which would allow you to be certain you’re still using the same data in the same memory slot.

What you’re doing is basically what I mentioned in my first answers as “test [the] actual memory usage” and “trying to find supporting evidence for a hypothesis”. This might be enough for your use-case and to not need to drop down to a lower level, but it’s not what I would call a solution bringing total certainty.

OvermindDL1 · October 16, 2019, 7:35pm

Technically there is only one empty list value in the BEAM, it’s shared everywhere, specifically because it’s a non-heap encoding of the stack bits (it has no ‘real’ data associated with it). ^.^

But no, no such thing as structural vs reference equality on the beam, it’s all just ‘equality’ (which is structural equality).

That’s quite costly and will give you many many tons of false positives for what you are trying for.

blatyo · October 16, 2019, 11:18pm

Do you have some examples?

benwilson512 · October 17, 2019, 3:05am

iex(4)> Test.same?(Enum, List)  
true
iex(5)> Test.same?(1, 2)      
true

Can I ask what the point of all of this is? If you’re trying to optimize comparisons, you should know that erts_debug.size is not exactly fast.

blatyo · October 17, 2019, 5:25pm

Good point, I did mention the constant pool as a caveat. Didn’t know :erts_debug.size(:c) would equal 0. Test.same?/2 is really only intended for values that are equivalent right now, so I can update the answer to handle that.

I wanted to know if erlang would allocate a new value when returning a constant from a function. For example:

defmodule X do
  def x do
    [1]
  end
end

Test.same? X.x, X.x #=> true
Test.same? [1], [1] #=> false

I’m not worried about the performance, because like I said, I’m only interested in using this for debugging contexts. I doubt I’d ever run this in anything other than iex on my local machine.

josevalim · October 17, 2019, 5:38pm

Erlang/OTP 22 (and possibly earlier) provides :erts_debug.same/2, which will allow you to do the desired memory pointer test. However, note the function is undocumented and in a module named erts_debug, so you should only rely on it for debugging and testing, and never in production code.

In my almost 9 years using Erlang/Elixir, I have only used it once, which is to test that we are not needlessly allocating structs in Ecto. Here is the commit for reference.

PS: I have also added this same reply to the Stack Overflow thread above.