Is there a way to tell if two variables reference the same data?
For example:
x = "thing"
y = x
z = "thing"
In this example x and y would have the same reference. However x and z would have different references, even though they have the same value. In ruby for example, you can check this by comparing object_id for both values. In Java, this is done with ==.
In Elixir there is no meaningful difference between values and the pointers to them as data is immutable. Could you share what you want this functionality for?
Itâs useful for testing optimizations that the language, runtime, or macros may provide in optimizing memory. For example, if you ran Enum.map/2 on an empty list, one implementation of that might return the same list, whereas another returns a new empty list.
I feel like youâre at the wrong level of abstraction. Either you shouldnât be concerned with optimisations (theyâre supposed to be opaque in terms of logical correctness) or youâre the implementor testing the implementation, but youâd then write the test at whatever level the implementation/optimisation is done in and not the language being implemented. Your example already shows this as the commit is all about C code.
If youâre concerned with memory usage of your code, that can be a concern even at a higher level of abstraction, but even then Iâd not try to test for a certain implementation detail (same âpointerâ â which is a concept not really present in elixir), but test against actual memory usage.
I agree that youâd want to fix it at the level it occurs at. I donât agree that you necessarily want to start there. Suppose I notice that a particular process is using more memory than I expect. I start looking at that process and see that Iâm creating a lot of maps. I then look to see where those maps are coming from. I see that updating the map with the same value creates a new map with reference equality checks. Iâm able to fix it at the level of my application. I can then submit a patch to Erlang that fixes it at that level and remove my fix when that makes it to a stable release.
Knowing there is a memory problem isnât the same as knowing why there is a memory problem. Do you have a suggestion of some other means to discover why?
Memory management of data on the beam is a black box to users of beam languages. Itâs handled completely by the vm. Detecting problems with it is somethings youâd do it like for any other problem, which is caused in a black box. Form a hypothesis, change the code in a way aligned with said hypothesis and by observing the results see it supported or not. If you want a more straight forward way you need to open the black box and look into it.
Take for example the optimisation that small maps are stored as tuples by the vm. The only way to observe that from elixir is that keys keep their order in certain scenarios. Those can be found if you look hard for them. You might be able to infer the connection to tuples based on keys keeping the order, which is not how maps usually work. But thereâs no way in elixir to access the âtupleâ form of the map or test for it being a tuple specifically. If you want to ensure the optimisation is using a tuple youâll need to drop to lower levels.
I did find a way that works from this stackoverflow thread. Note that in some instances where you might expect a different value, Erlang may have optimized it by using the constant pool or by assigning the same reference to two different variables.
Partially reproduced here:
defmodule Test do
def same?(a, b) when a != b, do: false
def same?(a, b) do
:erts_debug.size({a,b}) == :erts_debug.size({a, a})
end
end
x = "foo"
y = "foo"
Test.same?(x, x) #=> true
Test.same?(x, y) #=> false
x = %{x: 1}
y = %{x | x: 1}
Test.same?(x, y) #=> true
Note: I made a small modification to not hardcode the tuple size.
EDIT: Only compare size of tuple for things with equivalent value
I think the key observation here is that those only make sense for object-oriented languages. The way I think about this is that generally speaking, objects are runtime constructs that combine:
identity (via object ID in the Ruby case)
data (via the objectâs state in instance variables in the Ruby case)
behaviour (via methods and the objectâs inheritance hierarchy in the Ruby case)
Functional programming separates data from behaviour. And many of its optimizations, and nice concurrency properties that allow moving data between systems (like from node to node in the BEAM), are predicated on identity being defined in terms of data in the FP world: identical data must have the same identity, sans intrinsic internal id. This lets us ship data between nodes, processes, and functions without worrying about stateful thread-safety, or serializing behaviour as well as data.
Correct me if Iâm wrong, but this exact language implementation optimization wouldnât be possible if the language made it possible to access underlying reference information? If we leaked reference information, the optimization might have broken an end userâs expectations that updated maps would have different references from the original pre-mutation?
This is why Iâm inclined to agree with @LostKobrakai that this should, and in fact mustbe a blackbox property of the BEAM: itâs required by FP semantics to provide optimizations when building an immutable language atop a mutable world to stay fast. Such testing must be done at the blackbox VM level rather than language level for the VM to make any sense.
What youâre describing is referential transparency and I agree that in a running system it is a useful property to have. The place where Iâm disagreeing, is that it is necessary to keep referential integrity when you are debugging.
If you were relying on it in your runtime system, correct. However, :erts_debug.size/1 already leaks that information. I think the name of that module is important, in that it specifically calls out the debugging use case.
Erlang and Elixir are not purely functional for pragmatic reasons. I donât think they need to be purely referentially transparent for pragmatic reasons as well.
It doesnât actually leak the information. Your example code does heuristically determine that a and b are the same if their size is the same and they compare to be equal at runtime. In theory the beam could copy a term, store it in another low level representation, which just happens to have the same size than the old term and your code would determine itâs the same even though there was actually a copy involved. Itâs very unlikely this will actually ever happen, which makes the heuristic in your code quite useful, but you need to acknowledge that itâs just heuristic and not a leak of information. You still have no access to the actual pointers, which would allow you to be certain youâre still using the same data in the same memory slot.
What youâre doing is basically what I mentioned in my first answers as âtest [the] actual memory usageâ and âtrying to find supporting evidence for a hypothesisâ. This might be enough for your use-case and to not need to drop down to a lower level, but itâs not what I would call a solution bringing total certainty.
Technically there is only one empty list value in the BEAM, itâs shared everywhere, specifically because itâs a non-heap encoding of the stack bits (it has no ârealâ data associated with it). ^.^
But no, no such thing as structural vs reference equality on the beam, itâs all just âequalityâ (which is structural equality).
Thatâs quite costly and will give you many many tons of false positives for what you are trying for.
Good point, I did mention the constant pool as a caveat. Didnât know :erts_debug.size(:c) would equal 0. Test.same?/2 is really only intended for values that are equivalent right now, so I can update the answer to handle that.
I wanted to know if erlang would allocate a new value when returning a constant from a function. For example:
defmodule X do
def x do
[1]
end
end
Test.same? X.x, X.x #=> true
Test.same? [1], [1] #=> false
Iâm not worried about the performance, because like I said, Iâm only interested in using this for debugging contexts. I doubt Iâd ever run this in anything other than iex on my local machine.
Erlang/OTP 22 (and possibly earlier) provides :erts_debug.same/2, which will allow you to do the desired memory pointer test. However, note the function is undocumented and in a module named erts_debug, so you should only rely on it for debugging and testing, and never in production code.
In my almost 9 years using Erlang/Elixir, I have only used it once, which is to test that we are not needlessly allocating structs in Ecto. Here is the commit for reference.
PS: I have also added this same reply to the Stack Overflow thread above.