Efficient comparision of large data structures

Say I have a very large data structure who’s size is in the 10s of MB. This structure is arbitrarily complex, consisting of many tuples and proplists and maps, but in a roughly tree-like structure. What is the most efficient way to compare two of these structures? Was curious before I started running some benchmarks. Is pattern matching breadth first, or depth first, or does it involve a bit to bit comparison of the terms? Would using zlib on a a term_to_binary then comparing the compressed binaries make sense?

I’m having trouble finding documentation on the generally most efficient way of doing this.

What kind of comparison are you trying to do? A simple equality check? A diff?


Yup, a check if the two structures are the same or different. Not checking for the same references, but a deep comparison.

Are you finding that == is not working? == does a structural equality check and should be very efficient.

This would also work.

def same(same, same), do: true
def same(same, not_same), do: false

Depends on how the structures are loaded and whether you need to know which parts are different.

== is working forsure. Was just curious on if == (or pattern matching, pretty sure they compile down to similar things) was definitively the most efficient way.

Fair, but in this case we want to check for equality across all parts as all are subject to change in this scenario. I believe a better way to rephrase the question would be: is == / pattern matching the most efficient way to ‘fail fast’ if the structures are unequal.

I’m pretty sure that pattern matching is the most efficient way to check for equality.

There is no difference between == and pattern matching.

1 Like

They compile down to the same things more or less.

I meant “as opposed to comparing hashes or using other tricks”. As far as matching vs equality operators, I believe there shouldn’t be any difference, although to nitpick I vaguely remember pattern matching using the strict operator (=== or =:= in Erlang terms).


They aren’t.

1 === 1.0
#=> false
1 == 1.0
#=> true
1 = 1.0

I meant “in matter of performance”.