In my journey to learning Elixir, I gave myself the task of implementing an
algorithm that obtains the intersection of two MapSets without using the
built-in MapSet.intersection/2 function:
a = MapSet.new([3, 3, 3, 2, 2, 1])
b = MapSet.new([3, 4, 5])
# => Expected result #MapSet<[3]>
def intersection_a(a, b) do
Enum.reduce(b, MapSet.new(), fn x, acc ->
if MapSet.member?(a, x) do
MapSet.put(acc, x)
else
acc
end
end)
end
def intersection_b(a, b) do
# Returns a List[3] instead of #MapSet<[3]>
Enum.filter(b, fn x -> MapSet.member?(a, x) end)
end
The intersection_b function returns a List instead of a MapSet.
Why does this happen?
Any suggestions to improve the algorithm is welcome.
It’s a quirk of Elixir that most of the Enum functions that return collections return lists rather than the type you start with–it has to do with lack of typing and no good way to definitely determine what the return type should be. You’re looking for Enum.into to turn your list into a map.
As for the first algorithm, that is pretty much the core of what you want to do; my only suggestion would be a modification for performance in some edge cases. Imagine that a has 1,000,000 elements and b has one, what does your code do? Now imagine the opposite, a has one and b has 1,000,000
I think that @sribe is suggesting to choose which collection you iterate through (a or b in your code) depending on which one is the smallest. You can in fact minimize the number of comparisons necessary to produce the intersection by iterating over the smallest of the two collections.
If a has 1,000,000 elements and b has one, iterating through b instead of a lets you produce the intersection after only one comparison instead of 1,000,000
This is more efficient by virtue of the fact that MapSet.member? has a performance that grows less than linearly with the size of the collection. So making fewer MapSet.member? checks on bigger collections tends to be better than many checks on smaller collections
It’s not so much “lack of types”, but more that there are quite a few Enumerable types out there, which cannot be transformed if you’d like to preserve the type. Especially if you’re looking at lazy or infinite collections things might just not work. But even if you look at something as simple as %Date.Range{}. If you map a date range to only the day of month it’s no longer a date range.
Not keeping the shape of data was a conscious decision, which is explained (in a sadly little hidden place) here: Collectable — Elixir v1.16.0