Intersection of two MapSets without using MapSet.intersection

Hello.

In my journey to learning Elixir, I gave myself the task of implementing an
algorithm that obtains the intersection of two MapSets without using the
built-in MapSet.intersection/2 function:

a = MapSet.new([3, 3, 3, 2, 2, 1])
b = MapSet.new([3, 4, 5])
# => Expected result #MapSet<[3]>

def intersection_a(a, b) do
    Enum.reduce(b, MapSet.new(), fn x, acc ->
      if MapSet.member?(a, x) do
        MapSet.put(acc, x)
      else
        acc
      end
    end)
end

def intersection_b(a, b) do
    # Returns a List[3] instead of #MapSet<[3]> 
    Enum.filter(b, fn x -> MapSet.member?(a, x) end)
end

The intersection_b function returns a List instead of a MapSet.
Why does this happen? :thinking:

Any suggestions to improve the algorithm is welcome.

Thanks.

1 Like

It’s a quirk of Elixir that most of the Enum functions that return collections return lists rather than the type you start with–it has to do with lack of typing and no good way to definitely determine what the return type should be. You’re looking for Enum.into to turn your list into a map.

As for the first algorithm, that is pretty much the core of what you want to do; my only suggestion would be a modification for performance in some edge cases. Imagine that a has 1,000,000 elements and b has one, what does your code do? Now imagine the opposite, a has one and b has 1,000,000 :wink:

3 Likes

Hi @sribe. Thanks so much for your answer.

I did not know that. I will pay close attention to this caveat.

I would greatly appreciate an example of the optimization you suggest. :wink:

I think that @sribe is suggesting to choose which collection you iterate through (a or b in your code) depending on which one is the smallest. You can in fact minimize the number of comparisons necessary to produce the intersection by iterating over the smallest of the two collections.

If a has 1,000,000 elements and b has one, iterating through b instead of a lets you produce the intersection after only one comparison instead of 1,000,000 :slight_smile:

This is more efficient by virtue of the fact that MapSet.member? has a performance that grows less than linearly with the size of the collection. So making fewer MapSet.member? checks on bigger collections tends to be better than many checks on smaller collections

1 Like

Hello @lucaong.
Thank you very much for your explanation. Very clear and concise. :smiley:

1 Like

The source is a good place to start:

2 Likes

It’s not so much “lack of types”, but more that there are quite a few Enumerable types out there, which cannot be transformed if you’d like to preserve the type. Especially if you’re looking at lazy or infinite collections things might just not work. But even if you look at something as simple as %Date.Range{}. If you map a date range to only the day of month it’s no longer a date range.

Not keeping the shape of data was a conscious decision, which is explained (in a sadly little hidden place) here: https://hexdocs.pm/elixir/Collectable.html#module-why-collectable

3 Likes

Hi @al2o3cr

Definitely. Thanks so much for your answer. :smiley:

Thank you very much for your clarification and for sharing this link.