When to choose between MapSet, Map, sets, or gb_sets, any more?

Hi,

it seems there are several (too many, I think) options for making sets, but no clue which one to choose in which situations, like number of elements in the collection, speed and space complexity, to name a few factors.

How do you choose between all these options when you need to use element sets in your applications?

Thanks,

1 Like

Map is not a set. You can implement one on top of it, then you get a MapSet.

sets and gb_sets are some erlang implementations of different sets, also there is ordsets.

I’m not sure about how sets is implemented, ordsets though is just an ordered list of elements. gb_sets uses a self balancing tree to implement an ordered set.

In elixir code, I’d always prefer MapSet, as its interface just fits into elixir, as its designed for/in it.

In erlang code though I quite often just use sets, because it has the shortest module name :wink:

4 Likes

Some of the erlang modules are effectively legacy implementations from before Maps existed on which you could build MapSet. I believe there are benchmarks which show MapSet to be faster on all operations than the original sets.

6 Likes

I would say that by default one should then opt for MapSet, and to consider the other options if rewriting the application-level algorithm is not enough to get the desired increase in performance.

I see a use of Map to implement a set in those cases where a MapSet.pop would be handy. I believe that mimicking such non-existent function via MapSet.member? followed by MapSet.delete can be slower when data is large, otherwise Map would not need to provide such function.

defmodule Demo do
  def pop(%MapSet{map: map} = map_set, value, default \\ nil) do
    case :maps.take(value, map) do
      {_, new_map} -> {value, %{map_set | map: new_map}}
      :error -> {default, map_set}
    end
  end
end

set = MapSet.new([1, 2, 3])
IO.puts(inspect(set))
IO.puts(inspect(Demo.pop(set, 4)))
IO.puts(inspect(Demo.pop(set, 2)))

$ elixir demo.exs
#MapSet<[1, 2, 3]>
{nil, #MapSet<[1, 2, 3]>}
{2, #MapSet<[1, 3]>}
$ 

MapSet is an opaque datatype though. So this would make dialyzer scream at you.

1 Like

I would have expected as much.

I see this as an optimization of edge functionality - I want this value removed from the set and know whether it existed in the set - in the service of still using a type that communicates the nature of a set rather than that of a key value store for its overall intended use.

The tradeoff I see here is that I can still use MapSet where appropriate (rather than Map) but am potentially forced to take extra steps to placate dialyzer for the optimized edge functionality (i.e. pop on a set).

1 Like