When to choose between MapSet, Map, sets, or gb_sets, any more?

mguimas · July 18, 2019, 4:55pm

Hi,

it seems there are several (too many, I think) options for making sets, but no clue which one to choose in which situations, like number of elements in the collection, speed and space complexity, to name a few factors.

How do you choose between all these options when you need to use element sets in your applications?

Thanks,

NobbZ · July 18, 2019, 5:40pm

Map is not a set. You can implement one on top of it, then you get a MapSet.

sets and gb_sets are some erlang implementations of different sets, also there is ordsets.

I’m not sure about how sets is implemented, ordsets though is just an ordered list of elements. gb_sets uses a self balancing tree to implement an ordered set.

In elixir code, I’d always prefer MapSet, as its interface just fits into elixir, as its designed for/in it.

In erlang code though I quite often just use sets, because it has the shortest module name

benwilson512 · July 18, 2019, 6:03pm

Some of the erlang modules are effectively legacy implementations from before Maps existed on which you could build MapSet. I believe there are benchmarks which show MapSet to be faster on all operations than the original sets.

mguimas · July 18, 2019, 10:03pm

I would say that by default one should then opt for MapSet, and to consider the other options if rewriting the application-level algorithm is not enough to get the desired increase in performance.

I see a use of Map to implement a set in those cases where a MapSet.pop would be handy. I believe that mimicking such non-existent function via MapSet.member? followed by MapSet.delete can be slower when data is large, otherwise Map would not need to provide such function.

peerreynders · July 18, 2019, 10:38pm

defmodule Demo do
  def pop(%MapSet{map: map} = map_set, value, default \\ nil) do
    case :maps.take(value, map) do
      {_, new_map} -> {value, %{map_set | map: new_map}}
      :error -> {default, map_set}
    end
  end
end

set = MapSet.new([1, 2, 3])
IO.puts(inspect(set))
IO.puts(inspect(Demo.pop(set, 4)))
IO.puts(inspect(Demo.pop(set, 2)))

$ elixir demo.exs
#MapSet<[1, 2, 3]>
{nil, #MapSet<[1, 2, 3]>}
{2, #MapSet<[1, 3]>}
$

github.com

elixir-lang/elixir/blob/v1.9.1/lib/elixir/lib/map_set.ex#L138-L140


def delete(%MapSet{map: map} = map_set, value) do
  %{map_set | map: Map.delete(map, value)}
end

github.com

elixir-lang/elixir/blob/v1.9.1/lib/elixir/lib/map.ex#L631-L636


def pop(map, key, default \\ nil) do
  case :maps.take(key, map) do
    {_, _} = tuple -> tuple
    :error -> {default, map}
  end
end

LostKobrakai · July 19, 2019, 7:30am

MapSet is an opaque datatype though. So this would make dialyzer scream at you.

peerreynders · July 19, 2019, 10:14am

I would have expected as much.

I see this as an optimization of edge functionality - I want this value removed from the set and know whether it existed in the set - in the service of still using a type that communicates the nature of a set rather than that of a key value store for its overall intended use.

The tradeoff I see here is that I can still use MapSet where appropriate (rather than Map) but am potentially forced to take extra steps to placate dialyzer for the optimized edge functionality (i.e. pop on a set).