Finding x number of duplicate items in a list

Given a list,

ls = [1,2,3,2,2,1,3]

How to identify if an item appears 3 times in ls and return it as a list? For example, the return would be [2,2,2] in this case.

1 Like

It’s almost always in the Enum module… and this time, it’s Enum.group_by.

It depends on what should happen if there’s more than one element that appears 3 times. If we’re looking for all of them, or just any one of them, then a solution with Enum.frequencies would be pretty slick.

BUT

If the requirement is for the FIRST element to be repeated three times, that doesn’t work (maps do not retain order).

In that case, you might consider explicit recursion. Instead of just posting the code, here’s a walkthrough of how you might derive similar code.

Start with an even simpler case: “does this list have NONE of this item?” There are several possibilities:

  • the first element of the list matches the item. The answer is false.
  • apart from the first element, does the rest of the list have NONE of the item?
  • an empty list always has NONE of the item. The answer is true.

This translates directly to Elixir:

  def has_no_item([first | rest], item) when first == item, do: false
  def has_no_item([_ | rest], item), do: has_no_item(rest, item)
  def has_no_item([], item), do: true

Now take a step closer to the goal: does this item appear EXACTLY ONCE? If so, return it as a list

Again, there are cases:

  • the first element of the list matches the item. If there are NONE of the item in the rest of the list, return the item wrapped in a list, otherwise return empty list
  • does the item appear EXACTLY ONCE in the rest of the list? If so, return it as a list
  • the item does not appear EXACTLY ONCE in the empty list, return empty list

That NONE function is useful here!

And the corresponding Elixir code:

  def has_one_item([first | rest], item) when first == item do
    if has_no_item(rest, item) do
      [first]
    else
      []
    end
  end
  def has_one_item([_ | rest], item), do: has_one_item(rest, item)
  def has_one_item([], _), do: []

Next we’re headed to TWO. Similar cases appear:

  • the first element of the list matches the item. If item appears EXACTLY ONCE, prepend the first element and return. Otherwise, check if the item appears TWO times in the rest of the list.
  • the item appears TWO times in the rest of the list
  • the item does not appear TWO times in the empty list

And similar Elixir:

  def has_two_items([first | rest], item) when first == item do
    case has_one_item(rest, first) do
      [] -> has_two_items(rest, item)
      items -> [first | items]
    end
  end
  def has_two_items([_ | rest], item), do: has_two_items(rest, item)
  def has_two_items([], _), do: []

Finally we’re ready for THREE, the thing we started looking for.

The cases are:

  • if the first element of the list appears TWO times in the rest of the list, prepend the first element and return. Otherwise, does the rest of the list have an item that appears THREE times?
  • the empty list does not have an element that appears THREE times

And the code:

  def has_three_items([first | rest]) do
    case has_two_items(rest, first) do
      [] -> has_three_items(rest)
      items -> [first | items]
    end
  end
  def has_three_items([]), do: []

One quick note: this implementation has a performance pitfall, can you spot it? Think about what would happen if has_three_items was given a looooooong list with exactly two copies of every element. That can be resolved by keeping more state when recursing, similar to how memoized Fibonacci massively outperforms naive Fibonacci.

3 Likes

This does exactly what you asked. It is performant, as it does not traverse the whole list. It stops as soon as limitis found.
Returns :error if no limit is reached.

defmodule X do
  def duplicate_items(enumerable, limit) when is_integer(limit) and limit > 1 do
    result = 
      Enum.reduce_while(enumerable, %{},
        fn x, acc ->
          case acc do
            %{^x => count} when count == limit - 1 ->
              {:halt, {:ok, List.duplicate(x, limit)}}
      
            %{} ->
              {:cont, Map.update(acc, x, 1, &(&1 + 1))}
          end
        end
      )
    
    case result do
      {:ok, _} -> result
      _ -> :error
    end
  end

  def duplicate_items(enumerable, 1) do
    case Enum.fetch(enumerable, 0) do
      {:ok, value} -> {:ok, [value]}
      :error -> :error
    end
  end  
end
iex> ls = [1, 2, 3, 2, 2, 1, 3]

iex> X.duplicate_items(ls, 1)
{:ok, [1]}

iex> X.duplicate_items(ls, 2)
{:ok, [2, 2]}

iex> X.duplicate_items(ls, 3)
{:ok, [2, 2, 2]}

iex> X.duplicate_items(ls, 4)
:error
3 Likes