How to `get` or `remove` the first matched element in a list without traverse it?

Changxin · July 24, 2022, 1:45pm

Hi, are there some functions like

Enum.find_first(enumerable, x -> boolen())
# or
Enum.fliter_first(enumerable, func)
Enum.remove_first(enumerable, func)

I don’t want to traverse a whole list, but some implements like below are not so efficent.

def filter_first([], _func), do: {nil, []}
def filter_first([h | t], func) do
  case func.(h) do
    true -> {h, t}
    false ->
      {first, left} = filter_first(t, func)
      {first, [h | left]}
  end
end

are there some efficient ways to make it?

kokolegorille · July 24, 2022, 1:47pm

Hello and welcome,

You could use Enum.find, as it returns the first match…

hst337 · July 24, 2022, 2:22pm

The most efficient and simple solution is to use pathex

To define first element in an enumerable which satisfies the condition (for which func returns true), you can write something like

import Pathex; import Pathex.Lenses
path_to_first = some() ~> filtering(func)

And you can use this path to delete, set, update, etc. using Pathex verbs like

Pathex.delete(enumerable, path_to_first)
Pathex.set(enumerable, path_to_first, new_value)
Pathex.over(enumerable, path_to_first, fn old_value -> old_value + 2 end)

al2o3cr · July 24, 2022, 5:40pm

Can you describe what you’re concerned about with regards to efficiency with that code?

One minor issue I see is that filter_first with a predicate that returns false for every element in the list will create a new list despite not actually removing anything. You could avoid that by distinguishing “nil was found in the list” from “nothing was found in the list”:

  def filter_first(list, func) do
    case do_filter_first(list, func) do
      {first, rest} ->
        {first, rest}

      :not_found ->
        {nil, list}
    end
  end

  def do_filter_first([], _func), do: :not_found
  def do_filter_first([h | t], func) do
    if func.(h) do
      {h, t}
    else
      case do_filter_first(t, func) do
        {first, rest} ->
          {first, [h | rest]}

        :not_found ->
          :not_found
      end
    end
  end

(if you squint just a little, you can see a Maybe monad happening here)

HOWEVER

If you have code that’s regularly removing elements from the middle of very long lists - the sort of lists where this optimization might be useful - consider picking a better data structure to support that operation.

hst337 · July 24, 2022, 5:44pm

There are two problems with original code.
First, it works only with lists.
Second, it is recursive, but it’s not tail recursion (or [item | recursive(tail)] kind of recursion). So it’ll perform some extra concatenations and will have all list items present on the stack

krstfk · July 24, 2022, 8:24pm

Pathex is a joy to work with, I put it in my deps by default. (Thank you)

eksperimental · July 24, 2022, 9:33pm

Sounds like a problem for Enum.reduce_while/3

al2o3cr · July 24, 2022, 10:13pm

Using a function that returns a list on non-list Enumerables is possible, but it’s going to 100% have to traverse the whole structure to create all the list cells.

For instance:

  def filter_first(enum, func) do
    case Enum.reduce(enum, {:not_found, []}, &reduce_filter_first(&1, &2, func)) do
      {:ok, result, list} ->
        {result, Enum.reverse(list)}

      {:not_found, list} ->
        {nil, Enum.reverse(list)}
    end
  end

  defp reduce_filter_first(el, {:ok, result, rest}, _func) do
    {:ok, result, [el | rest]}
  end

  defp reduce_filter_first(el, {:not_found, rest}, func) do
    if func.(el) do
      {:ok, el, rest}
    else
      {:not_found, [el | rest]}
    end
  end

This uses Enum.reduce to accept anything that speaks the Enumerable protocol.

However, it has a couple of efficiency concerns:

has to traverse every element of the input list, even if the value has already been found since the result from reduce contains the list in reverse order
returns a new list even if the input was a list and the function never returned true. (for the same reason)
it uses less stack, but creates more garbage on the heap (the entire input of Enum.reverse will become garbage)

The situation on the BEAM is more complicated than “body recursion BAD, tail recursion GOOD”.

See a recent post where I went much farther down that rabbithole than I expected to:

BUT TO REITERATE

If you have enough elements in the list for any of this to really matter, you should be looking for a better data structure - one that supports deletions from arbitrary positions faster than O(length).

Changxin · July 24, 2022, 11:52pm

Great buddies, Thank you all!
I just switched to Elixir from Rust & Julia recently. So I am still accustomed to the pursuit of performance. I may need to take a new mind to think in Elixir way.

eksperimental · July 25, 2022, 1:52am

Enum.reduce_while will not work, but you can implement your own version of reduce using recursion.

defmodule Bar do
  def filter_first(list, fun, acc \\ [])

  def filter_first([head | tail], fun, acc) do
    if fun.(head) do
      {head, Enum.reverse(acc) ++ tail}
    else
      filter_first(tail, fun, [head | acc])
    end
  end

  def filter_first([], _fun, acc) do
    {nil, Enum.reverse(acc)}
  end
end

Changxin · July 25, 2022, 8:44am

I gonna to learn and add it to my deps!

hst337 · July 25, 2022, 9:23am

you should be looking for a better data structure - one that supports deletions from arbitrary positions faster than O(length).

This is a misunderstanding of original task. @Changxin was just looking for solution to delete the first element he has found satisfying the predicate during traversal of the structure.

So, first of all, the traversal of the structure is already a O(length) task. O(length) to delete, doesn’t change complexity

Secondly, list is one of the fastest structures which can delete value during traversal (second one is tuple).

So the most efficient solution is:

defp do_delete_first([], _func), do: []
defp do_delete_first([head | tail], func) do
  if func.(head) do
    Process.put(:"__delete_first_found__", head)
    tail
  else
    [head | do_delete_first(tail, func)]
  end
end

def delete_first(list, func) do
  list = do_delete_first(list, func)
  {Process.delete(:"__delete_first_found__"), list}
end

Because it has nothing in the stack, it traverses until reaches the value and it uses this trick with tail recursion where return value is moved into the head of the list once the function returns. But this function looks ugly, because it uses pdict.

That’s why implementation of this function is a job for a library

Considering other arguments,

Using a function that returns a list on non-list Enumerables is possible, but it’s going to 100% have to traverse the whole structure to create all the list cells.

Some strucutres can be traversed and ordered (consider tuples and keywords). For these structures you also shouldn’t traverse the whole structure

The situation on the BEAM is more complicated than “body recursion BAD, tail recursion GOOD”.

Yeah, I know that, and I’ve read about it in “7 myths of erlang efficiency”, but I’ve described why this particular case of recursion is worse than just a tail recursion. The reason were: extra concatenations and variables on the stack

Math problem. How to code `Xn+1 = Xn + Xn * S` - #10 by al2o3cr

This is a great post, thanks!

Changxin · July 25, 2022, 9:30am

Excellent!
Well, Pathex.pop is what I want! Thx!

LostKobrakai · July 25, 2022, 9:31am

There’s no reason to use pdict here. Just change the return value fo do_delete_first to handle returning both the list as well as a potentially found value.

hst337 · July 25, 2022, 9:40am

If so, the trick with tail recursion won’t work

sabiwara · July 25, 2022, 11:41am

Interesting trick. I think you meant body recursion

Just for the sake of completion, here is the tail recursive version:


  def find_delete(list, fun) when is_function(fun, 1) do
    do_find_delete(list, fun, [])
  end

  defp do_find_delete([head | tail], fun, acc) do
    if fun.(head) do
      {head, :lists.reverse(acc, tail)}
    else
      do_find_delete(tail, fun, [head | acc])
    end
  end

  defp do_find_delete([], _fun, acc) do
    {nil, :lists.reverse(acc)}
  end

This rough benchmark gives me a slightly faster time for the tail-recursive version, but takes slightly more memory (results). Of course this would vary based on OS, list size, OTP version…

hst337 · July 25, 2022, 2:23pm

I’ve tested on bigger (10000 items) lists where 5000th element is searched and median differencies are just 1%, but memory difference is 2x

Name                     ips        average  deviation         median         99th %
tail recursive       10.40 K       96.15 μs    ±16.50%      100.23 μs      127.70 μs
process dict          9.95 K      100.47 μs    ±10.58%      101.63 μs      132.07 μs

Name              Memory usage
tail recursive       156.29 KB
process dict          78.20 KB - 0.50x memory usage -78.08594 KB

So I think that your solution is more efficient in time for shorter lists, but pdict solution is always more efficient in memory

hst337 · July 25, 2022, 2:32pm

Hmm, the strange thing is that for very short lists of 10 elements pdict is faster

Name                     ips        average  deviation         median         99th %
process dict          3.33 M      299.86 ns  ±2907.65%         247 ns         647 ns
tail recursive        2.70 M      371.06 ns  ±5666.75%         254 ns         599 ns

Name              Memory usage
process dict             240 B
tail recursive           360 B - 1.50x memory usage +120 B

I have no explanation for this

Changxin · July 26, 2022, 3:01am

emmm, since Pathex.pop performs double lookup, thus sometimes it may worse than Enum.group_by?

hst337 · July 27, 2022, 11:28am

Yeah, it performs double lookups. I’ve just added support of the pop operation, but I haven’t performed any benchmarks for this. I have a plan for 2.3 version to improve pop, I just need to decide how.

This is a big question based mostly on heuristics. For example, I can use pdict to store the value during deletion (but this will break the inter-process paths). I can move the value onto stack, but this will significantly increase the time of delete operation. I can have both pop and delete operations in the path, but this will increase the amount of generated code.

So right now I am just gathering information about the best solution for pop, but I can assure you it’ll be optimized in the next version.