String.split vs Enum.split_with

There seems to be no way to split an enumerable the way String.split splits a string. Despite the similar names, String.split and Enum.split_with have very different semantics.

String.split("123045067809", "0") #=> ["123", "45", "678", "9"]

The function I’d like to have:

l = [1, 2, 3, 0, 4, 5, 0, 6, 7, 8, 0, 9]
# wished-for behavior; the existing Enum.split/2 takes a count, not a function
Enum.split(l, &(&1 == 0)) #=> [[1, 2, 3], [4, 5], [6, 7, 8], [9]]

This exists, but it is not what I want:

Enum.split_with(l, &(&1 == 0)) #=> {[0, 0, 0], [1, 2, 3, 4, 5, 6, 7, 8, 9]}

This comes close:

Enum.chunk_by(l, &(&1 == 0)) #=> [[1, 2, 3], [0], [4, 5], [0], [6, 7, 8], [0], [9]]
# iex displays the last chunk, [9], as the charlist '\t'

I think this is the first time I have missed a function in the stdlib that I expected to be there.

String.split(string, pattern, options \\ [])

Divides a string into parts based on a pattern. [split]

Enum.split_with(enumerable, fun)

Splits the enumerable in two lists according to the given function fun. [split_with]

Could you just pipe the chunk_by result through a reject, like this: |> Enum.reject(&(&1 == [0]))?
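To make that suggestion concrete, here is a minimal sketch of the full pipeline on the sample list. The variable names are only for illustration. One thing to watch, as far as I can tell: two separators in a row end up in a single `[0, 0]` chunk, which a comparison against `[0]` does not reject, so the last variant matches on the head of the chunk instead.

```elixir
l = [1, 2, 3, 0, 4, 5, 0, 6, 7, 8, 0, 9]

simple =
  l
  |> Enum.chunk_by(&(&1 == 0))
  |> Enum.reject(&(&1 == [0]))

IO.inspect(simple, charlists: :as_lists)
# [[1, 2, 3], [4, 5], [6, 7, 8], [9]]

# With two separators in a row, chunk_by yields [0, 0],
# which the comparison against [0] does not catch.
doubled = [1, 0, 0, 2]

leaky =
  doubled
  |> Enum.chunk_by(&(&1 == 0))
  |> Enum.reject(&(&1 == [0]))

IO.inspect(leaky, charlists: :as_lists)
# [[1], [0, 0], [2]]

# Rejecting every chunk that starts with the separator covers both cases.
robust =
  doubled
  |> Enum.chunk_by(&(&1 == 0))
  |> Enum.reject(&match?([0 | _], &1))

IO.inspect(robust, charlists: :as_lists)
# [[1], [2]]
```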

Actually, this is not that hard to write. Just

def split(list, splitter, acc \\ [])
def split([], _, []), do: []
def split([], _, acc), do: [:lists.reverse acc]
def split([splitter | tail], splitter, acc) do
  [:lists.reverse(acc) | split(tail, splitter, [])]
end
def split([item | tail], splitter, acc) do
  split(tail, splitter, [item | acc])
end
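
If you want the predicate-style call from the original question rather than a fixed separator value, the same recursion can take a function. This is only a sketch: the module name `ListSplit` and the function `split_by/2` are made up for illustration and are not part of the standard library.

```elixir
defmodule ListSplit do
  # Hypothetical helper: splits `list` at every element for which `fun`
  # returns a truthy value. The matching elements themselves are dropped.
  def split_by(list, fun) when is_list(list) and is_function(fun, 1) do
    do_split(list, fun, [])
  end

  # Nothing left and no pending group: done.
  defp do_split([], _fun, []), do: []
  # Nothing left but a pending group: emit it in original order.
  defp do_split([], _fun, acc), do: [:lists.reverse(acc)]

  defp do_split([item | tail], fun, acc) do
    if fun.(item) do
      # Separator found: close the current group and start a new one.
      [:lists.reverse(acc) | do_split(tail, fun, [])]
    else
      do_split(tail, fun, [item | acc])
    end
  end
end

l = [1, 2, 3, 0, 4, 5, 0, 6, 7, 8, 0, 9]
result = ListSplit.split_by(l, &(&1 == 0))

IO.inspect(result, charlists: :as_lists)
# [[1, 2, 3], [4, 5], [6, 7, 8], [9]]
```

Like the version above, this drops a trailing empty group: `ListSplit.split_by([1, 0], &(&1 == 0))` gives `[[1]]`, whereas `String.split("10", "0")` gives `["1", ""]`.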

Not much shorter, but using Enum instead of recursion:

l =  [1,2,3,0,4,5,0,6,7,8,0,9]

Enum.chunk_while(l, [], fn 
  0, acc -> {:cont, Enum.reverse(acc), []}
  element, acc -> {:cont, [element | acc]}
end, fn
  [] -> {:cont, []}
  acc -> {:cont, Enum.reverse(acc), []}
end)
# [[1, 2, 3], [4, 5], [6, 7, 8], [9]]
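
One thing the chunk_while shape buys you over plain recursion on lists: the same two callbacks can be handed to `Stream.chunk_while/4`, so the split becomes lazy and works on enumerables that are not lists. A rough sketch, where `LazySplit` is a made-up module name and the separator is passed in as an argument:

```elixir
defmodule LazySplit do
  # Hypothetical helper: lazily splits `enumerable` at every element
  # that matches `sep`. Returns a stream of lists.
  def split(enumerable, sep) do
    Stream.chunk_while(
      enumerable,
      [],
      fn
        # Separator: emit the collected group and start over.
        ^sep, acc -> {:cont, Enum.reverse(acc), []}
        # Anything else: keep collecting (in reverse order).
        element, acc -> {:cont, [element | acc]}
      end,
      fn
        # End of input with nothing pending: emit nothing.
        [] -> {:cont, []}
        # End of input with a pending group: emit it.
        acc -> {:cont, Enum.reverse(acc), []}
      end
    )
  end
end

# Works on an infinite stream because only two chunks are ever demanded.
chunks =
  [1, 2, 0]
  |> Stream.cycle()
  |> LazySplit.split(0)
  |> Enum.take(2)

IO.inspect(chunks, charlists: :as_lists)
# [[1, 2], [1, 2]]
```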

That’s what I’m doing. I was just wondering why Enum does not offer this and whether anyone else is bothered by it.

The answer is probably in the question: because it’s fairly easy to assemble the desired solution from existing functions, and because piling up list/stream combinators in the standard library risks confusing people.

iex(2)> l |> Enum.join() |> String.split("0")
["123", "45", "678", "9"]

:troll:
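
Joking aside, the round trip through a string only holds up while every element is a single digit. A tiny sketch with a made-up input shows where it falls apart: once the elements are joined, the zero inside `10` is indistinguishable from the separator.

```elixir
multi = [10, 0, 5]

joined = Enum.join(multi)
IO.inspect(joined)
# "1005"

parts = String.split(joined, "0")
IO.inspect(parts)
# ["1", "", "5"]
```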

Interesting though, especially as Enum.intersperse/2 exists.

:see_no_evil:

Here you go:

defmodule Example do
  def sample(list) when is_list(list) do
    # we start with one empty list
    List.foldr(list, [[]], fn
      # on a 0 (the separator),
      # start a new empty group at the front of the result
      0, acc -> [[] | acc]
      # otherwise prepend the element
      # to the first group in the result
      element, [head | tail] -> [[element | head] | tail]
    end)
  end
end

[1, 2, 3, 0, 4, 5, 0, 6, 7, 8, 0, 9]
|> Example.sample()
|> IO.inspect(charlists: :as_lists)
# [[1, 2, 3], [4, 5], [6, 7, 8], [9]]

See List.foldr/3 documentation.
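
As far as I can tell, the foldr approach is the one that mirrors String.split most closely on edge cases, because it keeps the empty groups produced by leading, trailing, and consecutive separators. Here is a sketch that checks this on a made-up input; `FoldrSplit` is a hypothetical module name, and the separator is turned into a parameter for the comparison.

```elixir
defmodule FoldrSplit do
  # Hypothetical helper: the foldr approach with the separator as a parameter.
  def split(list, sep) when is_list(list) do
    List.foldr(list, [[]], fn
      # Separator: open a new empty group at the front.
      ^sep, acc -> [[] | acc]
      # Otherwise prepend the element to the first group.
      element, [head | tail] -> [[element | head] | tail]
    end)
  end
end

digits = [0, 1, 2, 0, 0, 3, 0]

from_list = FoldrSplit.split(digits, 0)
from_string = digits |> Enum.join() |> String.split("0")

IO.inspect(from_list, charlists: :as_lists)
# [[], [1, 2], [], [3], []]
IO.inspect(from_string)
# ["", "12", "", "3", ""]
```

Both results have five parts, with the empty ones in the same positions.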

Obligatory benchmarks.

Name                      ips        average  deviation         median         99th %
recursion              3.98 M      251.49 ns ±11214.50%         188 ns         456 ns
foldr                  2.99 M      334.99 ns ±12387.85%         223 ns         506 ns
reduce                 2.70 M      370.10 ns  ±9843.80%         258 ns         567 ns
chunk_while            1.40 M      712.55 ns  ±4515.89%         532 ns         968 ns
chunk_by_reject        0.86 M     1164.60 ns  ±2888.22%         896 ns        1427 ns

Comparison:
recursion              3.98 M
foldr                  2.99 M - 1.33x slower +83.50 ns
reduce                 2.70 M - 1.47x slower +118.61 ns
chunk_while            1.40 M - 2.83x slower +461.06 ns
chunk_by_reject        0.86 M - 4.63x slower +913.11 ns

Operating System: macOS
CPU Information: Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 24 GB
Elixir 1.14.0
Erlang 25.0

Out of curiosity, I included this reduce version too:

Enum.reduce(list, {_group = [], _acc = []}, fn
  0, {[], acc} -> {[], acc}
  0, {group, acc} -> {[], [Enum.reverse(group) | acc]}
  el, {group, acc} -> {[el | group], acc}
end)
|> case do
  {[], acc} -> Enum.reverse(acc)
  {group, acc} -> Enum.reverse([Enum.reverse(group) | acc])
end
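
A caveat when comparing these variants: the reduce version skips a separator whenever the current group is empty, so it never emits empty groups. On the sample list that makes no difference, but on input with leading, trailing, or doubled zeros the results diverge from the other approaches. A small sketch, with `ReduceSplit` as a made-up wrapper module around the code above:

```elixir
defmodule ReduceSplit do
  # Hypothetical wrapper around the reduce version, with 0 as the separator.
  def split(list) do
    list
    |> Enum.reduce({[], []}, fn
      # Separator with nothing collected: ignore it.
      0, {[], acc} -> {[], acc}
      # Separator with a pending group: store the group.
      0, {group, acc} -> {[], [Enum.reverse(group) | acc]}
      # Regular element: add it to the pending group.
      el, {group, acc} -> {[el | group], acc}
    end)
    |> case do
      {[], acc} -> Enum.reverse(acc)
      {group, acc} -> Enum.reverse([Enum.reverse(group) | acc])
    end
  end
end

collapsed = ReduceSplit.split([0, 1, 0, 0, 2, 0])

IO.inspect(collapsed, charlists: :as_lists)
# [[1], [2]]
```

For the same input, the recursive version from earlier in the thread returns `[[], [1], [], [2]]`, so the two are only interchangeable when separators never sit next to each other or at the edges.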