Find the first recurring character in a string

papasmurf22 · April 17, 2018, 2:47pm

hi everyone,
I have the simple problem of “Find the first recurring character in a string” that I can easily think about solving in an OO language with a for loop and a counter variable but am wondering how you would solve this “the functional way”.

i.e: If “ABCDAHWX” was passed in to the function as argument you would return the first recurring character “A”.

What is the FP way to do this in elixir? Would u use recursion or maybe matching on function parameters?

Thank you for your time

idi527 · April 17, 2018, 2:53pm

Same way, probably.

defmodule Test do
  def first_recurring(str) do
    do_first_recurring(str, [])
  end

  defp do_first_recurring(<<letter, rest::bytes>>, chars) do
    if letter in chars do
      <<letter>>
    else
      do_first_recurring(rest, [letter | chars])
    end
  end
  defp do_first_recurring("", _chars), do: nil
end

Not particularly efficient, but should provide you with a general idea.

iex(2)> Test.first_recurring("ABCDAHWX")
"A"

LostKobrakai · April 17, 2018, 2:58pm

I’d use Enum.reduce_while:

result = Enum.reduce_while(String.graphemes("ABCDAHWX"), [], fn character, prev ->
  if character in prev, do: {:halt, character}, else: {:cont, [character | prev]}
end)

with char when is_binary(char) <- result do
  char
else
  [_ | _] -> :no_duplicates
end

papasmurf22 · April 17, 2018, 3:53pm

awesome thank you. This taught me a bunch of things. Just to be clear, you do “(<<letter, rest::bytes>>, chars)” in the first private function because you’re matching a binary of unknown size?

Thank you

idi527 · April 17, 2018, 3:54pm

Just to be clear, you do “(<<letter, rest::bytes>>, chars)” in the first private function because you’re matching a binary of unknown size?

Yes, I thought letter <> rest would work, but it didn’t …

peerreynders · April 17, 2018, 4:07pm

Elixir and Unicode, Part 1: Unicode and UTF-8 Explained
Part 2: Working with Unicode Strings

jamesnorton · April 17, 2018, 4:50pm

I would uses a regular expression for this.

iex(1)> reg = ~r/(.).*?\1/
~r/(.).*?\1/
iex(2)> s = "ABCDAHWX"
"ABCDAHWX"
iex(3)> Regex.run(reg, s)
["ABCDA", "A"]

The regular expression uses a capture group to match a single character and then a back reference to the captured value to match the repeated character.

papasmurf22 · April 17, 2018, 6:58pm

this seems like a super elegant solution. thank you. if one were a regex beginner are there any resources besides the Elixir Regex docs that you’d recommend?

peerreynders · April 17, 2018, 7:05pm

https://www.regular-expressions.info/

https://regexr.com/ set the RegEx Engine to PCRE (Perl Compatible Regular Expressions)

jamesnorton · April 18, 2018, 1:39am

I’m fond of https://elixre.lpil.uk/ as well.

Qqwy · April 18, 2018, 7:38am

When doing it in pure Ellixir, the solutions by @idi527 and @LostKobrakai are, when you use a (map)set to check for duplicates rather than a list, about as fast as you can do it. Of course, I expect the regular expression version to be faster, because it will run closer to the metal (its internal algorithm will however be very similar).

idi527 · April 18, 2018, 7:50am

Of course, I expect the regular expression version to be faster

iex(2)> :timer.tc fn -> Enum.each 1..1_000, fn _ -> Test.first_recurring("ABCDAHWX") end end
{1876, :ok}
iex(4)> reg = ~r/(.).*?\1/
~r/(.).*?\1/
iex(6)> :timer.tc fn -> Enum.each 1..1_000, fn _ -> Regex.run(reg, "ABCDAHWX") end end
{5585, :ok}
iex(7)> :timer.tc fn -> Enum.each 1..1_000, fn _ -> Regex.run(reg, "ABCDAHWX") end end
{3823, :ok}
iex(8)> :timer.tc fn -> Enum.each 1..1_000, fn _ -> Regex.run(reg, "ABCDAHWX") end end
{4058, :ok}

Although the regex pattern is not compiled, but the elixir version (above) is not optimized either. I usually find regexes to be much slower than erlang’s pattern matching since they do much more work.

Qqwy · April 18, 2018, 8:33am

Yes, you are absolutely right. I stand corrected !