Alphabet Iterator

lists
enumerables

#1

Hello, Elixir gang. I’m starting to get the hang of Elixir, which has been challenging since I’m not coming from Ruby. I have been struggling to find a solution for a particular problem. I need to generate a list of strings which increments over the alphabet. So["A", "B", "C"] is a very basic example.

When the end of the alphabet is reached, I want the result to look like this: [..."X", "Y", "Z", "AA", "BB"...]. The goal is to create a function like this: generate_list(first_letter, count).

I tried using the Alphabetify package, but it was pretty slow, and didn’t quite give me the results I needed. String.duplicate got me almost there, but I still wasn’t getting the right output.

Does anyone have an idea or package suggestion for this problem? I’ve hit a wall on this one.

Where I’m at :thinking:

letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" |> String.split("", trim: true)

def generate_list(first_letter, count)
when is_binary(first_letter)
when is_integer(count) do
    # letters
    # |> Stream.cycle
    # |> Stream.take(count)

    starting_index = letters |> Enum.find_index(fn x -> x == first_letter end)

    # 0..count
    # |> Enum.map(fn x ->  end)
end

#2

This should do it. The first_letter part isn’t super efficient, but the rest should be.

EDIT: Fix divisor

def generate_list(first_letter \\ "A", count) when is_binary(first_letter) and is_integer(count) do
  ?A..?Z 
  |> Enum.map(&to_string([&1])) 
  |> Stream.cycle() 
  |> Stream.with_index() 
  |> Stream.map(fn {string, index} ->
    String.duplicate(string, div(index, 26) + 1)
  end)
  |> Stream.drop_while(&(&1 != first_letter))
  |> Enum.take(count)
end

#3

:exploding_head: This is great, though it’s duplicating Z on the first pass.

iex(11)> generate_list("W", 10)
["W", "X", "Y", "ZZ", "AA", "BB", "CC", "DD", "EE", "FF"]

I changed div(index, 25) to div(index, 26) and the problem is solved. It correctly appends additional characters too:

iex(15)> generate_list("W", 10)
["W", "X", "Y", "Z", "AA", "BB", "CC", "DD", "EE", "FF"]
iex(20)> generate_list("ZZ", 60)
["ZZ", "AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG", "HHH", "III", "JJJ", "KKK", "LLL", "MMM", "NNN", "OOO", "PPP", "QQQ", "RRR", "SSS", "TTT", "UUU", "VVV", "WWW", "XXX", "YYY", "ZZZ", "AAAA", "BBBB", "CCCC", "DDDD", "EEEE", "FFFF", "GGGG", "HHHH", "IIII", "JJJJ", "KKKK", "LLLL", "MMMM", "NNNN", "OOOO", "PPPP", "QQQQ", "RRRR", "SSSS", "TTTT", "UUUU", "VVVV", "WWWW", ...]

Thank you @blatyo for this clean solution!


#4

That’s indeed a really good solution. But I got interested by the possibility of improving its performance and here are the results:

defmodule LettersAndNumbers do
  def list(from \\ 0, count) do
    from
    |> stream()
    |> Enum.take(count)
  end

  def stream(from \\ 0)

  def stream(from) when is_bitstring(from) or is_list(from) do
    from
    |> letters_to_number()
    |> stream()
  end

  def stream(from) do
    from
    |> Stream.iterate(&(&1 + 1))
    |> Stream.map(&number_to_letters/1)
  end

  def number_to_letters(number) do
    [rem(number, 26) + 65]
    |> to_string()
    |> String.duplicate(div(number, 26) + 1)
  end

  def letters_to_number(letters) when is_bitstring(letters) do
    letters
    |> String.to_charlist()
    |> letters_to_number()
  end

  def letters_to_number([char | _] = letters) do
    ((length(letters) - 1) * 26) + (char - 65)
  end
end

PS.: I also did some changes to the API, but basically, instead of cycling through the alphabet and navigating it with an index, I mathematically transform numbers into letters by adding 65 to them and duplicating them div(number, 26) + 1 times like you did.


#5

@kelvinst Nice! I like this approach and I can’t wait to benchmark these two solutions.


#6

Yeah, it would be nice to benchmark them, since I’m not sure my solution will really be faster. I mean, of course it’s faster with a high “first_letter” because it will not iterate discarding everything, but I’m not sure if the calcs I’ve done didn’t add any extra complexity per iteration.


#7

This is my first foray into benchmarking, so these results may be unreliable.

Operating System: macOS"
CPU Information: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.6.1
Erlang 20.2.4

Benchmark suite executing with the following configuration:
warmup: 3 s
time: 7 s
memory time: 0 μs
parallel: 1
inputs: 100K, 1K, 1M
Estimated total run time: 1 min


Benchmarking LettersAndNumbers.list/2 with input 100K...
Benchmarking LettersAndNumbers.list/2 with input 1K...
Benchmarking LettersAndNumbers.list/2 with input 1M...
Benchmarking generate_list/2 with input 100K...
Benchmarking generate_list/2 with input 1K...
Benchmarking generate_list/2 with input 1M...

##### With input 100K #####
Name                               ips        average  deviation         median         99th %
generate_list/2                   1.44      692.72 ms     ±4.56%      682.07 ms      747.01 ms
LettersAndNumbers.list/2          1.43      697.56 ms     ±4.66%      687.10 ms      787.46 ms

Comparison:
generate_list/2                   1.44
LettersAndNumbers.list/2          1.43 - 1.01x slower

##### With input 1K #####
Name                               ips        average  deviation         median         99th %
generate_list/2                 3.31 K      302.43 μs    ±23.01%         285 μs      695.32 μs
LettersAndNumbers.list/2        2.17 K      459.98 μs    ±14.67%         439 μs      750.16 μs

Comparison:
generate_list/2                 3.31 K
LettersAndNumbers.list/2        2.17 K - 1.52x slower

##### With input 1M #####
Name                               ips        average  deviation         median         99th %
generate_list/2                 0.0141       1.18 min     ±0.00%       1.18 min       1.18 min
LettersAndNumbers.list/2        0.0141       1.18 min     ±0.00%       1.18 min       1.18 min

Comparison:
generate_list/2                 0.0141
LettersAndNumbers.list/2        0.0141 - 1.00x slower

Benchmark Code:

defmodule TicketMarket.MutationBenchmarks do
  alias TicketMarket.LettersAndNumbers
  alias TicketMarket.MutateLayout

  inputs = %{
    "1K" => 1_000,
    "100K" => 100_000,
    "1M" => 1_000_000,
  }

  Benchee.run %{
    "LettersAndNumbers.list/2" => fn(count) -> LettersAndNumbers.list(0, count) end,
    "generate_list/2" => fn(count) -> MutateLayout.generate_list("A", count) end
  },
    formatters: [
      # Benchee.Formatters.HTML,
      Benchee.Formatters.Console
    ],
    time: 7,
    warmup: 3,
    inputs: inputs
end

It’s interesting that both approaches get exponentially slower as the count increases. I do like the simplicity of generate_list/2, so that may be the way I end up going. I don’t expect the count to be higher than 1000, so either function would be usable.

If you have any thoughts about the benchmark, let me know.
Thanks again!


Hello Elixir World (Introductions thread)
#8

So, the thing is that you didn’t bench the case were mine is supposedly faster, which would be with a high starting letter.


#9

This is true. I’ll run it again tonight with a higher starting letter.


#10

Also, be sure to get a high letter like “O” |> String.duplicate(40000) which is the number 1039988.


#11

Indeed you are correct. Starting with a crazy high letter benefits the LettersAndNumbers approach. Thi scenario would not ever happen. The highest letter would be something like "ZZZZ". Fun experiment nonetheless.

inputs = %{
    # "10" => 10,
    # "100" => 100,
    "1K" => 1_000,
    "100K" => 100_000,
    "1M" => 1_000_000,
  }

  Benchee.run %{
    "LettersAndNumbers.list/2" => fn(count) -> LettersAndNumbers.list(1039988, count) end,
    "generate_list/2" => fn(count) -> MutateLayout.generate_list("O" |> String.duplicate(40_000), count) end
  },
    formatters: [
      # Benchee.Formatters.HTML,
      Benchee.Formatters.Console
    ],
    time: 7,
    warmup: 3,
    inputs: inputs

RESULTS:

Operating System: macOS"
CPU Information: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.6.1
Erlang 20.2.4

Benchmark suite executing with the following configuration:
warmup: 3 s
time: 7 s
memory time: 0 μs
parallel: 1
inputs: 100K, 1K, 1M
Estimated total run time: 1 min


Benchmarking LettersAndNumbers.list/2 with input 100K...
Benchmarking LettersAndNumbers.list/2 with input 1K...
Benchmarking LettersAndNumbers.list/2 with input 1M...
Benchmarking generate_list/2 with input 100K...
Benchmarking generate_list/2 with input 1K...
Benchmarking generate_list/2 with input 1M...

##### With input 100K #####
Name                               ips        average  deviation         median         99th %
LettersAndNumbers.list/2        0.0712       0.23 min     ±0.00%       0.23 min       0.23 min
generate_list/2                 0.0132       1.27 min     ±0.00%       1.27 min       1.27 min

Comparison:
LettersAndNumbers.list/2        0.0712
generate_list/2                 0.0132 - 5.40x slower

##### With input 1K #####
Name                               ips        average  deviation         median         99th %
LettersAndNumbers.list/2          8.40    0.00199 min     ±2.23%    0.00198 min    0.00209 min
generate_list/2                 0.0162       1.03 min     ±0.00%       1.03 min       1.03 min

Comparison:
LettersAndNumbers.list/2          8.40
generate_list/2                 0.0162 - 518.33x slower

##### With input 1M #####
Name                               ips        average  deviation         median         99th %
LettersAndNumbers.list/2       0.00465       3.59 min     ±0.00%       3.59 min       3.59 min
generate_list/2                0.00365       4.57 min     ±0.00%       4.57 min       4.57 min

Comparison:
LettersAndNumbers.list/2       0.00465
generate_list/2                0.00365 - 1.27x slower

#12

Yeah, but the difference on the first benchmark are really not very convincing. For 1K items 132 microseconds and for 100K is 2 milliseconds, and I guess my code can also totally be optimized on each iteration… Maybe I’ll do some testing and push it to the limit sometime… Funny experiment, and thanks for the benchmarks :slight_smile:


#13

The fastest I could come up with is:

defmodule DirectIteration do
  def list(from \\ "A", count) when is_binary(from) and is_integer(count) and count >= 0 do
    char = :binary.at(from, 0)
    len = byte_size(from)
    build_list(char, len, count)
  end

  defp build_list(_char, _len, 0), do: []
  defp build_list(?Z, len, count) do
    [String.duplicate(<<?Z>>, len) | build_list(?A, len + 1, count - 1)]
  end
  defp build_list(char, len, count) do
    [String.duplicate(<<char>>, len) | build_list(char + 1, len, count - 1)]
  end
end

It’s about twice as fast as the LettersAndNumbers implementation.

Also - when you benchmark, if you get low IPS and short benchmark time, this means it run the code only once - this is not a good measurement. Try increasing the benchmark time, so that the iteration runs at least couple times.


#14

@michalmuskala thanks for the benchmarking tip and the code!