Integer.to_string_with_underscores?

At7heb · October 17, 2023, 12:22pm

I would like to have integers converted to strings with underscores. So instead of “77777777”, I would have “77_777_777”. Do I have to write my own?

(I’m writing a user program simulator for the SDS 940, a 24-bit machine with all documentation in octal.)

sabiwara · October 17, 2023, 12:53pm

Maybe this snippet taken from the Elixir formatter can help:

        digits
        |> String.to_charlist()
        |> Enum.reverse()
        |> Enum.chunk_every(3)
        |> Enum.intersperse(~c"_")
        |> List.flatten()
        |> Enum.reverse()
        |> List.to_string()
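
For the octal use case in the question, the pipeline above can be wrapped in a function that first renders the integer in base 8. This is only a minimal sketch: the module name `Underscore` and the functions `to_octal/1` and `group/1` are hypothetical names chosen for illustration (they are not part of the formatter or the standard library), and it assumes non-negative integers.

```elixir
defmodule Underscore do
  # Hypothetical helper, not part of Elixir's standard library.
  # Renders a non-negative integer in octal, with digits grouped in threes.
  def to_octal(n) when is_integer(n) and n >= 0 do
    n
    |> Integer.to_string(8)
    |> group()
  end

  # The formatter-style pipeline quoted above: reverse the digits so the
  # chunks of three are counted from the right, then reverse back.
  def group(digits) when is_binary(digits) do
    digits
    |> String.to_charlist()
    |> Enum.reverse()
    |> Enum.chunk_every(3)
    |> Enum.intersperse(~c"_")
    |> List.flatten()
    |> Enum.reverse()
    |> List.to_string()
  end
end

IO.puts(Underscore.to_octal(0o77777777))
# prints 77_777_777
```

Keeping `group/1` separate from the base conversion means the same grouping can be reused for decimal strings.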

At7heb · October 17, 2023, 7:29pm

Looks good. Thank you.

adamu · October 18, 2023, 4:55am

Here’s a version that does it by calculating the offset and building up the string in a single pass.

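  # Expects a string of ASCII digits.
  def annotate(str) do
    length = byte_size(str)
    # Size of the leading group, which may be shorter than three digits.
    offset = rem(length, 3)
    {acc, rest} = String.split_at(str, offset)
    do_annotate(acc, rest)
  end

  # Nothing left to consume: the accumulator is the result.
  defp do_annotate(acc, ""), do: acc
  # Length is a multiple of three: start with the first full group, no separator.
  defp do_annotate("", <<next::binary-3, rest::binary>>), do: do_annotate(next, rest)

  # Append a separator and the next group of three digits.
  defp do_annotate(acc, <<next::binary-3, rest::binary>>),
    do: do_annotate(<<acc::binary, "_", next::binary>>, rest)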

In my benchmarks it’s about 3x faster than the formatter implementation for 7-digit strings (8x for 1 KB strings, but that’s probably not a realistic use case). A fun little challenge, though I know people aren’t fans of the bitstring syntax.

Name                ips        average  deviation         median         99th %
adamu            5.46 M      183.02 ns  ±8688.03%         125 ns         250 ns
formatter        1.97 M      508.32 ns  ±3909.15%         334 ns         542 ns

Comparison:
adamu            5.46 M
formatter        1.97 M - 2.78x slower +325.30 ns

Memory usage statistics:

Name         Memory usage
adamu             0.40 KB
formatter         1.89 KB - 4.75x memory usage +1.49 KB

Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.15.4
Erlang 26.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 500 ms
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 15 s

sabiwara · October 18, 2023, 1:50pm

Nice! I tried a spin-off based on a binary comprehension out of curiosity, but it couldn’t beat recursion in the benchmarks:

length = byte_size(str)
offset = rem(length - 1, 3) + 1
{acc, rest} = String.split_at(str, offset)
for <<group::binary-3 <- rest>>, into: acc, do: <<"_", group::binary-3>>

Results:

##### With input 7 digits #####
Name                    ips        average  deviation         median         99th %
Comprehension        3.95 M      253.43 ns ±12903.32%         167 ns         292 ns
Recursive            3.70 M      270.12 ns ±12392.13%         166 ns         292 ns

Comparison: 
Comprehension        3.95 M
Recursive            3.70 M - 1.07x slower +16.69 ns

Memory usage statistics:

Name             Memory usage
Comprehension           680 B
Recursive               488 B - 0.72x memory usage -192 B

##### With input 30 digits #####
Name                    ips        average  deviation         median         99th %
Recursive            3.00 M      333.77 ns  ±3331.98%         292 ns         458 ns
Comprehension        1.91 M      522.30 ns  ±3697.19%         459 ns         584 ns

Comparison: 
Recursive            3.00 M
Comprehension        1.91 M - 1.56x slower +188.53 ns

Memory usage statistics:

Name             Memory usage
Recursive             0.95 KB
Comprehension         1.95 KB - 2.06x memory usage +1 KB

At7heb · October 18, 2023, 2:17pm

Nice! I see a way to fairly easily avoid “-_654_321”. (I know digits are supposed to be just digits…)
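
One possible way to avoid the stray separator, sketched here rather than taken from the thread: peel off a leading "-" before grouping and put it back afterwards. The module name `SignedUnderscore` and its `annotate/1` function are hypothetical, and the grouping reuses the offset idea from the posts above.

```elixir
defmodule SignedUnderscore do
  # Hypothetical helper: groups a digit string in threes while keeping
  # a leading "-" out of the grouping.
  def annotate("-" <> digits), do: "-" <> annotate(digits)

  def annotate(digits) when is_binary(digits) do
    # Size of the leading group: 1..3 for non-empty input, 0 for "".
    offset = rem(byte_size(digits) - 1, 3) + 1
    <<head::binary-size(offset), rest::binary>> = digits
    # Every remaining group of three is prefixed with a separator.
    for <<group::binary-3 <- rest>>, into: head, do: <<"_", group::binary>>
  end
end

IO.puts(SignedUnderscore.annotate("-654321"))
# prints -654_321
```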

In late '69 or early '70 I sped up a SNOBOL program (the “compositor” on the SDS 940 at NOAA in Boulder) that formatted text for publication. The function to add spaces between words to make the line a given width was really slow. Among other things, it would call reverse() twice on every other line so the white space looked balanced. I used a bit of arithmetic to do the job without creating temporary strings of the line of text. (The garbage collector was really slow.)

At7heb · October 18, 2023, 2:19pm

Looking at the code, I feel like an imposter (sad face).