`Float.round` performance compared to the "Erlang way"

I have written a simple comparison between Elixir's Float.round and a custom solution that utilises :io_lib:

defmodule Bench.Float do
  def round(float, 0) when is_float(float), do: :erlang.round(float) |> :erlang.float()
  def round(float, precision) when is_float(float) and precision in 1..15 do
    str = :io_lib.format('~.*f', [precision, float])

    :erlang.list_to_float(str)
  end
end

and benchmarking code:

small = 5.55
med = :rand.uniform |> Float.round(10)
large = :rand.uniform * 10000

inputs = %{
  "Trivial | small precision" => {small, 1},
  "Medium | small precision" => {med, 1},
  "Large | small precision" => {large, 1},
  "Trivial | med precision" => {small, 5},
  "Medium | med precision" => {med, 5},
  "Large | med precision" => {large, 5},
  "Trivial | large precision" => {small, 12},
  "Medium | large precision" => {med, 12},
  "Large | large precision" => {large, 12},
}

Benchee.run(%{
  "built in" => fn {f, p} -> Float.round(f, p) end,
  "erlang" => fn {f, p} -> Bench.Float.round(f, p) end,
}, inputs: inputs)

And I get an almost constant improvement over the current algorithm:

Operating System: macOS"
CPU Information: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
Number of Available Cores: 4
Available memory: 8 GB
Elixir 1.7.3
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: Large | large precision, Large | med precision, Large | small precision, Medium | large precision, Medium | med precision, Medium | small precision, Trivial | large precision, Trivial | med precision, Trivial | small precision
Estimated total run time: 2.10 min


Benchmarking built in with input Large | large precision...
Benchmarking built in with input Large | med precision...
Benchmarking built in with input Large | small precision...
Benchmarking built in with input Medium | large precision...
Benchmarking built in with input Medium | med precision...
Benchmarking built in with input Medium | small precision...
Benchmarking built in with input Trivial | large precision...
Benchmarking built in with input Trivial | med precision...
Benchmarking built in with input Trivial | small precision...
Benchmarking erlang with input Large | large precision...
Benchmarking erlang with input Large | med precision...
Benchmarking erlang with input Large | small precision...
Benchmarking erlang with input Medium | large precision...
Benchmarking erlang with input Medium | med precision...
Benchmarking erlang with input Medium | small precision...
Benchmarking erlang with input Trivial | large precision...
Benchmarking erlang with input Trivial | med precision...
Benchmarking erlang with input Trivial | small precision...

##### With input Large | large precision #####
Name               ips        average  deviation         median         99th %
erlang        203.10 K        4.92 μs   ±379.86%           4 μs          13 μs
built in      136.72 K        7.31 μs   ±621.54%           6 μs          17 μs

Comparison: 
erlang        203.10 K
built in      136.72 K - 1.49x slower

##### With input Large | med precision #####
Name               ips        average  deviation         median         99th %
erlang        238.65 K        4.19 μs   ±508.17%           4 μs           9 μs
built in      176.05 K        5.68 μs   ±315.60%           5 μs          11 μs

Comparison: 
erlang        238.65 K
built in      176.05 K - 1.36x slower

##### With input Large | small precision #####
Name               ips        average  deviation         median         99th %
erlang        259.83 K        3.85 μs   ±884.75%           3 μs          11 μs
built in      213.63 K        4.68 μs   ±557.26%           4 μs           9 μs

Comparison: 
erlang        259.83 K
built in      213.63 K - 1.22x slower

##### With input Medium | large precision #####
Name               ips        average  deviation         median         99th %
erlang        141.67 K        7.06 μs  ±1693.57%           4 μs          30 μs
built in      135.84 K        7.36 μs   ±381.06%           6 μs          22 μs

Comparison: 
erlang        141.67 K
built in      135.84 K - 1.04x slower

##### With input Medium | med precision #####
Name               ips        average  deviation         median         99th %
erlang        229.73 K        4.35 μs   ±895.45%           4 μs          14 μs
built in      162.83 K        6.14 μs   ±297.60%           6 μs          13 μs

Comparison: 
erlang        229.73 K
built in      162.83 K - 1.41x slower

##### With input Medium | small precision #####
Name               ips        average  deviation         median         99th %
erlang        247.05 K        4.05 μs   ±883.74%           3 μs          12 μs
built in      201.51 K        4.96 μs   ±372.99%           5 μs          10 μs

Comparison: 
erlang        247.05 K
built in      201.51 K - 1.23x slower

##### With input Trivial | large precision #####
Name               ips        average  deviation         median         99th %
erlang        203.42 K        4.92 μs   ±602.69%           4 μs          15 μs
built in      166.37 K        6.01 μs   ±372.15%           6 μs          12 μs

Comparison: 
erlang        203.42 K
built in      166.37 K - 1.22x slower

##### With input Trivial | med precision #####
Name               ips        average  deviation         median         99th %
erlang        244.21 K        4.09 μs   ±765.31%           3 μs          12 μs
built in      164.72 K        6.07 μs   ±427.71%           5 μs          16 μs

Comparison: 
erlang        244.21 K
built in      164.72 K - 1.48x slower

##### With input Trivial | small precision #####
Name               ips        average  deviation         median         99th %
erlang        241.01 K        4.15 μs   ±987.70%           3 μs          14 μs
built in      140.80 K        7.10 μs  ±1761.24%           4 μs          17 μs

Comparison: 
erlang        241.01 K
built in      140.80 K - 1.71x slower

I remember that when I last brought this topic up, @josevalim mentioned some drawbacks of this method, but now I cannot recall them (and IIRC it was on IRC, which is currently unlogged).

I have also written a stream_data test with some manual "edge cases" I could think of, but it has shown no difference between the two implementations. Have I missed something?

defmodule Bench.FloatTest do
  use ExUnit.Case
  use ExUnitProperties

  cases = [
    {0.1, 0},
    {0.4, 0},
    {0.5, 0},
    {0.9, 0},
    {0.49, 0},
    {0.0, 0},
    {0.01, 1},
    {0.04, 1},
    {0.05, 1},
    {0.09, 1},
    {0.049, 1},
    {0.00, 1},
    {0.0000000000000001, 15},
    {0.0000000000000004, 15},
    {0.00000000000000049, 15},
    {0.0000000000000005, 15},
    {0.0000000000000009, 15},
    {0.00000000000000001, 15},
    {0.00000000000000004, 15},
    {0.000000000000000049, 15},
    {0.00000000000000005, 15},
    {0.00000000000000009, 15}
  ]

  for {f, p} <- cases do
    test "return the same as built in for #{f} and -#{f} with precision #{p}" do
      f = unquote(f)
      p = unquote(p)
      assert Float.round(f, p) == Bench.Float.round(f, p)
      assert Float.round(-f, p) == Bench.Float.round(-f, p)
    end
  end

  property "Float.round and Bench.Float.round works in the same way" do
    check all f <- float(),
      p <- integer(0..15)
    do
      assert Float.round(f, p) == Bench.Float.round(f, p)
    end
  end
end

There is a very long discussion in the issue tracker. You can start here.

I am OK with changing the implementation, but before that we would need to carefully study which algorithm io_lib's format uses and guarantee that they will always return the same results and have the same rounding properties. Having the empirical validation with property-based testing is a great first step but, as we can see in the discussion linked above, I would like a theoretical confirmation too.

Another idea is to profile the current implementation and try to further optimize it.
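
For reference, a minimal sketch of how such profiling could be started with :fprof, which ships with OTP (the sample input and the output file name below are purely illustrative, not an existing setup):

# Trace one representative call to the current implementation.
:fprof.apply(Float, :round, [5.5675, 3])

# Process the trace written above (fprof.trace by default) and dump a
# per-function analysis that can be inspected for hot spots.
:fprof.profile()
:fprof.analyse(dest: 'float_round.analysis')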


I mean, if there is a difference between Elixir's Float.round/2 and Erlang's io_lib:format/2, then I think it means that one of these implementations is wrong. So even if it is the Erlang implementation that is wrong, it should be fixed.

And to be honest, the Erlang implementation references one of the papers you have linked as the source of its implementation.

As the linked discussion says at some point (I know, it is long :P), there are two forms of conversion and one is preferred by IEEE, but the other is also valid. So if it happens that they are using different forms, we can't say the other one is wrong. I am 100% sure, though, that Elixir is using the one preferred by IEEE. I can't say about the Erlang one, as I have not checked it.
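
To illustrate the general point, here is a minimal sketch showing that the very same binary64 value admits more than one valid decimal rendering (this is only an illustration, not necessarily the exact two forms discussed in that issue):

# The shortest decimal representation that converts back to the same float...
Float.to_string(0.1)
# ...versus a longer expansion of the same binary64 value, which exposes
# digits the shortest form hides.
:erlang.float_to_binary(0.1, decimals: 20)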


From my investigation of the Erlang source code, it seems that io_lib_format:float_data/1 calls float_to_list/1 and then just checks whether the digit after the requested precision is equal to or greater than 5. With that revelation I improved my code by using float_to_list/2 with the decimals option as the second parameter to specify the precision, and I did the same with float_to_binary/2 as well:

defmodule Bench.Float do
  defmodule IoLib do
    def round(float, 0) when is_float(float), do: :erlang.round(float) |> :erlang.float()
    def round(float, precision) when is_float(float) and precision in 1..15 do
      str = :io_lib.format('~.*f', [precision, float])

      :erlang.list_to_float(str)
    end
  end

  defmodule ToBin do
    def round(float, 0) when is_float(float), do: :erlang.round(float) |> :erlang.float()
    def round(float, precision) when is_float(float) and precision in 1..15 do
      str = :erlang.float_to_binary(float, [decimals: precision])

      :erlang.binary_to_float(str)
    end
  end

  defmodule ToList do
    def round(float, 0) when is_float(float), do: :erlang.round(float) |> :erlang.float()
    def round(float, precision) when is_float(float) and precision in 1..15 do
      str = :erlang.float_to_list(float, [decimals: precision])

      :erlang.list_to_float(str)
    end
  end
end

Benches:

small = 5.55
med = :rand.uniform |> Float.round(10)
large = :rand.uniform * 10000

inputs = %{
  "Trivial | small precision" => {small, 1},
  "Medium | small precision" => {med, 1},
  "Large | small precision" => {large, 1},
  "Trivial | med precision" => {small, 5},
  "Medium | med precision" => {med, 5},
  "Large | med precision" => {large, 5},
  "Trivial | large precision" => {small, 12},
  "Medium | large precision" => {med, 12},
  "Large | large precision" => {large, 12},
}

Benchee.run(%{
  "built in" => fn {f, p} -> for _ <- 1..1000, do: Float.round(f, p) end,
  "io_lib" => fn {f, p} -> for _ <- 1..1000, do: Bench.Float.IoLib.round(f, p) end,
  "to binary" => fn {f, p} -> for _ <- 1..1000, do: Bench.Float.ToBin.round(f, p) end,
  "to list" => fn {f, p} -> for _ <- 1..1000, do: Bench.Float.ToList.round(f, p) end,
}, inputs: inputs)

Results:

Operating System: macOS"
CPU Information: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
Number of Available Cores: 4
Available memory: 8 GB
Elixir 1.7.3
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: Large | large precision, Large | med precision, Large | small precision, Medium | large precision, Medium | med precision, Medium | small precision, Trivial | large precision, Trivial | med precision, Trivial | small precision
Estimated total run time: 4.20 min


Benchmarking built in with input Large | large precision...
Benchmarking built in with input Large | med precision...
Benchmarking built in with input Large | small precision...
Benchmarking built in with input Medium | large precision...
Benchmarking built in with input Medium | med precision...
Benchmarking built in with input Medium | small precision...
Benchmarking built in with input Trivial | large precision...
Benchmarking built in with input Trivial | med precision...
Benchmarking built in with input Trivial | small precision...
Benchmarking io_lib with input Large | large precision...
Benchmarking io_lib with input Large | med precision...
Benchmarking io_lib with input Large | small precision...
Benchmarking io_lib with input Medium | large precision...
Benchmarking io_lib with input Medium | med precision...
Benchmarking io_lib with input Medium | small precision...
Benchmarking io_lib with input Trivial | large precision...
Benchmarking io_lib with input Trivial | med precision...
Benchmarking io_lib with input Trivial | small precision...
Benchmarking to binary with input Large | large precision...
Benchmarking to binary with input Large | med precision...
Benchmarking to binary with input Large | small precision...
Benchmarking to binary with input Medium | large precision...
Benchmarking to binary with input Medium | med precision...
Benchmarking to binary with input Medium | small precision...
Benchmarking to binary with input Trivial | large precision...
Benchmarking to binary with input Trivial | med precision...
Benchmarking to binary with input Trivial | small precision...
Benchmarking to list with input Large | large precision...
Benchmarking to list with input Large | med precision...
Benchmarking to list with input Large | small precision...
Benchmarking to list with input Medium | large precision...
Benchmarking to list with input Medium | med precision...
Benchmarking to list with input Medium | small precision...
Benchmarking to list with input Trivial | large precision...
Benchmarking to list with input Trivial | med precision...
Benchmarking to list with input Trivial | small precision...

##### With input Large | large precision #####
Name                ips        average  deviation         median         99th %
to binary        2.71 K        0.37 ms    ±28.20%        0.35 ms        0.68 ms
to list          2.21 K        0.45 ms    ±19.05%        0.42 ms        0.80 ms
io_lib           0.22 K        4.49 ms    ±10.90%        4.33 ms        6.48 ms
built in        0.144 K        6.96 ms    ±19.78%        6.55 ms       12.57 ms

Comparison: 
to binary        2.71 K
to list          2.21 K - 1.23x slower
io_lib           0.22 K - 12.17x slower
built in        0.144 K - 18.88x slower

##### With input Large | med precision #####
Name                ips        average  deviation         median         99th %
to binary        3.18 K        0.31 ms    ±17.14%        0.30 ms        0.60 ms
to list          2.64 K        0.38 ms    ±20.37%        0.35 ms        0.66 ms
io_lib           0.26 K        3.82 ms    ±10.58%        3.69 ms        5.44 ms
built in        0.161 K        6.21 ms    ±23.81%        5.54 ms       11.91 ms

Comparison: 
to binary        3.18 K
to list          2.64 K - 1.20x slower
io_lib           0.26 K - 12.15x slower
built in        0.161 K - 19.76x slower

##### With input Large | small precision #####
Name                ips        average  deviation         median         99th %
to binary        3.46 K        0.29 ms    ±30.02%        0.27 ms        0.56 ms
to list          3.09 K        0.32 ms    ±17.33%        0.31 ms        0.54 ms
io_lib           0.29 K        3.46 ms    ±11.45%        3.32 ms        5.05 ms
built in         0.21 K        4.75 ms     ±9.13%        4.59 ms        6.82 ms

Comparison: 
to binary        3.46 K
to list          3.09 K - 1.12x slower
io_lib           0.29 K - 11.98x slower
built in         0.21 K - 16.42x slower

##### With input Medium | large precision #####
Name                ips        average  deviation         median         99th %
to binary        2.92 K        0.34 ms    ±10.05%        0.33 ms        0.52 ms
to list          2.18 K        0.46 ms    ±20.18%        0.42 ms        0.81 ms
io_lib           0.23 K        4.41 ms    ±10.39%        4.28 ms        6.17 ms
built in        0.160 K        6.25 ms    ±10.73%        6.02 ms        9.04 ms

Comparison: 
to binary        2.92 K
to list          2.18 K - 1.34x slower
io_lib           0.23 K - 12.88x slower
built in        0.160 K - 18.27x slower

##### With input Medium | med precision #####
Name                ips        average  deviation         median         99th %
to binary        3.36 K        0.30 ms     ±9.91%        0.29 ms        0.45 ms
to list          2.63 K        0.38 ms    ±23.31%        0.34 ms        0.73 ms
io_lib           0.27 K        3.65 ms    ±12.10%        3.51 ms        5.63 ms
built in        0.176 K        5.67 ms     ±9.51%        5.48 ms        7.84 ms

Comparison: 
to binary        3.36 K
to list          2.63 K - 1.28x slower
io_lib           0.27 K - 12.26x slower
built in        0.176 K - 19.05x slower

##### With input Medium | small precision #####
Name                ips        average  deviation         median         99th %
to binary        3.69 K        0.27 ms    ±16.21%        0.25 ms        0.47 ms
to list          3.68 K        0.27 ms    ±23.09%        0.25 ms        0.54 ms
io_lib           0.31 K        3.28 ms    ±11.35%        3.14 ms        4.75 ms
built in         0.30 K        3.29 ms    ±11.16%        3.16 ms        4.77 ms

Comparison: 
to binary        3.69 K
to list          3.68 K - 1.00x slower
io_lib           0.31 K - 12.09x slower
built in         0.30 K - 12.12x slower

##### With input Trivial | large precision #####
Name                ips        average  deviation         median         99th %
to binary        3.02 K        0.33 ms     ±9.39%        0.32 ms        0.50 ms
to list          2.17 K        0.46 ms    ±31.21%        0.41 ms        0.94 ms
io_lib           0.24 K        4.15 ms    ±10.24%        4.02 ms        6.03 ms
built in        0.168 K        5.97 ms    ±11.75%        5.70 ms        9.58 ms

Comparison: 
to binary        3.02 K
to list          2.17 K - 1.39x slower
io_lib           0.24 K - 12.54x slower
built in        0.168 K - 18.02x slower

##### With input Trivial | med precision #####
Name                ips        average  deviation         median         99th %
to binary        3.41 K        0.29 ms     ±9.15%        0.29 ms        0.44 ms
to list          2.68 K        0.37 ms    ±20.16%        0.35 ms        0.66 ms
io_lib           0.28 K        3.58 ms    ±15.74%        3.42 ms        5.58 ms
built in        0.187 K        5.34 ms     ±8.14%        5.19 ms        7.20 ms

Comparison: 
to binary        3.41 K
to list          2.68 K - 1.27x slower
io_lib           0.28 K - 12.22x slower
built in        0.187 K - 18.23x slower

##### With input Trivial | small precision #####
Name                ips        average  deviation         median         99th %
to binary        3.78 K        0.26 ms     ±9.40%        0.26 ms        0.40 ms
to list          3.73 K        0.27 ms    ±16.61%        0.25 ms        0.46 ms
io_lib           0.31 K        3.20 ms    ±27.41%        3.02 ms        5.78 ms
built in         0.24 K        4.14 ms     ±9.88%        4.00 ms        5.82 ms

Comparison: 
to binary        3.78 K
to list          3.73 K - 1.01x slower
io_lib           0.31 K - 12.07x slower
built in         0.24 K - 15.63x slower

So it seems that float_to_binary/2 is the fastest solution, while manual tests and property testing still show no behavioural difference, at least on macOS.

About the algorithm:

  • as I said, all the Erlang solutions end up calling the float_to_* functions
  • these functions are NIFs, which are ultimately defined here
  • these in turn call erts_printf_format (through erts_snprintf)

So it seems that the algorithm used depends on the libc in use, as the C standards (C99, C11, C18) do not specify how the output should behave. However, as far as I understand IEEE 754:2008 §5.12.2 correctly, there should be only one way to represent a binary floating-point value (binary64, to be exact, in Erlang's case) in an external format:

  • Conversions from a supported binary format bf to an external character sequence and back again results in a copy of the original number [emphasis mine] so long as there are at least Pmin (bf ) significant digits specified and the rounding-direction attributes in effect during the two conversions are round to nearest rounding-direction attributes.

So in the case of incorrect results, the bug should be reported to the libc maintainers rather than to the Erlang/Elixir maintainers.
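
To make that round-trip guarantee concrete, here is a minimal sketch (17 is Pmin for binary64; the input is just a random sample, not taken from the benchmarks above):

f = :rand.uniform() * 10000

# Convert to a character sequence with 17 significant digits (one digit
# before the point plus scientific: 16 after it) and back again; under
# round-to-nearest this must reproduce the original float exactly.
^f = f |> :erlang.float_to_binary(scientific: 16) |> :erlang.binary_to_float()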

Am I wrong somewhere?
