Decimal bug - please be careful with negative coefficients - fix is on pull request

Hi everybody,

I found a bug or structural problem in Decimal what could have some big impact:
wrong calculation (maybe with money!!!) or even infinite loop ;-(

My suggestion for a fix is on pull request:

Does anybody have another idea - there are some possible ways to handle validation for structs - but what would you suggest is the best way?
Any suggestions are welcome! :wink:

Here are one solution example for division:

defmodule Decimal do
  @type coefficient :: non_neg_integer | :NaN | :inf
  @type exponent :: integer
  @type sign :: 1 | -1

  @type t :: %__MODULE__{
          sign: sign,
          coef: coefficient,
          exp: exponent
        }
  defstruct sign: 1, coef: 0, exp: 0

...

  def div(%Decimal{coef: coef1}, %Decimal{})
      when coef1 < 0,
      do: error(:invalid_operation, "dividend (#{coef1}) must be > 0", %Decimal{coef: :NaN})

  def div(%Decimal{}, %Decimal{coef: coef2})
      when coef2 < 0,
      do: error(:invalid_operation, "divisor (#{coef2}) must be > 0", %Decimal{coef: :NaN})

the orig pull request text - if don’t want to follow the link:

Hi,

I found a bug or structural problem in Decimal what could have some big impact under special circumstances. Here an example from the Money lib where Decimal is used as main backbone:


iex(15)> m = Money.new(:USD, Decimal.new(%Decimal{sign: -1, coef: -12000000000}))

#Money<:USD, --12000000000>

iex(16)> Money.mult(m, 3)

{:ok, #Money<:USD, --36000000000>}

iex(17)> Money.mult!(m, 3 )|> Money.to_string

{:ok, "$--*,000,000,000.00"}

Of course it’s a special edge case … normally the best practice is:


iex(18)> m = Money.new(:USD, "-12000000000")

#Money<:USD, -12000000000>

but you never know … maybe someone using it …it’s possible! it’s valid Code ! NO compiler warning! NO runtime error!

And the worst case, if you are using any division with this kind of Decimal number like this:


iex(19)> m = Money.new(:USD, Decimal.new(%Decimal{sign: -1, coef: -12000000000}))

#Money<:USD, --12000000000>

iex(20)> Money.mult!(m, 3) |> Money.to_decimal |> Decimal.to_float

-> you end up in an infinite loop!!!

The reason for this is clear: the %Decimal{} struct - with no validation for input -> so you can input a negative coefficient.

The type spec and docu are perfect - but do not prevent any misuse:

Type spec : @type coefficient :: non_neg_integer | :NaN | :inf

Documentation: The coefficient of the power of 10. Non-negative because the sign is stored separately in sign.

So my suggestion for a solution you can find in this pull request: mostly checks for this special edge case.

And this time I included the corresponding tests. :wink:

Kind regards and I hope it helps anybody to use Decimal in a safer way!

Ralph

2 Likes

I think the intended use here and for most structs is to not construct them yourself with raw internals but to use the library as the authority on validating/constructing… in this case, Decimal.new handles this by setting coef to an absolute value https://github.com/ericmj/decimal/blob/master/lib/decimal.ex#L1086

I’ll let @ericmj or other maintainers decide if they’re also interested in this additional safeguard, and I’ve been burned by constructing a struct incorrectly myself like this before, but, worth knowing as a general rule in elixir.

5 Likes

The coefficient type is clearly typed as non negative integer. Even though I’d not necessarily expect the library to crash on malformed data, I’d at least not expect it to behave correctly.

So any PR “fixing” this should do so by crashing because of function clause errors or badarg at least.

6 Likes

yes - the fix does exact that - anyway I was interested on your opinions and ideas. Thanks.

Maybe the Norm lib:

Nitpick: that specific code

defmodule Derp do
  @spec bad_decimal() :: Decimal.t()
  def bad_decimal() do
    Decimal.new(%Decimal{sign: -1, coef: -12000000000})
  end
end

will be detected by Dialyzer:

lib/derp.ex:3:no_return
Function bad_decimal/0 has no local return.
________________________________________________________________________________
lib/derp.ex:4:call
The function call will not succeed.

Decimal.new(%Decimal{:coef => -12_000_000_000, :exp => 0, :sign => -1})

breaks the contract
(decimal()) :: t()

because Dialyzer can be 100% certain that -12_000_000_000 is not a non_neg_integer().

However, using a variable gets it to pass:

defmodule Derp do
  @spec bad_decimal(pos_integer()) :: Decimal.t()
  def bad_decimal(n) do
    Decimal.new(%Decimal{sign: -1, coef: -n})
  end
end

BUT I’m not sure how far library code should go to defend its invariants against intentional manipulation.

For instance, it’s possible to produce a MapSet with strange behavior at runtime:

iex(28)> x = MapSet.new([:a, :b])                        
#MapSet<[:a, :b]>

# NOTE: writing this will trigger a Dialyzer error because `MapSet` is opaque, but works at runtime
iex(29)> y = %MapSet{map: %{c: "wat"}}
#MapSet<[:c]>
iex(30)> z = MapSet.union(x, y)       
#MapSet<[:a, :b, :c]>
iex(31)> z2 = MapSet.new([:a, :b, :c])
#MapSet<[:a, :b, :c]>
iex(32)> MapSet.equal?(z, z2)         
false

but that code will always cause a Dialyzer error because of the @opaque setting on MapSet.

There are some gotchas with @opaque, notably in pattern matching and module attributes which may explain why it isn’t used in Decimal.

7 Likes

Yes exactly. Elixir is dynamically typed. If you abuse that to create malformed types they won’t work properly. This is true of literally every struct.

5 Likes

Thanks all for commenting.

@benwilson512 : Of course! I know. But does that mean we should not fix error edge cases? -> I mean NO. And I think you agree with me and if we can we should make it easier to use and more safe for everybody.

As NobbZ wrote : So any PR “fixing” this should do so by crashing because of function clause errors or bad arg at least.

And thats exactly what it does. So back to the focus :

I think too the best bevavior for the existing Decimal lib and all the depending libs are: let it crash if wrong inputs:

  # wrong sign
  assert_raise Decimal.Error, fn ->
    Decimal.new(%Decimal{sign: 2, coef: 1, exp: 2})
  end

  # wrong coef
  assert_raise Decimal.Error, fn ->
    Decimal.new(%Decimal{sign: 1, coef: -1, exp: 2})
  end
iex(1)> Decimal.new(%Decimal{sign: 1, coef: 1, exp: 2})  
#Decimal<1E+2>
iex(2)> Decimal.new(%Decimal{sign: 2, coef: 1, exp: 2})
** (Decimal.Error) : wrong decimal number: (%Inspect.Error{message: "got ArgumentError with message \"argument error\" while inspecting %{__struct__: Decimal, coef: 1, exp: 2, sign: 2}"}) 
    (decimal 2.0.0-rc.0) lib/decimal.ex:1151: Decimal.new/1
iex(3)> Decimal.new(%Decimal{sign: 1, coef: -1, exp: 2})
** (Decimal.Error) : wrong decimal number: (%Inspect.Error{message: "got ArgumentError with message \"argument error\" while inspecting %{__struct__: Decimal, coef: -1, exp: 2, sign: 1}"})
    (decimal 2.0.0-rc.0) lib/decimal.ex:1151: Decimal.new/1 ```

all other wrong cases are handled by function clauses

So I think better then the current behavior with all the consequences of prolonging the error (see above):

iex(1)> Decimal.new(%Decimal{sign: 1, coef: 1, exp: 2})  
#Decimal<1E+2>
iex(2)> Decimal.new(%Decimal{sign: 2, coef: 1, exp: 2})
#Decimal<1E+2>
iex(3)> Decimal.new(%Decimal{sign: 1, coef: -1, exp: 2})
#Decimal<-.1E+3>

I think in a dynamically typed language, it is possible for a library to detect many misuses and edge cases at runtime, but there is a runtime cost to doing that. Documenting proper usage and data types allows to lower that runtime cost (assuming the documentation and type specifications are being followed)

3 Likes

yes something like this…

I think as much as possible without breaking the target. And it’s possible in this case.

Thanks for your thoughts!

yes you are right! There is a cost for it. I will try to measure it …

before the fix:

##### With input 30 Digits Integer #####
Name                           ips        average  deviation         median         99th %
Decimal.add()    Int      318.01 K        3.14 μs   ±771.75%           3 μs           5 μs

Memory usage statistics:

Name                    Memory usage
Decimal.add()    Int         4.26 KB

**All measurements for memory usage were the same**

##### With input 6 Digits Integer #####
Name                           ips        average  deviation         median         99th %
Decimal.add()    Int      903.01 K        1.11 μs  ±3457.31%           1 μs           2 μs

Memory usage statistics:

Name                    Memory usage
Decimal.add()    Int           592 B

**All measurements for memory usage were the same**

##### With input 606 Digits Integer #####
Name                           ips        average  deviation         median         99th %
Decimal.add()    Int       43.65 K       22.91 μs    ±73.77%          22 μs          44 μs

Memory usage statistics:

Name                    Memory usage
Decimal.add()    Int         3.73 KB

with the fix:

##### With input 30 Digits Integer #####
Name                           ips        average  deviation         median         99th %
Decimal.add()    Int      316.97 K        3.15 μs   ±659.73%           3 μs           5 μs

Memory usage statistics:

Name                    Memory usage
Decimal.add()    Int         4.26 KB

**All measurements for memory usage were the same**

##### With input 6 Digits Integer #####
Name                           ips        average  deviation         median         99th %
Decimal.add()    Int      930.26 K        1.07 μs  ±3092.95%           1 μs           2 μs

Memory usage statistics:

Name                    Memory usage
Decimal.add()    Int           592 B

**All measurements for memory usage were the same**

##### With input 606 Digits Integer #####
Name                           ips        average  deviation         median         99th %
Decimal.add()    Int       43.93 K       22.77 μs    ±29.74%          22 μs          39 μs

Memory usage statistics:

Name                    Memory usage
Decimal.add()    Int         3.73 KB

As you can see the cost in this case is not relevant. :wink:

2 Likes

I think the largest cost here isn’t in performance, or even in the code required for this specific PR, but in what this implies about coding strategy. The principle you’re articulating is that every public function on a public API should validate the invariants of the datastructure. I don’t think this is a sound principle. It happens to be reasonably cheap in this case, but that is by no means guaranteed. It also puts a massive burden on library authors.

If you aren’t trying to articulate a general principle, then the question becomes, why enforce these checks here and not elsewhere?

The right pattern is to enforce invariants at the point of documented constructors.

4 Likes

Yes, thank you very much. Theoretically perfect formulated!

But thats the point.

Theoretically we can use Decimal.new as this kind of single entry point and close all other - but thats in the practical case of the current architecture of this Decimal lib not possible - it’s used a lot in other projects (money, ecto, …)- of course change radical everything - maybe possible … I wrote a small side project (for fun) the lib Apa - Arbitrary Precision Artithmetic with a different architecture (tuple vs. struct / integers and strings vs lists)… but thats not the case here … I want to help to make the original Decimal lib more safe! :wink:

by the way … it already does it a lot:

  def add(%Decimal{coef: :inf}, %Decimal{coef: :inf}),
    do: error(:invalid_operation, "adding +Infinity and -Infinity", %Decimal{coef: :NaN})

:wink:

But really I appreciate this discussion - thanks a lot!

Can you clarify why this isn’t possible? The money and Ecto apps should always use either 1) use the safe constructors or 2) take complete responsibility for using the structs wrong.

Those aren’t protections against intentionally broken data structures, that’s traditional app logic. You can generate infinite numbers via the safe public API, and in that case the add function handles it. You’re proposing a more extensive model where the safeguards need to handle not just the domain of valid decimals but the domain of all nonsense decimals too.

3 Likes

Again - theoretically you are absolute right - we could discuss if 0 or NaN or Inf are nonsense too … but no hairsplitting … :wink:

I like your really correct thinking. And if we find both a way for a new solution - wonderful!

What would you suggest?

If I would be radical - as you suggested - I would remove the struct

%Decimal{sign: sign, coef: coef, exp: exp}

and use tuples of arbitrary integers:
{integer(), integer()}
{-12, -3} == -0.012

no limits no restrictions, all possible math numbers are constructible

and an add function for example like this:
@spec add({integer(), integer()}, {integer(), integer()}) ::{integer(), integer()}

composable, piping (of course) and nice performance… :wink:

with well defined entry points …

… but it’s sounds like a dream … I would break a lot of old stuff :wink:

How does this make any difference? An end user can construct hostile tuples in the same way that they can construct hostile or nonsense decimals.

But they aren’t, they’re a well defined value, you just can’t do all the operations to them, constructible via the public constructors.

1 Like