Defining custom generators with PropCheck

alfert · July 25, 2016, 2:27pm

I do not know ExCheck in any relevant detail. But, if appropriate, you could take a look at PropCheck, which is a shiny new library around PropEr. Both systems (ExCheck and PropCheck) provide property-based testing inspired by QuickCheck.

PropCheck provides different means to define custom generators. It is very common to do so, and you will find a host of functions and macros for that purpose, including examples in the tests.

I tested triq some years ago and had similar problems during shrinking (literally taking infinite time). This was my reason to switch to PropEr for Erlang code (and to develop PropCheck as a layer around PropEr over the last couple of weeks).

HTH Klaus.

svarlet · July 26, 2016, 5:09pm

Thanks @alfert, I will look into it.

This is the link for anyone else reading: https://github.com/alfert/propcheck

svarlet · August 5, 2016, 4:08pm

Hello @alfert,

Would you mind telling me why:

when_fail doesn’t display anything although my test is failing ?
why aggregate doesn’t display/do anything either ?

The source file:

defmodule MyProject.Leaderboard do
  use GenServer

  alias MyProject.Submission

  @initial_state []

  #
  # CLIENT API
  #

  def start_link(name) do
    GenServer.start_link(__MODULE__, @initial_state, name: name)
  end

  def at(leaderboard, index) do
    GenServer.call(leaderboard, {:at, index})
  end

  def record(leaderboard, submission) do
    GenServer.cast(leaderboard, {:record, submission})
  end

  def capacity do
    Application.get_env(:myproject, :leaderboard_capacity)
  end

  #
  # SERVER API
  #

  def init(state) do
    {:ok, state}
  end

  def handle_call({:at, index}, _, submissions) do
    {:reply, Enum.at(submissions, index), submissions}
  end

  def handle_cast({:record, submission}, submissions) do
    new_submissions =
      submissions
      |> insert(submission)
      |> Enum.take(capacity)
    {:noreply, new_submissions}
  end

  defp insert([], a_submission) do
    [a_submission]
  end

  defp insert([sub | rest] = submissions, a_submission) do
    case Submission.compare(a_submission, sub) do
      :lt -> [sub | insert(rest, a_submission)]
      :eq -> [sub | insert(rest, a_submission)]
      :gt -> [a_submission | submissions]
    end
  end
end

The test file:

defmodule MyProject.LeaderboardProperty do
  use ExUnit.Case, async: true
  use PropCheck
  use PropCheck.StateM

  alias MyProject.Submission

  @usernames ~w(Sebastien Hugo Darren Paul Daniel)

  defp name_gen, do: elements(@usernames)

  defp score_gen, do: nat

  defp word_gen, do: binary(20)

  defp index_gen, do: nat

  defp submission_gen do
    let [name, score, word] <- [name_gen, score_gen, word_gen] do
      %Submission{username: name, word: word, score: score}
    end
  end

  property "leaderboard is rock solid" do
    forall cmds <- commands(__MODULE__) do
      trap_exit do
        Leaderboard.start_link(:sut)
        {history, state, result} = run_commands(__MODULE__, cmds)
        Leaderboard.stop(:sut)

        (result == :ok)
        |> when_fail(
          IO.puts """
          History: #{inspect history, pretty: true}
          State: #{inspect state, pretty: true}
          Result: #{inspect result, pretty: true}
          """)
        |> aggregate(command_names cmds)

      end
    end
  end

  #
  # COMMANDS, PRE/POST-CONDITIONS
  #

  defstruct submissions: []

  def initial_state, do: %__MODULE__{}

  def command(_state) do
    frequency([{1, {:call, Leaderboard, :at, [:sut, index_gen]}},
               {1, {:call, Leaderboard, :record, [:sut, submission_gen]}},
               {1, {:call, Leaderboard, :capacity, []}}
              ])
  end

  def precondition(state, {:call, Leaderboard, :at, [:sut, index]}) do
    Enum.count(state.submissions) > index
  end

  def precondition(_state, _call) do
    true
  end

  def next_state(state_before, _result, {:call, Leaderboard, :record, [:sut, a_submission]}) do
    %__MODULE__{state_before | submissions: do_insert(state_before.submissions, a_submission)}
  end

  def next_state(state, _result, _call) do
    state
  end

  defp do_insert([], a_submission) do
    [a_submission]
  end

  defp do_insert([head | tail] = submissions, a_submission) do
    cond do
      head.score < a_submission.score -> [a_submission | submissions]
      head.score >= a_submission.score -> [head | do_insert(tail, a_submission)]
    end
  end

  def postcondition(state, {:call, Leaderboard, :at, [:sut, index]}, result) do
    case index < Enum.count(state.submissions) do
      true -> result != nil
      false -> result == nil
    end
  end

  def postcondition(_state, {:call, Leaderboard, :record, [:sut, _a_submission]}, result) do
    result == :ok
  end

  def postcondition(_state, {:call, Leaderboard, :capacity, []}, result) do
    result == Application.get_env(:myproject, :leaderboard_capacity)
  end
end

The output:

...............

  1) property leaderboard is rock solid (MyProject.LeaderboardProperties)
     test/my_project/leaderboard_property.exs:24
     Property leaderboard is rock solid failed. Counter-Example is:
     [[]]

     code: nil
     stacktrace:
       test/myproject/leaderboard_property.exs:24: (test)



Finished in 0.3 seconds
1 property, 15 tests, 1 failure

Randomized with seed 340578

Thanks

alfert · August 8, 2016, 8:32pm

Hi @svarlet,

this is apparently a not too well documented feature of the property macro. The default behavior is to be :quiet, but if you set it to :verbose, you should see enough information. Change your property to:

property "leaderboard is rock solid", [:verbose] do
  forall ...
end

Examples are shown in the test code in master_statem_test.exs and pingpong_statem_test.exs. If you want to use aggregate or similar functions, you need to execute the property in :verbose mode. This is a feature of PropEr, because the output of aggregate etc. is controlled by PropEr.

Hope that helps,
Klaus.

svarlet · August 9, 2016, 1:01pm

Thanks, that helped me to fix it! Good to know.

Also, in ExUnit, if you pass the context to your test, you can access the test name:

test "xxxxxx", context do
  context.name # the test name!
  ...
end

It’s quite handy if you want to start a GenServer and assign it a name. Is there such a thing in PropCheck ?

alfert · August 12, 2016, 11:17am

Yes, there is, but I have never tried it. The context is the third (optional) parameter of a property and is fed unchanged into the underlying ExUnit unit test. It should work similarly to your example:

property "xxxxx", [:verbose], context do
  context.name # the property name
  ....
end

I will update the documentation with more examples for these cases.

svarlet · August 12, 2016, 5:27pm

Great !

Also, I struggle a lot to understand how the statistics gathering works. Would you mind explaining or pointing me to documented examples?

For example, this is a sample from your repository:

        (result == :ok)
        |> when_fail(
            IO.puts """
            History: #{inspect history, pretty: true}
            State: #{inspect state, pretty: true}
            Result: #{inspect result, pretty: true}
            """)
        |> aggregate(command_names cmds)
        |> collect(length cmds)

I can’t find what the return value of when_fail is, so I’m not sure what aggregate takes as its first parameter.
Aren’t aggregate and collect similar? What about classify and measure?

Also

what’s the difference between the utf8 generator and the binary generator ? aren’t all strings utf8 in elixir ?
is a utf8 generator only generating empty and non-empty strings ? or is it sometimes generating nil strings ? (same question about lists ?)
if they are not generating nil values, how should I enable this ? my guess is to go with oneOf(nil, binary())…
it looks like the shrinking strategy is defined by the generator itself. If I make my own Person struct generator (let’s say it generates a struct of type Person with a random firstname and a random lastname) how will it shrink ?
Is it going to shrink the firstname until it can’t shrink it anymore and then start shrinking the lastname ?
Is it going to shrink firstname and lastname simultaneously ?
ultimately, would it shrink the struct to a nil value ?

I’m getting a much better understanding of property based testing thanks to your library and support, cheers @alfert !!

alfert · August 13, 2016, 8:52pm

Some answers to your questions:

aggregate and the like take a value of type test as first parameter. This can be something simple such as a boolean expression or something wrapped by PropEr, which is what aggregate and the like do. PropEr sometimes calls the variations outer_test, cooked_test and inner_test.
measure provides statistics (such as mean, min and max) suitable for numeric values (e.g. lengths of lists) whereas the other statistics work on arbitrary values and provide histograms (i.e. counts of matched categories).
when_fail wraps the test, its first parameter, and returns it. Otherwise the pipeline wouldn’t work.
Elixir binaries are sequences of byte values. Strings in Elixir are those binaries that follow the UTF-8 encoding (this is different from Erlang, where strings are lists of (Unicode) characters). So many binaries are UTF-8 strings, but not all.
nil values do not fit binaries, since binaries are not lists. Shrinking of binaries goes towards the empty binary, i.e. <<>>. If you require both nil and binaries, oneof([nil, binary()]) is a good way to go (oneof takes a list of alternatives).
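
To make the points above a bit more concrete, here is a minimal sketch of a property that gathers statistics. It assumes propcheck is available as a test dependency and that the file lives under test/. The module name StatsSketchTest, the generator maybe_binary and the property itself are made up for illustration. As far as I understand the API, collect and measure take the test as their first argument, which is why they chain with |>, and their output only shows up in :verbose mode.

```elixir
defmodule StatsSketchTest do
  use ExUnit.Case
  use PropCheck

  # Hypothetical generator: either nil or an arbitrary binary.
  # oneof/1 takes a list of alternatives.
  def maybe_binary, do: oneof([exactly(nil), binary()])

  property "reversing twice is the identity", [:verbose] do
    forall {xs, b} <- {list(integer()), maybe_binary()} do
      (Enum.reverse(Enum.reverse(xs)) == xs)
      # histogram: how often was b nil?
      |> collect(is_nil(b))
      # numeric statistics over the generated list lengths
      |> measure("list length", length(xs))
    end
  end
end
```

collect buckets each test case by the given value, while measure expects a number (or a list of numbers) and reports numeric summaries for it.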

Shrinking or modeling of structs is another topic. Currently, maps are not supported natively in PropEr (see here: https://github.com/manopapad/proper/releases/tag/v1.2 where a PropEr 2.0 version is vaguely announced) and I have not implemented my own generator. A few thoughts for discussion on this:

you can take a look into the tuple construction which is quite similar (remember, records are tuples of a special form and maps are a nicer version of records). So, at least construction and shrinking strategies can be copied.
Considering the advice of jlouis and others (https://medium.com/@jlouis666/quickcheck-advice-c357efb4e7e6#.wdaup7m8b), it may be enough to use a much simpler model (look into the safetyvalve example to get the idea). But this certainly depends on your API and if you need a struct here, this might be a problem.
you might emulate a struct by using a tuple (or a single value) in the forall part and then use a simple function or expression to create the struct:
```
  forall {f, l} <- {utf8(), utf8()} do
    p = %Person{firstname: f, lastname: l}
    is_valid_person(p)
  end
```

Take a look into the tree example to see more complex generators with tuples using let_shrink to define growth and shrinking strategies. But my general impression is that you either test rather simply-typed basic functions or you need to test a system with state. In the former case, I doubt that shrinking a person to something with shorter names is a useful strategy to find bugs in your program. In the latter case you need to find a proper model of that system state (which is always simpler than the original one) for testing interesting and relevant properties of the system. Take a look into finite state machines in this case.
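
The tuple idea above can also be wrapped into a reusable generator with let, so that PropEr only works on the tuple of strings and the struct is built afterwards. This is only a sketch: the Person struct and the PersonGen module are hypothetical, and I make no claim about the exact order in which the two names shrink. Running sampleshrink in iex is a cheap way to observe it.

```elixir
defmodule Person do
  defstruct [:firstname, :lastname]
end

defmodule PersonGen do
  use PropCheck

  # Build the struct from a tuple of two utf8 generators, so PropEr
  # generates (and shrinks) the tuple and the struct is derived from it.
  def person do
    let {f, l} <- {utf8(), utf8()} do
      %Person{firstname: f, lastname: l}
    end
  end
end

# In iex: prints one generated value followed by its shrinking steps.
:proper_gen.sampleshrink(PersonGen.person())
```

The values are printed by PropEr, so they appear in Erlang term syntax rather than as Elixir structs.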

svarlet · June 14, 2017, 3:25pm

Hello @alfert,

Is there a way to generate a sample from a generator in iex to observe the generated data?

alfert · June 15, 2017, 6:40am

Hi @svarlet,
this is a feature of the underlying PropEr Erlang application, which is not yet ported to PropCheck, but it is a good suggestion.

You can call :proper_gen.sample and also :proper_gen.sampleshrink. For documentation, look into proper_gen in http://proper.softlab.ntua.gr/doc/

Here are simple examples:

:proper_gen.sample(PropCheck.BasicTypes.integer)                        
7
13
-38
10
3
9
7
-16
-311
-79
121
:ok
:proper_gen.sampleshrink(PropCheck.BasicTypes.integer)
6
0
:ok

Sampling lists works the same way:

:proper_gen.sampleshrink(PropCheck.BasicTypes.list(PropCheck.BasicTypes.integer))
[14,6,-13,19,-9]
[14,6,-13]
[6,-13]
[-13]
[]
:ok

And you can use it with your own custom generators as well.
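
For instance, a custom generator built with let can be passed to the same functions. The module MyGens and the generator even below are hypothetical names used only for this sketch; :proper_gen.pick is the PropEr function that returns a single generated value instead of printing samples.

```elixir
defmodule MyGens do
  use PropCheck

  # Hypothetical custom generator: even integers derived from integer().
  def even do
    let n <- integer() do
      n * 2
    end
  end
end

# Prints a handful of sample values, as with the built-in types.
:proper_gen.sample(MyGens.even())

# pick/1 returns one generated value wrapped in an :ok tuple.
{:ok, value} = :proper_gen.pick(MyGens.even())
IO.inspect(value)
```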

svarlet · June 25, 2017, 8:13pm

Hi @alfert,

Before I shoot my question, is there a better channel to send you questions about PropCheck? This thread started with ExCheck, so it may not be the best place to talk.

Anyway, I was wondering if you could tell me what happens when I write a test like this:

property "something" do
  forall i <- integer(1, 5) do
    forall j <- integer(i, 5) do
      # something using i and j
    end
  end
end

is there a difference with this:

def pair_of_ordered_ints_below_5() do
  let i <- integer(1, 5) do
    let j <- integer(i, 5) do
      {i, j}
    end
  end
end

property "something" do
  forall {i, j} <- pair_of_ordered_ints_below_5() do
    # something with i and j
  end
end

It seems that the generated data will be similar. Will there be a difference in the number of tests? In the algorithmic complexity? Anything else?

alfert · June 26, 2017, 8:10pm

To be honest, I am fairly convinced that there is no difference, but I am not entirely sure. As far as I understand PropEr (which is still the machinery behind PropCheck), the only potential effect I can think of is that shrinking a tuple might be different from shrinking two nested integers. I would guess that the inner integer is shrunk first, whereas in the tuple case the first integer is shrunk first.

Did you try out the shrinking? That could give you a hint. Otherwise I would not expect any difference.

Regarding the thread name, @AstonJ, can we change the thread name somehow? Or split it after the third post, i.e. where the questions concerning PropCheck began?

Thanks,
Klaus.

AstonJ · June 26, 2017, 9:44pm

Done

svarlet · June 27, 2017, 9:44am

@alfert I didn’t try the shrinking. I noticed a difference in the reporting of errors though. When a counter example is found, only the output of the wrapping generator is shown.

For example:

forall i <- integer(1, 2) do
  forall j <- integer(100, 1000) do
    false
  end
end

That will only show the value of i.

svarlet · July 5, 2017, 5:59pm

Hi @alfert,

Can you recommend a way to generate a sublist of N unique elements from a list?

I have a couple of ideas but none seem to be very efficient. For example, to generate 8 unique numbers between 1 and 104:

Solution 1

such_that numbers <- vector(8, integer(1, 104)), when: uniques?(numbers) #define the uniques?/1 helper too

Solution 2
Make a recursive generator that generates a number N0 from 1 to (104 - 8), then generates N1 from (N0 + 1) to (104 - 7), then generates N2 from (N1 + 1) to (104 - 6), etc…
My problem with this one:

it seems hard to write such a generator
it is a bit complex
it skews the generated numbers toward the higher range and is unlikely to be uniform

Solution 3
This is my favorite but I’m not sure how to write it as a generator:

def uniques_numbers() do
  1..104
  |> Enum.shuffle()
  |> Enum.take(8)
end

svarlet · July 5, 2017, 6:11pm

Just tried this in the repl and it worked, would be happy to read your opinion though:

def ints_gen(quantity) do
  let all_ints <- exactly(Enum.to_list(1..104)) do
    all_ints
    |> Enum.shuffle()
    |> Enum.take(quantity)
  end
end

Thoughts?

alfert · July 6, 2017, 9:19pm

Looks good to me.

I thought about generating a list of length quantity, feeding it into Enum.uniq and checking that the size is still quantity. But your approach is better since the generator always constructs a valid sequence (I would have to use such_that, requiring retries which may fail the entire generation if too many misses are constructed). In addition, you can easily add parameters for the range to make the generator more useful.
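
A rough sketch of that last suggestion, with the range passed in as a parameter. The module name UniqueGen and the function unique_ints are made up for this example, and I have not checked how such a generator behaves during shrinking.

```elixir
defmodule UniqueGen do
  use PropCheck

  # Hypothetical helper: pick `quantity` distinct integers from `range`.
  def unique_ints(quantity, range \\ 1..104) do
    let all_ints <- exactly(Enum.to_list(range)) do
      all_ints
      |> Enum.shuffle()
      |> Enum.take(quantity)
    end
  end
end

# Draw one value and check it by hand.
{:ok, numbers} = :proper_gen.pick(UniqueGen.unique_ints(8, 1..104))
IO.inspect(numbers)
IO.inspect(length(numbers) == 8 and Enum.uniq(numbers) == numbers)
```

Note that if quantity is larger than the range, Enum.take simply returns fewer elements, so a guard on the arguments may be worth adding.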