StreamData: data generation and property testing for Elixir

Yes, but I want my tests to run on 1.4, too :wink:

Then don’t include it in your tests for 1.4… You can add the dependency conditionally and compile those test files conditionally; just make sure you have a proper test helper that handles it.

1 Like

I know I can do that. I don’t want to maintain two test suites or keep in mind that some of my tests don’t run on Elixir < 1.5.

It’s not a problem, though. I’ll just keep writing my tests as usual and switch to StreamData when it makes more sense for this library.

Thanks so much for all the links - they were very interesting reads! :slight_smile: The Hypothesis approach is indeed both elegant and powerful, but as far as I can tell it’s not more powerful than the StreamData approach (which, as I mention in the README, is really Clojure’s test.check approach, credit where credit is due :smiley:).

One thing to note: StreamData is not type-based in any way. It just happens that I defined a bunch of type-ish generators because they are useful :slight_smile:.

I’ll write a blog post about it because it’s very interesting IMO, but the gist is:

  • All StreamData generators take a seed as an argument and can use that seed to produce whatever they want. This is analogous to the randomness of the byte stream in Hypothesis.
  • A StreamData generator (which is essentially a function) produces a “lazy tree” when generating a value: a lazy tree is basically a tree where the root is realized (that is, eager, not lazy) and the children are a lazy stream of lazy trees. In StreamData, the root is the generated value and the child subtrees are the shrinks of that value. An easy example is integers: int/0 returns a tree where the root might be 4 and the children might be 0, 2, and 3, and 2 might have the child 1, and so on. Shrinking becomes a matter of a somewhat fancy depth-first search in this tree (there’s a small sketch of this right after the list). This also solves exactly the problem that the author of Hypothesis highlights in http://hypothesis.works/articles/integrated-shrinking/.
  • The seed that generators accept as an argument can be “split”, meaning we can get two seeds from one seed in a deterministic way. So generators just split seeds when they need to pass a seed down. Since everything is deterministic, to “replay” a run you can just pass the same seed (this is already done to integrate with ExUnit, where we use ExUnit’s seed).
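To make the lazy tree idea a bit more concrete, here is a minimal sketch. This is not StreamData’s actual internal code: the LazyTree module and the int_tree/1 and shrink/2 functions are made up for illustration, and the shrink candidates (0, n/2, n - 1) are deliberately simplistic.

defmodule LazyTree do
  defstruct [:root, :children]

  # 0 has nothing left to shrink to.
  def int_tree(0), do: %LazyTree{root: 0, children: []}

  # For a positive n, the root is n itself and the children are built lazily;
  # each child is the lazy tree of a smaller candidate value.
  def int_tree(n) when n > 0 do
    candidates = Enum.uniq([0, div(n, 2), n - 1]) -- [n]
    %LazyTree{root: n, children: Stream.map(candidates, &int_tree/1)}
  end

  # Shrinking is a depth-first search: follow the first child whose root still
  # fails the property, and stop when no child fails anymore.
  def shrink(%LazyTree{root: root, children: children}, fails?) do
    case Enum.find(children, fn child -> fails?.(child.root) end) do
      nil -> root
      child -> shrink(child, fails?)
    end
  end
end

# "Property": integers smaller than 2 pass, so 2 is the minimal failing value.
LazyTree.int_tree(4) |> LazyTree.shrink(fn n -> n >= 2 end)
#=> 2

The real generators also thread the seed and the generation size through all of this, but the “tree plus depth-first search” shape is the core idea.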

Last but not least, binary/0 does not have any problem shrinking both the bytes and the length of the binary. There is actually extra work done to make it shrink the way it does today. Whether it’s the right API or not is a different discussion, but I’m very much interested in it, so please open an issue in the StreamData repo and we can discuss it with everyone else as well :slight_smile:

Hope this made it somewhat clearer. As I said, I’ll try to put together a blog post, because this is interesting and I’ve spent the last few months grokking it, so I’d be very happy to share the insights!

6 Likes

Sorry I forgot to answer the original 3 questions:

  1. StreamData is pure Elixir, shrinks like Hypothesis/test.check (so the “good” way), has a chance of getting merged into Elixir core, and is integrated with ExUnit.
  2. There are many functions in StreamData to create custom generators. The basics are bind/2, map/2, and filter/3, but there’s also StreamData’s gen all macro, which is syntactic sugar for writing really easy-to-read generators (see the sketch right after this list).
  3. The values are completely random, but generation is driven by a “generation size” (mentioned in the docs) which guarantees we start with smaller values, which in turn makes the chance of an empty list very, very high.
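To make 2 and 3 a bit more concrete, here is a rough sketch of composing generators. It uses the int/0 name from the snippets above (newer releases call it integer/0) and assumes gen all is imported from StreamData, as elsewhere in this thread; it shows the shape of the API rather than being a definitive reference.

import StreamData

# map/2: derive one generator from another.
even_ints = map(int(), fn n -> n * 2 end)

# filter/2: keep only the generated values that satisfy a predicate.
positive_ints = filter(int(), fn n -> n > 0 end)

# bind/2: let one generated value decide the next generator.
lists_containing_n =
  bind(int(), fn n ->
    map(list_of(int()), fn list -> [n | list] end)
  end)

# gen all: the same kind of composition as syntactic sugar.
keyword_pairs =
  gen all key <- member_of([:a, :b, :c]),
          value <- int() do
    {key, value}
  end

# Generators are also streams; note how values start small because of the
# generation size mentioned in point 3.
Enum.take(even_ints, 5)
#=> e.g. [0, -2, 2, 0, 4]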

Let me know if you have more questions :slight_smile:

6 Likes

Since it is being considered for inclusion in Elixir itself, it should be written in Elixir - so it fits with everything else provided by the language and comes with a matching license. Also note that we decided to build on top of the Stream functionality we provide as part of Elixir, so the generators work both for data generation and for property testing - the two are related but independent. However, the other solutions are more mature and they also provide stateful checking - something we don’t plan to explore at the moment.

5 Likes

I’ve read the blog post about test.check by the author. It does look more or less equivalent to what Hypothesis does, if you squint a little.

Well, a seed is even easier to serialize than Hypothesis’ byte streams. With a way to serialize it to text, you could (and should) save the examples that falsify the tests so that they can be re-tested in the next run - and do other cool things that depend on having serializable examples.
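Something along these lines - purely a sketch of the idea, not an existing StreamData feature, and the file name is made up:

# Since generation is keyed off ExUnit's seed, re-running with the same seed
# reproduces the failing examples.
seed = ExUnit.configuration()[:seed]

# Hypothetical "database" of failed runs: append the seed to a file...
File.write!("failed_seeds.txt", "#{seed}\n", [:append])

# ...and replay that exact run later with:
#
#     mix test --seed <saved seed>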

I think you should shrink the examples as much as possible, and that includes shrinking <<16, 16>> to <<0>>… I’ll open an issue tomorrow.

Well, if including specific examples would complicate the code too much, then merely a high probability might be acceptable, especially if combined with a database of failed examples.

Nice job with the library!

This looks really great @whatyouhide! It seems similar in nature to Dave Thomas’s quixir, which I’ve been using for a bit. I’m excited to see if I can make this work with some existing state machine properties that I have. I love that these tools may make their way into Elixir itself.

This would make it much easier to test GenServers, though. That is something I like about Quviq’s QuickCheck: it was able to shrink GenServer and remote process calls so nicely. :slight_smile:

I’ve been building model-based properties with quixir for a while. It works pretty well. Granted, you have to build pre-condition and post-condition checking yourself, and the shrinking is much less optimized for model properties than in EQC. Plus you don’t get the PULSE scheduler for inducing and controlling race conditions, which is really the most critical piece IMO.

Yes, I would support this. My PropCheck library (https://hexdocs.pm/propcheck/readme.html) uses PropEr to do the major work, but provides Elixir-based APIs, including the state-based stuff for GenServers and the like. For practical purposes, you want to have this available; otherwise, property-based testing boils down to data structure testing only. PropEr has the disadvantages that it does not support maps (and thus structs), has not been updated in a long time (read: it is stable ;-)), and has a complicated code base. Therefore, a fresh alternative looks very promising.

The fundamental parts of state-based property testing are data generation (which is quite easy) and a good shrinking strategy (which is way harder). With those available, a state-based approach should be easy to implement on top.

This is in stark contrast to PULSE from Quviq, where they cleverly manipulate the Erlang scheduler to find race conditions. While again based on generation and shrinking, the infrastructure there is much more complicated, which becomes evident if you read their research papers.

I just published a post where I describe in detail the inner workings of StreamData, for those of you who might be interested. :slight_smile:

10 Likes

Great article!

It could benefit from a fleshed-out shrinking session in which you explicitly show how the tree is traversed.

Also, some details for people who are curious about the Hypothesis approach:

  1. Hypothesis’ approach is very elegant, but you pay for it with more complexity in the core and with generators that are harder to write. As @whatyouhide says, primitive generators must be written in a specific way, and even though the idea is simple, the details are quite involved. EDIT: composing already-built generators is very easy, though.

  2. Describing Hypothesis as taking a random byte string and generating examples out of it makes it look way simpler than it is: doing this would require a mapping that can generate an example out of any random byte stream thrown at it, and that is not the case. Sometimes Hypothesis cheats and “generates” the bytes it needs in order to satisfy some data constraint (a list with at least 5 elements, for example; in that case Hypothesis might force some bytes to have the values it wants). The corollary is that during shrinking, some byte streams might generate invalid examples, which must be discarded. These are not major problems, but they must be managed carefully, and that takes more code (there is a toy sketch of this idea right after the list).

  3. Shrinking is not done by only shrinking the byte stream. That would generate invalid data too often, and waste time on impossible shrinks. Hypothesis has a way of tagging some bytes as special, so that they will be shrunk in specific ways. In practice, shrinking has some restrictions. This again makes the core much more complex than just “eat some bytes and generate some objects”.
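To make point 2 a bit more concrete, here is a toy Elixir sketch of the “bytes in, example out” idea. It is nothing like Hypothesis’ actual Python internals; the ToyByteStream module is made up just to show the shape of the mapping.

defmodule ToyByteStream do
  # Interpret a byte stream as a list of small integers: a byte >= 128 means
  # "draw one more element" (the next byte), anything else ends the list.
  def list_of_bytes(<<flag, value, rest::binary>>) when flag >= 128 do
    {tail, rest} = list_of_bytes(rest)
    {[value | tail], rest}
  end

  def list_of_bytes(rest), do: {[], rest}
end

ToyByteStream.list_of_bytes(<<200, 7, 250, 3, 0>>)
#=> {[7, 3], <<0>>}

A constraint like “a list with at least 5 elements” clearly cannot be satisfied by every byte stream, which is exactly where the forcing and discarding described above come in.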

My understanding is that Hypothesis might unlock some shrinks that are hard to unlock with the lazy tree approach, but I don’t know if that’s relevant in practice.

If you ever decide you want to go the byte stream route, I can send you my partial (and probably very buggy - I haven’t tested it in ages) port of the Hypothesis core to Elixir. The Python code is challenging to understand in some parts, because it mutates objects like crazy and bends over backwards to undo some mutations when it decides to backtrack. An Elixir port should definitely return a modified “generator state” instead; that gives you undo for free.

4 Likes

I just tried this example:

property "lists of integers don't contain 42" do
  check all list <- list_of(int()) do
    assert 42 not in list
  end
end

And it shrunk the list down to this, which initially threw me off quite a bit until I remembered charlists :smiley:

list <- list_of(int())
#=> '*'

Maybe there could at least be a warning, because the module does seem to know it’s dealing with integer values and not with charlists.

How could it know? Can Elixir distinguish characters from integers? I think it can’t…

And that shrink isn’t actually very impressive. The minimum example should have been the empty list, right?

It can’t know from the list itself, but it might be able to from the generator used.
Also, the shrink is correct: [] does not contain 42, so the assert passes for it, which makes [42] (printed as '*') the minimal failing example.
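As an aside, if the charlist formatting trips you up, you can tell the Inspect protocol to always print lists of integers as lists. This is plain Elixir (the charlists: :as_lists option), nothing StreamData-specific:

inspect([42])
#=> "'*'"

inspect([42], charlists: :as_lists)
#=> "[42]"

# In IEx you can make this the default:
IEx.configure(inspect: [charlists: :as_lists])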

1 Like

Oh, sorry, ?* is 42. I read that in a hurry and thought it was trying to find a list that didn’t contain 42, instead of one that does :stuck_out_tongue:

Thanks for StreamData and PropertyTest - they’re going to prove very useful for me. I’ve taken them out for a spin, but I can’t seem to influence the number of generations. My test looks like this:

  import PropertyTest
  property "checks that a date fits within the start and end dates for that year" do
    check all  day   <- StreamData.int(1..28),
               month <- StreamData.int(1..12),
               year  <- StreamData.int(1..3000),
               max_runs: 1_000
    do
      {:ok, date} = Date.new(year, month, day)
      starts = struct(Date, Cldr.Calendar.ISOWeek.first_day_of_year(date))
      ends = struct(Date, Cldr.Calendar.ISOWeek.last_day_of_year(date))
      assert Date.compare(starts, date) in [:lt, :eq] and Date.compare(ends, date) in [:gt, :eq]
    end
  end

I thought either the clauses would act like they do in a comprehension (seems they don’t, which makes sense) or the number of runs would be influenced by :max_runs. But no matter what I put for :max_runs, the result of running this test is always “1 property, 125 tests, 0 failures”. Any guidance would be much appreciated.

1 Like

Hey @kip, the “1 property” that you see when tests finish is just the number of properties that were run, not the number of times each single property was run. Actually, a property is not tied to check all, so you can put a check all inside a test "something" and it will be reported as a test. :max_runs should be working in your example; to verify, you can add something like IO.write(".") to the body of the property and it will visually show you each time the body of a check all is executed.
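For example, something like this - it is just your test from above with the dot added and the assertions elided:

property "checks that a date fits within the start and end dates for that year" do
  check all day   <- StreamData.int(1..28),
            month <- StreamData.int(1..12),
            year  <- StreamData.int(1..3000),
            max_runs: 1_000 do
    # One dot per executed run: counting the dots tells you how many times the
    # body actually ran.
    IO.write(".")

    # ...the original assertions go here...
  end
end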

1 Like

I have to say, the way you compose generators with stream_data is really nice. I’ve created generators for the date/time structs of Elixir (here) and some more custom ones, and after a bit of getting used to them it’s quite simple to create them.
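For anyone curious, a generator for Date can be as small as something like the sketch below. This is not necessarily how the linked generators are implemented; it assumes gen all imported from StreamData, as elsewhere in this thread.

defp date do
  import StreamData

  gen all year  <- integer(1970..2050),
          month <- integer(1..12),
          # Day capped at 28 so that every year/month combination is valid.
          day   <- integer(1..28) do
    {:ok, date} = Date.new(year, month, day)
    date
  end
end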

Also, it’s great for DRYing up test setups. In a test case I had tests for multiple different properties of the returned values, but the setup was always the same, so I simply created a private helper function to hold the whole generator and return just the few values I needed for the tests.

    property "date pairs are always on the same day" do
      check all {from, until, _diff} <- consecutive_datetimes_with_date_diff() do
        # [some tests]
      end
    end
    # [ like 4 more of those ]

  defp consecutive_datetimes_with_date_diff do
    import StreamData

    gen all from <- date(),
            to_add <- integer(0..30),
            time_a <- time(),
            time_b <- time(),
            [time_from, time_to] = [time_a, time_b] |> Enum.sort do
      from = date_to_datetime(from, time_from)

      to =
        from
        |> Date.add(to_add)
        |> date_to_datetime(time_to)

      {from, to, to_add}
    end
  end
1 Like