StreamData: data generation and property testing for Elixir

Hello folks,

I am super excited to finally share what I've been working on over the last few months: StreamData, an Elixir library for data generation and property testing. StreamData is a candidate for inclusion in Elixir itself, but we wanted to start off with a library to give people the chance to try it first and to get the interface right (we did something similar with GenStage, which ended up remaining outside of Elixir core).

StreamData is still in its infancy but it’s ready to be tested by a wider audience (let’s say it’s exiting alpha and entering beta), so I invite the Elixir community to give it a try. Open issues, send PRs, and spread the word!

Andrea

Edit - I just published a blog post that talks about the inner workings of this library for those of you who might be interested:

42 Likes

This seems interesting, but on a cursory reading of the docs there is something I don't understand. What's the real advantage of this library over other libraries we already have for both Erlang and Elixir? At first glance it doesn't seem to bring anything new. Maybe there are improvements under the hood, like better shrinking, or maybe it's more convenient because it has better generators. Or maybe it makes it easier for users to create their own generators or shrinking strategies. I can't find any documentation on these topics as of now, but I assume that must be possible, right? Working with a predetermined set of generators, even in a language with as few types as Elixir, seems very restrictive.

I think these points I’ve raised should be addressed in the documentation:

  1. Why should I care about StreamData instead of other options?
  2. How do I write my own custom generators?
  3. Are the generated values completely random, or do we have any guarantees? For example, when I test my code on a random list, I’d like to always test it on the empty list.
4 Likes

You seem to have written type-specific generators which can be composed. This is probably the most common approach in property testing, but I'd like to point to a Python library which does things differently. Please note that maybe I'm misrepresenting the way your library works - maybe the LazyTree module takes the place of the random bytes in my explanation below…

In Hypothesis, instead of writing custom generators for each type (they call this approach "type-based shrinking"), they generate a stream of random bytes and write generators that turn the random bytes into useful types. Instead of shrinking the generated data structure, they shrink the byte stream (first by deleting bytes, then by trying to decrease the values of the bytes). The generators are written in such a way that most of the time shrinking the byte stream also shrinks the generated value. This decouples the shrinking and generating parts of the code. It also makes it very easy for users to write their own custom generators, which will shrink automatically and do the right thing most of the time. It's also very elegant.
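
To make that concrete, here is a toy version of the idea in Elixir (purely illustrative and simplified; this is not Hypothesis's actual design, just the shape of it):

# Toy sketch: a generator is a function from a byte stream to
# {value, rest_of_stream}; shrinking operates on the bytes, not the value.
defmodule ByteGen do
  # Draw one small integer by consuming a single byte.
  # (Assumes the stream never runs out of bytes; the real thing handles that.)
  def small_int(<<b, rest::binary>>), do: {b, rest}

  # Draw a list: one byte decides the length, then draw that many elements.
  def list_of(gen, <<len, rest::binary>>) do
    Enum.map_reduce(List.duplicate(:elem, len), rest, fn _, bytes -> gen.(bytes) end)
  end
end

# ByteGen.list_of(&ByteGen.small_int/1, <<3, 10, 20, 30>>)
# #=> {[10, 20, 30], ""}
# Deleting bytes or lowering their values in the input tends to produce a
# shorter list with smaller elements, which is exactly what shrinking wants.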

Additionally, this has the advantage that all values can be serialized (just store the random bytes and rebuild the value from the byte sequence). This allows Hypothesis to save failing examples in a database and test them again in the next round.

In his blog, the author claims that this approach is superior to type-based shrinking (something I'm mostly convinced is true, but I don't have any data to back my intuition). This post tries to explain some of the differences: http://hypothesis.works/articles/how-hypothesis-works/

The core of Hypothesis seems to be more complex than the "core" of StreamData, but Hypothesis seems to be smarter about shrinking. It's still quite simple, though, and I've been working on and off trying to port it to Elixir. For example, take what you write at https://hexdocs.pm/stream_data/StreamData.html#binary/0-shrinking: I don't know if this is a purposeful design decision or a technical limitation of the way you've written StreamData, but in Hypothesis a smart bytestring generator could very easily shrink both the length AND the bytes (which I think would be the correct API).

EDIT: Something that Hypothesis makes easy is to do something like generating an integer according to a distribution and then generating a list with that number of items, or even more complex operations, which I've never done in practice. For example, it can generate a pair of integers (n, m) and then generate an (n, m) matrix of items defined by another generator, and the matrix will shrink properly. I don't think StreamData supports that except by writing a generator from scratch, right?
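
Something like this is what I have in mind (I'm guessing at the exact API here - in particular whether list_of takes a :length option - so treat it as a sketch, not working code):

import StreamData

# Keep the dimensions small for the sake of the example.
small_dim = map(int(), fn i -> rem(abs(i), 5) end)

# A pair of dimensions (n, m), then an n-by-m matrix of integers.
matrix =
  bind(small_dim, fn n ->
    bind(small_dim, fn m ->
      # assuming list_of/2 accepts a :length option - unverified
      list_of(list_of(int(), length: m), length: n)
    end)
  end)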

4 Likes

I’ve given this a little try today and written about it at https://blog.pryin.io/first-impressions-of-property-testing-with-stream-data/.

This is my first time with a property testing library, so it’s a very basic write up.

I’ve enjoyed playing with StreamData very much, but I just realized that it requires Elixir 1.5 and my library needs to support older versions. So I probably won’t merge my test changes yet. :cry:

1 Like

Except you can depend on it only in :test, then users don’t require it. ^.^

Yes, but I want my tests to run on 1.4, too :wink:

Do not include it in your tests for 1.4 then… You can dynamically append the dependency and you can conditionally compile those files, just make sure you have a proper helper that you use.

1 Like

I know I can do that. I don't want to maintain two test suites or keep in mind that some of my tests don't run on Elixir < 1.5.

It’s not a problem, though. I’ll just keep writing my tests as usual and switch to StreamData when it makes more sense for this library.

Thanks so much for all the links, they were very interesting reads! :slight_smile: The Hypothesis approach is indeed both elegant and powerful but as far as I can tell, it’s not more powerful than the StreamData approach (which as I mention in the README is really Clojure’s test.check approach, credit where credit is due :smiley:).

One thing to note: StreamData is not type-based in any way. It just happens that I defined a bunch of type-ish generators because they are useful :slight_smile:.

I’ll write a blog post about it because it’s very interesting IMO, but the gist is:

  • all StreamData generators take a seed as an argument and can use that seed to produce whatever they want. This is analogous to the randomness of the byte stream in Hypothesis.
  • a StreamData generator (which is substantially a function) produces a "lazy tree" when generating a value: a lazy tree is basically a tree where the root is realized (that is, eager, not lazy) and the children are a lazy stream of lazy trees. In StreamData, the root is the generated value and the child subtrees are the shrinks of that value. An easy example is integers: int/0 returns a tree where the root might be 4 and the children might be 0, 2, and 3; 2 might in turn have the child 1, and so on (see the toy sketch after this list). Shrinking becomes a matter of a somewhat fancy depth-first search in this tree. This also solves exactly the problem that the author of Hypothesis highlights in http://hypothesis.works/articles/integrated-shrinking/.
  • The seed that generators accept as an argument can be "split", meaning we can get two seeds from one seed in a deterministic way. So generators just split seeds when they need to pass a seed down. Since everything is deterministic, to "replay" a run you can just pass the same seed (this is already done to integrate with ExUnit, where we use ExUnit's seed).
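
To make the lazy tree part concrete, here is a stripped-down toy version for integers (just an illustration, not the actual LazyTree module in StreamData):

# Toy illustration of a lazy tree of shrinks for non-negative integers.
defmodule ToyLazyTree do
  defstruct [:root, :children]

  # The root is computed eagerly; children is a lazy stream of ToyLazyTree
  # structs representing the shrinks of the root.
  def int_tree(n) when n >= 0 do
    %ToyLazyTree{root: n, children: Stream.map(shrink_candidates(n), &int_tree/1)}
  end

  # Candidate shrinks of n: 0, roughly half of n, and n - 1.
  defp shrink_candidates(0), do: []
  defp shrink_candidates(n), do: Enum.uniq([0, div(n, 2), n - 1]) -- [n]
end

# ToyLazyTree.int_tree(4).root  #=> 4
# The roots of its children are 0, 2 and 3; the subtree rooted at 2 has
# children rooted at 0 and 1, and so on. Shrinking walks this tree.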

Last but not least, binary/0 does not have any problem shrinking both the bytes and the length of the binary. There is actually more work done in order to make it shrink like it does today. Whether it's the right API or not is a different discussion, but I'm very much interested in it, so please open up an issue in the StreamData repo and we can discuss it with everyone else as well :slight_smile:

Hope this made it somewhat clearer. As I said, I'll try to put together a blog post because this is interesting and I've spent the last months grokking it, so I'd be very happy to share the insights!

6 Likes

Sorry I forgot to answer the original 3 questions:

  1. StreamData is pure Elixir, it shrinks like Hypothesis/test.check (so the "good" way), it has a chance of getting merged into Elixir core, and it's integrated with ExUnit.
  2. There are many functions in StreamData to create custom generators. The basics are bind/2, map/2, and filter/3, but there's also StreamData.gen all, which is syntactic sugar to create really easy-to-read generators (see the sketch after this list).
  3. The values are completely random, but the generation is driven by a "generation size" (mentioned in the docs), which guarantees we start with smaller values and makes the chance of hitting the empty list very, very high.
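
For instance, here is a quick sketch of a custom generator built out of those combinators - a non-empty list of integers paired with its first element (written off the top of my head, so take the exact calls with a grain of salt):

import StreamData

# Non-empty lists of integers paired with their first element.
list_and_head =
  int()
  |> list_of()
  |> filter(fn list -> list != [] end)
  |> map(fn list -> {list, hd(list)} end)

# Then in a test: check all {list, head} <- list_and_head do ... end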

Let me know if you have more questions :slight_smile:

6 Likes

Since it is being considered for inclusion in Elixir itself, it should be written in Elixir, so it fits with everything else provided by the language and comes with a matching license. Also note that we decided to build on top of the Stream functionality we provide as part of Elixir, so generators work both as data generators and for property testing - the two uses are related but independent. However, the other solutions are more mature and they also provide state checking, something we don't plan to explore at the moment.
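
For example, something like this already works because generators are enumerable (using the int/0 name from elsewhere in this thread; the output below is only an example, since values are random):

# Generators can be consumed as (infinite) streams of data outside of tests.
StreamData.int() |> Enum.take(4)
#=> [1, 0, -2, 1]  (for example - values are random and start small)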

5 Likes

I've read the blog post about test.check by its author. It does look more or less equivalent to what Hypothesis does if you squint a little.

Well, a seed is even easier to serialize than Hypothesis's byte streams. With a way to serialize test data, you could (and should) save the examples that falsify the tests, so that they can be tested again in the next run, and also do other cool things that depend on having serializable examples.
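
For example, nothing would stop a test helper from persisting failing seeds today and feeding them back in on the next run (a purely illustrative sketch, not an existing StreamData feature):

# Illustrative only: persist seeds of failing runs and replay them later.
defmodule FailingSeeds do
  @path ".failing_seeds"

  def save(seed) when is_integer(seed) do
    File.write!(@path, Integer.to_string(seed) <> "\n", [:append])
  end

  def all do
    case File.read(@path) do
      {:ok, contents} -> contents |> String.split("\n", trim: true) |> Enum.map(&String.to_integer/1)
      {:error, _} -> []
    end
  end
end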

I think you should shrink the examples as much as possible, and that includes shrinking <<16, 16>> to <<0>>… I'll open an issue tomorrow.

Well, if including specific examples would complicate the code too much, then merely a high probability might be acceptable, especially if combined with a database of failed examples.

Nice job with the library!

This looks really great @whatyouhide! It seems similar in nature to Dave Thomas’s quixir which I’ve been using for a bit. I’m excited to see if I can make this work with some existing state machine properties that I have. I love that these tools may make their way into elixir itself.

It would make it much easier to test GenServers with this, though. That is something I like about Quviq's QuickCheck: it was able to reduce GenServer and remote process calls so nicely. :slight_smile:

I've been building model-based properties with quixir for a while. It works pretty well. Granted, you have to build pre-condition and post-condition checking yourself, and the shrinking is much less optimized for model properties than eqc's. Plus you don't get the PULSE scheduler for inducing and controlling race conditions, which is really the most critical piece IMO.

Yes, I would support this. My PropCheck library (https://hexdocs.pm/propcheck/readme.html) uses PropEr to do the major work, but provides Elixir-based APIs, including the state-based stuff for GenServers and the like. For practical purposes, you want to have this available; otherwise, property-based testing boils down to data structure testing only. PropEr has the disadvantages that it does not support maps (and thus structs), has not been updated in a long time (read: it is stable ;-)), and has a complicated code base. Therefore, a fresh alternative looks very promising.

The fundamental parts of state-based property testing are data generation (which is quite easy) and a good shrinking strategy (which is way harder). With those available, a state-based approach should be easily implementable on top.
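
To illustrate the "implementable on top" part, symbolic command sequences are themselves just generated data. A rough sketch (MyQueue is a hypothetical model, the combinators are the ones mentioned earlier in the thread, and pre-/post-conditions and command-aware shrinking would still have to be layered on top):

import StreamData

# Symbolic commands against a hypothetical MyQueue model.
command =
  map(int(), fn i ->
    if rem(abs(i), 2) == 0 do
      {:call, MyQueue, :push, [i]}
    else
      {:call, MyQueue, :pop, []}
    end
  end)

# A whole test case is then just a list of commands.
command_sequences = list_of(command)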

This is in quite a contrast to PULSE from Quviq, where they cleverly manipulate the Erlang scheduler to find race conditions. While again based on generation and shrinking, here the infrastructure is much more complicated, which becomes evident if you read their research papers.

I just published:

which is a post where I describe in detail the inner workings of StreamData for those of you who might be interested. :slight_smile:

10 Likes

Great article!

It could benefit from a fleshed out shrinking session in which you explicitly show how the tree is traversed.

Also, some details for people who are curious about the Hypothesis approach:

  1. Hypothesis' approach is very elegant. But you pay for it with more complexity in the core and with primitive generators that are harder to write. As @whatyouhide says, primitive generators must be written in a specific way, and even though the idea is simple, the details are quite involved. EDIT: composing already-built generators is very easy, though.

  2. Describing Hypothesis as taking a random bytestring and generating examples out of it makes it look way simpler than it is: doing this would require a mapping that can generate an example out of any random byte stream thrown at it. This is not true. Sometimes Hypothesis cheats and "generates" the bytes it needs in order to satisfy some data constraints (a list with at least 5 elements, for example; in this case Hypothesis might force some bytes to have the values it wants - see the toy sketch after this list). The corollary is that during shrinking some byte streams might generate invalid examples, which must be discarded. These are not major problems, but they must be managed carefully, and that takes more code.

  3. Shrinking is not done by only shrinking the byte stream. That would generate invalid data too often, and waste time on impossible shrinks. Hypothesis has a way of tagging some bytes as special, so that they will be shrunk in specific ways. In practice, shrinking has some restrictions. This again makes the core much more complex than just “eat some bytes and generate some objects”.
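
Here is a toy illustration of the "forcing bytes" idea from point 2 (not Hypothesis's code, just the shape of it): drawing a list reads a continue/stop byte before each element, and the flag is forced to "continue" while the list is below the required minimum length.

defmodule ForcedDraw do
  # gen is a function from a byte stream to {value, rest_of_stream}.
  # (Assumes gen can always draw from the remaining bytes; a toy, as said.)
  def list_of(gen, bytes, min_len, acc \\ []) do
    case bytes do
      <<flag, rest::binary>> ->
        # Force "continue" (non-zero flag) until min_len elements are drawn.
        flag = if length(acc) < min_len, do: 1, else: flag

        if flag == 0 do
          {Enum.reverse(acc), rest}
        else
          {elem, rest} = gen.(rest)
          list_of(gen, rest, min_len, [elem | acc])
        end

      <<>> ->
        {Enum.reverse(acc), <<>>}
    end
  end
end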

My understanding is that Hypothesis might unlock some shrinks that are hard to unlock when using the lazy tree approach, but I don't know if that's relevant in practice.

If you ever decide you want to go the byte stream route, I can send you my partial (and probably very buggy - I haven't tested it in ages) port of the Hypothesis core to Elixir. The Python code is challenging to understand in some parts, because it mutates objects like crazy and bends over backwards to undo some mutations when it decides to backtrack. An Elixir port should definitely return modified "generator state". That gives you undo for free.

4 Likes

I just tried this example:

property "lists of integers don't contain 42" do
  check all list <- list_of(int()) do
    assert 42 not in list
  end
end

And it shrank the list down to this, which initially threw me off quite a bit until I remembered charlists :smiley:

list <- list_of(int())
#=> '*'

Maybe there could at least be a warning, because the module does seem to know it’s dealing with integer values and not with charlists.
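
To be clear, that is just the default Inspect behaviour for lists of printable codepoints (?* is 42) rather than anything StreamData-specific, e.g.:

inspect([42])                          #=> "'*'"
inspect([42], charlists: :as_lists)    #=> "[42]"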