Hello folks,
There have been some questions about StreamData and property testing in Elixir, so we decided to open a thread to answer the common questions we have been asked in person and have seen around.
What is property-based testing?
Generally when we write tests, we write example-based tests. We need to come up with values when writing our test cases:
assert String.contains?("foobar", "foo")
The limitation of example-based testing is that it depends entirely on us to come up with corner cases, and we often make mistakes or miss important ones. With property-based testing, we define properties our code must satisfy and let the library generate random data to test them:
check all left <- string(:printable),
          right <- string(:printable) do
  assert String.contains?(left <> right, left)
  assert String.contains?(left <> right, right)
end
Now every time you run the property, 100 examples will be generated. Common corner cases, such as "" (the empty string), will be tested frequently and help you find bugs in your code. The tricky part of property-based testing is finding the properties we want our code to hold. Once we find them, we can use them to complement our example-based tests.
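To make the idea concrete, here is a minimal sketch in plain Elixir, with no dependencies, of what running a property involves: generate many inputs, check the property on each one, and report the first counterexample. The module PropertySketch and its functions random_strings and check_all are hypothetical names for illustration, not StreamData's API, and unlike StreamData the sketch does not shrink failing inputs.

```elixir
# Hypothetical sketch, NOT StreamData's implementation: no shrinking and a
# single hard-coded generator, just "generate inputs, check the property".
defmodule PropertySketch do
  # Hypothetical generator: an infinite stream of random lowercase strings
  # with 0 to 5 characters, so the empty string "" shows up often.
  def random_strings do
    Stream.repeatedly(fn ->
      len = Enum.random(0..5)

      Stream.repeatedly(fn -> <<Enum.random(?a..?z)>> end)
      |> Enum.take(len)
      |> Enum.join()
    end)
  end

  # Checks `property` against `runs` random {left, right} pairs.
  # Returns :ok, or {:error, {left, right}} for the first failing pair.
  def check_all(property, runs \\ 100) do
    random_strings()
    |> Stream.zip(random_strings())
    |> Enum.take(runs)
    |> Enum.find_value(:ok, fn {left, right} ->
      if property.(left, right), do: nil, else: {:error, {left, right}}
    end)
  end
end

# The property from the text: a concatenation contains both of its parts.
result =
  PropertySketch.check_all(fn left, right ->
    String.contains?(left <> right, left) and String.contains?(left <> right, right)
  end)

# A deliberately wrong property: it fails exactly when right is "".
failing =
  PropertySketch.check_all(fn left, right ->
    String.length(left <> right) > String.length(left)
  end)

IO.inspect(result, label: "concatenation property")
IO.inspect(failing, label: "wrong property")
```

The first property holds for every generated pair. The second one fails only when right is "", so any counterexample it reports has an empty right-hand side, which is exactly the kind of corner case that random generation keeps hitting.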
At ElixirConf US 2017, we announced that a property testing library will be part of Elixir v1.6. Our goal with this post is not to answer the technical questions behind StreamData, but rather to explain why it is being added to the language. For more information on property testing per se, the first three chapters of Fred's book are a great starting point. To learn more about StreamData itself, see its announcement.
Why did the core team decide to add property testing to Elixir?
There are usually two reasons why something is added to Elixir:
- We need it for building Elixir itself
- We believe it is an important concept/feature for the community
Property testing fits both.
For example, we had inconsistencies in Elixir's standard library that would not exist if we had used properties when implementing those functions. In Elixir v1.1, we deprecated calling String.contains?/2 with an empty string as the pattern, as in String.contains?(string, ""), because we were unsure how it should behave. Then we added it back in Elixir v1.2 because @ThomasArts showed us a property revealing that String.contains?/2 should return true for empty strings:
check all left <- string(:printable),
          right <- string(:printable) do
  assert String.contains?(left <> right, left)
  assert String.contains?(left <> right, right)
end
Now imagine that right is "". Then we get:
assert String.contains?(left <> "", left)
assert String.contains?(left <> "", "")
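Since Elixir v1.2, where, per the history above, String.contains?/2 returns true for an empty pattern, the substituted assertions can be evaluated directly. A small sketch; the value bound to left is an arbitrary choice for illustration:

```elixir
# Substituting right = "" into the property from the text.
left = "elixir"

checks = [
  # left <> "" is just left, which trivially contains left.
  String.contains?(left <> "", left),
  # The case that was once deprecated: an empty pattern.
  String.contains?(left <> "", ""),
  # The most extreme case: both sides empty.
  String.contains?("" <> "", "")
]

IO.inspect(checks)
# => [true, true, true]
```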
If we had used properties since day one, we would have avoided this back and forth on the Elixir API. It then became clear to us that property-based tests would not only help us find bugs in our code but also improve the design of our APIs. They can help us and the whole community write better software.
Isn't adding property testing to Elixir going to make it harder to learn?
Yes and no.
We should not expect all developers to learn property testing on their first day on the job. But by adding it to the language, we are saying that if you want to be a proficient Elixir developer, you should eventually learn property-based testing. We believe this is important because you will write better software with property-based testing in your toolbox.
Learning a new programming language and its ecosystem is a journey, and we care a lot about this journey. We are making this journey a bit longer, but the extra miles will be worth it.
We also understand there is a limit to how many features we can add to the language before making the journey too long or the language too big. Adding something now means not including something else later. As an exercise, let's look at a counterexample, a case where we didn't add something to the language: GenStage.
GenStage is a solution to a particular problem: interfacing with external systems. We don't need it to build Elixir itself, and we don't believe all developers need to know GenStage unless they are facing the particular problem GenStage is meant to address. It is a tool you reach for. In fact, we even made GenStage less necessary in our day-to-day work by adding parallel processing of collections directly to Elixir with a single function called Task.async_stream/2.
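For comparison, this is roughly what parallel processing of a collection looks like with the standard library alone. A minimal sketch; squaring numbers is an arbitrary stand-in for real work:

```elixir
# Process a collection in parallel with Task.async_stream, no GenStage needed.
results =
  1..5
  # Each element runs in its own task; max_concurrency shown explicitly
  # (System.schedulers_online() is also the default).
  |> Task.async_stream(fn n -> n * n end, max_concurrency: System.schedulers_online())
  # Each result arrives as {:ok, value}, in input order by default.
  |> Enum.map(fn {:ok, square} -> square end)

IO.inspect(results)
# => [1, 4, 9, 16, 25]
```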
Why have our own implementation of property testing instead of using an existing implementation?
The main reasons are:
- Since we want to bundle it as part of Elixir, the code should be open source with an appropriate license
- We wanted to add both data generation and property testing to Elixir. That's why the library is called stream_data instead of something named after property tests. The goal is to reduce the learning curve behind property testing by exposing the data generation aspect as streams, which is a known construct to most Elixir developers. We had this approach in mind for a while, and the first library we saw leveraging it in practice was @pragdave's pollution
- Finally, since the core team is taking responsibility for maintaining property testing as part of Elixir for potentially the rest of our lives, we want to have a full understanding of every single line of code. This is non-negotiable, as it guarantees we can continue to consistently improve the code as we move forward
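The appeal of "data generation as streams" is that deriving new data from existing generators uses functions Elixir developers already know. Below is a hypothetical sketch built only on the standard library's Stream and Enum modules; the variable names are illustrative, and this is not how StreamData defines its generators, although StreamData's generators can likewise be consumed as streams.

```elixir
# Hypothetical stdlib-only sketch: a "generator" is an infinite stream of
# random values. Illustrative only, not StreamData's API.
random_integers = Stream.repeatedly(fn -> Enum.random(-100..100) end)

# Deriving a new generator is plain stream composition.
even_integers = Stream.map(random_integers, &(&1 * 2))
non_zero_evens = Stream.reject(even_integers, &(&1 == 0))

# Consuming a sample works like any other stream.
sample = Enum.take(non_zero_evens, 5)
IO.inspect(sample)
```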
We understand rolling our own implementation has its downsides, especially since it lacks maturity compared to alternatives, but we balance this by actively seeking input from knowledgeable folks and by listening to the feedback that comes from the community, which we are very thankful for.
Finally, it is also important to add that StreamData does not fully replace existing solutions. The first version of StreamData provides only stateless properties. Other property testing libraries also include stateful testing. QuickCheck comes with even more advanced features, such as Pulse, a randomizing scheduler for the Erlang VM, which makes it great for finding race conditions in concurrent code.
Our hope is that property-based testing in Elixir also works as a stepping stone for developers looking for more complete solutions.
Your turn
I hope this initial discussion provides some insight into why StreamData and property testing are being added to Elixir. It certainly was not a decision made on a whim, nor is it an attempt by the Elixir team to chase buzzwords. It has been an area of interest for a while, and we are glad we are now finally able to work towards its inclusion in Elixir v1.6.
If you have questions, please let us hear them.