josevalim
Questions about Property Testing / Stream Data
Hello folks,
There have been some doubts regarding StreamData and property testing in Elixir, so we have decided to open a thread to answer common questions we have been asked in person and seen around.
What is property-based testing?
Generally when we write tests, we write example-based tests. We need to come up with values when writing our test cases:
assert String.contains?("foobar", "foo")
The limitation of example-based tests is that they depend entirely on us to come up with corner cases, and we often make mistakes or fail to see important ones. With property-based testing, we define properties and let those properties generate random data for our tests:
check all left <- string(),
          right <- string() do
  assert String.contains?(left <> right, left)
  assert String.contains?(left <> right, right)
end
Now every time you run the property, 100 examples will be generated. Common corner cases, such as "", will be tested frequently and help you find bugs in your code. The tricky part of property-based testing is finding the properties we want our code to hold. Once we have found such properties, we can use them to complement our example-based tests.
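For readers who want to run the property above, here is a minimal sketch of a standalone script. It assumes the `stream_data` package can be fetched with `Mix.install/1` (which needs a reasonably recent Elixir), the module name `ContainsPropertyTest` is made up for illustration, and it uses `string(:printable)` on the assumption that your stream_data release expects a kind argument.

```elixir
# Assumption: stream_data is fetched from Hex; adjust the version to your setup.
Mix.install([{:stream_data, "~> 1.0"}])
ExUnit.start()

# Hypothetical module name, used only for this sketch.
defmodule ContainsPropertyTest do
  use ExUnit.Case, async: true
  # Brings in property/2, check all, and the StreamData generators.
  use ExUnitProperties

  property "a concatenation contains both of its parts" do
    check all left <- string(:printable),
              right <- string(:printable) do
      assert String.contains?(left <> right, left)
      assert String.contains?(left <> right, right)
    end
  end
end
```

Saved as a `.exs` file and run with `elixir`, the property is executed by ExUnit when the script exits.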
At ElixirConf US 2017, we announced that a property testing library will be part of Elixir v1.6. Our goal with this post is not to answer the technical questions behind StreamData but rather to explain why it is being added to the language. For more information on property testing per se, the first three chapters of Fred’s book are a great starting point. To learn more about StreamData itself, see its announcement.
Why did the core team decide to add property testing to Elixir?
There are usually two reasons why something is added to Elixir:
- We need it for building Elixir itself
- We believe it is an important concept/feature for the community
Property testing fits both.
For example, we had inconsistencies in Elixir’s standard library that would not exist if we had written properties when implementing those functions. In Elixir v1.1 we deprecated String.contains?/2 with an empty string as a pattern, such as String.contains?(string, ""), because we were unsure of how it should behave. Then we added it back in Elixir v1.2 because @ThomasArts showed us a property that revealed String.contains?/2 should return true for empty strings:
check all left <- string(),
          right <- string() do
  assert String.contains?(left <> right, left)
  assert String.contains?(left <> right, right)
end
Now imagine that right is "". Then we get:
assert String.contains?(left <> "", left)
assert String.contains?(left <> "", "")
If we had used properties since day one, we would have avoided this back and forth on the Elixir API. It then became clear to us that property-based tests would not only help us find bugs in our code but also improve the design of our APIs. They can help us and the whole community write better software.
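The consequence of that property can be checked directly with the standard library. This is a small sketch with example values of my own choosing, not code from the Elixir test suite:

```elixir
left = "foo"
right = ""

# left <> "" is just left, so the first assertion holds trivially.
true = String.contains?(left <> right, left)

# The second assertion only holds if an empty pattern always matches,
# which is the behaviour the property pins down.
true = String.contains?(left <> right, right)
```

Both matches succeed on current Elixir releases, where an empty pattern is treated as contained in every string.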
Isn’t adding property testing to Elixir going to make it harder to learn?
Yes and no.
We should not expect all developers to learn property testing on their first day on the job. But, by adding it to the language, we are saying that if you want to be a proficient Elixir developer, then you should eventually learn property-based testing. We believe this is important because we are convinced you will write better software if you have property-based testing in your toolbox.
Learning a new programming language and its ecosystem is a journey and we care a lot about this journey. We are making this journey a bit longer but the extra miles will be worth it.
We also understand there is a limited number of features we can add to the language before making the journey too long or the language too big. Adding something now means not including something else later. As an exercise, let’s look at a counter-example of when we didn’t add something to the language: GenStage.
GenStage is a solution to a particular problem: interfacing with external systems. We don’t need it to build Elixir itself and we don’t believe all developers need to know GenStage unless they are facing the particular problem GenStage is meant to address. It is a tool you reach for. In fact, we even made GenStage less necessary in our day to day work by adding parallel processing of collections directly to Elixir with a single function called Task.async_stream/2.
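As a rough illustration of the parallel processing mentioned above, the following sketch squares a few numbers with `Task.async_stream`, using only the standard library. The input range and the squaring function are arbitrary choices for the example.

```elixir
# Each element is processed in its own task; by default the results
# are emitted in the same order as the input, wrapped in {:ok, value}.
squares =
  1..5
  |> Task.async_stream(fn n -> n * n end)
  |> Enum.map(fn {:ok, value} -> value end)

IO.inspect(squares)
# prints [1, 4, 9, 16, 25]
```

For simple "map over a collection concurrently" needs this is often enough, which is the point made above about GenStage being a tool you reach for only when you face the problem it addresses.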
Why have our own implementation of property testing instead of using an existing implementation?
The main reasons are:
- Since we want to bundle it as part of Elixir, the code should be open source with an appropriate license
- We wanted to add both data generation and property testing to Elixir. That’s why the library is called stream_data instead of something named after property tests. The goal is to reduce the learning curve behind property testing by exposing the data generation aspect as streams, which is a known construct to most Elixir developers. We had this approach in mind for a while and the first library we saw leveraging this in practice was @pragdave’s pollution
- Finally, since the core team are taking the responsibility of maintaining property testing as part of Elixir for potentially the rest of our lives, we want to have full understanding of every single line of code. This is non-negotiable as it guarantees we can continue to consistently improve the code as we move forward
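To make the "data generation as streams" point from the list above concrete, here is a small sketch. It assumes `stream_data` is fetched with `Mix.install/1`; the variable names are illustrative only.

```elixir
# Assumption: stream_data is fetched from Hex; adjust the version to your setup.
Mix.install([{:stream_data, "~> 1.0"}])

# Generators are enumerables, so the familiar Enum functions work on them.
samples = StreamData.integer() |> Enum.take(5)
IO.inspect(samples, label: "five random integers")

# Generators can also be transformed while remaining generators.
evens =
  StreamData.integer()
  |> StreamData.map(&(&1 * 2))
  |> Enum.take(5)

IO.inspect(evens, label: "five random even integers")
```

The values differ on every run, so no fixed output is shown here.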
We understand rolling our own implementation has its downsides, especially since it lacks maturity compared to alternatives, but we balance it by actively seeking input from knowledgeable folks and by listening to the feedback that comes from the community, which we are very thankful for.
Finally, it is also important to add that Stream Data does not fully replace existing solutions. The first version of Stream Data provides only stateless properties. Other property testing libraries also include stateful testing. QuickCheck comes with even more advanced features such as a randomizing scheduler for the Erlang VM called Pulse which makes it great for finding race conditions in concurrent code.
Our hope is that property-based testing in Elixir also works as a stepping stone for developers looking for more complete solutions.
Your turn
I hope this initial discussion provides some insight into why stream data / property testing is being added to Elixir. It certainly was not a decision made on a whim, nor is it an attempt by the Elixir team to chase buzzwords. It has been an area of interest for a while and we are glad we are now finally able to work towards its inclusion in Elixir v1.6.
If you have questions, please let us hear them.
josevalim
To show that we have always been proactive in discussing the major new features in every release, I will link to examples of flagship features that have been proposed and discussed with the community:
v1.5 - Behaviours, defimpl and overridable
and a follow-up later on about child specs in Google Groups. We also explicitly reached out to many other members in the community, such as book authors.
v1.4 - Registry
The Registry is one of my favorite examples because the community stepped up to validate that the Registry was indeed scalable, running benchmarks on machines up to 40 cores.
v1.4 - On GenStage and Flow not being added to core
When we decided GenStage and Flow should not be part of core, we communicated that too:
v1.4 - A declined proposal: removing char lists
This is actually a nice example of where we forgot to communicate why we decided not to add a feature. Then folks asked for clarification and we rectified it.
v1.3 - Calendar
For this one we worked directly with Paul and Lau, who were responsible for the existing calendar implementations, and we came up with a proposal for the data types:
This one has no discussion on Elixir Forum because I believe that’s about the time we were starting to ramp up on the forum. That’s why you won’t find previous proposals on the forum although they have always been on the mailing list.
All other proposals above can be found in the Elixir News section in the forum (which is what I did right now) or by searching in the mailing list.
Everything else
You will find the other features and bug fixes from previous releases in the issues tracker. Those don’t have proposals because they are smaller in scope. You will also see that many of them have been implemented by the community and not the core team.
v1.6 - A counter-example: the code formatter
There is actually a counter-example where the development happened behind “closed doors”. This was a deliberate decision because style discussions can be quite opinionated and heated. So unless it is clear to everyone that the goal is consistency and not personal preference, it can be very hard to make progress (and we did have some heated discussions in the core team!).
Still, when the prototype was done, we merged it into Elixir and got more than 200 PRs from 84 people over the course of 3 days to format the Elixir codebase. The feature is in master now, about 3 months before the next release, for anyone who wants to give it a try.
josevalim
Sorry for continuing to add replies, but this is a thread to answer questions after all.
I definitely have this concern too. What if I am run over by a bus? What if I get bored of development à la Office Space?
It is one of the reasons why development and communication are open. If I disappear, hopefully the language goals and ideals have echoed to many developers who will be able to carry the torch.
That’s also why we have a core team and I actively delegate areas of the codebase that I have been the only person to work on to the other team members. In Elixir v1.5 there has been an effort in improving the Elixir compiler internals where I put little work exactly with this concern in mind. Overall the code is well tested and well documented.
And that’s also why I love initiatives like Elixir School and Elixir Forum, because they are run by the community and for the community, without any involvement of the Elixir team.
Finally, it is also one of the reasons why I have said many times, including in this thread, that we can only add so much to the language:
A big language does not only fragment the community and make it harder to learn, it is also harder to maintain. It is also why the language was designed to be extensible: so the community could build what is necessary without a push to make everything part of the language.
This leads me to the next topic.
For those who joined Elixir early on, it definitely has become harder and harder to participate in the language evolution. I don’t dispute that. However, I don’t think it is because of a lack of communication, which hopefully I showed above is still ongoing and present, but because of many other factors:
- The language is no longer changing as rapidly so there are fewer opportunities to get involved
- The number of features we are adding to the language is reducing (as it should)
- The changes have become more focused and specialized (which requires more time investment to participate)
- The community growth means the more accessible issues are addressed really fast because there is always someone ready to contribute
- The community growth means discussions develop fast. If you join late, you will need to catch up on a big backlog (which is why I am very vocal about not side-tracking discussions)
When someone asks me how to contribute to Elixir, I always talk about the community and the ecosystem. That’s where the focus should be.
AstonJ
I started typing a reply to this last night but fell asleep as it was 5am… José has covered everything, but I just wanted to add that we’ve witnessed somewhat the opposite of what Sean describes here on the forum and, since it has been raised, to explain what we are trying to do to help José and Chris as the community grows.
With regards to community involvement, José has consistently posted in the elixir-news section with lots of threads seeking feedback (many more in his post above). Chris has done the same in the phoenix-forum and quite often discussions that have taken place have led to quite fundamental changes or even minor (but helpful nonetheless) tweaks.
In fact, many a time I have heard people mention that it’s a huge plus having the core team active in the community, which is in contrast to other languages such as Ruby where the majority of the core team do not speak English.
Having said that, I completely appreciate that as a community grows it does indeed become more challenging to get everyone involved. Again José has highlighted many reasons why that might be (and the fact that there’s more people to get stuff done is a great problem to have) but I think it’s also worth adding that when it comes to discussing important decisions - as a community grows it becomes significantly more time consuming reviewing (and sometimes responding to) everything everyone says, particularly when you’re discussing things with people of different backgrounds and levels of understanding or when discussions diverge. Just as an example, if you look at José’s last few posts in this thread these must have taken at least an hour or two of his time.
There are tools and strategies we have and can put in place (such as post likes) that can help. Putting the onus on people to do their best to be constructive and demonstrate their point of view as effectively as they can (so the good ideas attract lots of likes, and the not so good ideas can be openly challenged) also alleviates some of the pressure on the core team, as they can quickly see which ideas are gaining traction, and hopefully someone has already addressed issues that might otherwise require a response from them (as might have happened in this thread had I not fallen asleep).
I’m acutely aware of the burden a growing community can place on the core teams, and part of the goal of this forum is to help and support them, José and Chris included, as the community grows, in a way that eases as much of that burden as we can while still remaining open and receptive to new ideas and moving the conversation forward.
It can be a tricky path, and I think we’ve done a good job so far, but I’ll let everyone else be the judge of that.
The only other thing I wanted to comment on is that, again from my perspective, I sense a huge sense of community, at least here on the forum. I often get messages from members saying how much they love being part of this community and how it is one of the best communities they’ve ever been a part of.