To (Gen)Serve or not to (Gen)Serve

tmbb · August 5, 2017, 2:07pm

There is a Python library named Hypothesis, which I’d like to have in Elixir, so I’m thinking about porting it. That library is a property testing framework with a somewhat different approach to generating random test data.

Basically, the only data it generates is sequences of bytes, and it has strategies to turn those bytes into meaningful datatypes (like maps, lists, lists of lists, etc).

The test case generators draw random bytes (the number of bytes might depend on the bytes that came before). This is easy to do in Elixir, of course (there is a random number module, so I’ve got that covered).
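To make the idea concrete, here is a minimal sketch of a strategy whose draw size depends on an earlier byte. The module and function names (BytesToData, list_of_ints) and the "first byte modulo 4 is the length" rule are assumptions made up for illustration, not how Hypothesis actually encodes lists.

```elixir
defmodule BytesToData do
  # Hypothetical strategy: decode a list of small integers from a binary.
  # The first byte decides how many of the following bytes are consumed.
  def list_of_ints(<<len_byte, rest::binary>>) do
    len = rem(len_byte, 4)
    <<items::binary-size(len), leftover::binary>> = rest
    {:binary.bin_to_list(items), leftover}
  end
end

# Eight random bytes built with the :rand module from the Erlang standard library.
random = for _ <- 1..8, into: <<>>, do: <<:rand.uniform(256) - 1>>
{list, _leftover} = BytesToData.list_of_ints(random)
IO.inspect(list)
```

The same function works unchanged on a binary that was recorded earlier, which is what makes shrinking at the byte level possible.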

But then, during test case shrinking, I have to shrink the sequence of bytes and generate the test case from that sequence of bytes instead. It’s important to have a generic interface to extract bytes from either a random source or a fixed source (a binary I’ve already generated). That way I can just call get_bytes(n) and have n new (maybe random, maybe fixed) bytes to work with, independently of their source.

Python being Python, Hypothesis seems to handle this by defining objects with internal state from which you can read bytes. Those objects can return either random data or fixed data that was put there at initialization time.

So, how to port this to Elixir?

The naive way is to define a GetBytes behaviour and write a GenServer/Agent that implements it. Then, to generate the test cases, I use a random server that returns actual random data (while saving the data for later use). When I begin shrinking, I can just spawn a server with the right byte sequence as state and draw from there. That way, the rest of my code doesn’t need to know where the bytes are coming from.
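A rough sketch of what such a server could look like, assuming a hypothetical ByteServer module that is started either with :random or with a binary to replay. Returning fewer bytes when the fixed data runs out is also just an assumption here; a real port would need a proper policy for that case.

```elixir
defmodule ByteServer do
  use GenServer

  # Start with :random to draw fresh bytes, or with a binary to replay it.
  def start_link(source), do: GenServer.start_link(__MODULE__, source)

  def get_bytes(pid, n), do: GenServer.call(pid, {:get_bytes, n})

  # Everything handed out so far (only supported by a random server).
  def drawn(pid), do: GenServer.call(pid, :drawn)

  @impl true
  def init(:random), do: {:ok, {:random, <<>>}}
  def init(fixed) when is_binary(fixed), do: {:ok, {:fixed, fixed}}

  @impl true
  def handle_call({:get_bytes, n}, _from, {:random, log}) do
    bytes = random_bytes(n)
    # Remember what was handed out so it can be replayed later.
    {:reply, bytes, {:random, log <> bytes}}
  end

  def handle_call({:get_bytes, n}, _from, {:fixed, data}) do
    take = min(n, byte_size(data))
    <<bytes::binary-size(take), rest::binary>> = data
    {:reply, bytes, {:fixed, rest}}
  end

  def handle_call(:drawn, _from, {:random, log} = state), do: {:reply, log, state}

  defp random_bytes(0), do: <<>>

  defp random_bytes(n) when n > 0 do
    for _ <- 1..n, into: <<>>, do: <<:rand.uniform(256) - 1>>
  end
end

{:ok, random} = ByteServer.start_link(:random)
first = ByteServer.get_bytes(random, 4)

# Replay the recorded bytes through the same interface.
{:ok, replay} = ByteServer.start_link(ByteServer.drawn(random))
^first = ByteServer.get_bytes(replay, 4)
```

The calling code only ever sees a pid and get_bytes/2, so it cannot tell which kind of server it is talking to.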

This seems very easy, and quite similar to the Python version. But on the other hand I don’t know if this is the best use case for a GenServer. It’s basically being used like a Python object after all. But I can’t think of any other abstraction that allows me to keep drawing a sequence of bytes as if they were random, while being actually deterministic.

zazaian · August 5, 2017, 4:10pm

I’ll respond more thoroughly to your questions about implementation later, but for the moment, have you looked into any of the existing property-based testing frameworks for Erlang? While I think QuickCheck was originally written for Haskell, it basically found its home in the Erlang community because of its industrial background, and to my knowledge it should be possible to integrate into an Elixir project: http://www.quviq.com/products/erlang-quickcheck/

There’s also a very advanced property-based testing library called PropEr, which was written and is maintained by Kostis Sagonas and his team in his lab at the National Technical University of Athens. That may be worth looking into as well: http://proper.softlab.ntua.gr/.

Additionally, it seems Dave Thomas has released a pure-Elixir property-based testing lib, though I haven’t dug into it. May be worth taking a look at: https://github.com/pragdave/quixir

I certainly don’t mean to discourage you from writing another pure-Elixir property-based testing lib, but I thought this might help ground you if you’re not already familiar with these well-established libraries (especially QuickCheck and PropEr).

peerreynders · August 5, 2017, 4:17pm

Is the following approach viable?

{bytes, new_source} = MyModule.get_bytes(source, n)

where source/new_source is simply an opaque structure that captures the “source’s” current state.
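As a minimal sketch of that shape, the random and fixed sources could share one struct. The module name ByteSource, its fields, the helper names and the default seed are all assumptions for illustration. The random variant threads the :rand state explicitly via :rand.seed_s/2 and :rand.uniform_s/2, so no process state is involved.

```elixir
defmodule ByteSource do
  # Hypothetical opaque source. For :random, `state` is {rng_state, log_of_drawn_bytes};
  # for :fixed, `state` is the binary that is still left to hand out.
  defstruct [:mode, :state]

  def random(seed \\ {1, 2, 3}) do
    %__MODULE__{mode: :random, state: {:rand.seed_s(:exsss, seed), <<>>}}
  end

  def fixed(bytes) when is_binary(bytes), do: %__MODULE__{mode: :fixed, state: bytes}

  def get_bytes(%__MODULE__{mode: :random, state: {rng, log}} = source, n) do
    {bytes, rng} = draw(rng, n, <<>>)
    {bytes, %{source | state: {rng, log <> bytes}}}
  end

  # Assumption: an exhausted fixed source simply returns fewer bytes.
  def get_bytes(%__MODULE__{mode: :fixed, state: data} = source, n) do
    take = min(n, byte_size(data))
    <<bytes::binary-size(take), rest::binary>> = data
    {bytes, %{source | state: rest}}
  end

  # All bytes drawn so far from a random source.
  def drawn(%__MODULE__{mode: :random, state: {_rng, log}}), do: log

  defp draw(rng, 0, acc), do: {acc, rng}

  defp draw(rng, n, acc) when n > 0 do
    {value, rng} = :rand.uniform_s(256, rng)
    byte = value - 1
    draw(rng, n - 1, <<acc::binary, byte>>)
  end
end

source = ByteSource.random()
{a, source} = ByteSource.get_bytes(source, 3)
{b, source} = ByteSource.get_bytes(source, 2)

# Replaying the recorded bytes yields the same draws.
replay = ByteSource.fixed(ByteSource.drawn(source))
{^a, replay} = ByteSource.get_bytes(replay, 3)
{^b, _replay} = ByteSource.get_bytes(replay, 2)
```

The cost is that every caller has to carry the returned source forward; the benefit is that a draw is an ordinary function call that can be repeated or inspected without any server running.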

FYI - there is also: PropEr: a QuickCheck-inspired property-based testing tool for Erlang

tmbb · August 5, 2017, 4:27pm

Yes, I’ve looked at all of them briefly.

Quviq’s QuickCheck is probably amazing, but none of my projects in Elixir are likely to generate enough income to pay for it.

The other two implementations both seem to generate types directly instead of following the approach I’d like to try - generating test cases from strings of random bytes. This has some advantages I’d like to explore, even though such advantages might not be as important in Elixir as they are in Python, because most (all?) Elixir terms can be serialized, unlike Python’s objects.

Among the free ones, PropEr was the one that looked better on paper, but it’s GPL, which might be too restrictive for me to use… And in any case it generates objects directly, instead of generating them from byte streams, and I’d really like to explore the byte stream approach.

But anyway, thanks for the tips.

tmbb · August 5, 2017, 4:51pm

That looks nice. Yes, it might be viable, and it can do “the same” as a process. It is not as concise as using a PID, but that’s probably a bad reason to use processes.

tmbb · August 5, 2017, 6:06pm

But even if this is viable and more obvious than a process, what would be the actual advantages? Isn’t the whole point of agents to help us fake mutable variables and internal state where convenient?

peerreynders · August 5, 2017, 7:37pm

facilis descensus Averno
the descent to Avernus [the underworld] is easy : the road to evil is smooth
Virgil, Aeneis, Book VI, 126.

or more plainly “convenience can be the root of all sorts of evils”.

https://twitter.com/jessitron/status/333228687208112128?lang=en

Kernel.spawn/3, Kernel.send/2 and Kernel.SpecialForms.receive/1 are called concurrency primitives - i.e. they were created to enable and support multiple, independent flows of execution - the end game wasn’t to furnish the building blocks for a container of mutable state inside an immutable environment.

Now both Agents and GenServers use recursion to maintain state - but that shouldn’t be mistaken as a license to go on a mutable state rampage à la OO (actually Agents get mixed reviews - easy to learn but not all that useful).
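For anyone who hasn’t seen it spelled out, this is roughly what “using recursion to maintain state” means: a hand-rolled process loop in which the state is just the argument of a recursive function. The Counter module and its message shapes are made up for illustration; this is not how GenServer is implemented internally.

```elixir
defmodule Counter do
  # Spawn a process whose only state is the argument to loop/1.
  def start(initial), do: spawn(fn -> loop(initial) end)

  defp loop(count) do
    receive do
      {:increment, by} ->
        # The "mutation" is a recursive call with a new argument.
        loop(count + by)

      {:get, caller} ->
        send(caller, {:count, count})
        loop(count)
    end
  end
end

pid = Counter.start(0)
send(pid, {:increment, 2})
send(pid, {:get, self()})

value =
  receive do
    {:count, count} -> count
  end
```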

Mutable state is necessary to “get stuff done” but it is also useful to minimize the reliance on mutable state - even when it is inconvenient - in order to keep things “easy to reason about”. In the Clojure community you often come across the notion that good (immutable) functional design “pushes the impure aspects to the edges of the system” - that type of design should still be a desirable goal with BEAM languages.

Coming from OO it’s very tempting to look for “object equivalents” but in FP it’s all about data transformation; processes are essentially services.

tmbb · August 5, 2017, 8:26pm

Oh, I certainly don’t like agents and this is the only time I’ve considered using them. My first accepted contribution to an open source Elixir project was basically me replacing an Agent with a GenServer.

I would never do this for my own convenience as the library developer. It’s because it would be more convenient for the end user, although Elixir users would also find returning a new source natural.

The thread you linked to was very educational for me. I am on the “Agents are bad” side, even if I have considered making a pact with the devil just this once. And I might have used a GenServer for that.

But the best part of that thread is the discussion of the process dictionary and the way the random module uses it to avoid having to thread the state. It’s still mutable state, but it looks like a better fit.

It is way more self contained than an Agent (state is local to the current process), and might be useful for my use case. I’m now very undecided.
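To spell out what I have in mind, a rough sketch of the process dictionary variant. The module name PdictSource, the :byte_source key and the helper names are made up for illustration, and I’m assuming an exhausted fixed source just returns fewer bytes.

```elixir
defmodule PdictSource do
  # Hypothetical key under which the current source lives in the process dictionary.
  @key :byte_source

  def use_random, do: Process.put(@key, {:random, <<>>})
  def use_fixed(bytes) when is_binary(bytes), do: Process.put(@key, {:fixed, bytes})

  # Same call for both kinds of source; the state is updated behind the scenes.
  def get_bytes(n) do
    case Process.get(@key) do
      {:random, log} ->
        bytes = random_bytes(n)
        Process.put(@key, {:random, log <> bytes})
        bytes

      {:fixed, data} ->
        take = min(n, byte_size(data))
        <<bytes::binary-size(take), rest::binary>> = data
        Process.put(@key, {:fixed, rest})
        bytes
    end
  end

  # Bytes drawn so far from a random source.
  def drawn do
    {:random, log} = Process.get(@key)
    log
  end

  defp random_bytes(0), do: <<>>

  defp random_bytes(n) when n > 0 do
    for _ <- 1..n, into: <<>>, do: <<:rand.uniform(256) - 1>>
  end
end

PdictSource.use_random()
first = PdictSource.get_bytes(4)

# Switch the current process over to replaying what was just drawn.
PdictSource.use_fixed(PdictSource.drawn())
^first = PdictSource.get_bytes(4)
```

This keeps the get_bytes(n) call I wanted, but the source is invisible in the function signatures, and any other process (a Task, for example) starts with no source at all.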

Your approach is certainly the virtuous and pure one, while using the process dictionary does lead down the smooth road to hell… I’ll have to think about it, thanks!

I’ll just wait a little longer for other perspectives before marking your answer as a solution.

peerreynders · August 5, 2017, 8:46pm

FYI - for some background: