Property-based testing slow when hitting the database

I’m trying to test an Ecto query that joins a few tables and has a lot of where conditions that have to be satisfied before selecting rows. I was thinking that property-based testing would be a nice way to generate a lot of permutations to make sure the filtering works as expected. The setup involves creating some entities and calling context functions that also hit the database. Generating the data works, but the test takes quite some time. I could conclude that property-based testing is just not suited to tests that hit the database, because it’s too slow to generate all the required state, but I’m not sure that’s really true. Is this still a viable option somehow (for example, by turning down the :max_runs setting for these kinds of tests)?

I’m new to property based testing, so I appreciate all feedback.

We have a few property-based tests that hit the database, and they are indeed slow. My general tips would be:

  1. use `async: false` and the shared sandbox mode so that you can insert common records once, ahead of all the individual property iterations (see the sandbox sketch after this list).

  2. benchmark individual iterations; maybe your code is just slower than it has to be and you can make it faster (see the timing snippet after this list).

  3. accept the slowness and set a time limit. Make this time limit configurable, and periodically run with a larger limit when you are changing related code and want to be extra thorough (see the `max_run_time` sketch after this list).

  4. consider setting up a Postgres database optimized for testing. This is a larger topic, but a lot of Postgres’s data-integrity features are unimportant when running tests (see the config sketch after this list).
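
For point 1, a minimal sketch assuming Ecto’s SQL sandbox and ExUnit; `MyApp.Repo` and the fixture helper are placeholders:

```elixir
defmodule MyApp.FilterQueryTest do
  # `async: false` plus shared mode: every process spawned during the
  # test reuses this connection, so records inserted here are visible
  # to all property iterations.
  use ExUnit.Case, async: false
  use ExUnitProperties

  alias Ecto.Adapters.SQL.Sandbox

  setup do
    :ok = Sandbox.checkout(MyApp.Repo)
    Sandbox.mode(MyApp.Repo, {:shared, self()})

    # Insert the records every iteration depends on, once per test.
    # (insert_common_records!/0 is a hypothetical fixture helper.)
    MyApp.Fixtures.insert_common_records!()
    :ok
  end
end
```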
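
For point 2, `:timer.tc/1` gives a quick per-iteration number before reaching for a full benchmarking tool; the generator and context functions here are hypothetical stand-ins:

```elixir
property "filtering returns only matching rows" do
  check all params <- filter_params_generator(), max_runs: 25 do
    # :timer.tc returns {microseconds, result}; logging slow iterations
    # shows whether data generation, inserts, or the query dominates.
    {micros, rows} = :timer.tc(fn -> MyApp.Catalog.list_items(params) end)
    if micros > 50_000, do: IO.puts("slow iteration: #{micros} µs")

    assert Enum.all?(rows, &matches_filters?(&1, params))
  end
end
```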
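
For point 3, recent StreamData versions accept a `:max_run_time` option (in milliseconds) alongside `:max_runs`; reading it from an environment variable makes the budget configurable, as suggested. The variable name is made up:

```elixir
# PROP_MAX_RUN_TIME=60000 mix test  # occasional, more thorough run
@max_run_time String.to_integer(System.get_env("PROP_MAX_RUN_TIME", "2000"))

property "filtering holds within the time budget" do
  check all params <- filter_params_generator(), max_run_time: @max_run_time do
    assert valid_selection?(MyApp.Catalog.list_items(params), params)
  end
end
```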
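
And for point 4, one relaxation you can apply from the application side: `synchronous_commit` can be turned off per session via Postgrex’s `:parameters` option. Server-level settings such as `fsync = off` or `full_page_writes = off` have to go into postgresql.conf or the container command line instead. A sketch for a test config:

```elixir
import Config

# config/test.exs — relax commit durability for the test repo only.
# Fine for a disposable test database, never for production.
config :my_app, MyApp.Repo,
  pool: Ecto.Adapters.SQL.Sandbox,
  parameters: [synchronous_commit: "off"]
```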


It could even be possible to use an in-memory SQLite database just for the property tests, and have the regular tests hit the same database as production.
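
Something like this, assuming the `ecto_sqlite3` adapter; `MyApp.PropRepo` is a hypothetical second repo used only by those tests:

```elixir
# A second repo, pointed at SQLite instead of Postgres.
defmodule MyApp.PropRepo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.SQLite3
end

# config/test.exs — in-memory database, gone when the test run ends.
config :my_app, MyApp.PropRepo,
  database: ":memory:",
  pool_size: 1
```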

PostgreSQL and SQLite differ too much for those tests to be meaningful.


Thanks for sharing those valuable tips and experiences!

I see how I can make some improvements by moving some code into a common setup block that runs once, before the assertions over the generated data.

I’m not sure I understand why the sandbox mode has to be shared, though. Am I correct to assume that using the property/3 and check/1 macros (I’m using the StreamData library for property-based testing) doesn’t change anything about the fact that all this code uses the same connection and runs in the same, potentially large, transaction? Or does each iteration of the generator run in its own separate transaction?

Regarding a database optimised for testing: does that mean going for a weaker isolation level, or are you hinting at another relaxation somewhere?

Depends on how you set it up. The sandbox is checked out and checked back in wherever you, or code owned by you, calls the APIs to do so. You can do both: a sandbox per generated value(s), or one sandbox for the whole test.
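
A sketch of the per-generated-value variant: check a connection out inside `check all` and back in afterwards, so each iteration starts from a clean slate (the fixture and assertion helpers are placeholders):

```elixir
property "each iteration sees a fresh database" do
  check all params <- filter_params_generator() do
    # Fresh sandboxed connection (and transaction) for this value.
    :ok = Ecto.Adapters.SQL.Sandbox.checkout(MyApp.Repo)

    try do
      MyApp.Fixtures.insert_rows_for!(params)
      assert valid_selection?(MyApp.Catalog.list_items(params), params)
    after
      # Roll the transaction back before the next generated value.
      Ecto.Adapters.SQL.Sandbox.checkin(MyApp.Repo)
    end
  end
end
```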

I’d also add a completely different angle here. It sounds like you have a bunch of business logic which, in the end, should result in an appropriate selection of records. You generally cannot really test Ecto.Query structs directly, as their contents are mostly private API, so most of the time we default to executing the query in order to test it.

Another option would be separating the “business logic that builds up the query details” from “turning those query details into an Ecto.Query”. You can use a custom struct to aggregate all the information you need to build the final query. That struct is fully unit-testable, with no private fields involved. If you also implement the Ecto.Queryable protocol for it, you can use that custom struct wherever an Ecto.Queryable is accepted. The protocol implementation can then be a fairly simple (and far less complex to test) translation of the data in your struct into an Ecto.Query.
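
A sketch of that separation; the struct fields, the `MyApp.Item` schema, and the filter rules are invented for illustration:

```elixir
defmodule MyApp.ItemSearch do
  # Plain data: the business logic only fills in these fields, so it
  # can be unit-tested without touching the database or an Ecto.Query.
  defstruct min_price: nil, category_ids: [], only_active?: true

  defimpl Ecto.Queryable do
    import Ecto.Query

    # The one place that knows how to turn the struct into a query.
    def to_query(%MyApp.ItemSearch{} = search) do
      from(i in MyApp.Item)
      |> maybe_min_price(search.min_price)
      |> maybe_categories(search.category_ids)
      |> maybe_active(search.only_active?)
    end

    defp maybe_min_price(query, nil), do: query
    defp maybe_min_price(query, min), do: where(query, [i], i.price >= ^min)

    defp maybe_categories(query, []), do: query
    defp maybe_categories(query, ids), do: where(query, [i], i.category_id in ^ids)

    defp maybe_active(query, false), do: query
    defp maybe_active(query, true), do: where(query, [i], i.active)
  end
end

# Usable anywhere an Ecto.Queryable is accepted:
#   MyApp.Repo.all(%MyApp.ItemSearch{min_price: 10})
```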
