How to approach property-based testing for features which heavily rely on the database?

IwoHerka · August 30, 2023, 7:23pm

Let’s say I’m writing a (rather typical) application using Ecto, which relies heavily on a database.
Most features read from or write to the database. Say I have a feature “foo” which reads from multiple tables, does some non-trivial transformations, and spits out the result. This feature has a simple interface in the form of a single public function, foo/1.

How should I approach property-testing this function?

My approach, so far, is as follows (I’m using PropCheck):

1. Write some generators.
2. Set up the test framework (ExUnit + Ecto.Adapters.SQL.Sandbox).
3. Spin up a forall in a single test.
4. Seed the generated data into the database.
5. Perform the test.
6. Repeat the last three steps X times.

In code:

property "bla bla bla" do
  forall dataset <- Gen.foo_dataset() do
    Repo.transaction(fn ->
      seed(dataset)
     
      # perform the test
      assert ... = foo(...)
     
      Repo.rollback(:test_end)
    end)
  end
end

(I’m skipping some non-essential bits)

The problems are:

- A lot of data is generated, so the tests are very expensive due to all the database inserts.
- The Ecto sandbox is useless here because I’m inside a property (which is basically a loop), so a transaction must be used to roll back the data at the end of each run.

I’m relatively new to Elixir, so I’m wondering: is this an acceptable approach? Can this be done better conceptually?

Thanks!

sodapopcan · August 30, 2023, 8:32pm

Is there any business logic happening in the database itself? If not, I would decouple storage from the processing code and unit test the processing code with properties. Then have simple integration tests that interact with the db.

If you’re concerned about making things public for testing purposes, this is where module hierarchy can come into play. Undocumented modules (@moduledoc false) are conventionally considered private. You can have a public foo facade function and break stuff up underneath into modules. If you want safety and explicitness around this, there is a brilliant library called Boundary.
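A minimal sketch of that split, under stated assumptions: the module name `Foo.Core`, the function `total/1`, and the `{quantity, unit_price}` row shape are all hypothetical stand-ins for whatever foo/1 actually computes. The pure module never touches the Repo, so a property can call it thousands of times without a single insert.

```elixir
defmodule Foo.Core do
  # Conventionally private: hidden from docs, but still callable from tests.
  @moduledoc false

  # Pure transformation over already-loaded rows (hypothetical shape:
  # a list of {quantity, unit_price} tuples). No database access here.
  def total(rows) do
    Enum.reduce(rows, 0, fn {qty, price}, acc -> acc + qty * price end)
  end
end

# With PropCheck, a property over the pure core might look like this,
# inside a test module that has `use ExUnit.Case` and `use PropCheck`:
#
#   property "total is additive over concatenation" do
#     forall {a, b} <- {rows(), rows()} do
#       Foo.Core.total(a ++ b) == Foo.Core.total(a) + Foo.Core.total(b)
#     end
#   end
#
# where rows() is a hypothetical helper returning
# list({non_neg_integer(), non_neg_integer()}).
```

The public foo/1 would then only load the rows and delegate to the core, and a handful of ordinary integration tests can cover that thin layer against the real database.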

If your db does do some business logic, then I’m sorry, I don’t have a good suggestion, and I’m interested in this myself. I’m wondering if there is a way to combine mocks with properties so you could mock the database. This is something I haven’t looked into.
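One way the mock idea could look, purely as an untested-in-practice sketch: the behaviour `Foo.Store`, the stub `Foo.StubStore`, and the `line_items/1` callback are invented for illustration, and the arithmetic inside `foo/2` is a placeholder for the real transformations. Note that a stub like this cannot reproduce logic that lives in the database (triggers, constraints), so it only helps for the parts that run in Elixir.

```elixir
defmodule Foo.Store do
  @moduledoc false
  # Hypothetical behaviour: the only read that foo needs from storage.
  @callback line_items(order_id :: term()) :: [{non_neg_integer(), non_neg_integer()}]
end

defmodule Foo.StubStore do
  @moduledoc false
  @behaviour Foo.Store

  # Stash generated rows in the process dictionary, so each test process
  # sees only its own data (no state shared between async tests).
  def seed(rows_by_order), do: Process.put(__MODULE__, rows_by_order)

  @impl true
  def line_items(order_id) do
    __MODULE__ |> Process.get(%{}) |> Map.get(order_id, [])
  end
end

defmodule Foo do
  # The store module is passed in; production code would pass an
  # Ecto-backed implementation of Foo.Store instead of the stub.
  def foo(order_id, store) do
    order_id
    |> store.line_items()
    |> Enum.reduce(0, fn {qty, price}, acc -> acc + qty * price end)
  end
end
```

Inside a forall, each run would call `Foo.StubStore.seed/1` with the generated dataset and then check `Foo.foo(id, Foo.StubStore)`, with no inserts at all.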

LostKobrakai · August 31, 2023, 7:21am

You can still use the sandbox for that. You need to manually tell it when to run, though; see “Possibility to give forall a cleanup hook?” (Issue #169, alfert/propcheck on GitHub).
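One reading of "manually tell it when to run" is to check a sandbox connection out and back in around every forall iteration, so each generated dataset is rolled back on checkin. The sketch below assumes that reading; the helper `SandboxPerRun` is hypothetical, and the `sandbox` argument is injectable only so the snippet can be exercised without a database. It also assumes the test's setup has not already checked out a connection, otherwise `checkout/1` returns `{:already, :owner}` and the `:ok` match fails.

```elixir
defmodule SandboxPerRun do
  # Runs `fun` with its own sandbox connection, then checks the connection
  # back in, which rolls back everything `fun` wrote.
  def run(repo, fun, sandbox \\ Ecto.Adapters.SQL.Sandbox) when is_function(fun, 0) do
    :ok = sandbox.checkout(repo)

    try do
      fun.()
    after
      # Runs even if `fun` raises, so a failing run cannot leak data.
      sandbox.checkin(repo)
    end
  end
end

# Possible use inside the property (Repo, seed/1 and check/1 are the
# application's own or hypothetical; check/1 must return a boolean):
#
#   forall dataset <- Gen.foo_dataset() do
#     SandboxPerRun.run(Repo, fn ->
#       seed(dataset)
#       check(dataset)
#     end)
#   end
```

Because the function returns whatever `fun` returns, the boolean result of the check flows straight back to forall, which avoids wrapping the body in `Repo.transaction/1` and `Repo.rollback/1`.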

a8t · September 2, 2023, 11:04am

You might want to pick a random subset of the data for local dev, and only in CI run all the inserts.
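A small sketch of that idea, assuming the CI system sets a `CI` environment variable to `"true"` or `"1"` (common, but not universal). The module `SeedSubset`, its options, and the default limit of 20 are all hypothetical.

```elixir
defmodule SeedSubset do
  # Returns all rows on CI, and a random sample of at most `:limit`
  # rows elsewhere. Both settings can be overridden through opts.
  def pick(rows, opts \\ []) do
    ci? = Keyword.get(opts, :ci?, System.get_env("CI") in ["true", "1"])
    limit = Keyword.get(opts, :limit, 20)

    if ci?, do: rows, else: Enum.take_random(rows, limit)
  end
end
```

Calling `seed(SeedSubset.pick(rows))` would then keep local runs cheap. Two caveats: sampling rows independently can break references between tables, and randomness outside the generator is invisible to PropCheck, so shrinking and replaying a counterexample may not reproduce the same subset. Generating a smaller dataset in the generator itself sidesteps both.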