What's the recommended way of testing the behavior of SQL queries generated by Ecto?

Stratus3D · December 26, 2024, 3:15pm

I’ve been reading “Mastering PostgreSQL in Application Development” book by Dimitri Fontaine and recommends testing the behavior of the query code you write with tools like https://pgtap.org/ and GitHub - dimitri/regresql: Regression Testing your SQL queries. I feel like this is useful because business logic is codified in queries, and that logic should be tested. I specifically want to test the behavior of the generated query. I don’t want to test Ecto itself, and I don’t want to just assert that code produces the expected SQL, as the structure of the SQL can change without affecting the results it returns (example: switching from a subquery to a join).

With the Elixir tests I’ve written for past projects I’ve tested everything together. Tests exercise the Elixir code and run the SQL generated by Ecto, and Ecto sandbox is used so the tests run in isolation.

Perhaps I should place Ecto query code in individual functions, and then write Elixir tests that run those queries and assert they return the correct results? Thoughts?

Related threads:

benwilson512 · December 26, 2024, 4:31pm

This is generally the approach I advocate. If there are some specifically complicated SQL, particularly SQL that is doing a lot of internal logic to compute values (eg, window functions, complex aggregates, etc) then extracting the query into something I can easily call from my tests and run against known values / results is also worth it.

In essence I think this question is very similar to “unit tests vs functional tests”. While those definitions can be fuzzy, I do think most people and code bases benefit from a focus on more “end to end” style tests. However any time some specific sub component gets quite complicated (like your SQL) a unit test can be handy.

More and more when I’ve got a component like that I also try to throw in some property tests. While no substitute for some specific test cases, it goes a LONG way towards helping make sure all the edge cases are covered.

dimitarvp · December 26, 2024, 6:03pm

I feel like the pgtap article is written in a language I have no chance of ever learning.

Am I missing something or is the article mostly saying “Hey, let’s just write SQL for our tests where more than one DB operation is involved”? This is a legitimate question and not a hidden snark; I am trying to understand.

And I don’t understand the premise of the “simple database tests” either. (1) they don’t seem like “database” tests to me and (2) they seem like integration tests. I for one don’t want to keep my integration tests simple. I want to keep them readable and easy to edit / extend. If an integration test is simple then it would fail my code review with the comment “mark this as a unit test instead”.

What the author calls “a complex test” seem like very normal sane test making sure a piece of functionality is working.

The idea of doing it all in SQL does not work either; my app is not going to be deployed only on the database so of course I want to test the application code that fetches stuff from DB and serializes it to its own data structures… why would I do it any other way? Then I would not be testing my application.

I feel like either I am missing something extremely obvious and will laugh at myself when somebody points it to me… or that I can’t relate to the problems that the author of the article describes, like at all.

Help?

sbuttgereit · December 26, 2024, 8:04pm

Which article are you referring to? I’m having trouble finding some of the assertions you’re calling out (“simple database tests” or “complex tests”) in the original post’s links.

While I’m a bit rusty, I have used pgTAP a quite a bit over the years and I might be able to provide some insights… but I can’t readily find your references to see where I might add something to help clarify.

jsonbecker · December 26, 2024, 8:50pm

I have used pgTap and I can’t think of why I’d use it or something similar or what I’d gain from that in the context of an Elixir code base.

If you have a function that queries a database and returns a result, you should write a test that calls that function, queries the database, and gets that result in your test. Everything else, by your own admission, is not something you need to test (Ecto itself, which I would say includes how Ecto translates your code into SQL, and one could extend that all the way down to the execution plan of your database itself).

We seed some data, run the function, assert results.

sbuttgereit · December 26, 2024, 9:25pm

I almost agree with this; in fact I do agree in the common cases where Elixir gets used as application language. Most Elixir developers won’t need to worry about pgTAP to test their databases and can safely ignore it.

Where I disagree is that a decision to use pgTAP shouldn’t be predicated at all on the what the primary application language/runtime is: pgTAP usefulness/applicability depends specifically on how the database is built and intended to be used and there are database use cases which go beyond the relatively simple application persistence function which most Elixir applications assume.

To be fair, these use cases, where the database takes on a more substantially central role, tend to exist in more “enterprise” scenarios where Elixir isn’t very common, but that doesn’t mean that Elixir/BEAM doesn’t have anything to offer in these cases. Personally, I believe it does and is part of my interest in this ecosystem. Nevertheless, in these cases, pgTAP may well be useful, not to supplant Elixir’s standard application testing tools and approaches, but to support a more demanding database design than the typical Elixir application sees.

jsonbecker · December 26, 2024, 9:32pm

I’m not sure if “demanding” is the word I’d use. The example I had for using pgTap was when we had a process for data pipeline/ingestion which occurred outside of the application codebase. But to me, that’s still testing where the activity is happening versus separating out my database as a set of tests versus my application. pgTap would be useful, for example, to test stored procedures or database functions which are defined and maintained outside of your application (which I don’t think is a great thing to do, but is sometimes the right solution).

In each case, I’d be writing tests for the code I’m writing and executing, but not for a system. I use pgTap because some part of my application is entirely managed in SQL/PostgreSQL and not my application or service code. From within Elixir, I’m testing that my function calls give me the right results. If part of that is calling a database function that is not being defined in an Ecto Migration, yeah, I may have a test suite with pgTAP where I generate that function and expect to change it, but from my Elixir application perspective, my contract is with getting the result I expect from the DB, not how that happens.

sbuttgereit · December 26, 2024, 11:44pm

It sounds like our disagreements are principally categorical rather than anything that would result in substantial differences in what tests get written or not. There are reasons for these differences existing, including just simple conceptually different frames of reference. Ultimately, in practice we probably end up in much the same place regarding what tests get written. There’s probably a larger discussion that could be had, but probably not useful in this context.

So yes. Use Elixir testing tools to test your Elixir code: this includes those paths which include database elements (queries, updates, etc), No matter what the database is or does, all of that needs to work in that context directly in any scenario. For the majority of Elixir developer’s: you’re done at that point. pgTAP doesn’t help you any more than the internal testing tools will within this scope.

If your database goes beyond several dozen tables and foreign keys, such as databases with many hundreds of tables and supporting views, triggers, database functions, stored procedures, and/or the expectation that the application server code, while perhaps the primary access method, is nonetheless not the only one supported, then tools like pgTAP start to have more relevance for testing the database in its own right.

dimitarvp · December 27, 2024, 1:24am

I am referring to the homepage of pgTAP itself and mostly reacted to this:

with pgTAP, you can keep your database tests simple.

You and @jsonbecker already addressed that pgTAP is indeed a poor fit for application tests. We all agree on that then.

Though I still don’t see how business logic being codified in queries is something that should immediately summon pgTAP to the rescue (I am quoting @Stratus3D’s OP here); f.ex. with an Elixir + Ecto app you simply call the same context / query functions in your tests that your production code calls and if everything passes then all is well, no? Why is pgTAP needed at all?

I can see this being true when the production DB is not to be touched at all; legacy or enterprise systems as you said, in which cases using Ecto migrations on this production DB is absolutely out of the question, maybe?

But even in those cases, you can make as many throwaway databases to test as your heart desires.

Add to this the fact that I’ve written no less than 4 separate database anonymizers in the last 8-9 years that 95% eliminate the problem of a test DB not representing the production DB and… the value of pgTAP to me is very far from obvious.

jsonbecker · December 27, 2024, 1:36am

I’m sure you may be right we end up in similar places, but my experience has never been with something small enough to be just “several dozen tables and some foreign keys”, and have, at times involved all of these features and I’m still struggling to understand when or why pgTAP gets involved due to “complexity”.

I’d test each access method as appropriate from the context it’s accessed and what it can do and its expectations.

I’ve had a great experience with pgTAP when the code I was testing was literally SQL scripts. In that case, testing in PG makes sense because that’s where the code lived. That’s a context where this makes sense.

Our differences may be categorical, I just want to emphatically state that this category difference is not from “complexity”— I’m not speaking purely from very simple 3 month projects. And I’m not speaking from solely Elixir experience.

dimitarvp · December 27, 2024, 1:40am

Yeah, my main takeaway as well. pgTAP seems to be ex_unit (Elixir) or jUnit (Java) for PostgreSQL scripts, more or less.

Stratus3D · December 27, 2024, 1:18pm

sbuttgereit:

So yes. Use Elixir testing tools to test your Elixir code: this includes those paths which include database elements (queries, updates, etc), No matter what the database is or does, all of that needs to work in that context directly in any scenario. For the majority of Elixir developer’s: you’re done at that point. pgTAP doesn’t help you any more than the internal testing tools will within this scope.

If your database goes beyond several dozen tables and foreign keys, such as databases with many hundreds of tables and supporting views, triggers, database functions, stored procedures, and/or the expectation that the application server code, while perhaps the primary access method, is nonetheless not the only one supported, then tools like pgTAP start to have more relevance for testing the database in its own right.

It sounds like this is this is generally the consensus. This seems pretty reasonable, ExUnit and Ecto provide the tools needed to test just queries in isolation if that is desired. Only when database elements exist to serve clients other than a single Elixir app does it make sense to go with a more general SQL testing tool (pgTAP was just an example, I believe there are others). Thanks everyone for your input!

What's the recommended way of testing the **behavior** of SQL queries generated by Ecto?

What's the recommended way of testing the behavior of SQL queries generated by Ecto?