How to effectively test for concurrency problems and race conditions with Ecto

bamorim · May 7, 2019, 1:40am

Hi guys. I’m writing some queries and I want to test for some race conditions and see if my app handles them just fine.
What I’m doing right now is mocking my Repo (using Mock instead of Mox, because I don’t want to create a behaviour for my Repo and do all the dependency injection stuff just for that) and setting some delays specifically for some processes and running them in a Task.async. This is far from ideal but it works and I was able to have tests capturing the race conditions I wanted.
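
For illustration, the delay-based approach described above could look roughly like this. It is a minimal sketch, not the actual test code: `FakeStore` and `Counter` are hypothetical names, and an Agent stands in for the mocked Repo so the example runs without a database.

```elixir
# Hypothetical stand-in for a Repo-backed row; an Agent plays the database.
defmodule FakeStore do
  def start(initial), do: Agent.start_link(fn -> initial end)
  def read(store), do: Agent.get(store, & &1)
  def write(store, value), do: Agent.update(store, fn _ -> value end)
end

defmodule Counter do
  # Unsafe read-modify-write. `delay_ms` widens the window between the read
  # and the write, which is the role the delayed mock Repo call plays.
  def increment(store, delay_ms) do
    value = FakeStore.read(store)
    Process.sleep(delay_ms)
    FakeStore.write(store, value + 1)
  end
end

{:ok, store} = FakeStore.start(0)

# Run two increments concurrently, each pausing between its read and write.
tasks = [
  Task.async(fn -> Counter.increment(store, 100) end),
  Task.async(fn -> Counter.increment(store, 100) end)
]

Enum.each(tasks, &Task.await/1)

# With these delays both tasks should read 0 before either one writes,
# so one increment is expected to be lost (the result still depends on timing).
final = FakeStore.read(store)
IO.inspect(final, label: "final value")
```

The weakness is visible in the last comment: the outcome relies on sleeps being long enough, which is why this works but is far from ideal.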

I was wondering if anyone has a better idea on how to approach these kinds of tests.

One thing I was thinking of is creating a custom Ecto.Adapter that simply delegates to another Ecto.Adapter but allows setting some delays. What do you guys think?
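
A full Ecto.Adapter has many callbacks, so the sketch below only shows the delegate-with-delays idea in its simplest form, as a plain wrapper module. `DelayingProxy` is a hypothetical name and not part of Ecto, and `String.upcase/1` is just a stand-in target so the example runs on its own. In a real test the target would be something like the Repo module.

```elixir
defmodule DelayingProxy do
  # Hypothetical helper, not part of Ecto. Delays are kept in the process
  # dictionary, so each Task can configure its own delays independently.
  def put_delay(fun_name, ms), do: Process.put({__MODULE__, fun_name}, ms)

  # Sleep for the configured delay (if any), then delegate to the target.
  def call(target, fun_name, args) do
    case Process.get({__MODULE__, fun_name}) do
      nil -> :ok
      ms -> Process.sleep(ms)
    end

    apply(target, fun_name, args)
  end
end

# Stand-in usage: delay every :upcase call made from this process by 50 ms.
DelayingProxy.put_delay(:upcase, 50)

{elapsed_us, result} =
  :timer.tc(fn -> DelayingProxy.call(String, :upcase, ["ok"]) end)

IO.inspect({elapsed_us, result}, label: "elapsed microseconds and result")
```

Keeping the delays per process means two concurrent tasks can hit the same function with different timings, which is what a race reproduction needs.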

Thank you =D

OvermindDL1 · May 7, 2019, 2:46pm

Hmm, interesting idea, you could do a lot of worst-case testing with that… It would make for a good library.

dimitarvp · May 7, 2019, 4:27pm

To be fair, you’d just be testing your own assumptions about Ecto Repo IMO. Or Ecto itself.

Or did you mean you want to find problems in your code that make faulty assumptions about timing?

I’d start by looking into your DB engine of choice’s locks and transaction isolation levels.

al2o3cr · May 7, 2019, 7:01pm

+1 to @dimitarvp - you can only reproduce timing-sensitive behavior you understand, and the really nasty race condition bugs are the ones that you don’t.

bamorim · May 9, 2019, 7:09pm

I agree with that, with the fact that the nasty bugs are the ones you don’t understand.

But just because there is a lot of behaviour you don’t know about, does it mean you can’t write tests for it?

When you are faced with a nasty concurrency problem and spend your weekend trying to figure it out, wouldn’t you think it is a good idea to write a regression test so it doesn’t happen again?

I mean, you are basically making an excuse to not test your software or to not have automated tests.

bamorim · May 9, 2019, 7:12pm

I did look into it. There are always trade-offs. In PostgreSQL you have 4 isolation levels and each one has its own trade-offs. Sometimes it’s not clear that a phantom read could lead to an inconsistent state. Sometimes you don’t want a serializable isolation level because that means a lot of transaction retries, or implies a queue that serializes the queries at the application level. Sometimes your performance requirements completely rule out row or table level locks.
Sometimes you find a bug and you just want a way to prove it with code, with a test that can serve as a regression test.

bamorim · May 9, 2019, 7:19pm

Just to clarify, I’m not looking for a way to say: hey, check there are no race conditions. That is a hard task that would require some sort of formal proof or something like that (doable, but hard).
I want a way to say: hey, I know this race condition may occur under some conditions with this sequence of actions, check if it still happens.
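
One way to pin down a known sequence of actions without relying on sleeps is to park each process at a named checkpoint and release them in the order the test dictates. This is only a sketch under assumptions: `Interleave` is a hypothetical helper, and an Agent again stands in for the database so the example is self-contained.

```elixir
defmodule Interleave do
  # Hypothetical helper: called by a worker to announce it reached `name`,
  # then block until the controlling test process releases it.
  def checkpoint(test_pid, name) do
    send(test_pid, {:reached, self(), name})

    receive do
      {:continue, ^name} -> :ok
    end
  end

  # Called by the test: wait until `pid` is parked at checkpoint `name`.
  def await_reached(pid, name, timeout \\ 1_000) do
    receive do
      {:reached, ^pid, ^name} -> :ok
    after
      timeout -> raise "process never reached #{inspect(name)}"
    end
  end

  # Called by the test: let `pid` continue past checkpoint `name`.
  def release(pid, name), do: send(pid, {:continue, name})
end

{:ok, store} = Agent.start_link(fn -> 0 end)
test_pid = self()

# Unsafe read-modify-write with a checkpoint between the read and the write.
increment = fn ->
  value = Agent.get(store, & &1)
  Interleave.checkpoint(test_pid, :after_read)
  Agent.update(store, fn _ -> value + 1 end)
end

a = Task.async(increment)
b = Task.async(increment)

# Force the problematic interleaving: both reads happen before any write.
Interleave.await_reached(a.pid, :after_read)
Interleave.await_reached(b.pid, :after_read)
Interleave.release(a.pid, :after_read)
Interleave.release(b.pid, :after_read)

Task.await(a)
Task.await(b)

# Both tasks read 0, so both write 1 and one increment is lost.
final = Agent.get(store, & &1)
IO.inspect(final, label: "final value")
```

Once the code under test is fixed (for example with a lock or an atomic update), the same sequence can be kept as a regression test with the assertion flipped to expect 2.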