Help debugging ecto query issue in tests

I’m trying to debug a sporadic test failure issue. This is my current setup:

  • Elixir 1.6.4
  • Postgres 9.6.10
  • Ecto 2.2.10
  • Postgrex 0.13.5

While the specific query is complex (and uses a lateral join fragment), I’m not convinced that the error is directly related to the query. Plus I’ll need to take a little bit of time to reconstruct the query.

The issue manifests itself as an ordering issue, the rows that I request do not always come back in the order that I expect. The distribution of the errors makes me think that there is something odd going on.

To help reproduce the issue I tried two mechanism. One was running the query multiple times and the second was duplicating the test. Running the query multiple times was not helpful, if the query worked the first time it would work the 1000th time too. However, duplicating the test was more fruitful.

  for i <- 1..400 do
    test "query/2 ordering #{i}" do
      # test the code here
    end
  end

Running this gave me some interesting results:

Run Number # failures
1 19/401
2 0/401
3 0/401
4 29/401
5 100/401
6 137/401
7 67/401
8 0/401
9 1/401
10 0/401
11 0/401
12 0/401
13 0/401
14 0/401
15 0/401
16 0/401
17 0/401
18 0/401
19 0/401
20 123/401
21 157/401
22 0.00
23 0.00
24 271/401

It’s odd to me how there are so many runs that are completely successful and then the ones with failures usually have 20+ errors.

Does anyone have any ideas of what could be causing this?

1 Like

Is everything explicitly ordered in the query? Postgres can and will return things in different orders unless explicitly told not to.

1 Like

Yes you are totally right. I didn’t realize that Postgres does not preserve order after a join, although it appears that it does 99% of the time. Adding an additional ORDER BY clause to the top-level of the query fixed it right up!

2 Likes

That’s the nature of a hashed BTree journaling system! ^.^

2 Likes