Seem to get a weird value when trying to store timestamp in DB

eugenebolshakov · November 13, 2019, 9:43am

I’m trying to debug an intermittent test failure that I’m only getting on CI. I’m fairly new to elixir and might be missing something obvious or not explaining the issue clearly, so would appreciate any advice / tips!

I’m working on a phoenix app and occasionally get this error when running tests on CI:

%ArgumentError{message: ":utc_datetime_usec expects microsecond precision, got: ~U[2018-03-01 18:26:08.003Z]"}

It seems to happen when the app tries to save a timestamp in the DB in various places. In this particular case it happens when it tries to do:

changeset
|> put_change(:date_changed_at, Timex.now()) # :date_changed_at is a :utc_datetime_usec and is the only field of that type updated in this query
|> Repo.insert!()

The error itself is actually clear: the timestamp field is a utc_datetime_usec but the app is trying to save a timestamp with a microsecond precision. What’s extra odd though is that the timestamp seems to be in the past (2018). So I guess there are two questions which might be symptoms of the same problem.

Why is the timestamp seemingly in the past?
Why is it sometimes microsecond precision?

I guess it looks like either Timex.now() returns that weird timestamp for some reason or it gets corrupted somewhere in ecto before it’s stored.

I’ve added a step on the CI that outputs OS time in shell and it’s correct. I can also see a correct timestamp in the phoenix/ecto logs just before the failure:

     18:52:21.588 request_id=FdZ-_DE7pwDhErkAAAwS [debug] QUERY OK db=0.3ms queue=0.2ms
     UPDATE "orders" SET "updated_at" = $1 WHERE "id" = $2 [~N[2019-11-12 18:52:21.588200], 5]
     18:52:21.589 request_id=FdZ-_DE7pwDhErkAAAwS [warn] Checkout failed for order 6NQDDP7B: %ArgumentError{message: ":utc_datetime_usec expects microsecond precision, got: ~U[2018-03-01 18:26:08.003Z]"}

Ecto’s usec timestamps are generated using DateTime.utc_now() and that seems to return a correct value according to the logs above. However Timex.now() effectively does the same things:

DateTime.utc_now/1 (https://github.com/elixir-lang/elixir/blob/v1.9.4/lib/elixir/lib/calendar/datetime.ex#L81-L84):

 @spec utc_now(Calendar.calendar()) :: t
  def utc_now(calendar \\ Calendar.ISO) do
    System.os_time() |> from_unix!(:native, calendar)
  end

Timex.now/0 (https://github.com/bitwalker/timex/blob/3.6.1/lib/timex.ex#L50):

def now(), do: from_unix(:os.system_time(), :native)

and Timex.from_unix/2 delegates to DateTime.from_unix!/2 (https://github.com/bitwalker/timex/blob/3.6.1/lib/timex.ex#L194-L196):

  def from_unix(secs, unit) do
    DateTime.from_unix!(secs, unit)
  end

So I’m at a loss and not sure what else to do to debug this. Does anyone have any thoughts / suggestions?

danj · November 13, 2019, 1:12pm

It looks like the db update succeeds with the right time stamp, the log saying QUERY OK. This looks like a problem after the query runs. Is it possible the type of date_changed_at column is lower precision and this is causing a mismatch in a select or returning elsewhere?

eugenebolshakov · November 13, 2019, 8:21pm

Yes, that query in the log suceeds, but the error happens in a different one - the date_changed_at field that is updated is actually in a different table. I just pasted that log to illustrate that the time in a different query is correct, but for some reason it’s in the past and has microsecond precision in the query that is executed afterwards.

No, date_changed_at is definitely utc_datetime_usec. In fact that test passes / query works most of the time, but just occasionally fails on CI and the same error also happens elsewhere in test suite occasionally as well.

eugenebolshakov · November 14, 2019, 7:13am

Thanks for that! It made me think that I wasn’t actually sure in that particular case if the exception was raised on inserting a record into the database – I just assumed it was (the error is caught and logged and so there was so stacktrace).

I also checked the logs from the previous time it happened and the weird timestamp was exactly the same: ~U[2018-03-01 18:26:08.003Z] so I thought, maybe we do actually create a record with that timestamp somewhere in the test suite. I ran a search and sure enough found this bit in one of the old tests :

use ExUnit.Case, async: true

describe "can_deliver?/2" do 
  setup do
    :meck.new(Timex, [:passthrough])

    :meck.expect(Timex, :now, fn ->
      DateTime.from_naive!(~N[2018-03-01 18:26:08.003], "Etc/UTC")
     end)

     on_exit(&:meck.unload/0)
  end

Now I know why this is happening, but just wondering do you know why that affects other tests in the suite? I.e. why does the mock remain active in other tests sometimes?

Thanks!

al2o3cr · November 14, 2019, 12:05pm

:meck.new works by manipulating and reloading the target module - that’s a global operation, so combining it with async: true tests is going to cause exactly the intermittent failures you’re seeing.

eugenebolshakov · November 14, 2019, 3:38pm

That’s quite obvious if you think about it!

Thanks so much for your help!