Task.await(task, :timer.seconds(3)) fails manytime

Hello as general discussion, I just wanted to know that if anybody see any thing wrong in this code

    try do
      task = Task.async(fn() ->
        datetime =
          timestamp
          |> Calendar.DateTime.Parse.unix!
          |> Calendar.DateTime.to_erl
          |> Ecto.DateTime.cast!
        params = construct_camera(datetime, status, camera.is_online == status)
        changeset = Camera.changeset(camera, params)
        camera = Repo.update!(changeset)
        log_camera_status(camera, status, datetime, error_code)
        broadcast_change_to_users(camera)
        Camera.invalidate_camera(camera)
      end)
      Task.await(task, :timer.seconds(3))
    catch _type, error ->
      Util.error_handler(error)
    end

where as the the logging status code is

  def log_camera_status(camera, true, datetime, nil), do: do_log_camera_status(camera, "online", datetime)
  def log_camera_status(camera, false, datetime, error_code), do: do_log_camera_status(camera, "offline", datetime, %{reason: error_code})

  defp do_log_camera_status(camera, status, datetime, extra \\ nil) do
    spawn fn ->
      case check_last_camera_status(camera.id, status) do
        true ->
          parameters = %{camera_id: camera.id, camera_exid: camera.exid, action: status, done_at: datetime, extra: extra}
          changeset = CameraActivity.changeset(%CameraActivity{}, parameters)
          SnapshotRepo.insert(changeset)
          send_notification(status, camera, camera.alert_emails)
        false -> :noop
      end
    end
  end

I dont know the reasons but sometime may be most of the time… this whole process get crashed at Task.await(task, :timer.seconds(3)) this… Anybody can point me if they see something wrong?

That should be Task.await(task, 3000)
Task.await/2

Update: Should have checked the reference - the call looked to me as if it was some pre-canned shortcut to creating a separate timed event.

Really the first question should be:

    datetime =
      timestamp
      |> Calendar.DateTime.Parse.unix!
      |> Calendar.DateTime.to_erl
      |> Ecto.DateTime.cast!
    params = construct_camera(datetime, status, camera.is_online == status)
    changeset = Camera.changeset(camera, params)
    camera = Repo.update!(changeset)
    log_camera_status(camera, status, datetime, error_code)
    broadcast_change_to_users(camera)
    Camera.invalidate_camera(camera)

Can you factor this out into a separate function and determine whether it runs reliably without being shoved into a Task?

That is what he called:

Task.await(task, :timer.seconds(3))

:timer.seconds(3) returns 3000.

Either way, to know what is happening when it fails, you really need to give use the entire error message (both the _type and the error).

Actually it never goes to that part, where i can say, Okay hmm, that the _type and error, When Ever I see sumoLogic report to see crashes, it only crashes at this line Task.await(task, :timer.seconds(3)), Thats all I have from this code.

My main concern was, was it good to have first an async task and then inside it a spawn as well?

In my local machine, It works and it works very very well. But in production, it fails like a duty and the error is always at Task.await(task, :timer.seconds(3)) this line only. My concerns are: why this line?

May be it is failing because in 3 seconds, the process all written inside it, not being completed?

Also, I am totally unaware why it is even in an async task?

1 Like

await simply returns after the timeout - the task continues to run as long as the process that created it is alive (the death of the creator will take it down) - the spawn just keeps running regardless because it’s not linked.

Is it possible that it’s a production issue because of different hardware drivers (does construct_camera interface with camera hardware somehow)?

If I remember correctly that try will not protect you from the task crashing - that is what trapping exits is for. So when the the task crashes it would take down the creating process via an exit signal.

No, Its only a method construct_camera which is creating params to changeset, No hardware relation with this.

Dont you think its better to just do all the process in spawn? and no use of async?

The typical reply tends to be “redesign it and use a GenServer instead” - don’t mess with primitive processes unless you have a good reason to (or it’s extremely trivial).

Though I still think that a crash in the task function is the best hypothesis so far - the creating process would likely be sitting on Task.await when the exit signal from the crashed task would take it down - however I would expect the error message to mention that there was a process exit.

Try and catch. don’t work the same way as they work in Ruby On Rails? same as the rescue? so we can catch what was the error in try?

If not then the logic needs a full change.

GenServer, It would be good to use GenServer for just a few queries?

Is it possible that there are so many back to back such Async tasks came across, during that 3 seconds and crash all process?

The main issue here is: When a camera is going online offline again and again. then while updating the status of the camera, this Task runs, and maybe its possible that so many such tasks, may become parallel and crash everything? (Sorry for not talking more like a programmer but a theory teller.)

They work the same as long as you are in the same process. Exits from other processes that you are linked to are not controlled that way.

Typically you would write the GenServer to run that functionality periodically or on demand. So rather the spawning a new task every time, you just send it a message “go do it”.

So just to be clear - that task is only working on data that is static while the code is running?

I noticed the camera.is_online == status check - so if the camera is online when the code starts and the camera goes offline while the code is still running - it doesn’t affect the task as it is only working on data that was cached from the camera before it went offline?


camera = Repo.update!(changeset)

is there any chance that the changeset is invalid at this point?

Ecto.Repo.update!/2

Same as update/2 but returns the struct or raises if the changeset is invalid.

If that update raises an exception the task will die, sending an exit signal to the creating process sitting on Task.await and kill it.

1 Like

The intent of your code is probably something like this:

task = Task.async(fn() ->
  try do
    datetime =
      timestamp
      |> Calendar.DateTime.Parse.unix!
      |> Calendar.DateTime.to_erl
      |> Ecto.DateTime.cast!
    params = construct_camera(datetime, status, camera.is_online == status)
    changeset = Camera.changeset(camera, params)
    camera = Repo.update!(changeset)
    log_camera_status(camera, status, datetime, error_code)
    broadcast_change_to_users(camera)
    Camera.invalidate_camera(camera)
  catch _type, error ->
    Util.error_handler(error)
  end
end)
Task.await(task, :timer.seconds(3))

What is the reason reported when it crashes?

	
: EvercamMedia.Snapshot.DBHandler.change_camera_status/4
    (evercam_media) lib/evercam_media/snapshot/db_handler.ex:61: EvercamMedia.Snapshot.DBHandler.update_camera_status/5
    (evercam_media) lib/evercam_media/snapshot/error.ex:124: EvercamMedia.Snapshot.Error.handle/4
2017-07-12 15:39:56.122 [info] [carlo-vdbyz] [update_status] [offline] [unhandled]
2017-07-12 15:39:56.130 [error] Error in process #PID<0.11946.94> on node :"evercam_media@media.evercam.io" with exit value:
{%ArgumentError{message: "repo EvercamMedia.SnapshotRepo is not started, please ensure it is part of your supervision tree"},
 [{Ecto.Query.Planner, :query_lookup, 6,
   [file: 'lib/ecto/query/planner.ex', line: 136]},
  {Ecto.Query.Planner, :query_with_cache, 7,
   [file: 'lib/ecto/query/planner.ex', line: 119]},
  {Ecto.Repo.Queryable, :execute, 5,
   [file: 'lib/ecto/repo/queryable.ex', line: 122]},
  {Ecto.Repo.Queryable, :all, 4,
   [file: 'lib/ecto/repo/queryable.ex', line: 35]},
  {Ecto.Repo.Queryable, :one, 4,
   [file: 'lib/ecto/repo/queryable.ex', line: 68]},
  {EvercamMedia.Snapshot.DBHandler, :check_last_camera_status, 2,
   [file: 'lib/evercam_media/snapshot/db_handler.ex', line: 129]},
  {EvercamMedia.Snapshot.DBHandler, :"-do_log_camera_status/4-fun-0-", 4,
   [file: 'lib/evercam_media/snapshot/db_handler.ex', line: 117]}]}

This is the full trace of the error at the moment when it crashes.

So, I think you are not starting a Repo? :slight_smile:

Actually we are using 2 DBs at the moment, one is EvercamMedia.Repo and other one is EvercamMedia.SnapshotRepo

Both were working properly in past.

The thing is: that camera status logging is in SnapshotRepo, But I have no idea how to start it? as we are just alias EvercamMedia.SnapshotRepo and using it everywere as it is and it works.

So I guess this discussion is about this:
https://github.com/evercam/evercam-server/issues/683

that task is only working on data that is static while the code is running?

Actually data is not static, the whole process depends on a Camera, when it comes online then this code runs, when it goes offline the same code runs but then with different status.

When it comes online the code runs, and the same time if it goes offline, the code runs, both process runs in different threads, thats how I understand it, so that they may complete their process.

is there any chance that the changeset is invalid at this point?

No there is no chance for that.

Thanks for looking into this so deeply.

Yes, you are right. I am discussing this.

The try catch is still in the wrong place. it needs to be inside the task. Any function in there with “!” is capable of emitting an exception (besides trying to access a Repo that isn’t running). Also in terms of intent rescue rather than catch is used in connection with exceptions. catch is used for thrown values or exits originating in the same process.