GenStage causing unexpected application exits

So im in the process of migrating an app to GenStage. i have three major systems that interact and hopefully i can generalize it enough to get a good idea of what is going on…

+`Farmbot.Supervisor` <- main entry point
|
+------- `Farmbot.BotState.Supervisor`
|             |
|             + `Farmbot.BotState` - GenStage producer consumer. subscribe_to: Farmbot.Firmware
|             |
|             + `Farmbot.BotState.Transport.Supervisor`
|                  |
|                  +--- `Farmbot.BotState.Transport.GenMQTT` - ` - genstage consumer subscribes to Farmbot.BotState
|                  |
|                  +---- `Farmbot.BotState.Transport.HTTP` - genstage consumer subscribes to `Farmbot.BotState`
|    
|
+-------`Farmbot.Firmware.Supervisor`
            |
            + `Farmbot.Firmware` - GenStage producer-consumer `subscribe_to: Farmbot.Firmware.Handler`
                  |
                  + `Farmbot.Firmware.Handler` - handles UART communication - GenStage producer

so Farmbot.BotState consumes events from theFarmbot.Firmware which consumes events that get sent over Nerves.UART and sends those events over various transports (MQTT and HTTP) All of these use dispatcher: GenStage.BroadcastDispatcher.

and any transport may receive commands that command the Farmbot.Firmware to do something.
What happens is when a command message comes in over a transport. so for example a transport receives a message to move to a coordinate (0, 1, 2), and that fails for some reason, and Farmbot.Firmware.Handler crashes. this crashes Farmbot.Firmware which crashes Farmbot.BotState which crashes all of the transports. That is all fine, but the app doesn’t ever come back up, In the logs i see it flail around for number of resets, and i can see all the init functions successfully complete (via log messages), but the app still crashes. but some random things are still left running but i can’t see them in Observer…

heres a log that may or may not help.

iex(3)> 

Elixir.Farmbot.BotState.Transport.GenMQTT.Client received %{args: %{label: "4c0723a7-e08b-447b-864a-81eb913491ee"}, body: [%{args: %{speed: 100, x: 1, y: 0, z: 0}, body: [], comment: nil, kind: "move_relative"}], comment: nil, kind: "rpc_request"}

Elixir.Farmbot.CeleryScript doing: #RpcRequest<[:label]>

Elixir.Farmbot.CeleryScript doing: #MoveRelative<[:speed, :x, :y, :z]>

*lixir.Farmbot.Firmware.UartHandler Got unhandled gcode: R08 *G00 X1 Y0 Z0 A100 B100 C100

Elixir.Farmbot.Firmware.UartHandler Got unhandled gcode: 1.00 Y0.00 Z0.00 Q0

Elixir.Farmbot.BotState.Transport.GenMQTT.Client Connected to real time services.

Elixir.Farmbot.BotState.Transport.GenMQTT.Client Connected to real time services.


18:38:27.594 [error] GenServer Farmbot.Firmware terminating
** (stop) exited in: GenServer.call(#PID<0.636.0>, {:move_absolute, %Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100}, 5000)
    ** (EXIT) time out
    (elixir) lib/gen_server.ex:774: GenServer.call/3
    (farmbot) lib/farmbot/firmware/firmware.ex:96: Farmbot.Firmware.handle_call/3
    (gen_stage) lib/gen_stage.ex:1958: GenStage.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.807.0>): {:move_absolute, [%Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100]}
State: %Farmbot.Firmware.State{handler: #PID<0.636.0>, handler_mod: Farmbot.Firmware.UartHandler, idle: true, pins: %{}}
Client #PID<0.807.0> is dead

18:38:27.594 [info]  GenStage consumer Farmbot.BotState is stopping after receiving cancel from producer #PID<0.635.0> with reason: {:timeout,
 {GenServer, :call,
  [#PID<0.636.0>,
   {:move_absolute, %Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100}, 5000]}}

Elixir.Farmbot.BotState.Transport.GenMQTT.Client Connected to real time services.


18:38:27.603 [error] GenServer Farmbot.BotState terminating
** (stop) exited in: GenServer.call(#PID<0.636.0>, {:move_absolute, %Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100}, 5000)
    ** (EXIT) time out
Last message: {:DOWN, #Reference<0.2682793687.2540175366.49774>, :process, #PID<0.635.0>, {:timeout, {GenServer, :call, [#PID<0.636.0>, {:move_absolute, %Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100}, 5000]}}}
State: %Farmbot.BotState{configuration: %{"auto_sync" => false, "current_repo" => "A", "firmware_hardware" => nil, "first_boot" => false, "first_party_farmware" => true, "first_party_farmware_url" => "https://raw.githubusercontent.com/FarmBot-Labs/farmware_manifests/master/manifest.json", "first_sync" => false, "ignore_external_logs" => true, "os_auto_update" => false, "timezone" => nil}, informational_settings: %{busy: false, commit: "4e20709", controller_version: "6.0.1-alpha.1", env: :dev, sync_status: :sync_now, target: "host"}, jobs: %{}, location_data: %{position: %{x: 0, y: 0, z: 0}, raw_encoders: %{x: 0, y: 0, z: 0}, scaled_encoders: %{x: 0.0, y: 0.0, z: 0.0}}, mcu_params: %{}, pins: %{}, process_info: %{}, user_env: %{}}

18:38:27.604 [info]  GenStage consumer Farmbot.BotState.Transport.HTTP is stopping after receiving cancel from producer #PID<0.639.0> with reason: {:timeout,
 {GenServer, :call,
  [#PID<0.636.0>,
   {:move_absolute, %Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100}, 5000]}}


18:38:27.604 [info]  GenStage consumer Farmbot.BotState.Transport.GenMQTT is stopping after receiving cancel from producer #PID<0.639.0> with reason: {:timeout,
 {GenServer, :call,
  [#PID<0.636.0>,
   {:move_absolute, %Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100}, 5000]}}


18:38:27.604 [error] GenServer Farmbot.BotState.Transport.GenMQTT terminating
** (stop) exited in: GenServer.call(#PID<0.636.0>, {:move_absolute, %Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100}, 5000)
    ** (EXIT) time out
Last message: {:DOWN, #Reference<0.2682793687.2540175366.49824>, :process, #PID<0.639.0>, {:timeout, {GenServer, :call, [#PID<0.636.0>, {:move_absolute, %Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100}, 5000]}}}
State: {%{client: #PID<0.642.0>}, %Farmbot.BotState{configuration: %{"auto_sync" => false, "current_repo" => "A", "firmware_hardware" => nil, "first_boot" => false, "first_party_farmware" => true, "first_party_farmware_url" => "https://raw.githubusercontent.com/FarmBot-Labs/farmware_manifests/master/manifest.json", "first_sync" => false, "ignore_external_logs" => true, "os_auto_update" => false, "timezone" => nil}, informational_settings: %{busy: false, commit: "4e20709", controller_version: "6.0.1-alpha.1", env: :dev, sync_status: :sync_now, target: "host"}, jobs: %{}, location_data: %{position: %{x: 0, y: 0, z: 0}, raw_encoders: %{x: 0, y: 0, z: 0}, scaled_encoders: %{x: 0.0, y: 0.0, z: 0.0}}, mcu_params: %{}, pins: %{}, process_info: %{}, user_env: %{}}}

18:38:27.605 [error] GenServer Farmbot.BotState.Transport.HTTP terminating
** (stop) exited in: GenServer.call(#PID<0.636.0>, {:move_absolute, %Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100}, 5000)
    ** (EXIT) time out
Last message: {:DOWN, #Reference<0.2682793687.2540175366.49999>, :process, #PID<0.639.0>, {:timeout, {GenServer, :call, [#PID<0.636.0>, {:move_absolute, %Farmbot.Firmware.Vec3{x: 1, y: 0, z: 0}, 100}, 5000]}}}
State: %{bot_state: %Farmbot.BotState{configuration: %{"auto_sync" => false, "current_repo" => "A", "firmware_hardware" => nil, "first_boot" => false, "first_party_farmware" => true, "first_party_farmware_url" => "https://raw.githubusercontent.com/FarmBot-Labs/farmware_manifests/master/manifest.json", "first_sync" => false, "ignore_external_logs" => true, "os_auto_update" => false, "timezone" => nil}, informational_settings: %{busy: false, commit: "4e20709", controller_version: "6.0.1-alpha.1", env: :dev, sync_status: :sync_now, target: "host"}, jobs: %{}, location_data: %{position: %{x: 0, y: 0, z: 0}, raw_encoders: %{x: 0, y: 0, z: 0}, scaled_encoders: %{x: 0.0, y: 0.0, z: 0.0}}, mcu_params: %{}, pins: %{}, process_info: %{}, user_env: %{}}, public_key: %JOSE.JWK{fields: %{}, keys: :undefined, kty: {:jose_jwk_kty_rsa, {:RSAPublicKey, 22285049158885340376326070958724265578825988326592561891505335517981697112663115627605336016312974648727152712825232672703606901833990104527742064940818896351125254041668220713149710519971929393596540889444662981852109138863362312382240414604690780890284495352157133567803500537833974423927025044579584717594115081637678287985602165208959626135586335293657212186069038239670817395935529798146206303315156219733820787146161466247576911481782975175283080062638007814045263061446090972610985470798813113429879314241226322495616246179054635918379249452581046287333578216120004125518974406573192664202976595616504079460129, 65537}}}, sockets: []}
Elixir.Farmbot.BotState.Transport.GenMQTT.Client Connected to real time services.

Elixir.Farmbot.BotState.Transport.GenMQTT.Client Connected to real time services.

Elixir.Farmbot.BotState.Transport.GenMQTT.Client Connected to real time services.

Elixir.Farmbot.Bootstrap.Supervisor Beginning authorization: Elixir.Farmbot.Bootstrap.Authorization - admin@admin.com - http://localhost:3000

Elixir.Farmbot.Bootstrap.Supervisor Successful authorization: Elixir.Farmbot.Bootstrap.Authorization - admin@admin.com - http://localhost:3000

Elixir.Farmbot.BotState.Transport.GenMQTT.Client Connected to real time services.

Elixir.Farmbot.Bootstrap.Supervisor Beginning authorization: Elixir.Farmbot.Bootstrap.Authorization - admin@admin.com - http://localhost:3000

Elixir.Farmbot.Bootstrap.Supervisor Successful authorization: Elixir.Farmbot.Bootstrap.Authorization - admin@admin.com - http://localhost:3000

Elixir.Farmbot.BotState.Transport.GenMQTT.Client Connected to real time services.

Elixir.Farmbot.Bootstrap.Supervisor Beginning authorization: Elixir.Farmbot.Bootstrap.Authorization - admin@admin.com - http://localhost:3000

Elixir.Farmbot.Bootstrap.Supervisor Successful authorization: Elixir.Farmbot.Bootstrap.Authorization - admin@admin.com - http://localhost:3000

Elixir.Farmbot.BotState.Transport.GenMQTT.Client Connected to real time services.


18:38:27.923 [info]  Application farmbot exited: shutdown

The first 5 lines are the transport receiving a command,
Elixir.Farmbot.Bootstrap.Supervisor Beginning authorization: Elixir.Farmbot.Bootstrap.Authorization - admin@admin.com - http://localhost:3000 as far as i can tell should not be called. It is much lower in the bootstrap process. This is fetching a JWT.

Elixir.Farmbot.BotState.Transport.GenMQTT.Client Connected to real time services.
is the mqtt client authenticating with a JWT. (that is stored in a sqlite db from that above auth task). I’m not exactly sure why this happens.

3 Likes

Update:
i enabled otp and sasl reports in logger to see more stuff.

  handle_otp_reports: true,
  handle_sasl_reports: true

and found it was a bug in the way i was using plug for my http transport.

i have this in the init function.

    {:ok, web} = Plug.Adapters.Cowboy.http Router, [], [port: @port, dispatch: [cowboy_dispatch()]]
    Process.link(web)

I was under the assumption that when my process crashed, web would exit also, so i never cleaned it up. so when the HTTP transport gets restarted, i got:

** (MatchError) no match of right hand side value: {:error, {:already_started, #PID<0.1655.0>}}
    (farmbot) lib/farmbot/bot_state/transport/http/http.ex:47: Farmbot.BotState.Transport.HTTP.init/1

which wasn’t showing up in the normal logs. I’ve restructured my HTTP adapter to properly supervise my plug router, and everything is working a ok again. Thanks for reading!

5 Likes