My FLAME process is failing, and I can’t track down the cause. The process works fine on a local backend, and in production for smaller payloads.
I’m uploading and processing images, and the failure only occurs when I exceed a certain payload size. It is triggered at around 5 large files of 20-50 MB each.
The basic steps of the process are:
- use LiveView uploads and copy the files to a temp folder
- create parent and local file streams so FLAME calls can access the files (based on the example from here: Rethinking Serverless with FLAME · The Fly Blog)
- process the files with the `image` library
- stream the files to S3 with `ExAws`
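The stream-copy step in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the app: the module name `StreamCopy`, the function `copy!/3`, and the size check are hypothetical. The idea is that the source stream is created on the parent node, the copy runs inside the function passed to `FLAME.call`, and the byte count is verified so that an empty or truncated local file fails loudly before it reaches the S3 upload.

```elixir
defmodule StreamCopy do
  @moduledoc """
  Hypothetical helper: copies a file stream into a local file and checks
  that the expected number of bytes arrived.
  """

  @doc """
  Writes every chunk of `source_stream` to `dest_path`.

  `expected_size` is the byte size of the source file, measured where the
  source file lives (the parent node in a FLAME setup). Raises if the
  written file has a different size, otherwise returns the size.
  """
  def copy!(source_stream, dest_path, expected_size) do
    # File.stream!/1 returns a File.Stream, which implements Collectable,
    # so Enum.into/2 writes each chunk to the destination file.
    Enum.into(source_stream, File.stream!(dest_path))

    actual_size = File.stat!(dest_path).size

    if actual_size != expected_size do
      raise "incomplete copy to #{dest_path}: " <>
              "expected #{expected_size} bytes, got #{actual_size}"
    end

    actual_size
  end
end

# Usage sketch with two local temp files. In the FLAME case, the source
# stream and expected size would be built on the parent before the call.
source_path = Path.join(System.tmp_dir!(), "stream_copy_example_source")
dest_path = Path.join(System.tmp_dir!(), "stream_copy_example_dest")
File.write!(source_path, :binary.copy("x", 5000))

# The (path, modes, bytes) argument order matches the blog example;
# Elixir 1.16+ prefers File.stream!(path, 2048) and warns about this form.
source_stream = File.stream!(source_path, [], 2048)
expected_size = File.stat!(source_path).size

StreamCopy.copy!(source_stream, dest_path, expected_size)
```

A check like this would at least separate "the file never arrived on the runner" from "the upload itself failed".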
I have also enabled debug logs for the Phoenix app and `ExAws`.
One hint I can get from the logs is that `ExAws` is struggling with the upload: it seems like it’s not getting any input. Part of an upload request shows `BODY: ""`.
The only relevant FLAME log I get is:
2024-02-15T14:14:41Z app[0806250b612738] ams [info]14:14:41.353 [error] GenServer FLAME.Terminator.ChildPlacementSup terminating
2024-02-15T14:14:41Z app[0806250b612738] ams [info]** (stop) killed
2024-02-15T14:14:41Z app[0806250b612738] ams [info]Last message: {:EXIT, #PID<0.2625.0>, :killed}
2024-02-15T14:14:41Z app[148e461a10d638] ams [info]14:14:41.352 [error] GenServer #PID<0.2913.0> terminating
2024-02-15T14:14:41Z app[148e461a10d638] ams [info]** (stop) killed
2024-02-15T14:14:41Z app[148e461a10d638] ams [info]Last message: {:DOWN, #Reference<0.3703568104.3533176833.226316>, :process, #PID<64302.2627.0>
, :killed}
2024-02-15T14:14:41Z app[148e461a10d638] ams [info]State: %{runner: #FLAME.Runner<id: nil, instance_id: nil, private_ip: nil, backend: FLAME.FlyB
ackend, terminator: #PID<64302.2627.0>, node_name: nil, single_use: true, timeout: 30000, status: :booted, log: :debug, boot_timeout: 30000, idle
_shutdown_after: 30000, idle_shutdown_check: #Function<8.81159202/0 in FLAME.Runner.new/1>, ...>, checkouts: %{}, otp_app: :phoenix_albums, backe
nd_state: #FLAME.FlyBackend<host: "https://api.machines.dev", local_ip: ["fdaa:3:e5fc:a7b:c207:6cdf:6e23:2"], cpu_kind: "performance", cpus: 1, m
emory_mb: 4096, gpu_kind: nil, image: "registry.fly.io/phoenix-albums:deployment-01HPPHKRM3RX1WQ2E7DJFTC0KT", app: "phoenix-albums", boot_timeout
: 30000, runner_id: "0806250b612738", remote_terminator_pid: #PID<64302.2627.0>, runner_node_basename: "phoenix-albums-01HPPHKRM3RX1WQ2E7DJFTC0KT
", runner_instance_id: "01HPPHV48EZPS6AMZJPSCJ4B2H", runner_private_ip: "fdaa:3:e5fc:a7b:252:9f38:8dec:2", runner_node_name: :"phoenix-albums-01H
PPHKRM3RX1WQ2E7DJFTC0KT@fdaa:3:e5fc:a7b:252:9f38:8dec:2", ...>}
but this might just be the regular timeout shutdown.
First and foremost, I’m looking for ways to debug what is going on in the FLAME runners, because the data processing definitely breaks somewhere, but I’m unable to catch any errors.