Hey there,
While trying to temporarily work around a laggy UI (backed by long DB operations), I am left wondering: does assign_async accept e.g. timeout: :timer.seconds(30) at all?
I am looking at the source of Phoenix.LiveView.Async and it does not seem to use opts for anything besides :supervisor and :reset, as per its docs. Still, the code uses macro environments, and I did not scan the entire LiveView code base, so I might be missing some implicit context.
Does anybody know? And, assuming the answer is “no”, what options do I have to give assign_async more breathing room for executing its enclosed function?
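For concreteness, this is roughly the call I have in mind – the :timeout option is exactly the hypothetical part, since the docs only mention :supervisor and :reset, and the module/assign names are made up:

```elixir
# Hypothetical: would assign_async honor a :timeout option like this?
# The documented options are only :supervisor and :reset.
socket =
  assign_async(socket, :report, fn ->
    # Reports.very_slow_query/0 stands in for our long DB operation
    {:ok, %{report: Reports.very_slow_query()}}
  end, timeout: :timer.seconds(30))
```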
Fairly sure. The function enclosed in assign_async is doing some very slow DB work, and then I get a stacktrace right in the UI that clearly blames Phoenix.LiveView.Async.assign_async near the bottom. The bottom-most entry is Task.Supervised.invoke_mfa.
Actually wait, I might have misunderstood you. Task.start_link – which the code ends up using because we don’t supply the :supervisor option to assign_async – indeed does not impose a timeout. It must be something higher up, like Ash (which the project uses). Oh well, now I am mystified.
Sorry for the ping @zachdaniel. We’re using Ash.read! inside the body of the function that’s nested in LiveView’s assign_async. We are giving a very generous timeout (minutes) as an argument to Ash.read!, yet Ash – or something else, it’s not yet clear – is cancelling the queries long before that: it seems to be around 15 seconds by my own rough counting, but I wouldn’t bet my life on it.
This is the body of a function that runs once inside the function nested in assign_async. And assign_async itself is called N times, once per resource.
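To make the shape concrete (all names here are placeholders, not our real modules):

```elixir
# Placeholder names; the real resources and queries are redacted.
socket =
  Enum.reduce(resources, socket, fn %{key: key, query: query}, socket ->
    assign_async(socket, key, fn ->
      # generous timeout passed to Ash.read!, yet something cancels around ~15s
      {:ok, %{key => Ash.read!(query, timeout: :timer.minutes(5))}}
    end)
  end)
```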
Here’s a word soup with our app redacted, in case it helps:
One thing to try (it shouldn’t matter) is to do Ash.Query.set_timeout(query, timeout).
Another thing to try is upping your repo timeout in config; that could be the issue. If so, we may actually be missing a piece that passes the timeout down into reads that are not run in transactions.
Something else you could try is setting transaction? true on the read action, as I know that changes the way timeouts are handled. It would maybe point us in the right direction.
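Roughly like this (app/repo/action names are placeholders):

```elixir
# In config (placeholder app/repo names): raise the Ecto repo's default timeout.
config :my_app, MyApp.Repo,
  timeout: :timer.minutes(2)

# On the resource's read action: run the read in a transaction, which changes
# how the timeout is handled.
actions do
  read :read do
    transaction? true
  end
end
```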
Since it’s nearly impossible to replicate prod environment locally, we’ll have to try all of those at once in a single deployment and also add more metrics or OTel spans (or both)… and maybe even extra logging.
Thanks. Promise to come back here and let you know.
Yeah, the problem with this is that we are (basically) doing a for loop with many assign_async calls, each doing an Ash.read! – I think it’s going to be problematic to modify all of them with the transaction flag.
I already added Ash.Query.timeout – there’s no set_timeout, btw; let me know if I am missing something – and am waiting for a prod deployment.
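For reference, the change is essentially this (resource and action names are placeholders; the real ones are redacted):

```elixir
# Placeholder resource/action names; both the query timeout and the
# Ash.read! timeout option are set generously.
MyApp.SomeResource
|> Ash.Query.for_read(:read)
|> Ash.Query.timeout(:timer.minutes(5))
|> Ash.read!(timeout: :timer.minutes(5))
```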
OK I had another go at this. (I will also add transaction? true).
One spot I found: AshPostgres.DataLayer.run_query/2 does not support timeouts. We are currently on 2.5.10, so please check this out:
The 3rd argument should be the timeout, but it’s just flat-out nil, and run_query/2’s arguments don’t allow for a timeout either.
I am about to task an LLM with making me a partial copy of AshPostgres.DataLayer that passes a module attribute’s value as the timeout, copying into that new module any private functions that run_query/2 needs. Is that how you would do it?
So, Ash.read should apply the configured timeout. I think the first order of business here is for you to reproduce this in iex, for example: just run a long-running read action with a timeout and prove whether it’s Ash ignoring your configured timeout. You can add a preparation to a read action that does :timer.sleep(:timer.seconds(60)), then run the read action with a timeout of 5 minutes and see whether it does or does not, in fact, time out.
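Something like this, for example (resource/action names are placeholders):

```elixir
# In the read action (placeholder action name), a preparation that just sleeps:
read :slow_read do
  prepare fn query, _context ->
    :timer.sleep(:timer.seconds(60))
    query
  end
end

# Then in iex, with a deliberately generous timeout, see whether it does or
# does not, in fact, time out:
MyApp.SomeResource
|> Ash.Query.for_read(:slow_read)
|> Ash.read!(timeout: :timer.minutes(5))
```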
Ultimately we just don’t know what we’re debugging yet.
In general, if Ash was timing you out, you’d get a specific timeout error, not a database error.
The code you’re looking at does make sense: when a read action gets a timeout, it currently just starts a Task with a timeout to honor it; it doesn’t delegate the timeout to the data layer.
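Not the actual implementation, but the general shape of that pattern is:

```elixir
# Sketch only: run the work in a Task and enforce the timeout from the
# caller's side; run_the_read/1, query, and timeout are stand-ins.
task = Task.async(fn -> run_the_read(query) end)

case Task.yield(task, timeout) || Task.shutdown(task) do
  {:ok, result} -> result
  nil -> {:error, :timeout}
end
```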
I don’t understand the proposal here currently, sorry
Ah, I was desperately trying to just make a copy of AshPostgres.DataLayer for another fix (we had a similar, though not identical, error in our APM). In any case, it doesn’t work (it says postgres/1 is not found when I replace just one Ash.Resource’s data_layer: ... with the custom module), so…
Ha, that’s an awesome idea, thanks! I’ll try that!
OK, that didn’t work. We also use AshPaperTrail, and I need to reproduce the timeout on prod. I have tried adding a 16-second preparation to the mixin we already have, and it’s correctly applied… but the UI that shows versions of resources simply waits 16 seconds and then shows everything. So I still can’t reproduce the timeout error.
I have also modified the Ash.Query.timeout and Ash.read! calls in the pipeline to only wait for one second. Still no dice: the UI just shows “loading…” for 16s and then loads everything.
I am thinking I’ll just use plain Ecto for this one.
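i.e. something along these lines, passing the :timeout option per query (repo/schema names are placeholders):

```elixir
import Ecto.Query

# Placeholder repo/schema; Ecto lets you pass :timeout on each call.
MyApp.Repo.all(from(r in MyApp.SomeSchema), timeout: :timer.minutes(2))
```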
You would have had to do the waiting inside of a before_action hook to get the behavior you were looking for – sorry for leading you astray.
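i.e. for the repro, the sleep would need to go into the hook itself, roughly like this (resource/action names are placeholders):

```elixir
# Placeholder names: put the sleep inside a before_action hook so it runs as
# part of the action, then read with a short timeout and see whether it
# actually times out.
MyApp.SomeResource
|> Ash.Query.for_read(:read)
|> Ash.Query.before_action(fn query ->
  :timer.sleep(:timer.seconds(60))
  query
end)
|> Ash.read!(timeout: :timer.seconds(5))
```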
Do you have a default timeout configured in Ash anywhere? Are you loading relationships as part of what you’re doing? The timeout for the parent query doesn’t override the timeout for the read actions that back relationships, for example.
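For example (placeholder resource names), the :comments load below is backed by its own read action, and that read does not inherit the 2-minute timeout set on the parent query:

```elixir
# Placeholder names: the parent read gets a 2-minute timeout, but the read
# action backing the :comments relationship does not inherit it.
Post
|> Ash.Query.load(:comments)
|> Ash.Query.timeout(:timer.minutes(2))
|> Ash.read!()
```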
Ultimately, I think we’re going to have to be more methodical here. Ash doesn’t make up timeouts, for example. To really help, I’m going to need to see more code, even if it’s anonymized, along with the specific error messages and logs, and to understand exactly what’s happening. Right now it’s all too vague.
Separately, I’ve just done release day, which includes a fix for a very niche timeout bug – probably not what you’re hitting, but you never know. I’d suggest updating.