Nice one. However, issues caused by N+1 queries, long database (row) locks and memory limits are mostly nonexistent in dev environments. Every non-indexed table works fine in a dev environment.
@MadsBoydMadsen If the problem remains and you are able to spin up a test environment, I know a reliable company which can help stress / durability test the app with carefully drafted scenarios.
why a particular action, that our users are doing, is timing out.
Can you tell us the particular action? I like black box riddles.
Tip for those with no Product Owner or higher Management: have a CI pipeline which tests performance with a large (generated) dataset every night after changes to the main branch (or even a PR branch). Store the result and auto-compare, raising an alert when the change is above a threshold. I once found that an “add row element” action brought rendering of the whole DOM tree to a halt, because in production it created a lot of new DOM nodes.
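To make the "store and auto compare" part concrete, here is a minimal sketch of what the nightly job could run after the performance scenarios finish. Everything in it is an assumption for illustration: the scenario names, the JSON results file and the 20% threshold are made up, and the timings would come from whatever test runner you already have.

```python
import json
from pathlib import Path


def find_regressions(baseline, current, threshold=0.20):
    """Return {scenario: (old_seconds, new_seconds)} for timings that grew by more than `threshold`."""
    regressions = {}
    for name, new in current.items():
        old = baseline.get(name)
        if old is None or old <= 0:
            continue  # new scenario: nothing to compare against yet
        if (new - old) / old > threshold:
            regressions[name] = (old, new)
    return regressions


def check_and_store(results_file, current, threshold=0.20):
    """Compare `current` with the stored baseline; only overwrite the baseline on a clean run."""
    path = Path(results_file)
    baseline = json.loads(path.read_text()) if path.exists() else {}
    regressions = find_regressions(baseline, current, threshold)
    if not regressions:
        # Keeping the old baseline after a regression stops the slow run from becoming the new normal.
        path.write_text(json.dumps(current, indent=2))
    return regressions
```

In the pipeline you would fail the build (or post a message somewhere) when `check_and_store("perf_baseline.json", timings)` returns a non-empty dict. Not updating the baseline after a failed run is a design choice; some teams prefer a rolling average instead.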
Riddles: A user from a particularly large (maybe even the largest) tenant is trying to make an edit to a particular resource. The user brings up the resource in the UI (Angular), makes the modification there and presses Submit. … and sits there for a mighty long time and wonders what’s going on.
Performance testing in the CI pipeline! Wouldn’t that be wonderful. We don’t have it. The UI is auto-tested in Dev and QA overnight, but there are no performance tests in there. And it is not run against Prod during prod releases.
When the issue is in the DB (aka: you can’t find anything in your code), here are some pointers to help pin it down.
When that large tenant updates the resource, do other tenants also experience the issue when updating their resources? If so: table lock?
Have you already checked for row (dead)locks due to combined constraints in a transaction? Not all query measurement tools report the constraint lock times properly for transactions, which makes the individual updates seem fast when in fact they all had to wait a long time beforehand. Also, when you test every query on its own, they all fly! It’s the combination in a transaction which kills it.
Databases usually come up with a solution after a loooong time, but mostly they hit a timeout (or your app does) when there are lots of constraints to check (many updates in a transaction, or fewer updates but with many constraints) or loops. Depending on database settings, whole tables are locked and all operations suffer as a result.
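A tiny way to see "fast on its own, stuck behind a lock" in action, sketched with Python's built-in `sqlite3`. Note the assumptions: SQLite locks the whole database file rather than a table or row, so this only illustrates the symptom, not how your production database behaves. The table, the function name and the 0.2 second timeout are made up for the demo.

```python
import os
import sqlite3
import tempfile
import time


def try_update_while_locked(timeout_s=0.2):
    """Return (seconds_waited, error_message); error_message is None if the second writer got through."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        # isolation_level=None means autocommit, so BEGIN/ROLLBACK are issued explicitly below.
        holder = sqlite3.connect(db_path, timeout=timeout_s, isolation_level=None)
        waiter = sqlite3.connect(db_path, timeout=timeout_s, isolation_level=None)
        try:
            holder.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT)")
            holder.execute("INSERT INTO resource VALUES (1, 'a'), (2, 'b')")
            # The "long" transaction: takes the write lock and does not commit yet.
            holder.execute("BEGIN IMMEDIATE")
            holder.execute("UPDATE resource SET name = 'x' WHERE id = 1")
            start = time.perf_counter()
            try:
                # Touches a different row and is trivially cheap, yet it has to wait for the lock.
                waiter.execute("UPDATE resource SET name = 'y' WHERE id = 2")
                error = None
            except sqlite3.OperationalError as exc:
                error = str(exc)
            waited = time.perf_counter() - start
            holder.execute("ROLLBACK")
            return waited, error
        finally:
            holder.close()
            waiter.close()
```

Calling `try_update_while_locked()` should come back with an error mentioning that the database is locked, after waiting roughly the timeout. In a profiler the second `UPDATE` itself would look cheap; the time went into waiting.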
‘Simple’ test: run and measure the action in a testing env with datasets of several sizes, and see if the time increases. Yes? Disable constraint checks and see if the timings become more constant.
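The measuring part of that test could look roughly like this. It is a sketch only: `action` is a placeholder for whatever triggers the slow operation against a dataset of the given size, and the helper names are mine.

```python
import time


def measure_scaling(action, sizes, repeats=3):
    """Run action(size) `repeats` times per size and return {size: best_seconds}."""
    timings = {}
    for size in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            action(size)
            best = min(best, time.perf_counter() - start)
        timings[size] = best
    return timings


def growth_ratios(timings):
    """Return [(smaller_size, larger_size, time_ratio)] for each consecutive pair of sizes."""
    sizes = sorted(timings)
    return [
        # max(...) guards against dividing by zero when an action is too fast to measure.
        (a, b, timings[b] / max(timings[a], 1e-9))
        for a, b in zip(sizes, sizes[1:])
    ]
```

For example, `growth_ratios(measure_scaling(run_edit, [1_000, 10_000, 100_000]))`, where `run_edit` is your own (hypothetical) function. A time ratio near 1 means the action does not care about dataset size; a ratio near or above the size ratio is the suspicious case. Run it once with constraint checks on and once with them off, then compare the ratios.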
I don’t mean to sound rude or anything: I need to continue the work on my short term solution now. I’ll keep all these lovely suggestions in my bag for another day. Similar problems are bound to appear later, and one might want to be better prepared
A shot in the dark, but if you want to find bottlenecks in your Elixir code, you might want to take a look at trace. It allows tracing the number of reductions, functions that are called often, etc. It is the official tool for dealing with these kinds of performance problems.
The only problem is that using this functionality directly is not the easiest thing. If I remember correctly, there is a simplified Elixir/Erlang library for that, but sadly I can’t tell you its name as I never used it.
Here is a video by @sasajuric, where he shows an example of how to apply this in action: