Macro to provide timing for a function call - handle any signature

BartOtten · February 14, 2025, 9:51am

Nice one, although issues caused by n+1 queries, long database (row) locks and memory limits are mostly nonexistent in dev environments: every non-indexed table works fine there.

@MadsBoydMadsen If the problem remains and you are able to spin up a test environment, I know a reliable company which can help stress / durability test the app with carefully drafted scenarios.

why a particular action, that our users are doing, is timing out.

Can you tell us the particular action? I like black box riddles.

Tip for those with no Product Owner or higher Management: have a CI pipeline which tests performance with a large (generated) dataset every night after changes to the main branch (or even a PR branch). Store the result and auto compare, raising an alert when the change is above a threshold. I once found that an “add row element” change brought the whole DOM tree rendering to a halt, because in production it created a lot of new DOM nodes.
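A minimal sketch of the "store the result and auto compare" step, in Elixir since that is what this thread is about. The module name `PerfCheck`, the step names and the 20% default threshold are made up for illustration, and how the baseline is persisted between nightly runs is left out.

```elixir
defmodule PerfCheck do
  # Hypothetical helper: compares step timings against a stored baseline.

  # Runs a zero-arity function and returns the elapsed time in milliseconds.
  def measure_ms(fun) when is_function(fun, 0) do
    {micros, _result} = :timer.tc(fun)
    micros / 1000
  end

  # `baseline` and `current` are maps of step name => milliseconds.
  # Returns {step, baseline_ms, current_ms} for every step that got slower
  # by more than `threshold` (0.2 means 20%). Steps without a baseline are skipped.
  def regressions(baseline, current, threshold \\ 0.2) do
    for {step, now_ms} <- current,
        base_ms = Map.get(baseline, step),
        is_number(base_ms) and base_ms > 0,
        (now_ms - base_ms) / base_ms > threshold,
        do: {step, base_ms, now_ms}
  end
end
```

In a CI job you would fail the build (or post a message) whenever `PerfCheck.regressions/3` returns a non-empty list.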

MadsBoydMadsen · February 14, 2025, 10:00am

Riddles: A user from a particularly large (maybe even the largest) tenant is trying to make an edit to a particular resource. The user brings up the resource in the UI (Angular), makes the modification there and presses Submit. … and sits there for a mighty long time and wonders what’s going on.

MadsBoydMadsen · February 14, 2025, 10:03am

Performance testing in the CI pipeline! Wouldn’t that be wonderful. We don’t have it. The UI is auto-tested in Dev and QA overnight, but there are no performance tests in there. And it is not run against Prod during prod releases.

BartOtten · February 14, 2025, 10:04am

How are tenants separated? Database, schema or row level (tenant_id foreign key)?

BartOtten · February 14, 2025, 10:05am

Well, then you are 90% of the way there already! Generate a large dataset, simply time the actual app test steps (in total or per step) and compare.

It’s not a solution for now though.

MadsBoydMadsen · February 14, 2025, 10:05am

Single database. Separated by tenant-id attached to all rows in all tables. … well … nearly all tables.

BartOtten · February 14, 2025, 10:12am

If the issue is in the DB (i.e. you can’t find anything in your code), here are some pointers to possibly pin it down.

When that large tenant updates the resource, do other tenants also experience the issue when updating their resources? If so: table lock?
Have you already checked for row (dead)locks due to combined constraints in a transaction? Not all query measurement tools report the constraint lock times properly for transactions, which makes the individual updates seem fast when in fact they all had to wait a long time beforehand. Also, when you test each query on its own, they all fly! It’s the combination in a transaction which kills it.

Databases usually come up with a solution after a loooong time, but mostly they hit a timeout (or your app does) when there are lots of constraints to check (many updates in a transaction, or fewer updates but with many constraints) or loops. Depending on database settings, whole tables are locked and all operations suffer as a result.

‘Simple’ test: run and measure the action in a test environment with datasets of several sizes and see if the time increases. Yes? Disable constraint checks and see if the timings become more constant.
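A rough sketch of that test in Elixir. `ScalingProbe` is a made-up name, and `action` is a hypothetical stand-in for "seed a dataset of this size, then perform the edit"; the real thing would call your app or database instead.

```elixir
defmodule ScalingProbe do
  # Times `action.(size)` once per size and returns [{size, microseconds}].
  def run(sizes, action) when is_function(action, 1) do
    for size <- sizes do
      {micros, _result} = :timer.tc(fn -> action.(size) end)
      {size, micros}
    end
  end

  # Converts [{size, microseconds}] into [{size, microseconds_per_row}].
  # A roughly constant value suggests linear scaling; a value that keeps
  # climbing with size suggests something worse than linear.
  def per_row(timings) do
    for {size, micros} <- timings, do: {size, micros / size}
  end
end
```

Run it once with constraint checks enabled and once with them disabled, then compare the two `per_row/1` results.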

Success hunting!

MadsBoydMadsen · February 14, 2025, 10:13am

I don’t mean to sound rude or anything: I need to continue the work on my short term solution now. I’ll keep all these lovely suggestions in my bag for another day. Similar problems are bound to appear later, and one might want to be better prepared

D4no0 · February 14, 2025, 11:14am

A shot in the dark, but if you want to find bottlenecks in your Elixir code, you might want to take a look at trace. It allows tracing the number of reductions, functions that are called often, etc. It is the official tool for dealing with this kind of performance problem.

The only problem is that using this functionality directly is not the easiest thing. If I remember correctly there was a simplified Elixir/Erlang library for that, but sadly I couldn’t tell you the name of it as I never used it.
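For a feel of what the built-in tracing gives you, here is a small sketch using the call counting offered by `:erlang.trace_pattern/3` and `:erlang.trace_info/2`. `CallCounter` is a made-up wrapper for illustration, not the simplified library mentioned above. The counter covers calls from any process while it is active, so treat the number as approximate on a busy node.

```elixir
defmodule CallCounter do
  # Counts how often mod.fun/arity is called while `work` runs.
  def count(mod, fun, arity, work) when is_function(work, 0) do
    # Trace patterns only apply to modules that are already loaded.
    {:module, _} = Code.ensure_loaded(mod)
    mfa = {mod, fun, arity}

    # Start a fresh call counter for this function.
    :erlang.trace_pattern(mfa, true, [:call_count])

    try do
      work.()
      {:call_count, n} = :erlang.trace_info(mfa, :call_count)
      n
    after
      # Always switch call counting off again.
      :erlang.trace_pattern(mfa, false, [:call_count])
    end
  end
end
```

Something like `CallCounter.count(MyApp.Repo, :update, 2, fn -> run_the_slow_action() end)` (both names hypothetical) would then tell you whether a single user action fans out into far more calls than expected.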

Here is a video by @sasajuric, where he shows an example of how to apply this in action:

dimitarvp · February 14, 2025, 12:45pm

Then here’s one more piece of homework, I’m afraid: you’ll need to integrate OpenTelemetry in your project.