Looking for help with poor Ecto query performance

We’re at 60+ messages on this thread. Might as well pile on. :smiley:

2 Likes

Here goes nothing then. With the tip of your Git repo at commit 53cb0560c7dc62558aeb11ff65d22fb13c4610ff:

| CPU (cores/threads) | Test | IPS | Duration |
|---|---|---|---|
| Xeon-2150B, 10/20 | Elixir create | 25295 | 10s |
| Xeon-2150B, 10/20 | pgbench create | 33692 | 10s |
| Xeon-2150B, 10/20 | Elixir update | 41635 | 30s |
| Xeon-2150B, 10/20 | pgbench update | 131118 | 30s |
| i7-4870HQ, 4/8 | Elixir create | 16981 | 10s |
| i7-4870HQ, 4/8 | pgbench create | 30152 | 10s |
| i7-4870HQ, 4/8 | Elixir update | 21701 | 30s |
| i7-4870HQ, 4/8 | pgbench update | 47926 | 30s |

Observations:

  • Most numbers are a little lower than before – interesting. That might be due to thermal throttling, especially on the MBP, but I can’t see a significant difference overall.
  • pgbench uses less CPU than Elixir, which leaves more CPU for Postgres – I noticed Postgres utilising the CPU more during the pgbench tests.

Let me gather my full feedback on the testing methodology:

  • You shouldn’t use .exs scripts; that’s less efficient (although I doubt it amounts to much).
  • You should use the same set of random keys to insert/update records in both tools. It’s a little more scientific and gives more consistent results that way, IMO.
  • You should use more Elixir processes. Your observation on the numbers when going from 10 to 40 makes sense. Given how lightweight processes are and the limits of the DB pool, your tests should IMO do their best to keep the Postgres DB very busy.
  • Again, standardise the output of all commands. You can capture and hide pgbench's output and just print the IPS/TPS number (see the sketch right after this list).
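
For that last point, a minimal sketch of what it could look like on the Elixir side (not code from the pghr repo; the module name and pgbench arguments are just illustrative):

```elixir
# Run pgbench, swallow its full report, and print only the TPS number so the
# output lines up with the Elixir benchmark's IPS output.
defmodule BenchOutput do
  def pgbench_tps(args) do
    {output, 0} = System.cmd("pgbench", args, stderr_to_stdout: true)

    # pgbench reports a line like: "tps = 33692.123456 (...)"
    [_match, tps] = Regex.run(~r/tps = ([\d.]+)/, output)
    String.to_float(tps)
  end
end

# e.g. 10 clients for 10 seconds against a database named "pghr":
# IO.puts("pgbench TPS: #{BenchOutput.pgbench_tps(~w(-c 10 -T 10 pghr))}")
```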

Thank you. All good feedback. We are close to the end of our day here, but will follow up on these suggestions tomorrow.

FWIW pghr (the initial experiment) has largely served its purpose for me. As I mentioned a few posts ago, the idea was that I would layer on the complexity in our internal application and find the cliff that we were falling off of.

I found it today.

We were using pg_notify as part of an internal change notification mechanism. When I introduced that (even with no listeners), Ecto performance fell off by a factor of 3. pgbench performance showed no such impact.

I’ll post updated results on the pghr repo soon (hopefully tomorrow). In the meantime, armed with that information, we can achieve the desired performance by using other technology for our change notification.

2 Likes

@scouten, be advised that pg_notify uses an O(n^2) algorithm for determining notification uniqueness within a transaction. It always deduplicates messages queued in the same transaction, and unfortunately the mechanism it uses for that uniqueness check can cause significant slowdowns even if there are no listeners.

That is quite odd, looking forward to the update.

2 Likes

What I found baffling was that beam.smp CPU consumption leveled off at almost exactly 100% according to macOS Activity Monitor, compared to ~280% before introducing pg_notify (on two different 4-core machines).

I’ve read recently about the O(n^2) algo in pg_notify so I’ll admit that I’m looking at that with a strong side-eye.

Very weird. How are you using pg_notify? Are you using it directly, or are you using some kind of library that may be pushing events through a GenServer and creating a bottleneck?

Even stranger: In this benchmark, I’m not consuming the notifications at all. (That was going to be my next step in layering back in complexity.)

If you comment out the PERFORM pg_notify line in the just-added schema, perf goes back to previous levels.
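
For anyone following along who hasn't seen the repo yet, the trigger presumably has roughly the shape sketched below, written here as an Ecto migration; the table name, channel name, and payload are my guesses, not the actual pghr code.

```elixir
defmodule Pghr.Repo.Migrations.AddNotifyTrigger do
  use Ecto.Migration

  def up do
    execute """
    CREATE OR REPLACE FUNCTION notify_item_change() RETURNS trigger AS $$
    BEGIN
      -- Commenting out this PERFORM restores the previous throughput.
      PERFORM pg_notify('item_changes', TG_OP || ':' || NEW.id::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    """

    execute """
    CREATE TRIGGER items_notify_change
    AFTER INSERT OR UPDATE ON items
    FOR EACH ROW EXECUTE PROCEDURE notify_item_change();
    """
  end

  def down do
    execute "DROP TRIGGER IF EXISTS items_notify_change ON items;"
    execute "DROP FUNCTION IF EXISTS notify_item_change();"
  end
end
```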

In my “real” application, I have a separate GenServer process that consumes the notifications.
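
For context, a rough sketch of what such a consumer can look like with Postgrex.Notifications (the module and channel names are placeholders, not the actual application code); Postgrex.Notifications holds its own connection, separate from the Repo pool.

```elixir
defmodule ChangeListener do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    # opts are the usual Postgrex connection options (hostname, database, ...).
    {:ok, conn} = Postgrex.Notifications.start_link(opts)
    {:ok, _ref} = Postgrex.Notifications.listen(conn, "item_changes")
    {:ok, %{conn: conn}}
  end

  @impl true
  def handle_info({:notification, _conn, _ref, channel, payload}, state) do
    # React to the change notification here.
    IO.puts("#{channel}: #{payload}")
    {:noreply, state}
  end
end
```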

Can you show where you’re using pg_notify? I’m not seeing it in the code base. The performance cost of pg_notify isn’t going to drop to zero when there are no consumers, so that isn’t really surprising to me.

Shoot. Didn’t post that code. I’m on the road today, but should have time to share that out tomorrow morning.

OK, got a few spare moments so I was able to post it tonight.

Curious on this one, I gave it a spin here as well.

With pgbench and no pg_notify trigger, I get a bit over 80k TPS. With the trigger, pgbench manages “only” 50k TPS, which is just 62.5% of the performance without it. This is quite consistent here.

Using Ecto, I get 44k TPS without the pg_notify trigger and 33k with it, which is 75% of the performance without.

The relative drop in performance is smaller in Ecto than in pgbench, which makes sense: there is overhead in the Elixir code (including Ecto, of course) that gets paid on every transaction, and that masks the hit from the trigger alone. But the drop is there, and significant, in both the pgbench and the Ecto benchmarks.

Simplifying the trigger itself by removing the string coercion and concats results in a small improvement in performance, which also makes sense since that code is actually being run in the db. Keeping the trigger but removing the pg_notify eliminates the performance penalty entirely.

Moving to just one thread (vs 10), the penalty shrinks: with pg_notify the run is 83% as fast as without it, so the cost certainly grows as the number of parallel actions increases…

So, it seems that the issue is pg_notify itself, and is indeed likely the O(n^2) behavior showing up when there are multiple messages being broadcast on the same notification channel.

There are patches on the postgresql mailing list that have been rotting for a year or more (depending on the patch). This seems to be an issue that is not going to see easy resolution. If this is indeed the source of your performance problems, it may be time to look at a solution other than pg_notify. :confused:

If your application(s) are entirely Elixir and/or all database mutations go through your Elixir code, you may want to look at phoenix_pubsub. It certainly scales decently and works nicely as a message bus. If you need horizontal scalability and cannot do BEAM clustering, then perhaps move to a full external message-queue solution, or, if you have to (or just want to) keep it all in-app, perhaps look at something like lasp-pg.
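
As a rough illustration of the phoenix_pubsub suggestion (all names here are placeholders): the write path broadcasts after a successful insert/update, instead of relying on a database trigger.

```elixir
# Start a PubSub instance (normally this goes in your supervision tree).
{:ok, _} = Supervisor.start_link([{Phoenix.PubSub, name: Pghr.PubSub}], strategy: :one_for_one)

# Any interested process subscribes…
:ok = Phoenix.PubSub.subscribe(Pghr.PubSub, "item_changes")

# …and the writer broadcasts right after a successful Repo.insert/update.
:ok = Phoenix.PubSub.broadcast(Pghr.PubSub, "item_changes", {:item_changed, 42})

# The subscriber receives the broadcast as an ordinary message.
receive do
  {:item_changed, id} -> IO.puts("item #{id} changed")
end
```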

Good luck!

(p.s. Also tried with the straight NOTIFY call instead of a PERFORM pg_notify in case there was some gotcha there, but they are indeed equivalent.)

3 Likes

So, I dug into this, and when I did I realized there’s a huge difference between the settings being used for pgbench and for Ecto. The number of threads used by pgbench is NOT the same thing as the parallelism used in the parallel bench.

Threads

Notably, the following section is BEFORE introducing pg_notify

From the pgbench docs:

Number of worker threads within pgbench. Using more than one thread can be helpful on multi-CPU machines. Clients are distributed as evenly as possible among available threads. Default is 1.

This is more akin to the number of BEAM schedulers. It’s the -c client count that is the true “parallelism” (which really ought to be called concurrency). Here’s why this matters: the pghr settings I downloaded had a pool size of 40 but a parallelism of only 10. That means 30 connections are just sitting idle at any given time, since there are only 10 processes doing work.
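
A minimal sketch of keeping the two in line (the app and repo names are assumptions): the Ecto pool size should match the number of worker processes, just as pgbench's -c sets the number of clients.

```elixir
# config/runtime.exs (illustrative): one pooled connection per worker process,
# so nothing sits idle and no worker queues waiting for a connection.
import Config

parallelism = 10

config :pghr, Pghr.Repo, pool_size: parallelism
```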

Increasing the iterations_in_checkout value to 1000 also makes a big difference. On my computer, Ecto with these changes is now 87% as fast as pgbench, which is pretty good in my book given that we haven’t really started micro-optimizing the loops or the mechanism we use to check the time.
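
A sketch of the idea behind iterations_in_checkout (the module, schema, and function names are assumptions, not the actual pghr code): check a connection out of the pool once and run many inserts on it, rather than paying the checkout cost per insert.

```elixir
defmodule BatchInsert do
  # Hold a single pooled connection and run many iterations on it.
  def run(iterations_in_checkout \\ 1000) do
    Pghr.Repo.checkout(fn ->
      Enum.each(1..iterations_in_checkout, fn i ->
        Pghr.Repo.insert!(%Pghr.Item{value: i})
      end)
    end)
  end
end
```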

Randomness

Here you’re using randomness just to make sure that a unique value is entered every time; it doesn’t need to be random in a secure sense. I wanted to make sure that differences in randomness didn’t slow Elixir down, while also keeping things fair. The first thing I did was change the Elixir random value to :erlang.unique_integer([:positive]). This actually had a decent impact, getting me to within ~90% of pgbench. At that point I wanted to make sure that pgbench wasn’t being slowed down by randomness, so I added a create_item_fast_seq which used a sequence to generate incrementally higher values. This actually slowed pgbench down a fair bit, which makes sense given the additional disk IO.
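
For reference, the swap amounts to something like this (the original generator is assumed here to be :rand.uniform/1; the actual pghr code may differ):

```elixir
# Both produce values that are unique enough for the benchmark's purposes;
# only the second involves a PRNG. unique_integer/1 is unique within the VM
# and very cheap.
new_value = :erlang.unique_integer([:positive])
old_value = :rand.uniform(1_000_000_000)
```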

Pg_notify

Here’s why all of this preliminary work was important: if you use the same parallelism/client count for Ecto and pgbench, the performance penalty of pg_notify is the same. Both drop by a factor of ~3+. I’ve got a PR up with the changes I made so you can confirm.

BUT SURPRISE: it isn’t entirely pg_notify’s fault. If you change PERFORM pg_notify to EXECUTE 'SELECT 1', you still lose 40% of the throughput just from running the trigger.
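
In other words, the variant described above looks roughly like this (same hypothetical function, table, and channel names as the earlier migration sketch): the trigger still fires for every row, but no notification is queued.

```elixir
defmodule Pghr.Repo.Migrations.TriggerWithoutNotify do
  use Ecto.Migration

  def up do
    execute """
    CREATE OR REPLACE FUNCTION notify_item_change() RETURNS trigger AS $$
    BEGIN
      EXECUTE 'SELECT 1';  -- was: PERFORM pg_notify('item_changes', ...)
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    """
  end
end
```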

8 Likes

If you are willing, could you please make a PR and link it here? Really curious to see your changes.

Ben did post a PR last night (https://github.com/scouten/pghr/pull/17). I’m reading through it now. There are a few distinct changes he’s proposing, and I’m thinking about splitting them apart and layering them in separately so we can assess each one independently.

1 Like

Really enjoying this thread (and I suspect I’m one of many lurkers). Thanks to you all for digging into this.

6 Likes

Update (probably last one until Monday or Tuesday as I’ll be traveling in places with limited Internet access):

I’ve incorporated all of @benwilson512’s feedback and I’m really happy with where we’ve landed. Please see the new README in the pghr repo (https://github.com/scouten/pghr#results). tl;dr:

  • Ecto performance is between 87% and 107% :rocket: of pgbench write throughput in an apples-to-apples comparison of the two systems.
  • pg_notify clearly hurts performance. My team has decided on an approach that doesn’t rely on pg_notify, we have a good idea of the performance we can expect once that work is complete, and we will be satisfied with that result.

Thank you to everyone who has participated in this effort. It’s one of the things that makes Elixir a great platform to build upon.

7 Likes

FYI everything that I’ve been working with is now merged into the pghr (short for Postgres hot rod :wink: ) repo so you can check methods and results.

This includes all of Ben’s suggestions, except for the use of a sequence in the pgbench create test, which did not appear to help performance. (See PR #22 in that repo, which was not merged.)

2 Likes

I’ll post a 3rd round of tests on my MBP 2015 and iMac Pro pretty Soon™ as well.

1 Like