System resources used when using Elixir with MySQL vs PostgreSQL

dojap · May 30, 2021, 8:33pm

Have any of you compared MySQL vs PostgreSQL resource usage in your Phoenix app? I would go with Postgres but I am afraid that because my site is mostly about reads (MySQL is better here) and less about writes (Postgres is better here) Postgres would be a bad choice. Also, apparently Postgres is much more RAM hungry and spawns more processes and MySQL can handle more connections with the same amount of RAM.

But this is only stuff I read, I have no real world experience with this.

Have any of you tested both MySQL and Postgres with your Phoenix app? What were the results?

hauleth · May 30, 2021, 8:45pm

Until you grow to the size, where you will be able to hire someone to handle that for you, then you will probably do not see any difference.

It is hard to tell what will work better in your particular use case, so you will need to do benchmarking on your own. However I think that you will still see no difference, and if so, it will be indistinguishable from noise.

dojap · May 30, 2021, 8:52pm

This is from here: PostgreSQL vs. MySQL: 9 key criteria to drive your database decision

PostgreSQL starts a new process with its own memory allocation for each connection it establishes, so it requires a lot of memory on systems with a high number of client connections. MySQL uses a single process and maintains one thread (or path of execution) per connection, which results in less memory pressure.

derek-zhou · May 30, 2021, 9:53pm

In most sane environments there is a connection pool so the number of connections are bounded. The per process in PG overhead is not a big deal.

If I have to some layman comparison between PG and MySQL I would say that PG is more dev friendly, and MySQL is more ops friendly.

dojap · May 31, 2021, 7:44am

Thanks, I think I will go with MySQL then.

The scaling in the future factor is a huge plus of MySQL.

Also Uber went back to MySQL from Postgres. So, if MySQL can handle Uber it will be ok for me, I guess. MySQL also seems to be better configured for common web usage out of the box.

Postgres seems to need much more research and learning how to configure stuff out of the box.

The advantage of being able to embed Postgres with my app because of it’s liberal license is not important for me because it’s a web service hosted on a server and not a stand alone program. So, the MySQL license is ok for me for my web project meaning the SaaS is what I am after anyway.

Also, again, MySQL seems to be focused on web usage a lot, meaning much more reads than writes, as mentioned here: PostgreSQL vs MySQL: The Critical Differences | Integrate.io

PostgreSQL is known to be faster while handling massive data sets, complicated queries, and read-write operations. Meanwhile, MySQL is known to be faster with read-only commands.

Perhaps some realtime data input recording would be a better case for Postgres, I think in my case MySQL is the better option. 98 - 99% of everything in my service will be reads, only a tiny fraction will be writes to the database.

hauleth · May 31, 2021, 8:19am

This shouldn’t be a problem until you really grow.

Because for their particular use it was better. And their usage was quite specific as they are basically using DB as a KV store.

It also could handle Google. On the other hand Heat, Algolia, Netflix, Instagram, etc. use PostgreSQL. Using some piece of technology just because “X uses it and it works for them” isn’t wise. It is following trend, which isn’t always good thing.

I am on the PostgreSQL side, mostly because of it broader support for the SQL standard and also some useful things that make life much easier. For example tuple comparisons, support for array columns, better support for CTEs, more index types, better defaults, better UTF-8 support, etc.

Then maybe you do not need DB at all, and instead, for example, single file that would be loaded into memory would be better solution?

dojap · May 31, 2021, 9:11am

For example tuple comparisons, support for array columns, better support for CTEs, more index types, better defaults, better UTF-8 support, etc.

Thanks. I am curious in this part. I will now read about the utf8 support you linked, but could you explain the other things, or link to explanations if you don’t have time to explain that? Thank you.

Then maybe you do not need DB at all, and instead, for example, single file that would be loaded into memory would be better solution?

My web is using relationships, I use joins. Table structure is perfect for me. I think in tables. One file would be weird. Especially with that amount of data present there.

Also, isn’t the MySQL or Postgres DB loaded in the memory as well? For example, the more DBs I have and use the more RAM the mysql/postgres is using on my computer.

focused · May 31, 2021, 9:28am

Maybe try Clickhouse for high speed “selects”, it is very very fast for reads; updates and deletes (mutations) not designed for frequent use. It has Ecto support, but I did not try it, I wrote plain SQL queries in my service.

dojap · May 31, 2021, 9:36am

Wow, thanks for the tip. Why isn’t this Clickhouse DB more widely known and promoted? Is there a catch? Can you recommend some nice summary or presentation on Youtube about this DB?

focused · May 31, 2021, 9:47am

It is useful primarily for stats and analytics, when you collect and insert user data, and later read it for analysis, but you may use it for any kind of content that requires often reads and almost no updates/deletes. Sorry, I don’t know any presentation or video.

This is how I used it in my microservice:

user visits some content like posts or comments or other users’ profiles - inserts;
later user opens his history with paginated ids and types of content he visited in past (pagination param is a timestamp) - selects;

dojap · May 31, 2021, 9:53am

Thanks. Great. Does Ecto support Clickhouse DB?

focused · May 31, 2021, 9:55am

I did not try it though as I wrote above in my first message.

Jcambass · May 31, 2021, 10:04am

Hi @dojap

I would suggest to stick with Postgres or MySQL unless you know why you need to choose something else.
If you’re trying to build a SaaS you will very likely run into a 1000 other problems before MySQL or Postgres will get in your way.

Personally I would recommend Postgres since IMHO it has more features to offer and good support in Ecto. If you’re already familiar with MySQL then it’s also fine to use that instead.

Instead of focusing on scaling I would focus on features offered, available documentation resources and compatibility. For ecto and Saas style apps Postgres is the first choice for me there.

Both Postgres and MySQL can scale quite well given enough hardware resources. As others have wrote, you will not hit limitations there soon. It’s very likely that app or database design decision you choose will become a problem before the actuall database engine itself.

So my personal recomendation is to go with Postgres (or MySQL if you feel more comfortable with that) and focus on start building your Saas.

For reference:

I have used Postgres in various elixir side projects where it never got into my way and was easy to “set up and forget”.

But I also use it extensively at work where we use it not only for online transactional databases but also stuff like timeseries like databases, analytical use cases and datawarehouses.

While it’s possible the tune certain Postgres settings to reach peak performance it’s not something you need to worry or even know about until you hit those limits.

Feel free to reach out and ask more specific questions. I’m pretty sure you don’t need to focus on scaling but if you know what kind of database workload you’re planning for then a more specific answer might be possible.

dojap · May 31, 2021, 10:27am

Thanks for your reply. Now, I am back to zero . It’s hard to decide. Perhaps I will go with Postgres in the end.

dojap · May 31, 2021, 10:29am

Which language did you use with Clickhouse? Go or something else?

focused · May 31, 2021, 10:30am

Elixir only.

dojap · May 31, 2021, 10:32am

OK, so you wrote direct SQL statements no db mapping, right?

hauleth · May 31, 2021, 10:32am

Remember that Clickhouse will be enormously slow at fetching single rows. So if your application will mostly do stuff like:

SELECT name FROM users WHERE id = 10

Then Clickhouse will make your application terribly slow (nature of column storage).

You still operate in the vague “more reads than writes” - most applications do more reads than writes. Describe directly what you want to do and how many millions of the requests you will need to handle daily. Otherwise I wouldn’t care and I would pick PostgreSQL.

dojap · May 31, 2021, 10:34am

Yeah, perhaps you are right. Clickhouse seems to be good only for analytics.

One question, can you recommend some compact tutorial about Postgres? Not focused on queries but more on settings on Linux (Ubuntu) and pool size etc.?

focused · May 31, 2021, 10:37am

Yes, plain sql and my own helper functions to map results. And as @hauleth mentioned above I select many rows, not one row at a time.