Postgrex Error and Phoenix Memory Usage

Thanks for the swift response. I was actually testing on a low-spec database server. I will move the DB to a 16GB, 8-core CPU machine; I need to handle about 1-2k requests per second (will the pool size still be enough?). Will let you know how it goes. I checked out Mojito and it looks cool; still benchmarking, though, as I need a client that's very fast. I tried curl on Linux, and it looks a bit faster than Mojito.
On my production machine (16 cores, 16GB RAM) I observed a lot of BEAM instances, about 50. Is this normal? The number seems to be lower on my test box; kindly check below. The memory size is almost 100MB.

Is everything good with indices and such?

There’s not really any way to tell. My gut says yes, and I have some applications serving that throughput with a smaller pool size. But it’s not really something that I could tell you with the available info.

How many cores are on the server that’s running the application? I don’t think there are really any red flags right now. The BEAM does have a particular CPU/memory usage profile that can differ from other platforms, but that doesn’t mean it’s bad.

I would recommend a tool like https://github.com/zhongwencool/observer_cli to give you a live view into the application stats at the VM level, rather than an OS view of the VM.
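For reference, a minimal setup sketch for observer_cli (the version requirement is illustrative; check the project README for the current one):

```elixir
# In mix.exs, add observer_cli to your dependencies:
defp deps do
  [
    {:observer_cli, "~> 1.5"}
  ]
end
```

Then, from an IEx session attached to the running node, `:observer_cli.start()` brings up the live view.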


Is everything good with indices and such?

Most of the operations are around :ets; the DB handles only inserts. Creating an index might not be ideal, as time would be spent making an entry in the index for every insert.

There’s not really any way to tell. My gut says yes, and I have some applications serving that throughput with a smaller pool size. But it’s not really something that I could tell you with the available info.

I will play around with the pool size while running the benchmark.
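For anyone following along, the pool size is tuned in the Repo config; a sketch with illustrative values (the app and repo names are placeholders, and `queue_target`/`queue_interval` assume Ecto 3 / DBConnection 2):

```elixir
# config/prod.exs - tune these against your benchmark results
config :my_app, MyApp.Repo,
  pool_size: 20,
  # how long (ms) checkouts may queue before the pool is considered slow
  queue_target: 50,
  queue_interval: 1_000
```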

How many cores are on the server that’s running the application? I don’t think there are really any red flags right now. The BEAM does have a particular CPU/memory usage profile that can differ from other platforms, but that doesn’t mean it’s bad.

16 cores. I also added observer_cli; it seems to start a lot of processes when I connect remotely and start the observer, but nevertheless it’s a wonderful tool. 100-120MB was allocated to the application while actual usage was around 70MB. I was wondering why the BEAM was not releasing the memory back to the OS (inner workings, I guess).

@sb8244 - I’m so grateful for your guidance & your book has been a great friend to me.

Ah, I didn’t realize that reads were coming from ETS. Are you running ETS in concurrent read mode? Do you have any single process bottlenecks?

It’s hard to tell exactly where the bottlenecks you’re seeing are. One thing to do is to enable “sort by reductions” in observer CLI and then look if you have a process getting hit hard.
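Outside of observer_cli, the same "sort by reductions" check can be sketched directly in IEx (this helper is not from the thread, just an illustration):

```elixir
# Top 5 processes by total reductions - a rough proxy for which
# processes are doing the most work on the schedulers.
Process.list()
|> Enum.map(fn pid -> {pid, Process.info(pid, :reductions)} end)
# Process.info/2 returns nil for processes that died in between:
|> Enum.reject(fn {_pid, info} -> is_nil(info) end)
|> Enum.sort_by(fn {_pid, {:reductions, r}} -> -r end)
|> Enum.take(5)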

I am not sure that observer_cli starts a lot of processes, but it’s hard to tell exactly what you’re seeing.

Yes, inner workings. It holds onto some allocated memory so that it can internally allocate it without a system allocation. It’s fairly complex, though.
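This is also why the VM’s own view of memory is the more useful one; a quick way to see it:

```elixir
# Memory as the VM accounts for it, in bytes. The keyword list
# includes :total, :processes, :ets, :binary, :atom, :code, etc.,
# which usually sum to less than what the OS reports allocated.
:erlang.memory()
```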

I’m glad to hear that, thanks for sharing. :slightly_smiling_face:


Postgres forks a new process for each new client connection which allocates a non-trivial amount of memory (about 10 MB).

No, unless you hold on to some very big strings or parts of them: Memory leaking with long running processes
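A common instance of that issue: a long-lived process keeps a small slice of a large binary, and the slice is a reference into the original, which pins the whole large binary in memory. A sketch of the usual fix (the module name is illustrative):

```elixir
defmodule SliceKeeper do
  # Keeping `part` as-is would hold a sub-binary reference into
  # `big`, preventing the large binary from being garbage collected.
  def keep_slice(big) when is_binary(big) and byte_size(big) >= 16 do
    part = binary_part(big, 0, 16)
    # :binary.copy/1 makes a standalone copy, releasing `big`.
    :binary.copy(part)
  end
end
```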

I’m curious whether you are closing your DB connection in your script, because if not, processes will stack up in the background for every connection that is opened and not closed. Also, you should never use root as the user in any program deployed to the public.

It looks like an “elixir” user and group weren’t created like they should have been when it was installed, but that’s a separate issue. It’s only loosely related here, because your screenshot shows the program running with root access.

I would try using MySQL as the database engine. It scales better in a lot of instances than PostgreSQL.

MySQL tends to scale extremely poorly except in the cases where it’s mostly used as a KV store (in those cases why use SQL?). For even moderately sized SQL queries PostgreSQL scales significantly better than MySQL.


One would think that, but that is why I test my deployments with both engines. The one folly with Postgres is that the connection timeout doesn’t always work, so unused processes start to stack up. It seems the DB connection needs to be closed in the script code after querying, or else excessive memory leaks happen.

As far as the server goes, I would never recommend putting it on a web page server or shared web hosting, because those container configurations are typically limited to 5% of the system resources per container. This is how host providers like GoDaddy divide a server among several web sites. I do provide a higher CPU allocation in my executive package (20% of 24 cores @ 64GB RAM), but the hosting management software limits my maximum allocation for shared hosting to 20% of resources per web container.

If you’re interested in something like that, you can check it out at https://www.audiospecific.com. I used to lease my servers for streaming, but I decided to allocate some for other web hosting. I have someone working on an Erlang hosting interface so I can offer Elixir hosting, which makes me wonder what we should call it: Erlang Hosting, Elixir Hosting, or Phoenix Hosting.

Ah, I didn’t realize that reads were coming from ETS. Are you running ETS in concurrent read mode? Do you have any single process bottlenecks?

I checked observer_cli; there is no observable bottleneck, and ETS is fine. The table is owned by a GenServer, but reads and writes go to the table directly without going through the GenServer itself. Both write_concurrency and read_concurrency are set, and I purge the :ets table every 3 minutes using select_delete.
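For context, a minimal sketch of that kind of setup (the table name and purge strategy are illustrative, not from the thread):

```elixir
# A named public table: any process can read/write directly,
# even though a GenServer owns it and keeps it alive.
:ets.new(:my_cache, [
  :named_table,
  :public,
  read_concurrency: true,
  write_concurrency: true
])

# Periodic purge - this match spec matches every object (:_) and
# returns true, which tells select_delete to delete each match.
# Run it from e.g. Process.send_after/3 every 3 minutes.
:ets.select_delete(:my_cache, [{:_, [], [true]}])
```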

I observed a few DB errors; clients were timing out on the connection pool. I’m moving the database to a dedicated server; I will run another benchmark and update with my findings.

I tried Mojito as suggested and it looks awesome. Currently going through the memory link to learn more. Keep the resources coming, interesting stuff. Thanks once again.


I’m using connection pooling; basically connections are recycled, something like keepalive. I’m running the app on a test machine with production specs, hence the root account - thanks all the same. It’s a private infrastructure with no direct public access.


BEAM hosting, using the name of the runtime system.

Mojito is good for a sealed environment, mainly because JavaScript can get flaky on the client side, and cross-browser compatibility issues arise from deploying it publicly. I personally think it’s not good for certain web sites, like online shopping or social media. But if it doesn’t require high security with a database, it’s OK.

Mojito is a higher level http client based on Mint. I’m sure we are talking about different things.


Mojito is a higher level http client based on Mint . I’m sure we are talking about different things.

Interesting. I wondered why they renamed the Elixir XClient…
It’s not a higher-level HTTP client. It’s one based off of Haskell/binary language instead of C++, which might have fewer security issues because it wouldn’t have the same C-language exploits.

Yes, I was thinking of the Mojito Framework. I didn’t realize someone re-used the name.

Ecto logs how long the queries are taking. You want to look at two values: query time and queue time. If the query time is high, then your queries are taking long, hence the timeouts. If the queue time is high, then your pool size is low compared to your load. Note that increasing the pool size does not necessarily lead to better performance, since it leads to less efficient use of caching.
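In the default Ecto debug logs, those values appear per query; an illustrative line (the numbers and table name are invented):

```
[debug] QUERY OK source="events" db=2.5ms queue=0.1ms
INSERT INTO "events" ("payload","inserted_at") VALUES ($1,$2)
```

Here `db=` is the query time and `queue=` is the time spent waiting for a connection from the pool.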

To be clear for future readers: afaik, neither Mint nor Mojito is based on any of these things. :slight_smile:


Ecto logs how long the queries are taking. You want to look at two values: query time and queue time. If the query time is high, then your queries are taking long, hence the timeouts. If the queue time is high, then your pool size is low compared to your load. Note that increasing the pool size does not necessarily lead to better performance, since it leads to less efficient use of caching.

I’m currently setting up a PostgreSQL server to run another benchmark; it was a simple insert using Repo.insert_all, with requests coming in from siege or wrk (between 1-2k requests per second).
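For context, the insert path was presumably along these lines (the repo, table, and fields here are illustrative, not from the thread):

```elixir
# Bulk insert of incoming records. Repo.insert_all/3 bypasses
# changesets and issues a single INSERT for the whole list.
now = NaiveDateTime.truncate(NaiveDateTime.utc_now(), :second)

rows = [
  %{payload: "a", inserted_at: now, updated_at: now},
  %{payload: "b", inserted_at: now, updated_at: now}
]

MyApp.Repo.insert_all("events", rows)
```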

Another error I observed was `Postgrex.Protocol (#PID<___>) disconnected: ** (DBConnection.ConnectionError) client #PID<___> exited` - I would be grateful if you could educate me on this error. I’ve made a lot of noise about Elixir; I need to get it right - I must admit that I’m a bit nervous.

In context - I know it’s a bad practice - can connection pooling be turned off? There are instances where you have an application that only occasionally talks to a DB; keeping a pool around IMHO is not ideal, as resources (however small) are required to maintain the connections on both ends (client & server).

Kudos for the awesome work. I will report back with my findings once the PostgreSQL DB is set up.


Afaik this is the error when the pid requesting a query result exits before the result of the query was delivered. E.g. if your http request handling process dies because it hit its max execution time, but the query is still queued / the results not yet returned.
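A minimal way to reproduce that shape of error (illustrative; `MyApp.Repo` is a placeholder for your SQL repo):

```elixir
# The query is checked out by the spawned process; killing it
# mid-query makes Postgrex log the "client ... exited" disconnect
# and DBConnection discards that connection.
{:ok, pid} =
  Task.start(fn ->
    MyApp.Repo.query!("SELECT pg_sleep(10)")
  end)

Process.exit(pid, :kill)
```

In a Phoenix app, the analogous situation is the request-handling process dying (e.g. timing out) while its query is still queued or running.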


I installed PostgreSQL v11.5 on a server with 16 cores & 16GB of RAM. I configured siege with a concurrency of 1000. The Phoenix app had other modules, but I tested only the module that inserts records into the database.

[screenshot: benchmark results, 2019-11-08]

I used the default Phoenix/OS configuration, with the exception of the file descriptor limits.

I checked the log file and there were no errors; the DB server wasn’t that busy, as most of the processes were idle.
A few observations; I will be glad if I can be educated on them.

  1. Allocated memory within the VM is different from what is seen at the OS level.

  2. From observer_cli, what is the significance of the numbers 1 to 32?

  3. Any general advice on deploying applications to production.

Thanks a lot.