I set up a very simple TimescaleDB /Postgres 11 instance in a Docker container and then tried running a simple 1 record (3 fields: timestamp, value, device_id) insert from within a Benchee script.
This ends up occasionally causing multiple ‘too many clients’ errors as it runs (but mostly succeeding)? I had a couple pgAdmin windows open, but it seems weird that this wouldn’t just use a single connection for benchmarking? Am I missing a setting somewhere for Postgres (I’m new to it and Ecto).
TimescaleDB supposedly can handle hundreds of thousands of inserts per second. I’m only getting about 700/s. Is this overhead from Ecto or Elixir?
Any ideas or suggestions to look into would be appreciated!
When asking performance questions, details matter. What OS are you running this on? On OS X the docker environment is virtualized which will add a performance penalty. How are the docker volumes mounted? This can also cause significant performance issues.
Ecto will always connect with a pool of connections. You should figure out what the limits are for your postgres configuration and perhaps adjust them.
As far as TimescaleDB’s overall performance goes this is going depend heavily on how you’re inserting records Enormous numbers like that are almost always achieved while using bulk insert methods (COPY or ecto’s insert_all). They’re also unlikely to be achieved while running the database and the Elixir server on the same computer because you’ll be fighting for resources the whole time.
Wow! That is awesome information!
I am indeed on OSX.
I have the Postgres data stored outside the container via docker-compose with default settings such as:
I had dumbly assumed their benchmarks might not be for bulk inserts, but they probably were…
Thanks for the suggestions!
Ah yes so:
imposes a significant performance penalty due to the effort Docker and the VM has to go to to keep everything synchronized.
If benchmarking is your goal I would absolutely make sure to remove docker from the picture on OS X.
Interesting – my eventual production environment would be Redhat Linux.
I’ll have to test there and see the difference.
Ultimately, a single connection inserting a single record at a time will only be able to achieve
1 / insert time inserts per second. For example if the complete roundtrip required to insert a single record is 5 ms, then a single connection can only ever achieve 200 inserts per second. Even if you can do 1 ms, then that’s still just 1000 records per second. Achieving more requires higher levels of concurrency, and shoving more records into each round trip.