DB Alternatives - when speed is imperative with a large volume of reads/writes per second

Maxximiliann · September 25, 2019, 10:32pm

Hi!

I’ve just started learning Elixir and have spent many hours with fascinating YouTube videos bulking up on everything I need to learn for a project I’m working on. A major component involves maintaining a running accounting balance on dozens of different currencies being traded which is updated every few seconds. Naturally, it’s imperative that this data be preserved at all costs so ETS is not an option however speed is also imperative which, given the tremendous volume of reads/writes involved per second, rules our using a database.

Ideally I’d like the app to store this data in a file on the same server it’s running on for immediate access. Given the constraints, it seems that my only option is to have the app create a file (JSON maybe?) which it would then parse and update as needed but that seems awfully cumbersome. What do you think? Am I on track here or am I missing something?

Thanks so much for your insights!

hauleth · September 25, 2019, 10:33pm

Mnesia

mikl · September 25, 2019, 10:39pm

If the data you’re dealing with is important (like financial transactions), I’d recommend using an external database (PostgreSQL or whatever other database software you prefer), at least until you’ve gained a good understanding of OTP and distributed systems.

Mnesia and similar distributed databases are interesting, but not quite as simple to operate. Their distributed nature involve some trade-offs and potential pitfalls. Having to face these before you’re ready will likely not be fun times.

joshtaylor · September 26, 2019, 12:07am

Never underestimate the power of postgres :). You can do some very very powerful integrations with Elixir/Postgres.

kip · September 26, 2019, 1:37am

You may find the TimescaleDB extension to Postgres relevant to your use case. It sounds like you’re inserting a lot of trades on one side and want to calculate running balances on the other. A time series orientation, which implies data sharing along time segments, could be helpful. And of course being Postgres you get the other ACID properties as well.

As you say, it’s the challenge of speed and durability.

sribe · September 26, 2019, 1:24pm

Yep. Start with PostgreSQL properly provisioned for high I/O rates. If you can prove it is inadequate, then consider time series database, InfluxDB, M3, etc.

That creating, writing, reading back, and parsing JSON is unlikely to be fast.

al2o3cr · September 26, 2019, 1:51pm

You didn’t say anything about benchmarking, so you’re missing a whole lot of something.

Start by characterizing what exactly a “tremendous volume of reads/writes” is, and then test the performance of various implementations.

Rewriting a JSON file is very unlikely to be a performant solution, as the format doesn’t append cleanly. You might want to consider some kind of append-only log for safe persistence. For instance, the Postgres WAL

Maxximiliann · September 26, 2019, 11:35pm

Thanks everyone for your help

sribe · September 27, 2019, 12:21am

Oh, BTW, if you do try PostgreSQL, the PG mailing list is super helpful

evadne · September 28, 2019, 1:15am

A database might just do. Nobody ignores the elephant and Postgres gets better at sharding all the time.

What is your RTO/RPO?
Can you reconstruct ledgers from transactions?
Can you reconstruct in-memory state from RDBMS state?
Is your solution read-heavy, write-heavy, or balanced?
Do you serve internal customers (same data-centre) only?

elcritch · September 28, 2019, 4:50am

To expand on the Postgres recommendations, using materialized views in Postgres can provide an easy and performant solution for calculating account balances (here’s an example materialized-view-strategies-using-postgresql). Although you will need to put in some work on refreshing the views, caching, and doing eager updating or triggers.

If Postgres doesn’t scale, you could try a KV store such as RocksDB. I use it for IoT data and it can be pretty fast (even on an embedded device) but only works for one machine. If you really need to scale it, consider Kafka using an event source type pattern (AirBnB example).

pierreabreup · October 1, 2019, 6:05pm

as @mikl sad, depends on the kind of the data. If you need ACID transactions, Postgres and Mysql are good options once Ecto lib is very solid for this kind of database.

If you need a document store, you can use Mongodb (you can check a example here https://github.com/pierreabreup/graphqlapi-with-elixir-and-mongo ).

I already used ETS and Redis but I used as a temp data store, in other words, I saved the data in ETS/Redis and seconds after, I copied to Mysql (Aws RDS). It is an approach when do you need a high-performance request ( < 200 ms).

My advice for you is to start with Mysql/Postgre and improve it according to monitoring metrics and customers feedbacks. Starts small always a good option to avoid to create speculative code and avoid over-engineering

Maxximiliann · October 3, 2019, 5:15pm

Thanks so much for your feedback. Btw, was this you? -> https://www.youtube.com/watch?v=8mXqxBBvNdk

pierreabreup · October 17, 2019, 2:39pm

no

kokolegorille · October 17, 2019, 3:13pm

In fact, the reply was not for You… So it’s not You, but sure it is her