I’ve just started learning Elixir and have spent many hours with fascinating YouTube videos bulking up on everything I need to learn for a project I’m working on. A major component involves maintaining a running accounting balance on dozens of different currencies being traded which is updated every few seconds. Naturally, it’s imperative that this data be preserved at all costs so ETS is not an option however speed is also imperative which, given the tremendous volume of reads/writes involved per second, rules our using a database.
Ideally I’d like the app to store this data in a file on the same server it’s running on for immediate access. Given the constraints, it seems that my only option is to have the app create a file (JSON maybe?) which it would then parse and update as needed but that seems awfully cumbersome. What do you think? Am I on track here or am I missing something?
Thanks so much for your insights!
If the data you’re dealing with is important (like financial transactions), I’d recommend using an external database (PostgreSQL or whatever other database software you prefer), at least until you’ve gained a good understanding of OTP and distributed systems.
Mnesia and similar distributed databases are interesting, but not quite as simple to operate. Their distributed nature involve some trade-offs and potential pitfalls. Having to face these before you’re ready will likely not be fun times.
Never underestimate the power of postgres :). You can do some very very powerful integrations with Elixir/Postgres.
You may find the TimescaleDB extension to Postgres relevant to your use case. It sounds like you’re inserting a lot of trades on one side and want to calculate running balances on the other. A time series orientation, which implies data sharing along time segments, could be helpful. And of course being Postgres you get the other ACID properties as well.
As you say, it’s the challenge of speed and durability.
Yep. Start with PostgreSQL properly provisioned for high I/O rates. If you can prove it is inadequate, then consider time series database, InfluxDB, M3, etc.
That creating, writing, reading back, and parsing JSON is unlikely to be fast.
You didn’t say anything about benchmarking, so you’re missing a whole lot of something.
Start by characterizing what exactly a “tremendous volume of reads/writes” is, and then test the performance of various implementations.
Rewriting a JSON file is very unlikely to be a performant solution, as the format doesn’t append cleanly. You might want to consider some kind of append-only log for safe persistence. For instance, the Postgres WAL
Thanks everyone for your help
Oh, BTW, if you do try PostgreSQL, the PG mailing list is super helpful
A database might just do. Nobody ignores the elephant and Postgres gets better at sharding all the time.
- What is your RTO/RPO?
- Can you reconstruct ledgers from transactions?
- Can you reconstruct in-memory state from RDBMS state?
- Is your solution read-heavy, write-heavy, or balanced?
- Do you serve internal customers (same data-centre) only?
To expand on the Postgres recommendations, using materialized views in Postgres can provide an easy and performant solution for calculating account balances (here’s an example materialized-view-strategies-using-postgresql). Although you will need to put in some work on refreshing the views, caching, and doing eager updating or triggers.
If Postgres doesn’t scale, you could try a KV store such as RocksDB. I use it for IoT data and it can be pretty fast (even on an embedded device) but only works for one machine. If you really need to scale it, consider Kafka using an event source type pattern (AirBnB example).
as @mikl sad, depends on the kind of the data. If you need ACID transactions, Postgres and Mysql are good options once Ecto lib is very solid for this kind of database.
If you need a document store, you can use Mongodb (you can check a example here https://github.com/pierreabreup/graphqlapi-with-elixir-and-mongo ).
I already used ETS and Redis but I used as a temp data store, in other words, I saved the data in ETS/Redis and seconds after, I copied to Mysql (Aws RDS). It is an approach when do you need a high-performance request ( < 200 ms).
My advice for you is to start with Mysql/Postgre and improve it according to monitoring metrics and customers feedbacks. Starts small always a good option to avoid to create speculative code and avoid over-engineering
Thanks so much for your feedback. Btw, was this you? -> https://www.youtube.com/watch?v=8mXqxBBvNdk
In fact, the reply was not for You… So it’s not You, but sure it is her