MnesiaKV - new KV store on top of RocksDB + ETS

vans163 · November 25, 2020, 12:19am

Most of the info is in the description, but it's basically ETS with persistence, for use cases where Mnesia is much too heavy or where you actually don't want to lose any data.

There are only 2 operations, merge and delete. All merges are deep.
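For anyone wondering what a deep merge means in practice, here is a minimal sketch. `DeepMergeSketch` is a hypothetical helper written for illustration and is not taken from MnesiaKV; it assumes that nested maps are merged recursively and that any non-map value is simply overwritten by the new one.

```elixir
defmodule DeepMergeSketch do
  # Hypothetical helper, not MnesiaKV's actual code.
  # Recursively merges `new` into `old`: when both sides hold a map under the
  # same key the two maps are merged, otherwise the value from `new` wins.
  def deep_merge(old, new) when is_map(old) and is_map(new) do
    Map.merge(old, new, fn _key, left, right -> deep_merge(left, right) end)
  end

  def deep_merge(_old, new), do: new
end

old = %{name: "alice", stats: %{logins: 3, last_ip: "10.0.0.1"}}
new = %{stats: %{logins: 4}}

# :name and stats.last_ip are kept, stats.logins becomes 4.
IO.inspect(DeepMergeSketch.deep_merge(old, new))
```

A shallow `Map.merge/2` would instead replace the whole `:stats` map and drop `:last_ip`.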

Also simple subscriptions to changes.

Let me know what you think, love it / hate it / suggestions? hehe…

benwilson512 · November 25, 2020, 3:55am

@vans163 cool project, looking forward to digging into it more!

As a minor note, maybe this isn’t the clearest name? There is, iirc, another effort to make actual :mnesia use rocksdb for persistence and this project doesn’t actually work like or otherwise use mnesia.

elcritch · November 25, 2020, 4:10am

Will have to check this out… It's an awesome concept! Rocksdb does a good job as a KV, but it's a bummer to miss out on ETS abilities like match. Unfortunately this looks to be using the Rust-based RocksDB bindings, which makes cross compiling kind of suck.

vans163 · November 25, 2020, 4:56pm

@benwilson512
Yea the name is not the best, should probably rename it… any ideas? The reason I called it Mnesia is that we often use Mnesia in place of Redis, but don't use Mnesia as an actual database; Mnesia is so much more, indeed.

@elcritch
Yea the main idea is to just mirror all writes into Rocksdb, all reads still go only to ETS. I started brainstorming ways to do journaling and SSTs, then thought why not just use Rocksdb.

Not sure about the cross compile, maybe I could look into it. Does rust-rocksdb not compile on OSX or Windows? Or is it :rocker, or something else?

cmkarlsson · November 25, 2020, 5:16pm

how about RockETS?

vans163 · November 25, 2020, 7:12pm

I ran some preliminary benchmarks, not too happy with what I am seeing in terms of concurrent writers really affecting the performance. Maybe there is some tuning knob?

4 core i5-7500 CPU @ 3.40GHz
ext4, consumer SSD

MnesiaKV.Bench.write_to_file_unsafe(4)
1.6m write tps

MnesiaKV.Bench.mnesia(4)
266k write tps

MnesiaKV.Bench.rocksdb(4)
120k write tps


8/16 core i9-9900K CPU @ 3.60GHz
XFS, PM981 NVME

MnesiaKV.Bench.write_to_file_unsafe(16)
5m write tps
MnesiaKV.Bench.write_to_file_unsafe(12)
5m write tps
MnesiaKV.Bench.write_to_file_unsafe(8)
3.8m write tps

MnesiaKV.Bench.mnesia(16)
640k write tps
MnesiaKV.Bench.mnesia(12)
1.02m write tps
MnesiaKV.Bench.mnesia(8)
1m write tps

MnesiaKV.Bench.rocksdb(16)
160k write tps
MnesiaKV.Bench.rocksdb(12)
189k write tps
MnesiaKV.Bench.rocksdb(8)
228k write tps
MnesiaKV.Bench.rocksdb(4)
260k write tps
MnesiaKV.Bench.rocksdb(1)
330k write tps

mpope · November 25, 2020, 8:12pm

I think it’d be worth trying to integrate Benchee for benchmark running. It makes pretty graphs and takes care of warming caches, etc. In your benchmark you iterate from 1..100000; I’d try varying that number, removing the timer, and maybe replacing ets with an Enum reduction with an accumulator. Timer and ets calls add overhead, which makes the benchmark impure. With Benchee it is much easier to parameterize tests like that, to find the sweet spot and see what could use improvement.

mpope · November 25, 2020, 8:27pm

Looking harder, it looks like rocker is a Rust NIF. It is probably worth timing the Rust bindings separately and gauging whether the NIF is properly configured. In some cases using a yielding NIF might help, or, if the benchmarks look really bad, a dirty NIF. If the NIF’s overhead is larger than expected, it can be disastrous, as NIFs block schedulers. Knowing that the underlying NIF code is performant is key for this library. The perf test included writes and reads with very small binaries, while your benchmark uses a map that continually grows. Running your benchmark with smaller inputs might highlight weaknesses as well.

erlang:term_to_binary is called in your benchmark. It has overhead of its own, so pre-constructing these binaries should be considered as well.
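To make the pre-constructing point concrete, here is a rough sketch. `EncodeBenchSketch` is a hypothetical harness, not the actual `MnesiaKV.Bench` code, and a plain ETS table stands in for the store being measured. It only shows the difference between encoding inside the timed section and encoding before it.

```elixir
defmodule EncodeBenchSketch do
  # Hypothetical harness for illustration; not the MnesiaKV.Bench code.

  # Encodes inside the timed loop, so term_to_binary is counted as write cost.
  def encode_in_loop(table, terms) do
    {micros, :ok} =
      :timer.tc(fn ->
        Enum.each(terms, fn {key, term} ->
          :ets.insert(table, {key, :erlang.term_to_binary(term)})
        end)
      end)

    micros
  end

  # Encodes up front, so only the inserts are timed.
  def pre_encoded(table, terms) do
    rows = Enum.map(terms, fn {key, term} -> {key, :erlang.term_to_binary(term)} end)

    {micros, :ok} = :timer.tc(fn -> Enum.each(rows, &:ets.insert(table, &1)) end)

    micros
  end
end

table = :ets.new(:bench_sketch, [:set, :public])
terms = for i <- 1..100_000, do: {i, %{id: i, payload: "value-#{i}"}}

IO.puts("encode in loop: #{EncodeBenchSketch.encode_in_loop(table, terms)} us")
IO.puts("pre-encoded:    #{EncodeBenchSketch.pre_encoded(table, terms)} us")
```

The gap between the two numbers is a rough estimate of how much of the measured write time is really serialization. A single `:timer.tc` run is noisy, so treat it as a sanity check only.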

elcritch · November 25, 2020, 9:57pm

It’s really more specific to Nerves. The last time I tried it, rustc/cargo was alright at cross-compiling stuff with its own toolchain. However, this means rustc/cargo doesn’t/didn’t properly look up the $LD environment variable set for the C cross compiler’s linker & libc. You can hack it and pass cargo the correct flags, but you have to massage the Rust arch type and it’s a hassle. So it really only affects actual cross compiling (e.g. compiling on linux/x86 for linux/arm).

marciol · April 11, 2021, 7:06pm

@vans163 did you already see mnesia_rocksdb from Ulf Wiger from Aeternity? I’m wondering about the differences. Might be of interest to people from Aeternity, such as @omar-aeternity.

Exadra37 · April 11, 2021, 8:21pm

Or ETSRocks

vans163 · April 12, 2021, 1:11am

mnesia_rocksdb is a plugin to add rocksdb as a mnesia backend. Mnesia itself is a bottleneck, as its roots are a transaction db; for this project we just wanted to make a very vanilla in-memory, disk-backed KV store that allows select/match_object to quickly grep on values.
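As a small illustration of grepping on values with plain ETS (this does not use MnesiaKV itself; the table name and rows are made up for the example), a map in the match pattern only has to match the keys it names:

```elixir
table = :ets.new(:users_sketch, [:set, :public])

:ets.insert(table, {"u1", %{role: :admin, name: "alice"}})
:ets.insert(table, {"u2", %{role: :user, name: "bob"}})
:ets.insert(table, {"u3", %{role: :admin, name: "carol"}})

# match_object: return every whole row whose value map contains role: :admin.
admins = :ets.match_object(table, {:_, %{role: :admin}})

# select: same filter, but return only the keys.
admin_keys = :ets.select(table, [{{:"$1", %{role: :admin}}, [], [:"$1"]}])

# Result order is not guaranteed for a :set table, so sort before comparing.
IO.inspect(Enum.sort(admins))
IO.inspect(Enum.sort(admin_keys))
```

Both calls scan the whole table, since the key is left unbound, but the scan runs entirely in memory.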

The goal of mnesia_rocksdb was to expand on or replace DETS, in order to sanely store a disk-only blockchain (100s of GBs).

@Exadra37 ETSRocks sounds awesome!

mruoss · April 12, 2021, 10:33am

ETSOnTheRocks!

vans163 · April 12, 2021, 12:48pm

ETSHoldTheRocks

vans163 · April 15, 2021, 7:43pm

I got some new benchmarks using erlang-rocksdb instead of rocker. erlang-rocksdb carries rocksdb 6.13.3, a much newer release.

AMD EPYC 7502P 32-Core Processor
BTRFS, PM981 NVME

(columns: write tps, concurrent processes)
102854 1
185513 4
247566 8
374245 16
387341 32
378695 64

#unordered_writes
101003 1
273978 8
356849 16
479530 32
476486 64


Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
BTRFS, PM981 NVME

(columns: write tps, concurrent processes)
178514 1
336631 4
409433 8
564763 16
553180 32

#unordered_writes
202838 1
450190 4
491850 8
695923 16
743349 32

Seems it is finally scaling as it should with more concurrent processes.

the_wildgoose · May 2, 2021, 8:13am

Could you add cubdb to your benchmark please?

vans163 · May 2, 2021, 12:54pm

Wow that DB looks really interesting, fully managed code with btree and append only. Let me play with that ASAP.

elcritch · May 23, 2021, 12:07am

So mnesia_rocksdb can store more data on disk than there is available RAM? I thought dets had a limitation where at least the keys all needed to be loaded in RAM?

vans163 · July 7, 2021, 2:52pm

No, it cannot, because it's just a way to persist ETS without using :ets.tab2file (tab2file is very expensive and obviously unreliable in the face of a shutdown).

At the end of the day you're using ETS; the only difference is that with normal ets you do

:ets.insert/2

using mnesiakv your insert becomes

:rocksdb.write/3
:ets.insert/2

Using ETS your lookup is

:ets.lookup/2

Using mnesiakv your lookup is

:ets.lookup/2

(yes, with mnesiakv lookups never query rocksdb)

nothing fancy
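Put together, the write and read paths described above could be sketched roughly like this. `WriteThroughSketch` and `disk_put` are hypothetical names for illustration, and an `Agent` holding a map stands in for RocksDB so the example runs without any dependency; this is not MnesiaKV's actual code.

```elixir
defmodule WriteThroughSketch do
  # Hypothetical sketch, not MnesiaKV's code.
  # `disk_put` is a stand-in for the RocksDB write.

  # Write path: persist first, then update the in-memory table.
  def put(table, disk_put, key, value) do
    :ok = disk_put.(key, :erlang.term_to_binary(value))
    true = :ets.insert(table, {key, value})
    :ok
  end

  # Read path: ETS only, the disk store is never consulted.
  def get(table, key) do
    case :ets.lookup(table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end
end

table = :ets.new(:kv_sketch, [:set, :public])

# Assumption: an Agent holding a map plays the role of the on-disk store.
{:ok, disk} = Agent.start_link(fn -> %{} end)
disk_put = fn key, bin -> Agent.update(disk, &Map.put(&1, key, bin)) end

:ok = WriteThroughSketch.put(table, disk_put, "user:1", %{name: "alice"})
{:ok, %{name: "alice"}} = WriteThroughSketch.get(table, "user:1")
:error = WriteThroughSketch.get(table, "user:2")
```

Because every read is served from ETS, the whole data set has to fit in RAM; the disk copy would only matter for rebuilding the table after a restart.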