CubDB, a pure-Elixir embedded key-value database

lucaong · June 27, 2019, 6:37pm

Yeah I should look into datomic too. I used to have lots of fun with Clojure, but I haven’t been using it in a while.

Curious about what you think of the CubDB API and where you would improve it, coming from your experience with datomic.

dch · July 2, 2019, 10:47am

Looks neat. Consider enabling snappy or similar NIF on each term to gain performance and compression. CowDB is derived from CouchDB and the test suite may be relevant - lots of corner cases to uncover.

lucaong · July 2, 2019, 11:04am

Thanks @dch!

Yes, compression would be an interesting thing to add, especially for large values. I will add it to the upcoming features to look into. I would try to keep it Elixir-only though, to make it very convenient to use in embedded scenarios.

You are right about testing and corner cases. I have a suite of property-based tests for the B-tree data structure part that proved very valuable in order to discover and fix bugs during the early phases of development.

The append-only, copy-on-write, paginated B-tree data structure is inspired by CouchDB, but CubDB neither uses nor depends on any CouchDB code, so I cannot simply apply its test suite. It’s still interesting to look for specific conditions tested there though.

lucaong · July 2, 2019, 9:39pm

Regarding compression, I actually just pushed version 0.6.0, which compresses terms before writing them to disk. That’s actually really easy thanks to the :compressed Erlang option on term_to_binary/2.
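To illustrate the option mentioned above, here is a minimal sketch using only the Erlang standard library. The module name `TermCompressionSketch` is made up for this example and is not part of CubDB; it only shows what `:compressed` does to the serialized size of a term.

```elixir
defmodule TermCompressionSketch do
  # Serialize a term to the Erlang external term format,
  # passing the :compressed option only when requested.
  def encode(term, compressed?) do
    opts = if compressed?, do: [:compressed], else: []
    :erlang.term_to_binary(term, opts)
  end

  # Decoding is the same call for compressed and plain binaries.
  def decode(binary), do: :erlang.binary_to_term(binary)
end

# A repetitive value, which compresses well.
value = String.duplicate("sensor-reading;", 10_000)
plain = TermCompressionSketch.encode(value, false)
packed = TermCompressionSketch.encode(value, true)

IO.puts("plain: #{byte_size(plain)} bytes, compressed: #{byte_size(packed)} bytes")
```

How much space is saved depends entirely on the data: repetitive values shrink a lot, while random or already-compressed values barely change.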

elcritch · July 3, 2019, 7:58am

I find the datalog “recursive predicate” querying interesting, and especially with the time component, but honestly I’ve only dabbled with it.

My primary interest in Datomic is how columns are defined and can be combined in any way into arbitrary entities. Using pure KV stores makes it hard to verify fields, but SQL doesn’t fit a lot of time series and IoT use cases in my experience.

A Datomic query engine would require a subset of a datalog interpreter on top of the KV store and various indexes. Though I think given the nature of Elixir coming from Erlang the datalog part shouldn’t be impossible to do. I’ve been tempted to start it a few times… but I just can’t spare the time.

asianfilm · July 3, 2019, 11:03am

I’m using version 0.4, and it’s proving much more reliable than a previous solution I was employing.

The documentation could be improved a bit; ideally with an example application in a subdirectory. Also, when upgrading to 0.6, for example, does it upgrade the database automatically to the new compressed version?

Right now, I’m just using get, put, delete and compact and haven’t explored select, etc.

Thanks for the package!

lucaong · July 3, 2019, 11:37am

Thanks @asianfilm, I am happy that you are finding it useful too.

You are absolutely right about the documentation. So far, I mostly focused on the API reference, but I intend to add usage examples and a “Getting Started” before version 1.0.0.

I am following semantic versioning, and I do reserve the possibility to make backward incompatible changes to the file format before 1.0.0 if strictly necessary to introduce an improvement. That said, I will try to avoid that, and if it does happen I will clearly communicate it (currently I list changes in the commit message of version bumps, but will add a changelog and GitHub releases before version 1.0.0). The improvements that I have planned so far should be possible without backward incompatible changes.

In your case, updating from 0.4 to 0.6 does not need any manual operation (non-compressed nodes are read just fine, and newly written ones will benefit from compression).
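The reason mixed files can work is that the Erlang external term format marks compressed binaries itself, so the decoder needs no extra flag. A small sketch of that property follows; `MixedFormatSketch` and the `{:leaf, ...}` node shape are hypothetical and do not reflect CubDB's actual node layout.

```elixir
defmodule MixedFormatSketch do
  # Decode serialized nodes without knowing which ones were compressed:
  # :erlang.binary_to_term/1 detects compression from the binary itself.
  def decode_all(binaries), do: Enum.map(binaries, &:erlang.binary_to_term/1)
end

entries = List.duplicate({:key, "value"}, 500)

# An "old" node written without compression, and a "new" one written with it.
old_node = :erlang.term_to_binary({:leaf, entries})
new_node = :erlang.term_to_binary({:leaf, entries}, [:compressed])

[decoded_old, decoded_new] = MixedFormatSketch.decode_all([old_node, new_node])
IO.inspect(decoded_old == decoded_new, label: "same term after decoding")
```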

If you explore select and the auto compaction feature, I would be happy to hear your feedback about them.

seb5law · July 10, 2019, 12:45pm

@lucaong Is there a limit on the size of the value I put into cubdb? We would like to put big pieces of JSON/Maps into the db (not GBs, but definitely tens of MBs).

Are there any other drawbacks or suggestions that you can think of?

lucaong · July 10, 2019, 1:12pm

Hi @seb5law,
nice that you are considering using CubDB!

There is no hard limit on the size of the value. That said, reads and writes will of course be slower than with small values. First, because of the higher amount of data to be written and read from disk, and second because CubDB organizes disk space in pages of 1024 bytes each, so it will have to add more page headers. This page size is more optimized for small entries, but there should be nothing preventing bigger values from working.
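As a rough back-of-the-envelope sketch of the page overhead described above: the 1024-byte page size comes from the post, while the one-byte header per page and the value starting on a page boundary are assumptions made purely for illustration. `PageMathSketch` is a made-up name, not CubDB code.

```elixir
defmodule PageMathSketch do
  @page_size 1024
  # Assumption for illustration only: each page spends one byte on its header.
  @header_size 1

  # Number of pages a value of `value_bytes` bytes spans,
  # assuming it starts at the beginning of a page.
  def pages_needed(value_bytes) when is_integer(value_bytes) and value_bytes >= 0 do
    payload = @page_size - @header_size
    div(value_bytes + payload - 1, payload)
  end

  # Total bytes spent on page headers for that value.
  def header_overhead(value_bytes), do: pages_needed(value_bytes) * @header_size
end

for size <- [100, 1_024, 20_000_000] do
  IO.puts("#{size} bytes -> #{PageMathSketch.pages_needed(size)} page(s)")
end
```

Under these assumptions the header overhead stays around 0.1% of the value size, so for large values the raw amount of data is likely to matter far more than the page headers.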

Writes should still be atomic, even if a power failure happens half-way through writing a value. What could happen in that case is that the database might start more slowly the next time: CubDB starts reading from the end of the file and looks for the latest “good” header, so it might have to go through many pages before finding it.
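The recovery idea can be sketched on an in-memory binary. This is not CubDB's actual code: the marker byte `42`, the module name `HeaderScanSketch`, and the rule that any page starting with the marker counts as a “good” header are all assumptions for illustration (a real implementation would also have to validate the header contents).

```elixir
defmodule HeaderScanSketch do
  @page_size 1024
  # Hypothetical marker byte for a page that starts a header.
  @header_marker 42

  # Scan complete pages from the end of the file towards the beginning and
  # return {:ok, offset} for the last page that starts with the marker,
  # or :error if there is none. A trailing partial page is ignored.
  def latest_header(file) when is_binary(file) do
    last_page = div(byte_size(file), @page_size) - 1
    scan(file, last_page)
  end

  defp scan(_file, page) when page < 0, do: :error

  defp scan(file, page) do
    offset = page * @page_size

    if :binary.at(file, offset) == @header_marker do
      {:ok, offset}
    else
      scan(file, page - 1)
    end
  end
end

# Build a fake file: header, data, header, then two data pages that were
# appended without a new header (as if a write was interrupted).
page = fn marker -> <<marker>> <> :binary.copy(<<0>>, 1023) end
file = page.(42) <> page.(0) <> page.(42) <> page.(0) <> page.(0)

IO.inspect(HeaderScanSketch.latest_header(file))
```

The more pages were written after the last header, the more steps the backward scan takes, which is why an interrupted write of a very large value could slow down the next startup.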

In summary, I do not expect any specific problem, apart from slower writes/reads. That said, I honestly did not run benchmarks with values as large as tens of MB. I would definitely be interested in your feedback if you try that, and I intend to add some benchmarks and run some tests to detect possible corner cases with large values.

My recommendation is to give it a cautious try in a non-critical environment: if you find issues, I will try to fix them before version 1.0.0 (or otherwise explicitly document the limits).

seb5law · July 10, 2019, 1:16pm

I’ll give it a shot and give feedback. Thank you.

lucaong · July 10, 2019, 1:36pm

Take it as a sanity check more than an exhaustive test, but I just tried saving ~20 MB values in a loop in CubDB on my MacBook Pro. I did not run into any specific issues. Performance-wise, each write took 0.7 seconds. Reads were much faster. Compaction also took more or less 0.7s * N.

I will look into how much of that write time can be shaved off, but nothing seems to be breaking unexpectedly.

seb5law · July 10, 2019, 3:19pm

Same here, values of size 100 MB are no problem at all, it just takes a bit. I read in a file of size 107 MB and put it in cubdb:

Name                            ips        average  deviation         median         99th %
Get from DB                    1.13      881.76 ms     ±2.06%      881.40 ms      910.80 ms
Write File to db               0.27     3757.36 ms     ±2.09%     3757.36 ms     3812.85 ms

Benchmarked with benchee in iex (should be faster in production code)

lucaong · July 10, 2019, 4:08pm

Interestingly, the real bottleneck seems to be compression, not writing the file. I did not expect that, but the fix would be quite easy. I will work on a release that exposes an option to control the compression settings or to disable compression completely.
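A quick way to check this kind of bottleneck outside the database is to time the serialization step alone. The sketch below uses only `:timer.tc/1` and `:erlang.term_to_binary/2`; `CompressionTimingSketch` is a made-up helper, and the numbers it prints will vary by machine and by how compressible the value is.

```elixir
defmodule CompressionTimingSketch do
  # Time, in microseconds, spent serializing `term` with the given options.
  def time_encode(term, opts) do
    {micros, _binary} = :timer.tc(fn -> :erlang.term_to_binary(term, opts) end)
    micros
  end
end

# A value of roughly 10 MB, built from a repeated chunk.
value = :binary.copy("abcdefgh", 1_250_000)

plain_us = CompressionTimingSketch.time_encode(value, [])
compressed_us = CompressionTimingSketch.time_encode(value, [:compressed])

IO.puts("plain: #{plain_us} μs, compressed: #{compressed_us} μs")
```

This is only a single-shot measurement; for anything more rigorous, a proper benchmarking tool such as Benchee with warmup and repeated runs would be a better fit.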

lucaong · July 10, 2019, 4:58pm

I released version v0.9.0, which is substantially more performant on large entries, like in @seb5law's use-case.

Compression is now disabled, as it adds more drawbacks than benefits (large entries experience much slower writes, and small entries do not benefit much from compression anyway). This release is still completely backward compatible.

I will consider re-adding compression as a feature in a later release, but only after careful benchmarking, and with a way for the user to configure its parameters.

I will also add benchmarks with Benchee before version 1.0.0, to avoid these kinds of easily detectable performance regressions.

lucaong · July 10, 2019, 10:23pm

And here is the benchmark for v0.9.0 and default options on my MacBook Pro, showing a big improvement with large values (the full suite of benchmarks can now be run with make benchmarks):

##### With input 10MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3         30.93       32.34 ms     ±7.37%       31.48 ms       40.38 ms

##### With input 1MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        262.95        3.80 ms    ±30.40%        3.77 ms        5.46 ms

##### With input 1kb value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        6.67 K      149.86 μs   ±134.87%         141 μs         293 μs

##### With input small value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        6.95 K      143.92 μs    ±40.53%         136 μs         296 μs

wolfiton · July 10, 2019, 10:44pm

Could you compare this with redis?

lucaong · July 11, 2019, 9:20am

Hi @wolfiton,
thanks for your question. I really admire the engineering work of the Redis author, Salvatore Sanfilippo, so it’s nice for me to see CubDB and Redis discussed in the same context.

That said, Redis and CubDB have considerably different goals and characteristics, and I think the overlap in use-cases is quite small. I will try to clarify that a bit:

Redis is a “data structure server”, to which one connects over the network. It keeps data primarily in memory to ensure very fast operations, and uses the disk to recover after a restart. It offers several different data structures (maps, lists, sorted sets, streams, etc.) and is agnostic about the programming language used by the user. So, it is shared (multiple apps/instances can connect to one Redis db) and very fast, but data must fit in memory. Common use-cases for Redis are: shared in-memory cache, shared data structure for queues or parallel computation, shared locks.
CubDB is an embedded database, so it runs “inside” your application, with no network connection. It works sort of like a map, but persisted on disk (plus all the sorted lookup operations). It can be used directly only by Elixir or Erlang, but has the convenience of having zero dependencies and storing native Elixir terms without requiring the user to implement serialization/deserialization. It is not shared between different apps/instances (unless you implement a server layer on top of it yourself). It stores data primarily on disk, so it can store more data than can fit in memory. It’s designed for robustness in case of power failures, and simplicity to install and use from Elixir apps. Primary use cases would be data storage for an embedded application (think Nerves running on a Raspberry Pi), or data storage within one app instance.

Of course, one could build a small server on top of CubDB and expose its features over a network, achieving something comparable to Redis maps. That would be a nice project.

Right now I am working on the core, and focusing on doing one thing well: a versatile and robust key/value storage. Hopefully that will enable developers to get creative and build more use cases on top of it.

seb5law · July 11, 2019, 10:35am

I ran the same benchmarks with CubDB Version 0.9.0 with the following results:

Operating System: Linux
CPU Information: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Number of Available Cores: 4
Available memory: 15.11 GB
Elixir 1.9.0
Erlang 22.0.4

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: 10MB value, 1KB value, 1MB value, small value
Estimated total run time: 28 s

Benchmarking CubDB.put/3 with input 10MB value...
280 entries written to database.
Benchmarking CubDB.put/3 with input 1KB value...
9854 entries written to database.
Benchmarking CubDB.put/3 with input 1MB value...
1403 entries written to database.
Benchmarking CubDB.put/3 with input small value...
9928 entries written to database.

##### With input 10MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3         33.92       29.48 ms   ±180.14%       16.60 ms      372.86 ms

##### With input 1KB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        5.82 K      171.80 μs   ±202.40%      152.79 μs      492.24 μs

##### With input 1MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3         87.72       11.40 ms   ±844.87%        2.09 ms      476.15 ms

##### With input small value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        7.20 K      138.92 μs    ±31.21%      133.08 μs      225.80 μs

mix run benchmarks/get.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Number of Available Cores: 4
Available memory: 15.11 GB
Elixir 1.9.0
Erlang 22.0.4

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: 10MB value, 1KB value, 1MB value, small value
Estimated total run time: 28 s

Benchmarking CubDB.get/3 with input 10MB value...
Benchmarking CubDB.get/3 with input 1KB value...
Benchmarking CubDB.get/3 with input 1MB value...
Benchmarking CubDB.get/3 with input small value...

##### With input 10MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3         99.74       10.03 ms    ±10.98%        9.72 ms       14.87 ms

##### With input 1KB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3       20.19 K       49.52 μs    ±83.73%       42.70 μs      161.51 μs

##### With input 1MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3        959.79        1.04 ms   ±220.99%        0.89 ms        2.47 ms

##### With input small value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3       24.46 K       40.89 μs    ±43.33%       37.50 μs       81.68 μs

wolfiton · July 11, 2019, 11:07am

Thank you @lucaong for the comparison and also for providing a very clear picture of the possibilities and features of CubDB.

dch · July 24, 2019, 6:54am

I ran a few rounds on my admittedly server-class desktop for comparison, with 3 different storage systems: a ramdisk (using a UFS-like “disk” format), an NVMe drive (zfs), and a ZRAID10 zfs striped mirror. The system is running a development OS kernel, FreeBSD 13.0-CURRENT r349991+b5dc7bcdcb12(master) GENERIC amd64, but nonetheless it’s interesting: it’s a metric shitload faster, and in general the deviation is lower. The box is unfortunately not idle. There is not really any CPU limitation to speak of, so we are really just looking at the impact of IO here. The most noticeable effect is that ZFS really starts to show off when compression matters, in the 10KiB & 1MiB range. BTW, I expect that snappy compression will not show the massive slowdowns seen with the gzip-based Erlang term compression. https://github.com/skunkwerks/snappy-erlang-nif and https://github.com/mururu/zstd-erlang are both optimised for very high throughput, and still show good overall compression.

# ramdisk
Benchmark: put/3
================
mix run benchmarks/put.exs
Operating System: FreeBSD
CPU Information: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Number of Available Cores: 8
Available memory: 127.83 GB
Elixir 1.9.1
Erlang 22.0.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: 10MB value, 1KB value, 1MB value, small value, small value, auto sync
Estimated total run time: 35 s

Benchmarking CubDB.put/3 with input 10MB value...
225 entries written to database.
Benchmarking CubDB.put/3 with input 1KB value...
9869 entries written to database.
Benchmarking CubDB.put/3 with input 1MB value...
1537 entries written to database.
Benchmarking CubDB.put/3 with input small value...
9911 entries written to database.
Benchmarking CubDB.put/3 with input small value, auto sync...
9871 entries written to database.

##### With input 10MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3         34.34       29.12 ms    ±16.86%       28.13 ms       65.94 ms

##### With input 1KB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        6.41 K      156.05 μs    ±14.92%      154.44 μs      187.16 μs

##### With input 1MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        238.78        4.19 ms     ±3.71%        4.14 ms        4.72 ms

##### With input small value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        6.60 K      151.47 μs    ±14.94%      150.22 μs      179.49 μs

##### With input small value, auto sync #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        6.33 K      157.86 μs     ±9.33%      156.40 μs      188.43 μs

Benchmark: get/3
================
mix run benchmarks/get.exs
Operating System: FreeBSD
CPU Information: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Number of Available Cores: 8
Available memory: 127.83 GB
Elixir 1.9.1
Erlang 22.0.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: 10MB value, 1KB value, 1MB value, small value
Estimated total run time: 28 s

Benchmarking CubDB.get/3 with input 10MB value...
Benchmarking CubDB.get/3 with input 1KB value...
Benchmarking CubDB.get/3 with input 1MB value...
Benchmarking CubDB.get/3 with input small value...

##### With input 10MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3         98.01       10.20 ms    ±38.66%        8.79 ms       22.29 ms

##### With input 1KB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3       18.84 K       53.08 μs    ±43.19%       50.56 μs       78.29 μs

##### With input 1MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3        1.27 K      790.08 μs    ±13.73%      778.78 μs      945.01 μs

##### With input small value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3       20.38 K       49.06 μs    ±30.11%       46.45 μs       69.86 μs

# NVMe drive
Benchmark: put/3
================
mix run benchmarks/put.exs
Operating System: FreeBSD
CPU Information: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Number of Available Cores: 8
Available memory: 127.83 GB
Elixir 1.9.1
Erlang 22.0.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: 10MB value, 1KB value, 1MB value, small value, small value, auto sync
Estimated total run time: 35 s

Benchmarking CubDB.put/3 with input 10MB value...
42 entries written to database.
Benchmarking CubDB.put/3 with input 1KB value...
5567 entries written to database.
Benchmarking CubDB.put/3 with input 1MB value...
360 entries written to database.
Benchmarking CubDB.put/3 with input small value...
5538 entries written to database.
Benchmarking CubDB.put/3 with input small value, auto sync...
2325 entries written to database.

##### With input 10MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3          6.03      165.92 ms     ±9.25%      162.61 ms      248.04 ms

##### With input 1KB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        1.19 K      839.04 μs    ±22.03%      811.18 μs     1886.29 μs

##### With input 1MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3         51.69       19.35 ms    ±69.71%       18.34 ms       22.69 ms

##### With input small value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        1.20 K      836.28 μs    ±26.43%      813.54 μs     1442.62 μs

##### With input small value, auto sync #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        351.47        2.85 ms    ±34.43%        2.91 ms        4.62 ms

Benchmark: get/3
================
mix run benchmarks/get.exs
Operating System: FreeBSD
CPU Information: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Number of Available Cores: 8
Available memory: 127.83 GB
Elixir 1.9.1
Erlang 22.0.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: 10MB value, 1KB value, 1MB value, small value
Estimated total run time: 28 s

Benchmarking CubDB.get/3 with input 10MB value...
Benchmarking CubDB.get/3 with input 1KB value...
Benchmarking CubDB.get/3 with input 1MB value...
Benchmarking CubDB.get/3 with input small value...

##### With input 10MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3         75.29       13.28 ms    ±48.01%       10.05 ms       22.24 ms

##### With input 1KB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3        9.40 K      106.40 μs    ±32.09%      104.16 μs      145.96 μs

##### With input 1MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3        1.45 K      689.32 μs   ±190.37%      651.65 μs      974.44 μs

##### With input small value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3        9.79 K      102.17 μs    ±16.25%      100.55 μs      128.27 μs

# ZRAID10 striped mirror

Benchmark: put/3
================
mix run benchmarks/put.exs
Operating System: FreeBSD
CPU Information: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Number of Available Cores: 8
Available memory: 127.83 GB
Elixir 1.9.1
Erlang 22.0.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: 10MB value, 1KB value, 1MB value, small value, small value, auto sync
Estimated total run time: 35 s

Benchmarking CubDB.put/3 with input 10MB value...
45 entries written to database.
Benchmarking CubDB.put/3 with input 1KB value...
5508 entries written to database.
Benchmarking CubDB.put/3 with input 1MB value...
400 entries written to database.
Benchmarking CubDB.put/3 with input small value...
5676 entries written to database.
Benchmarking CubDB.put/3 with input small value, auto sync...
699 entries written to database.

##### With input 10MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3          6.28      159.35 ms    ±17.98%      149.13 ms      294.63 ms

##### With input 1KB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        1.19 K      841.11 μs    ±19.13%      808.08 μs     1878.28 μs

##### With input 1MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3         57.31       17.45 ms    ±21.02%       16.74 ms       36.38 ms

##### With input small value #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3        1.21 K      827.99 μs    ±13.90%      807.49 μs     1397.60 μs

##### With input small value, auto sync #####
Name                  ips        average  deviation         median         99th %
CubDB.put/3         96.94       10.32 ms    ±47.55%       11.30 ms       27.47 ms

Benchmark: get/3
================
mix run benchmarks/get.exs
Operating System: FreeBSD
CPU Information: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Number of Available Cores: 8
Available memory: 127.83 GB
Elixir 1.9.1
Erlang 22.0.7

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: 10MB value, 1KB value, 1MB value, small value
Estimated total run time: 28 s

Benchmarking CubDB.get/3 with input 10MB value...
Benchmarking CubDB.get/3 with input 1KB value...
Benchmarking CubDB.get/3 with input 1MB value...
Benchmarking CubDB.get/3 with input small value...

##### With input 10MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3        103.71        9.64 ms    ±53.65%        7.19 ms       21.91 ms

##### With input 1KB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3        9.39 K      106.50 μs    ±39.19%      104.20 μs      141.42 μs

##### With input 1MB value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3        1.34 K      744.06 μs    ±30.24%      695.75 μs     1836.90 μs

##### With input small value #####
Name                  ips        average  deviation         median         99th %
CubDB.get/3        9.77 K      102.36 μs    ±14.72%      101.01 μs      135.02 μs