lucaong

CubDB, a pure-Elixir embedded key-value database

Hello Elixir and Nerves community,
I have been working for a while on an open-source embedded key-value database for Elixir, which I called CubDB. I use it for several IoT projects I run using Nerves, where I need to store largish amounts of data locally on the device.

I am already using it in production, but before I release version 1.0.0 I would love some feedback from the Nerves (and Elixir) community.

You can find the CubDB repository here
And here the API documentation

A quick basic usage example:

{:ok, db} = CubDB.start_link("my/data/directory")

CubDB.put(db, :foo, "some value")
#=> :ok

CubDB.get(db, :foo)
#=> "some value"

CubDB.delete(db, :foo)
#=> :ok

CubDB.put(db, {:keys, "can", :be, 'anything'}, ["and", :values, 'too'])
#=> :ok

# Check out docs for advanced usage with select/3 and get_and_update_multi/4

I know that Elixir comes with ETS/DETS and Mnesia, but:

  • ETS is not persistent across reboots

  • DETS does not offer sorted collections, and is thus not ideal when one needs to select arbitrary ranges of keys, iterate in order, etc.

  • Mnesia is great, but on embedded projects I don’t need distribution

  • Sometimes I really just need a “persisted map”, sorted by key

  • It’s nice to be able to back up the whole DB by just copying one file
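
To make the "sorted by key" point concrete, here is a minimal sketch that uses only an in-memory ETS `:ordered_set` from the standard library. It is not CubDB code, and the table name and sample readings are made up for illustration. It shows the two operations mentioned above, iterating in key order and selecting a key range, which plain DETS cannot offer and which ETS offers only without persistence.

```elixir
# In-memory only: this ETS table is lost when the VM stops.
table = :ets.new(:readings, [:ordered_set])

# Insert out of order; an :ordered_set keeps entries sorted by key.
for {timestamp, temperature} <- [{30, 21.5}, {10, 20.1}, {50, 22.8}, {20, 20.7}, {40, 22.0}] do
  :ets.insert(table, {timestamp, temperature})
end

# Dumping the table returns the entries in key order.
all = :ets.tab2list(table)

# Select the key range 20..40 (inclusive) with a match specification.
match_spec = [
  {{:"$1", :"$2"}, [{:>=, :"$1", 20}, {:"=<", :"$1", 40}], [:"$_"]}
]

range = :ets.select(table, match_spec)

IO.inspect(all, label: "all entries")
IO.inspect(range, label: "keys 20..40")
```

A persisted store with the same ordering guarantees is the gap CubDB is meant to fill.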

The use case I am primarily targeting is the one described in this blog post by the Nerves team: https://embedded-elixir.com/post/2017-09-22-using-ecto-and-sqlite3-with-nerves/

CubDB is somewhat similar to SQLite in that it stores the data locally in a single file, but it is written in Elixir, is key-value and schema-less, and both keys and values can be any Elixir (or Erlang) terms, so no serialization/de-serialization is needed.

The data structure it uses is an append-only immutable B-tree, inspired by CouchDB: that guarantees robustness to data corruption (no in-place mutation), and enables features like concurrent read operations that do not block writes, and atomic transactions.
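
The idea behind "append-only and immutable" can be sketched with a toy store. This is a hypothetical illustration, not CubDB's B-tree or its file format: the module name and functions are invented here. Every write adds a new entry instead of changing an old one, so a reader holding an earlier version keeps seeing a consistent snapshot while writes continue.

```elixir
defmodule AppendOnlySketch do
  @moduledoc "Toy append-only store: a hypothetical illustration, not CubDB's B-tree."

  # The whole "database" is an immutable list of entries, newest first.
  def new, do: []

  # Writes never modify existing entries: they add a new one at the head.
  def put(log, key, value), do: [{key, {:value, value}} | log]
  def delete(log, key), do: [{key, :deleted} | log]

  # Reads return the most recent entry for the key, if it is not deleted.
  def get(log, key) do
    case List.keyfind(log, key, 0) do
      {_key, {:value, value}} -> value
      _ -> nil
    end
  end
end

v1 = AppendOnlySketch.new() |> AppendOnlySketch.put(:foo, 1)

# A reader keeps a reference to the current version: this is its snapshot.
snapshot = v1

# Later writes build a new version and leave the snapshot untouched.
v2 =
  v1
  |> AppendOnlySketch.put(:bar, "x")
  |> AppendOnlySketch.put(:foo, 2)
  |> AppendOnlySketch.delete(:bar)

IO.inspect(AppendOnlySketch.get(snapshot, :foo), label: "snapshot :foo")
IO.inspect(AppendOnlySketch.get(v2, :foo), label: "latest :foo")
IO.inspect(AppendOnlySketch.get(v2, :bar), label: "latest :bar")
```

Because old entries are never overwritten, an interrupted write can at worst leave an incomplete newest entry, which is the intuition behind the robustness claim above.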

It was already a lot of fun for me to develop it, but I would love to hear your constructive feedback.

What do you think about it? Do you have a use-case where this could be useful? Do you have feedback about the API?

Thanks in advance :slight_smile:

Most Liked

lucaong

Also, as this thread is now updated again, I take the chance to announce that the release candidate for CubDB 1.0.0 is now on Hex as v1.0.0-rc.1 :rocket: . It is the result of running CubDB in production on embedded devices for the past year, and introduces a few improvements that make the API more solid. The notable changes are:

  • The database is 100% compatible with the previous releases (I have no intention to break compatibility there).
  • Auto compaction and auto file sync are now the defaults: I decided to go for safer defaults, which should be good for the vast majority of cases, and let users tune them for maximum performance in special cases. More info about compaction and file sync is here
  • The timeouts (for example in select or get_and_update_multi) are now enforced on the callee side too, freeing up resources immediately when a timeout elapses. Their API is also slightly different, as timeout is now passed as an option, instead of a separate argument.

For most users, the changes needed to update to v1.0.0(-rc.x) are minimal: just check whether the new defaults are ok for you (or set them explicitly), and, only if you are using explicit timeouts, adapt calls to select and get_and_update_multi. I will write a proper release post on this forum when 1.0.0 is out, but I wanted to let people who are reading this thread and who expressed interest in CubDB know in advance.
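
As a rough sketch of what those two adjustments could look like, the snippet below only builds the option lists and shows the calls in comments, so it runs without the `:cubdb` dependency. The option names (`data_dir`, `auto_compact`, `auto_file_sync`, `min_key`, `max_key`, `timeout`) and the timeout value are assumptions based on my reading of the v1.0 documentation, so check them against the version you install.

```elixir
# Option names are assumptions taken from the v1.0 docs: verify before use.
start_opts = [
  data_dir: "my/data/directory",
  # Both are described as the new defaults; set explicitly here for clarity.
  auto_compact: true,
  auto_file_sync: true
]

# The timeout is now one of the options rather than a trailing argument.
select_opts = [min_key: :a, max_key: :z, timeout: 5_000]

# With the :cubdb package available, the calls would look roughly like:
#   {:ok, db} = CubDB.start_link(start_opts)
#   {:ok, entries} = CubDB.select(db, select_opts)

IO.inspect(start_opts, label: "start options")
IO.inspect(select_opts, label: "select options")
```

Setting the defaults explicitly is optional, but it documents the choice in code and guards against surprises if the defaults change again.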

lucaong

Hi @wolfiton,
thanks for your question. I really admire the engineering work of the Redis author, Salvatore Sanfilippo, so it’s nice for me to see CubDB and Redis discussed in the same context.

That said, Redis and CubDB have considerably different goals and characteristics, and I think the overlap in use cases is quite small. I will try to clarify that a bit:

  • Redis is a “data structure server”, to which one connects over the network. It keeps data primarily in memory to ensure very fast operations, and uses the disk to recover after a restart. It offers several different data structures (maps, lists, sorted sets, streams, etc.) and is agnostic about the programming language used by the user. So, it is shared (multiple apps/instances can connect to one Redis db) and very fast, but data must fit in memory. Common use cases for Redis are: shared in-memory cache, shared data structures for queues or parallel computation, shared locks.

  • CubDB is an embedded database, so it runs “inside” your application, with no network connection. It works sort of like a map, but persisted on disk (plus all the sorted lookup operations). It can be used directly only from Elixir or Erlang, but has the convenience of having zero dependencies and storing native Elixir terms without requiring the user to implement serialization/deserialization. It is not shared between different apps/instances (unless you implement a server layer on top of it yourself). It stores data primarily on disk, so it can store more data than can fit in memory. It’s designed for robustness in case of power failures, and for simplicity of installation and use from Elixir apps. Primary use cases would be data storage for an embedded application (think Nerves running on a Raspberry Pi), or data storage within one app instance.

Of course, one could build a small server on top of CubDB, and expose its features over a network, achieving something comparable to Redis maps. That would be a nice project :slight_smile:

Right now I am working on the core, and focusing on doing one thing well: a versatile and robust key/value storage. Hopefully that will enable developers to get creative and build more use cases on top of it.

lucaong

After extensive testing on a number of test Nerves devices, I was finally able to identify the issue that @Qqwy reported.

It was a bug with the way the most recent database file is chosen, in cases when a restart happens right after a compaction, but before the old file is cleaned up, and CubDB sees more than one database file. The wrong file was chosen, leading to the new records disappearing.

The issue is solved with the latest release, v0.12.0, which is 100% backward compatible. Thanks a lot @Qqwy for reporting and helping. Version 1.0 is getting closer, thanks to valuable feedback from people in this forum :slight_smile:
