Append only DB

Hello.

First introduction:
we are building new project in Elixir. We use Postgresql, it’s very fast, nice. We reduced tables with JSONB and we really like it. Our project it’s about transactions data, so for example: We have to store(have to connected) some data to account. Let’s imagine articles. Every account can have millions of articles. We do reports about this, add more information, etc, do some statistics informations, count them and data inside, etc. Now we use 1:N relation in PG database. It’s really nice. But we don’t wont to “update” the article, we just add more and more information(new state, statistic information, validation, etc). For example state of the article, we change the state and we have to update the row in PG, change the state, add old state to JSONB structure about old states with reference who changed it, etc… and we have this logic for a lot of attributes, because we have to store all history of changes, addons,…
We have also some other tables like accounts-users(N:N), some comments, etc. We can not lost any change, entry, any data, so we have to store it very carefully.

Question:
I’m thinking about “append only DB”. I’ve never used any “append only db” (nobody in my team), also we use only Redis for NOSQL DB. I found only NOSQL DB have this “logic”. So my question is: What do you suggest to me for solving this issue? Do you think we can use DocumentBase DB or other nosql db? Do you have any suggestion about good DB for use?

Thanks for reading anyway :slight_smile:

UPDATE:
I found this https://barrel-db.org/ what do you think about it?

3 Likes

If I understand you correctly you want to store historical data and have a revision tree of all your changes to your “information state”. This means that you need to store the data for every revision. If I have misunderstood your requirement then you may ignore this post :smiley:

Something like this can be implemented in most databases. It is just a question which is more suitable as there are pros and cons with everything. Datomic http://datomic.com/ might be what you are after but it is commercial.

Because you are already using postgres you can keep using it. Have a table and only do INSERTs and no UPDATES (perhaps even revoke UPDATE privilege on the table). Depending on the size of your data this may be a viable approach especially if indexed properly. Remember that today 100 of GB of data is not considered particularly large.

Perhaps some useful articles:

NoSQL databases may be more suitable for this. Primarily because they use underlying append-only data structures. (most commonly a log-structured merge-tree (https://en.wikipedia.org/wiki/Log-structured_merge-tree)). The benefit of this design is that it allows for fast writes and decent enough read speeds. On the other hand they are not ACID and doesn’t support database transactions. Keeping historical revisions of data though, is still left as an exercise for the user.

If you wanted to implement your revision history in NoSQL database you can store your account number and date of update along side with the data and then do a prefix range scan for the account to get all data back.

Which NoSQL database? Who knows :D. If you only run on 1 server you can use something like RocksDB or LevelDB, both of which has got erlang bindings and use it embedded in your application. But anyone that supports range scans and prefixed keys should do.

1 Like

Append-only makes me think Event-Sourcing might be appropriate: see eventstore and the related CQRS framework commanded.

You could also go for the versioned EAV style used by Datomic to give you the flexible append-only data model, though I’m not sure if there is an Erlang/Elixir API for it.

4 Likes

I’ve ran into a similar requirement and built something on top of amazon’s dynamodb.

Warning though, this is used in my current project but it hasn’t seen serious production traffic yet. Reading/Writing should be perfect, but the stream consumer could be problematic if you need to balance across a huge set of shards. Consider it a reference implementation.

It’s pretty much the same style system as eventstore, however being dynamodb it is cheap, easy, and a throw money at it problem to scale. Also has a TTL on events so we set out data retention policy easily.

That being said, I’d still recommend postgres for your use case. I see no compelling reason from your description that warrants something more exotic.

We are playing with purpose built append only distributed SQL DB at work it has some really nice optimization since data is not updatable obviously not appropriate for all use cases.

I wonder if postgres is “append-only” internally (and vacuum serves as the compaction process). Does anyone know where to find docs about its internal table structure?

https://momjian.us/main/presentations/internals.html
Nope it’s not

1 Like

Your use case sounds perfect for my ibGib engine, but it is not mature enough as-is for production scale. It is similar to the event sourcing model (and consequently datomic), but more related to a blockchain/merkle DAG data construct.

I’m not actively coding it now, as I’m on my path for ibGib (currently in Fredericksburg after about 70 miles on my bicycle). I’m using the web app prototype to document my journey. Boy I wish I had implemented a has-child affordance and notifications but I digress…

So if you’re interested in a prototype level maturity lib with hash-based Merkle DAG history integrity, you can learn more playing with the MVP web app, watching my YouTube vids, and check out the source (took it off GitHub as it’s close to being a VCS itself because of the append-only versioning capabilities).

Whatever you choose, best of luck finding your solution!

1 Like