Git as a Database and Alternatives

I have a pet project where I will be storing a lot of text-based data, and I would like to have it versioned like in Git. In my initial research I ended up finding that we can use Git as a database…

What, are you serious!!!

You can read about the principles of doing it in the article Git as a NoSql Database. It seems the author, Kenneth Truyers, is using it in production, and there is also a recorded talk on the subject.

Continuing my search, today I came across the Noms database, which claims to be a descendant of the Git version control system:

Noms is a decentralized database philosophically descendant from the Git version control system.

Like Git, Noms is:

  • Versioned: By default, all previous versions of the database are retained. You can trivially track how the database evolved to its current state, easily and efficiently compare any two versions, or even rewind and branch from any previous version.
  • Synchronizable: Instances of a single Noms database can be disconnected from each other for any amount of time, then later reconcile their changes efficiently and correctly.

Unlike Git, Noms is a database, so it also:

  • Primarily stores structured data, not files and directories (see: the Noms type system)
  • Scales well to large amounts of data and concurrent clients
  • Supports atomic transactions (a single instance of Noms is CP, but Noms is typically run in production backed by S3, in which case it is “effectively CA”)
  • Supports efficient indexes (see: Noms prolly-trees)
  • Features a flexible query model (see: GraphQL)

A Noms database can reside within a file system or in the cloud:

  • The (built-in) NBS ChunkStore implementation provides two back-ends which provide persistence for Noms databases: one for storage in a file system and one for storage in an S3 bucket.

Finally, because Noms is content-addressed, it yields a very pleasant programming model.

Working with Noms is declarative. You don’t INSERT new data, UPDATE existing data, or DELETE old data. You simply declare what the data ought to be right now. If you commit the same data twice, it will be deduplicated because of content-addressing. If you commit almost the same data, only the part that is different will be written.
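A toy Python sketch of the content-addressing idea in that quote (this is not Noms code or its API; `ChunkStore` is a hypothetical class): data is split into chunks, each chunk is stored under the hash of its bytes, and a commit is just the list of chunk hashes. It uses fixed-size chunks for simplicity, whereas Noms’ prolly trees pick chunk boundaries from the content itself, so that an insertion does not shift every later chunk.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: every chunk lives under the SHA-256 of its bytes."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}   # chunk hash -> chunk bytes
        self.commits = []  # each commit is the ordered list of its chunk hashes

    def commit(self, data):
        """Declare what the data is now; return how many new chunks had to be written."""
        written = 0
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:  # identical chunks are stored only once
                self.chunks[key] = chunk
                written += 1
            refs.append(key)
        self.commits.append(refs)
        return written

    def checkout(self, version):
        """Rebuild any previous version from its chunk references."""
        return b"".join(self.chunks[key] for key in self.commits[version])
```

With `chunk_size=4`, committing `b"aaaabbbbcccc"` writes 3 chunks, committing it again writes 0, and committing `b"aaaaXbbbcccc"` writes only 1, while every earlier version can still be checked out.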

So my question to the community is: is anyone using this type of database in production, or even in a pet project?

Or do you know of other Git-like database alternatives for building a content-versioned system?

Note: I know I can just add a column with the version of the content in a normal SQL database, but that is not what I am looking for.

Funnily enough, I was also recently checking whether this could be a viable option.

What I figured out, though, is that it’s probably best used when you have truly decentralized clients (by which I mean “not needing a centralized server”).

The reason is that, as described in the talk you linked, the write speed is not that great and has to be controlled manually, which seems hard.
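One way to keep that under control, assuming the commit is the expensive part, is to buffer writes and turn each batch into a single commit. A minimal sketch; `BatchingWriter` is hypothetical, and `commit_fn` stands in for whatever actually writes the files and runs the commit:

```python
class BatchingWriter:
    """Hypothetical helper: buffers key/value writes and hands them over as one commit per batch."""

    def __init__(self, commit_fn, max_batch=100):
        self.commit_fn = commit_fn  # e.g. a function that writes files and runs `git commit`
        self.max_batch = max_batch
        self.pending = {}

    def put(self, key, value):
        # A later write to the same key replaces the earlier one inside the batch
        self.pending[key] = value
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        """Commit everything buffered so far (no-op when nothing is pending)."""
        if self.pending:
            self.commit_fn(dict(self.pending))  # pass a copy, then start a new batch
            self.pending.clear()
```

The trade-off is durability: anything still sitting in `pending` is lost if the process dies before the next flush.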

That being said, depending on your actual needs, you might like CouchDB, which offers versioning of documents. Granted: it is not Git, but it might be enough to get along. Also, it’s written in Erlang :smiley:

But again: it depends massively on the project you have in mind. Could you tell us a little more about that? :slight_smile:

Well, the project I am trying to come up with is meant to help me learn new concepts and keep my focus on the learning path, by building something that will help me in my life as a developer. More often than not I find myself starting a learning path and then getting distracted by some other thing I accidentally stumbled upon, and therefore never finishing it.

So for now I want to start with a place where I can write about my developer stuff: random thoughts, notes about a video, article or podcast, code snippets, how-tos, troubleshooting of errors, etc.

Oh, I completely forgot about this one, I will take a look into it… thanks :wink:

While my needs are not directly related to scale and speed, I would like to learn how to write it in a way that could scale and be fast :wink:

Git as a database may work well if the data can be easily partitioned by repository, like by user, topic, etc.
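A minimal in-memory sketch of what that partitioning could look like, assuming one repository per partition key (the `Partition` class and `partition_for` helper are hypothetical, not part of any library). Documents are hashed the way Git hashes objects: the id is the SHA-1 of a `<kind> <size>` header, a NUL byte, and the content, and a snapshot of a partition is a flat tree object listing its blobs.

```python
import hashlib


def git_object_id(kind, body):
    """Object id as Git computes it: SHA-1 over '<kind> <size>', a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


class Partition:
    """Hypothetical stand-in for one repository (one user or topic) of flat name -> text documents."""

    def __init__(self):
        self.objects = {}  # object id -> (kind, raw body), like .git/objects
        self.index = {}    # document name -> blob id

    def put(self, name, text):
        body = text.encode("utf-8")
        blob_id = git_object_id("blob", body)
        self.objects[blob_id] = ("blob", body)  # identical content always maps to the same id
        self.index[name] = blob_id
        return blob_id

    def snapshot(self):
        """Store a flat tree object (regular files only, sorted by name) and return its id."""
        entries = b"".join(
            b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(self.index[name])
            for name in sorted(self.index, key=lambda n: n.encode("utf-8"))
        )
        tree_id = git_object_id("tree", entries)
        self.objects[tree_id] = ("tree", entries)
        return tree_id


partitions = {}


def partition_for(partition_key):
    """One Partition per user/topic key, created on first use."""
    return partitions.setdefault(partition_key, Partition())
```

Storing the same text under two names in one partition produces a single blob, the same deduplication Git gives you for free. In a real setup these objects would live in an actual repository (for example written with `git hash-object -w`) rather than in a dict.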

What I really love about the idea of using Git is that I get distributed backups almost for free, with git push <remote> master.
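A small sketch of that backup step, shelling out to the git CLI. The function names and the remote name `usb` are hypothetical, and `run` is injectable only so the sketch can be exercised without a repository:

```python
import subprocess


def push_commands(remotes, branch="master"):
    """Build one `git push <remote> <branch>` command per backup remote."""
    return [["git", "push", remote, branch] for remote in remotes]


def backup(repo_dir, remotes, branch="master", run=subprocess.run):
    """Push the same branch to every remote, stopping at the first failure."""
    for cmd in push_commands(remotes, branch):
        # check=True makes subprocess.run raise CalledProcessError if a push fails
        run(cmd, cwd=repo_dir, check=True)


def list_remotes(repo_dir):
    """Names of the remotes configured in the repository, as printed by `git remote`."""
    out = subprocess.run(["git", "remote"], cwd=repo_dir,
                         capture_output=True, text=True, check=True)
    return out.stdout.split()
```

Git can also do the fan-out itself: after `git remote set-url --add --push origin <url>` for each destination (including the original URL, since explicit push URLs replace it for pushing), a single `git push origin master` pushes to all of them.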

I can also use Git hooks to update the read side of my system, like an SQL database, and the search side of it, like an Elasticsearch database. More or less in CQRS fashion, because this is another concept I would like to explore in my pet project.
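A rough sketch of that idea, assuming a SQLite file as the read side (the table layout, file name and function names are hypothetical): a post-commit hook asks Git which files the new commit touched and upserts or deletes them in the read model.

```python
#!/usr/bin/env python3
"""Hypothetical post-commit hook: project the files changed by HEAD into a SQLite read model."""
import os
import sqlite3
import subprocess


def git(*args):
    """Run a git command in the current repository and return its stdout as text."""
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout


def parse_name_status(output):
    """Turn `git diff-tree --name-status` lines like 'M<TAB>notes/a.md' into (status, path) pairs."""
    pairs = []
    for line in output.splitlines():
        if line.strip():
            status, path = line.split("\t", 1)
            pairs.append((status, path))
    return pairs


def apply_changes(conn, changes, commit_sha):
    """Upsert added/modified files, delete removed ones; `changes` holds (status, path, body)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (path TEXT PRIMARY KEY, body TEXT, commit_sha TEXT)"
    )
    for status, path, body in changes:
        if status == "D":
            conn.execute("DELETE FROM notes WHERE path = ?", (path,))
        else:
            conn.execute(
                "INSERT OR REPLACE INTO notes (path, body, commit_sha) VALUES (?, ?, ?)",
                (path, body, commit_sha),
            )
    conn.commit()


def main():
    sha = git("rev-parse", "HEAD").strip()
    # --root makes the very first commit (which has no parent) list its files as added
    listing = git("diff-tree", "--root", "--no-commit-id", "--name-status", "-r", "HEAD")
    changes = [
        (status, path, None if status == "D" else git("show", f"HEAD:{path}"))
        for status, path in parse_name_status(listing)
    ]
    # Keep the read model inside .git so it never shows up as an untracked file
    db_path = os.path.join(git("rev-parse", "--git-dir").strip(), "read_model.db")
    conn = sqlite3.connect(db_path)
    try:
        apply_changes(conn, changes, sha)
    finally:
        conn.close()
```

Saved as `.git/hooks/post-commit`, made executable and ending with a call to `main()`, this would run after every commit; a projection into a search index such as Elasticsearch could be added to `main()` in the same way.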

Then by all means: go for it :smiley:

Actually, I’m currently building something similar for my business. What I basically do is keep all my project management, time tracking, reporting, etc. in plain text in order to track it with Git, because I was so fed up with all the fancy online services which look good but (lock you in || lose data || don’t let you write custom queries || are expensive || WORK ONLY ONLINE) /rant. :sweat_smile:
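To illustrate the kind of custom query plain text allows, here is a sketch over a hypothetical one-entry-per-line time log (the `<date> <hours>h <project> <description>` format is made up for this example, not the one described in the post):

```python
from collections import defaultdict


def hours_per_project(log):
    """Sum hours per project from lines like '2020-01-13 2.5h client-a fixed login bug'."""
    totals = defaultdict(float)
    for line in log.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        _date, hours, project, *_description = line.split()
        totals[project] += float(hours.rstrip("h"))
    return dict(totals)
```

The log itself stays an ordinary text file, so it can be edited with any editor, diffed and versioned in Git, and piped into other small tools.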

So I get where you’re coming from. I can only tell you this much: it’s probably the best and most rewarding pet project I did in the last 6 years. :smiley:

So again: go for it :partying_face:


Also, please let us know what you find out when you run into bottlenecks :wink:

Well, I hope this time I will, because I have tried to get going on it several times :wink:

Oh, very nice to know that you are also doing something similar.

How are you using git from your code?

I am fed up with having to sign into so many places and having so many tabs open in my browser for so many different services, and in the end they don’t cover all my needs :wink:

Wow, you have been doing this project for the last 6 years? What language are you using for it?

I will let you guys know about all of them :wink:

I’ve been exploring this idea recently as well, and it’s a large part of my reason for playing with a pure-Elixir implementation of Git. (See https://xgit.io for more info. Links to Hex, HexDocs, and GitHub are in the banner at the top of that page.)

It’s still a long way from a 1.0 release, but if this interests you, I’d love to have co-conspirators. :slight_smile:

I’m not. I’m simply using Git as normal, but I have scripts and programs that alter the source files, and then I simply git commit them, because for me that’s a no-brainer. If I were to use Git directly, I’d probably just use libgit2 from Rust.
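For the “alter the files, then simply git commit them” workflow, a helper along these lines would do. It is a sketch: `write_and_commit` is a hypothetical name, it shells out to the git CLI rather than using libgit2, and `run` is injectable only so it can be exercised without a real repository.

```python
import subprocess
from pathlib import Path


def write_and_commit(repo, relpath, text, message, run=subprocess.run):
    """Overwrite one file in the working tree, stage it, and commit if anything is staged.

    Returns True when a commit was made, False when nothing was staged (e.g. unchanged content).
    """
    target = Path(repo) / relpath
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    run(["git", "add", "--", relpath], cwd=repo, check=True)
    # `git diff --cached --quiet` exits with 1 if something is staged and 0 if nothing is
    staged = run(["git", "diff", "--cached", "--quiet"], cwd=repo).returncode == 1
    if staged:
        run(["git", "commit", "-m", message], cwd=repo, check=True)
    return staged
```

Checking for staged changes first avoids a failing `git commit` when a script rewrites a file with exactly the same content.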

No, that was misleading: I have been working on this for the last two years (on and off), tackling the problem from wildly different angles. It’s just that it is the best personal project I have done in the last 6 years.

Bash, Rust, Elixir, Python, Elm, Haskell and JavaScript are in the mix I think ^^
I just use whatever is practical to solve the specific problem at hand, and that means finding libraries that do what I want and connecting them in a Unix-y way, with pipes and text files :smiley:

If I were to make a product out of it, I’d probably build the bulk with Rust and use Elixir for services that specifically require a server.

@Exadra37 you might also be interested in the development of GitGud, a GitHub clone written entirely in Elixir.

:slight_smile:

Sorry to revive this after such a long time, but in the meantime I’ve found https://github.com/mirage/irmin, which looks great and actually has a “git backend”. It is a distributed database written in OCaml “that follows the same design principles as git”.

Just dropping this here if anyone is interested.
