Distributed Databases - options/thoughts on storing data in a distributed manner?

The question I have regarding CockroachDB is if is actually possible to have a distributed ACID database with decent latency for the general case. An old jepsen article (Jepsen: CockroachDB beta-20160829) says:

Its most significant limitations (and remember, this database is still in beta) might be the lack of efficient joins and overall poor performance

and from other comments I’ve seen performance can still be a problem for them.

A pretty good evaluation of the 1.0 release: CockroachDB is now production-ready: our first impressions - OpenCredo

1 Like

Well Spanner shows that yes you can but obviously subject to having access to appropriate infrastructure.
I think it will take them 3-4 years to get the thing into usable state (it’s not uncommon to see them making 10x or even 100x improvements for some pathological use cases looking at issues on github). So as adoption grows and if they don’t run out of funding I think they might get to a viable product.

2 Likes

According to first presentation the only problem compared to Google Spanner that CDB is that is not True Time.
It relies on NTP. So Latency depends how good your network is. And you will never have as good network as Google has.
I assume Amazon could create CockroachDB as service :smiley:

2 Likes

Looks like they are making good progress https://www.cockroachlabs.com/blog/2-dot-0-perf-strides/
(instances are n1-highcpu-16 GCE VMs with Local SSDs attached)

4 Likes
2 Likes

DataStax CTO Jonathan Ellis compares the tradeoffs, strengths, and weaknesses of Apache Cassandra vs. Amazon’s DynamoDB, Microsoft’s Azure Cosmos DB, and Google’s Cloud Spanner.

https://academy.datastax.com/content/distributed-data-show-episode-35-apache-cassandra-vs-cloud-databases-jonathan-ellis

1 Like

I think the critique of Spanner is a bit stretched I can easily batch writes to amortize the cost of 2 phase commit. I can do it in fairly generic way for the write heavy path and it will have minimal dev. overhead now dealing with all the issues that Casandra’s design introduces is much more pain can not be dealt with in generic way and has a huge dev overhead.

2 Likes

Apple open sourced FoundationDB
https://www.foundationdb.org/blog/foundationdb-is-open-source/
Cool talk on how FoundationDB is tested:

6 Likes

Surprised no one has mentioned Riak Core, RiakKV or RiakTS.

All OSS now, http://basho.com/

Is it still locked to an older (custom) OTP version though?

I know it might sound boring but to anyone with interest in distributed systems I would highly recommend watching that talk on FoundationDB testing.

I believe they have updated it for 19 and/or 20 now. Finally.

1 Like
2 Likes

If you are interested in testing I recommend to check this blog https://jepsen.io/
Jepsen: CockroachDB beta-20160829

I read Jepsen but believe you me it’s not even in the same league as what they are doing :slight_smile:

2 Likes

I wonder how it could be used from elixir How difficult would it be to implement the wire protocol in other languages? - Development - FoundationDB. Would it require nifs for now?

I’ve also skimmed through their documentation and still couldn’t understand what makes them different (haven’t watched the talked you linked yet) …

EDIT Found Technical Overview — FoundationDB 7.1, but it’s still light on details.

So is it like a cluster of key-value databases where a smaller portion of it accepts writes (supports transactions) and the rest are just for storage? It also seems to have rather complicated (or smart) clients (from the forum above):

a client library from scratch would be a bit more complex than serializing the relatively simple get/set/commit API that’s in the client library. For example, the read-your-writes cache and maintaining the cache of what shards of data are located on what servers is maintained in the client.

It’s ACID distributed ordered key value store (serializable isolation by default). It’s built to be the underlying layer for higher level databases. It’s used in production at multi petabyte scale (even on relatively small cluster it can do millions of concurrent reads/writes per sec.). I still would highly recommend watching the talk it’s not really that much about FoundationDB it’s about using deterministic simulation to test distributed systems. They run the system on fully deterministic simulation for testing in 2013 they ran 10 million test scenarios nightly (limmited by the amount of $ they had as startup to buy compute to run the thing) I would imaging that with Apple acquisition in 2015 it that has grown a lot.

2 Likes

Offtopic:

Ookkkaayyy…

Seven NoSQL Databases in a Week Packt 2018-March-29

  • Redis
  • MongoDB
  • DynamoDB
  • Neo4j
  • HBase
  • Cassandra
  • InfluxDB

Seven Databases in Seven Weeks, Second Edition Pragprog 2018-April-15

  • Redis
  • MongoDB
  • DynamoDB
  • Neo4j
  • HBase
  • Postgres
  • CouchDB
1 Like

Yeah, it’s nice and certainly gives confidence that they know what they do. It’s just strange that I’ve read through several pages of their “technical” docs and still had no idea how the database actually works. They just say that it “uses a sophisticated data-parallel and multithreaded algorithm to optimize conflict-checking” and don’t even try to explain it. Is it like Serializable Snapshot Isolation? If not, what’s the difference? Same goes for their replication algorithms … What are “teams”, how do they work, why the “chance of data unavailability is reduced to about 0.5%” with teams? Where does this number even come from?

1 Like

https://www.dataengineeringpodcast.com/cockroachdb-with-peter-mattis-episode-35/

3 Likes