Google is making Spanner available on GCP

andre1sk · February 15, 2017, 4:00am

https://cloudplatform.googleblog.com/2017/02/introducing-Cloud-Spanner-a-global-database-service-for-mission-critical-applications.html

uranther · February 15, 2017, 4:19am

It’s awesome! I think what will interest our forum-mates more is its CAP theorem characteristics (technically CP, but it can be treated as CA in practice). The following post comes from Eric Brewer himself, the guy who formulated the CAP theorem. I didn’t know he worked at Google…

https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Spanner-and-the-CAP-Theorem.html

And the white paper: https://research.google.com/pubs/pub45855.html

DianaOlympos · February 15, 2017, 9:13am

To be totally precise: they are neither CP nor CA in terms of the CAP theorem.

They only have “5 nines of availability”, and they do not define exactly what they mean by availability here (transactions? Uptime of the server?)
They are not Consistent in the sense used by the CAP theorem proof. They are serializable, not linearizable. That is not a big loss, but still.
To achieve that, they use a well-enough synchronised clock, built on atomic clocks and GPS clocks. Which is quite the heavy machinery.
They “cheat” a bit on what operations they allow. No DML operations, a lot of locking. And your response time will not be consistent across identical requests. They claim “ACID”… but that term is loosely defined when it comes to distributed systems and transactions.
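
For a sense of scale on the “5 nines” figure, here is a quick back-of-the-envelope calculation. It assumes availability is measured as plain uptime over a 365-day year, which is only one possible reading, since the definition is not spelled out; the function name is made up for illustration.

```python
# Downtime budget implied by an availability target.
# Assumption: availability means uptime over a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Return how many minutes per year a service may be unavailable."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(0.99999)
print(f"{five_nines:.2f} minutes/year")  # prints "5.26 minutes/year"
```

Under a different definition (for example, the fraction of transactions that succeed), the same percentage would mean something else entirely.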

All in all: yes, it is great. Use it if you want it. But no, it is not a real solution to the CAP problem, if that was a problem that mattered to you.

Basically… nearly no one should need that, but everyone will use it because “aaaaaaaah we are losing data !!!”. This is a great business decision from Google’s pov, at Oracle DB level. But you probably do not need it.

OvermindDL1 · February 15, 2017, 3:57pm

On initial reading it actually seems very similar to Riak (though not exactly).

andre1sk · February 15, 2017, 5:22pm

Other than being ACID and SQL

OvermindDL1 · February 15, 2017, 5:26pm

Which makes me wonder, without looking too deep into it: what happens when two servers on opposite sides of the world both create the same file with conflicting contents? There has to be ‘some’ communication or resolver. But if the resolver runs ‘after’ the set, what happens if the servers that made both writes have already run code assuming the data is written? Or do they have to ‘wait’ until a quorum is reached (as Riak does) to ensure success?

andre1sk · February 15, 2017, 5:46pm

They wait for a quorum, as far as I remember.
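
To make “wait for a quorum” concrete, here is a toy sketch of a majority-quorum write. It is not how Spanner or Riak is implemented (the Spanner paper describes Paxos-replicated groups); `Replica` and `quorum_write` are hypothetical names, and the replicas are just in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica used only for illustration."""
    name: str
    reachable: bool = True
    store: dict = field(default_factory=dict)

    def write(self, key, value) -> bool:
        # An unreachable replica never acknowledges the write.
        if not self.reachable:
            return False
        self.store[key] = value
        return True


def quorum_write(replicas, key, value) -> bool:
    """Report success only if a strict majority of replicas acknowledged."""
    acks = sum(1 for r in replicas if r.write(key, value))
    return acks >= len(replicas) // 2 + 1


replicas = [Replica("us"), Replica("eu"), Replica("asia", reachable=False)]
print(quorum_write(replicas, "file", "v1"))  # True: 2 of 3 acknowledged
```

The caller only treats the data as written once the majority has answered, which is why code cannot run ahead on the assumption that the write succeeded. A real protocol also has to roll back or repair the replicas that accepted a write when the quorum fails, which this toy version skips.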

OvermindDL1 · February 15, 2017, 6:22pm

So it is Riak but with SQL. ^.^

mkunikow · February 15, 2017, 9:00pm

Something like CockroachDB

CockroachDB is inspired by Google’s Spanner and F1 technologies, and it’s completely open source.

So Google wants to take a slice of the cloud pie with GCP, through big data and machine learning.

andre1sk · February 15, 2017, 9:07pm

Hard to compare Cockroach to Spanner, as Spanner has been running in production for at least 5 years, handling a huge production load in a cluster composed of hundreds of thousands of nodes. Spanner also relies on having proper hardware and network infrastructure that is hard to replicate on your own.

mkunikow · February 15, 2017, 9:35pm

I agree 100%. As I remember from the cockroachdb-ben-darnell podcast, Spanner relies heavily on network time synchronization, which is hard to achieve outside Google’s infrastructure.
But it is nice to play with something similar on your own computer.

DianaOlympos · February 16, 2017, 12:32pm

They “cheat”. They use a quorum lock, but they do it faster because they have a distributed clock they can rely on…

OvermindDL1 · February 16, 2017, 4:34pm

Hmm, how does that work? I don’t see how that would save on the communication time needed to make sure no conflicts happened at the same time.

DianaOlympos · February 16, 2017, 6:35pm

It does not, but it enables them to use timestamps, because they consider that they have a shared synchronised time with a finer precision than their machines.

So that simplifies the problem a lot in the end.
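
A rough sketch of the idea, loosely following the “commit wait” rule from the Spanner paper: the clock returns an interval instead of a single instant, and a commit waits until its timestamp is guaranteed to be in the past everywhere. The `EPSILON` value, the function names, and the use of a local monotonic clock as a stand-in for a globally synchronised one are all assumptions for illustration.

```python
import time
from dataclasses import dataclass

EPSILON = 0.005  # assumed clock uncertainty bound, in seconds (illustrative)


@dataclass(frozen=True)
class Interval:
    earliest: float
    latest: float


def tt_now(epsilon: float = EPSILON) -> Interval:
    """Return an interval assumed to contain the true time.

    time.monotonic() stands in for a globally synchronised clock here.
    """
    t = time.monotonic()
    return Interval(t - epsilon, t + epsilon)


def commit(epsilon: float = EPSILON) -> float:
    """Pick a commit timestamp, then wait until it has definitely passed."""
    s = tt_now(epsilon).latest
    # Commit wait: block until even the earliest possible "now" is after s.
    while tt_now(epsilon).earliest <= s:
        time.sleep(epsilon / 10)
    return s


first = commit()
second = commit()
print(first < second)  # True: timestamp order matches commit order
```

The wait is roughly twice the uncertainty bound, so the tighter the clocks, the cheaper the ordering guarantee. It does not remove the quorum round trip; it only makes timestamps safe to compare across machines.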

OvermindDL1 · February 16, 2017, 6:43pm

Hmm, true. That still leaves the overall latency issue unresolved, which the initial reading implies they mostly eliminated, and there are methods to compare without time (Riak uses vector clocks, for example)…
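
For reference, a minimal sketch of how vector clocks order two versions without any wall-clock time. This is the textbook comparison, not Riak’s actual code; the `compare` function and the dict-of-counters representation are assumptions for illustration.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks given as {node_name: counter} dicts."""
    nodes = set(a) | set(b)
    # Missing entries count as 0: that node has not touched this version.
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # a happened after b
    return "concurrent"      # neither dominates: a conflict to resolve


print(compare({"x": 1}, {"x": 2}))                  # before
print(compare({"x": 2, "y": 1}, {"x": 1, "y": 2}))  # concurrent
```

The “concurrent” case is the one that needs a resolver (or siblings handed back to the application), which is the conflict scenario asked about earlier in the thread.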

andre1sk · February 16, 2017, 6:47pm

Dude, you really just have to waste a bit of time watching the video, or waste a bit more time reading the paper.

OvermindDL1 · February 16, 2017, 6:52pm

Building type systems has been distracting me. ^.^

Plus, starting tomorrow, a huge project is kicking off at work, so my mostly free time will be gone for a bit again (it always ebbs and flows).

mkunikow · March 8, 2017, 1:40pm

andre1sk · March 8, 2017, 3:02pm

Very useful talk. They need to make it clearer on the product site that when you buy 1 node you are actually getting 3 replicas across 3 zones.

andre1sk · March 9, 2017, 10:19pm

Oh well, no magic: the cloud strikes again. For 30% writes / 70% reads, to get to 30K tps you are looking at 30 nodes, WTF!!! That’s 20K a month; you can get the same with synchronous commit on PG on two 3K boxes.