Also, more clarifications regarding @joey_the_snake replies:
If it was a pointer or a reference then it wouldn’t really be a snapshot because then you could see other people’s modifications. Otherwise it would have to be a pointer or a reference to a copy.
No: it is a reference to the original data, a simple pointer, but it is isolated from other updates. The trick is that the original data is immutable, it cannot be changed by other updates. Instead, updates create a new Btree root that points to all the data that did not change, plus the part that changed. It is the same principle underlying immutable data structures with structural sharing (like those in Elixir, Clojure, etc.)
Looking at the source, it seems like a tree structure is copied to the caller. And this tree structure is walked when trying to read particular keys.
Not the whole tree, just the root, which is a small data structure. And yes, the tree is walked when accessed, but it is not a copy: it is the original (immutable) tree, but accessed from the snapshot root.
Yeah true. I do see the OP’s point though. I wouldn’t necessarily expect the full tree structure, even if it doesn’t have all the data, based on the description.
It’s not the full tree. You can think about it with an analogy with Elixir immutable data structures:
# Say we have a map, which in our analogy is the database
m = %{ foo: 1, bar: 2, baz: [1, 2, 3] }
# I take a "snapshot" by saving a reference to the current map
s = m
# Note that nothing was copied above. Simply, a new reference to the
# same (immutable) data structure is created
# This "snapshot" is isolated from changes:
m = Map.put(m, :bar, 3)
m # => %{ foo: 1, bar: 3, baz: [1, 2, 3] }
s # => %{ foo: 1, bar: 2, baz: [1, 2, 3] }
Note the part that is in common between s
and m
was not copied: both s
and m
point to the same values (for example the key :baz
points to the same list in m
and s
, not to two identical copies). Yet, because maps are immutable in Elixir, s
does not see changes. Simply, m
tracks the latest updates, while s
keeps pointing to a certain version.
As soon as s
is not in scope anymore, the garbage collector can then clean up the old data that is not reachable anymore from m
.
CubDB
works in much the same way, just on disk instead than in memory, and therefore with some different implementation details to account for the difference performance characteristics of disk vs. memory.
I just found this nice explanation of how an append-only Btree works, which corresponds to how CubDB
works internally. On top of that, CubDB
uses a compaction process to “garbage collect” disk space and avoid growing indefinitely with each update.
I hope this clarifies it further