andzdroid

andzdroid

Question about memory copy

I saw this bit of text on the cubdb repo’s readme (GitHub - lucaong/cubdb: Elixir embedded key/value database · GitHub):

Snapshots come at no cost: nothing is actually copied or written on disk or in memory, apart from some small internal bookkeeping.

In relation to this code:

{x, y} = CubDB.with_snapshot(db, fn snap ->
  x = CubDB.Snapshot.get(snap, :x)
  y = CubDB.Snapshot.get(snap, x)

  {x, y}
end)

I looked at the with_snapshot function and all it does is make a genserver call. Can somebody explain how this doesn’t copy anything in memory?

  def snapshot(db, timeout \\ 5000) do
    GenServer.call(db, {:snapshot, timeout}, :infinity)
  end

My understanding of elixir was that messages across processes are always copied. Is my understanding wrong? Am I missing something?

Marked As Solved

LostKobrakai

LostKobrakai

That expects that “the snapshot” is “the data you store in the db”. It probably isn’t. It’s likely just a pointer or reference to the actual data stored elsewhere.

Also Liked

lucaong

lucaong

@LostKobrakai is essentially right.

CubDB uses an append-only Btree: changes are appended to the database file, like in a log file. Basically, every new change causes a new Btree root to be appended to the database file. The root points at the old nodes as well as to some new ones containing the data that changed. Existing Btree nodes are never updated in place. It works very similarly to an immutable data structure, like the ones used by Elixir. This means that if one has a reference to an old Btree root, they can traverse the tree and see the database exactly as it was when that root was the current one.

Obtaining a snapshot then is as simple as holding a reference to a Btree root: new changes can be written to the database, but they won’t change what is pointed by that old root. Therefore, there is no database copying, as in CubDB does not have to make a copy of the database to give you an immutable snapshot. The Btree root is a small data structure, only holding a reference to other nodes down the tree: it doesn’t contain any data. Taking a snapshot is a very fast O(1) operation. In fact, it is internally used on every read operation, to ensure consistency in presence of concurrent writes.

The bookkeeping mentioned is simply because occasionally, in order to avoid growing the database file indefinitely as more changes accumulate, CubDB performs a compaction, which discards the data which is old and not referenced anymore. CubDB has to keep track of what’s referenced by the existing snapshots to avoid removing it, a bit like a garbage collector.

The details of compaction are a bit more involved, as compaction can proceed in parallel with reads and writes, but in essence when you take a snapshot CubDB does not copy any data: it merely remembers which Btree root your snapshot points to.

When you read from the snapshot, of course data will be copied from disk into memory. But that’s just the data you read, not the whole database. And, again, that is when you read from the snapshots, not when you take a snapshot.

I hope this clarifies it :slight_smile:

LostKobrakai

LostKobrakai

I’d imagine that in the grand scheme of things the tree is probably tiny compared to what would need to be moved around if values would be affected – especially when the db reaches sizes where you actually start caring for it not being copied.

lucaong

lucaong

Also, more clarifications regarding @joey_the_snake replies:

If it was a pointer or a reference then it wouldn’t really be a snapshot because then you could see other people’s modifications. Otherwise it would have to be a pointer or a reference to a copy.

No: it is a reference to the original data, a simple pointer, but it is isolated from other updates. The trick is that the original data is immutable, it cannot be changed by other updates. Instead, updates create a new Btree root that points to all the data that did not change, plus the part that changed. It is the same principle underlying immutable data structures with structural sharing (like those in Elixir, Clojure, etc.)

Looking at the source, it seems like a tree structure is copied to the caller. And this tree structure is walked when trying to read particular keys.

Not the whole tree, just the root, which is a small data structure. And yes, the tree is walked when accessed, but it is not a copy: it is the original (immutable) tree, but accessed from the snapshot root.

Yeah true. I do see the OP’s point though. I wouldn’t necessarily expect the full tree structure, even if it doesn’t have all the data, based on the description.

It’s not the full tree. You can think about it with an analogy with Elixir immutable data structures:

# Say we have a map, which in our analogy is the database
m = %{ foo: 1, bar: 2, baz: [1, 2, 3] }

# I take a "snapshot" by saving a reference to the current map
s = m

# Note that nothing was copied above. Simply, a new reference to the
# same (immutable) data structure is created

# This "snapshot" is isolated from changes:
m = Map.put(m, :bar, 3)
m # => %{ foo: 1, bar: 3, baz: [1, 2, 3] }
s # => %{ foo: 1, bar: 2, baz: [1, 2, 3] }

Note the part that is in common between s and m was not copied: both s and m point to the same values (for example the key :baz points to the same list in m and s, not to two identical copies). Yet, because maps are immutable in Elixir, s does not see changes. Simply, m tracks the latest updates, while s keeps pointing to a certain version.

As soon as s is not in scope anymore, the garbage collector can then clean up the old data that is not reachable anymore from m.

CubDB works in much the same way, just on disk instead than in memory, and therefore with some different implementation details to account for the difference performance characteristics of disk vs. memory.

I just found this nice explanation of how an append-only Btree works, which corresponds to how CubDB works internally. On top of that, CubDB uses a compaction process to “garbage collect” disk space and avoid growing indefinitely with each update.

I hope this clarifies it further :slight_smile:

Where Next?

Popular in Questions Top

sergio
In Ruby, I can go: User.find_by(email: "foobar@email.com").update(email: "hello@email.com") How can I do something similar in Elixir? ...
New
chrisalley
ExUnit now has describe blocks which is a welcome addition coming from RSpec. In the docs, it states that nested hierarchies of describe ...
New
Patoshizzle
After calling mix ecto.create I get this error: 17:00:32.162 [error] GenServer #PID<0.412.0> terminating ** (Postgrex.Error) FATAL...
New
shahryarjb
Hello, I have map which I want to convert it to string like this: the map: %{last_name: "tavakkoli", name: "shahryar"} the string I ne...
New
shahryarjb
Hello, I get Persian date from my client and convert it to normal calendar like this: def jalali_string_to_miladi_english_number(persi...
New
fireproofsocks
Forgive me if this is obvious, but how does one delete a database record WITHOUT selecting it first? Ecto.Repo — Ecto v3.14.0 has exampl...
New
LegitStack
I’m trying to make a websocket server in Phoenix or raw Elixir. I heard about gun, I think I could use cowboy, but since I’m not that sma...
New
beno
I will often find my self writing things similar to: case some_value do nil -> something() "" -> something() _ -> somethi...
New
sergio_101
I am VERY much an elixir newbie. I have taken one elixir course and one phoenix course on Udemy. During that course, I saw the instructor...
New
vonH
In asking this question I am more interested about the expressiveness of the language itself and less concerned about the availability of...
New

Other popular topics Top

sen
Hi All, I set a environment variables in dev.exs , like below code. when i start server, how can i set the ${enable} value? thanks. d...
New
senggen
Erlang/OTP 25 [erts-13.2.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] 15:22:35.803 [error] gen_event {lager_file_backend...
New
New
9mm
I am constructing a JSON object (map) and I need to conditionally set a field. I’m trying to write proper elixir-way code… and I’m at a l...
New
gshaw
What is the idiomatic way of matching for not nil in Elixir? E.g., First way: defp halt_if_not_signed_in(conn, signed_in_account) when...
New
aesmail
Hello guys, I have finally made it. I created an admin interface for a framework. It’s been on my todo list for years and with the curre...
New
sergio_101
I am VERY much an elixir newbie. I have taken one elixir course and one phoenix course on Udemy. During that course, I saw the instructor...
New
dblack
I’ve got an issue with an app and I’ve no idea of how to troubleshoot it. I’m hoping someone here might have seen something similar. I p...
New
boundedvariable
I am going through the kafka architecture. All the features what the kafka is providing are already in Erlang. I would like hear your opi...
New
joaquinalcerro
Hi there, I am working with Ecto-Postgresql and I need to call all of the records from a specific table but the table has 40,000 records...
New

We're in Beta

About us Mission Statement