What would be the “proper” way to implement a simple persistent key/value store?

Settings that are read far more than they are written. In Go I used a single Postgres table, and on application start-up I read it into a mutex-backed struct. The write method would persist the setting in the database.

I wouldn’t necessarily need to use Postgres as the backing store, but that way the settings get backed up with the rest of the data. That said, I’m not against a non-Postgres backing store if it makes sense.

I’ve looked a bit into ETS and Mnesia, and I also read somewhere that it would be a good idea to build a GenServer around a key/value store. I feel a bit overwhelmed by the choices and would like a pointer in the right direction. Thank you.

1 Like

ETS is not persistent; DETS and Mnesia are. As you said, there are many choices, so your usage pattern and what you feel most comfortable with play a big part.

If I were you I would just use Postgres, especially if you are going to need a relational DB down the road anyway.

2 Likes

GenServers are great if you don’t need to scale, but they are notorious for becoming a bottleneck. If you only need a single node, the best persistent key/value store is DETS; persistent_term is another option.

The Erlang docs are pretty comprehensive, especially if you know your constraints.
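To make the DETS suggestion concrete, here is a minimal sketch (the table name and file path are my assumptions, not from this post); a DETS table lives in a file on disk and survives restarts:

```elixir
# Open (or create) a disk-backed table; the file path is an assumption.
{:ok, table} = :dets.open_file(:settings, file: ~c"settings.dets", type: :set)

# Writes go to disk.
:ok = :dets.insert(table, {:theme, "dark"})

# Reads return a list of matching tuples, just like ETS.
[{:theme, "dark"}] = :dets.lookup(table, :theme)

# Close cleanly so the file isn't flagged for repair on the next open.
:ok = :dets.close(table)
```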

Thanks. I think you are right; I feel comfortable with Postgres, so I’ll stick to that. Now I just need to figure out how to cache the table in memory and trigger a write to the DB only when the data has been modified.

I see. I saw something about DETS but assumed it was probably distributed ETS, hence overkill for my needs and still not persistent. I guess I was wrong. Thank you! I’ll look deeper into DETS, this time without prejudice.

This would be a good fit for cachex or nebulex if you don’t want to roll your own solution.

2 Likes

If you’re using Postgres, use Postgres. Otherwise use SQLite. Create a table with binary keys/values and then use ETS as a write-through cache.

Always write to the DB first and then the cache before returning from the write function. Always read from the cache.
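For reference, the backing table can be as simple as two binary columns with the key as primary key. A sketch of an Ecto migration (module and table names are my own, not from the thread):

```elixir
defmodule MyApp.Repo.Migrations.CreateSettings do
  use Ecto.Migration

  def change do
    create table(:settings, primary_key: false) do
      # Binary key/value; encode arbitrary terms with :erlang.term_to_binary/1
      add :key, :binary, primary_key: true
      add :value, :binary, null: false
    end
  end
end
```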

def put(key, value) do
  # Upsert so repeated writes to the same key update the value
  # (assumes a unique index on :key)
  Repo.insert!(%Row{key: key, value: value},
    on_conflict: {:replace, [:value]}, conflict_target: :key)
  :ets.insert(@table, {key, value})
end

def get(key) do
  case :ets.lookup(@table, key) do
    [{^key, value}] -> value
    [] -> nil
  end
end

Load the rows from the DB into the cache at startup. If you want to store arbitrary terms, just encode them with :erlang.term_to_binary/1.
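A quick sketch of that round trip (the map is just an example value):

```elixir
value = %{retries: 3, enabled: true}

# Serialize any Erlang/Elixir term to a binary suitable for a bytea column
bin = :erlang.term_to_binary(value)

# ...and back; use :erlang.binary_to_term(bin, [:safe]) for untrusted input
^value = :erlang.binary_to_term(bin)
```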

Edit: the put() function above is only correct for a single writer, meaning it cannot be used from multiple processes without synchronization. See @Asd’s more thorough answer below for an example that uses a GenServer as a single writer to serialize writes.

4 Likes

Given the number of times “disk” is mentioned, I’m pretty sure it’s “disk ets”. Distributed ets sure would be nice, though.

I’d do it like this if you want it backed by a Postgres table:

defmodule Table do
  @moduledoc "ETS table backed by a Postgres table (assumes an Ecto schema `Row` with :key and :value fields)"

  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def init(_opts) do
    table = :ets.new(:table_name, [:protected, :set, :named_table])
    entries = for %{key: k, value: v} <- Repo.all(Row), do: {k, v}
    :ets.insert(table, entries)
    {:ok, %{table: table}}
  end

  def handle_call({:write, key, value}, _from, %{table: table} = state) do
    # Upsert so repeated writes to the same key update the value
    Repo.insert!(%Row{key: key, value: value},
      on_conflict: {:replace, [:value]}, conflict_target: :key)
    :ets.insert(table, {key, value})
    {:reply, :ok, state}
  end

  # Reads bypass the GenServer and hit ETS directly
  def read(key) do
    case :ets.lookup(:table_name, key) do
      [{_, value}] -> {:ok, value}
      _ -> :error
    end
  end

  # Writes are serialized through the GenServer process
  def write(key, value) do
    GenServer.call(__MODULE__, {:write, key, value})
  end
end

But if you’re okay with just a file on disk, I’d consider using DETS.


However, I am going to release a much more performant persistent LSM KV DB in the upcoming months, so I will reply here again once it’s ready.

3 Likes

This code is very wrong and buggy.

Consider two processes calling the put function at the same time. If one does put("key", 1) and other does put("key", 2), it is possible that the order of operations would be

Repo.insert! %Row{key: "key", value: 1}
Repo.insert! %Row{key: "key", value: 2}
:ets.insert(@table, {"key", 2})
:ets.insert(@table, {"key", 1})

The Postgres database would then hold 2 while ETS would hold 1. Wrapping it in a GenServer (or any other locking mechanism) is a must.

1 Like

You are 100% correct, the OP mentioned settings so I assumed a single writer without even thinking about it.

2 Likes

Yeah, you’re right: if it’s a single writer, then your code is correct.

Why is it always the same people in KV DB topics :smile: . Happy new year btw!

4 Likes

There are dozens of us! I’m really a relational model enjoyer though, KV is just a means to an end :slight_smile:

I’ll edit my post. Happy new year!

3 Likes

Ah yes, of course! I don’t know why my brain maps the letter d to distributed

1 Like

Aha! Yes, a locking mechanism is a must. In Go I used a mutex to lock during a write; that is what I wanted to achieve in Elixir, but I’m not quite there yet. So a GenServer is the best path then?

I’m not against using a file, but I’m quite biased towards using a DB table. Not for any deep philosophical reason, just force of habit, I guess. In the root post I did mention that the settings are backed up together with a database backup; that may be a form of justification.

It should be the case that there is only one writer, but I always prefer to play it safe just in case.

Of course you must trust your own application code, so if you legitimately know that there is only ever going to be a single process that changes settings then you don’t need to worry about it. The purpose of the GenServer is to ensure that there is only going to be a single process that changes settings, i.e. the correctness is tautological.

There are other techniques that could improve concurrency, like per-row locks, but I doubt that would be necessary in your case. Better to keep it simple.
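For the curious, a per-row lock with Ecto and Postgres could look roughly like this, wrapped in a transaction (a sketch only; the `Row` schema and the `key`/`new_value` variables are assumptions):

```elixir
import Ecto.Query

# SELECT ... FOR UPDATE blocks concurrent writers to the same row
# until the transaction commits, serializing updates per key.
Repo.transaction(fn ->
  Row
  |> where([r], r.key == ^key)
  |> lock("FOR UPDATE")
  |> Repo.one!()
  |> Ecto.Changeset.change(value: new_value)
  |> Repo.update!()
end)
```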

To be honest, it all sounds like an XY problem. Changing configuration and settings at runtime is a strange approach, and making those changes persistent is even stranger. Another issue is that such configuration may be read only once and then cached or stored in process state, so any change to the settings could end up applied partially and inconsistently.

So, what are these settings exactly? How do you use them?

2 Likes

Yeah, you beat me to it. We were never told the concrete problem that made the OP think they need a simple persistent key/value store for runtime-changeable configuration.