Code and patterns for an article about a KV store engine with logs, in Elixir

alvises · November 23, 2018, 6:19pm

Hi everyone,

For a while now I have wanted to write a series of articles on how to implement a working (and simple!) KV store engine using logs and Elixir. I’m writing the first article, which is an intro to the different concepts, and I was thinking of writing a first super simple implementation with just one writer (with one log file), one index and a reader.

Since the article is becoming long, I wanted to keep the code as simple as possible to focus on the storage engine. To make it as easy as possible, at the beginning I was thinking of forcing the name of the process, and in general avoiding the pid argument in the server interface functions, like this:

defmodule LogKV.Index do
  use GenServer

  def start_link([]) do
    GenServer.start_link(__MODULE__, :empty, name: __MODULE__)
  end

  def init(:empty), do: {:ok, %{}}
  
  def update(key, offset, size) do
    GenServer.call(__MODULE__, {:update, key, offset, size})
  end

  def lookup(key) do
    GenServer.call(__MODULE__, {:lookup, key})
  end

  def handle_call({:update, key, offset, size}, _from, index_map) do
    {:reply, :ok, Map.put(index_map, key, {offset, size})}
  end

  def handle_call({:lookup, key}, _from, index_map) do
    {:reply, get_key_offset_size(key, index_map), index_map}
  end

  defp get_key_offset_size(key, index_map) do
    case Map.get(index_map, key) do
      {_offset, _size} = offset_size -> {:ok, offset_size}
      nil -> {:error, :not_found}
    end
  end
end

full code with docs here: https://github.com/alvises/logkv_articles/blob/90a980eccc06d2bb55fa8f491de63916ae75d6f3/lib/logkv/index.ex

As you can see, the update/3 function doesn’t have the pid parameter. This lets me, in the article, focus more on the functionality rather than on processes. I don’t know if this is considered an anti-pattern, but it certainly causes issues with unit testing and running the tests in parallel, since I can’t run multiple processes and can’t specify multiple pids.

I then have another module, the Writer, which uses the Index. Having a simple interface with just one index with a hardcoded name means I don’t have to introduce Registry etc. in the code… the idea is to bring all this in as the implementation improves over the different parts.

Can you please tell me what your opinion is about this? Should I make the code a bit more complicated by introducing Registry and making the code less coupled, or, for the sake of simplicity in the article, can I leave the interface as it is?

If you want to have a better idea of what I’m talking about and take a look at the draft I’m writing, here’s the link: https://www.poeticoding.com/p/31d01f68-f853-48fc-93ad-8c097204af6b/
password to see the page: elixir_forum

Thanks

Alvise

tty · November 23, 2018, 7:00pm

You can always pull the name from sys.config or the application environment, i.e. with Application.get_env/3. By using Application.put_env/4 prior to starting the GenServer you can now have several named processes.
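A minimal sketch of this suggestion, assuming a hypothetical application name `:logkv_sketch` and config key `:index_name` (neither comes from the article's repository; the module name `EnvNamedIndex` is illustrative too). The client functions resolve the registered name from the application environment instead of hardcoding `__MODULE__`:

```elixir
defmodule EnvNamedIndex do
  use GenServer

  # Hypothetical config lookup: falls back to the module name when nothing is set.
  def process_name do
    Application.get_env(:logkv_sketch, :index_name, __MODULE__)
  end

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :empty, name: process_name())
  end

  def update(key, offset, size) do
    GenServer.call(process_name(), {:update, key, offset, size})
  end

  def lookup(key) do
    GenServer.call(process_name(), {:lookup, key})
  end

  @impl true
  def init(:empty), do: {:ok, %{}}

  @impl true
  def handle_call({:update, key, offset, size}, _from, index_map) do
    {:reply, :ok, Map.put(index_map, key, {offset, size})}
  end

  def handle_call({:lookup, key}, _from, index_map) do
    reply =
      case Map.fetch(index_map, key) do
        {:ok, offset_size} -> {:ok, offset_size}
        :error -> {:error, :not_found}
      end

    {:reply, reply, index_map}
  end
end
```

Calling `Application.put_env(:logkv_sketch, :index_name, :index_a)` before `EnvNamedIndex.start_link()` registers the process as `:index_a`. One caveat with this sketch: the client functions read the environment on every call, so they always reach whichever name is configured at that moment.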

alvises · November 23, 2018, 7:24pm

Although this is just for an article, my main concern is that, for the sake of super simple code, I’d be showing how to implement an anti-pattern.

tty · November 23, 2018, 7:37pm

The more common idioms are:

name the process the same as the module if only one such process is expected in a node
the name is provided if several similar processes would run on the same node and you need access to a specific process
have several start_link/start variants to simplify usage in the console and in tests

I’ve seen unique process names pulled from sys.config or a database in production code. That is fine as long as you are consistent about it.
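The idioms listed above could be combined roughly like this: the process is named after the module by default, an optional `:name` can be passed at start, and the client functions accept an optional server argument. `NamedIndex` is a hypothetical module name for illustration, not the article's `LogKV.Index`.

```elixir
defmodule NamedIndex do
  use GenServer

  # Defaults to the module name (single-process case); pass name: to override.
  def start_link(opts \\ []) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.start_link(__MODULE__, :empty, name: name)
  end

  # The server argument is optional, so update/3 and lookup/1 still work
  # against the default process, while tests can target their own process.
  def update(server \\ __MODULE__, key, offset, size) do
    GenServer.call(server, {:update, key, offset, size})
  end

  def lookup(server \\ __MODULE__, key) do
    GenServer.call(server, {:lookup, key})
  end

  @impl true
  def init(:empty), do: {:ok, %{}}

  @impl true
  def handle_call({:update, key, offset, size}, _from, index_map) do
    {:reply, :ok, Map.put(index_map, key, {offset, size})}
  end

  def handle_call({:lookup, key}, _from, index_map) do
    reply =
      case Map.fetch(index_map, key) do
        {:ok, offset_size} -> {:ok, offset_size}
        :error -> {:error, :not_found}
      end

    {:reply, reply, index_map}
  end
end
```

With this shape, article code can keep calling `NamedIndex.update(key, offset, size)`, while a test can start its own instance with `NamedIndex.start_link(name: :test_index)` and call `NamedIndex.lookup(:test_index, key)`, which keeps parallel tests isolated from each other.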

The benefit of Registry is non-atom names.
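As a rough illustration of non-atom names, here is a sketch that registers each index under an arbitrary term through a `:via` tuple. `RegistryIndex` and `IndexRegistry` are hypothetical names, and the registry is assumed to have been started beforehand with `Registry.start_link(keys: :unique, name: IndexRegistry)`.

```elixir
defmodule RegistryIndex do
  use GenServer

  # Hypothetical registry name; it must be started before any index process.
  @registry IndexRegistry

  # id can be any term (a string, a tuple, ...), not only an atom.
  def start_link(id) do
    GenServer.start_link(__MODULE__, :empty, name: via(id))
  end

  def update(id, key, offset, size) do
    GenServer.call(via(id), {:update, key, offset, size})
  end

  def lookup(id, key) do
    GenServer.call(via(id), {:lookup, key})
  end

  defp via(id), do: {:via, Registry, {@registry, id}}

  @impl true
  def init(:empty), do: {:ok, %{}}

  @impl true
  def handle_call({:update, key, offset, size}, _from, index_map) do
    {:reply, :ok, Map.put(index_map, key, {offset, size})}
  end

  def handle_call({:lookup, key}, _from, index_map) do
    reply =
      case Map.fetch(index_map, key) do
        {:ok, offset_size} -> {:ok, offset_size}
        :error -> {:error, :not_found}
      end

    {:reply, reply, index_map}
  end
end
```

Because names are plain terms, processes can be created per log file or per test (for example `RegistryIndex.start_link({"segment", 1})`) without generating atoms at runtime.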

alvises · November 23, 2018, 7:46pm

Thanks a lot! I’m in the first case for sure: only one process is expected, especially considering the code is just there to introduce the concept.