FerricStore - a durable, Redis-compatible cache

Hey everyone,

I’ve been working on something I want to share and get feedback on.

The itch

Every web app I build ends up with the same stack: a database, a cache layer, and the app itself. Three things to deploy, monitor, pay for, and hope they stay connected to each other.

For caching, the options are:

Redis — great protocol, great ecosystem, but in production you need a managed service (ElastiCache, Upstash, etc.) because self-hosted Redis persistence is… optimistic. AOF rewrite can lose your last few seconds. RDB snapshots lose everything since the last dump. And now you’re paying for a whole separate instance just to hold keys.

Java solutions (Hazelcast, Infinispan) — powerful but painful to operate. Config files longer than your app code. JVM tuning. Cluster formation issues. Not exactly “easy to maintain.”

ETS/Mnesia — built into the BEAM, fast, but volatile. Node goes down, data’s gone. Mnesia replication exists but comes with its own set of surprises.

What I actually wanted: use the disk I’m already paying for on my instance, speak a protocol I already know (Redis), get real durability without a separate service, and keep my stack simple.

What I built with AI

[FerricStore](https://github.com/yoavgeva/ferricstore) is a persistent key-value store written in Elixir + Rust that:

Speaks RESP3 — connect with redis-cli, or any Redis client library that supports RESP3. Your existing code mostly just works.

Every write is durable by default — Raft consensus + Bitcask append-only log + fsync. When you get OK back, your data is on disk. Not “eventually.” Not “if the AOF rewrite finishes.” On disk, right now.

Runs embedded in your app — add it as a dependency and call `FerricStore.set("key", "value")`. No separate process, no network hop, no connection pool. Your cache lives in the same BEAM node as your app.

Or runs standalone — `docker run -p 6379:6379 yoavgeva/ferricstore` and connect with redis-cli. Drop-in for apps that already speak Redis.

50+ Redis commands — strings, hashes, lists, sets, sorted sets, TTL, MULTI/EXEC, pub/sub, and more.

Native Elixir commands — CAS (compare-and-swap), distributed locks, rate limiting, FETCH_OR_COMPUTE — things you’d build on top of Redis but get out of the box here (see the sketch after this list).

Probabilistic data structures — Bloom filters, Cuckoo filters, Count-Min Sketch, TopK, HyperLogLog, T-Digest. All built-in.

Vector search — HNSW index for similarity search, built into the storage engine.
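To give a flavour of the native commands, here’s a sketch of a cache-aside call. The function name comes from the list above, but the signature and options are illustrative, so check the docs for the exact API:

```elixir
# Illustrative sketch: the arity and option names are assumptions, not
# the documented API. Accounts.load_user!/1 stands in for your own loader.
{:ok, user} =
  FerricStore.fetch_or_compute(
    "user:123",
    fn -> Accounts.load_user!(123) end,  # runs only on a cache miss
    ttl: :timer.minutes(5)
  )
```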

The architecture in 30 seconds

```
Client (redis-cli / Redix / your app)
              |
              v
RESP3 Parser (pure Elixir, zero-copy)
              |
              v
Command Dispatcher
              |
              v
Raft Consensus (via ra library — same one RabbitMQ uses)
              |
              v
Bitcask Storage Engine (Rust NIF — append-only log, CRC-checked)
              |
              v
ETS Hot Cache (recent values served from memory, cold values read from disk)
```

Writes go through Raft for consistency, then to Bitcask for persistence. Reads hit ETS first (microseconds), fall back to disk for cold data. The Rust NIF handles the low-level I/O — pure functions, no dirty schedulers, proper consume_timeslice yielding so the BEAM scheduler stays happy.
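As a sketch of that read path (illustrative only; the table layout is simplified and the CRC check is omitted):

```elixir
# Illustrative read path: hot cache first, keydir + one disk read on a miss.
defmodule ReadPathSketch do
  def get(key) do
    case :ets.lookup(:hot_cache, key) do
      [{^key, value}] ->
        {:ok, value}  # hot: served from memory in microseconds

      [] ->
        # keydir maps key -> {log_file, offset, size} (Bitcask-style)
        with [{^key, {file, offset, size}}] <- :ets.lookup(:keydir, key),
             {:ok, io} <- :file.open(file, [:read, :raw, :binary]),
             {:ok, value} <- :file.pread(io, offset, size) do
          :ok = :file.close(io)
          :ets.insert(:hot_cache, {key, value})  # warm the hot cache
          {:ok, value}
        else
          [] -> {:error, :not_found}
          other -> other
        end
    end
  end
end
```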

Embedded mode — the thing I’m most excited about



```elixir
# mix.exs
{:ferricstore, "~> 0.1"}
```

```elixir
# your app
FerricStore.set("user:123:session", session_data)
{:ok, data} = FerricStore.get("user:123:session")

# with TTL
FerricStore.set("rate:api:123", "1", ttl: :timer.seconds(60))

# atomic operations
{:ok, new_count} = FerricStore.incr("page_views")

# compare-and-swap (expected "10" -> new "9")
:ok = FerricStore.cas("inventory:sku42", "10", "9")
```


No Redis connection. No connection pool config. No “what happens when Redis is down.” It’s just a function call that persists to disk. Your Phoenix app, your cache, one deployment, one thing to monitor.

Standalone mode — drop-in Redis replacement

If you have apps in other languages, or you just want a Redis-compatible server with real durability:

```bash
# Docker
docker run -p 6379:6379 -v ferricstore_data:/data yoavgeva/ferricstore

# Then use any Redis client
redis-cli SET mykey "hello"
redis-cli GET mykey
```


Or from your Elixir app via Redix:

```elixir
{:ok, conn} = Redix.start_link("redis://localhost:6379")

Redix.command!(conn, ["SET", "user:42", "alice"])
Redix.command!(conn, ["GET", "user:42"])
# => "alice"
```


Everything you’d expect from Redis works — MULTI/EXEC transactions, pub/sub, pipelining, HELLO 3 (RESP3). Plus you get a built-in health endpoint (GET /health on port 6380), Prometheus metrics, ACL authentication, and TLS support.
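For example, assuming the MULTI/EXEC semantics really do match Redis, Redix’s transaction pipeline should work unchanged:

```elixir
# Sketch: Redix wraps these commands in MULTI/EXEC for you.
{:ok, conn} = Redix.start_link("redis://localhost:6379")

{:ok, [count, _queue_len]} =
  Redix.transaction_pipeline(conn, [
    ["INCR", "orders:count"],
    ["LPUSH", "orders:recent", "order:789"]
  ])
```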

The dashboard gives you live visibility into shards, key counts, memory pressure, hit rates, and slow queries — no Grafana setup needed.

What it’s NOT

Not a database replacement — it’s a cache/store. Great for sessions, rate limits, feature flags, job queues, counters, leaderboards. Not for your users table.

Not a sharded cluster (yet) — scales to 3-5 nodes via Raft replication for high availability and read scaling (every node serves reads from local ETS). Storage capacity scales with disk, not RAM — a 500GB NVMe gives you 500GB of cache. But there’s no hash-slot sharding across nodes yet, so every node holds all the data.

Not battle-tested in production — this is v0.1. It passes 8000+ tests including shard-kill recovery and multi-node cluster tests, but it hasn’t seen real production traffic yet. That’s where you come in.

Not going to beat Redis on raw throughput — Redis keeps everything in RAM and doesn’t fsync by default. But it’s a persistent store where every write hits disk; that durability is the trade-off.

Why Elixir + Rust?

Elixir gives us the BEAM’s supervision trees, distribution primitives, and ETS for the hot cache. Rust gives us a memory-safe, zero-copy storage engine that doesn’t GC-pause or mess with the BEAM scheduler.

The NIF boundary is clean — Rust functions are pure and stateless. No Mutex, no shared state, no dirty schedulers. Just `v2_append_batch(path, entries)` and `v2_pread_at(path, offset)`.
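On the Elixir side, that boundary looks roughly like this (simplified sketch; only the two function names above are real, the module and crate names are placeholders):

```elixir
# Simplified sketch of the NIF wrapper; module and crate names are placeholders.
defmodule FerricStore.Native do
  use Rustler, otp_app: :ferricstore, crate: "ferricstore_native"

  # Stubs that Rustler swaps for the Rust implementations at load time.
  def v2_append_batch(_path, _entries), do: :erlang.nif_error(:nif_not_loaded)
  def v2_pread_at(_path, _offset), do: :erlang.nif_error(:nif_not_loaded)
end
```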

Looking for

Feedback on the API, the architecture, the docs — what’s confusing, what’s missing, what would make you try it?

Early adopters willing to try it in a side project or staging environment and report what breaks.

Use case ideas — what would you use a durable, embedded Redis-compatible cache for in your Elixir apps?

Links:

- Hex: https://hex.pm/packages/ferricstore

- Docs: https://hexdocs.pm/ferricstore

- Docker: https://hub.docker.com/r/yoavgeva/ferricstore

- GitHub: https://github.com/yoavgeva/ferricstore

Happy to answer any questions. And yes, the name is a pun — Ferric (iron/Rust) + Store.

7 Likes

Nice! Embedded is an actual selling point for me, though at that point you’d be competing with ETS.

I also expect @Asd to come and review your library top to bottom any moment now. :smile:

1 Like

ETS is the embedded cache every language has its own version of — it’s great for ephemeral, in-process state. But the moment you need TTLs, persistence across restarts, or richer data structures, you’re forced to leave your app and spin up a separate Redis.

Embedded mode is FerricStore’s answer to that — same “just a function call” ergonomics as ETS, but your data survives restarts, you get the same capabilities as Redis, and you never leave your BEAM node. One deployment, one thing to monitor, no network hop. That’s what I was searching for in other cache solutions and couldn’t find.
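To make the contrast concrete (the ETS half is standard; the FerricStore calls are the same ones from the announcement):

```elixir
# ETS: fast and in-process, but gone after a restart.
table = :ets.new(:sessions, [:set, :public])
:ets.insert(table, {"user:123", "session-data"})
# ...BEAM node restarts -> the table and its contents are gone.

# FerricStore embedded: the same call-a-function ergonomics,
# but the write is Raft-committed and fsync'd, so it survives restarts.
FerricStore.set("user:123", "session-data")
{:ok, "session-data"} = FerricStore.get("user:123")
```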

1 Like

I see. I’ve never worked on an app where caches had to survive restarts, because at that point accessing the cache might not be much quicker than just going to the DB. But I’ve heard there are exceptions and that people do actually need this.

I have heard about Bitcask but never checked how quick it is. Thanks for answering.

1 Like

Hi, good library.

It’s a very big project (30k lines of Elixir code, 15k lines of Rust and around 100k lines of tests) implemented in only two weeks according to git history.

I am reading the code and it is… strange.

First of all, it is a cache, but it has strange eviction logic. If you write too many keys (95% of the keydir memory limit), FerricStore enters a keydir_pressure_level = :reject state, where it fails all incoming set calls. Entering this state also triggers eviction, but only asynchronously. And even after that eviction happens, I can’t write new keys until the MemoryGuard process performs another check (which it does every 100ms). I mean, it would make sense to block the writes, but it appears to just return :error and not pass any set through.

Then there’s the Raft, which is strange given that your post says “no network hop”. The problem is that async writes do not check for leadership changes. In the end, async means the cluster can end up in an inconsistent state where one node sees different results from another. Let me explain: say node A performs an async set("x", 1) and node B performs an async set("x", 2). An async write is implemented so that you first write to ETS, then to disk storage, then you fire-and-forget into the Raft cluster. But the fire-and-forget operation may not succeed (because the leader changed, or any other issue; it’s consensus after all). So if this happens to the second write, you end up in a situation where node A sees "x" = 1 and node B sees "x" = 2. That may be fine, but the worst part is that this situation is never repaired. It makes async_write not even eventually consistent; it is just inconsistent, forever. I’d expect async writes to eventually complete if the system is alive.

Then, every time one shard log file collects more than 256MB of data (that value is hardcoded), it does a :persistent_term.put, triggering a global GC. Not a big deal, but a strange decision.

Then, Bitcask compaction is triggered on a schedule. It attempts to compact every 30 seconds, and each attempt performs multiple File.ls calls. I think it would be possible to check whether compaction is needed when rotating the shard log file, instead of polling.

And yeah, the whole idea of using Bitcask here is very strange. How Bitcask works: it just appends new entries to a file and then puts them in a big hash table, like key -> log_file_name_and_offset. That’s it. You use ETS as a hot cache alongside Bitcask. That is strange. Why not just use an append-only file (or files) and one or two ETS tables (or :shards) that store key -> value or key -> log_file_name_and_offset? That’s minus 15k lines of Rust, and no NIF required. In the end, this is an I/O-bound task and there’s no need for a NIF in the first place. If you did the NIF for performance, I’d suggest optimizing the code and reviewing the performance manually. I suspect that Claude performed some flamegraph analysis on hot paths and optimized the code for them (that’s obvious from comments like “I do … in order to save 10 nanoseconds”, which made me laugh :grin:), but that resulted in benchmark cheating, where all the expensive operations were moved to the background (like compaction) and left unoptimized.
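Roughly, I mean something like this (a toy sketch only: single file, no compaction, CRC checks, or rotation):

```elixir
defmodule PlainLog do
  # One append-only log file plus an ETS keydir; no NIF needed.
  def open(path) do
    keydir = :ets.new(:keydir, [:set, :public])
    {:ok, io} = :file.open(path, [:read, :write, :raw, :binary])
    {io, keydir}
  end

  def put({io, keydir}, key, value) do
    entry = :erlang.term_to_binary({key, value})
    {:ok, offset} = :file.position(io, :eof)  # append at the end of the log
    :ok = :file.write(io, entry)
    :ok = :file.sync(io)                      # fsync for durability
    :ets.insert(keydir, {key, {offset, byte_size(entry)}})
    :ok
  end

  def get({io, keydir}, key) do
    case :ets.lookup(keydir, key) do
      [{^key, {offset, size}}] ->
        {:ok, bin} = :file.pread(io, offset, size)  # one positional read
        {_key, value} = :erlang.binary_to_term(bin)
        {:ok, value}

      [] ->
        :error
    end
  end
end
```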

Then, the Sandbox code is spread across the whole project, introducing lots of ifs in every module. I’d suggest just starting separate instances of FerricStore in tests, one per test. That would make the code much simpler.


In the end, congratulations on the initial release! :tada: I don’t think I’ll be using this project in its current state. I’d also suggest not feeding this review into Claude; I think it would be much better if you used it as the starting point for your own review of the code, in order to learn more about databases and caches. I can perform a more detailed review if you want (for a reasonable price, of course).


I’d suggest checking out the marvelous Nebulex. It has no built-in persistence, but you can use any storage as an adapter (or implement your own).

3 Likes

@Asd Amazing feedback, I’ll go over your input! Like I said, it’s version 0.1! AI gives a big boost, but mistakes happen.

As for why I chose Bitcask: it’s the storage engine, and the core idea is the keydir table (ETS) plus the file appender. Every key maps to an exact file offset, so reads from disk are always O(1). It detects corruption on read (CRC), which a plain append log doesn’t give you for free. Since old values stick around in the log, Bitcask periodically rewrites the data to reclaim space, and hint files (a separate, compact file alongside each data file) make startup fast even with large datasets (otherwise the keydir would have to be rebuilt from scratch).

It won’t reject set for existing keys, only for new ones, and based on your input I added a call to MemoryGuard when that happens, thanks! As for holding the set instead of returning an error: I prefer to return an error because I can’t promise the eviction will happen, and it may mean adding memory or deleting keys is required (depending on how it was configured).

Thanks for the input. I think in my excitement I wrote some confusing things (English is not my first language). I meant that in embedded mode you don’t pay a network hop the way you would with Redis or another centralized solution; but in a multi-node cluster you will have Raft traffic between the nodes.

Why can’t you? Usually caches work like this: there is a limit, by memory size or just by number of keys. When you hit the limit, you evict some keys to make space. Sometimes you optimize it to evict more keys than needed (say the limit is 2000 entries, you’re at 1995, you want to add 10, and instead of evicting 5 you evict 200), or you evict in the background, etc. There is no such error as “not being able to fit in the cache”, because a cache can evict data; that’s the core idea of a cache.

1 Like

I agree with you, but that depends on configuration. In Redis, for example, if you didn’t configure LRU eviction, it will also return an ERR when the limit is hit. In a burst that can happen, and I don’t want to hold the response either, because I can’t guarantee the eviction will actually complete. I expect eviction to work correctly 99.99% of the time, but if the user configured something that fights against it, the error can occur, so it’s more of a guard. Also, we’re talking about the keydir: it holds every key plus its value (if the value is hot). I can’t delete a key from it, only drop the value, so if you reach 95% of the keydir limit and all the values are already nil, I can’t evict at all. The keydir represents all keys (cold path and hot path); it’s what lets you serve reads straight from disk (which gives you higher capacity overall) at the cost of slower responses (depending on the disk type).

Thanks for the input. I’ve contributed to databases before in my career; this is my first attempt at an AI project in my limited free time, on something I’ve always wanted to build :slight_smile: . I agree it’s not perfect, and I’m not a native Elixir developer :slight_smile: I come from the Java, C++ and Go world. It’s just code, nothing changed :wink:

Thanks for finding this! I designed a spec exactly for this case (the AI didn’t finish it and I missed that) and wrote tests for it, but forgot to test it in a multi-node cluster (on a single node the issue can’t be seen). Fixed!

Thanks for the comment. I tuned that differently for embedded vs standalone mode (the effects are different).

At the start I focused on the performance of user-facing reads/writes. You’re correct that the background processes need optimization too; I’ll start on that soon. Thanks for the input!

I really liked Ecto’s sandbox (Ruby does something similar with RSpec). Usually when you use an external storage without transactions you get a lot of flaky tests, so I wanted something to make testing easier. I’ll have to rethink the implementation. Thanks for the input!

Did you get Claude to write this? The text seems heavily LLM-generated. Nothing fills me with confidence in a new project like LLM-generated text!

Well, the world is changing; the question is how you use it. I’m bad at marketing, I’m more of a technical person, so I use it for that too :slight_smile: , the future is here :wink:

Part of improving a library and making it suitable for community use is to understand the criticisms with your own brain and then give appropriate instructions to your LLM agent of choice.

Just copy-pasting community feedback into Claude, releasing a fix, and then copy-pasting its response to ElixirForum does not inspire confidence.

I’m not sure what the rush is. You can deliver a bit more slowly but with more community goodwill.

1 Like

My bad. I thought I might find like-minded people who would want to help me take this forward; I guess this is not the place. In the Java/Go communities it’s much more common. Nobody is forcing you to use it or take it forward. I’m a person who likes to push things, and that’s how I work on other projects. I don’t see a problem in using AI to move things forward; if you look around, that’s how new projects all over the world have been built in the last year. How do you think Claude Code ships four features a day? And there are many more examples.

1 Like