How do I build a caching service in Elixir?

siddhant3030 · February 14, 2020, 2:30pm

Suppose I have two tables in my database. I want to build a service in Elixir that picks data from the database and caches each row under a Redis key.

I want to use GenServer and Supervisor to do this.

Can anyone tell me how to go about it?

wanton7 · February 14, 2020, 2:52pm

Do you really need to use Redis? If it is not mandatory, then maybe just use https://github.com/whitfin/cachex instead.

lucaong · February 14, 2020, 2:52pm

Hi @siddhant3030,
I think that more information about your case would make it easier for people to help you with this:

• Why are Supervisor and GenServer a requirement? Is this an exercise/coding challenge, or do you have specific needs with regard to resilience? If it’s about resilience, in order to design the supervision it’s useful to know how you want the system to react to failures (wipe all cached values, keep the cached values, restart the whole system or only specific parts, etc.)
• What is the expected behavior of the caching service? When should values be purged from the cache? What is the cache key? How are values retrieved from the database?
• Is Redis a requirement, or just an idea?

If you just want some starting point, you can have a look at the official Elixir getting started guide, that has a very similar example. The guide walks you through the implementation of a key/value store using various techniques, from Agent to GenServer, Supervisor, ETS, etc. Caching services are usually key/value stores with some logic on top, so that might put you on the right track.
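As a rough illustration of the kind of key/value store the guide builds up to, here is a minimal sketch of an in-memory cache held in a GenServer. The module name `SimpleCache` and its `put/3`, `get/2`, and `delete/2` functions are made up for this example, and it has no expiry or eviction logic, so treat it as a starting point rather than a finished cache.

```elixir
defmodule SimpleCache do
  @moduledoc "Hypothetical in-memory key/value cache backed by a GenServer."
  use GenServer

  # Client API

  def start_link(opts \\ []) do
    # opts may contain :name to register the process
    GenServer.start_link(__MODULE__, %{}, opts)
  end

  def put(server, key, value), do: GenServer.call(server, {:put, key, value})

  def get(server, key), do: GenServer.call(server, {:get, key})

  def delete(server, key), do: GenServer.call(server, {:delete, key})

  # Server callbacks

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    {:reply, :ok, Map.put(state, key, value)}
  end

  def handle_call({:get, key}, _from, state) do
    # Map.fetch/2 returns {:ok, value} or :error, so a cached nil
    # can be told apart from a cache miss.
    {:reply, Map.fetch(state, key), state}
  end

  def handle_call({:delete, key}, _from, state) do
    {:reply, :ok, Map.delete(state, key)}
  end
end
```

After `{:ok, cache} = SimpleCache.start_link()`, calling `SimpleCache.put(cache, "users:1", row)` stores a value and `SimpleCache.get(cache, "users:1")` returns it wrapped in `{:ok, row}`. Because `use GenServer` defines `child_spec/1`, a child entry such as `{SimpleCache, name: SimpleCache}` can be placed in a Supervisor's children list.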

andreaseriksson · February 14, 2020, 3:03pm

As @wanton7 suggests, cachex is fine. But I think it works best as a short-term cache (like memcached). If you really want something longer-lived, an option that is still (fairly) easy to use would be either a new database table where you precalculate data, or a materialized view.

I guess the options are endless.

siddhant3030 · February 14, 2020, 3:05pm

Okay. Wait

lucaong · February 14, 2020, 4:30pm

Before I start, is this an interview task or homework? If so, make sure that the company is ok with you sharing it here. It is quite likely that employees of companies working with Elixir hang out in this forum.

That said, there are different ways to design such a service and split up responsibilities, depending on specific cases and needs, but a possible way to start could be something like:

• A scheduler module, possibly a GenServer, that schedules the cache update task every 10 seconds. Usually, something like that is implemented with a GenServer sending a delayed message to itself with Process.send_after, and reacting to it with handle_info, triggering the task and rescheduling the next message. The actual task logic would be in a different module.
• A Task to perform the cache update: it should select newly inserted rows from the DB, serialize the data, and save it in Redis. How to select for newly inserted rows depends on the data model. It seems from your description that data can only be inserted and not changed or deleted: if so, you might assign auto-incrementing IDs and keep track of the latest ID that was cached, so you can select every record inserted after the last check by selecting for greater IDs.
• The Postgres client, the Redis client, the serialization logic, etc. should probably all be implemented in their own processes, so they can be restarted independently in case of a crash.
• The supervision tree will depend on the specific needs and dependencies between these processes.
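The scheduling and "latest ID" ideas above could be sketched roughly as follows. Everything named here is an assumption for illustration: `CacheRefresher` is a made-up module, and the `:fetch` and `:store` options are injected functions standing in for the real Postgres query and Redis write, which are not shown. For brevity the update runs inline in `handle_info/2` rather than in a separate Task.

```elixir
defmodule CacheRefresher do
  @moduledoc """
  Hypothetical periodic cache updater. `:fetch` and `:store` are injected
  functions standing in for the real database query and Redis write.
  """
  use GenServer

  @default_interval :timer.seconds(10)

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, Keyword.take(opts, [:name]))
  end

  @impl true
  def init(opts) do
    state = %{
      # fetch.(last_id) must return rows (maps with an :id key) whose id > last_id
      fetch: Keyword.fetch!(opts, :fetch),
      # store.(row) writes a single row to the cache
      store: Keyword.fetch!(opts, :store),
      interval: Keyword.get(opts, :interval, @default_interval),
      last_id: 0
    }

    schedule(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:refresh, state) do
    rows = state.fetch.(state.last_id)
    Enum.each(rows, state.store)

    # Remember the highest id seen so the next run only picks up newer rows.
    last_id = Enum.reduce(rows, state.last_id, fn row, acc -> max(row.id, acc) end)

    # Reschedule the next run by sending a delayed message to ourselves.
    schedule(state.interval)
    {:noreply, %{state | last_id: last_id}}
  end

  defp schedule(interval), do: Process.send_after(self(), :refresh, interval)
end
```

Injecting the two functions keeps the timer logic testable without a database: in a real system `fetch` might run a query filtered on `id > last_id`, and `store` might issue a Redis `SET` through whichever client library is in use. Note that this only works when rows are append-only, as assumed above; updates and deletes would need a different change-detection strategy.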

Consider that this is just a guess: designing a real system requires a much better understanding of the task at hand, goals, and constraints than we can have from this short description.

If you start implementing this, and run into specific issues, I am sure that people on the forum will be able to help you with them.

EDIT: this response was based on the message it replies to, before that message was heavily edited and mostly deleted. Without that context, it does not make much sense anymore… What I wrote here is not the way one would design a caching layer.

andreaseriksson · February 14, 2020, 5:42pm

Sounds like it.

lucaong · February 14, 2020, 6:02pm

It might very well be a learning question, I just don’t want people to get in trouble.

siddhant3030 · February 14, 2020, 6:02pm

No. This is part of a project. I want to test a POC for this.

lucaong · February 14, 2020, 6:52pm

Then the best thing would be to describe the project goals more clearly, so people can help you better. What’s the end goal of your POC, or the problem you are trying to solve with it?

I am asking because your description focuses on the what, but doesn’t mention the why, and that makes it difficult to devise a proper solution.

Not knowing what its goal is, the system you describe sounds strange and over-engineered. For example:

• A simple POST API in golang to insert data into your postgres tables.
• Elixir service should pick data in these 2 tables and cache each row in redis keys in every 10 seconds.

Why can’t the golang service write to Redis? That would make the Elixir service unnecessary, simplify the whole system a lot, and the cache would be updated as soon as a write happens.

Alternatively, why should the Elixir service poll the database every 10 seconds, rather than listening to a message queue (RabbitMQ, Kafka, etc.)? That would avoid unnecessary database queries, and make the system scalable, as you would be able to add more workers as needed.