I’m trying to learn more about using processes to store state (e.g. following this page), and I’m wondering how one might implement a TTL for a value. For example, if I wanted to cache a web request for a short period of time specified at runtime, what strategies could be employed? Sorry, I know this is kind of open ended…
One common pattern to implement TTL is the following: the process that maintains the state (usually a GenServer), upon setting some data in the state, also sends a delayed message to itself with Process.send_after/4, using a delay equal to the TTL. The delayed message contains some reference to the data to expire (for example the key, if it is some sort of key/value cache). Upon receiving the message, the GenServer clears the expired data.
Another common pattern is to invalidate the cache lazily: the cached data would contain a timestamp, and upon retrieval the process would check if the cached value is stale, and if so evict it and consider it a cache miss. This would mean that a value could stay in cache forever if it’s never requested, so often one would implement some sort of fixed-capacity cache (like an LRU cache), or periodically clean up the expired values by enumerating or sampling the entries.
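The lazy strategy described above can be sketched roughly as follows. This is only an illustration under some assumptions: the module name LazyCache is made up, expiry is tracked with the monotonic clock in milliseconds, and there is no capacity limit or periodic sweep, so entries that are never read stay in memory.

```elixir
defmodule LazyCache do
  # Hypothetical module name; a sketch of lazy (read-time) expiration.
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, %{}, opts)
  end

  def put(pid, key, value, ttl \\ 10_000) do
    GenServer.call(pid, {:put, key, value, ttl})
  end

  def get(pid, key) do
    GenServer.call(pid, {:get, key})
  end

  # GenServer callbacks

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:put, key, value, ttl}, _from, state) do
    # Store the absolute expiry time next to the value.
    expires_at = System.monotonic_time(:millisecond) + ttl
    {:reply, :ok, Map.put(state, key, {value, expires_at})}
  end

  def handle_call({:get, key}, _from, state) do
    now = System.monotonic_time(:millisecond)

    case Map.fetch(state, key) do
      {:ok, {value, expires_at}} when expires_at > now ->
        {:reply, value, state}

      {:ok, _stale} ->
        # Stale entry: evict it and treat the lookup as a cache miss.
        {:reply, nil, Map.delete(state, key)}

      :error ->
        {:reply, nil, state}
    end
  end
end
```

No timers are involved here: the only cost is a timestamp comparison on each read, at the price of stale entries lingering until they are requested again.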
Here is a minimal example of the first strategy. It does not handle situations like setting the same key more than once, but it should give you some inspiration:
defmodule Cache do
  use GenServer

  def start_link() do
    GenServer.start_link(__MODULE__, [], [])
  end

  def put(pid, key, value, ttl \\ 10_000) do
    GenServer.call(pid, {:put, key, value, ttl})
  end

  def get(pid, key) do
    GenServer.call(pid, {:get, key})
  end

  # GenServer callbacks

  def init(_) do
    state = %{}
    {:ok, state}
  end

  def handle_call({:put, key, value, ttl}, _from, state) do
    Process.send_after(self(), {:expire, key}, ttl)
    {:reply, :ok, Map.put(state, key, value)}
  end

  def handle_call({:get, key}, _from, state) do
    {:reply, Map.get(state, key), state}
  end

  def handle_info({:expire, key}, state) do
    {:noreply, Map.delete(state, key)}
  end
end
You can verify in the console that the TTL works:
{:ok, pid} = Cache.start_link()
# Store something in the cache
Cache.put(pid, :foo, 123)
#=> :ok
# Get the stored value:
Cache.get(pid, :foo)
#=> 123
# Wait more than 10 seconds, then try again:
Cache.get(pid, :foo)
#=> nil
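As noted, the minimal example does not cope with the same key being set more than once: the timer from the first put would still fire and delete the newer value early. One possible way to handle that, sketched here with a hypothetical TimerCache module, is to keep the timer reference and a unique token alongside each value, cancel the old timer on every put, and ignore expiry messages whose token no longer matches:

```elixir
defmodule TimerCache do
  # Hypothetical module name; a sketch, not a production cache.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)
  def put(pid, key, value, ttl \\ 10_000), do: GenServer.call(pid, {:put, key, value, ttl})
  def get(pid, key), do: GenServer.call(pid, {:get, key})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:put, key, value, ttl}, _from, state) do
    # Cancel the timer of any previous entry stored under this key.
    case state do
      %{^key => {_old_value, _old_token, old_timer}} -> Process.cancel_timer(old_timer)
      _ -> :ok
    end

    # The token identifies this particular write of the key.
    token = make_ref()
    timer = Process.send_after(self(), {:expire, key, token}, ttl)
    {:reply, :ok, Map.put(state, key, {value, token, timer})}
  end

  def handle_call({:get, key}, _from, state) do
    case state do
      %{^key => {value, _token, _timer}} -> {:reply, value, state}
      _ -> {:reply, nil, state}
    end
  end

  @impl true
  def handle_info({:expire, key, token}, state) do
    case state do
      # Only delete if the message belongs to the entry currently stored.
      %{^key => {_value, ^token, _timer}} -> {:noreply, Map.delete(state, key)}
      _ -> {:noreply, state}
    end
  end
end
```

The token check covers the case where the old timer has already fired and its message is sitting in the mailbox when the key is overwritten, which Process.cancel_timer/1 alone cannot undo.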
That’s really helpful, thank you!
Is it idiomatic to use Process.register/2 to name the pid? If so, where and how does that get referenced? If you were to put this Cache module into your app’s supervision tree, then when the app starts up, it calls the start_link function, right? But what does it do with the pid returned? In that case, you wouldn’t be able to use the cache because you wouldn’t know the pid; you would have to manually call start_link. Is that correct?
If the GenServer is started “statically” as part of the supervision tree, it is often convenient to give it a name. In my example above, we could change Cache.start_link to set a name for the GenServer:
def start_link(options \\ []) do
  {name, options} = Keyword.pop(options, :name, __MODULE__)
  GenServer.start_link(__MODULE__, options, name: name)
end
Here I am forwarding the options as the second argument of GenServer.start_link/3 after extracting the name, so they will be passed to init: that’s not relevant for our case, but it is a common way to pass other options to the GenServer process.
Defining start_link to take a single argument is useful, because in the Supervisor we can specify children as tuples of {module, arg}, and the Supervisor will start them by calling module.start_link(arg) (through the default child_spec/1 that use GenServer defines). So we can add this to the list of children:
children = [
  # Here we specify children as tuple of {module, argument_of_start_link}
  {Cache, []}
]
And we could then refer to it by name (which in start_link we set by default to the module name, if not explicitly passed):
Cache.get(Cache, :foo)
Or, if we prefer to assign a different name to it (for example if we want to start more than one process from the same module):
# Supervisor:
children = [
  # ...
  {Cache, name: :my_cache}
]

# usage:
Cache.get(:my_cache, :foo)
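One detail to watch when starting several processes from the same module under one Supervisor: the default child spec uses the module name as the child id, so two entries for the same module would clash on that id. A rough sketch of one way around this, using Supervisor.child_spec/2 to override the id (the NamedCache module and the :cache_a / :cache_b names are made up for illustration, and TTL handling is left out to keep it short):

```elixir
defmodule NamedCache do
  # Hypothetical stripped-down cache, named the same way as in the thread.
  use GenServer

  def start_link(options \\ []) do
    {name, options} = Keyword.pop(options, :name, __MODULE__)
    GenServer.start_link(__MODULE__, options, name: name)
  end

  def put(server, key, value), do: GenServer.call(server, {:put, key, value})
  def get(server, key), do: GenServer.call(server, {:get, key})

  @impl true
  def init(_options), do: {:ok, %{}}

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    {:reply, :ok, Map.put(state, key, value)}
  end

  def handle_call({:get, key}, _from, state) do
    {:reply, Map.get(state, key), state}
  end
end

children = [
  # Each child needs a unique id within its supervisor.
  Supervisor.child_spec({NamedCache, name: :cache_a}, id: :cache_a),
  Supervisor.child_spec({NamedCache, name: :cache_b}, id: :cache_b)
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

NamedCache.put(:cache_a, :foo, 123)
#=> :ok
NamedCache.get(:cache_a, :foo)
#=> 123
NamedCache.get(:cache_b, :foo)
#=> nil
```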
That makes very good sense. Thank you again for explaining this so thoroughly!
Along with the start_link alternative Luca presented, if you need more than one running copy of a process within a single BEAM node that you can still reference from elsewhere “by name”, take a look at stdlib’s Registry and via-tuples.
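As a rough illustration of that idea, here is a sketch in which each cache process registers itself under an arbitrary term (a string here) in a Registry, and callers address it through a via-tuple. The names CacheRegistry and RegisteredCache are assumptions made for this example:

```elixir
defmodule RegisteredCache do
  # Hypothetical module; CacheRegistry is an assumed registry name.
  use GenServer

  @registry CacheRegistry

  def start_link(id), do: GenServer.start_link(__MODULE__, %{}, name: via(id))
  def put(id, key, value), do: GenServer.call(via(id), {:put, key, value})
  def get(id, key), do: GenServer.call(via(id), {:get, key})

  # The via-tuple tells GenServer to resolve the process through the Registry.
  defp via(id), do: {:via, Registry, {@registry, id}}

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    {:reply, :ok, Map.put(state, key, value)}
  end

  def handle_call({:get, key}, _from, state) do
    {:reply, Map.get(state, key), state}
  end
end

# The registry must be running before any cache registers in it.
{:ok, _registry} = Registry.start_link(keys: :unique, name: CacheRegistry)

{:ok, _users} = RegisteredCache.start_link("users")
{:ok, _sessions} = RegisteredCache.start_link("sessions")

RegisteredCache.put("users", :foo, 123)
#=> :ok
RegisteredCache.get("users", :foo)
#=> 123
RegisteredCache.get("sessions", :foo)
#=> nil
```

Unlike names registered with Process.register/2, Registry keys do not have to be atoms, which makes this approach a better fit when names are created at runtime.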
From there, another refinement of Luca’s later suggestions would be to brush up on child specifications and child_spec/1, which give you finer control over the relationship between “how I supervise this process” and “how it receives arguments”. You’re not limited to a single catch-all argument if you don’t want to be or if it doesn’t make sense for your purposes. This technique translates to many OTP process types and isn’t limited to just GenServers, either.
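To make that concrete, here is a small sketch of overriding child_spec/1 so that the single term given to the supervisor is unpacked into a two-argument start_link/2. The module name MultiArgCache, the default_ttl argument, and the :short_cache / :long_cache names are all invented for the example:

```elixir
defmodule MultiArgCache do
  # Hypothetical module showing a custom child_spec/1.
  use GenServer

  # Overrides the child_spec/1 generated by `use GenServer`: the supervisor
  # hands us one term, and we decide how it maps onto start_link/2.
  def child_spec({name, default_ttl}) do
    %{
      id: {__MODULE__, name},
      start: {__MODULE__, :start_link, [name, default_ttl]},
      restart: :permanent,
      type: :worker
    }
  end

  def start_link(name, default_ttl) do
    GenServer.start_link(__MODULE__, default_ttl, name: name)
  end

  def default_ttl(server), do: GenServer.call(server, :default_ttl)

  @impl true
  def init(default_ttl), do: {:ok, %{default_ttl: default_ttl, entries: %{}}}

  @impl true
  def handle_call(:default_ttl, _from, state) do
    {:reply, state.default_ttl, state}
  end
end

children = [
  {MultiArgCache, {:short_cache, 1_000}},
  {MultiArgCache, {:long_cache, 60_000}}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

MultiArgCache.default_ttl(:short_cache)
#=> 1000
MultiArgCache.default_ttl(:long_cache)
#=> 60000
```

Because the id is derived from the name, both children can live under the same supervisor without further wrapping.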
Everything you described should be doable with Cachex. It also has a TTL implementation. You don’t have to reinvent it unless it’s a learning project.