Periodically clearing out values stored in a process? (i.e. enforcing a TTL, e.g. in an Agent)

fireproofsocks · March 26, 2020, 4:10pm

I’m trying to learn more about using processes to store state (e.g. following this page), and I’m wondering how one might implement a TTL for a value. For example, if I wanted to cache a web request for a short period of time specified at runtime, what strategies could be employed? Sorry, I know this is kind of open-ended…

lucaong · March 26, 2020, 4:24pm

One common pattern to implement TTL is the following: the process that maintains the state (usually a GenServer) upon setting some data in the state would also send a delayed message to itself with Process.send_after/4 with a timeout equal to the TTL. The delayed message would contain some reference to the data to expire (for example the key, if it is some sort of key/value cache). Upon receiving the message, the GenServer would clear the expired data.

Another common pattern is to invalidate the cache lazily: the cached data would contain a timestamp, and upon retrieval the process would check if the cached value is stale, and if so evict it and consider it a cache miss. This would mean that a value could stay in cache forever if it’s never requested, so often one would implement some sort of fixed-capacity cache (like an LRU cache), or periodically clean up the expired values by enumerating or sampling the entries.
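A minimal sketch of the second (lazy) strategy, combined with the periodic sweep mentioned above. It assumes a single GenServer holding a plain map. The module name `LazyCache` and the `:sweep_interval` option are hypothetical. Entries are stored as `{value, expires_at}` using `System.monotonic_time(:millisecond)`, so wall-clock adjustments don't affect expiry.

```elixir
defmodule LazyCache do
  use GenServer

  # Client API

  def start_link(opts \\ []) do
    {name, opts} = Keyword.pop(opts, :name, __MODULE__)
    GenServer.start_link(__MODULE__, opts, name: name)
  end

  def put(server, key, value, ttl), do: GenServer.call(server, {:put, key, value, ttl})

  def get(server, key), do: GenServer.call(server, {:get, key})

  # GenServer callbacks

  @impl true
  def init(opts) do
    # :sweep_interval is a hypothetical option: milliseconds between sweeps
    interval = Keyword.get(opts, :sweep_interval, 60_000)
    schedule_sweep(interval)
    {:ok, %{entries: %{}, interval: interval}}
  end

  @impl true
  def handle_call({:put, key, value, ttl}, _from, state) do
    # Store the absolute expiry time next to the value
    entries = Map.put(state.entries, key, {value, now() + ttl})
    {:reply, :ok, %{state | entries: entries}}
  end

  def handle_call({:get, key}, _from, state) do
    case Map.fetch(state.entries, key) do
      {:ok, {value, expires_at}} ->
        if now() < expires_at do
          {:reply, value, state}
        else
          # Stale entry: evict it and report a cache miss
          {:reply, nil, %{state | entries: Map.delete(state.entries, key)}}
        end

      :error ->
        {:reply, nil, state}
    end
  end

  @impl true
  def handle_info(:sweep, state) do
    # Periodic cleanup of entries that expired but were never read again
    t = now()

    entries =
      for {key, {_value, expires_at} = entry} <- state.entries,
          expires_at > t,
          into: %{},
          do: {key, entry}

    schedule_sweep(state.interval)
    {:noreply, %{state | entries: entries}}
  end

  defp schedule_sweep(interval), do: Process.send_after(self(), :sweep, interval)

  # Monotonic time is not affected by wall-clock changes
  defp now, do: System.monotonic_time(:millisecond)
end
```

Because `get/2` checks the timestamp itself, the sweep interval only bounds how long unread, expired entries linger in memory; it does not affect the correctness of reads.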

lucaong · March 26, 2020, 4:51pm

Here is a minimal example of the first strategy. It does not handle situations like setting the same key more than once, but it should give you some inspiration:

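```elixir
defmodule Cache do
  use GenServer

  # Client API

  def start_link() do
    GenServer.start_link(__MODULE__, [], [])
  end

  # Stores `value` under `key`; it is removed after `ttl` milliseconds
  def put(pid, key, value, ttl \\ 10_000) do
    GenServer.call(pid, {:put, key, value, ttl})
  end

  # Returns the cached value, or nil if it is missing or has expired
  def get(pid, key) do
    GenServer.call(pid, {:get, key})
  end

  # GenServer callbacks

  @impl true
  def init(_) do
    state = %{}
    {:ok, state}
  end

  @impl true
  def handle_call({:put, key, value, ttl}, _from, state) do
    # Schedule an {:expire, key} message to ourselves after `ttl` ms.
    # Putting the same key twice leaves the first timer running, so it
    # can delete the newer value early (the limitation mentioned above).
    Process.send_after(self(), {:expire, key}, ttl)
    {:reply, :ok, Map.put(state, key, value)}
  end

  def handle_call({:get, key}, _from, state) do
    {:reply, Map.get(state, key), state}
  end

  @impl true
  def handle_info({:expire, key}, state) do
    {:noreply, Map.delete(state, key)}
  end
end
```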

You can verify in the console that the TTL works:

```elixir
{:ok, pid} = Cache.start_link()

# Store something in the cache
Cache.put(pid, :foo, 123)
#=> :ok

# Get the stored value:
Cache.get(pid, :foo)
#=> 123

# Wait more than 10 seconds, then try again:
Cache.get(pid, :foo)
#=> nil
```
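One way to address the limitation mentioned above (setting the same key more than once) is to tag each write with a unique reference from `make_ref/0` and carry it in the expiry message, so a timer left over from an earlier put is ignored. This is a sketch under that assumption; `RefCache` is a hypothetical name, not part of the original example.

```elixir
defmodule RefCache do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def put(pid, key, value, ttl \\ 10_000), do: GenServer.call(pid, {:put, key, value, ttl})

  def get(pid, key), do: GenServer.call(pid, {:get, key})

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:put, key, value, ttl}, _from, state) do
    # Each write gets its own reference; the expiry message carries it
    ref = make_ref()
    Process.send_after(self(), {:expire, key, ref}, ttl)
    {:reply, :ok, Map.put(state, key, {value, ref})}
  end

  def handle_call({:get, key}, _from, state) do
    case Map.fetch(state, key) do
      {:ok, {value, _ref}} -> {:reply, value, state}
      :error -> {:reply, nil, state}
    end
  end

  @impl true
  def handle_info({:expire, key, ref}, state) do
    # Delete only if the entry still carries this ref, i.e. it was not
    # overwritten by a later put (whose own timer is still pending)
    case Map.fetch(state, key) do
      {:ok, {_value, ^ref}} -> {:noreply, Map.delete(state, key)}
      _ -> {:noreply, state}
    end
  end
end
```

An alternative is to keep the timer reference returned by `Process.send_after/4` and call `Process.cancel_timer/1` on overwrite; the tagging approach also covers the case where the old expiry message is already sitting in the mailbox when the key is overwritten.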

fireproofsocks · March 26, 2020, 5:03pm

That’s really helpful, thank you!

fireproofsocks · March 26, 2020, 5:23pm

Is it idiomatic to use Process.register/2 to name the pid? If so, where and how does that get referenced? If you were to put this Cache module into your app’s supervision tree, then when the app starts up, it calls the start_link function, right? But what does it do with the pid returned? In that case, you wouldn’t be able to use the cache because you wouldn’t know the pid – you would have to manually call start_link. Is that correct?

lucaong · March 26, 2020, 5:34pm

If the GenServer is started “statically” as part of the supervision tree, it is often convenient to give it a name. In my example above, we could change Cache.start_link to set a name for the GenServer:

```elixir
def start_link(options \\ []) do
  {name, options} = Keyword.pop(options, :name, __MODULE__)
  GenServer.start_link(__MODULE__, options, name: name)
end
```

Here I am forwarding the options as the second argument of GenServer.start_link/3 after extracting the name, so they will be passed to init: that’s not relevant for our case, but it is a common way to pass other options to the GenServer process.

Defining start_link to take a single argument is useful, because in the Supervisor we can specify children as tuples of {module, arg}. The Supervisor then calls module.child_spec(arg), and the default child_spec/1 generated by use GenServer starts the child by calling module.start_link(arg). So we can add this to the list of children:

```elixir
children = [
  # Here we specify children as tuples of {module, argument_of_start_link}
  {Cache, []}
]
```

And we could then refer to it by name (which in start_link we set by default to the module name, if not explicitly passed):

```elixir
Cache.get(Cache, :foo)
```

Or, if we prefer to assign a different name to it (for example, if we want to start more than one process from the same module):

```elixir
# Supervisor:
children = [
  # ...
  {Cache, name: :my_cache}
]

# usage:
Cache.get(:my_cache, :foo)
```
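One caveat when starting several instances of the same module under one supervisor: the child_spec/1 generated by `use GenServer` (or `use Agent`) sets each child's id to the module name, and ids must be unique within a supervisor. A sketch using `Supervisor.child_spec/2` to override the id per child; `NamedCache` is a hypothetical Agent-backed stand-in with the same `start_link/1` shape as the Cache above, so the block runs on its own.

```elixir
defmodule NamedCache do
  # Hypothetical stand-in for Cache: start_link/1 takes a keyword list with :name
  use Agent

  def start_link(opts \\ []) do
    {name, _opts} = Keyword.pop(opts, :name, __MODULE__)
    Agent.start_link(fn -> %{} end, name: name)
  end

  def put(name, key, value), do: Agent.update(name, &Map.put(&1, key, value))

  def get(name, key), do: Agent.get(name, &Map.get(&1, key))
end

children = [
  # Without explicit ids both children would get the default id NamedCache,
  # which the supervisor rejects because child ids must be unique
  Supervisor.child_spec({NamedCache, name: :users_cache}, id: :users_cache),
  Supervisor.child_spec({NamedCache, name: :sessions_cache}, id: :sessions_cache)
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)
```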

fireproofsocks · March 26, 2020, 5:57pm

That makes very good sense. Thank you again for explaining this so thoroughly!

shanesveller · March 26, 2020, 6:20pm

Along with the start_link alternative Luca presented, if you need more than one running copy of a process within a single BEAM node that you can still reference from elsewhere “by name”, take a look at the standard library’s Registry and via-tuples.
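A small sketch of the Registry/via-tuple approach. It assumes a registry named `CacheRegistry` (a hypothetical name) and uses Agent processes so the block runs on its own. Any process started with a `name:` option can be registered the same way, including the Cache GenServer above once its start_link forwards `:name`.

```elixir
# Start a Registry with unique keys (normally this would go in the supervision tree)
{:ok, _} = Registry.start_link(keys: :unique, name: CacheRegistry)

# Hypothetical helper that builds a via-tuple for a given cache id
via = fn id -> {:via, Registry, {CacheRegistry, id}} end

# Two independent caches, each reachable by its id rather than its pid
{:ok, _} = Agent.start_link(fn -> %{} end, name: via.(:users))
{:ok, _} = Agent.start_link(fn -> %{} end, name: via.(:sessions))

Agent.update(via.(:users), &Map.put(&1, :foo, 123))
Agent.get(via.(:users), &Map.get(&1, :foo))
#=> 123
```

Unlike registering atoms, the keys here can be arbitrary terms (for example a user id), which avoids creating atoms dynamically.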

From there, another refinement of Luca’s later suggestions would be to brush up on child specifications and child_spec/1, which give you finer control over the relationship between “how I supervise this process” and “how it receives arguments”. You’re not limited to a single catch-all argument if you don’t want to be or if it doesn’t make sense for your purposes. This technique applies to many OTP process types, not just GenServers.
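To illustrate not being limited to a single catch-all argument, here is a sketch of a plain module (hypothetical `TTLStore`) that defines child_spec/1 by hand. The supervisor receives `{name, default_ttl}`, and child_spec/1 maps it onto a two-argument `start_link/2`. Deriving the id from the name also lets several instances share one supervisor.

```elixir
defmodule TTLStore do
  # Hypothetical module: start_link/2 takes two positional arguments
  def start_link(name, default_ttl) do
    Agent.start_link(fn -> %{default_ttl: default_ttl, entries: %{}} end, name: name)
  end

  def default_ttl(name), do: Agent.get(name, & &1.default_ttl)

  # Called by the supervisor for a {TTLStore, arg} child; we decide how
  # `arg` is turned into the start call and what the child id is
  def child_spec({name, default_ttl}) do
    %{
      id: {__MODULE__, name},
      start: {__MODULE__, :start_link, [name, default_ttl]}
    }
  end
end

children = [
  {TTLStore, {:short_lived, 5_000}},
  {TTLStore, {:long_lived, 60_000}}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)
```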

dimitarvp · March 26, 2020, 11:04pm

Everything you described should be doable with Cachex. It also has a TTL implementation. You don’t have to reinvent it unless it’s a learning project.