Handling/batching concurrent calls for the same cacheable result

Hi all, I hit a performance problem related to caching. I wonder if there’s an easy way to batch concurrent calls (with the same key) for a cacheable result.

To illustrate the problem: I was using Cachex for caching, and to help cache some actions/external network calls, I was using this helper code to wrap an action/function with a cacheable result:

@doc """
  Wrap doing something in cache, with cache_key
  Cache will only cache for succesful result, for error result will not be cached
  Input:
      cache_key: cache_key, can be anything (tuple, map, etc.)
      result_fn: anoymous zero-argument function which return {:ok. any()} or {:error, any()}
      opt:
        ttl: time to live, default to :timer.hours(1)
  Output:
      {:ok, any()} | {:error, any()} based on result_fn
  """
  def wrap_caching(cache_key, result_fn, opt \\ []) do
    case Cachex.get(:my_cache, key) do
      {:ok, nil} ->
        result = result_fn.()

        case result do
          {:ok, data} ->
            Cachex.put(:my_cache, cache_key, data, ttl: opt[:ttl] || :timer.hours(1))

          {:error, _} ->
            :do_nothing
        end

        result

      {:ok, data} ->
        {:ok, data}
    end
  end

So if I make the same external network call in a function, it caches nicely and the external network is called only once, like:

def call_osrm_routing(origin, destination) do
  wrap_caching({:call_osrm_routing, origin, destination}, fn ->
    ... do network call to OSRM routing server
  end)
end

def do_some_logic(origin, destination) do
  # In this function, the actual call to the OSRM routing server happens only once
  call_osrm_routing(origin, destination)
  ... do some work
  call_osrm_routing(origin, destination)
  ... do some work
  call_osrm_routing(origin, destination)
end

However, I hit a problem when calls are concurrent (for example in Absinthe async resolvers): each concurrent call ends up doing the external network call by itself, because when the function is called, the cached result isn’t ready yet. For example:

# These make the external OSRM call three times, because the cached result isn't ready when the second/third invocations run
[
  Task.async(fn -> call_osrm_routing(origin, destination) end),
  Task.async(fn -> call_osrm_routing(origin, destination) end),
  Task.async(fn -> call_osrm_routing(origin, destination) end)
]

How should I solve this problem? Are there libraries/tools that can help?

I was thinking of building a batching GenServer which would intercept the function call and do the work if no identical work is happening, or, if the same work is currently executing, wait until it’s finished (sketch below), but I’d like to know if there’s a better way.
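Roughly what I had in mind (only a sketch; WorkBatcher is a made-up name, and it doesn’t handle a crashing worker task):

defmodule WorkBatcher do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, %{}, Keyword.put_new(opts, :name, __MODULE__))
  end

  # Runs `fun` for `key`, coalescing concurrent callers of the same key.
  def run(key, fun) do
    GenServer.call(__MODULE__, {:run, key, fun}, :infinity)
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:run, key, fun}, from, state) do
    case state do
      # Same work already in flight: queue this caller.
      %{^key => waiters} ->
        {:noreply, Map.put(state, key, [from | waiters])}

      # No work in flight: start it in a task and remember the caller.
      _ ->
        server = self()
        Task.start(fn -> send(server, {:done, key, fun.()}) end)
        {:noreply, Map.put(state, key, [from])}
    end
  end

  @impl true
  def handle_info({:done, key, result}, state) do
    {waiters, state} = Map.pop(state, key, [])
    Enum.each(waiters, &GenServer.reply(&1, result))
    {:noreply, state}
  end
end

Callers would then go through WorkBatcher.run({:call_osrm_routing, origin, destination}, fn -> ... end) instead of hitting the cache helper directly.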

You can solve this by just serializing the Cachex.get calls via a process - the first process will make the actual call and the others will read the cached value. A GenServer per cache/key would do the job.

But maybe Cachex has that feature built in? Seems like ConCache does this by default.
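For reference, the ConCache approach would look roughly like this (a sketch; the child spec values are illustrative, and do_network_call is a placeholder for the OSRM call):

# In the supervision tree:
children = [
  {ConCache, [name: :my_cache, global_ttl: :timer.hours(1), ttl_check_interval: :timer.seconds(1)]}
]

# Concurrent callers for the same key are serialized: the first one runs
# the function, the rest block and then read the stored value.
ConCache.get_or_store(:my_cache, {:call_osrm_routing, origin, destination}, fn ->
  do_network_call(origin, destination)
end)

Note that unlike the original helper, get_or_store stores whatever the function returns, so error tuples would need extra handling if you don’t want them cached.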

Oh yeah, it’s true, ConCache explicitly mentions this functionality in its README. Thanks!


It’s a little buried in the documentation, but Cachex can handle this with Cachex.fetch:
https://hexdocs.pm/cachex/reactive-warming.html#content
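For example, something along these lines (a sketch; do_network_call stands in for the OSRM request, and the :commit/:ignore tags follow the docs quoted below):

Cachex.fetch(:my_cache, {:call_osrm_routing, origin, destination}, fn _key ->
  case do_network_call(origin, destination) do
    # Cache the successful result and return it.
    {:ok, data} -> {:commit, data}
    # Return the error without writing it to the cache.
    {:error, _} = err -> {:ignore, err}
  end
end)

Note that on a miss the call returns {:commit, value} (or {:ignore, value}) rather than {:ok, value}, so callers may need to normalize the result.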


So it seems Cachex does this by default too. The problem is with the original implementation. It should pass the function as the argument to get instead of doing get+put.

@stefanchrobot do you mean Cachex.fetch/4 (Cachex — Cachex v3.4.0)? Cachex.get/3 doesn’t take a function as input.

But even for Cachex.fetch/4, I can’t find documentation saying whether it actually behaves that way (blocking/batching work in progress when there are multiple concurrent calls).

Fetches an entry from a cache, generating a value on cache miss.

If the entry requested is found in the cache, this function will operate in the same way as [`get/3`](https://hexdocs.pm/cachex/Cachex.html#get/3). If the entry is not contained in the cache, the provided fallback function will be executed.

A fallback function is a function used to lazily generate a value to place inside a cache on miss. Consider it a way to achieve the ability to create a read-through cache.

A fallback function should return a Tuple consisting of a `:commit` or `:ignore` tag and a value. If the Tuple is tagged `:commit` the value will be placed into the cache and then returned. If tagged `:ignore` the value will be returned without being written to the cache. If you return a value which does not fit this structure, it will be assumed that you are committing the value.

If a fallback function has an arity of 1, the requested entry key will be passed through to allow for contextual computation. If a function has an arity of 2, the `:provide` option from the global `:fallback` cache option will be provided as the second argument. This is to allow easy state sharing, such as remote clients. If a function has an arity of 0, it will be executed without arguments.

If a cache has been initialized with a default fallback function in the `:fallback` option at cache startup, the third argument to this call becomes optional.

See the link posted by @axelson:

Cachex.fetch(:my_cache, "key", fn _key ->
    # ....
end)

Providing the fallback function will block other callers on the same key, to avoid concurrency issues.

Oh wow, I didn’t realize Cachex has that Courier service, thanks very much. Now I don’t need to switch from Cachex to ConCache.
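For anyone finding this later, here’s roughly how the original helper could be rewritten on top of Cachex.fetch (a sketch, assuming Cachex v3; since I’m not sure fetch accepts a :ttl option in this version, the TTL is applied after a commit via Cachex.expire/3):

def wrap_caching(cache_key, result_fn, opt \\ []) do
  fetch_result =
    Cachex.fetch(:my_cache, cache_key, fn _key ->
      case result_fn.() do
        {:ok, data} -> {:commit, data}
        {:error, _} = err -> {:ignore, err}
      end
    end)

  case fetch_result do
    # Cache hit: the value was already stored.
    {:ok, data} ->
      {:ok, data}

    # Cache miss: the value was just committed, so set its TTL.
    {:commit, data} ->
      Cachex.expire(:my_cache, cache_key, opt[:ttl] || :timer.hours(1))
      {:ok, data}

    # The fallback returned {:ignore, {:error, reason}}: nothing was cached.
    {:ignore, err} ->
      err
  end
end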