Best way to warm up cache

fireproofsocks · April 10, 2019, 11:37pm

I am using Cachex to store several types of data for fast retrieval, but all of it is also in the database. I have experimented with adding a task to application.ex that reads the database and populates the ETS table (i.e. Cachex) with current sessions. It works… but during deployments, if the database isn’t available when the app starts, it throws so many errors that I ended up taking it out.

Is that the wrong approach? My auth layer looks ONLY to the ETS cache, NOT to the database (for performance reasons), so it would be a problem if the app had to be restarted for any reason – everybody would get logged out! Is there a better way to warm up cache that isn’t so brittle?

I have something like this in my application.ex:

defmodule Auth.Application do
  
  alias Auth.Contexts.SessionContext

  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [
      worker(Cachex, [:sessions_cache, []], id: :sessions_cache),
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Auth.Supervisor]
    ret = Supervisor.start_link(children, opts)

    # Trying this as a task outside of the supervision tree...
    warmup_task = Task.async(fn -> SessionContext.warm_up_cache() end)
    Task.await(warmup_task)

    ret
  end
end

What are alternative approaches? Better ways?

axelson · April 11, 2019, 12:05am

I’m going to address only part of the question:

In my opinion, looking ONLY at the cache is the wrong approach the majority of the time. Instead I would use (and do use) Cachex.fetch/4. It lets you get the data from the cache in the typical case, but if that specific entry is not in the cache it will hit the database (and the database will be queried only once, even if multiple processes request the same key at the same time!).

It would look something like this:

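def user_logged_in?(user_id) do
  # The fallback function receives the cache key, not the user id
  Cachex.fetch(:my_app_cache, "user_logged_in_#{user_id}", fn _key ->
    # Make database call here; the returned value is what gets cached
  end)
end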

Then if you call user_logged_in?/1 a second time, you will get the cached value without anything hitting the database.
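To make the fallback idea concrete without pulling in a dependency, here is a minimal read-through sketch on top of plain ETS. The module name, table name, and `loader` argument are made up for illustration, and the return tuples only imitate the hit/miss shapes that Cachex.fetch is understood to return. Unlike Cachex, this sketch does not coalesce concurrent loads of the same key.

```elixir
defmodule ReadThroughSketch do
  # Hypothetical module: a bare-bones read-through cache backed by ETS.
  @table :read_through_sketch

  def init do
    :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    :ok
  end

  # Returns {:ok, value} on a cache hit and {:commit, value} after a miss
  # has been loaded (e.g. from the database) and stored.
  def fetch(key, loader) when is_function(loader, 1) do
    case :ets.lookup(@table, key) do
      [{^key, value}] ->
        {:ok, value}

      [] ->
        value = loader.(key)
        :ets.insert(@table, {key, value})
        {:commit, value}
    end
  end
end
```

The first call for a key runs the loader; later calls for the same key are served from ETS until the entry is deleted.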

Harrisonl · April 11, 2019, 12:21am

Just to further touch on the above point: if you ONLY look at the cache for your session data, then once you need to scale your application to two nodes, users will appear logged in or out depending on which node their connection goes through. To get around this you would need to implement a distributed cache across your nodes, adding another level of complexity. So I agree with @axelson: having a “cache miss” strategy in place that falls back to the DB is the right way to go. This means that your cache can have a short TTL (say 1-5 minutes), so that if a user connects to multiple servers and is logged out on server A, they will stay logged in on server B for at most 5 minutes (but you would still probably want some sort of cache invalidation in place).
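As a rough illustration of why a short TTL bounds staleness, the sketch below stores an expiry timestamp next to each value and treats expired entries as misses. It uses plain ETS rather than Cachex, and every name in it is hypothetical; with Cachex you would rely on its own expiration options instead.

```elixir
defmodule TtlSketch do
  # Hypothetical module: ETS entries that expire after a per-entry TTL.
  @table :ttl_sketch

  def init do
    :ets.new(@table, [:named_table, :public, :set])
    :ok
  end

  def put(key, value, ttl_ms) do
    expires_at = System.monotonic_time(:millisecond) + ttl_ms
    :ets.insert(@table, {key, value, expires_at})
    :ok
  end

  # Returns {:ok, value} while the entry is fresh, :miss otherwise.
  def get(key) do
    now = System.monotonic_time(:millisecond)

    case :ets.lookup(@table, key) do
      [{^key, value, expires_at}] when expires_at > now ->
        {:ok, value}

      [_expired] ->
        # Lazily evict the stale entry so the caller falls back to the database.
        :ets.delete(@table, key)
        :miss

      [] ->
        :miss
    end
  end
end
```

With a 5 minute TTL, a session deleted in the database can survive on another node for at most 5 minutes before a read misses and goes back to the database.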

In regards to warming your cache, what you can do is have a simple GenServer which starts with your application and then populates your cache after a certain amount of time.

defmodule CacheWarmer do
  use GenServer

  alias Auth.Contexts.SessionContext

  # Accepts (and ignores) an argument so the module can be listed directly as a supervisor child.
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  @impl true
  def init(_) do
    start_timer()
    {:ok, nil}
  end

  @impl true
  def handle_info(:warm, _state) do
    start_timer() # Restart the timer again if you want this to continuously warm the cache.
    SessionContext.warm_up_cache()
    {:noreply, nil}
  end

  defp start_timer, do: Process.send_after(self(), :warm, 1000 * 60 * 5) # 5 minutes
end
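A variation on the warmer above, sketched under the assumption that the warm-up function raises when the database is unreachable: it warms right after startup without blocking the supervisor, and retries with a doubling delay instead of crashing. The module name and the `:warm_fun`, `:initial_delay`, and `:max_delay` options are invented for this example.

```elixir
defmodule RetryingWarmer do
  use GenServer

  # Hypothetical options: :warm_fun (required, zero-arity function),
  # :initial_delay and :max_delay in milliseconds.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  @impl true
  def init(opts) do
    initial = Keyword.get(opts, :initial_delay, 1_000)

    state = %{
      warm_fun: Keyword.fetch!(opts, :warm_fun),
      initial_delay: initial,
      max_delay: Keyword.get(opts, :max_delay, 60_000),
      delay: initial
    }

    # Return right away so application start is not blocked; warm afterwards.
    {:ok, state, {:continue, :warm}}
  end

  @impl true
  def handle_continue(:warm, state), do: {:noreply, attempt(state)}

  @impl true
  def handle_info(:warm, state), do: {:noreply, attempt(state)}

  # Runs the warm-up; on an exception, schedules a retry and doubles the delay.
  defp attempt(state) do
    state.warm_fun.()
    %{state | delay: state.initial_delay}
  rescue
    _error ->
      Process.send_after(self(), :warm, state.delay)
      %{state | delay: min(state.delay * 2, state.max_delay)}
  end
end
```

It could be added to the children list after the Cachex worker, for example as `{RetryingWarmer, warm_fun: &SessionContext.warm_up_cache/0}`. Only raised exceptions are rescued here; exits from a crashed connection process would still take the warmer down and leave the restart to the supervisor.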

You can also have this GenServer act as a janitor process: every 20 seconds or so, check the database for expired sessions and evict them from the cache.

Let me know if you have any other questions! I’d be happy to help out

fireproofsocks · April 11, 2019, 5:02am

Thank you @axelson and @Harrisonl – for some reason I thought Elixir was pooling the cache across nodes differently than traditionally load-balanced servers. I haven’t delved into that part of Elixir yet, so I don’t understand its capabilities wrt talking between nodes.

I can see the logic in using the fallback approach – I built that out as a behavior, but I hadn’t planned on using it for anything more than the low-traffic admin portal where security is a greater concern than speed.

Currently, I’m only caching the HITS, so misses could potentially leak into more database lookups. The scenario I was worried about was a malicious user trying to enumerate session tokens, which would force a lot of database lookups. The more I think about it, however, instead of just falling back to the database and coming up with an empty “there is no session identified by that token”, the behavior should instead be to STORE the miss in cache (e.g. as a boolean false), just so that any subsequent requests made with that session token would hit only the ETS cache. This is where handing things over to an application-level firewall might make sense (if you can recommend any, please feel free to share).
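The “store the miss” idea could look roughly like the sketch below, which caches `false` for tokens the database does not know about. It is written against plain ETS so it stands alone; the module name and the `lookup` argument (a stand-in for the database query, returning a session or nil) are assumptions for illustration.

```elixir
defmodule NegativeCacheSketch do
  # Hypothetical module: caches both found sessions and known misses.
  @table :negative_cache_sketch

  def init do
    :ets.new(@table, [:named_table, :public, :set])
    :ok
  end

  # Returns the session for `token`, or nil when there is none.
  def session(token, lookup) when is_function(lookup, 1) do
    case :ets.lookup(@table, token) do
      [{^token, false}] ->
        # Cached miss: answer without touching the database.
        nil

      [{^token, session}] ->
        session

      [] ->
        session = lookup.(token)
        # Store `false` as the marker for "no such session".
        :ets.insert(@table, {token, session || false})
        session
    end
  end
end
```

One caveat: under an enumeration attempt every bogus token adds an entry, so cached misses need a short expiry or a size limit, otherwise the table grows without bound.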

I know this is probably splitting hairs on implementation details, but I’d rather be prepared for this to scale than discover that there’s a problem when traffic surges.

LostKobrakai · April 11, 2019, 7:34am

If you’re worried about enumeration attempts maybe the more sane approach would be to implement rate limiting instead of making a simple cache more complex than it needs to be.

axelson · April 11, 2019, 7:40am

I haven’t used it but Cachex does support distributed caches:
https://hexdocs.pm/cachex/distributed-caches.html#content

OvermindDL1 · April 11, 2019, 2:41pm

Of note, Cachex has a distributed mode.

Cachex also has this.

Cachex has a distributed mode that you can set up with multiple synchronization methods.

I love this fallback approach that Cachex uses, cannot recommend it enough!

Storing misses is also a good thing, I do that (and I’m very eager about clearing keys in my cache on certain actions that even ‘might’ cause an update).

However, if you are worried about someone enumerating things, then you really should both use a UUID and throttle any ‘busy’ connection like that. For example, a plug could store each IP in a cache or ETS table (even per single node is fine), append the access time to a list kept in that IP’s entry, prune the ‘too old’ times from the list, and then sleep for a time related to the length of that list, perhaps exponentially, as that would catch siege bots really fast but humans wouldn’t notice.
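The throttling scheme described above might be sketched like this. It keeps a list of recent access times per IP in ETS, prunes old ones, and returns a delay that grows exponentially once a few free requests are used up. The module name, window, thresholds, and delay formula are all made-up values, and the read-then-write on the table is not atomic, which is probably acceptable for a best-effort throttle.

```elixir
defmodule ThrottleSketch do
  # Hypothetical module and tuning values, for illustration only.
  @table :throttle_sketch
  @window_ms 10_000
  @free_hits 5
  @max_delay_ms 5_000

  def init do
    :ets.new(@table, [:named_table, :public, :set])
    :ok
  end

  # Records a hit for `ip` and returns how many milliseconds to sleep.
  def hit(ip, now \\ System.monotonic_time(:millisecond)) do
    recent =
      case :ets.lookup(@table, ip) do
        [{^ip, times}] -> Enum.filter(times, &(now - &1 < @window_ms))
        [] -> []
      end

    times = [now | recent]
    :ets.insert(@table, {ip, times})
    delay_for(length(times))
  end

  # No delay for the first few hits in the window, then 20ms, 40ms, 80ms...
  def delay_for(count) when count <= @free_hits, do: 0

  def delay_for(count) do
    exponent = min(count - @free_hits, 20)
    min(round(:math.pow(2, exponent)) * 10, @max_delay_ms)
  end
end
```

A plug would call `ThrottleSketch.hit/1` with the client IP and pass the result to `Process.sleep/1` before continuing; that wiring is left out here to keep the sketch dependency-free.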

Yep this!

Cachex is pretty amazing. Every cache related feature I’ve needed it has already had. ^.^