Data_fetcher - a small library that can ease your fetch-and-cache jobs

qhwa · May 25, 2021, 9:27pm

data_fetcher is a small library that can ease fetch-and-cache jobs for your Elixir projects.

Occasionally we need some data, typically used as configurations, from an external source. Chances are these configurations won’t change too often, so we may want to cache them and refresh the cached data periodically.

We may think these kinds of fetch-and-cache jobs are too simple for a library. Well, I thought that too, until I found myself tackling the same challenges over and over:

to set up the scheduler, either by sending a message to a scheduler process with Process.send_after or by sending messages to the GenServer itself
to handle failures in fetching the data, usually by just restarting the job
to avoid hitting the external source all at once when there are simultaneous requests and the cache expires for all of them at the same time
to serve simultaneous requests while the data is being fetched for the first time
not to block the whole application from booting, as the fetch job may be slow and the data is required by only some of the endpoints
to have decent performance, no matter how big the data is
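
To make the list above concrete, here is a minimal sketch of how a single GenServer could cover most of these points. It is not data_fetcher's actual implementation: the module name FetchAndCacheSketch, the result/2 function, the zero-arity fetcher and the retry-after-one-interval policy are all assumptions made for illustration.

```elixir
defmodule FetchAndCacheSketch do
  # Hypothetical module for illustration only; not data_fetcher's actual code.
  use GenServer

  # opts: :name (atom), :fetcher (zero-arity function), :interval (milliseconds)
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: Keyword.fetch!(opts, :name))
  end

  # Returns {:ok, value}; waits only while the very first fetch is in flight.
  def result(name, timeout \\ 5_000), do: GenServer.call(name, :result, timeout)

  @impl true
  def init(opts) do
    state = %{
      fetcher: Keyword.fetch!(opts, :fetcher),
      interval: Keyword.fetch!(opts, :interval),
      value: :none,
      waiting: []
    }

    # Return at once so application boot is not blocked; fetch right afterwards.
    {:ok, state, {:continue, :fetch}}
  end

  @impl true
  def handle_continue(:fetch, state) do
    start_fetch(state)
    {:noreply, state}
  end

  @impl true
  def handle_call(:result, from, %{value: :none} = state) do
    # Nothing cached yet: park the caller until the first fetch completes.
    {:noreply, %{state | waiting: [from | state.waiting]}}
  end

  def handle_call(:result, _from, %{value: {:ok, _} = cached} = state) do
    {:reply, cached, state}
  end

  @impl true
  def handle_info(:refresh, state) do
    start_fetch(state)
    {:noreply, state}
  end

  def handle_info({:fetched, value}, state) do
    # One fetch serves every parked caller, then the next refresh is scheduled.
    Enum.each(state.waiting, &GenServer.reply(&1, {:ok, value}))
    Process.send_after(self(), :refresh, state.interval)
    {:noreply, %{state | value: {:ok, value}, waiting: []}}
  end

  def handle_info({:DOWN, _ref, :process, _pid, :normal}, state), do: {:noreply, state}

  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state) do
    # The fetch crashed: keep serving the old value and retry after the interval.
    Process.send_after(self(), :refresh, state.interval)
    {:noreply, state}
  end

  # Run the fetch in a separate monitored process so this server stays responsive.
  defp start_fetch(%{fetcher: fetcher}) do
    server = self()
    spawn_monitor(fn -> send(server, {:fetched, fetcher.()}) end)
    :ok
  end
end
```

Because only this one process ever schedules a refresh, simultaneous requests never trigger more than one fetch at a time. The sketch leaves out the last point in the list: every call copies the cached value to the caller, so large data would probably need a different storage strategy.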

So, I made a library for myself called data_fetcher, which solves the above problems. Please check it out if you have the same need.

I am grateful for the help from @wojtekmach, who gave me guidance on the original idea and pointed out the problems in the first “design”, which I think most new developers may run into too when they are not familiar with OTP. I made some mistakes in the first working version and (after reading the related chapters of Elixir in Action) have now rewritten it.

Again, please check it out and any feedback is welcome!

Cheers!

Ref:
Online doc | Github

dimitarvp · May 26, 2021, 8:33am

Cool. I always enjoy libraries that wrap OTP boilerplate. Thanks for this.

As a suggestion, I’d also add a function that can generate a stereotypical child spec, like this:

def child_spec_for_function_and_interval(func, minutes)
    when is_function(func, 1) and is_integer(minutes) and minutes > 0 do
  {
    DataFetcher,
    name: :my_fetcher,
    fetcher: func,
    interval: :timer.minutes(minutes)
  }
end

It’s going to be best if the name is shorter though.

stefan_z · May 26, 2021, 9:04am

Hi @qhwa, lib looks pretty cool

I have one suggestion and one question

Suggestion: Definitely more documentation
Question: How will the cache behave in a distributed environment?

qhwa · May 26, 2021, 9:32am

Aha, I’m not the only one thinking that the helper function is too clumsy! Thanks for the advice. Do you think this would be better?

Supervisor.init([
  {DataFetcher, fetcher_options()}
  # ...
], strategy: :one_for_one)

...
defp fetcher_options,
  do: [
    name: :my_fetcher,
    # my_func/1 stands for your own fetch function
    fetcher: &my_func/1,
    interval: :timer.minutes(20)
  ]
...

qhwa · May 26, 2021, 9:42am

Hi @stefan_z, thanks for the suggestion! I’ll improve the documentation and add more examples.

As for how it works in a distributed environment: currently it uses a local registry, so it sets up a supervision tree on each node. That means the nodes do their jobs separately, each with its own scheduler, worker and cache storage.

This may work for most scenarios, but sometimes you need a singleton job across the cluster. We don’t directly support that via configuration yet, but it should be easy to add, hopefully by just replacing Registry with a distributed registry such as Horde.Registry.
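
As a rough illustration of why a local registry gives every node its own copy, here is a small sketch using only Elixir's built-in Registry. The names LocalNamesSketch and :my_fetcher are hypothetical, and the sketch assumes processes are named through a via tuple; it does not show how data_fetcher registers its processes, nor the API of any distributed registry.

```elixir
# Hypothetical names throughout; this only shows node-local name registration.
{:ok, _registry} = Registry.start_link(keys: :unique, name: LocalNamesSketch)

# A process registered through a via tuple is known only to this node's registry.
via = {:via, Registry, {LocalNamesSketch, :my_fetcher}}
{:ok, pid} = Agent.start_link(fn -> %{source: node()} end, name: via)

# The lookup is answered locally; another node running the same code would
# start and find its own, separate process under the same key.
[{^pid, nil}] = Registry.lookup(LocalNamesSketch, :my_fetcher)
Agent.get(via, & &1)
```

With a cluster-aware registry, the module named in the via tuple would be the part that changes, which is why swapping the registry looks like the natural way to get a singleton job.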

dimitarvp · May 26, 2021, 10:07am

Yep, that looks a bit better than mine.

stefan_z · May 26, 2021, 3:48pm

Cool, thanks for the explanation!