data_fetcher is a small library that can ease fetch-and-cache jobs for your Elixir projects.
Occasionally we need some data, typically used as configurations, from an external source. Chances are these configurations won’t change too often, so we may want to cache them and refresh the cached data periodically.
We may think these kinds of fetch-and-cache jobs are too simple for a library. Well, I thought that too, until I found myself repeating myself tackling the same challenges:
- to set up the scheduler, either by sending messages to a separate scheduler process with Process.send_after or by having the GenServer send messages to itself
- to handle failures when fetching the data, usually by just restarting the job
- to avoid hitting the external source with a burst of calls when the cache expires while several simultaneous requests are waiting on it
- to serve simultaneous requests while the data is being fetched for the first time
- not to block the whole application from booting up, since the fetch job may be slow and the data is required by only some of the endpoints
- to have decent performance, no matter how big the data is
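To make a few of these points concrete, here is a minimal sketch of the general pattern, not data_fetcher's actual implementation. The module name `CacheSketch`, its options, and the zero-arity `fetcher` function are all assumptions made for illustration. The GenServer schedules itself with Process.send_after, defers the first fetch to `handle_continue` so boot is not blocked, and keeps the result in ETS so reads do not go through the process.

```elixir
defmodule CacheSketch do
  # Hypothetical module for illustration only; not part of data_fetcher.
  use GenServer

  def start_link(opts) do
    name = Keyword.fetch!(opts, :name)
    GenServer.start_link(__MODULE__, opts, name: name)
  end

  # Reads go straight to ETS, so large results are not copied
  # through the GenServer on every request.
  def get(name) do
    case :ets.lookup(name, :result) do
      [{:result, value}] -> {:ok, value}
      [] -> :not_ready
    end
  end

  @impl true
  def init(opts) do
    name = Keyword.fetch!(opts, :name)
    table = :ets.new(name, [:named_table, :protected, read_concurrency: true])

    state = %{
      table: table,
      fetcher: Keyword.fetch!(opts, :fetcher),
      interval: Keyword.fetch!(opts, :interval)
    }

    # Return right away; the first (possibly slow) fetch runs afterwards,
    # so the supervisor and the rest of the application can keep booting.
    {:ok, state, {:continue, :fetch}}
  end

  @impl true
  def handle_continue(:fetch, state), do: {:noreply, fetch_and_schedule(state)}

  @impl true
  def handle_info(:refresh, state), do: {:noreply, fetch_and_schedule(state)}

  # If the fetcher raises, the process crashes and its supervisor restarts it,
  # which is the "just restart the job" failure handling.
  defp fetch_and_schedule(state) do
    :ets.insert(state.table, {:result, state.fetcher.()})
    Process.send_after(self(), :refresh, state.interval)
    state
  end
end
```

This sketch leaves out the harder parts of the list: callers that arrive before the first fetch completes simply get `:not_ready` instead of waiting for the result, and the cache is lost whenever the process restarts.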
So, I made a library for myself called data_fetcher, which solves the above problems. Please check it out if you have the same need.
I am grateful for the help from @wojtekmach, who gave me guidance on the original idea and pointed out the problems in the first “design”, problems I think most new developers run into when they are not yet familiar with OTP. I made some mistakes in the first working version and, after reading the related chapters of Elixir in Action, have now rewritten it.
Again, please check it out and any feedback is welcome!
Cool. I always enjoy libraries that wrap OTP boilerplate. Thanks for this.
As a suggestion, I’d also add a function that can generate a stereotypical child spec, like this:
    def child_spec_for_function_and_interval(func, minutes)
        when is_function(func, 1) and is_integer(minutes) and minutes > 0 do
      {
        DataFetcher,
        name: :my_fetcher,
        fetcher: func,
        interval: :timer.minutes(minutes)
      }
    end
It’s going to be best if the name is shorter though.
Hi @stefan_z Thanks for the suggestion! I’ll improve the documentation and bring more examples.
As for how it works in a distributed environment: it currently uses a local Registry, so it sets up a supervision tree on each node. That means each node does its job separately, with its own scheduler, worker and cache storage.
This may work for most scenarios, but sometimes you need a singleton job across the cluster. We don’t directly support that via configuration yet, but it should be easy to add, hopefully by just replacing Registry with a distributed registry such as Horde.Registry.
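For readers wondering what a cluster-wide singleton looks like in practice, here is a rough sketch that uses Erlang's built-in `:global` name registration. This is a different and simpler technique than Horde.Registry, and it is not something data_fetcher does today. The module name `SingletonSketch` is hypothetical.

```elixir
defmodule SingletonSketch do
  # Hypothetical module for illustration only; not part of data_fetcher.
  use GenServer

  # Registers the process under a cluster-wide name. If another connected
  # node already runs it, return :ignore so the local supervisor moves on.
  def start_link(arg) do
    case GenServer.start_link(__MODULE__, arg, name: {:global, __MODULE__}) do
      {:ok, pid} -> {:ok, pid}
      {:error, {:already_started, _pid}} -> :ignore
    end
  end

  # Any node can reach the single instance through the global name.
  def value, do: GenServer.call({:global, __MODULE__}, :value)

  @impl true
  def init(arg), do: {:ok, arg}

  @impl true
  def handle_call(:value, _from, state), do: {:reply, state, state}
end
```

One limitation of this approach is that nodes which returned `:ignore` will not take over if the node running the singleton goes down. Handling that kind of failover is the sort of problem a distributed registry and supervisor are meant to solve.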