Going through the excellent course on Elixir by Dave Thomas. There’s a bit of homework there to solve the Fibonacci sequence (for numbers up to 100), by using an Agent to cache results in order to improve performance.
The proposed solution is in the link above and I’m finding it a bit hard to read.
Is the solution given similar to a commonly found pattern, that I’d recognise if I were to see it more often?
I’d be interested to hear from people more experienced.
In my understanding, introducing a GenServer to optimize a recursion is an …uhhm overkill.
Just make it tail-recursive and you are all set:
defmodule F do
@till 5
def fib(prev \\ 1, curr \\ 1, n \\ 1)
def fib(prev, curr, @till), do: curr + prev
def fib(prev, curr, n) when n < @till, do: fib(curr, prev + curr, n + 1)
end
IO.puts F.fib()
#⇒ 13
For @till = 500:
time elixir /tmp/fib.ex
365014740723634211012237077906479355996081581501455497852747829366800199361550174096573645929019489792751
elixir /tmp/fib.ex 0,24s user 0,06s system 133% cpu 0,228 total
You can write it even simpler just with plain tail recursion:
defmodule MthTest do
use ExUnit.Case
test "zero fib number is 0" do
assert Mth.fib(0) == 0
end
test "first fib number is 1" do
assert Mth.fib(1) == 1
end
test "second fib number is 1" do
assert Mth.fib(2) == 1
end
test "thirteen fib number is 233" do
assert Mth.fib(13) == 233
end
end
defmodule Mth do
def fib(n), do: fib(n, 0, 1)
def fib(0, prev, _), do: prev
def fib(n, prev, cur), do: fib(n - 1, cur, prev + cur)
end
I think the exercise was a practice in using Agents to hold state, I guess he wasn’t proposing it was a good solution in the sense of optimising code though.
I’m wondering if there’s a more readable way to organise the code, if you HAD to use an Agent?
You are actually not identifying which aspects of the code you are having trouble with.
The code starts with CachedFib.fib/1 which simply invokes Cache.run/1 - supplying it’s own anonymous function for calculating any desired value with CachedFib.cached_fib/2. What could be a bit more clear here is that cache is a process identifier (PID).
CachedFib.cached_fib/2 tries to look up the value but also supplies an alternate function to calculate the requested value (by accessing the two “previous” values from the “cache process” cache and summing them). Supplying this “backup strategy” function feels weird but is a concession to the agent - agents only hold state - not logic. So any logic has to be supplied externally. The point here is that Cache simply caches key value pairs - how these key value pairs are generated is up to the client. (Ultimately agents aren’t that useful but in Elixir learning resources they are often used as a first exposure to processes).
Meanwhile Cache.run/1 simply starts up the agent and stores the initial values for n = 0 and n = 1 in the agent (to be consistent this initial value should really come from CacheFib.fib/1 as CacheFib.cache_fib/2 also supplies the logic for calculating key/value pairs). The second expression hands the passed anonymous function body the agent’s PID (granting CachedFib.cached_fib/2 access to the agent’s state so that it can be accessed and manipulated with the remaining Cache functions) binding the final result of the body function. The third expression terminates the agent process. The fourth and final expression simply returns the result generated by the second expression.
The remaining Cache functions concern themselves with looking up previously calculated values or adding new ones as generated by the if_not_found function - i.e.:
fn -> cached_fib(n-2, cache) + cached_fib(n-1, cache) end
Cache.set/3 creates a new “state” for the agent by taking the map already stored in the agent and adding {n,val} to it. Note the definition of Agent.get_and_update/3 - Cache.set/3 will return val which is only part of the {n,val} pair - i.e. the calculated value.
Cache.complete_if_not_found/4 uses pattern matching across two function clauses (defining one function). The first clause (the first argument value is nil) is invoked when a requested value wasn’t found in the map (stored as state in the agent); the supplied if_not_found function is invoked to generate the desired value and the result is pipelined into Cache.set/3 to update the agent’s state. Note that calculated result will be returned by Cache.set/3. The second complete_if_not_found/4 function clause simply returns the value it was given (from the map stored as the agent state).
Cache.lookup/3 simply tries to look up the requested value. The result drives what happens with Cache.complete_if_not_found/4 - it either simply returns the looked up value or generates the required value with the supplied if_not_found function.
Thanks for the thorough explanation. I could understand the code after studying it, but it took a little while. I think it’s mainly because I’m not used to the syntax around anon functions and also the concept of Agents. (actually I’m still not sure why the . required in: “body.(pid)”)
Aside from not setting the initial state of the cache in Cache.run, what would be different if you were to implement a general purpose cache (for key/value map) using an Agent?
Also now I’m curious why they’re not generally useful?
Also now I’m curious why they’re not generally useful?
They don’t manage their own state - i.e. the state can be corrupted by any one of their “clients”. I typically prefer for a process to be in charge of it’s own state. As such agent just strikes me as a glorified “global state” that anybody can trash. I prefer processes to exercise total autonomy over their own state.
I’m not particularly fond of sending functions in messages. Anonymous functions belong to the creating module and the function closure can reference other modules. The message can travel to a process in another node which may not have the requisite module dependencies - and therefore kill any process trying to execute it.
Following a bit further in the course, it appears that the new homework was indeed to implement the cache as a general purpose key/value cache. I’ve tried to make it as understandable as possible for myself. Would be grateful for any feedback, particularly around the lookup functions (/2 and private /3). I wasn’t sure about using the same name for the 3 functions that handle the lookup, because they handle 2 different stages of the lookup. (in other languages I would have used a conditional inside 1 method.) Also I thought that using _value to give an obvious indication of what was being returned would improve readability when scanning the functions in the pipe.
defmodule KVCacheAgent do
@me __MODULE__
################################### API ######################################
def start_cache(initial_state) do
Agent.start_link(fn -> initial_state end, name: @me)
end
def stop_cache() do
Agent.stop(@me)
end
def lookup(key, fn_if_not_found) do
_value = Agent.get(@me, fn state -> state[key] end)
|> lookup(key, fn_if_not_found)
end
############################### PRIVATE ######################################
defp lookup(_value = nil, key, fn_if_not_found) do
update(key, fn_if_not_found.())
Agent.get(@me, fn state -> state[key] end)
end
defp lookup(value, _key, _fn_if_not_found) do
value
end
defp update(key, value) do
Agent.update(@me, fn state -> Map.put(state, key, value) end )
end
end
defmodule Fibber do
def fib(n) do
KVCacheAgent.start_cache(%{ 0 => 0, 1 => 1 })
value = KVCacheAgent.lookup(n, cached_fib(n))
IO.puts("The fibonacci sequence value for #{n} is: #{value}")
KVCacheAgent.stop_cache()
end
defp cached_fib(n) do
KVCacheAgent.lookup(n, fn -> cached_fib(n-2) + cached_fib(n-1) end)
end
end