I am running some design tests. The example here is not a real system but just some rough parts to test how things can fit together.
APPLICATION STRUCTURE
Hypothetically imagine a structure like this:
APPLICATION
|__SUPERVISOR
   |__SUPERVISORTEST
   |  |__DATACACHETEST
   |__WEBSOCKET
DataCacheTest is a GenServer that creates an ETS table which can be read and written by anyone, anywhere. Reads and writes go straight to ETS; they are not passed through the GenServer's call/cast system first. Yes, there are race conditions.
WebSocket is an HTTP router and/or websocket upgrade by which users might access or manipulate the data in the ETS table.
PROBLEM
The problem I am trying to figure out is:
What happens if the DataCacheTest GenServer crashes? It will be restarted, but in the meantime, queries on the ETS data from the HTTP system will fail, because the ETS table won't exist. What happens in that moment, i.e. while the GenServer that owns the ETS table is restarting?
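To make the failure concrete, here is a minimal sketch of what I believe happens to the table. It does not use the modules from this post: OwnerCrashDemo and :owner_crash_demo_table are made-up names, and a plain spawned process stands in for the GenServer.

```elixir
defmodule OwnerCrashDemo do
  # Hypothetical table name, used only for this demonstration.
  @table :owner_crash_demo_table

  def run do
    parent = self()

    # A plain process stands in for the table-owning GenServer.
    owner =
      spawn(fn ->
        :ets.new(@table, [:set, :public, :named_table])
        :ets.insert(@table, {:ets_key, 1})
        send(parent, :table_ready)

        # Stay alive until killed, so the table keeps its owner.
        receive do
          :stop -> :ok
        end
      end)

    receive do
      :table_ready -> :ok
    end

    # While the owner is alive, any process can read the public table.
    before_crash = :ets.lookup(@table, :ets_key)

    # Kill the owner and wait until it is reported down.
    ref = Process.monitor(owner)
    Process.exit(owner, :kill)

    receive do
      {:DOWN, ^ref, :process, _pid, _reason} -> :ok
    end

    # The table is deleted together with its owner, so the lookup raises.
    after_crash =
      try do
        :ets.lookup(@table, :ets_key)
      rescue
        ArgumentError -> :no_table
      end

    {before_crash, after_crash}
  end
end
```

Calling OwnerCrashDemo.run() should return the stored row for the first lookup and :no_table for the second, which is the gap I am worried about.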
1) CHECK IF EXISTS?
One idea is to check that the GenServer exists, e.g. with a function like this inside DataCacheTest:
def exists do
  pid = Process.whereis(__MODULE__)
  is_pid(pid) # return true or false
end
And then use that before any read/write operation on the module, i.e. running in the websocket module:
if DataCacheTest.exists() do
  DataCacheTest.read_value()
end
But I feel like this is wrong, because you could theoretically pass the exists() check and then still fail at read_value() if the GenServer dies between the two calls. Or it dies but the Registry isn't updated yet.
You don't want the error to reach the HTTP socket module. I'm not sure how crashing works there, but you certainly don't want it to crash, as all your users would be disconnected.
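The check-then-read gap can be forced deterministically. The sketch below is only an illustration: CheckThenReadDemo, its table name and its registered owner name are all made up, and the "crash" is simulated by killing the owner between the check and the read.

```elixir
defmodule CheckThenReadDemo do
  # Hypothetical names, used only for this demonstration.
  @table :check_then_read_demo_table
  @owner_name :check_then_read_demo_owner

  # Starts a registered process that owns a public named table.
  def start_owner do
    parent = self()

    pid =
      spawn(fn ->
        Process.register(self(), @owner_name)
        :ets.new(@table, [:set, :public, :named_table])
        send(parent, :ready)

        receive do
          :stop -> :ok
        end
      end)

    receive do
      :ready -> pid
    end
  end

  # Same idea as exists/0 above: is a process registered under the name?
  def exists?, do: is_pid(Process.whereis(@owner_name))

  def run do
    owner = start_owner()

    # Step 1: the check passes.
    check_result = exists?()

    # Step 2: the owner dies after the check but before the read.
    ref = Process.monitor(owner)
    Process.exit(owner, :kill)

    receive do
      {:DOWN, ^ref, :process, _pid, _reason} -> :ok
    end

    # Step 3: the read still fails, despite the successful check.
    read_result =
      try do
        :ets.lookup(@table, :ets_key)
      rescue
        ArgumentError -> :no_table
      end

    {check_result, read_result}
  end
end
```

CheckThenReadDemo.run() should return {true, :no_table}: the check said the owner was there, and the read failed anyway.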
2) TRY/RESCUE ALL ETS OPERATIONS?
My best idea, knowing that any ETS operation can fail in this case, is to wrap every :ets.lookup and :ets.insert call inside DataCacheTest like so:
result = try do
  :ets.lookup(:doesntexist, :doesntexist)
rescue
  ArgumentError ->
    IO.puts("Caught ETS failure")
    @failure_value
end
i.e., as one-liners:
try do :ets.lookup(:doesntexist, :doesntexist) rescue ArgumentError -> IO.puts("Caught ETS failure") end
try do :ets.insert(:doesntexist, {:doesntexist, :whatever}) rescue ArgumentError -> IO.puts("Caught ETS failure") end
These two operations do seem to catch the errors, without the red error text, when I test them in the console. I'm guessing that any time you see red error text, something may have crashed? I'm not sure. Is that the case?
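If wrapping is the way to go, the try/rescue could live in one place instead of being repeated at every call site. This is only a sketch of that idea: SafeEts is a hypothetical helper module, not something from a library, and the {:ok, _} / {:error, :no_table} return shape is my own choice.

```elixir
defmodule SafeEts do
  # Hypothetical wrappers that return tagged tuples instead of raising.
  # Caveat: :ets raises ArgumentError for other bad arguments too (for
  # example a malformed object), so :no_table is only the likely cause.

  def lookup(table, key) do
    {:ok, :ets.lookup(table, key)}
  rescue
    ArgumentError -> {:error, :no_table}
  end

  def insert(table, object) do
    {:ok, :ets.insert(table, object)}
  rescue
    ArgumentError -> {:error, :no_table}
  end
end
```

With this, SafeEts.lookup(:doesntexist, :doesntexist) gives {:error, :no_table}, and the caller can pattern match on the result to pick a fallback value.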
Is this the right idea?
Thanks for any thoughts.
TEST CODE
1) Application
Hypothetical Application that starts the system hierarchy above.
defmodule MyApplication.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # NOTE: scheme: :https also needs certificate options (e.g. :keyfile and :certfile) to start
    web_socket = {Plug.Cowboy, plug: TestWebSocket, scheme: :https, options: [port: 4100]}
    supervisor_test = {SupervisorTest, nil}
    children = [web_socket, supervisor_test]
    opts = [strategy: :one_for_one, name: MyApplication.Supervisor]
    # start/2 returns the result of Supervisor.start_link/2 directly
    Supervisor.start_link(children, opts)
  end
end
2) ETS Supervisor
Not really needed, I suppose, but I was thinking it might be nice to have a separate supervisor for the ETS GenServer, so I tested creating this.
# test run as Supervisor.start_link([{SupervisorTest, nil}], strategy: :one_for_one)
defmodule SupervisorTest do
  use Supervisor

  def start_link(args) do
    Supervisor.start_link(__MODULE__, args, name: __MODULE__)
  end

  # how to initialize supervisor on start up
  def init(_args) do
    children = [
      {DataCacheTest, nil}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
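To see what a supervised restart does to the table, here is a small self-contained sketch. It deliberately does not use the modules in this post: RestartDemo, RestartDemo.Cache and :restart_demo_table are made-up names, and the polling loop is just a crude way to wait for the restart.

```elixir
defmodule RestartDemo.Cache do
  # Hypothetical stand-in for a GenServer that owns an ETS table.
  use GenServer

  def start_link(_args) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  @impl true
  def init(_args) do
    :ets.new(:restart_demo_table, [:set, :public, :named_table])
    {:ok, nil}
  end
end

defmodule RestartDemo do
  @table :restart_demo_table

  def run do
    {:ok, sup} =
      Supervisor.start_link([{RestartDemo.Cache, nil}], strategy: :one_for_one)

    # Write a value, then kill the table owner.
    :ets.insert(@table, {:ets_key, 42})
    old_pid = Process.whereis(RestartDemo.Cache)
    Process.exit(old_pid, :kill)

    # Wait until a *new* process owns a table with the same name.
    new_pid = wait_for_new_owner(old_pid)

    # The restarted GenServer created a fresh, empty table.
    rows_after_restart = :ets.lookup(@table, :ets_key)

    Supervisor.stop(sup)
    {new_pid != old_pid, rows_after_restart}
  end

  # :ets.info/2 returns :undefined while the table does not exist.
  defp wait_for_new_owner(old_pid) do
    case :ets.info(@table, :owner) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        wait_for_new_owner(old_pid)
    end
  end
end
```

RestartDemo.run() should return {true, []}: the supervisor brings the owner back under a new pid, but the data written before the crash is gone, and reads during the polling window would have raised.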
3) ETS Table Owner
This GenServer creates an ETS table and has some static (non-call/cast) functions that anyone, anywhere can use to access that data.
defmodule DataCacheTest do
  # Supervisor.start_link([{ DataCacheTest, nil}], strategy: :one_for_one)
  # DataCacheTest.read_value()
  # DataCacheTest.increment_value()
  # DataCacheTest.crash()
  use GenServer

  @key_name :ets_key
  @value_name :ets_value
  @table_name :data_cache_test_ets_table
  @default_value 1

  def start_link(args) do
    GenServer.start_link(__MODULE__, args, name: __MODULE__)
  end

  #===============================================
  # (i) INITIALIZATION FUNCTION (CREATE ETS TABLE)
  #===============================================
  def init(_args) do
    create_table()
    {:ok, nil}
  end

  def create_table() do
    case :ets.info(@table_name) do
      :undefined ->
        # for a :named_table, :ets.new/2 returns the table name (@table_name)
        :ets.new(@table_name, [:set, :public, :named_table, {:read_concurrency, false}, {:write_concurrency, false}])

      _ ->
        nil
    end
  end

  #=====================================
  # (ii) ETS ACCESSOR FUNCTIONS - intentionally not using call/cast and not atomic, yes there will be racing
  #=====================================
  def read_value() do
    result = :ets.lookup(@table_name, @key_name) #=== WRAP IN TRY/RESCUE???

    case result do
      [] -> # doesn't exist, insert default
        :ets.insert(@table_name, {@key_name, @default_value}) #=== WRAP IN TRY/RESCUE???
        @default_value

      [{@key_name, value}] -> # found key/value
        value

      _ ->
        # not sure why we are here, create table again maybe
        IO.puts("FAILED READING VALUE")
        @default_value
    end
  end

  def increment_value() do
    current_val = read_value()
    new_val = current_val + 1
    :ets.insert(@table_name, {@key_name, new_val}) #=== WRAP IN TRY/RESCUE???
    new_val
  end

  #================
  # (iii) CRASH
  #================
  def crash() do
    pid = Process.whereis(__MODULE__)
    # sends an exit signal to the GenServer (which is not trapping exits)
    Process.exit(pid, :crash)
    # note: this raise runs in the *calling* process, not in the GenServer
    raise "intentionally crashed"
  end
end
4) Websocket/Server
This wants to run queries on the ETS data directly, without going through call/cast.
defmodule TestWebSocket do
  use Plug.Router

  plug Plug.Logger
  plug :match
  plug :dispatch

  get "/get_value" do
    value = DataCacheTest.read_value() # WHAT HAPPENS IF ETS GENSERVER IS CRASHING?

    conn
    |> put_resp_content_type("text/plain")
    |> send_resp(200, to_string(value))
  end

  get "/increment" do
    value = DataCacheTest.increment_value() # WHAT HAPPENS IF ETS GENSERVER IS CRASHING?

    conn
    |> put_resp_content_type("text/plain")
    |> send_resp(200, to_string(value))
  end

  match _ do
    send_resp(conn, 404, "not found")
  end
end