Best way to store/retrieve DB table features

When my PostgreSQL tables have lots of unique indexes and/or their rows are multi-referenced as foreign keys, I prefer to call unique_constraint/3 and foreign_key_constraint/3 dynamically. I can send a query to the database (such as the one below), obtain a list of the relevant constraints, and modify my changeset by iterating that list. This way the database remains my SSOT and I do not have to worry about unique_constraint/3 and foreign_key_constraint/3 maintenance.

Repo.query("SELECT conname FROM pg_catalog.pg_constraint c WHERE c.confrelid = '#{my_table}'::regclass AND c.contype = 'f'")

Obviously, I do not want to query the database repeatedly for immutable information. I therefore need to store two pieces of information per schema/table, either before or as soon as my server starts. What is the best way to achieve this? Is it possible to query the database at compile time? This would be the ideal solution (with a bit of try/rescue tinkering for missing tables during development), but the outcome I am getting is “could not lookup Xyz.Repo because it was not started or it does not exist”. If compile time is out of the question, what is the best alternative?

Regarding compile time I’m not sure, since Ecto (& the Postgrex adaptor) do some compile time work (and it would add a bunch of complexities…) - but if doing it on application init then I think something like this would work.

defmodule YourApp.Application do

    use Application
    
    def start(_type, _args) do
        children = [
             Db_Main.Repo
             # ... other things you're starting
        ]

        opts = [strategy: :one_for_one, name: YourApp.Supervisor]
        Supervisor.start_link(children, opts)
    end
end

Every module you’re starting here (usually genservers or other supervisors) have an init/2 callback. Erlang when starting supervisors/genservers/etc blocks on the init/2 of each child module being started, knowing that, you can add a gen_server after your repo is started and before other parts of the application are, that would be responsible for querying the DB, retrieving the relevant info and storing it in an ETS table that has public read (default) access. To ensure that the remaining app would only be loaded after this, you would place this loading inside that genserver’s init.

It would then look like:

children = [
     Db_Main.Repo
     Supervisor.child_spec({YourApp.ConstraintsLoader, []}, type: :worker)
     # ... other
]

So something like this should work, but you should handle possible failures, etc…

def YourApp.ConstraintsLoader do
    use GenServer
    require Logger

    def start_link(_) do
        Logger.info("Starting Constraints Loader")
        GenServer.start_link(__MODULE__, nil, name: :constraints_loader)
    end

    def init(nil) do
        # if you're just going to read from the table subsequently and from many processes/requests perhaps `read_concurrency: true`, can be given to the ets options as well

        :ets.new(:constraints_table, [:named_table])

        YourRepo.query("SELECT conname FROM pg_catalog.pg_constraint c WHERE c.confrelid = '#{my_table}'::regclass AND c.contype = 'f'")
        |> Enum.each(fn(result_row) ->
            # figure out what you need to insert and how you'll identify it, the first key on the tuple is the identifier and it can be any term

             :ets.insert(:constraints_table, {"unique_identifier_element", result_row.column_1, result_row.column_2})
        end)

        {:ok, :initialized}
     end
end

After this you should be able to query the :ets from any part of your app, with something like:

case :ets.lookup(:constraints_table, "unique_identifier_element") do
     [] -> # no element with that identifier was in the table
     [{identifier, column1, column2}] -> # an element with that key was found
end

You can also retrieve the full table as a list with :ets.tab2list(:constraints_table)

2 Likes

You need to make it :protected otherwise you can’t query it from other processes. :read_concurrency is probably important too.

This may be a good use case for the new :persistent_term module on OTP 21 actually.

2 Likes

I haven’t checked :persistent_term yet, will do! but :protected isn’t needed? :read_concurrency is probably warranted though

This forum community never ceases to amaze me, thank you for these most helpful responses. You have led me to delve deeper into OTP, which is a good thing! Despite my best efforts, however, I am still facing two conceptual difficulties, as I still have plenty of ground to cover…

  1. Why is YourApp.ConstraintsLoader using GenServer? Wouldn’t init/1 work similarly without the “use GenServer” bit?

  2. Why is YourApp.ConstraintsLoader not using GenServer? Does ETS have an advantage over GenServer-based storage (with which I am more familiar) in this particular scenario?

1 Like

An ets table is always bound to one process and if that process dies it will be removed. That’s why you use a genserver. You don’t use the genserver for storage, because ets can be directly accessed by other processes, while reading from the genserver’s state would be sequential and could therefore be a bottleneck.

2 Likes

Just to make sure I am getting this right. My reading of your reply is:

• that the GenServer keeps the OTP process alive, thus “persisting” the ETS data
• that GenServer-based storage is sub-optimal except in a 1-process to 1-GenServer (or similar) scenario

I hope that I am not totally off the mark…

Persistence isn’t quite the right term. A Genserver is an OTP process. An Ets table is NOT a process. However, an ets table is owned by a process, and the VM will delete the table when the owning process dies. The GenServer exists to just basically hang around and do nothing so that the ets table can stay alive.

:ets is still better if all you’re doing is key value lookups. GenServers however become useful when you want to do something other than simply store information.

3 Likes

As @benwilson512 and @LostKobrakai mentioned, it isn’t persistence, but the fact that in Erlang an ETS table is inherently tied to the life cycle of the process that starts it.
So if you start an ets table inside a genserver (which is OTP), that can be part of a supervision tree, due to the ETS ownership model, its life cycle will mimic that of the owner process.

In your example imagining that perhaps multiple requests for reading the constraints can happen simultaneously, having them stored as part of the GenServer state and retrieving them through interaction with the genserver would introduce at least a possible bottleneck (it might be irrelevant in given situations), since interaction with the genserver is itself serialised through the process mailbox and a process can only be “dealing” with one message at a time. ETS on the other hand doesn’t introduce that bottleneck, and is super fast.

Also having it being started on a genserver that is part of a supervision tree means that now, if you decide you’ll need to add one more module/etc to the initialisation that also depends on the repo, and for which you want a different restart strategy overall, would mean you can easily move those into their own supervisor, and from the application init start that supervisor, which in turn starts the Repo, the constraints loader, etc, by simply writing a supervisor module and moving a couple of lines.
Plus, now it’s neatly compartmentalised - so you can use the OTP principles to reason about the way your “components” are initialised in a pretty straightforward way.

Lastly, in your case it doesn’t seem like you would need to do anything else after the table is initialised and populated, but in many cases you might want to be able to refresh the tables, or serialise specific access. By having it in a genserver you can easily design this, by perhaps starting 3 tables instead, one active (populated), a empty one, and pointer table (as ordered_set) that just stores the name table for the active one and then writing a handle call for updating, where it gets the active one table name from the pointer table, fills the other one (the spare), replaces the “pointer” afterwards to point to the newly populated one, and then cleans the one that was previously populated, etc.

2 Likes

Thanks to the fantastic help above, I was able to achieve what I set out to. I no longer need to worry about synchronizing constraints between my database and my Elixir code. Even better, I learned a lot about GenServer, ETS and OTP in general. My only (minor) disappointment was when I discovered that

:ets.match(:my_store, {“my_key”, :"$1"})

returned a (list containing a) 1-character charlist instead of the expected integer. No big deal, of course, I was able to pattern match using :ets.lookup instead. It just felt like a reality-check, nothing is perfect…