What is the reasoning behind baking in the adapter at compile-time

In a recent discussion in another thread related to configs (which I can’t find for some reason),

I mentioned that Ecto configuration is limited in terms of adapters, since adapters can be configured only at compile-time. Example:

  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.SQLite3

This imposes some hard constraints; for example, you cannot have a runtime configuration that switches between postgres and sqlite, as they use different adapters.

@josevalim’s response was that this was an explicit decision when the library was designed.

Looking a little at the code I found the logic responsible for repo definition:

defmacro __using__(opts) do
  quote bind_quoted: [opts: opts] do
    @behaviour Ecto.Repo

    {otp_app, adapter, behaviours} =
      Ecto.Repo.Supervisor.compile_config(__MODULE__, opts)

    @otp_app otp_app
    @adapter adapter
    @default_dynamic_repo opts[:default_dynamic_repo] || __MODULE__
    @read_only opts[:read_only] || false
    @before_compile adapter
    ...

It is clear that adapter-specific code is generated inside your repo. I see the following advantages to this:

  1. Code completion works out of the box, since that will become code that is part of your custom repo;
  2. Better performance? (not sure about this one, would like to hear more);
  3. The repo implements different functionality and a different public API depending on the adapter.

Another question is whether this is the only way to implement it, and whether it could be replaced with a fully dynamic configuration without potentially losing features.

1 Like

What would the benefit of swappable repo config be? And why is Ecto’s dynamic repo functionality insufficient?

The simplest example is just having sqlite3 on your dev machine and postgres on prod, which currently is not possible without creating two separate repos and selecting the correct one with a function.

This is not possible because postgres uses Ecto.Adapters.Postgres while sqlite3 uses Ecto.Adapters.SQLite3. Why sqlite3 cannot use the SQL adapter is explained by @warmwaffles in detail here.

Dynamic repos can configure everything at runtime except the adapter.
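For context, what dynamic repos do cover can be sketched roughly like this (module, database, and option names are illustrative, not from this thread); note that every instance shares the adapter baked in at compile time:

```elixir
# Rough sketch of Ecto's dynamic-repo feature: several running
# instances of the *same* repo module, switched at runtime.

# Start a second instance of MyApp.Repo pointed at another database:
{:ok, _pid} =
  MyApp.Repo.start_link(
    name: :tenant_b,
    database: "tenant_b_db",
    hostname: "localhost"
  )

# Route subsequent Repo calls in this process to that instance:
MyApp.Repo.put_dynamic_repo(:tenant_b)

# MyApp.Repo.all(Post) now runs against :tenant_b, but the adapter
# is still whatever was given to `use Ecto.Repo` at compile time.
```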

But what’s the problem of having two separate repos for dev and prod?

At least that would be more or less explicit. With a repo that can swap adapters underneath, you’re moving complexity to an invisible place (inside this alternative Ecto rather than in your app config, where it currently happens).

I mean OK, you can probably argue it’s somewhat better with a bucket list of items – but the added value would be very small IMO. I don’t get why the current way of doing things is a complete deal-breaker for you?

It just adds a lot of scaffolding and complexity to the project. For example, if you want the option of switching between postgres and sqlite3 on dev/prod, this becomes a matter of maintaining two repo modules, one for sqlite and another for postgres.

Creating a library on top of ecto to deal with this is trivial, and I was thinking of doing it; however, I first want to fully understand why it was done like this in the first place.

I never mentioned abstracting away the adapter configuration, but rather having adapter configuration actually live in config.exs and work with runtime.exs, as opposed to how it currently sits in the module definition of your custom repo module.

OK, I won’t be chasing this since you seem determined but I can’t say the problems you are outlining are major or even worth solving – to me.

Curious what you would find though, and how will you end up solving your problem. As it is, I wouldn’t trust my dev environment at all if it uses a different DB there vs. prod.

Maybe the example I gave is not the best one; the use-case is not having different environments on dev vs prod, but having the option of switching between sqlite3 and postgres. At the end of the day this is one of the selling points of ecto: write generic queries that can run on any underlying database implementation.

Of course this implies that you are constrained in db-specific queries, not to mention that query generation might be one more reason why the adapter is baked in. As a sanity check, on our project we run the tests on both postgres and sqlite, because yeah, interesting things can happen if you are not careful.

I’d imagine that if they made it runtime from the beginning it might have resulted in more developers expecting the adapters to handle more of that RDBMS implementation variance at runtime as well, making it harder to implement new adapters.
We run MySQL in production but want to migrate to postgres, and yeah, I can see where it would be nice to swap that without recompiling, but in reality it’s not something I need to do without recompiling anyway.

Although I will note that we handle the testing part of that by using dynamic repos or by passing the repo in directly in the places where it matters to confirm the compatibility in testing.

At compile time, though, we could just use Application.compile_env as part of the use and swap it in dev.exs or test.exs.
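A minimal sketch of that approach (the `:ecto_adapter` key is an assumption for illustration, not an Ecto convention); the adapter is still fixed at compile time, but which one gets baked in is decided by per-environment config:

```elixir
# config/dev.exs (assumed key name)
config :my_app, :ecto_adapter, Ecto.Adapters.SQLite3

# config/prod.exs
config :my_app, :ecto_adapter, Ecto.Adapters.Postgres

# lib/my_app/repo.ex -- compile_env reads the value while the module
# compiles, so mix tracks it and forces a recompile when it changes.
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Application.compile_env(:my_app, :ecto_adapter, Ecto.Adapters.Postgres)
end
```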

2 Likes

This is the big one: the code slightly below the part you quoted uses behaviours to decide which parts of the Ecto.Repo API to generate.

With purely-runtime knowledge of the adapter, there would only be two alternatives:

  • assume every adapter supports everything and fail at runtime if they don’t
  • require explicit configuration down to a “lowest common factor” - if some adapters to be used at runtime don’t support transactions, then remove transactions for every adapter. Make the wrong choice and pick something not universally supported? Runtime error again

Either way, you’re trading compile-time safety for hypothetical run-time flexibility.

I say “hypothetical” because it’s very easy to accidentally couple to specific features of a database - not just with obvious stuff like fragment("VENDOR SPECIFIC SQL") but even things like Ecto.Query.API.filter/2 which doesn’t work on anything besides PG and SQLite.
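To illustrate the kind of accidental coupling meant here, a hedged example (the `Post` schema is hypothetical):

```elixir
import Ecto.Query

# Portable: every SQL adapter can translate a plain comparison.
portable = from p in Post, where: p.views > 100

# Coupled: ILIKE is Postgres syntax. The fragment is passed through
# verbatim, so this query breaks on databases without ILIKE.
coupled = from p in Post, where: fragment("? ILIKE ?", p.title, "%elixir%")
```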

You can even find library bugs that only appear for some adapters - for instance, here’s one that was silently slightly wrong when tested with SQLite but would have failed disastrously in production with Oracle.

4 Likes

I share OP’s curiosity on this point. Also not a practical problem for me but for the sake of better understanding I’m interested in what the intentional choice was here (I am pretty sure it was clarified in the original thread it is not a legacy issue).

If this is the answer, it seems like it at least could apply to any config, since any compile time config could be verified at that point instead of waiting for runtime to fail (e.g. missing API key). But my understanding is that it is best practice to make all config that can be, runtime config.

Forgive my ignorance if I’m missing something (I often am), but in the case of using sqlite in dev and postgres in prod (which I know you said wasn’t the best example), why does that have to be a runtime thing? Does setting the appropriate adapter at compile time cause issues with that?

2 Likes

Just configure your repo in config/dev.exs and config/test.exs to use sqlite and in config/prod.exs use something else. Although I highly recommend against doing that from an operations standpoint. I would go sqlite for all three environments, or postgres. I do not recommend mixing, because you lose features specific to each backend type. For postgres, that would be access to native enums, ltree, hstore, etc… But if you must, I recommend running sqlite for local dev and configuring whatever database you are targeting for production in config/test.exs as well. Just my two pennies on this.

5 Likes

Ya, I’ve never been able to make this work because I push a fair share of business logic to the DB through Ecto (one of many reasons I :heart: Ecto) but there are people who deliberately work in a completely DB-agnostic way. Using sqlite in testing would give them a significant speed boost.

I don’t believe that this is true.

You can set your adapter in dev and prod to whatever you want and compile your repo with it.

defmodule MyApp.Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Application.get_env(:my_app, :ecto_adapter)
end
1 Like

This does seem to be correct; from what I see the postgres adapter supports transactions while the mysql one doesn’t. It is then true that you would lose compile-time benefits if you were to do this at runtime.

I think this could be qualified as a bug, or maybe a mistake that is too late to roll back, since from what I remember Ecto has always tried to keep the query API compatible with all the underlying implementations and would turn down additions of more adapter-specific constructs to the query API.

You want to have the possibility of runtime config; you don’t want to recompile a deployed application every time you change your credentials. By design, runtime.exs runs after the application has been compiled, but before your application starts.

This is just a feature that I need for my project. I want the compiled project to have the ability to switch from postgres to sqlite, without recompiling the entire project.

Let me make it clear by describing the product and stating what these features try to achieve. We are creating a project for the government that is aimed at scanning local government or other local critical websites for potential misconfigurations that can result in security holes. One of the important features is that the project will be self-hosted in multiple places, with entirely different loads. Having the ability to switch between postgres and sqlite would give us the following:

  1. Greatly simplified deployment for small instances: you just download the docker image or the release tarball, start it, and voila. This is important because government ops are just a huge mess and most of the people working there are not qualified;
  2. We are not limited to sqlite. For deployments that will scan tens of thousands of hosts, you will be able to leverage postgres’s power, at the cost of additional configuration and a postgres service.

Agreed, there are different use-cases out there; once you decide to use them, changing the adapter no longer makes sense, be it at compile-time or runtime. Maybe that is one of the reasons why the adapter definition lives in the repo file as opposed to config.

Indeed, but the discussion is not about whether the adapter can be defined using config, but about the fact that this is strictly compile-time config. If you were to run credo on this file, you would get a warning to change Application.get_env/2 to Application.compile_env/3, because changing the adapter in runtime.exs will not do anything.

Yes, exactly. API credentials are a classic example of how runtime config can be preferable to compile config. This despite the fact that the same potential argument made above for why Ecto adapter config was intentionally made compile time, could also be made for API credentials, and thus I think, for any config whatsoever. Certainly I have been bitten by a config problem that caused an API credential to not be set, which would have been nicer to catch during deploy (compilation) instead of hitting a user-facing runtime error.

Most of the other replies in this thread address the specific issue of environment specific Ecto config, but the more general question of why any config would be better implemented as compile time config vs runtime config remains unclear. It seems it actually must be, like most things, a tradeoff–between the flexibility of runtime config vs the “safety” of compile time config (validation), and so in the end it’s just a subjective call by the library author.

As I recall, in the Ruby ecosystem where everything was runtime it was common to have “startup” checks that would prevent the app from running if certain config wasn’t set correctly, which is not something I’ve commonly seen in Elixir apps. It seems like a potential value of those is the ability to catch config validation issues as early as possible, before users are affected, without giving up flexibility.

I think this is related to the nature of how runtime configs were implemented.

We currently have the option of either compile-time-only config or runtime config: one fetches the value at compile time and the other uses a function. As far as I know there is no tooling to distinguish between the two for library creators, so if your library supports only compile-time config, someone who has no idea will just declare a runtime config, only to find out later that their runtime config did nothing. Add to that the fact that they just started with elixir, and you have a recipe for a mess.
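The distinction being described, as a hedged sketch (module and key names are made up for illustration):

```elixir
defmodule MyApp.Settings do
  # Read once, while this module compiles. Changing the value in
  # config/runtime.exs later has no effect on it.
  @adapter Application.compile_env(:my_app, :ecto_adapter, Ecto.Adapters.Postgres)
  def compile_time_adapter, do: @adapter

  # Read on every call, so values set in config/runtime.exs are picked up.
  def runtime_adapter, do: Application.get_env(:my_app, :ecto_adapter)
end
```

A caller has no way to tell which kind a given library key is without reading its source, which is the gap a config-schema library could close.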

I have been thinking for some time about a library that would allow for a config specification, which would both create and enforce a schema on what keys can be configured and define whether a value can be configured at runtime. The only thing I still have to ascertain is whether this kind of enforcement is possible at compile time, given the limitations configuration has in place.

1 Like

I am not sure exactly when runtime.exs is read, but the docs are clear that it is after compilation, so I don’t think what you are describing would be possible; I could be wrong about that, though.

However, with the current implementation of runtime.exs I believe it is pretty trivial to add checks that prevent the app from starting; it just seems like most library authors don’t bother with this, preferring to let the app start and fail later.
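Such a startup check is indeed trivial in runtime.exs; a hedged sketch (the env var and key names are illustrative):

```elixir
# config/runtime.exs -- fail at boot, before any user-facing request,
# if required configuration is missing.
import Config

if config_env() == :prod do
  api_key =
    System.get_env("API_KEY") ||
      raise """
      environment variable API_KEY is missing.
      The application refuses to start without it.
      """

  config :my_app, :api_key, api_key
end
```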

Hit me up if you start it. I had the same idea but my hands are super full lately. But could spare an hour or two here and there to do reviews or contribute.

2 Likes

Ah ok, that makes sense.

So if I’m understanding correctly, it’s not necessarily that you need a particular repo module to be able to swap adapters at the drop of a hat, or for different use-cases running in the same VM, but that you need the ability to avoid the maintenance headache of never being able to just use MyApp.Repo, and not even put_dynamic_repo because if they’re different adapters, they’re likely different modules. And even if you did ship different releases for different DB adapters, you might be dealing with constant confusion.

If shipping a my_app_pg.tar and my_app_sqlite3.tar would be too much of a pain, then I’d look at the Ecto.Repo behaviour itself and just write a thin shim implementation that immediately calls out to :persistent_term to get the “true” repo module, and delegates to that.
Then in your MyApp.Application.start/2, or in runtime.exs, the very first thing it should do is determine which adapter is actually going to get started, and start that one.

I have no idea if that will work though.

Here’s some example code adapted from a branch of double I had been working on that builds shims.

This is really hacked together and could be vastly simplified/condensed, I just copy-pasted and changed some bits.

A word of caution though: it is a BAD idea to do it this way. You will constantly be debugging things, especially since Ecto.Repos expect that they can use their own module name to find the current dynamic_repo.

I originally built these macros to facilitate debugging badly behaving macro-laden/auto-generated libraries or overgrown domain modules, by letting you inject a spy module that you could configure and introspect as part of a test; they were only meant to exist in the codebase long enough to guide a person towards proper abstractions.

I REALLY suggest you just have multiple tarballs called my_app_minimal.tar and my_app_full.tar. You’ll pay more in CI/CD from duplicate test runs, but it’s a level of mental load the devs won’t have to constantly be aware of.

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    # determine which repo needs to be started
    repo_mod = MyApp.RepoPicker.starting_repo!()
    :persistent_term.put({MyApp.ShimRepo, :repo}, repo_mod)

    children = [
      ...,
      repo_mod
      # other stuff, but NOT MyApp.ShimRepo
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

defmodule MyApp.RepoPicker do
  @default_impl_repo_mod Application.compile_env(:my_app, [__MODULE__, :default_impl], MyApp.SqliteRepo)
  @other_repo_mods Application.compile_env(:my_app, [__MODULE__, :others], [MyApp.PgRepo])

  import RepoShimmer

  defshim(MyApp.ShimRepo, for: [@default_impl_repo_mod | @other_repo_mods])

  # set in runtime.exs, pulled from an ENV VAR or something, idk
  def starting_repo!, do: Application.fetch_env!(:my_app, :starting_repo)
end

defmodule RepoShimmer do
  defmacro defshim(alias, opts \\ []) do
    sources = Keyword.fetch!(opts, :for)

    env = __CALLER__
    expanded = Macro.expand(alias, env)
    sources = for s <- sources, do: Macro.expand(s, env)
    # the first one, or you could grab it from opts? idk, doesn't matter
    default_source = hd(sources)

    func_defs = generate_function_defs(expanded, default_source)

    Module.create(expanded, func_defs, Macro.Env.location(__ENV__))
  end

  def repo_from_persistent_term(shim_mod), do: :persistent_term.get({shim_mod, :repo})

  defp generate_function_defs(mod_name, source, other_funcs \\ []) do
    funcs = Enum.uniq(nonprotected_functions(source) ++ other_funcs)

    for {func_name, arity} <- funcs do
      generate_function_def(mod_name, func_name, arity)
    end
  end

  defp generate_function_def(mod, func_name, arity) do
    args = Macro.generate_arguments(arity, mod)

    quote do
      def unquote(func_name)(unquote_splicing(args)) do
        repo_mod = RepoShimmer.repo_from_persistent_term(unquote(mod))
        apply(repo_mod, unquote(func_name), [unquote_splicing(args)])
      end
    end
  end

  defp nonprotected_functions(mod) do
    mod.module_info(:functions)
    |> Enum.reject(fn {k, _} ->
      [:__info__, :module_info] |> Enum.member?(k) ||
        String.starts_with?("#{k}", "_") ||
        String.starts_with?("#{k}", "-")
    end)
  end
end

3 Likes