I’m writing an app that manages sensitive data. It might be nice to segregate each customer’s data into a separate SQLite database.
I’ve done different flavors of multi-tenancy in Postgres, and have configured apps that use a pre-defined group of databases. But I’ve never seen anyone attach, create & destroy SQLite instances dynamically.
I’m pretty sure everything you need can be achieved in Elixir/Phoenix, though I must say I have no experience with it.
Having said that, if instead of calling Repo directly in your project you do something akin to get_repo(user, team), which lazily creates that database if it doesn’t exist (and checks permissions to access it), that should be sandboxed enough.
If you think about it, it’s much more secure than other approaches, where a web framework defaults to a single db connection. Here, every single time, you need to explicitly tell Elixir which database to use by calling a configured Repo (and you can configure more than one).
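To make that concrete, here is a rough sketch of what such a get_repo/2 could look like using Ecto’s dynamic repos with the ecto_sqlite3 adapter. MyApp.Repo, TenantDb, authorize!/2 and the path scheme are hypothetical, and reuse/supervision of already-started repos is omitted:

defmodule TenantDb do
  # Lazily start (and implicitly create) the tenant's SQLite database,
  # then route this process's Repo calls to it.
  def get_repo(user, team) do
    :ok = authorize!(user, team)
    path = Path.join("data", "team_#{team.id}.db")

    {:ok, pid} = MyApp.Repo.start_link(name: nil, database: path, pool_size: 1)
    MyApp.Repo.put_dynamic_repo(pid)
    MyApp.Repo
  end

  # Permission check stub; a real app would verify the user belongs to the team.
  defp authorize!(_user, _team), do: :ok
end

In practice you would register the started repo (in a Registry or under a DynamicSupervisor) and reuse it instead of starting a fresh instance on every call.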
At scale this becomes less appealing. There is a cost to opening/closing each db, and at scale you won’t be able to keep all of them open all the time. With any kind of concurrent usage of a single db, your app will need to coordinate who the “writer” is. If your app moves to multiple nodes, the work will need to be routed to the node that hosts the SQLite db. And if your app is transaction heavy, you will end up doing a lot more I/O; syncing a single Postgres WAL is significantly more efficient than syncing an array of SQLite dbs.
@dimitarvp thanks again for the reference to dynamic repos. Using this info, I wrote a prototype library called DynDb to manage dynamic SQLite databases. Interested to get feedback re: API design and potential gotchas. Have a look!
#!/usr/bin/env elixir

Mix.install([
  {:dyn_db, github: "andyl/dyn_db"}
])

# Write a Migration
defmodule Migration0 do
  use Ecto.Migration

  def change do
    create table("chats") do
      add(:message, :string)
      timestamps(type: :utc_datetime_usec)
    end
  end
end

# Write a Schema
defmodule Chat do
  use Ecto.Schema

  schema "chats" do
    field(:message, :string)
    timestamps(type: :utc_datetime_usec)
  end
end

# Interact with the DynDb
defmodule Main do
  def main do
    # The database file will be auto-created
    {:ok, db1} = DynDb.start_link(database: "./data1.db")

    # Migrate
    DynDb.migrate(db1, Migration0)

    # Query
    DynDb.insert!(db1, %Chat{message: "HELLO at #{Time.utc_now()} UTC"})
    DynDb.all(db1, Chat) |> IO.inspect()
  end
end

Main.main()
Could you say a little more about the overall design and thought process in terms of scalability and failover?
And my understanding is that SQLite can only have one connection, which might be fine if you guard access using a single GenServer.
When using multiple SQLite databases, which are usually on the same file system as the application accessing them, how do you then move to supporting a multi-node configuration?
Does every node mount a shared drive or file system to access the SQLite files? Is there a primary node that opens connections to the databases, with every other node connecting to it? Or is there a global registry that is synced across all connected nodes, where each node has access to the same mounted file system and can check out and connect to databases, but the registry only allows one node to connect to a given database at a time?
This is an old constraint. As long as you open your database with journal_mode: :wal you should be fine with multiple processes writing to the same database. Where you WILL run into potential issues is if you are writing to a database file that is on an NFS mount; then your file system locks won’t be perfect. I haven’t had any issues yet, but I suspect that if I have network issues, that NFS mount is going to suffer.
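For reference, with the ecto_sqlite3 adapter the journal mode can be set in the repo config; a minimal sketch, where the app name, repo module and database path are placeholders:

# config/config.exs
config :my_app, MyApp.Repo,
  database: "data/tenant_123.db",
  journal_mode: :wal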
With how ecto_sqlite3 and exqlite are currently built, they use DBConnection to pool the SQLite file handles. We have had discussions about moving to a single GenServer for writes, with reads going to other GenServers. But right now each node has its own pool of handles open for the database.
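As an illustration of that “single writer” idea (not how ecto_sqlite3 is actually structured today), here is a minimal sketch using the low-level Exqlite.Sqlite3 API; SingleWriter is a hypothetical module and parameter binding is omitted:

defmodule SingleWriter do
  use GenServer

  # One writer process per database file; every write is serialized through it.
  def start_link(path), do: GenServer.start_link(__MODULE__, path)

  def write(pid, sql), do: GenServer.call(pid, {:write, sql})

  @impl true
  def init(path), do: Exqlite.Sqlite3.open(path)

  @impl true
  def handle_call({:write, sql}, _from, conn) do
    # execute/2 runs a plain SQL string against the open handle.
    result = Exqlite.Sqlite3.execute(conn, sql)
    {:reply, result, conn}
  end
end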
I have a personal project I’ve been toying with where I have thousands of SQLite databases. The data is partitioned that way so that I can do a really dumb map reduce. I have it set up to use dynamic repos, which lets the connection pool go down to 1 and execute on that. All of the databases are local to the instance and replicated out to a file server for backups. I avoided opening the databases over NFS for fear of network gremlins.
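The shape of that map reduce, as a sketch under the same dynamic-repo assumptions as above (MyApp.Repo, the glob pattern and the Chat schema are placeholders):

"data/shards/*.db"
|> Path.wildcard()
|> Enum.map(fn path ->
  # One handle per shard, used briefly and then shut down.
  {:ok, pid} = MyApp.Repo.start_link(name: nil, database: path, pool_size: 1)
  MyApp.Repo.put_dynamic_repo(pid)
  count = MyApp.Repo.aggregate(Chat, :count)
  Supervisor.stop(pid)
  count
end)
|> Enum.sum()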
There are a lot of ways to slice this, and just make sure you have a plan in place to deal with the faults that will happen.
Multiple handles all pointing to the same SQLite database file.
Like @warmwaffles says, you can also open the DB in wal or wal2 mode in combination with any of options 2 and 3, and then your SQLite is effectively like a local Postgres. It would still struggle with a large number of writes per second (I reckon 2000 or more), but most apps needing SQLite don’t do that so…