Seeking guidance implementing DbConnection

I’m working on an Elixir client for LanceDB (sqlite for vector storage) called ElixirLanceDB. Most of this work so far has comprised of writing Rustler NIFs over the LanceDB rust client. I’ve got enough of the library implemented that I can use it for a basic RAG / hybrid search server, and I plan to keep fleshing it out steadily and would eventually like to release it on Hex.

I’d like to implement DBConnection so that I can plug it into Ecto and AshDataLayer and the rest of the ecosystem.

The problem I’m running into is, I don’t understand a lot of these DBConnection callbacks, and they don’t all seem to apply to LanceDB in the same way as they’d apply to Postgres/MySQL/SQLite etc.

For example, a connection to the database itself is really only used for opening connections to tables. All the querying etc happens on the table. So I’d actually want to maintain connection pools both to the DB itself as well as to individual tables.

I’d want to be able to plug into migration systems, but I probably also want the ability to create tables on the fly.

I have a pretty good handle on the LanceDB rust client at this point, but I’m a little overwhelmed by this DBConnection aspect. I guess my big broad question here is, can anyone give me some pointers or suggest some reading or otherwise help me wrap my head around this? Thanks!

4 Likes

Can you point to some documents on how to connect to LanceDB? Requiring a persistent connection per table sounds quite unusual.

1 Like

I am just finishing preparing my Rust NIF wrapper library for Ecto integration. This has mandated a number of changes that I couldn’t easily guess by myself e.g. supporting savepoints and finding the primary key value of a just-inserted record.

Full disclosure: I made heavy use of Gemini Pro v2.5 for this particular task and asked it to analyze the existing first-class Ecto integrations (PostgreSQL, MySQL, SQL Server). It provided a very nice checklist that I am still working through and I am almost done – only schema introspection is left, after which I will proceed with writing another library that is using my Rustler NIF library but adds all the necessary Ecto boilerplate (Elixir code of course) to make it a good Ecto citizen. My goal is a 99% drop-in replacement for the other 3 main contenders (well, and for the other SQLite NIF library that is using C code: exqlite).

This hybrid approach – asking an LLM for a discussion and an analysis, making it generate some code, rigorously checking it against what I wrote (or would write), and asking to check if I missed corner test cases, and a few other such tension items – so far has netted me fantastic results and mildly increased productivity, while keeping me firmly on the wheel (I want an LLM to be an enabler and a productivity booster, not a brain replacement). Most code I wrote came from me with some exceptions like e.g. super long list of error mappings from an underlying Rust library’s error enum to my own type of error (also a Rust enum) that gets encoded to {:error, reason} tuples. I have only used the LLM for such laborious and annoying tasks that would waste me a lot of time. (And still checked them manually by eyeballing, top to bottom).

…oh, and I just used it to generate me a README. It did so surprisingly well.

If you like, we can pair on your library one of the following days, weeks, months. Most of that Ecto integration checklist is still warm & fresh in my head and I’ll start working my way through it VerySoon™. We could do screen share sessions or, if you are not comfortable with that, we can chat via text here or somewhere else. I believe we can help each other. Or you can choose to go your own way if that helps you learn in a way that sticks with you for longest.

I could paste you that checklist here but I am afraid the thread could become quite big and noisy.

1 Like

You might not even need DBConnection. Since LanceDB can speak SQL then you could maybe try shim it with Ecto.SQL.Adapters.SQL.

Or look at GitHub - Schultzer/ecto_qlc: QLC-based adapters and database migrations for Ecto or GitHub - foundationdb-beam/ecto_foundationdb: FoundationDB Adapter for Elixir's Ecto for none DBConnection Ecto adapters.

4 Likes

There’s a lot of things I learned (and still learning!) about implementing an Ecto Adapter that aren’t written down anywhere. I’ve been meaning to write a blog post about the experience, but I don’t even have a blog yet, so that’s still on the project list.

One of those things was realizing I shouldn’t use DBConnection for my particular backend. I don’t know enough about LanceDB to make a call, but if it were me, I’d start without it and add it back in later if needed.

A bit off topic, but this is actually something I’m wrestling with right now on my Ecto Adapter, so I’m curious about what’s in your future for your project.


Aside: I don’t know if folks would be interested in connecting on the EEF Slack for more realtime collab but I’d be up for it.

4 Likes

They don’t apply that well to sqlite as well: Replace DirtyNIF execution model with a different mechanism · Issue #192 · elixir-sqlite/exqlite · GitHub

I’d go even further and ask how interaction with LanceDB looks like from ecto in the first place – especially around querying. Ecto in theory can deal with none sql databases, but I don’t think this has ever been fully and thoroughly executed.

1 Like

Unless I got this all wrong, a DirtyNIF call taking multiple seconds is not a problem at all since we have a number of dirty scheduler OS threads and that’s what they are for – to allow NIFs to slack off. A function call taking its sweet time while running on one does not ruin the BEAM VM’s guarantees per se. F.ex. the 1ms rule (“every function call must return within 1ms”) only applies on normal schedulers, not on dirty ones. Hence in projects that are in danger of having longer-running dirty NIFs bumping the value of dirty schedulers is recommended f.ex. add -SDio 40 -SDcpu 40 to the BEAM VM startup options gives us 40 dirty I/O scheduler OS threads and 40 CPU ones. Super recommended if f.ex. you have an SQLite connection pool that’s 30-35 connections big.

I have read that thread multiple times and since I am extremely close to feature parity with other DB adapters and I am about to start the Ecto integration, I thought of interjecting on that thread multiple times. F.ex. I am not at all convinced that it’s useful knowing if an SQL statement you want to execute is read (SELECT) or write (INSERT, UPDATE, DELETE). But we’ll resume that discussion there.

Mostly it is about: get another connection handle valid for another Elixir process that actually describes the same database to (1) rid us of having to thoroughly inspect and abide by multi-threading caprices of the underlying storage engine and (2) allow closer integration with other Ecto-friendly code that assumes a connection pool, and that has numerous advantages f.ex. using DBConnection allows you to carry another state in the connection – SQLite version and branching on certain sensitive actions like whether INSERT ... RETURNING ... is at all supported comes to mind. Also transactions and savepoints (technically this is Ecto and not DBConnection though) fit really well. Thus, DBConnection support is actually very useful.

I absolutely am up for it but as I am still neck-deep in job search, my schedule is not stable and very much not my own yet, sadly. We can try and arrange a session.

1 Like

I’m wondering if you’re to deep into the weeds here. Yes there is low level details of connection handling. That’s not the only thing to deal with though when ecto is meant to interact with a db. I don’t know much about lancedb, but my point about looking at the higher level was mostly about the question if sql is actually sufficient or if there needs to be other querying apis and stuff like that. If it’s sql, then what sql statements are required to be supported, does Ecto.Query support building up all the necessary statements, do migrations support what is needed, …

DBConnection is also not at all required to build an ecto adapter. It just happens to encapsulate a lot of pooling requirements that you encounter when connecting to a database “server” over tcp. It might or might not be the right tool for a local database, which comes with different tradeoffs and limitations.

2 Likes

Oh I absolutely am, no doubt about it. I am coding my thing in half-conscious frenzy and fever as I am feeling productive and motivated to do hobby coding like I have not been in at least 5 years. Can I get a hug?

If you feel I am missing something important or veering off into a direction that’s not related to the goals of those projects then please – your perspective will be appreciated. And I’ll announce the library on the forum soon enough, so anyone will be able to criticize. Ideally I’d love to avoid missing something hugely important and embarrass myself but oh well, this does not sting me as much as it did before (in fact it almost does not these days).

Yep, true, and that’s what the mentioned Ecto integration checklist is for. It’s not exactly small but it’s also very doable.

Well, if one wants to be able to f.ex. to have the normal Ecto.Query DSL then the raw SQL is not enough, no? My own library even now can do almost everything that SQLite can do (minus a few things like UDF, backups etc. that I don’t deem critically important for a first functional version) via raw query and statement NIF APIs. But I don’t think that’s good enough, I want full Ecto integration and I am starting on it today or tomorrow.

This is the thing that took me a little time to unpack and I claim that DBConnection looks to be a nice fit for a local database as well.

In the SQLite “no mutex” mode (which is the most recommended one and the one I’ll default to in my library) you can get 50 connection handles to the same 1 database as long as every handle is only used by one OS thread (and thus an Elixir process) at a time. You could share a single connection handle to 50 OS threads / Elixir processes, nobody can stop you, but you’ll have to wait on a Mutex, effectively serializing access and making a connection pool superfluous.

Hence, if doing Ecto integration as one should, DBConnection lends itself excellently to a local database as well because allocating a connection handle per checkout request naturally leads to getting another handle that points at the same database – just as with TCP/UDP-connected databases.

Do you feel I am missing – or have misunderstood – something important?

You do not need Ecto.Query for prepared queries. Ecto.Query generates parameterized and caches them in ets so they can be used as prepared queries.

1 Like

Nice, thanks. I probably got overzealous in trying to achieve some nebulous 100% coverage.

The future for generating sql will be GitHub - elixir-dbvisor/sql: Brings an extensible SQL parser and sigil to Elixir, confidently write SQL with automatic parameterized queries.

By using the basic engineering principle of separation of concern you can built something very powerful with Enumerable and Collectable protocols.

2 Likes

Really cool idea using Enumerable and Collectable directly on queries and implementing the protocol right on the Ecto.Repo.

So, it’s not that a persistent connection is required so much as that it’s preferable. In order to query a table you have to open it first. That incurs some overhead, but I can store the connection reference in a rustler ResourceArc to share the open table back to the BEAM.

This isn’t documented, but is something that a LanceDB team member told me on discord in response to a very direct question about it.

Hot diggity. Yeah - users do not connect to LanceDB over tcp at all. You can have basically limitless reads and appends, and you only want to throttle concurrent update/delete (mutation) operations to a recommended 10 max. The exqlite discussion you linked is really useful because it seems like similar dynamics are at play.

I am not sure where I got the impression that I needed to implement DBConnection to create an Ecto Adapter but I’m going to regroup and try to attack this directly as an Adapter.

It’s not a noSQL DB, fwiw. “LanceDB supports a growing list of SQL expressions.” including DataFusion math functions. However it’s not relational, i.e. there are no joins.

It also has operations like “nearest_to” for vector similarity search, as well as full-text search, and hybrid “vector + full-text” search with configurable reranking strategies. Very good for RAG. So if I can make an Ecto adapter that supports that kind of stuff either via options or some DSL, that will be ideal.

3 Likes

Thank you. I’m marking this as the answer because it answered the question I didn’t even think to ask: do I need DBConnection for this use case? The answer seems to be no, mostly because LanceDB doesn’t need to connect over TCP. I’m going to look into creating an Ecto Adapter instead.

Not really looking to pair program at the moment. Really just trying to wrap my head around what I need to do to plug into the Ecto ecosystem and it’s looking like I won’t actually need to implement DBConnection to make an Ecto adapter. Woohoo! I definitely agree LLMs can be great learning tools etc.

1 Like

Yea I don’t think DBConnection is necessary for you. Even for the whole “opening a table” concept you could have some sort genserver / ets table in the background that handled this. Just make sure to scope it to a particular repo.

1 Like

I am potentially interested in chatting in the EEF slack! Not sure how to join it though.

Thanks so much. This whole convo has been super clarifying. Good forum is good :smiley:

1 Like