Is Explorer's ADBC module orthogonal to Ecto?

vegabook · October 3, 2023, 6:48pm

I’m in the early stages of writing a Bloomberg API wrapper for Elixir and recently asked about Ecto’s support for columnar databases.

Now I’ve discovered that Nx’s Explorer has an ADBC connectivity driver, which suits my needs really well.

However, my sense has always been that Ecto is the canonical database interface library for Elixir, so now I’m wondering how this ADBC driver relates to Ecto. ADBC’s columnar (Arrow) data model is much more natural for data science and, by extension, for the Bloomberg API’s tendency to produce huge time series. But it seems much more rudimentary for row-oriented workflows. Even in a time-series database, where the data itself needs column orientation, you also need other tables, such as ticker descriptions or ticker groups, that are more row-oriented.
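To make the row-versus-column distinction concrete, here is a small language-agnostic sketch in Python (not Elixir, Explorer or ADBC code). It stores the same made-up price observations both ways: a list of records, which is roughly the shape an Ecto schema maps onto, and a dict of column lists, which is roughly the shape Arrow-based tools like ADBC work with. The ticker names and prices are invented sample values.

```python
# Sketch: the same daily prices stored row-wise and column-wise.
# Ticker names and prices are invented sample values, not real data.

# Row-oriented: one record per observation.
rows = [
    {"ticker": "AAA", "date": "2023-10-02", "close": 10.0},
    {"ticker": "AAA", "date": "2023-10-03", "close": 10.5},
    {"ticker": "AAA", "date": "2023-10-04", "close": 11.0},
]

# Column-oriented: one list per field, all lists the same length.
columns = {
    "ticker": ["AAA", "AAA", "AAA"],
    "date": ["2023-10-02", "2023-10-03", "2023-10-04"],
    "close": [10.0, 10.5, 11.0],
}


def mean_close_rows(records):
    # Walks record by record, pulling the one needed field out of each row.
    return sum(r["close"] for r in records) / len(records)


def mean_close_columns(table):
    # Reads only the single column the aggregate needs.
    closes = table["close"]
    return sum(closes) / len(closes)


def to_columns(records):
    """Pivot a list of row records into a dict of column lists."""
    keys = records[0].keys()
    return {k: [r[k] for r in records] for k in keys}
```

Both layouts give the same answer (`mean_close_rows(rows)` and `mean_close_columns(columns)` are both 10.5); the difference is that a columnar aggregate only has to read the `close` values, which is what makes that layout attractive for long time series.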

So my question is: what is the plan for this ADBC driver? Will it be merged into Ecto somehow? Can it already be used via Ecto? Should Ecto be chucked out altogether for data science workflows?

A bit of guidance on how the developers are thinking on this would be useful. Thanks.

josevalim · October 3, 2023, 9:40pm

Should Ecto be chucked out altogether for data science workflows?

IMO, if you can use Explorer for data science, then that would be my choice, because it was designed for this use case. Ecto was designed for business/application logic (and many of its operations are structured with that goal in mind).

vegabook · October 3, 2023, 10:02pm

So the issue is this: let’s say you have a universe of 500 000 possible securities, which I believe is a lower bound on what the Bloomberg terminal is capable of showing. Each security has tons of metadata associated with it. Also, for, say, a stock index or a yield curve, many tickers must be joined up into “groups”. So as you can see, we’re starting to have transaction-like OLTP workflows here in addition to the columnar store for ticker data.

We have:
ticker_timeseries_data - many_to_one - ticker
ticker - many_to_many - ticker_groups

We may also, when we use Nx in a “live” setting (streaming inbound data), have analyses that must be performed live too. So we need to keep track of those:

analysis - many_to_many - ticker
analysis - many_to_many - ticker_group
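As a rough illustration of the relationships listed above, here is a hypothetical sketch in Python (not Ecto or Explorer code) that lays out part of that schema as column-oriented tables. The many_to_one link is just a `ticker_id` column on the time-series table, and the many_to_many link is a separate link table with one row per (ticker, group) pair. All table names, column names and values are illustrative assumptions; the analysis relationships would follow the same link-table pattern.

```python
# Hypothetical columnar layout of the schema above (dict of column lists).
# Table names, column names and values are illustrative only.

ticker = {
    "id": [1, 2, 3],
    "symbol": ["AAA", "BBB", "CCC"],
}

# many_to_one: every time-series row points at exactly one ticker.
ticker_timeseries_data = {
    "ticker_id": [1, 1, 2, 3],
    "date": ["2023-10-02", "2023-10-03", "2023-10-02", "2023-10-02"],
    "close": [10.0, 10.5, 20.0, 30.0],
}

ticker_group = {
    "id": [10, 20],
    "name": ["index_x", "curve_y"],
}

# many_to_many: a link table holding one row per (ticker, group) pair.
ticker_group_members = {
    "ticker_id": [1, 2, 2, 3],
    "group_id": [10, 10, 20, 20],
}


def _member_pairs():
    # Iterate the link table as (ticker_id, group_id) pairs.
    return zip(ticker_group_members["ticker_id"], ticker_group_members["group_id"])


def tickers_in_group(group_id):
    """Ticker ids belonging to group_id, found by scanning the link table."""
    return [t for t, g in _member_pairs() if g == group_id]


def groups_of_ticker(ticker_id):
    """Group ids that ticker_id belongs to, found by scanning the link table."""
    return [g for t, g in _member_pairs() if t == ticker_id]
```

For example, `tickers_in_group(10)` returns `[1, 2]` and `groups_of_ticker(2)` returns `[10, 20]`. The point is only that the relational shape itself does not require a row store; the link tables are small, while the time-series table is the one that benefits from columnar storage.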

This looks a lot like “orders - customers” or “users - roles”, for which Ecto is great. But Ecto is not great for columnar data, and ADBC is the exact converse.

So I basically have to have two database abstractions in my app. This is because, unlike Python and R, which own “static” data science, IMO the big pitch for Elixir is “live” data science with streaming data. But live is in many cases a much more complex situation than static, where workflows are often one-off. A live application inevitably has more structure, which is where Ecto comes in. Also, Phoenix LiveView is miles ahead of anything else (except maybe ObservableHQ) in this regard.

Given the whole Elixir/BEAM stack’s far better story on live applications than Python or R, I’m making sure the Bloomberg code I’m writing is geared that way too (in addition to “batch” history). Rblpapi, for example, has no concept of “live” subscription data, and in Python it’s hard.

Anyway, I thought it was worthwhile laying out my thoughts on this. For now I’ll go with Ecto + ADBC.

josevalim · October 3, 2023, 10:43pm

Interesting… couldn’t Explorer joins help here? See the Explorer.DataFrame documentation (Explorer v0.7.1).
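To show why joins might cover the “groups” case, here is a language-agnostic sketch in Python of an inner hash join between two column-oriented tables. It is not Explorer’s API (Explorer exposes joins via `Explorer.DataFrame.join`; check its docs for the exact options). It only illustrates the idea that a ticker-to-group link table can live as another data frame and be joined onto the time series on demand. The table contents and the helper names `inner_join` and `filter_eq` are hypothetical.

```python
# Sketch of an inner hash join on column-oriented tables (dict of column lists).
# Helper names and sample data are hypothetical; this is not Explorer's API.


def inner_join(left, right, on):
    """Inner hash join of two columnar tables on the shared key column `on`.

    Builds a hash index over right[on], then probes it with each left[on].
    Output columns: all of left's columns, then right's non-key columns.
    """
    overlap = (set(left) & set(right)) - {on}
    if overlap:
        raise ValueError(f"overlapping non-key columns: {sorted(overlap)}")
    index = {}  # key value -> list of row positions in `right`
    for j, key in enumerate(right[on]):
        index.setdefault(key, []).append(j)
    right_cols = [c for c in right if c != on]
    out = {c: [] for c in list(left) + right_cols}
    for i, key in enumerate(left[on]):
        for j in index.get(key, []):  # one output row per matching pair
            for c in left:
                out[c].append(left[c][i])
            for c in right_cols:
                out[c].append(right[c][j])
    return out


def filter_eq(table, col, value):
    """Keep only the rows where table[col] == value."""
    keep = [i for i, v in enumerate(table[col]) if v == value]
    return {c: [vals[i] for i in keep] for c, vals in table.items()}


series = {
    "ticker_id": [1, 1, 2, 3],
    "date": ["2023-10-02", "2023-10-03", "2023-10-02", "2023-10-02"],
    "close": [10.0, 10.5, 20.0, 30.0],
}
members = {"ticker_id": [1, 2, 2, 3], "group_id": [10, 10, 20, 20]}

# Attach group membership to every observation, then select one group.
joined = inner_join(series, members, on="ticker_id")
group_10 = filter_eq(joined, "group_id", 10)
```

Here ticker 2 belongs to two groups, so its observation appears twice in `joined`, once per group. Selecting group 10 leaves the rows for tickers 1, 1 and 2, with closes `[10.0, 10.5, 20.0]`. Hashing the smaller link table and streaming the large time-series table past it is the usual way such joins stay cheap.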