Using Ash with DuckDB/MotherDuck: Possible and/or Useful?

alecStewart1 · January 30, 2025, 5:28pm

Hey friends!

I had this thought given an actual real world problem at work.

Generally whenever you have some kind of front-facing app that consumes data from somewhere, usually an API of sorts, if you’re using data from your own company or somewhere else the data the API uses is stored in some kind of database somewhere.

The most basic example is:

Hosted database → something like an API that’ll get data from it → application

And that’s actually fine for a lot of cases. Obviously there’s stuff like caching and backing up your data that has to be thought of, but the general flow of data is pretty simple in a broad sense.

However there’s one thing I thought of to consider, especially in a case at work.

What if the API is only used to get data from a database? If you’re not actually updating or removing anything from the database, this is maybe where you might have a usecase for using an OLAP. It’s optimized for analyzing data as opposed to transacting with it.

There’s several solution, but the one I’m focused on is DuckDB or MotherDuck (it’s DuckDB under the hood).

If I wanted to use Ash with Motherduck, Ash to model the entities for manipulating for analysis or whatever with the API, could I create a custom Data layer that will connection to a running instance of DuckDB? Or would this get a little hairy?

zachdaniel · January 30, 2025, 6:05pm

You can definitely do that kind of thing

The method would range from the very simple, using generic actions to just do “whatever you want”: Generic Actions — ash v3.4.60

To very complex, like writing a custom data layer. There are some examples out there of non-relational or unconventional data layers, like ash_csv, ash_cubdb and ash_sanity etc.

alecStewart1 · January 31, 2025, 3:28pm

Awesome! I think for our cases using generic actions might work, however I feel like it would be more elegant to use a custom data layer. duckdbex would likely be used for this. Looks like it’s not too hard, but I’d have to make sure I understand how everything works and not just copy paste stuff from other custom data layers.

However I did remember that there is a PostgreSQL extension for analytics, which embeds DuckDB into PostgreSQL, so maybe creating a postgrex extension like geo_postgis might be easier considering that it doesn’t seem like pg_analytics introduces any new types like PostGIS does.

I think it’ll be a matter of how we want to structure things, but thanks you, Zach! You’re always super helpful.