Livebook as a replacement for scheduled ETL?

felix-starman · April 3, 2024, 5:16pm

I have been writing Elixir for the better part of a decade and use Livebook to solve small day-to-day operational problems for myself. I am trying to encourage our data analyst to investigate it as well, instead of recreating our Ecto queries in SQL and then running them in BigQuery, which seems like a waste.

A lot of these need to be nightly scheduled queries, but I don’t want to require them to write Oban jobs if they just want to get started.

Has anyone come up with a way to run a livebook on a schedule without user input?

jonatanklosko · April 3, 2024, 6:03pm

Hey @felix-starman, we plan to explore scheduled execution in the future, most likely in relation to Livebook Teams. It is tracked in livebook-dev/livebook issue #2091, “Allow scheduled execution of multi-session apps”.

That said, you should be able to create a notebook or app that runs code periodically. Here’s a quick idea:

<!-- livebook:{"app_settings":{"slug":"cron"}} -->

# Cron

```elixir
Mix.install([
  {:kino, "~> 0.12.3"},
  {:quantum, "~> 3.5"}
])
```

## Section

```elixir
defmodule Scheduler do
  use Quantum, otp_app: :scheduler
end

defmodule Tasks do
  require Logger

  def hello() do
    Logger.info("Hello #{DateTime.utc_now()}")
  end
end

Application.put_env(:scheduler, Scheduler,
  jobs: [
    {"* * * * *", {Tasks, :hello, []}}
  ]
)

Kino.start_child(Scheduler)
```

In the task you could even do RPC into a remote node.
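As a rough sketch of the RPC idea, a task could wrap `:erpc.call/5` from Erlang/OTP. The `RemoteTasks` module, the node name `:"my_app@prod-host"`, and `MyApp.Reports.nightly/0` are hypothetical names used only for illustration, and this assumes the notebook runtime can connect to the remote node (same cookie, reachable over distribution).

```elixir
defmodule RemoteTasks do
  require Logger

  # Runs `mod.fun(args...)` on `node` and wraps the outcome in a tagged tuple,
  # so the caller can decide how to handle a failed or timed-out call.
  def call(node, mod, fun, args, timeout \\ :timer.minutes(5)) do
    {:ok, :erpc.call(node, mod, fun, args, timeout)}
  catch
    kind, reason ->
      Logger.error("RPC to #{inspect(node)} failed: #{inspect({kind, reason})}")
      {:error, {kind, reason}}
  end
end

# Hypothetical usage from a scheduled task:
# RemoteTasks.call(:"my_app@prod-host", MyApp.Reports, :nightly, [])

# Calling the local node goes through the same code path and is handy for trying it out.
RemoteTasks.call(node(), String, :upcase, ["etl"])
```

A Quantum job entry could then point at a small function in `Tasks` that calls `RemoteTasks.call/4` with the production node and module.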

You can deploy that notebook as an app, so it is more “permanent”, and you most likely want it to run on a cloud instance. There are already options to deploy with Docker, and we are working on Livebook Teams, where you will be able to deploy an app like this into your infrastructure with a single click.

These are just some quick notes, this is definitely open to ideas and future features : )

felix-starman · April 3, 2024, 6:10pm

Fantastic. Thank you <3

As I’m unaware of the inner workings of Livebook and if/how modules are scoped, is it possible to reference the modules of other notebooks? That “feels” really gross to me as a dev, but I am imagining our data folks wanting to have one notebook per “job”, in which they would define everything in a module and use the other sections for running it “manually”, plus a separate notebook containing the “cron” settings (whether Quantum or Oban, probably Quantum since there’s no Postgres dependency).

The alternative, of course, is making a separate Hex package and just using the :git / :github options, which is viable but has more churn.

EDIT: I should note that for some reason I thought deploying it as an app still required a user interaction/session for it to be evaluated/initialized, so that’s nice to know that is not the case.

jonatanklosko · April 3, 2024, 6:41pm

Notebooks are isolated, so there is no way to share modules between them.

In the notebook you could do this:

```elixir
button = Kino.Control.button("Run hello") |> Kino.render()
Kino.listen(button, fn _ -> Tasks.hello() end)

if Kino.Hub.app_info().type != :none do
  Kino.start_child(Scheduler)
end
```

So there is an explicit button to run the task on demand, and the cron is only started when the notebook runs as an app (as opposed to just being open and edited). This way you can reuse the same notebook for both manual and scheduled runs. It should also be fine to use a single notebook that defines all the tasks: each of them can be run with a button, and there’s a single cron config for all of them.
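For illustration, a single cron config covering several tasks might look roughly like this. `Tasks.nightly_report/0` and both schedules are hypothetical placeholders; the `:scheduler` app name and `Scheduler` module follow the earlier notebook, and Quantum evaluates cron expressions in UTC unless configured otherwise.

```elixir
defmodule Tasks do
  require Logger

  def hello() do
    Logger.info("Hello #{DateTime.utc_now()}")
  end

  # Hypothetical second task, standing in for a real nightly query.
  def nightly_report() do
    Logger.info("Nightly report started at #{DateTime.utc_now()}")
  end
end

# One job list for all tasks; the scheduler reads it when it starts.
Application.put_env(:scheduler, Scheduler,
  jobs: [
    # Every minute, as in the original example.
    {"* * * * *", {Tasks, :hello, []}},
    # Every day at 02:00.
    {"0 2 * * *", {Tasks, :nightly_report, []}}
  ]
)
```

Each task can still get its own button for manual runs, while the scheduler itself is only started when the notebook runs as an app.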

If you want to define tasks in separate notebooks and have a single notebook with a cron, then the modules would need to be a dependency or perhaps they would live in a production node that the notebook RPCs into (which may be the way to go anyway, so that you can use all the application code/setup, but that depends on the exact use case).

larshei · July 23, 2024, 10:53am

I am very much interested in exactly this use case as well.

Currently running a lot of “Pipelines” (ETL) in Azure, which are … not my personal favourite.

Looking to replace those with scripts or livebooks. We already have Livebook Teams available but unfortunately haven’t gotten to use it much within the team as of yet.

I think that, instead of scheduling, we would run a RabbitMQ consumer in each notebook and use messages to trigger runs.

That way, the livebook can be deployed permanently and only does work when required.

josevalim · July 23, 2024, 11:29am

Latest Livebook supports running a proxy/web server too, so you could also listen to HTTP requests and start workflows.

ghannam80 · July 23, 2024, 11:48am

I see Livebook as very well suited to being a host or ecosystem where many smaller components can interact with each other (currently we have smart cells, though they may not be 100% the same concept).

Considering that we already have Kino forms and cells to connect to databases, send API requests, and listen to HTTP requests, and maybe later pub/sub, notifications, a scheduler, etc., being able to build workflows by just connecting those components on top of Livebook makes us realize we are not using Livebook to its full potential.