Using the dagger Elixir SDK to refactor a script

In this post I’d like to share my experience working with the Elixir SDK for dagger. dagger positions itself as a “programmable CI”, but I used it to rewrite a shell script: with the Elixir SDK for dagger, I was able to simplify my write-only fish-shell script significantly.

I’m impressed with the results and intend to use it as my go-to approach for future scripts involving code that needs to be isolated in a Docker image or container.

The features of the Elixir SDK for dagger that I used are:

  • building images,
  • running commands in images,
  • copying files between images and onto the host,
  • downloading files from a URL.

The coolest things I found about doing it via dagger are:

  • caching by default,
  • ability to implement the script fully locally,
  • minimal external dependencies (only the elixir command and a running Docker daemon are needed).

The API reference for all available features can be found here: Dagger GraphQL API Reference. Plenty of examples can be found here (it’s a long page, so make sure to look at the list on the right): Cookbook | Dagger. It includes things like:

  • cloning Git repos,
  • setting ENVs and attaching secrets,
  • integrating with things like GitHub Actions,
  • re-using existing Dockerfiles,
  • and much more.

It’s worth noting that, at the time of writing, the SDK is not released on Hex, but it can still be installed and used from source quite easily.
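Later in this post the SDK is pulled into a standalone script with Mix.install/1. If you’d rather use it from a regular Mix project, a minimal sketch is to put the same `{:dagger, github: "dagger/dagger", sparse: "sdk/elixir"}` source into `deps/0`. The project name `GsCompare` / `:gs_compare` is a hypothetical placeholder:

```elixir
# mix.exs (sketch): declares the dagger Elixir SDK straight from GitHub,
# since it is not published on Hex at the time of writing.
defmodule GsCompare.MixProject do
  use Mix.Project

  def project do
    [app: :gs_compare, version: "0.1.0", deps: deps()]
  end

  defp deps do
    [
      # `sparse:` checks out only the sdk/elixir directory of the dagger
      # monorepo and uses it as the dependency root.
      {:dagger, github: "dagger/dagger", sparse: "sdk/elixir"}
    ]
  end
end
```

After that, `mix deps.get` fetches the SDK like any other Git dependency.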

Kudos to @wingyplus for implementing the Elixir SDK! :clap:

So first, the context.


A few weeks ago I was working on changing the base of a Docker image used in one of our services from debian to alpine. While testing the resulting image, I discovered that the ghostscript utility was producing different output from what I’d seen before. To be able to discuss this issue with my colleagues, I needed to find out what caused it. This led me to work on a minimal reproduction example showing the difference between “before” and “after”.

The minimal reproduction example was a fish script that invoked various docker commands to achieve the result. It looked like this:

Script using fish and docker
#!/usr/bin/env fish

set gs_command gs \
  -sDEVICE=png16m \
  -dGraphicsAlphaBits=4 \
  -dTextAlphaBits=4 \
  -r150 \
  -o /tmp/gs-test/page-%03d.png \
  /tmp/gs-test/Arizona_Dream.pdf

set download_file_command curl \
  --silent \
  --output /tmp/gs-test/Arizona_Dream.pdf \
  https://en.wikipedia.org/api/rest_v1/page/pdf/Arizona_Dream

mkdir -p /tmp/gs-test-1
docker run --interactive --detach --rm --name gs-test-1 --mount type=bind,source=/tmp/gs-test-1,target=/tmp/gs-test debian:bullseye-20230522 bash
docker exec gs-test-1 bash -c 'apt update; apt install -y curl ghostscript'
docker exec gs-test-1 $download_file_command
docker exec gs-test-1 $gs_command

mkdir -p /tmp/gs-test-2
docker run --interactive --detach --rm --name gs-test-2 --mount type=bind,source=/tmp/gs-test-2,target=/tmp/gs-test alpine:3.17 ash
docker exec gs-test-2 ash -c 'apk add curl ghostscript'
docker exec gs-test-2 $download_file_command
docker exec gs-test-2 $gs_command

mkdir -p /tmp/gs-test-result
docker run \
  --interactive \
  --detach \
  --rm \
  --name gs-test-result \
  --mount type=bind,source=/tmp/gs-test-1,target=/tmp/gs-test-1 \
  --mount type=bind,source=/tmp/gs-test-2,target=/tmp/gs-test-2 \
  --mount type=bind,source=/tmp/gs-test-result,target=/tmp/gs-test-result \
  alpine:3.17 \
  ash

docker exec gs-test-result ash -c 'apk add --quiet --no-progress npm'
docker exec gs-test-result ash -c 'npm install -g pixelmatch'
docker exec gs-test-result ash -c 'pixelmatch /tmp/gs-test-1/page-001.png /tmp/gs-test-2/page-001.png /tmp/gs-test-result/diff.png 0.1'

docker stop gs-test-1
docker stop gs-test-2
docker stop gs-test-result 

open /tmp/gs-test-result/diff.png

In essence, the script does 3 things:

  1. installs the ghostscript utility in a debian image and processes a PDF file,
  2. installs the ghostscript utility in an alpine image and processes the same PDF file,
  3. compares the results of steps 1 and 2 and copies a diff into the host filesystem.

Shortly after the script was finished, I noticed a “Dagger_ex - Dagger SDK for Elixir” topic and wanted to see if I could re-write my script in Elixir using the SDK.

The only prerequisites are a running Docker daemon and an Elixir installation.


First, I needed to bootstrap the SDK. It can be done like this:

Mix.install([{:dagger, github: "dagger/dagger", sparse: "sdk/elixir"}])

client = Dagger.connect!()

The last line can time out: behind the scenes it waits for the dagger Docker image to be pulled from the internet (only if it’s missing locally), which may take a while on slow networks. If it times out, retrying typically helps.
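If retrying by hand gets tedious, the connection call can be wrapped in a small retry loop. This is only a sketch under assumptions: `Retry.with_retries/2` is a hypothetical helper (not part of the SDK), and it assumes a timeout from Dagger.connect!/0 surfaces in the calling process as a raise, throw or exit that can be caught:

```elixir
defmodule Retry do
  # Hypothetical helper, not part of the dagger SDK.
  # Calls `fun` up to `attempts` times. A raise, throw or exit from any
  # attempt but the last one triggers another try; the last attempt's
  # failure is not caught and propagates to the caller.
  def with_retries(fun, attempts) when is_integer(attempts) and attempts > 1 do
    fun.()
  catch
    _kind, _reason -> with_retries(fun, attempts - 1)
  end

  def with_retries(fun, 1), do: fun.()
end

# Usage (assumes the SDK was installed with Mix.install/1 as above):
#
#   client = Retry.with_retries(fn -> Dagger.connect!() end, 3)
```

Catching every kind of failure keeps the sketch simple; a stricter version would retry only on the specific error a timeout produces.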

Now I needed to build a debian image and do some work there. Here’s how I do it:

gs_command = ~w(
  gs -sDEVICE=png16m
     -dGraphicsAlphaBits=4
     -dTextAlphaBits=4
     -r150
     -o /tmp/page-%03d.png
     /tmp/Arizona_Dream.pdf
)

download_file_command = ~w(
  curl --silent --output /tmp/Arizona_Dream.pdf https://en.wikipedia.org/api/rest_v1/page/pdf/Arizona_Dream
)

container1 =
  client
  |> Dagger.Query.container([])
  |> Dagger.Container.from("debian:bullseye-20230522")
  |> Dagger.Container.with_exec(~w(apt update))
  |> Dagger.Container.with_exec(~w(apt install -y curl ghostscript))
  |> Dagger.Container.with_exec(download_file_command)
  |> Dagger.Container.with_exec(gs_command)
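Both commands are written with the `~w` sigil because Dagger.Container.with_exec takes a list of arguments (an argv) rather than a shell string, and the command is run without a shell. A quick check in plain Elixir of what the sigil produces; the `naive` and `wrapped` names are just for illustration:

```elixir
# Lowercase ~w splits on any whitespace, newlines included, so this
# multi-line sigil becomes one flat list with one element per argument.
gs_command = ~w(
  gs -sDEVICE=png16m
     -dGraphicsAlphaBits=4
     -dTextAlphaBits=4
     -r150
     -o /tmp/page-%03d.png
     /tmp/Arizona_Dream.pdf
)
# gs_command == ["gs", "-sDEVICE=png16m", "-dGraphicsAlphaBits=4",
#                "-dTextAlphaBits=4", "-r150", "-o", "/tmp/page-%03d.png",
#                "/tmp/Arizona_Dream.pdf"]

# Since no shell parses the list, shell syntax just becomes more literal
# arguments handed to the program:
naive = ~w(pixelmatch /sample1.png /sample2.png /diff.png 0.1 || true)
# naive == ["pixelmatch", "/sample1.png", "/sample2.png", "/diff.png",
#           "0.1", "||", "true"]

# To get shell behaviour (like `|| true`), hand the whole command line to a
# shell explicitly, which is what the comparison step later does:
wrapped = ["sh", "-c", "pixelmatch /sample1.png /sample2.png /diff.png 0.1 || true"]
```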

Following pretty much the same pattern, I’m going to build an image for alpine and run a couple of commands there. It’s done like this:

container2 =
  client
  |> Dagger.Query.container([])
  |> Dagger.Container.from("alpine:3.17")
  |> Dagger.Container.with_exec(~w(apk add curl ghostscript))
  |> Dagger.Container.with_exec(download_file_command)
  |> Dagger.Container.with_exec(gs_command)

At this point each image has a bunch of files located at /tmp/page-%03d.png, e.g. 5 files (one per page). I’m only interested in the first page. Instead of copying the files to the host as the original script did, I grab references to the files I’m interested in and use them to copy the files into the final image:

sample1 = Dagger.Container.file(container1, "/tmp/page-001.png")
sample2 = Dagger.Container.file(container2, "/tmp/page-001.png")

Then I run a 3rd container to perform the comparison:

result =
  client
  |> Dagger.Query.container()
  |> Dagger.Container.from("alpine:3.18")
  |> Dagger.Container.with_file("/sample1.png", sample1)
  |> Dagger.Container.with_file("/sample2.png", sample2)
  |> Dagger.Container.with_exec(~w(apk add --quiet --no-progress npm))
  |> Dagger.Container.with_exec(~w(npm install -g pixelmatch))
  |> Dagger.Container.with_exec(["sh", "-c", "pixelmatch /sample1.png /sample2.png /diff.png 0.1 || true"])
  |> Dagger.Container.file("/diff.png")
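The `|| true` in the comparison step matters: pixelmatch exits with a non-zero status when the two images differ, and a with_exec step that exits non-zero makes dagger fail the pipeline, so /diff.png could never be exported. The effect of `|| true` can be checked locally with plain Elixir and `sh`; here `(exit 66)` is just an arbitrary stand-in for pixelmatch reporting differences:

```elixir
# A failing command on its own: this non-zero status is what would make
# dagger treat the with_exec step as failed.
{_output, status} = System.cmd("sh", ["-c", "(exit 66)"])
# status == 66

# The same command followed by `|| true`: the shell reports success, so the
# pipeline would carry on. The subshell parentheses matter here, because a
# bare `exit 66` would terminate sh before `|| true` ever runs.
{_output, masked} = System.cmd("sh", ["-c", "(exit 66) || true"])
# masked == 0
```

The trade-off is that `|| true` also hides genuine failures (for example, pixelmatch not being installed); in that case the step still succeeds and the problem only shows up when /diff.png turns out to be missing.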

Finally, copy diff.png to the host system as /tmp/diff.png:

Dagger.File.export(result, "/tmp/diff.png")

One thing that bothered me in the original script was downloading the file with curl. First, curl needs to be installed in each image. But also, I need to maintain a variable containing the curl command.

In the spirit of keeping the number of dependencies low, this can be simplified by downloading the file on the host with Erlang’s built-in HTTP client and mounting it into each container, like this:

:inets.start()
:ssl.start()

url = ~c"https://en.wikipedia.org/api/rest_v1/page/pdf/Arizona_Dream"
headers = []

path_to_file = "/tmp/Arizona_Dream.pdf"

http_request_opts = [
  ssl: [
    verify: :verify_peer,
    cacerts: :public_key.cacerts_get(),
    customize_hostname_check: [
      match_fun: :public_key.pkix_verify_hostname_match_fun(:https)
    ]
  ]
]

{:ok, :saved_to_file} =
  :httpc.request(:get, {url, headers}, http_request_opts, [stream: String.to_charlist(path_to_file)])

original_file =
  client
  |> Dagger.Query.host()
  |> Dagger.Host.file(path_to_file)

container1 = Dagger.Container.with_mounted_file(container1, "/tmp/Arizona_Dream.pdf", original_file)
container2 = Dagger.Container.with_mounted_file(container2, "/tmp/Arizona_Dream.pdf", original_file)

This works, but with code that depends on Dagger.Container.with_mounted_file/3 or Dagger.Container.with_file/3 (another API), the cache gets invalidated after a step like this (much like what typically happens after an ADD or COPY statement in a Dockerfile).

I remembered that the ADD statement in a Dockerfile supports an HTTP URL as its source, and dagger offers the same, so the above can be simplified to just a single line:

original_file = Dagger.Query.http(client, "https://en.wikipedia.org/api/rest_v1/page/pdf/Arizona_Dream")

What’s great about this is that it is completely cached, so there’s no need to re-download the file on every run of the script.

After some polishing, the final script looks like this:

Final result
Mix.install([{:dagger, github: "dagger/dagger", sparse: "sdk/elixir"}])

alias Dagger.{Query, Container}

gs_command = ~w(
  gs -sDEVICE=png16m
     -dGraphicsAlphaBits=4
     -dTextAlphaBits=4
     -r150
     -o /tmp/page-%03d.png
     /tmp/Arizona_Dream.pdf
)

client = Dagger.connect!()
file = Query.http(client, "https://en.wikipedia.org/api/rest_v1/page/pdf/Arizona_Dream")

sample1 =
  client
  |> Query.container()
  |> Container.from("debian:bullseye-20230522")
  |> Container.with_exec(~w(apt update))
  |> Container.with_exec(~w(apt install -y ghostscript))
  |> Container.with_file("/tmp/Arizona_Dream.pdf", file)
  |> Container.with_exec(gs_command)
  |> Container.file("/tmp/page-001.png")

sample2 =
  client
  |> Query.container()
  |> Container.from("alpine:3.17")
  |> Container.with_exec(~w(apk add --quiet --no-progress ghostscript))
  |> Container.with_file("/tmp/Arizona_Dream.pdf", file)
  |> Container.with_exec(gs_command)
  |> Container.file("/tmp/page-001.png")

result =
  client
  |> Query.container()
  |> Container.from("alpine:3.18")
  |> Container.with_file("/sample1.png", sample1)
  |> Container.with_file("/sample2.png", sample2)
  |> Container.with_exec(~w(apk add --quiet --no-progress npm))
  |> Container.with_exec(~w(npm install -g pixelmatch))
  |> Container.with_exec(["sh", "-c", "pixelmatch /sample1.png /sample2.png /diff.png 0.1 || true"])
  |> Container.file("/diff.png")

Dagger.File.export(result, "/tmp/diff.png")

The before/after comparison can be seen below:

The advantages of writing this in Elixir are the availability of the Elixir standard library and ecosystem to leverage within the program, and (subjectively) much better readability. But there are other cool things that I think are worth mentioning again:

  • caching: the code on the left doesn’t use a cache at all, while the code on the right uses the cache all the time,
  • efficiency: the code on the left does unnecessary steps, while the code on the right does only what needs to be done, and nothing more.
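The two sample pipelines in the final script differ only in their base image and install commands, which is where having a full programming language helps. A minimal sketch of factoring them into one function, using only the SDK calls already shown in this post; the module and function names (`GsPipeline`, `render_first_page/4`) are hypothetical, and it assumes the SDK has been installed with Mix.install/1 as in the final script:

```elixir
defmodule GsPipeline do
  alias Dagger.{Query, Container}

  @gs_command ~w(
    gs -sDEVICE=png16m
       -dGraphicsAlphaBits=4
       -dTextAlphaBits=4
       -r150
       -o /tmp/page-%03d.png
       /tmp/Arizona_Dream.pdf
  )

  # Hypothetical helper: starts a container from `image`, runs each command
  # in `install_cmds` in order, adds the PDF, renders it with ghostscript and
  # returns a reference to the first rendered page.
  def render_first_page(client, image, install_cmds, pdf) do
    base =
      client
      |> Query.container()
      |> Container.from(image)

    install_cmds
    |> Enum.reduce(base, fn cmd, container -> Container.with_exec(container, cmd) end)
    |> Container.with_file("/tmp/Arizona_Dream.pdf", pdf)
    |> Container.with_exec(@gs_command)
    |> Container.file("/tmp/page-001.png")
  end
end

# Usage, with `client` and `file` set up as in the final script:
#
#   sample1 =
#     GsPipeline.render_first_page(client, "debian:bullseye-20230522",
#       [~w(apt update), ~w(apt install -y ghostscript)], file)
#
#   sample2 =
#     GsPipeline.render_first_page(client, "alpine:3.17",
#       [~w(apk add --quiet --no-progress ghostscript)], file)
```

Adding a third base image to the comparison then becomes one more call rather than another copy of the pipeline.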

I hope this was helpful :bowing_man:
