Babel - Data transformations made easy

Babel was born out of a desire to simplify non-trivial data transformation pipelines: to focus on the “happy path” instead of having to write a bunch of boilerplate error-handling code.

But don’t listen to me, take a look for yourself:

pipeline =
  Babel.begin()
  |> Babel.fetch(["some", "nested", "path"])
  |> Babel.map(Babel.into(%{atom_key: Babel.fetch("string-key")}))

data = %{
  "some" => %{
    "nested" => %{
      "path" => [
        %{"string-key" => :value2},
        %{"string-key" => :value2},
        %{"string-key" => :value2}
      ]
    }
  }
}

Babel.apply(pipeline, data)
=> {:ok, [
   %{atom_key: :value1},
   %{atom_key: :value2},
   %{atom_key: :value3}
]}

Since you’ll most likely build non-trivial transformation pipelines with Babel - which can fail at any given step - it ships with elaborate error reporting:

Error Reporting
pipeline =
  Babel.begin()
  |> Babel.fetch(["some", "nested", "path"])
  |> Babel.map(Babel.into(%{atom_key: Babel.fetch("string-key")}))

data = %{
  "some" => %{
    "nested" => %{
      "path" => [
        %{"unexpected-key" => :value1},
        %{"unexpected-key" => :value2},
        %{"unexpected-key" => :value3}
      ]
    }
  }
}

Babel.apply!(pipeline, data)

This will produce the following error:

** (Babel.Error) Failed to transform data: [not_found: "string-key", not_found: "string-key", not_found: "string-key"]

Root Cause(s):
1. Babel.Trace<ERROR>{
  data = %{"unexpected-key" => :value1}

  Babel.fetch("string-key")
  |=> {:error, {:not_found, "string-key"}}
}
2. Babel.Trace<ERROR>{
  data = %{"unexpected-key" => :value2}

  Babel.fetch("string-key")
  |=> {:error, {:not_found, "string-key"}}
}
3. Babel.Trace<ERROR>{
  data = %{"unexpected-key" => :value3}

  Babel.fetch("string-key")
  |=> {:error, {:not_found, "string-key"}}
}

Full Trace:
Babel.Trace<ERROR>{
  data = %{"some" => %{"nested" => %{"path" => [%{"unexpected-key" => :value1}, %{"unexpected-key" => :value2}, %{"unexpected-key" => :value3}]}}}

  Babel.Pipeline<>
  |
  | Babel.fetch(["some", "nested", "path"])
  | |=< %{"some" => %{"nested" => %{"path" => [%{"unexpected-key" => :value1}, %{...}, ...]}}}
  | |=> [%{"unexpected-key" => :value1}, %{"unexpected-key" => :value2}, %{"unexpected-key" => :value3}]
  |
  | Babel.map(Babel.into(%{atom_key: Babel.fetch("string-key")}))
  | |=< [%{"unexpected-key" => :value1}, %{"unexpected-key" => :value2}, %{"unexpected-key" => :value3}]
  | |
  | | Babel.into(%{atom_key: Babel.fetch("string-key")})
  | | |=< %{"unexpected-key" => :value1}
  | | |
  | | | Babel.fetch("string-key")
  | | | |=< %{"unexpected-key" => :value1}
  | | | |=> {:error, {:not_found, "string-key"}}
  | | |
  | | |=> {:error, [not_found: "string-key"]}
  | |
  | | Babel.into(%{atom_key: Babel.fetch("string-key")})
  | | |=< %{"unexpected-key" => :value2}
  | | |
  | | | Babel.fetch("string-key")
  | | | |=< %{"unexpected-key" => :value2}
  | | | |=> {:error, {:not_found, "string-key"}}
  | | |
  | | |=> {:error, [not_found: "string-key"]}
  | |
  | | Babel.into(%{atom_key: Babel.fetch("string-key")})
  | | |=< %{"unexpected-key" => :value3}
  | | |
  | | | Babel.fetch("string-key")
  | | | |=< %{"unexpected-key" => :value3}
  | | | |=> {:error, {:not_found, "string-key"}}
  | | |
  | | |=> {:error, [not_found: "string-key"]}
  | |
  | |=> {:error, [not_found: "string-key", not_found: "string-key", not_found: "string-key"]}
  |
  |=> {:error, [not_found: "string-key", not_found: "string-key", not_found: "string-key"]}
}

If you’re wondering whether or not Babel is production-ready: we’ve been using a pre-release version in production at work for nearly a year, and it has made external API integrations a lot easier and smoother. :wink:


Babel can easily be confused with the popular JS transpiler of the same name.


Sorry, but your example does not look very convincing …

%{
  "some" => %{
    "nested" => %{
      "path" => [
        %{"string-key" => :value2},
        %{"string-key" => :value2},
        %{"string-key" => :value2}
      ]
    }
  }
}
|> get_in(~w[some nested path])
|> Enum.map(&%{atom_key: &1["string-key"]})
  1. Shorter syntax
  2. No dependencies
  3. I guess that the Elixir language and hex packages maintained by core team members are as fast as other packages, if not faster

Later, in the main module’s documentation, error handling is mentioned, but in this case all we have to do is change the getter function, like so:

%{
  "some" => %{
    "nested" => %{
      "path" => [
        %{"string-key" => :value2},
        %{"string-keyy" => :value2},
        %{"string-key" => :value2}
      ]
    }
  }
}
|> get_in(~w[some nested path])
|> Enum.map(&%{atom_key: Map.fetch!(&1, "string-key")})
  1. Does exactly the same error handling
  2. Uses the well-known KeyError and its message
  3. Babel.fetch/1 is confusing as it does not use the trailing-bang naming convention

So far I have not found anything in the documentation that is not handled by the Enum, Map, List, Kernel and Access modules.

I agree, the simple example I laid out can all be done easily with built-in functions.

But Babel isn’t meant to be used for simple data transformations. It’s meant for non-trivial ones. Putting that in a code example, however, would make it a lot harder to grok and fail in its task of clarifying the basic usage. Babel’s strength is in its composability, combining many simple elements into something that’s no longer simple but still understandable.

Doing these non-trivial transformations with standard library mechanisms is feasible, but if you don’t want to raise and instead want to collect errors and be able to explain them, it becomes very complex - and very hard to read - very quickly.
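
To make that concrete, here is a rough sketch of what such a hand-rolled version might look like for the example at the top of the thread. This is purely an illustration with made-up helpers (fetch_in/2, collect_results/1), not how Babel works internally:

# Made-up sketch: collect errors as values using only the standard library.
defmodule ManualTransform do
  def cast_items(data) do
    with {:ok, items} <- fetch_in(data, ["some", "nested", "path"]) do
      items
      |> Enum.map(&cast_item/1)
      |> collect_results()
    end
  end

  defp cast_item(item) do
    with {:ok, value} <- fetch_in(item, ["string-key"]) do
      {:ok, %{atom_key: value}}
    end
  end

  # Walks a path and returns an error tuple instead of raising.
  defp fetch_in(data, path) do
    Enum.reduce_while(path, {:ok, data}, fn key, {:ok, acc} ->
      case acc do
        %{^key => value} -> {:cont, {:ok, value}}
        _other -> {:halt, {:error, {:not_found, key, acc}}}
      end
    end)
  end

  # Turns a list of results into {:ok, values} or {:error, all_reasons}.
  defp collect_results(results) do
    case Enum.split_with(results, &match?({:ok, _}, &1)) do
      {oks, []} -> {:ok, Enum.map(oks, fn {:ok, value} -> value end)}
      {_oks, errors} -> {:error, Enum.map(errors, fn {:error, reason} -> reason end)}
    end
  end
end

And even this version doesn’t tell you which item in the list failed or keep the offending input around for the error report - both of which Babel’s traces give you for free.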

I’m planning to add a LiveBook which showcases how to use Babel to transform responses from GitHub’s GraphQL API, which will show how the library shines for these non-trivial transformations.


Just to give an example: here’s an excerpt from how we use Babel at work. I’ve changed model names and omitted some functions, but the basic idea should be clear. The strength, I’d argue, is that it’s composable and readable in how it specifies the desired outcome, and if it fails at any given step it will tell you exactly what went wrong:

@spec list_of_pieces(opts) :: Babel.t([Piece.t()])
def list_of_pieces(opts) do
  :pieces
  |> Babel.begin()
  |> Babel.fetch(["contentCollection", "items"])
  |> Babel.map(piece(opts))
end

@spec piece(opts) :: Babel.t(Piece.t())
def piece(opts) do
  Babel.into(%Piece{
    id: cms_id(),
    locale: locale(),
    name: Babel.fetch("name"),
    is_locked:
      Babel.then(fn
        %{"isPremium" => true} -> opts[:lock_premium?]
        %{"isPremium" => _} -> false
      end),
    category: category(),
    audio:
      :audio
      |> Babel.begin()
      |> Babel.fetch(["audioCollection", "items"])
      |> Babel.match(fn
        [_ | _] -> Babel.at(0) |> Babel.chain(audio())
        [] -> Babel.const(nil)
      end)
  })
end

This can then be used to parse data like this:

Babel.apply(list_of_pieces(opts), data)

Or if you only want to parse a single Piece:

Babel.apply(piece(opts), data)

I’m not sure what to say to that. I guess it can. :person_shrugging:

It would be perfect if you could also give us some example data and the expected output, because so far it looks just as simple, see:

defmodule Piece do
  defstruct ~w[audio category id is_locked locale name]a
end

defmodule Example do
  def cast_pieces(data, opts) do
    data
    |> get_in(~w[contentCollection items])
    |> Enum.map(&%Piece{
      # or in case nil is acceptable for the audio key:
      # get_in(&1, ["audioCollection", "items", Access.at(0)]) && audio()
      audio: cast_audio_collection_items(&1),
      category: category(),
      id: cms_id(),
      is_locked: &1["isPremium"] == true && opts[:lock_premium?],
      locale: locale(),
      name: Map.fetch!(&1, "name")
    })
  end

  defp cast_audio_collection_items(piece) do
    piece
    |> Map.fetch!("audioCollection")
    |> Map.fetch!("items")
    |> then(&List.first(&1) && audio())
  end

  defp audio, do: :ding
  defp category, do: :pieces
  defp cms_id, do: System.unique_integer([:positive])
  defp locale, do: :en
end

For casting complex structs, Ecto should be more than enough. Perhaps somebody reading this topic may not even have heard about @field_source_mapper in Ecto.

As I mentioned before, I’m working on a LiveBook example.

But beyond that I can only repeat what I wrote earlier:

Doing these non-trivial transformations with standard library mechanisms is feasible, but if you don’t want to raise and instead want to collect errors and be able to explain them, it becomes very complex - and very hard to read - very quickly.

Maybe the confusion stems from this?

Babel does not raise unless you call apply!/2 (which calls apply/2 under the hood and raises in case it returns an error). At no point during the transformation are exceptions involved. It’s all results that get accumulated.
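
In code that means the caller decides what to do with the accumulated error. A minimal sketch, assuming you want to log and fall back instead of raising (the error in the {:error, error} tuple carries the same information that apply!/2 would raise):

# Hypothetical caller-side handling: log the accumulated error and fall back.
require Logger

case Babel.apply(pipeline, data) do
  {:ok, pieces} ->
    pieces

  {:error, error} ->
    Logger.error("Transformation failed: #{inspect(error)}")
    []
end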

Your example is actually a good one to compare against because it’s largely optimized for the happy path. Defaulting to nil values is not something that flies in production. Neither, in my experience, is using Map.fetch!/2. Those are cases where gracefully handling errors is preferred. Babel is optimized for exactly this. That’s where most of the work is - and you get it for free.

A comparable solution would check at every step if an error occurs, collect it, and return the accumulated error. It would be able to explain why that step failed, what the step was, and what the input data was. That version of the code will be a lot more complex. Often enough code is written that only focuses on the happy path, which then becomes hard to debug when something does not work. I’d like to refer to the initial error handling example to point out how much information Babel provides in error cases. You could say that Babel not only transforms but also asserts on the expected data shape.

Ecto is excellent when you have control over the shape of the incoming data. If you don’t, you will have to write a non-trivial amount of code to transform the data into a shape Ecto understands. If Ecto fits your use-case: great, use that. But Ecto and Babel do not cover the same use-cases.


What I’m taking away from this is that more examples would be helpful, as well as a comparison against libraries like Ecto.


This one was only for the example you shared; since you said it’s a more complex one used in production, I tried to reproduce it. :smiley:

I didn’t have a problem with that at all, but who am I to speak about it when Gentoo was my first Linux distribution. :crazy_face:

Perhaps you mean cases like Map.fetch/2 returning the :error atom? That could be solved with a simple wrapper function:

def map_fetch(map, key) do
  case Map.fetch(map, key) do
    {:ok, result} -> {:ok, result}
    :error -> {:error, %Wrapper.MapKeyError{key: key, map: map}}
  end
end
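
Then error handling is just plain with - a quick sketch reusing the hypothetical map_fetch/2 wrapper and Wrapper.MapKeyError struct from above:

def cast_piece(item) do
  # a failing clause falls through as {:error, %Wrapper.MapKeyError{...}}
  with {:ok, name} <- map_fetch(item, "name"),
       {:ok, premium?} <- map_fetch(item, "isPremium") do
    {:ok, %{name: name, is_premium: premium?}}
  end
end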

All Elixir core features, including debugging, are more than enough for me, and I rarely need to write any wrapper. As said, if Elixir core is not enough, at most I have used Ecto.

Ignore that, I got something wrong when quickly browsing the documentation. :sweat_smile:

That’s a good point … for Rust developers. :crab:

I’m writing pipelines in pure Elixir that support not just a single “happy path” but all of them, and I either “let it default” to nil when some part of the data is expected to be nullable or let it fail when it’s not supported. “Let it fail” in particular is an Erlang and Elixir concept, so speaking generally about “happy paths” doesn’t make much sense in such languages. In the end the code is very specific to your needs, so I believe your package may or may not be useful even in very similar cases. Saying it’s good “for Rust” was half a joke. :joy:

Sounds like the latest Elixir improvements, like the dbg macro, improved error messages, and showing parts of the code in error output. I guess I personally would have been more interested in such a package before those improvements, but in the end I would stop using it after they were added. :thinking:

Then again, who am I to speak about that, when in many answers I gave a few possible solutions and the biggest one is based only on pattern matching. I have no idea, but for some reason I like how things work now. :+1:

Oh, at first I did not expand it, so I did not see it. Hmm …

Too long! Shorten it to 5 words!

:joy:

More seriously … it’s really long, and maybe that’s because it somehow reminds me of the terribly long stacktraces in JVM-based languages. I’m exhausted just by seeing the “and XX more” at the bottom of an already too-long stacktrace. Again, I like how things work now, so I’m not sure I can see a real-world usage in my personal case. :see_no_evil:

Why do you mention control? You only have control if your app is generating the data, right? As said, for all those years I have used pure Elixir and at most Ecto to cover almost all, if not all, cases of casting data from, for example, a JSON API response.

What? You are developing the “shape” the same way you are creating the structs. Regardless of whether the data has the proper format or not, you simply pass the "audioCollection" value to a changeset function, and in the worst case you use some mapper like the one supported by Ecto, i.e. the @field_source_mapper mentioned previously.

Yeah, I guess I did not get the non-Ecto use cases.

I can see that posting:

Data transformations made easy.

without explaining exactly which use-cases you mean is really confusing. That only makes me more interested to hear “your story”. :thinking:

Maybe answering such questions could help you describe your point of view more easily:

  1. Could you please describe uncommon data sources from your use cases?
  2. Could you please describe, by example, where casting data with Ecto starts causing trouble?
  3. Why are things like @field_source_mapper not enough for data with a different structure?

Edit: While writing all of this I got an idea. If it’s not “something like” a JSON API with a well-known structure, then the source could have a dynamic structure. Ok, but how do you parse a dynamic structure? How about creating a complex HTML form for dynamically querying data whose structure is unknown to the app (but known by the user)? That’s interesting, but how would such data be used? How about using dynamic rules to fetch dynamic data in order to present it in a predictable format like a graph? Am I going in the right direction, or have I by any chance messed it up completely? Maybe I’m thinking about it too much and there is a much simpler use case … :smiling_imp: