Bindable - General purpose for-comprehension at your service

While playing around with the public API of Resource, I got stuck on a composability problem (I also opened a topic about it). I was wondering what a public API should look like to expose composable resources (to work with several resources while the library handles all the safe acquire/release hassle).

# Can we do better?
Resource.use!(ra, fn a ->
  Resource.use!(rb, fn b ->
    Resource.use!(rc, fn c ->
      f(a, b, c)
    end)
  end)
end)

There is already syntactic sugar to tackle such a problem in the functional programming world: for-comprehension (or, e.g., its cousin do-notation). It allows you to write effectful code in an imperative style. Elixir already has a for-comprehension:

for x <- [1, 2, 3], x < 3,
    y <- [4, 5, 6], y > 4 do
  {x, y}
end
# [{1, 5}, {1, 6}, {2, 5}, {2, 6}]

Which is in fact just:

Enum.flat_map(Enum.filter([1, 2, 3], fn x -> x < 3 end), fn x -> 
  Enum.flat_map(Enum.filter([4, 5, 6], fn y -> y > 4 end), fn y -> 
    [{x, y}]
  end)
end)
# [{1, 5}, {1, 6}, {2, 5}, {2, 6}]

However, it works only for lists (and that’s fine: it has list-specific functionality, e.g. the :uniq and :reduce options).

So the library exposes a “general purpose” for-comprehension. It works with any kind of “monadic container” (and doesn’t have options specific to any particular one):

require Bindable.ForComprehension

Bindable.ForComprehension.for {
  x <- [1, 2, 3], if(x < 3),
  y <- [4, 5, 6], if(y > 4)
} do
  {x, y}
end
# [{1, 5}, {1, 6}, {2, 5}, {2, 6}]

Because under the hood it is nothing more (up to some details) than flat_map(m(a), (a -> m(b))) :: m(b), you always end up with a “containered” value. In other words, a for-comprehension is a way to construct a new (monadic) value.
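
To make that concrete: for lists, flat_map is Enum.flat_map/2 and the yielded value gets wrapped back into the container. The helpers below are illustrative only (not Bindable’s actual protocol functions), but they show the shape the example above boils down to:

# Illustrative helpers only, not Bindable's actual protocols: for lists,
# flat_map is Enum.flat_map/2 and pure wraps the yielded value back into
# the container, which is why the result is always a "containered" value.
flat_map = &Enum.flat_map/2
pure = fn value -> [value] end

flat_map.(Enum.filter([1, 2, 3], fn x -> x < 3 end), fn x ->
  flat_map.(Enum.filter([4, 5, 6], fn y -> y > 4 end), fn y ->
    pure.({x, y})
  end)
end)
# [{1, 5}, {1, 6}, {2, 5}, {2, 6}]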

That’s why, as a nice bonus, you also get a facility to lazily(!) construct new streams (Elixir’s for eagerly evaluates enumerables, as it uses Enum.flat_map/2 under the hood):

Bindable.ForComprehension.for {
  x <- Stream.map(1..5, fn x -> if(x > 2, do: (raise "boom"), else: x) end),
  y <- Stream.map(1..5, fn y -> if(y > 2, do: (raise "boom"), else: y) end)
} do
  {x, y}
end |> Enum.take(2)
# [{1, 1}, {1, 2}]

And finally:

Bindable.ForComprehension.for {
  a <- ra,
  b <- rb,
  c <- rc
} do
  f(a, b, c)
end

The “Scala-like” implementation was selected intentionally for two main reasons:

  • it is an already known and well-established syntax;
  • it solves the “variadic problem”: from the compiler’s perspective we define a macro accepting just two arguments, a tuple and a “do-block” (see the sketch below).
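
Here is a hypothetical sketch (nothing from Bindable’s actual source) of that second point: however many generators you write between the braces, the compiler sees a macro call with exactly two arguments, the tuple AST and the do-block.

defmodule TupleForDemo do
  # The whole `{x <- xs, y <- ys, ...}` arrives as a single AST argument,
  # ready to be rewritten into nested flat_map calls; here we just inspect
  # it at compile time and return the do-block untouched.
  defmacro demo(generators, do: block) do
    IO.inspect(generators, label: "generators AST")
    block
  end
end

require TupleForDemo

TupleForDemo.demo {x <- [1, 2], y <- [3, 4]} do
  :just_a_demo
end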

More notable implementation details can be found at Bindable (hex) and Bindable (GitHub).


I don’t get what the features are, and why can’t I just use Stream for every lazily evaluated enumerable?

All of your examples can be rewritten using Stream (or Enumerable in the case of a product of two streams).


Also, this tuple syntax is unique across all of Elixir’s macros and will be confusing for Elixir developers. I suggest a nested structure to imitate a variadic function, like

quote do
  bindable for x <- xs, y <- ys, z <- zs do
    x + y + z
  end
end

Here you just need to implement the bindable/2 macro and parse the variadic for call as an AST.


But my last suggestion would be to implement a Stream.product function in a PR to elixir-lang instead of creating a custom macro with syntax that is unfamiliar to Elixir devs.

This would solve the problem of for-style lazy evaluation as a product of two enumerables, and it would be compatible with both Elixir’s language design and the Stream API.

Anyway, great work!

Thanks for the feedback!

That is not about Streams at all. As I mentioned, Stream composition comes as a bonus.

Anyway, let’s talk Streams: Elixir’s for is a great facility to create new lists (well, up to the :reduce feature, with which you can obtain any value). And one of its great options is “contextual” filtering. The thing is:

for a <- stream_of_as,
    b <- stream_of_bs,
    a < b,             # accessing values from another stream
    c <- stream_of_cs,
    a + b + c < 10 do
  a + b + c
end

cannot be reproduced in general with Stream.product, as it works on independent streams:

Stream.product([
  Stream.filter(1..5, fn a -> a < 3 end),
  Stream.filter(5..1, fn b -> b > 3 end),    # only accessing values from the current stream
  Stream.take(1..5, 3)
])
|> Stream.filter(fn {a, b, c} -> a + b + c < 10 end)
|> Stream.map(fn {a, b, c} -> a + b + c end)

or, if you only have a two-stream combinator:

Stream.product(
  Stream.filter(1..5, fn a -> a < 3 end),
  Stream.product(
    Stream.filter(5..1, fn b -> b > 3 end),    # only accessing values from the current stream
    Stream.take(1..5, 3)
  )
)
|> Stream.filter(fn {a, {b, c}} -> a + b + c < 10 end)
|> Stream.map(fn {a, {b, c}} -> a + b + c end) # looks scary :)

So I don’t think Stream.product would close the gap as a Stream combinator (at the end of the day, for is a far more readable way to construct new values).

The main point: the for facility is not only for lists.

If you have a data type that “behaves flat-mappable”, you can use it inside for.
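
For example, here is a minimal, library-free sketch of what “behaves flat-mappable” means, using a hand-rolled Maybe type (Bindable’s actual protocol names may differ), and the nested staircase a for-comprehension would flatten out for it:

# A hand-rolled Maybe container ({:some, value} | :none), illustrative only.
defmodule Maybe do
  def pure(value), do: {:some, value}

  def flat_map({:some, value}, fun), do: fun.(value)
  def flat_map(:none, _fun), do: :none
end

Maybe.flat_map({:some, 1}, fn a ->
  Maybe.flat_map({:some, 2}, fn b ->
    Maybe.pure(a + b)
  end)
end)
# {:some, 3}

Maybe.flat_map({:some, 1}, fn a ->
  Maybe.flat_map(:none, fn b ->
    Maybe.pure(a + b)
  end)
end)
# :none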

Clever trick, I like it!

(but we still need to use parentheses to explicitly bind the single bindable argument)
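
A rough sketch (hypothetical module, not the actual library code) of what that parenthesized form buys: the whole for call reaches the wrapping macro as a single {:for, meta, args} AST node, with the do-block as the last element of args.

defmodule BindableDemo do
  # Split the for call's AST into its generators/filters and its do-block.
  defmacro bindable({:for, _meta, args}) do
    {clauses, [[do: block]]} = Enum.split(args, -1)
    IO.inspect(clauses, label: "generators and filters")
    block
  end
end

import BindableDemo

bindable(for x <- [1, 2], y <- [3, 4] do
  :just_inspecting_the_ast
end)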

What about

Stream.product(as, bs)
|> Stream.filter(fn {a, b} -> a < b end)
|> Stream.product(cs)
|> Stream.filter(fn {{a, b}, c} -> a + b + c < 10 end)
|> Stream.map(fn {{a, b}, c} -> a + b + c end)

?

That’s mostly because you’re coming from Scala. Elixir developers use Streams for lazy collections all the time.

for is not for lists only. It accepts any enumerable, but produces whatever Collectable you specify in the :into option. Your for solution, on the other hand, always produces a composition of something your custom protocols implement. I understand that this is a Scala-like approach, it looks really interesting to play with, and I definitely have nothing against it, but I am just noticing that

  1. Elixir is not lazily evaluated
  2. For lazy collections developers usually use Stream in Elixir
  3. Elixir developers do not use lazy collections for resource management. The most common approach here is to implement a monitoring process.

And with this in mind, your ideas will just be alien to the ecosystem, since they do not match the usual development style. You can take a look at the witchcraft and algae projects, which brought algebraic typing ideas to Elixir. These libraries became fun tools to play with, not a widely adopted practice in the community and the ecosystem.

Any implementation of algebraic types in Elixir runs into a well-known set of problems, like

  1. Extremely low performance, caused by the lack of compile-time, type-analysis-based optimizations in the language. This means there will be a protocol dispatch and a map key access for every operation on such a structure at runtime.
  2. A steep learning curve, since Elixir devs are used to dynamic typing with composition of simple primitives.
  3. Pattern matching, since there is no way to pattern match on structures whose internals are encapsulated and whose interactions are hidden behind protocols.

While your library certainly offers an unusual (for the Elixir ecosystem) approach to solving these problems with lazy evaluation, and it is interesting to talk about, I can only suggest that you get a grasp of Elixir’s approach to these problems in return.

For example, for resource management in Elixir we usually use something like this

def at_resource(func) do
  Process.flag(:trap_exit, true)

  # Open the resource
  resource = open_resource()

  # Spawn the worker
  pid = spawn_link(fn -> func.(resource) end)

  receive do
    # Worker finished normally or exited abnormally
    {:EXIT, ^pid, _reason} ->
      :ok
  after
    # It takes too much time, which means that the worker is stuck in an
    # infinite loop or is unreachable
    :timer.hours(1) ->
      Process.exit(pid, :kill)
  end

  # Close the resource as soon as the worker finishes or dies
  close_resource(resource)
end

This kind of approach is more fault tolerant because

  1. It handles infinite loops while the resource is in use. If the function (func) using the resource takes longer than the timeout, it will be forcibly terminated

  2. It handles stack overflow and any other memory exhaustion issue, because as soon as the worker process hits the memory limit, it will be killed by the VM (or the VM will crash; it’s configurable) and the supervising process will close the resource

  3. It handles hardware outages (sic!!!), because we can spawn the worker process on a separate node, and when that node has a hardware outage, we will receive either an exit signal or a timeout in receive

For more on resource management, I’d suggest reading the NimblePool and DBConnection libraries, since they do exactly what I’ve described above.


I really appreciate such a deep and comprehensive reply, and I’m aligned with almost all of it. I’ll try to make a more refined statement to reason about:

There is nothing about lazy evaluation in the library, nor about adopting some external abstraction in the first place (nor about addressing performance aspects). It’s only about DX. Initially I was addressing the problem of copy-pasting and refactoring the same code approaches across our projects (e.g. time-bounded synchronous expression evaluation or “singleton resource” management). Sticking with Elixir’s kernel (its primary building blocks) as much as possible was a must. As a result I ended up poking around:

  • Stream.resource/3, which is (from day one) a way to “handle resources” of a “stream nature” (so it’s like 2-in-1: Elixir’s standard-library resource management + acquired-resource content streaming);
  • for as a “composition facility” for enumerables (as I mentioned before, I started with some kind of Resource.use_all!/2, until I realised for is a more natural candidate, as it does exactly the same for lists as syntactic sugar).

And all this “composable hassle” was just about refactoring these patterns into a separate library, so that each targets one concept at a time in a general way, with no preconditions or assumptions about the nature of the resources (be it a file, a database connection pool or whatever).

And that is where our points of view diverge:

  • you can view Stream.resource/3 as a stream in the first place, but with “resource management” baked in;
  • being able to:
    Stream.resource(
      fn -> File.open!("sample") end,
      fn file ->
        case IO.read(file, :line) do
          data when is_binary(data) -> {[data], file}
          _ -> {:halt, file}
        end
      end,
      fn file -> File.close(file) end
    )
    
    but having to do it yourself for resources of a “non-stream nature” every time you need it (no matter how: using the at_resource approach or adopting Stream.resource/3);
  • you can view for as an exclusive facility for enumerables;
  • and at the same time being able to:
    for a <- as, b <- bs, a < b, c <- cs, a + b + c < 10, do: a + b + c
    
    for lists, but:
    Stream.product(as, bs)
    |> Stream.filter(fn {a, b} -> a < b end)
    |> Stream.product(cs)
    |> Stream.filter(fn {{a, b}, c} -> a + b + c < 10 end)
    |> Stream.map(fn {{a, b}, c} -> a + b + c end)
    
    for streams when you really need a Stream as a result (although for also accepts streams).

So these libraries are nothing more than an attempt to resolve what I see as a sort of asymmetry:

  • being able to use Elixir’s kernel “resource management” facility (Stream.resource/3) without the “stream nature” assumption about the acquired resource content (which could be implemented using the at_resource approach);
  • being able to use for as a for-comprehension (just as its name suggests :slight_smile:; I’m totally ok with Kernel.SpecialForms.for/1).

FWIW, with LazyFor v1.0.1 one might do

users = [user: "john", admin: "meg", guest: "barbara"]
result = stream {type, name} when type != :guest <- users do
  String.upcase(name)
end
Enum.to_list(result)
#⇒ ["JOHN", "MEG"]

Kernel.SpecialForms.for/1 is translated to an Erlang comprehension, not to flat_map/2 + filter/2.

pixels = <<213, 45, 132, 64, 76, 32, 76, 0, 0, 234, 32, 15>>
for <<r::8, g::8, b::8 <- pixels>>, do: {r, g, b}
#⇒ [{213, 45, 132}, {64, 76, 32}, {76, 0, 0}, {234, 32, 15}]

Nice!

However, we can discuss this case, since streams are lazy:

λ (stream x <- [1, 2], x > 2, y <- [0], do: x * y) |> Enum.to_list                 
[]
λ (stream x <- [1, 2], x > 2, y <- [raise "boom"], do: x * y) |> Enum.to_list
** (RuntimeError) boom

That’s a common misconception: for is always translated into Enum.reduce and :lists.reverse. You can check the debug info in the .beam file:

iex(1)> defmodule X do
...(1)>   def f(list) do
...(1)>     for x <- list, do: x
...(1)>   end
...(1)> end
{:module, X,
 <<70, 79, 82, 49, 0, 0, 5, 220, 66, 69, 65, 77, 65, 116, 85, 56, 0, 0, 0, 206,
   0, 0, 0, 22, 8, 69, 108, 105, 120, 105, 114, 46, 88, 8, 95, 95, 105, 110,
   102, 111, 95, 95, 10, 97, 116, 116, 114, ...>>, {:f, 1}}
iex(2)> {_, _, b, _} = v
iex(3)> :beam_lib.chunks b, [:abstract_code]
...
       {:function, 2, :f, 1,
        [
          {:clause, 2, [{:var, 2, :_list@1}], [],
           [
             {:call, 3, {:remote, 3, {:atom, 3, :lists}, {:atom, 3, :reverse}},
              [
                {:call, 3, {:remote, 3, {:atom, 3, Enum}, {:atom, 3, :reduce}},
                 [
                   {:var, 3, :_list@1},
                   {nil, 3},
                   {:fun, 3,
                    {:clauses,
                     [{:clause, 3, [{:var, 3, ...}, {:var, ...}], [], [{...}]}]}}
                 ]}
              ]}
           ]}
        ]}
...

I have a project which optimizes such behaviour: GitHub - hissssst/tria: Optimizing compiler for Elixir


I got your point, and I’ve provided an example to show why nobody uses Stream.resource for managing resources and why it’s actually a bad idea. Stream.resource is an optimistic way to manage resources, and it has a lot of flaws (like leaving the resource open in case of an infinite-recursion bug, a hit memory limit, etc.). As I’ve said, the idiomatic way to manage resources is using a separate monitoring process.


Seems like we can safely refactor timeout divergence into a separate risk. Say we have some facility to run an expression evaluation within a provided timeout, so we can always apply a time bound externally (e.g. on the client side). By the way, using Stream.resource/3/try-rescue in WithTimeout looks safe, since the only thing performed on the resource (Task.Supervisor) is spawning a task to await.
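
For reference, a minimal sketch of such an externally applied time bound (not WithTimeout’s actual implementation), using the standard Task.yield/2 + Task.shutdown/2 pattern:

defmodule TimeBound do
  # Evaluate fun within timeout milliseconds; kill the task otherwise.
  # Note: Task.async/1 links the task, so a crash inside fun propagates
  # to the caller as usual.
  def run(fun, timeout) when is_function(fun, 0) do
    task = Task.async(fun)

    case Task.yield(task, timeout) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      nil -> {:error, :timeout}
    end
  end
end

TimeBound.run(fn -> :timer.sleep(10); :done end, 100)
# {:ok, :done}

TimeBound.run(fn -> :timer.sleep(:infinity) end, 100)
# {:error, :timeout}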

Good insights, for example the one on the data-copying concern (addressed in DBConnection.Holder). Again, as a side note: Stream.resource/3/try-rescue does everything in place, so there is no message-copying cost.

That is tough! Agreed, an existing library for such a case would be Elixir (Erlang) itself.

To sum up on fault tolerance:

  • While Resource uses Stream.resource/3 under the hood, it can be seen as a solution for the same class of problems one could approach with Stream.resource/3 (up to the necessity of streaming the resource);
  • timeout divergence can be addressed on demand (by applying time bounds to the resource usage).

And yes, Resource is not the way you should approach highly specific resource-management domains (e.g. HTTP connection pools). Now I see that Resource’s name could be slightly misleading in such a context. Its focus is mainly not on the resource object itself (there could be no “object” at all), but on the “bracketing-the-usage” ceremony (do this on acquire, do that on release).
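
To illustrate the ceremony itself (a hypothetical helper, not Resource’s actual API), a bare-bones bracket looks like this; Resource’s value is in making such brackets composable rather than in the bracket itself:

defmodule Bracket do
  # Acquire, use, and always release, no matter how the usage exits.
  def with_resource(acquire, release, use_fun) do
    resource = acquire.()

    try do
      use_fun.(resource)
    after
      release.(resource)
    end
  end
end

Bracket.with_resource(
  fn -> File.open!("sample") end,
  fn file -> File.close(file) end,
  fn file -> IO.read(file, :line) end
)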

But I’ll try to dig more into an alternative to the Stream.resource/3 backend.

Hey, guys, I’m back :slight_smile:

Thanks to @hst337’s good points, I was able to release versions 1.0.0 and 1.1.0 of the library:

  • 1.0.0 introduces an Elixir way to write a for-comprehension (yep, applying a macro to Kernel.SpecialForms.for/1 solves the variadic problem);
  • 1.1.0 extends 1.0.0 with pattern-matching filtration semantics across generated values (that was a really fun challenge to implement while preserving all the native compile-time warnings[1]).

So the main focus of these releases was on extending Elixir’s existing for-comprehension syntax to work with “other for-comprehendable structures” while staying, both syntactically and effectively, as close as possible to Kernel.SpecialForms.for/1 on lists. As a result I ended up with syntax identical to Elixir’s for-comprehension, up to the macro application of course (well, and Kernel.SpecialForms.for/1’s exclusive :reduce syntax/semantics; there is no such thing in bindable for):

import Bindable.ForComprehension

xs =  [[], [2, 2], [3], [4]]
xys = [[1, 1], [2, 2], [3, 3], [4, 4]]
bindable for [x] <- xs,      # pattern matching guard
             [^x, y] <- xys, # pattern matching guard with ^ operator
             y > 0,          # "conventional" guard
             do: [x, y]
# [[3, 3], [4, 4]]

As always, happy to hear your thoughts and critique on the subject, cheers :v:

[1]: match?/2 may generate unused-variable warnings when the pattern contains (in some sense) “free” variables, e.g. match?([x], [42]) or simply match?(x, 42); function head definitions could also generate warnings about unreachable clauses, e.g. fn x -> 42; _ -> :causes_warning end; so the answer was with/1, which provides pattern matching, assignments and a (do-)scope to work across.
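
For instance (a standalone illustration, not the library’s generated code), with/1 gives pattern-match filtering plus bindings without the warnings described above:

# Keep only the elements matching the pattern, binding x along the way;
# no unused-variable or unreachable-clause warnings are emitted.
Enum.flat_map([[1], [2, 2], [3]], fn element ->
  with [x] <- element do
    [x * 10]
  else
    _ -> []
  end
end)
# [10, 30]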
