Merging Intervals the "Elixir Way"

saudade · May 11, 2017, 2:11am

Hi there! Newbie to the language looking for tips on how to write more idiomatic elixir.

The problem: Given a list of intervals (start, end), merge any overlapping intervals and return the new list.

Ex:

iex(1)> intervals = [%{start: 1, end: 2},%{start: 2,end: 3}, %{start: 4, end: 6}, %{start: 7, end: 10}, %{start: 9, end: 
12}]
iex(2)> Intervals.merge_intervals intervals
[%{end: 3, start: 1}, %{end: 6, start: 4}, %{end: 12, start: 7}]

Here’s what I’ve got working:

defmodule Intervals do
  def merge_intervals(intervals) do
    Enum.sort(intervals, &(&1.start <= &2.start)) 
    |> _merge_intervals([])
    |> Enum.reverse
  end

  defp _merge_intervals([], merged_intervals), do: merged_intervals
  defp _merge_intervals([%{end: cur_end} = cur, %{start: next_start} = next | tail], merged_intervals) when cur_end >= next_start do
    collapsed = _collapse_intervals(cur,next)
    _merge_intervals(tail, [collapsed | merged_intervals])
  end
  defp _merge_intervals([head | tail], merged_intervals), do: _merge_intervals(tail, [head | merged_intervals])

  defp _collapse_intervals(a, b), do: %{start: a.start, end: b.end}
end

Obviously looking for any no-brainer idiomatic improvements, but my one nagging question is if inclusion of the second merge_intervals definition is the right way to do things. It seems significantly more difficult to read (especially pattern matching to extract cur_end and next_start) than throwing a control-flow if into the body of the function definition. I know that pattern matching is powerful, but is there a compelling reason to use that functionality in a case like this one? Is there a better way to do this?

Thanks!

minhajuddin · May 11, 2017, 9:51am

Looks idomatic to me. A few changes I would make are:

Make the collapse interval inline. That should improve readability.
Remove _ prefixes, I haven’t seen used this in Elixir for private functions. You can just use the same name as the arity is different.
Move the when a new line
Rename _merge_intervals to merge_sorted_intervals

saudade · May 11, 2017, 2:13pm

Would my above solution be preferable to a standard reduce?

def merge_intervals(intervals) do
   [head | tail] = Enum.sort(intervals, &(&1.start <= &2.start)) 
   Enum.reduce(tail, [head], &merge_step/2)
   |> Enum.reverse
end

def merge_step(int, acc) do
   [ head | tail ] = acc
   if head.end >= int.start do
     [_collapse_intervals(head, int) | tail]
   else
     [int, head | tail]
   end
end

Three questions related to that solution:

Notice that I now am only piping two functions instead of three. It seemed to be required to start off the reduce with the first interval in the accumulator already and I couldn’t figure out how to achieve the [head | tail] match without removing my first pipe. Is there a way do that?
Is there a way to define the merge_step function as an anonymous function on multiple lines? I come from the javascript world where something like:

intervals.reduce( (acc, int) => {
…work
});

is standard. I tried the fn(int,acc) -> end syntax but it seems like that only works if the end is on the same line.

I still feel like what I’ve read suggests avoiding if statements when possible, but this seems like a pretty bread and butter use case. Is there a better option?

To point 2 that you made: This was suggested in Programming Elixir 1.3. I’m happy to choose away from it, as I’ve never liked leading underscores anyway, but would be interested in what the consensus is.

bbense · May 11, 2017, 2:19pm

Leading underscores for private functions is a terrible idea IMHO. Leading underscores have meaning to the compiler for variable names. Adding them to functions just leads to confusion.

benwilson512 · May 11, 2017, 2:29pm

Correct me if I’m wrong, but don’t you want to sort by &(&1.start <= &2.start) ?

saudade · May 11, 2017, 2:31pm

Yep. Fixed that in the above code.

peerreynders · May 11, 2017, 5:58pm

I think in this case the reduce version is more readable - but that doesn’t mean that reduce is always better than recursion - context is everything.

Actually according the Elixir style guide that code should be written as

def merge_intervals(intervals) do
  [head | tail] = Enum.sort(intervals, &(&1.start <= &2.start))
  tail
  |> Enum.reduce([head], &merge_step/2)
  |> Enum.reverse
end

Well you could do this

def merge_intervals(intervals) do
  intervals
  |> Enum.sort(&(&1.start <= &2.start))
  |> (fn ([]) -> [] 
         ([h|t]) -> [[h]|t] end.())
  |> Enum.reduce(&merge_step/2)
  |> Enum.reverse
end

… but I don’t see anything to recommend that approach.

Works fine, no problems …

def merge_intervals(intervals) do
  [head | tail] = Enum.sort(intervals, &(&1.start <= &2.start))
  tail
  |> Enum.reduce([head],fn (int, [head|tail]) ->
       cond do
         head.end >= int.start -> 
           [collapse_intervals(head, int) | tail]
         true ->
           [int, head | tail]
       end
     end)
  |> Enum.reverse
end

… but why for heavens sake would you choose to write code like this??? I find this infinitely more readable:

defmodule Intervals do

  defp compare_start(lhs,rhs), do: lhs.start <= rhs.start

  defp merge(lhs, rhs), do: %{start: (min lhs.start, rhs.start), end: (max lhs.end, rhs.end )}

  defp cons(head, [next|tail]) do
    cond do
      next.end >= head.start -> # intervals overlap
        [merge(head, next) | tail]
      true ->
        [head, next | tail]
    end
  end

  defp prep([]), do: []
  defp prep([head | tail]), do: [[head]|tail]

  def merge_intervals(intervals) do
    intervals
    |> Enum.sort(&compare_start/2)
    |> prep()
    |> Enum.reduce(&cons/2)
    |> Enum.reverse
  end

end

IO.inspect Intervals.merge_intervals [%{start: 1, end: 2},%{start: 2,end: 3}, %{start: 4, end: 6}, %{start: 7, end: 10}, %{start: 9, end: 12}]

i.e. no anonymous functions “cleverly” interleaving code everywhere, but rather well-named functions that describe what they do (so OK, cons is a bit conceited). Anonymous functions have their place but I use them when I need a closure - other than that I prefer well named functions.

And you’ve just hit one of my pet peeves with the “JavaScript community style” - to quote my own rant:

The main issue with if is that it looks like a statement when in fact it is an expression - an if without an else will return a value of nil (which actually is :nil, i.e. its an atom) when the condition fails. So I personally focus on using cond and case (as shown in the example above) to remind myself that everything is an expression.

Also unless you are planning to add more information to the maps, I think its perfectly OK to use tuples in this case:

defmodule Intervals do

  defp compare_start({lhs,_},{rhs,_}), do: lhs <= rhs
  
  defp merge({lhs_start,lhs_end}, {rhs_start,rhs_end}), 
    do: {(min lhs_start, rhs_start), (max lhs_end, rhs_end)}
  
  defp cons({head_start,_} = head, [{_, next_end} = next|tail]) do
    cond do
      next_end >= head_start -> # intervals overlap
        [(merge head, next) | tail]
      true ->
        [head, next | tail]
    end
  end
  
  defp prep([]), do: []
  defp prep([head | tail]), do: [[head]|tail]
  
  def merge_intervals(intervals) do
    intervals
    |> Enum.sort(&compare_start/2)
    |> prep()
    |> Enum.reduce(&cons/2)
    |> Enum.reverse
  end

end

IO.inspect Intervals.merge_intervals [{1,2},{2,3}, {4,6}, {7,10}, {9,12}]

saudade · May 11, 2017, 8:32pm

Great stuff, thanks!

One question: What’s going on here?

do: lhs <= rhs

Haven’t seen this syntax yet. Is it effectively a merge?

OvermindDL1 · May 11, 2017, 8:39pm

Nope, all this does is the body of the function (delineated via the do:) just compares the left and right bindings of the names lhs and rhs using the <= operator, basically just like 1 <= 2, and returns true or false depending on if the lhs is less than or equal to the rhs.

peerreynders · May 11, 2017, 9:15pm

This could be considered another refinement

defmodule Intervals do

  defp lte_start({lhs_start, _},{rhs_start, _}), do: lhs_start <= rhs_start
  
  defp cons({head_start, head_end}, [{next_start, next_end} | tail]) 
    when next_end >= head_start,   # intervals overlap
      do: [{next_start, (max next_end, head_end)} | tail]
  defp cons(head, rest),
    do: [head | rest]
  
  defp prep([]), do: []
  defp prep([head | tail]), do: [[head]|tail]
  
  def merge_intervals(intervals) do
    intervals
    |> Enum.sort(&lte_start/2)
    |> prep()
    |> Enum.reduce(&cons/2)
    |> Enum.reverse
  end

end

IO.inspect Intervals.merge_intervals [{1,2},{2,3}, {4,6}, {7,10}, {9,12}]

It’s very important to recognize that pattern matching is a conditional construct - unlike JavaScript’s Destructuring Assignment.

I personally also view the guard sequence (when-clause) as an extension of the pattern match as the guard sequence can supply match conditions that would be difficult to express within the pattern itself.

saudade · May 11, 2017, 9:45pm

Nope, all this does is the body of the function (delineated via the do:) just compares the left and right bindings of the names lhs and rhs using the <= operator, basically just like 1 <= 2, and returns true or false depending on if the lhs is less than or equal to the rhs.

…that makes more sense