defmodule Intervals do
def merge_intervals(intervals) do
Enum.sort(intervals, &(&1.start <= &2.start))
|> _merge_intervals([])
|> Enum.reverse
end
defp _merge_intervals([], merged_intervals), do: merged_intervals
defp _merge_intervals([%{end: cur_end} = cur, %{start: next_start} = next | tail], merged_intervals) when cur_end >= next_start do
collapsed = _collapse_intervals(cur,next)
_merge_intervals(tail, [collapsed | merged_intervals])
end
defp _merge_intervals([head | tail], merged_intervals), do: _merge_intervals(tail, [head | merged_intervals])
defp _collapse_intervals(a, b), do: %{start: a.start, end: b.end}
end
Obviously looking for any no-brainer idiomatic improvements, but my one nagging question is if inclusion of the second merge_intervals definition is the right way to do things. It seems significantly more difficult to read (especially pattern matching to extract cur_end and next_start) than throwing a control-flow if into the body of the function definition. I know that pattern matching is powerful, but is there a compelling reason to use that functionality in a case like this one? Is there a better way to do this?
Would my above solution be preferable to a standard reduce?
def merge_intervals(intervals) do
[head | tail] = Enum.sort(intervals, &(&1.start <= &2.start))
Enum.reduce(tail, [head], &merge_step/2)
|> Enum.reverse
end
def merge_step(int, acc) do
[ head | tail ] = acc
if head.end >= int.start do
[_collapse_intervals(head, int) | tail]
else
[int, head | tail]
end
end
Three questions related to that solution:
Notice that I now am only piping two functions instead of three. It seemed to be required to start off the reduce with the first interval in the accumulator already and I couldn’t figure out how to achieve the [head | tail] match without removing my first pipe. Is there a way do that?
Is there a way to define the merge_step function as an anonymous function on multiple lines? I come from the javascript world where something like:
intervals.reduce( (acc, int) => {
…work
});
is standard. I tried the fn(int,acc) -> end syntax but it seems like that only works if the end is on the same line.
I still feel like what I’ve read suggests avoiding if statements when possible, but this seems like a pretty bread and butter use case. Is there a better option?
To point 2 that you made: This was suggested in Programming Elixir 1.3. I’m happy to choose away from it, as I’ve never liked leading underscores anyway, but would be interested in what the consensus is.
Leading underscores for private functions is a terrible idea IMHO. Leading underscores have meaning to the compiler for variable names. Adding them to functions just leads to confusion.
i.e. no anonymous functions “cleverly” interleaving code everywhere, but rather well-named functions that describe what they do (so OK, cons is a bit conceited). Anonymous functions have their place but I use them when I need a closure - other than that I prefer well named functions.
And you’ve just hit one of my pet peeves with the “JavaScript community style” - to quote my own rant:
The main issue with if is that it looks like a statement when in fact it is an expression - an if without an else will return a value of nil (which actually is :nil, i.e. its an atom) when the condition fails. So I personally focus on using cond and case (as shown in the example above) to remind myself that everything is an expression.
Also unless you are planning to add more information to the maps, I think its perfectly OK to use tuples in this case:
defmodule Intervals do
defp compare_start({lhs,_},{rhs,_}), do: lhs <= rhs
defp merge({lhs_start,lhs_end}, {rhs_start,rhs_end}),
do: {(min lhs_start, rhs_start), (max lhs_end, rhs_end)}
defp cons({head_start,_} = head, [{_, next_end} = next|tail]) do
cond do
next_end >= head_start -> # intervals overlap
[(merge head, next) | tail]
true ->
[head, next | tail]
end
end
defp prep([]), do: []
defp prep([head | tail]), do: [[head]|tail]
def merge_intervals(intervals) do
intervals
|> Enum.sort(&compare_start/2)
|> prep()
|> Enum.reduce(&cons/2)
|> Enum.reverse
end
end
IO.inspect Intervals.merge_intervals [{1,2},{2,3}, {4,6}, {7,10}, {9,12}]
Nope, all this does is the body of the function (delineated via the do:) just compares the left and right bindings of the names lhs and rhs using the <= operator, basically just like 1 <= 2, and returns true or false depending on if the lhs is less than or equal to the rhs.
It’s very important to recognize that pattern matching is a conditional construct - unlike JavaScript’s Destructuring Assignment.
I personally also view the guard sequence (when-clause) as an extension of the pattern match as the guard sequence can supply match conditions that would be difficult to express within the pattern itself.
Nope, all this does is the body of the function (delineated via the do:) just compares the left and right bindings of the names lhs and rhs using the <= operator, basically just like 1 <= 2, and returns true or false depending on if the lhs is less than or equal to the rhs.