# Any plans for other diff algorithms like String.myers_difference method, but for other data types?

Hey, I read a `&String.myers_difference/2` code and modified it, so it’s possible to use `List` (also char list) and `Keyword` types instead of `String` type.

Here is an example module:

``````defmodule Example do
def myers_difference(list1, list2) do
len1 = Enum.count(list1)
len2 = Enum.count(list2)
path = {0, 0, list1, list2, []}
find_script(0, len1 + len2, [path])
end

defp find_script(envelope, max, _paths) when envelope > max do
nil
end

defp find_script(envelope, max, paths) do
case each_diagonal(-envelope, envelope, paths, []) do
{:done, edits} -> compact_reverse(edits, [], false)
{:next, paths} -> find_script(envelope + 1, max, paths)
end
end

defp compact_reverse([], acc, _), do: acc

defp compact_reverse([{kind, el} | rest], [{kind, next_elem} | acc], false) do
compact_reverse(rest, [{kind, [el, next_elem]} | acc], true)
end

defp compact_reverse([{kind, el} | rest], [{kind, next_elements} | acc], true) do
compact_reverse(rest, [{kind, [el | next_elements]} | acc], true)
end

defp compact_reverse([elem | rest], acc, _) do
compact_reverse(rest, [elem | acc], false)
end

defp each_diagonal(diag, limit, _paths, next_paths) when diag > limit do
{:next, Enum.reverse(next_paths)}
end

defp each_diagonal(diag, limit, paths, next_paths) do
{path, rest} = proceed_path(diag, limit, paths)
with {:cont, path} <- follow_snake(path) do
each_diagonal(diag + 2, limit, rest, [path | next_paths])
end
end

defp proceed_path(0, 0, [path]), do: {path, []}

defp proceed_path(diag, limit, [path | _] = paths) when diag == -limit do
{move_down(path), paths}
end

defp proceed_path(diag, limit, [path]) when diag == limit do
{move_right(path), []}
end

defp proceed_path(_diag, _limit, [path1, path2 | rest]) do
if elem(path1, 1) > elem(path2, 1) do
{move_right(path1), [path2 | rest]}
else
{move_down(path2), [path2 | rest]}
end
end

defp move_right({x, y, list1, [elem | rest], edits}) do
{x + 1, y, list1, rest, [{:ins, elem} | edits]}
end

defp move_right({x, y, list1, [], edits}) do
{x + 1, y, list1, [], edits}
end

defp move_down({x, y, [elem | rest], list2, edits}) do
{x, y + 1, rest, list2, [{:del, elem} | edits]}
end

defp move_down({x, y, [], list2, edits}) do
{x, y + 1, [], list2, edits}
end

defp follow_snake({x, y, [elem | rest1], [elem | rest2], edits}) do
follow_snake({x + 1, y + 1, rest1, rest2, [{:eq, elem} | edits]})
end

defp follow_snake({_x, _y, [], [], edits}) do
{:done, edits}
end

defp follow_snake(path) do
{:cont, path}
end
end
``````

As you can see I modified only some things:

1. variable names - now we don’'t have a string and chars (graphemes) - instead we have list and elements
2. `compact_reverse` method has one more (boolean) parameter and case - it’s for create/join a list of changes (no conflict with nested list)
3. `myers_difference` main method is modified to support list (len1 and len2 variables)

Here is the link for source of original implementation.

If someone is interested in `Map` implementation I can try modify the code.

What do you think about my changes?

4 Likes

I’d think a myers difference of various base types would be quite fascinating. A good diff algorithm is useful in a lot of ways. Should make an elixir library? There is a diff package on hex but it only works on list of base values (like integers, graphemes, etc…), and a more generic one (also protocol driven for custom user structs and such perhaps?) would be quite useful.

2 Likes

I know a diff hex package.
I’m working on a differ package (compare, patch and revert) for all types that can be saved in Ecto. Just see that implementation for String had 99% of implementation for Lists and post my code + question. It’s not so more important for me to add this into elixir code. Just modify it and see if someone interested. Got first like, so I see it’s not a bad idea . Maybe in this month I post my differ package, but there are a lot of things I want to implement to it. I have done it for String, Atom (also boolean), Integer and Float types. Now I’m working on List and Map (also structs).

2 Likes

We did discuss at least Enum.myers_difference but I don’t remember where we settled on that. I will ping @lexmag and follow up on that.

3 Likes

The problem with `Enum.myers_difference/2` is that it won’t work with every enumerable.
Algorithm works on sequences but enumerables are collections (and only sometimes sequences, maps, for example, have no order).

We could provide `List.myers_difference/2` and implement `String.myers_difference/2` in terms of it. I think this would be generic enough and people can use it on their own, for example, to diff tuples.

2 Likes

What about turn `Map` into `Keyword`?
With my code in `iex`:

``````old = Enum.into(%{a: 5}, [])
new = Enum.into(%{a: 5, b: 10}, [])
result = Example.myers_difference(old, new)
assert result == [eq: {:a, 5}, ins: {:b, 10}]
``````

Turning map into keyword won’t help as we get keyword with non-deterministic order.
Maps with less than 33 elements have order but it is an implementation detail and this fact should be ignored.

3 Likes

33 elements - ok now I see “special case” - I tested it for small maps

``````old_map = %{...}
new_map = %{...}
old_data = old_map |> Enum.into([]) |> Enum.sort
new_data = new_map |> Enum.into([]) |> Enum.sort
# create diff from old_data and new_data here ...
``````

I see that big maps lost order, but after turning them into `Keyword` and then sort them it already works. We have order and we can compare two maps (as sorted keywords).
Do we need to have ordered maps after patching and reverting when we will have already done creating diff of ordered keywords?
What do you think about it?

EDIT
In my differ library (under development) I have not problems with compare, patch and revert big maps, but I think ordered diff is much more cleaner (what differ already do), so I added `&Enum.sort/1` in `Map` support.
Do you see any other special case?

Keywords ordering will improve edit script, however, it will be far from optimal and non-representative. Myers’ algorithm is designed to work on sequences, but maps have different semantics. It would be wasteful application of algorithm with inaccurate result while there are much faster ways to find edit script for maps, for example.

2 Likes

As I wrote I’m working on differ library. If you know that better implementation for `Map` will be added in next versions I can stop work on this (I mean `Map` support) and wait for new release(s) of Elixir.
Note: I started wrote about myers algorithm, because don’t know any Elixir implementation for other data types.
I’m looking for all diff algorithms for data that could be saved in Ecto.
Can you list all Elixir (also in future releases) algorithms you plan to support?

If I good understand:

1. `List` and `String` could have implemented myers_difference own method (TODO implementation for List).
2. `Atom` and binary could use an `String` implementation.
3. `Atom` (boolean), `Decimal`, `Float` and `Integer` do not need any advanced diff algorithm.

What about `Map`? Any public API or plans for this?
What about UUID, `Date`, `DateTime` and `Time`? For data and time infos I want to get difference in milliseconds, but I don’t find any API for it.

EDIT: Topic updated.

1 Like

Can you list all Elixir (also in future releases) algorithms you plan to support?

Currently we don’t have plans for adding more diff algorithms.
`List.myers_difference/2` was added today in master branch.

What about Map? Any public API or plans for this?

I recommend checking `ExUnit.Diff` module to see what we do for maps and other data types. In particular, for maps we use O(N) algorithm there.

4 Likes

In other thread @josevalim wrote that is not good idea to use non-public API:

You are not supposed to be calling it and there is no guarantee the module nor the function will exist in future Elixir versions.

I think copy it to my project is not good idea too, because I need to compare changes each release.
Anyway thanks for your respond. I will use temporary `Map` as sorted `Keyword` until somebody add `ExUnit.Diff` module to public API.
I will try to implement diff for other data types.

See simple Map.meyers_difference here: https://gist.github.com/boydm/ab7aec16d54709862ec3491c2b01ddda

@boydm: thanks, but keep in mind what was already said: