Algorithm challenge: the Partition Problem

dimitarvp · January 18, 2021, 6:38pm

Hey community,

Lately I needed to calculate the total of the sizes of all my external HDDs, SSDs and USB flash sticks lying around so I can make a mirrored RAID setup for my NAS. Say I had disks that are 1TB, 2TB, 3TB and 4TB, then I’d have usable storage of 5TB, mirrored in two groups of disks: (a) 1TB+4TB and (b) 2TB+3TB.

This led me to rediscover the Partition problem and I went ahead and made a small project here:

This project is an invitation for everyone who wants to try implementing the algorithm through PRs.

I already included automatic unit and property test harness. All you have to do is add one function adhering to a spec (described in the project’s README) to a module and then just run mix test (shout out to @LostKobrakai for helping me with the code that generates the tests) . If your algorithm fails you’ll see the broken tests prefixed with impl_<YOUR_NAME> which could help iterating on it.

I also plan to add benchmarking code if anyone contributes an alternative algorithm. Mine was about 15 years old Java class that I adapted to Elixir (and discovered an almost complete clone of it online in the meantime).

PRs for other hardcoded values for the unit tests, or additional property tests, are welcome as well.

Would love to see your ideas, folks!

dimitarvp · February 5, 2022, 1:42pm

Encouraged by the success of this thread – Can you improve this? Zipping two lists, the result must be same size as the first list – I figured I’ll bump this one in case people are interested.

lud · February 7, 2022, 10:17am

This problem has a well known solution, you just keep track of each partition size and add the biggest remaining part to the smallest size. The basic implementation with recursive functions will already be optimized.

You can add streams and stuff but it will only be slower.

I propose to change the tests with a more generic specification, that is asking for a number of partitions instead of always having two partitions.

I tested with the code below for two partitions, but this is failing the tests as the tests expects the specific results of your implementation. I would pass if the tests only asserted the expected sums but not lists items.

  def impl_lud(list) when is_list(list) do
    # solve the problem
    result = impl_lud(list, 2)

    # comply with the shape expected from the test
    [{left_list, left_sum}, {right_list, right_sum}] = Enum.sort_by(result, &(-1 * elem(&1, 1)))
    {left_list, right_list, left_sum, right_sum}
  end

  defp impl_lud(list, parts) do
    partitions = List.duplicate({[], 0}, parts)

    list
    |> Enum.sort(:desc)
    |> Enum.reduce(partitions, &impl_lud_reducer/2)
  end

  defp impl_lud_reducer(x, [{partition, size} | partitions]) do
    impl_lud_insert({[x | partition], size + x}, partitions)
  end

  defp impl_lud_insert({_, size} = p, [{_, smaller} = candidate | rest]) when smaller < size do
    [candidate | impl_lud_insert(p, rest)]
  end

  defp impl_lud_insert(p, rest) do
    [p | rest]
  end

With those errors:


  1) test impl_lud: [1, 3, 4, 5, 7, 8, 9, 11, 65, 74, 83, 100] (PPTest)
     test/pp_test.exs:35
     Assertion with == failed
     code:  assert {[4, 7, 9, 65, 100], [1, 3, 5, 8, 11, 74, 83], 185, 185} ==
              PP.impl_lud([1, 3, 4, 5, 7, 8, 9, 11, 65, 74, 83, 100])
     left:  {[4, 7, 9, 65, 100], [1, 3, 5, 8, 11, 74, 83], 185, 185}
     right: {[1, 3, 7, 9, 65, 100], [4, 5, 8, 11, 74, 83], 185, 185}
     stacktrace:
       test/pp_test.exs:36: (test)

.....

  2) test impl_lud: [2, 3, 10, 5, 8, 9, 7, 3, 5, 2] (PPTest)
     test/pp_test.exs:35
     Assertion with == failed
     code:  assert {[2, 3, 5, 7, 10], [2, 3, 5, 8, 9], 27, 27} == PP.impl_lud([2, 3, 10, 5, 8, 9, 7, 3, 5, 2])
     left:  {[2, 3, 5, 7, 10], [2, 3, 5, 8, 9], 27, 27}
     right: {[2, 3, 5, 8, 9], [2, 3, 5, 7, 10], 27, 27}
     stacktrace:
       test/pp_test.exs:36: (test)

..

  3) test impl_lud: [3, 1, 1, 2, 2, 1] (PPTest)
     test/pp_test.exs:35
     Assertion with == failed
     code:  assert {[1, 1, 3], [1, 2, 2], 5, 5} == PP.impl_lud([3, 1, 1, 2, 2, 1])
     left:  {[1, 1, 3], [1, 2, 2], 5, 5}
     right: {[1, 2, 2], [1, 1, 3], 5, 5}
     stacktrace:
       test/pp_test.exs:36: (test)

dimitarvp · February 7, 2022, 10:23am

Yeah, I’ve read about it in the meantime, I guess my goal here shifted to have code that’s super short and very readable?

When I started off though, it was just a toy problem to me so that explains the limitations that you spotted.

Do you think it’s wrong to assert on the specific pieces of the list in the tests?

lud · February 7, 2022, 10:29am

It depends on how much you want balanced partitions.

If you only expect partitions to be as evenly sized as possible, then no yes it’s wrong. But if you add in the requirements that the average item size in each partition should also be balanced then ~~yes~~ no it’s not wrong, it could be interesting to expect some sizes.

For instance {[2,2], [1,1,1,1]} are “sum-balanced”, but if we consider the average then the right answer should be {[2,1,1], [2,1,1]}. And even there, the test should take the list items, and sort them before comparison with the expected answer. Because some algorithms may return {[2,1,1], [1,2,1]} which is still valid.

lud · February 7, 2022, 10:37am

I edited my answer because I inverted yes/no

I remembmer a hiring test from Dropbox, it had to be done in C lang IIRC. It was kind of the same problem but it was in two dimensions: given a square of N * N tiles, and a list of boxes of n*m tiles, write an algorithm that packs the most boxes into the big square.

dimitarvp · February 7, 2022, 12:04pm

Ohhh, yes, it’s coming back to me now. I was specifically looking into this secondary optimization: find lists that are both sum-balanced and more closer in size to each other.

Aetherus · February 7, 2022, 3:06pm

So some cases in the pp_test.exs should be modified, right?

For example, the expected value of the case [1, 3, 4, 5, 7, 8, 9, 11, 65, 74, 83, 100] could be {[1, 3, 5, 11, 65, 100], [4, 7, 8, 9, 74, 83], 185, 185}

Aetherus · February 7, 2022, 3:08pm

I implemented the solution with subset-sum and dynamic programming, though I’m not a dynamic programming expert

I’m using the process dictionary as the memo.

defmodule PP.Impls.Aetherus do
  @spec solve([pos_integer()]) ::
    {
      {partition1_sum :: non_neg_integer(), partition2_sum :: non_neg_integer()},
      {partition1_length :: non_neg_integer(), partition2_length :: non_neg_integer()},
      {partition1 :: [pos_integer()], partition2 :: [pos_integer()]}
    }
  def solve(nums) do
    sum = Enum.sum(nums)
    len = length(nums)
    sum_target = div(sum, 2)
    len_target = div(len, 2)

    {sum_diff, len_diff, partition1} =
      Task.async(fn ->
        closest_subset_sum(Enum.with_index(nums), sum_target, len_target)
      end)
      |> Task.await(:infinity)

    partition2 = nums |> List.myers_difference(partition1) |> Keyword.get_values(:del) |> List.flatten()

    {
      {sum_target + sum_diff, sum - sum_target - sum_diff},
      {len_target + len_diff, len - len_target - len_diff},
      {partition1, partition2}
    }
  end

  @spec closest_subset_sum([pos_integer()], non_neg_integer(), non_neg_integer()) ::
          {integer(), integer(), [pos_integer()]}
  defp closest_subset_sum([], sum_target, length_target) do
    {-sum_target, -length_target, []}
  end

  defp closest_subset_sum([{h, i} | t], sum_target, length_target) do
    key = {i, sum_target, length_target}

    case Process.get(key) do
      solution when solution != nil ->
        solution

      nil ->
        {sum_diff1, len_diff1, solution1} =
          closest_subset_sum(t, sum_target - h, length_target - 1)

        {sum_diff2, len_diff2, solution2} = closest_subset_sum(t, sum_target, length_target)

        cond do
          abs(sum_diff1) < abs(sum_diff2) ->
            {sum_diff1, len_diff1, [h | solution1]}

          abs(sum_diff1) > abs(sum_diff2) ->
            {sum_diff2, len_diff2, solution2}

          abs(len_diff1) <= abs(len_diff2) ->
            {sum_diff1, len_diff1, [h | solution1]}

          true ->
            {sum_diff2, len_diff2, solution2}
        end
        |> tap(fn solution -> Process.put(key, solution) end)
    end
  end
end

Tests failed:

  1) test impl_aetherus: [3, 1, 1, 2, 2, 1] (PPTest)
     test/pp_test.exs:35
     Assertion with == failed
     code:  assert {[1, 1, 3], [1, 2, 2], 5, 5} == PP.impl_aetherus([3, 1, 1, 2, 2, 1])
     left:  {[1, 1, 3], [1, 2, 2], 5, 5}
     right: {[3, 1, 1], [2, 2, 1], 5, 5}
     stacktrace:
       test/pp_test.exs:36: (test)



  2) test impl_aetherus: [2, 3, 10, 5, 8, 9, 7, 3, 5, 2] (PPTest)
     test/pp_test.exs:35
     Assertion with == failed
     code:  assert {[2, 3, 5, 7, 10], [2, 3, 5, 8, 9], 27, 27} == PP.impl_aetherus([2, 3, 10, 5, 8, 9, 7, 3, 5, 2])
     left:  {[2, 3, 5, 7, 10], [2, 3, 5, 8, 9], 27, 27}
     right: {[2, 3, 10, 5, 7], [8, 9, 3, 5, 2], 27, 27}
     stacktrace:
       test/pp_test.exs:36: (test)



  3) test impl_aetherus: [1, 3, 4, 5, 7, 8, 9, 11, 65, 74, 83, 100] (PPTest)
     test/pp_test.exs:35
     Assertion with == failed
     code:  assert {[4, 7, 9, 65, 100], [1, 3, 5, 8, 11, 74, 83], 185, 185} ==
              PP.impl_aetherus([1, 3, 4, 5, 7, 8, 9, 11, 65, 74, 83, 100])
     left:  {[4, 7, 9, 65, 100], [1, 3, 5, 8, 11, 74, 83], 185, 185}
     right: {[1, 3, 5, 11, 65, 100], [4, 7, 8, 9, 74, 83], 185, 185}
     stacktrace:
       test/pp_test.exs:36: (test)



  4) test impl_aetherus: [6000, 3000, 480, 240, 120] (PPTest)
     test/pp_test.exs:35
     Assertion with == failed
     code:  assert {[6000], [120, 240, 480, 3000], 6000, 3840} == PP.impl_aetherus([6000, 3000, 480, 240, 120])
     left:  {[6000], [120, 240, 480, 3000], 6000, 3840}
     right: {[6000], [3000, 480, 240, 120], 6000, 3840}
     stacktrace:
       test/pp_test.exs:36: (test)

.

  5) test impl_aetherus: [1, 1, 1, 1, 1, 1, 1, 50, 50, 100] (PPTest)
     test/pp_test.exs:35
     Assertion with == failed
     code:  assert {[1, 1, 1, 1, 100], [1, 1, 1, 50, 50], 104, 103} ==
              PP.impl_aetherus([1, 1, 1, 1, 1, 1, 1, 50, 50, 100])
     left:  {[1, 1, 1, 1, 100], [1, 1, 1, 50, 50], 104, 103}
     right: {[1, 1, 1, 50, 50], [1, 1, 1, 1, 100], 103, 104}
     stacktrace:
       test/pp_test.exs:36: (test)

..

  6) test impl_aetherus: [1, 2, 3, 4, 5] (PPTest)
     test/pp_test.exs:35
     Assertion with == failed
     code:  assert {[1, 2, 5], [3, 4], 8, 7} == PP.impl_aetherus([1, 2, 3, 4, 5])
     left:  {[1, 2, 5], [3, 4], 8, 7}
     right: {[2, 5], [1, 3, 4], 7, 8}
     stacktrace:
       test/pp_test.exs:36: (test)

....

Finished in 0.04 seconds (0.00s async, 0.04s sync)
14 tests, 6 failures