Algorithm Assistance - Conditional Grouping of Tuples

jnnks · March 10, 2022, 3:01pm

I need some assistance with data aggregation. Functional programming is new to me and I cannot wrap my head around the Elixir way to tackle the following problem:

Consider a list of tuples:

input = ["A", "B", "C", "D", "E", "F"]
tuples = Enum.zip([input, Enum.drop(input, 1)])
# --> [{"A", "B"}, {"B", "C"}, ..., {"E", "F"}]

Consider a mock function that marks some of these tuples as valid and some not.
The function is a black box and will return different results depending on the input. The cases here are just for example.

db_mock = fn {a, b} -> case {a, b} do
  {"A", "B"} -> false
  {"D", "E"} -> false
  _ -> true
end end
results = Enum.map(tuples, db_mock)
# --> [false, true, true, false, true]

I now want for form chains of valid tuples and group them together like so:

[["A"],  ["B",  "C",  "D"],  ["E", "F"]]
#     fls,   tru,  tru,   fls,   tru

For every mismatch, a new group is started, which will be the accumulator until the next mismatch, and so on. These are also valid outcomes:

[["A",  "B",  "C",  "D", "E"], ["F"]]
#    tru,  tru,  tru,  tru,  fls

[["A"],  ["B"],  ["C"],  ["D"],  ["E"],  ["F"]]
#     fls,    fls,    fls,    fls,    fls

[["A",  "B",  "C",  "D",  "E",  "F"]]
#    tru,  tru,  tru,  tru,  tru

The tuples need to stay in the original order, but can be shuffled during the algorithm
I am interesed in the cuts, that caused by invalid tuples

Algorithm in Python

input = ["A", "B", "C", "D", "E", "F"]
tuples = list(zip(input, input[1:]))
results = [False, True, True, False, True]
combined = zip(tuples, results)

[first_word, *_] = input
biglist = []; smalllist = [first_word]
for (left, right), result in zip(tuples, results):
    if result:
        smalllist.append(right)
    else:
        biglist.append(smalllist)
        smalllist = [right]
        
if len(smalllist) > 0:
    biglist.append(smalllist)

print(biglist)
# [['A'], ['B', 'C', 'D'], ['E', 'F']]

What is the best way to do that?

tomkonidas · March 10, 2022, 3:35pm

I am unsure of the whole issue, but something like this might make it easier?
Instead of having a list of true/false just store the valid/invalid and then you can perform on that?

input = ["A", "B", "C", "D", "E", "F"]
tuples = Enum.zip([input, Enum.drop(input, 1)])

Enum.reduce(tuples, %{valid: [], invalid: []}, fn
  {"A", "B"} = data, acc ->
    Map.update!(acc, :invalid, &[data | &1])
  {"D", "E"} = data, acc ->
    Map.update!(acc, :invalid, &[data | &1])
  data, acc ->
    Map.update!(acc, :valid, &[data | &1])
end)

# %{
#  invalid: [{"D", "E"}, {"A", "B"}],
#  valid: [{"E", "F"}, {"C", "D"}, {"B", "C"}]
# }

jnnks · March 10, 2022, 4:03pm

Thanks for your reply. I updated the question, hopefully that makes it clearer.

The tuples need to stay in order, so grouping them in categories does not work for me unfortunately.

jnnks · March 18, 2022, 9:56pm

I got some interesting solutions here: