Enum Running Total Calculations with chunk

I have a data set for which I need to calculate a running total for sub-segments. My data set looks like this:

trip_id, station_code, departure_time, station_sequence
1, A, 08:00:00, 1
1, B, 08:05:00, 2
1, C, 08:07:00, 3
1, D, 08:12:00, 4
2, Z, 08:30:00, 1
2, Y, 08:37:00, 2
2, X, 08:41:00, 3

For each record in this data set, I want to add another field, called trip_duration_min. This is the elapsed time, in minutes, since the trip's first stop (station_sequence 1).

trip_id, station_code, departure_time, station_sequence, trip_duration_min
1, A, 08:00:00, 1, 0
1, B, 08:05:00, 2, 5
1, C, 08:07:00, 3, 7
1, D, 08:12:00, 4, 12
2, Z, 08:30:00, 1, 0
2, Y, 08:37:00, 2, 7
2, X, 08:41:00, 3, 11

I’m wondering what the best way is to calculate this value using Stream or Enum. My thought was to use Stream.chunk_by to break the data set into one enumerable per trip_id, then use Stream.scan to accumulate the total time in minutes for each trip. However, I’m not sure how best to reference the prior record in an enumerable.

Any thoughts?

I would use Stream to go through the file and pattern match on the content of each line.
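
A minimal sketch of that suggestion, assuming the data arrives as comma-separated lines with a header row, as in the question. The module name ScheduleParser and both function names are made up for illustration.

```elixir
defmodule ScheduleParser do
  # Hypothetical helper: turns one line such as "1, A, 08:00:00, 1" into a map
  # by pattern matching on its four comma-separated fields.
  def parse_line(line) do
    [trip_id, station_code, departure_time, station_sequence] =
      line
      |> String.trim()
      |> String.split(",")
      |> Enum.map(&String.trim/1)

    %{
      trip_id: String.to_integer(trip_id),
      station_code: station_code,
      departure_time: departure_time,
      station_sequence: String.to_integer(station_sequence)
    }
  end

  # Lazily parses any enumerable of lines, skipping the header row.
  def parse(lines) do
    lines
    |> Stream.drop(1)
    |> Stream.map(&parse_line/1)
  end
end
```

With a file on disk, something like File.stream!("schedule.csv") |> ScheduleParser.parse() |> Enum.to_list() should give a list of maps; the file name here is only a placeholder. A line that does not have exactly four fields raises a MatchError, which may or may not be what you want.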


Here’s what I did. I put the data set into a list of maps using Stream.map. It looks like this:

[%{trip_id: 1, station_code: "A", departure_time: "08:00:00", station_sequence: 1}, 
 %{trip_id: 1, station_code: "B", departure_time: "08:05:00", station_sequence: 2},
 %{...}]

I then output the Stream to a list:

|> Enum.to_list

I used pattern matching to calculate the trip_duration_min values. Here’s my function.

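# Base case: no records left, return the accumulated (reversed) list.
def put_trip_duration_min([], _, _, acc), do: acc
# Walks the list while carrying the prior departure time and the running total.
def put_trip_duration_min([%{departure_time: departure_time,
                               station_sequence: station_sequence} = schedule|t],
                               prior_departure_time, trip_duration_min, acc) do
  trip_duration_min =
    case station_sequence do
      # First stop of a trip: reset the running total.
      1 ->
        0
      # Later stops: add the minutes elapsed since the prior stop.
      _ ->
        duration(departure_time, prior_departure_time, trip_duration_min, :minutes)
    end

  acc = [Map.put(schedule, :trip_duration_min, trip_duration_min)|acc]
  put_trip_duration_min(t, departure_time, trip_duration_min, acc)
end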

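# Adds the difference between the two times to the running duration.
# Relies on the Timex library for the Duration helpers.
defp duration(time, prior_time, duration, units) do
  time = Time.from_iso8601!(time) |> Timex.Duration.from_time
  prior_time = Time.from_iso8601!(prior_time) |> Timex.Duration.from_time
  duration + Timex.Duration.diff(time, prior_time, units)
end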

I then call the function and reverse the list back to its original order at the end:

|> put_trip_duration_min(nil, 0, [])
|> Enum.reverse
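
For comparison, the chunk_by idea from the original question can also be sketched with only the standard library. This is a rough alternative, not the function above: it measures each stop against the first stop of its trip instead of summing stop-to-stop differences, which gives the same numbers for the sample data. It assumes the records are already ordered by trip_id and station_sequence and that no trip crosses midnight. The module name TripDuration and its function names are made up for illustration.

```elixir
defmodule TripDuration do
  # Adds :trip_duration_min to every record, measured from the first stop
  # of the trip the record belongs to.
  def add_durations(schedules) do
    schedules
    # Group consecutive records that share a trip_id.
    |> Enum.chunk_by(& &1.trip_id)
    |> Enum.flat_map(&add_trip_durations/1)
  end

  # Assumption: the first record in each chunk is the first stop of the trip.
  defp add_trip_durations([first | _] = trip) do
    start = Time.from_iso8601!(first.departure_time)

    Enum.map(trip, fn stop ->
      minutes =
        stop.departure_time
        |> Time.from_iso8601!()
        |> Time.diff(start, :second)
        |> div(60)

      Map.put(stop, :trip_duration_min, minutes)
    end)
  end
end
```

Because each chunk keeps its own starting time, there is no need to carry the prior record along, and the list never has to be reversed.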

I hope this is helpful for someone else down the road :+1:
