CSV file parsing with grouped records using Flow

How to use Flow for parsing grouped records in a CSV file.

Let’s say I have a CSV file with 100 records.
One attribute, called group_id, groups the records.
Let’s say the first 10 records have a group_id of 1,
another 50 have a group_id of 2,
and so on.

The file can have anywhere from a minimum of 4,000 records to a maximum of 1 million records.

I have to process this as efficiently as possible: group the records and assign each group to a supervised worker dedicated to working on that group only.
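A rough sketch of that routing idea (the module name `CSVParser`, the column position, and `process_record/1` are placeholders, not from any real app): `Flow.partition/2` can hash on the group_id column so that every record of a given group is delivered to the same stage, and that stage then acts as the per-group worker:

```elixir
# Sketch: route records to stages by group_id, so each stage only
# ever sees the groups that hash to it.
group_id_position = 0  # column index of group_id (assumption)

"records.csv"
|> File.stream!()
|> CSVParser.parse_stream()
|> Flow.from_enumerable()
# records with the same group_id always hash to the same stage
|> Flow.partition(key: fn record -> Enum.at(record, group_id_position) end)
|> Flow.map(&process_record/1)
|> Flow.run()
```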


I tried this solution:

config
    # resolve the CSV file path from the config
    |> config_parser_and_return_file_path()
    # stream the file lazily instead of loading it all into memory
    |> File.stream!(read_ahead: chunk_size)
    # parse rows lazily (NimbleCSV-style parse_stream)
    |> CSVParser.parse_stream()
    # skip header/offset rows as configured
    |> Stream.drop(skip_rows(config["params"]["rows"]))
    |> Flow.from_enumerable()
    # collect records into groups keyed by the group_id column
    |> Flow.group_by(&Enum.at(&1, mapping["group_id"]["position"]))
    |> Flow.map(&parse_data_columns/1)
    |> Flow.run()

Are there any performance issues this may raise for a dataset of 1 million records?
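One thing I’d watch here (my own reading of the Flow docs, so treat it as a sketch, not a definitive answer): `Flow.group_by/2` accumulates every record of every group in stage state before emitting anything, so for a million rows the whole dataset ends up in memory at once. If the workers can consume records incrementally, hash-partitioning on the group key and forwarding each record as it streams through avoids that single giant accumulation. `GroupWorker` is a hypothetical supervised worker, not a real module:

```elixir
# Sketch: instead of materializing full groups with Flow.group_by/2,
# hash-partition on group_id and hand each record to its group's
# worker as it streams through. GroupWorker is a placeholder.
config
|> config_parser_and_return_file_path()
|> File.stream!(read_ahead: chunk_size)
|> CSVParser.parse_stream()
|> Flow.from_enumerable()
|> Flow.partition(key: fn record -> Enum.at(record, mapping["group_id"]["position"]) end)
# each stage forwards records for "its" groups to a supervised worker
|> Flow.each(fn record ->
  group_id = Enum.at(record, mapping["group_id"]["position"])
  GroupWorker.handle_record(group_id, record)
end)
|> Flow.run()
```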