Parallel downloads from the GitHub API using GenStage/Flow

I have been experimenting with GenStage / Flow and I’ve been bugged by a few questions. Here’s what I want to do:

Given a GitHub username, I want to:

  • retrieve a list of all the stargazers of the repositories owned by this user
  • sort those repositories by number of issues, in descending order

Here’s my thought process:

Both tasks require the JSON list of repositories from https://api.github.com/users/octocat/repos, so fetching that list will be the producer.

In the next step, two producer/consumers will read from the list of repositories. So far, after some fiddling and some help from #elixir-lang, I managed to use GenStage.BroadcastDispatcher so that the same events are sent to both producer/consumers.
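To make this concrete, here is a minimal sketch of the producer as I currently picture it. The module name `RepoProducer` is made up for illustration, and I am assuming the repository list has already been fetched and decoded into a list of maps before the stage starts; the `Mix.install/1` call is only there so the snippet can run as a standalone script.

```elixir
# Assumption: gen_stage 1.x is installable; in a Mix project this would be a dep instead.
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule RepoProducer do
  use GenStage

  # `repos` is a list of repository maps, already decoded from the JSON response.
  def start_link(repos) do
    GenStage.start_link(__MODULE__, repos, name: __MODULE__)
  end

  @impl true
  def init(repos) do
    # BroadcastDispatcher sends every event to every subscribed consumer.
    {:producer, repos, dispatcher: GenStage.BroadcastDispatcher}
  end

  @impl true
  def handle_demand(demand, repos) when demand > 0 do
    # Hand out at most `demand` repositories and keep the rest as state.
    {events, remaining} = Enum.split(repos, demand)
    {:noreply, events, remaining}
  end
end
```

Once the list is exhausted this sketch simply emits nothing, which is probably fine for a finite list but is not a complete treatment of demand.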

Each event received by the above producer/consumers is a Map representation of a repository:

%{
  "stargazer_url" => "https://api.github.com/repos/octocat/linguist/stargazers",
  "issue_url" => "https://api.github.com/repos/octocat/linguist/issues",
  "many_other_keys" => "were omitted"
}

The first producer/consumer will read stargazer_url and fetch the URL.

The second producer/consumer will read issue_url and fetch the URL.

Both will produce the resulting JSON from the GET request.
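Roughly, I imagine both producer/consumers sharing one module that is parameterised by the key to read. This is only a sketch: `UrlFetcher` and the `:key` and `:fetch` options are names I made up, and `:fetch` stands in for whatever HTTP client call performs the GET and decodes the JSON. The keys are the ones from my simplified map above.

```elixir
# Assumption: gen_stage 1.x is installable; in a Mix project this would be a dep instead.
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule UrlFetcher do
  use GenStage

  # Options (all hypothetical names):
  #   :key          - field of the repository map that holds the URL
  #   :fetch        - 1-arity function, URL in, decoded response out
  #   :subscribe_to - upstream stages, e.g. the repository producer
  def start_link(opts), do: GenStage.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{key: Keyword.fetch!(opts, :key), fetch: Keyword.fetch!(opts, :fetch)}
    {:producer_consumer, state, subscribe_to: Keyword.get(opts, :subscribe_to, [])}
  end

  @impl true
  def handle_events(repos, _from, %{key: key, fetch: fetch} = state) do
    # One output event per repository; requests run one after another in this process.
    responses = Enum.map(repos, fn repo -> fetch.(Map.fetch!(repo, key)) end)
    {:noreply, responses, state}
  end
end
```

One stage would be started with `key: "stargazer_url"` and the other with `key: "issue_url"`, both subscribed to the same producer.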

Question: At this point, am I modeling things right?

Assuming I didn’t screw up, the next step would be to connect the consumers to their respective producer/consumers. Each consumer will gather the resulting responses (either from stargazer_url or issue_url) and combine them to form the final result.
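The consumer I have in mind would look something like the sketch below. `ResponseCollector` and its `results/1` helper are hypothetical names, and the sketch does not address how the consumer would know that all responses have arrived.

```elixir
# Assumption: gen_stage 1.x is installable; in a Mix project this would be a dep instead.
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule ResponseCollector do
  use GenStage

  def start_link(opts), do: GenStage.start_link(__MODULE__, opts)

  # Hypothetical helper: read whatever has been gathered so far.
  def results(stage), do: GenStage.call(stage, :results)

  @impl true
  def init(opts) do
    # The accumulator starts empty; :subscribe_to points at one producer/consumer.
    {:consumer, [], subscribe_to: Keyword.get(opts, :subscribe_to, [])}
  end

  @impl true
  def handle_events(responses, _from, acc) do
    # Consumers never emit events, so the event list is always [].
    {:noreply, [], acc ++ responses}
  end

  @impl true
  def handle_call(:results, _from, acc) do
    {:reply, acc, [], acc}
  end
end
```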

Question: This feels like there’s something off. It seems like I’m missing a step, and not leveraging concurrency.

Also, I’m not too sure how to model this using Flow.
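My best guess at a Flow version is the sketch below, in case it helps frame the question. `RepoFlow`, its function names, and the `fetch` argument (a 1-arity function, URL in, decoded JSON list out) are all made up for illustration, and I am assuming each repository map also carries a "name" key. I set `max_demand: 1` on the guess that each stage should take one repository at a time when the work is mostly waiting on HTTP.

```elixir
# Assumption: flow 1.x is installable; in a Mix project this would be a dep instead.
Mix.install([{:flow, "~> 1.0"}])

defmodule RepoFlow do
  # Fetches the stargazers of every repository concurrently.
  # The order of the resulting list is not guaranteed.
  def stargazers(repos, fetch) do
    repos
    |> Flow.from_enumerable(max_demand: 1)
    |> Flow.flat_map(fn repo -> fetch.(Map.fetch!(repo, "stargazer_url")) end)
    |> Enum.to_list()
  end

  # Fetches the issues of every repository concurrently, then sorts
  # {name, issue_count} pairs by descending count in the calling process.
  def by_issue_count(repos, fetch) do
    repos
    |> Flow.from_enumerable(max_demand: 1)
    |> Flow.map(fn repo ->
      {Map.fetch!(repo, "name"), length(fetch.(Map.fetch!(repo, "issue_url")))}
    end)
    |> Enum.sort_by(fn {_name, count} -> count end, :desc)
  end
end
```

Here both functions take the already-decoded repository list as input, so the list would be fetched once and passed to each.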
