Streaming an HTTP response without running out of memory

I intend to stream a large XML document over HTTP that contains a list of objects. Each object is to be converted into a struct and emitted through an Elixir Stream, which is then handed to client code that does arbitrary data processing. The idea is to avoid loading the entire document, or all the structs, into memory before passing them to the client code.

The syntax would look something like:

stream("http://..")
|> Stream.map(...)
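
To make the intent more concrete, the full pipeline I have in mind would look roughly like this; stream/1, parse_object/1 and interesting?/1 are placeholders for functions I still need to build, and the URL is made up:

stream("http://example.com/large.xml")   # lazily emits one raw XML object at a time
|> Stream.map(&parse_object/1)           # turn each XML fragment into a struct
|> Stream.filter(&interesting?/1)        # arbitrary client-side processing
|> Enum.take(10)                         # pulls only as much of the document as needed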

But my concerns already start at the HTTP library level, where async requests are handled by sending the HTTP chunks as messages to another process. For example:

HTTPoison.get! "http://", %{}, stream_to: self

When I execute the following:

HTTPoison.start()
HTTPoison.get! "http://mirror.nforce.com/pub/speedtests/250mb.bin", %{}, stream_to: self()
:observer.start()

In the observer I can see the memory usage increase and the messages to my process stack up until the entire file has been loaded. This (in my head) poses a problem, as the processing is potentially much slower than retrieving the file.
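
The same backlog can also be seen without :observer by checking the process mailbox directly (just a quick check from iex after the get! call above has been running for a moment):

Process.info(self(), :message_queue_len)
#=> {:message_queue_len, n}  # n keeps growing while the chunk messages are not consumed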

Should I be concerned about this? As far as I know it is not a good idea to load the entire file into memory. On the other hand, data processing starts the moment the first object is received, and telling the web server to back off has its own problems (connections stay open longer, server-side resources are held longer than needed, etc.).

I would look at using :hackney directly. HTTPoison does not surface the API that :hackney has for doing demand-driven HTTP streaming.
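
Here is a minimal sketch of what I mean, wrapped in Stream.resource/3. The module and function names (HttpStream, stream_body/1) are made up; only the :hackney calls and message shapes (async: :once, :hackney.stream_next/1, {:hackney_response, ref, ...}) come from hackney's documented API:

defmodule HttpStream do
  def stream_body(url) do
    Stream.resource(
      fn ->
        # async: :once -> hackney delivers a single message, then waits until
        # we explicitly ask for the next one with :hackney.stream_next/1
        {:ok, ref} = :hackney.get(url, [], "", async: :once)
        ref
      end,
      fn ref ->
        receive do
          {:hackney_response, ^ref, {:status, _code, _reason}} ->
            :ok = :hackney.stream_next(ref)
            {[], ref}

          {:hackney_response, ^ref, {:headers, _headers}} ->
            :ok = :hackney.stream_next(ref)
            {[], ref}

          {:hackney_response, ^ref, :done} ->
            {:halt, ref}

          {:hackney_response, ^ref, chunk} when is_binary(chunk) ->
            # emit one chunk, then request the next one from hackney
            :ok = :hackney.stream_next(ref)
            {[chunk], ref}
        end
      end,
      fn ref -> :hackney.close(ref) end
    )
  end
end

Used like this, the consumer decides when the next chunk is fetched, so the mailbox holds at most one pending message no matter how slow the downstream processing is:

HttpStream.stream_body("http://mirror.nforce.com/pub/speedtests/250mb.bin")
|> Stream.map(&byte_size/1)
|> Enum.sum()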

Yes, HTTPoison just sends messages and doesn't have any back-pressure mechanism, so the end result is exactly what you are seeing. I would also open an issue in the HTTPoison issue tracker, possibly linking back here; the :stream_to API seems to be a very dangerous one.