Send StringIO to Stream

pmargreff · May 11, 2020, 2:36pm

I have a process that builds up a large string in a StringIO device. Creation and aggregation are simple: I open the StringIO and keep its PID in the process state.

# process init
{:ok, pid} = StringIO.open("stream_buffer")

# called by the process on each aggregation step
IO.write(pid, text <> " ")

The problem is that when the process finishes, I want to send the content directly to a stream. I used to save it to a file, but when the buffer is too big (1 or 2 GB), writing it to a file and reading it back takes longer than I am willing to wait.

My first attempt to send it directly into a stream was something like this:

    pid
    |> IO.binstream(:line)

The problem with this attempt (and I don’t know whether it is intentional) is that instead of returning the content written to the device, it returns the content I passed to open. So instead of transforming the content aggregated through IO.write, it always transforms the string I used as the “stream name”. For example:

{:ok, pid} = StringIO.open("buffer")
IO.write(pid, "And here we should have the buffer content")

# will evaluate to {"buffer", "And here we should have the buffer content"}
pid
|> StringIO.contents()
|> IO.inspect()

# evaluates to "buffer", but I expected the whole buffer content, not only the initial value
pid
|> IO.binstream(:line)
|> Stream.map(&IO.inspect/1)
|> Stream.run()

What I’m doing now is creating a new StringIO from the old one’s content, which feels really wrong:

{:ok, other_pid} =
  pid
  |> StringIO.flush()
  |> StringIO.open()

So I move the content from the PID where I was aggregating into a new device. The problems are:
1 - It really feels wrong.
2 - When the string is huge, moving the content from the old device to the new one still takes a lot of time (less than writing and reading a file on disk, but still slow).

What am I doing wrong here? What are the other options for processing it on the fly using streams? Are there alternatives to StringIO?

akash-akya · May 11, 2020, 3:39pm

I think there is some confusion here: StringIO is not really a buffer in the way you are trying to use it.

Maybe this example helps.
Think of it as a terminal with your program running inside it: your program gets its input from the terminal and writes its output to the terminal. All IO functionality is defined from the perspective of this program. That means enumeration pulls the input from the terminal (you cannot pull output in this context), and pushing your output to the terminal is what Collectable does.

For example:

defmodule Capitalize do
  def run(terminal) do
    stream = IO.binstream(terminal, :line)

    stream
    |> Stream.map(&String.capitalize/1) # we are pulling `input` here
    |> Stream.into(stream)
    |> Stream.run()
  end
end

input = ["a", "b", "c", "d"] |> Enum.join("\n")
{:ok, terminal} = StringIO.open(input)

Capitalize.run(terminal)

{"", output} = StringIO.contents(terminal)
# "A\nB\nC\nD"
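The same separation can be seen directly on a StringIO device. This is only a small sketch, assuming the documented behaviour of StringIO.contents/1 (returns `{input, output}`) and StringIO.flush/1 (returns the output and empties it); the strings are made up for illustration:

```elixir
# The string passed to open/1 becomes the *input* buffer.
{:ok, pid} = StringIO.open("buffer")

# Writes go to the separate *output* buffer.
IO.write(pid, "written content")

# contents/1 returns {remaining_input, output}.
{"buffer", "written content"} = StringIO.contents(pid)

# Reading (which is what IO.binstream/2 does under the hood) consumes input only.
"buffer" = IO.read(pid, :line)
{"", "written content"} = StringIO.contents(pid)

# flush/1 hands back the output and leaves it empty.
"written content" = StringIO.flush(pid)
{"", ""} = StringIO.contents(pid)
```

So a stream built with IO.binstream/2 on the device will never see what was written with IO.write/2.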

Also, StringIO might not be efficient for the way you are using it. From a quick glance, each write just concatenates onto the output: https://github.com/elixir-lang/elixir/blob/v1.10.3/lib/elixir/lib/string_io.ex#L287. You are better off using iodata.
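As a rough sketch of the iodata idea (not a drop-in replacement): keep a list of chunks in the process state instead of one growing binary, and enumerate the list when you are done. `ChunkBuffer` and its functions are hypothetical names made up for this example, and an Agent is just one way to hold the state:

```elixir
defmodule ChunkBuffer do
  # Hypothetical helper: state is a list of iodata chunks, newest first.
  use Agent

  def start_link(_opts \\ []), do: Agent.start_link(fn -> [] end)

  # Prepending to a list is O(1); the existing chunks are not rebuilt.
  def write(buffer, text) do
    Agent.update(buffer, fn chunks -> [[text, " "] | chunks] end)
  end

  # Returns the chunks in write order; a list is already an Enumerable.
  def chunks(buffer) do
    buffer |> Agent.get(& &1) |> Enum.reverse()
  end
end

{:ok, buffer} = ChunkBuffer.start_link()
ChunkBuffer.write(buffer, "hello")
ChunkBuffer.write(buffer, "world")

result =
  buffer
  |> ChunkBuffer.chunks()
  |> Stream.map(&IO.iodata_to_binary/1)
  |> Enum.to_list()

# result is ["hello ", "world "]
```

Note that this yields one element per write rather than one per line, so it is not equivalent to `IO.binstream(pid, :line)`. If the data eventually goes to a file or socket, functions such as IO.binwrite/2 and File.write/2 accept iodata directly, so the chunks never need to be joined into a single binary.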