vivus-ignis

Emulate File.stream! for a string variable

Hello there!

I have a mix task which grabs some data from a remote API, collects it into a file (a biggish XML), then reads it back as a stream and processes it with a bunch of text transformations.
Now I’m trying to sketch an integration test for the processing part (skipping the getting-the-data part) and I wonder… is it possible to use a string variable instead of a file for that purpose?

This is what I have in my mix task code:

    File.stream!("#{@download_dir}/#{category}.xml", [:read])
    |> ... processing part I want to test follows

And this is how I’m trying to simulate (unsuccessfully so far) the File.stream! part:

    input_xml = """
     ... xml fragment ...
    ...
    """
    input_xml
    |> Stream.unfold( &(String.split(&1, "~n")) )
    |> ... processing part ...

Of course what I’m getting after this Stream.unfold is different from File.stream! – the Stream.unfold results in all the newlines being removed.
And then my processing part breaks as it relies on newlines in certain places (yeah, it sounds crazy, but inside that xml I have wiki markup-formatted fragments where newlines do matter).

So my question is: is it possible to split a string by newlines in such a way that I can preserve those "\n"s? That is, can I emulate File.stream! without an actual file?

Would appreciate any hints. Thank you!

Marked As Solved

wmnnd

You’re in luck, you can simply use StringIO.open/1 and IO.binstream in order to stream any String:

{:ok, stream} =
  "abc\ndef\nghi\n"
  |> StringIO.open()

stream
|> IO.binstream(:line)
# |> your own stream processing
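
As a quick check that this keeps the newlines the question relies on, here is a minimal sketch. The variable names `device` and `lines` are only illustrative.

```elixir
# Open an in-memory IO device over the string; nothing touches the disk.
{:ok, device} = StringIO.open("abc\ndef\nghi\n")

# Read the device line by line; each element keeps its trailing "\n".
lines =
  device
  |> IO.binstream(:line)
  |> Enum.to_list()

IO.inspect(lines)
# prints ["abc\n", "def\n", "ghi\n"]
```

Note that the device is a stateful process: once the lines have been read, reading again from the same device yields nothing more.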

Also Liked

ku1ik

Alternative solution to using StringIO + IO.binstream would be to use Stream.unfold with binary pattern matching:

  def binary_stream(b, chunk_size \\ 50_000) when is_binary(b) do
    Stream.unfold(0, fn skip ->
      case b do
        <<_skipped::binary-size(skip), chunk::binary-size(chunk_size), _rest::binary>> ->
          {chunk, skip + chunk_size}

        <<_skipped::binary-size(skip)>> ->
          nil

        <<_skipped::binary-size(skip), chunk::binary>> ->
          {chunk, skip + byte_size(chunk)}
      end
    end)
  end

I tested this on pretty big XML docs (hundreds of megabytes) and it seems to be performant and doesn’t require much memory due to not copying/cloning any part of the binary data.

PS: for this particular case (XML parsing), it’s not necessary to read by line; in fact, some XML documents (or SOAP API responses) arrive as a single long line without line breaks.
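
Because fixed-size chunks can end in the middle of a line, a pipeline that does depend on newlines would have to reassemble lines first. A minimal sketch of one way to do that, assuming a hypothetical `ChunkLines.lines/1` helper (not part of the post above) that accepts any enumerable of binary chunks:

```elixir
defmodule ChunkLines do
  # Hypothetical helper: re-splits an enumerable of binary chunks into
  # lines, keeping the trailing "\n" on every complete line.
  def lines(chunks) do
    chunks
    # Append a marker so the final, unterminated line can be flushed.
    |> Stream.concat([:eof])
    |> Stream.transform("", fn
      :eof, "" ->
        {[], ""}

      :eof, buffer ->
        {[buffer], ""}

      chunk, buffer ->
        # Everything except the last piece is a complete line;
        # the last piece is carried over to the next chunk.
        {complete, [rest]} =
          (buffer <> chunk)
          |> String.split("\n")
          |> Enum.split(-1)

        {Enum.map(complete, &(&1 <> "\n")), rest}
    end)
  end
end

IO.inspect(ChunkLines.lines(["ab", "c\nde", "f\ngh"]) |> Enum.to_list())
# prints ["abc\n", "def\n", "gh"]
```

Splitting on the "\n" byte is safe even when a chunk boundary falls inside a multi-byte UTF-8 character, since that byte never occurs inside one.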

michalmuskala

Why not use one of the string functions? I understand you want to get an enumerable of lines from the string. This can be achieved eagerly with String.split(str, "\n") or lazily with String.splitter(str, "\n").

NobbZ

Because String.split/2 and String.splitter/2 will remove the split-points, but the OP said he needs them intact.
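
If a pure-string approach is still preferred, the newline can be put back after each split. A minimal sketch, assuming a hypothetical `LineStream.lines/1` helper built on `Stream.unfold/2` and `String.split/3` with `parts: 2`:

```elixir
defmodule LineStream do
  # Hypothetical helper: lazily yields the lines of a string,
  # keeping the "\n" at the end of each line.
  def lines(string) when is_binary(string) do
    Stream.unfold(string, fn
      "" ->
        nil

      rest ->
        # Split off only the first line; the tail stays untouched.
        case String.split(rest, "\n", parts: 2) do
          [line, tail] -> {line <> "\n", tail}
          # No newline left: emit the remainder as the last line.
          [last] -> {last, ""}
        end
    end)
  end
end

IO.inspect(LineStream.lines("abc\ndef\nghi\n") |> Enum.to_list())
# prints ["abc\n", "def\n", "ghi\n"]
```

Because the stream is derived from the string itself rather than from a stateful device, enumerating it twice gives the same result both times.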

aseigo

Ok .. I think I have found the issue …

In every call to get a line, the process returned by StringIO.open does this:

defp io_request({:get_line, encoding, prompt}, s) do
    get_line(encoding, prompt, s)
end

get_line, in turn, calls Erlang’s :unicode.characters_to_list, which converts the whole bitstring to a list with the proper encoding.

If this succeeds, StringIO.do_get_line is called, which iterates over the items in the list until it finds a terminator (a newline or the end of the data) and returns that line and the rest of the string. It then goes back to Erlang, calling :unicode.characters_to_binary on both the line just retrieved and the remainder of the string.

This means that the longer the string, the bigger the lists and resulting binaries that are generated on every request for a line. I expect this is doing some unhappy things to the memory management. A potential fix would be to do the conversion to a list once, keep that in the state data of the StringIO process, and then iterate over it one line at a time.

antoine

Thanks for this post, it was very useful!

But I think the post marked as the solution is not the most appropriate one.
The solution using String.splitter(str, "\n") seems to behave more as expected.

Explanations:
When using this:

str = "abc\ndef\nghi\n"

{:ok, stream} = str |> StringIO.open()

s = stream |> IO.binstream(:line)

it does not work as expected:

iex> s |> Enum.take(1)
["abc\n"]
iex> s |> Enum.take(1)
["def\n"]

=> The result should always be the same as it’s the same operation.

Like we have here:

iex> s = File.stream!("/tmp/foo.csv")
iex> s |> Enum.take(1)
["hey\n"]
iex> s |> Enum.take(1)
["hey\n"]

Instead, String.splitter(str, "\n") does the job as expected:

iex> str = "abc\ndef\nghi\n"
iex> s = String.splitter(str, "\n")

iex> s |> Enum.take(1)
["abc"]
iex> s |> Enum.take(1)
["abc"]
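
Note that the String.splitter/2 results above no longer carry the "\n". One way to get both repeatable enumeration and intact newlines might be to open a fresh StringIO device on every enumeration. A minimal sketch, assuming a hypothetical `StringLines.stream/1` helper built on `Stream.resource/3`:

```elixir
defmodule StringLines do
  # Hypothetical helper: a line stream over a string that opens a new
  # StringIO device each time it is enumerated, so it can be reused.
  def stream(string) when is_binary(string) do
    Stream.resource(
      # Runs at the start of every enumeration.
      fn ->
        {:ok, device} = StringIO.open(string)
        device
      end,
      # Emits one line per step; stops on :eof or an error tuple.
      fn device ->
        case IO.binread(device, :line) do
          line when is_binary(line) -> {[line], device}
          _eof_or_error -> {:halt, device}
        end
      end,
      # Cleans up the device when the enumeration ends or halts early.
      fn device -> StringIO.close(device) end
    )
  end
end

s = StringLines.stream("abc\ndef\nghi\n")
IO.inspect(Enum.take(s, 1))
# prints ["abc\n"]
IO.inspect(Enum.take(s, 1))
# prints ["abc\n"]
```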
