String handling question

Hi,

So I have a string-representation of an Elixir data-structure as can be seen in this link:

https://raw.githubusercontent.com/nhpip/misc/main/string_example

As you can see it has a lot of newline ‘\n’ characters. These are fine for the most part, except where they are contained in the binary. So the question is how do I remove them from the parts of the string that represent the binary, but not the rest of the string?

I’d thought of Code.eval_string and use the IO libraries to convert it back, but the only thing that does is remove all new-lines, plus the string normally contains pids in this form “#PID<0.1412.0>” so that doesn’t help.

I did cook-up this lambda to do it, but it seems like a simpler way must exist:

my_trim = fn(string) -> 
              {parsed,_} = String.codepoints(string) 
                 |> Enum.chunk_by(fn e -> e == "<" or e == ">" end)
                 |> Enum.map_reduce(:no, 
                              fn(block, _acc) when block == ["<", "<"] -> {List.to_string(block), :yes}; 
                                 (block, _acc) when  block == [">", ">"] -> {List.to_string(block), :no}; 
                                 (block, :yes = acc) -> {List.to_string(block) |> String.replace(["\n"," "], ""), acc}; 
                                 (block, acc) -> {List.to_string(block), acc} 
                  end)
                  List.to_string(parsed) 
           end

This is an interesting question but before we try to answer it specifically, can you elaborate on your use case? Trying to parse a string out of arbitrary Elixir code is going to be a lot harder than just having the string in a more easy to access format, is there no way around this problem?

EDIT: As a first step in solving the problem as stated I would check out this function: Code.string_to_quoted which will parse the Elixir code and turn it into AST. String literals are then pretty easy to pull out from there.

2 Likes

Where does this string come from?

Do you know

> Bin = term_to_binary(hello).
<<131,100,0,5,104,101,108,108,111>>
> hello = binary_to_term(Bin).
hello

?

Basically we convert the structure to a string that gets dispatched via a REST call to a python application for processing and display. I’d rather not change the python app since the original code owner has left the company. Plus my python skills are rusty at best.

If possible I’d use Erlang -- External Term Format
There are implementations for python available (never used one).

@nhpip I’m not asking why you want a string, I’m asking how the string ends up embedded in Elixir code.

1 Like

Oh, we have quite complex data structures that we want to send to the python app.

We use code like this to convert them:

Inspect.Algebra.group(Inspect.Algebra.to_doc(complex_structure %Inspect.Opts{limit: :infinity})) |> Inspect.Algebra.format(80) |> IO.iodata_to_binary

The inspect protocol is not an interchange format. What is complex_structure? You should use regular Elixir functions to turn that into a value that you send, you should not try to inspect it and then parse the inspect result.

3 Likes