What is the best way to parse a string and convert it to a map?

shahryarjb · July 18, 2023, 8:24pm

Hello, I wanted to use the NimbleParsec library to convert a string into the map I need. But I think I misunderstood it: it seems to be more for matching input against something than for building the output I need, the way I would with regex. Is that true?

For example I have this string

"sanitize(trim, lowercase) validate(not_empty, max_len = 20)"
# And I need something like this:
# Output: %{sanitize: ['trim', 'lowercase'], validate: ['not_empty', {'max_len', '20'}]}

It should be a bit dynamic for example:

"sanitize(trim, lowercase, downcase, replace= T) validate(not_empty, max_len = 20, not_integer)"
# Output: %{sanitize: ['trim', 'lowercase', 'downcase', {'replace', 'T'}], validate: ['not_empty', {'max_len', '20'}, 'not_integer']}

Is there a good way to do this without regex? Or can NimbleParsec produce something like this?

It should be noted that I have many items for each part, like `trim`, `lowercase`, `downcase`, etc.

Thank you in advance

hst337 · July 18, 2023, 9:56pm

NimbleParsec is okay for this task. You can hand-roll the parser yourself, since it is a pretty simple parser with only two different states.
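As a rough illustration of what a hand-rolled, two-state parser could look like, here is a minimal sketch using only the standard library. The module name `HandRolled` and the helper names are made up for this example, and it assumes the output should hold strings (double-quoted) rather than the charlists shown in the first post.

```elixir
defmodule HandRolled do
  # Hypothetical sketch with two states:
  #   read_name - collect a group name until "("
  #   read_args - collect arguments until ")"
  def parse(input), do: read_name(input, "", %{})

  # State 1: end of input is fine only if no name is pending.
  defp read_name("", acc, map) do
    case String.trim(acc) do
      "" -> {:ok, map}
      name -> {:error, "expected ( after #{inspect(name)}"}
    end
  end

  defp read_name("(" <> rest, acc, map),
    do: read_args(rest, String.trim(acc), "", [], map)

  defp read_name(<<c::utf8, rest::binary>>, acc, map),
    do: read_name(rest, acc <> <<c::utf8>>, map)

  # State 2: "," ends one argument, ")" ends the whole group.
  defp read_args("", name, _cur, _items, _map),
    do: {:error, "missing ) for #{inspect(name)}"}

  defp read_args(")" <> rest, name, cur, items, map) do
    items = Enum.reverse(push(cur, items))
    # String.to_atom/1 is only reasonable for trusted input.
    read_name(rest, "", Map.put(map, String.to_atom(name), items))
  end

  defp read_args("," <> rest, name, cur, items, map),
    do: read_args(rest, name, "", push(cur, items), map)

  defp read_args(<<c::utf8, rest::binary>>, name, cur, items, map),
    do: read_args(rest, name, cur <> <<c::utf8>>, items, map)

  # "key = value" becomes a tuple, a bare word stays a string,
  # and an empty argument is skipped.
  defp push(cur, items) do
    case cur |> String.split("=", parts: 2) |> Enum.map(&String.trim/1) do
      [""] -> items
      [item] -> [item | items]
      [key, value] -> [{key, value} | items]
    end
  end
end
```

With this sketch, `HandRolled.parse("sanitize(trim, lowercase) validate(not_empty, max_len = 20)")` returns `{:ok, %{sanitize: ["trim", "lowercase"], validate: ["not_empty", {"max_len", "20"}]}}`.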

shahryarjb · July 19, 2023, 4:27am

Hi @hst337, do you mean the string converter should be, for example, a regex? Or string pattern matching and functions?
And where does NimbleParsec fit in here?

Would you mind giving me an example of what you meant, please?
Thank you

cmo · July 19, 2023, 6:55am

With NimbleParsec, you create rules for the things you want to parse and then define a parser, e.g. parsing a Gmail-like search string. It should be pretty straightforward to parse your text format.

  ...

  field =
    choice([
      string("message:") |> replace(:message),
      string("user:") |> replace(:user),
      string("ip:") |> replace(:ip)
    ])
    |> ignore(optional(whitespace))

  param =
    repeat(
      lookahead_not(concat(optional(whitespace), choice([field, value_field, time_field])))
      |> optional(whitespace)
      |> utf8_string([not: ?\s], min: 1)
    )
    |> ignore(choice([whitespace, eos()]))
    |> post_traverse(:join)

  ...

  defparsec :parse_search_string,
            ignore(optional(whitespace))
            |> times(
              choice([
                concat(field, param),
                value_field_param,
                time_field
              ])
              |> post_traverse(:group)
              |> ignore(optional(whitespace)),

D4no0 · July 19, 2023, 7:30am

I think he means using lex/yacc. In theory, that should be a very easy and pretty versatile implementation.

dimitarvp · July 19, 2023, 10:48am

If you will never parse anything more complex than that then a plain String.split with a regex should be fine forever.

Though never extending such code is a rarity so I’ll support the others that say that you should indeed do this with NimbleParsec.
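For reference, the plain `String.split` with a regex mentioned above might look roughly like the sketch below. The module name `SplitRules` is invented for this example, it assumes string values rather than charlists, and it raises a `MatchError` on a group with no `(`, so treat it as a starting point only.

```elixir
defmodule SplitRules do
  # Hypothetical sketch: split on ")" to get one chunk per group,
  # then split each chunk into its name and argument list.
  def parse(input) do
    input
    |> String.split(~r/\)\s*/, trim: true)
    |> Map.new(fn group ->
      [name, args] = String.split(group, "(", parts: 2)
      # String.to_atom/1 is only reasonable for trusted input.
      {name |> String.trim() |> String.to_atom(), parse_args(args)}
    end)
  end

  # Arguments are comma-separated; "key = value" becomes a tuple.
  defp parse_args(args) do
    args
    |> String.split(~r/\s*,\s*/, trim: true)
    |> Enum.map(fn arg ->
      case String.split(arg, ~r/\s*=\s*/, parts: 2) do
        [key, value] -> {String.trim(key), String.trim(value)}
        [item] -> String.trim(item)
      end
    end)
  end
end
```

Calling `SplitRules.parse("sanitize(trim, lowercase) validate(not_empty, max_len = 20)")` gives `%{sanitize: ["trim", "lowercase"], validate: ["not_empty", {"max_len", "20"}]}`.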

shahryarjb · July 19, 2023, 10:56am

I always wanted to use NimbleParsec, but I do not know why I cannot understand it. It would be good to try to learn more; if I can find a good video about it, that would be helpful.

dimitarvp · July 19, 2023, 10:59am

Yeah I can relate, it can be a bit arcane. I had some threads saved where people explain in more detail but couldn’t find them easily. Try searching for the nimbleparsec tag or just “NimbleParsec” in the title/text.

LostKobrakai · July 19, 2023, 11:32am

I found this to be a good stepping stone, as it explains the ideas behind parser combinators instead of showing how to use a certain library: Saša Jurić - Parsing from first principles - WebCamp Zagreb 2019 - YouTube

guibbv20111 · July 19, 2023, 10:13pm

Hi, can you change this input? For example, to something like “sanitize: arga, argb”

benwilson512 · July 19, 2023, 10:36pm

I do not understand what you are asking, how does this relate to what the poster is trying to do?

shahryarjb · July 19, 2023, 10:41pm

Could you please explain why I should change the string?

guibbv20111 · July 19, 2023, 11:37pm

Sorry, I wasn’t clear, depending on the string we need to go through more steps… I’m testing on a smartphone and maybe you can do it like this:

"sanitize(a, b), validate(c, d)"
|> String.splitter(["(", ")"], trim: true)
|> Enum.chunk_every(2)
|> Map.new(fn [name, args] ->
  # Drop the ", " left in front of every name after the first one.
  name = name |> String.trim_leading(",") |> String.trim()
  {String.to_atom(name), String.split(args, [",", " "], trim: true)}
end)

Sorry again for the very poor code…

hst337 · July 19, 2023, 11:42pm

The shortest solution is

for line <- String.split(input, ")", trim: true), do: Code.string_to_quoted!(line <> ")")
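That one-liner yields Elixir AST rather than the map from the first post. One possible way to turn the AST into that map is sketched below; the module name `QuotedRules` is invented for this example, and it assumes string values rather than charlists. Nothing is evaluated, but the Elixir parser does create atoms for every identifier it sees, so this is probably best kept to trusted input.

```elixir
defmodule QuotedRules do
  # Hypothetical sketch: reuse the Elixir parser and walk the AST.
  def parse(input) do
    for chunk <- String.split(input, ")", trim: true),
        String.trim(chunk) != "",
        into: %{} do
      # "sanitize(trim)" parses to {:sanitize, meta, [args...]}.
      {name, _meta, args} = Code.string_to_quoted!(chunk <> ")")
      {name, Enum.map(args, &arg/1)}
    end
  end

  # `key = value` becomes a {key, value} tuple of strings.
  defp arg({:=, _meta, [{key, _, _}, value]}),
    do: {Atom.to_string(key), Macro.to_string(value)}

  # A bare identifier becomes its name as a string.
  defp arg({name, _meta, ctx}) when is_atom(name) and is_atom(ctx),
    do: Atom.to_string(name)
end
```

For the first example string this returns `%{sanitize: ["trim", "lowercase"], validate: ["not_empty", {"max_len", "20"}]}`.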

shahryarjb · July 20, 2023, 8:39am

Yes, and thank you, but NimbleParsec always remains a problem for me! I have tried several times to solve this with NimbleParsec, but I was not successful even at the beginning :))

  date =
    choice([
      string("validate"),
      string("sanitize")
    ])

for example it returns

{:ok, ["sanitize"],
 "(trim, lowercase, downcase, replace= T) validate(not_empty, max_len = 20, not_integer)",
 %{}, {1, 0}, 8}

but I was not able to get validate or separate out the (_) part. And if the user just puts validate, how can I handle it?

LostKobrakai · July 20, 2023, 9:19am

NimbleParsec is not regex. It’s not “searching” through your input.

A parser works by going through the input string front → end and at each step you need to define what can be matched next. In your example you tell it to match either "validate" or "sanitize". It starts from the front, finds "sanitize" and then the next char is (, which matches neither of your expected values, so parsing halts.

The NimbleParsec v1.3.1 documentation has an example of something similar, parsing things wrapped in " instead of parentheses.
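To make the "define what can be matched next" idea concrete, here is a sketch of how the format from the first post could be described with NimbleParsec, one step at a time: a name, then `(`, then comma-separated arguments, then `)`. The module name `RuleParser`, the `to_entry/1` helper and the choice of allowed characters are assumptions made for this example, and the values are strings rather than charlists. It is written as a script that fetches the dependency with `Mix.install`.

```elixir
Mix.install([{:nimble_parsec, "~> 1.3"}])

defmodule RuleParser do
  import NimbleParsec

  whitespace = ascii_string([?\s, ?\t], min: 1)
  # Assumption: names and values use letters, digits and underscores.
  word = ascii_string([?a..?z, ?A..?Z, ?0..?9, ?_], min: 1)

  # `key = value` is collected into a {"key", "value"} tuple.
  pair =
    word
    |> ignore(optional(whitespace))
    |> ignore(string("="))
    |> ignore(optional(whitespace))
    |> concat(word)
    |> reduce({List, :to_tuple, []})

  # Try the longer form first, then fall back to a bare word.
  arg = choice([pair, word])

  separator =
    ignore(optional(whitespace))
    |> ignore(string(","))
    |> ignore(optional(whitespace))

  args = arg |> repeat(separator |> concat(arg))

  # name ( args ) is reduced to {:name, [args]}.
  group =
    word
    |> ignore(optional(whitespace))
    |> ignore(string("("))
    |> ignore(optional(whitespace))
    |> wrap(optional(args))
    |> ignore(optional(whitespace))
    |> ignore(string(")"))
    |> reduce({__MODULE__, :to_entry, []})

  # The whole input: any number of groups, then end of string.
  defparsec :groups,
            ignore(optional(whitespace))
            |> repeat(group |> ignore(optional(whitespace)))
            |> eos()

  # String.to_atom/1 is only reasonable for trusted input.
  def to_entry([name, args]), do: {String.to_atom(name), args}

  def parse(input) do
    case groups(input) do
      {:ok, entries, _rest, _context, _position, _offset} ->
        {:ok, Map.new(entries)}

      {:error, reason, rest, _context, _position, _offset} ->
        {:error, reason, rest}
    end
  end
end
```

With this sketch, `RuleParser.parse("sanitize(trim, lowercase) validate(not_empty, max_len = 20)")` returns `{:ok, %{sanitize: ["trim", "lowercase"], validate: ["not_empty", {"max_len", "20"}]}}`, and an input with only `validate(...)` produces a map with just that key.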