What is the best way to parse a string and convert it to a map?

Hello, I wanted to use the NimbleParsec library to convert a string into the map I need. But I think I misunderstood it: it seems to be more for checking whether the input matches something than for producing the output I need, like a regex. Is that true?

For example, I have this string:

"sanitize(trim, lowercase) validate(not_empty, max_len = 20)"
# And I need something like this:
# Output: %{sanitize: ['trim', 'lowercase'], validate: ['not_empty', {'max_len', '20'}]}

It should be somewhat dynamic, for example:

"sanitize(trim, lowercase, downcase, replace= T) validate(not_empty, max_len = 20, not_integer)"
# Output: %{sanitize: ['trim', 'lowercase', 'downcase', {'replace', 'T'}], validate: ['not_empty', {'max_len', '20'}, 'not_integer']}

Is there a good way to do this without regex? Or can NimbleParsec produce something like this?

Note that I have many items for each part, like `trim`, `lowercase`, `downcase`, etc.

Thank you in advance

NimbleParsec is fine for this task. You can also hand-roll the parser yourself, since it is a pretty simple parser with only two different states.
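A minimal sketch of such a hand-rolled parser in plain Elixir (no NimbleParsec), assuming the output should contain strings rather than the single-quoted charlists written in the question. The two states are "reading a section name" and "reading the argument list". `RuleParser` and its functions are hypothetical names, and section names go through `String.to_atom/1`, which is only safe for trusted input.

```elixir
defmodule RuleParser do
  # Hypothetical hand-rolled parser with two states:
  #   read_name/3 - collecting a section name until "("
  #   read_args/5 - collecting comma-separated arguments until ")"

  def parse(input), do: read_name(input, "", %{})

  # State 1: reading a section name.
  defp read_name("", "", acc), do: {:ok, acc}
  defp read_name("", partial, _acc), do: {:error, {:unexpected_end, partial}}
  defp read_name(" " <> rest, "", acc), do: read_name(rest, "", acc)
  defp read_name("(" <> _rest, "", _acc), do: {:error, :missing_section_name}

  defp read_name("(" <> rest, name, acc) do
    # String.to_atom/1 is only safe on trusted input (atoms are never garbage collected).
    read_args(rest, String.to_atom(String.trim(name)), "", [], acc)
  end

  defp read_name(<<c::utf8, rest::binary>>, name, acc),
    do: read_name(rest, name <> <<c::utf8>>, acc)

  # State 2: reading the argument list of `section`.
  defp read_args(")" <> rest, section, current, items, acc) do
    items = add_item(items, current)
    read_name(rest, "", Map.put(acc, section, Enum.reverse(items)))
  end

  defp read_args("," <> rest, section, current, items, acc),
    do: read_args(rest, section, "", add_item(items, current), acc)

  defp read_args("", section, _current, _items, _acc), do: {:error, {:unclosed, section}}

  defp read_args(<<c::utf8, rest::binary>>, section, current, items, acc),
    do: read_args(rest, section, current <> <<c::utf8>>, items, acc)

  # Turns one raw argument into "flag" or {"key", "value"}; blank arguments are dropped.
  defp add_item(items, raw) do
    case raw |> String.split("=", parts: 2) |> Enum.map(&String.trim/1) do
      [""] -> items
      [flag] -> [flag | items]
      [key, value] -> [{key, value} | items]
    end
  end
end
```

With this sketch, `RuleParser.parse("sanitize(trim, lowercase) validate(not_empty, max_len = 20)")` gives `{:ok, %{sanitize: ["trim", "lowercase"], validate: ["not_empty", {"max_len", "20"}]}}`, and an unclosed section such as `"validate(not_empty"` gives `{:error, {:unclosed, :validate}}`.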


Hi @hst337, do you mean the string converter should use, for example, a regex? Or string patterns and functions?
And where does NimbleParsec fit in here?

Would you mind giving me an example of what you meant, please?
Thank you

With NimbleParsec, you create rules for the things you want to parse and then combine them into a parser. For example, here is an excerpt from a parser for a Gmail-like search string. It should be pretty straightforward to parse your text format the same way:

  ...

  field =
    choice([
      string("message:") |> replace(:message),
      string("user:") |> replace(:user),
      string("ip:") |> replace(:ip)
    ])
    |> ignore(optional(whitespace))

  param =
    repeat(
      lookahead_not(concat(optional(whitespace), choice([field, value_field, time_field])))
      |> optional(whitespace)
      |> utf8_string([not: ?\s], min: 1)
    )
    |> ignore(choice([whitespace, eos()]))
    |> post_traverse(:join)

  ...

  defparsec :parse_search_string,
            ignore(optional(whitespace))
            |> times(
              choice([
                concat(field, param),
                value_field_param,
                time_field
              ])
              |> post_traverse(:group)
              |> ignore(optional(whitespace)),
  ...

I think he means using lex/yacc; in theory, that should make for a very easy and pretty versatile implementation.


If you will never parse anything more complex than that, then a plain `String.split` with a regex should be fine forever.

Then again, code like this rarely stays un-extended, so I'll side with the others who say you should indeed do this with NimbleParsec.
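For the simple case, a rough sketch of the regex route: `Regex.scan/2` finds each `name(args)` section and `String.split/3` breaks up the arguments. It assumes arguments never contain `(`, `)` or `,` themselves, and that you want strings rather than charlists; `RegexRules` is a hypothetical name.

```elixir
defmodule RegexRules do
  # Hypothetical regex-based version: finds every `name(args)` section, then
  # splits the argument list on commas. Assumes arguments contain no "(", ")" or ",".
  def parse(input) do
    for [_full, name, args] <- Regex.scan(~r/(\w+)\s*\(([^)]*)\)/, input), into: %{} do
      items =
        args
        |> String.split(",")
        |> Enum.map(&String.trim/1)
        |> Enum.reject(&(&1 == ""))
        |> Enum.map(&to_item/1)

      # Only convert trusted section names to atoms.
      {String.to_atom(name), items}
    end
  end

  # "max_len = 20" -> {"max_len", "20"}; "trim" -> "trim"
  defp to_item(item) do
    case String.split(item, ~r/\s*=\s*/, parts: 2) do
      [key, value] -> {key, value}
      [flag] -> flag
    end
  end
end
```

This stays short as long as the format stays flat; nested parentheses or quoted values are where a real parser starts to pay off.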


I have always wanted to use NimbleParsec, but for some reason I cannot understand it :pleading_face:. It is a good opportunity to learn more; if I can find a good video about it, that would be helpful.

Yeah, I can relate; it can be a bit arcane. I had some threads saved where people explain it in more detail, but I couldn't find them easily. Try searching for the nimbleparsec tag, or just “NimbleParsec” in the title/text.


I found this to be a good stepping stone, as it explains the ideas behind parser combinators instead of showing how to use a certain library: Saša Jurić, “Parsing from first principles”, WebCamp Zagreb 2019 (YouTube).


Hi, can you change this input? For example, like this: “sanitize: arga, argb”

I do not understand what you are asking, how does this relate to what the poster is trying to do?


Could you please explain why I should change the string?

Sorry, I wasn’t clear. Depending on the string, we need to go through more steps. I’m testing on a smartphone, but maybe you can do it like this:

"sanitize(a, b) validate(c, d)"
# Split on the parentheses: ["sanitize", "a, b", " validate", "c, d"]
|> String.split(["(", ")"], trim: true)
|> Enum.map(&String.trim/1)
|> Enum.reject(&(&1 == ""))
# Pair each section name with its argument string: [["sanitize", "a, b"], ...]
|> Enum.chunk_every(2)
|> Map.new(fn [name, args] ->
  {String.to_atom(name), args |> String.split(",") |> Enum.map(&String.trim/1)}
end)

Sorry again for the very rough code.

The shortest solution is

for line in String.split(input, ")", trim: true), do: Code.string_to_quoted!(line <> ")")
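That one-liner yields quoted expressions (AST), not the map from the question. One possible way to walk that AST into the map, assuming every section is a call and every argument is either a bare word or `key = value`, is sketched below; `QuotedRules` is a hypothetical name. It splits with `trim: true` so the trailing empty chunk after the last `)` is never parsed, and since `Code.string_to_quoted!/1` creates atoms for identifiers, it should only see trusted input.

```elixir
defmodule QuotedRules do
  # Sketch: let Elixir's own parser read each `name(args)` chunk, then walk the AST.
  # Code.string_to_quoted!/1 creates atoms for identifiers, so use trusted input only.
  def parse(input) do
    input
    |> String.split(")", trim: true)
    |> Enum.reject(&(String.trim(&1) == ""))
    |> Map.new(fn chunk ->
      # " validate(not_empty, max_len = 20)" is parsed as a local call: {:validate, meta, args}
      {name, _meta, args} = Code.string_to_quoted!(chunk <> ")")
      {name, Enum.map(args, &to_item/1)}
    end)
  end

  # `max_len = 20` is parsed as a match expression: {:=, meta, [left, right]}
  defp to_item({:=, _meta, [left, right]}), do: {Macro.to_string(left), Macro.to_string(right)}
  # A bare word such as `trim` is parsed as a variable: {:trim, meta, nil}
  defp to_item(other), do: Macro.to_string(other)
end
```

The trade-off is that the input must be valid Elixir syntax, so something like `max_len = 20 chars` would raise a syntax error instead of being parsed.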

Yes, thank you, but NimbleParsec remains a problem for me! I tried several times to solve this with NimbleParsec and was not successful, even at the very beginning :))

  date =
    choice([
      string("validate"),
      string("sanitize")
    ])

For example, it returns:

{:ok, ["sanitize"],
 "(trim, lowercase, downcase, replace= T) validate(not_empty, max_len = 20, not_integer)",
 %{}, {1, 0}, 8}

but I was not able to get `validate` or to separate out the `(...)` part. And if the user only writes `validate`, how can I handle that?

NimbleParsec is not regex. It’s not “searching” through your input.

A parser works by going through the input string front → end, and at each step you need to define what can be matched next. In your example you tell it to match either "validate" or "sanitize". It starts from the front, finds "sanitize", and then the next char is `(`, which matches neither of your expected values, so parsing halts.

The NimbleParsec v1.3.1 documentation has an example of something similar: parsing things wrapped in `"` instead of parentheses.
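To make the front → end idea concrete without any library, here is a plain-Elixir sketch (not NimbleParsec's API) in which each step consumes a prefix of the input and hands the rest to the next step. `FrontToEnd` and its functions are hypothetical. Like the `choice` above, `keyword/1` matches `sanitize`, but the following step has to say what comes next (here `(`); nothing ever jumps ahead to look for `validate`.

```elixir
defmodule FrontToEnd do
  # Each step consumes a prefix of the input and returns what is left over.
  def keyword("sanitize" <> rest), do: {:ok, :sanitize, rest}
  def keyword("validate" <> rest), do: {:ok, :validate, rest}
  def keyword(other), do: {:error, {:expected_keyword, other}}

  def open_paren("(" <> rest), do: {:ok, rest}
  def open_paren(other), do: {:error, {:expected_open_paren, other}}

  # Sequencing: the second step only ever sees what the first step left over.
  def section_start(input) do
    with {:ok, name, rest} <- keyword(input),
         {:ok, rest} <- open_paren(rest) do
      {:ok, name, rest}
    end
  end
end
```

Here `FrontToEnd.section_start("sanitize(trim) validate(x)")` returns `{:ok, :sanitize, "trim) validate(x)"}`. To reach `validate`, further steps would first have to consume `trim`, the `)` and the space, which is exactly what the extra rules in a NimbleParsec grammar are for.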
