Replace string if found on some patterns

yudy · March 7, 2019, 3:21pm

Hi all, I have a case where I need to replace part of a string when it matches a pattern. The pattern itself is inside the string. Here is the string:

docFmt = "{np}-{dst}{don}/{do:SJ,inv:INV}/SFG/{do:ST,inv:FGH}"

Basically, I need to replace whatever is inside {}.
There are two patterns here:
1. One item inside {}, for example {np}. I can simply use String.replace(docFmt, "{np}", "BARCODE") here.
2. Two items inside {}, for example {do:SJ,inv:INV}, which we can think of as keys and values. Here {do:SJ,inv:INV} should be replaced with SJ if the current state is “do” and with INV if the current state is “inv”.
The same goes for {do:ST,inv:FGH}: it becomes ST if the current state is “do” and FGH if the current state is “inv”.

Thanks
Yudy

Qqwy · March 7, 2019, 3:50pm

I wonder: What is the reason behind doing this? I just want to ensure that you’re not falling into the XY-problem. There might be a much simpler solution to the general problem you’re having.

As for the problem with these particular kinds of strings, I think you have two options:

1. Try to concoct some Regular-Expression-based solution. Probably very difficult and unreadable.
2. Write your own parser that works with this syntax. Probably more work, but maybe this is the only way to do it for all your inputs. (This depends on, for instance, whether the syntax is always {do:...,inv:...} or whether other inputs or orders are possible as well.)

yudy · March 7, 2019, 4:12pm

Oh, here is the case.
I have an input form that will be used by customers. The input is a numbering format. Instead of offering one standard pattern, I would like to give users a more flexible way to create their own format.

Below is an example of using this (dynamic) format.
Let’s say user A wants a format like this (a simple example): "{pc}-{pn}", where {pc} = product code and {pn} = product name.

Given that format and the data, the final result would be, for example, “0900-Coca cola”.
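
The simple case described above can be sketched roughly as follows. This is only an illustration: the module name FillSketch, the function fill/2 and the string-keyed data map are assumptions made for the example, not something from the actual application.

```elixir
defmodule FillSketch do
  # Hypothetical helper: replaces each {key} with the matching entry in `data`
  # and leaves placeholders without an entry exactly as they are.
  def fill(format, data) do
    Regex.replace(~r/\{(\w+)\}/, format, fn whole, key ->
      Map.get(data, key, whole)
    end)
  end
end

FillSketch.fill("{pc}-{pn}", %{"pc" => "0900", "pn" => "Coca cola"})
# "0900-Coca cola"
```

Placeholders with a comma or colon inside, such as {n:p, p:pp}, are not matched by this regex and pass through unchanged, so they would have to be resolved in a separate step.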

But the same format can be used in different situations, depending on the state.
For example "{pc}-{pn}-{n:p, p:pp}", where:
n = normal
p = price
p = promo
pp = promo price

So, if the current state is normal (n), the format above will be changed to "{pc}-{pn}-{n}". Then, when I have a variable normalPrice, I can use String.replace(fmt, "{n}", normalPrice), where fmt is the format string.

The key-value pairs can differ within a single format line (they are not limited to n and p).

Yes, I tried to write my own parser and gave up. Maybe I just don’t know how to do it, or is it impossible?

al2o3cr · March 7, 2019, 4:29pm

One approach would be to start by splitting the “static” parts from the “dynamic” parts (with {}), something like:

iex(30)> s = "{pc}-{pn}-{n:p, p:pp}"                                 
"{pc}-{pn}-{n:p, p:pp}"
iex(31)> String.split(s, ~r/({[a-z, :]+})/, include_captures: true)  
["", "{pc}", "-", "{pn}", "-", "{n:p, p:pp}", ""]

That breaks the problem down into three pieces:

1. split the parts
2. handle each dynamic part individually, producing a string
3. join the static & dynamic parts together
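
The three steps above could be put together roughly like this. It is a minimal sketch, not a tested solution for every input: the module name FormatSketch and its functions are made up for illustration, and it assumes that a placeholder with no entry for the current state should be left untouched.

```elixir
defmodule FormatSketch do
  # Hypothetical helper: resolves state-dependent placeholders such as
  # {n:p, p:pp} and leaves everything else (including {pc}) as it is.
  def resolve(format, state) do
    format
    # Step 1: split static and dynamic parts, keeping the {...} captures.
    |> String.split(~r/(\{[^{}]*\})/, include_captures: true)
    # Step 2: handle each part individually.
    |> Enum.map(&handle_part(&1, state))
    # Step 3: join the static and dynamic parts back together.
    |> Enum.join()
  end

  # Dynamic part: starts with "{".
  defp handle_part("{" <> _ = part, state) do
    options =
      part
      |> String.trim_leading("{")
      |> String.trim_trailing("}")
      |> String.split(",")
      |> Enum.map(&String.trim/1)

    # Return the value of the first "state:value" entry,
    # or the original placeholder when there is none.
    Enum.find_value(options, part, fn option ->
      case String.split(option, ":", parts: 2) do
        [^state, value] -> value
        _ -> nil
      end
    end)
  end

  # Static part: returned unchanged.
  defp handle_part(static, _state), do: static
end

FormatSketch.resolve("{pc}-{pn}-{n:p, p:pp}", "p")
# "{pc}-{pn}-pp"
```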

yudy · March 9, 2019, 7:33am

@al2o3cr, thanks for this, it helped me a lot.
I ended up with something like the code below. Corrections to make it shorter or cleaner are welcome.

        docFmt = "{np}-{dst}{don}/{do:DO,inv:INV}/SG-TAN/{do:DT,ind:INT}"
        state = "do"
        
        dynamicInput = String.split(docFmt, "{") 
                  |> Enum.map(fn(x) -> String.split(x, "}") end) 
                  |> Enum.map(fn(x) -> String.trim(Enum.at(x,0)) end)
                  |> Enum.reject(fn x -> (x == "" or x == nil) or (x != nil and not String.contains?(x, ",")) end)
        
        listDynInput = Enum.reduce dynamicInput, [], fn dyInput, arr ->
            splitDi = String.split(dyInput, ",")
            arr ++ [[dyInput,splitDi]]
        end

        arrStrReplacement = Enum.reduce listDynInput, [], fn arrDiInput, arr ->
          replacedInp = Enum.reduce Enum.at(arrDiInput, 1), [], fn strReplace, arrRepInp ->
            listStr = String.split(strReplace, ":")
            tmpArr = if Enum.at(listStr, 0) == state do
              [["{"<>Enum.at(arrDiInput, 0)<>"}", Enum.at(listStr, 1)]]
            end

            # tmpArr is nil when the key does not match the current state
            if tmpArr != nil, do: arrRepInp ++ tmpArr, else: arrRepInp
          end
          arr ++ replacedInp
        end

        newDocFmt = Enum.reduce(arrStrReplacement, docFmt, fn [old, new], docFmt -> String.replace(docFmt, old, new) end)
        
        IO.inspect docFmt
        IO.inspect newDocFmt

The two IO.inspect calls print:
"{np}-{dst}{don}/{do:DO,inv:INV}/SG-TAN/{do:DT,ind:INT}"
"{np}-{dst}{don}/DO/SG-TAN/DT"

amnu3387 · March 9, 2019, 2:32pm

I think you can simplify the logic a bit by using Regex.replace/3 which takes a function to apply to the matches:

defmodule Testing do
  def format_string(string, state) do
    Regex.replace(~r/\{(.*?)\}/, string, fn(_match, capture) ->
      apply_format(capture, state)
    end)   
  end

  def apply_format(capture, state) do
    case String.split(capture, ",", trim: true) do
      [val] -> "{" <> val <> "}"
      [_|_] = split -> extract_correct_state(split, state)
    end
  end

  def extract_correct_state([], state), do: "{MISSING:#{state}}"
  def extract_correct_state([h | t], state) do
    case String.split(h, ":", trim: true) do
      [^state, rem] -> rem
      _ -> extract_correct_state(t, state)
    end
  end
end

test = "{np}-{dst}{don}/{do:DO,inv:INV}/SG-TAN/{do:DT,ind:INT}"
state = "do"
Testing.format_string(test, state)

#> "{np}-{dst}{don}/DO/SG-TAN/DT"

test = "{np}-{dst}{don}/{missing:DO,inv:INV}/SG-TAN/{do:DT,ind:INT}"
Testing.format_string(test, state)

#> "{np}-{dst}{don}/{MISSING:do}/SG-TAN/DT"

If you can rely on the formatting and correctness of the original, then this would be fine; if not, you’ll need to add some more handling in both apply_format and extract_correct_state.

yudy · March 12, 2019, 3:20pm

@amnu3387 this is neat and awesome. Thank you very much for this. I really didn’t know it can be this simple and short. Two thumbs up for this.

Btw, I don’t get your last comment

If you can rely on the formatting and correctness of the original, then this would be fine; if not, you’ll need to add some more handling in both apply_format and extract_correct_state.

Could you please explain it or give an example?

amnu3387 · March 12, 2019, 5:26pm

Great that it helped.

Well, I just meant that if you know you won’t be getting possibly wrong formats like {,} then it’s fine. If you might, then {,} for instance leaves nothing after the string split in apply_format (with trim: true the result is an empty list), so neither clause of the case matches and it raises an error. There are other cases like that.
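
To illustrate, here is one way the extra handling might look. It is a sketch under the assumption that a malformed or empty placeholder should simply be left in the output unchanged; the module name SafeFormat and the helper pick/2 are invented for this example.

```elixir
defmodule SafeFormat do
  # Sketch only: same idea as the Testing module above, with a fallback that
  # returns the original placeholder when nothing usable is inside the braces.
  def format_string(string, state) do
    Regex.replace(~r/\{(.*?)\}/, string, fn match, capture ->
      case String.split(capture, ",", trim: true) do
        # "{,}" or "{}": nothing left after trimming, keep the text as it was.
        [] -> match
        # A single item such as "{np}": keep it for a later replacement.
        [_single] -> match
        options -> pick(options, state)
      end
    end)
  end

  defp pick([], state), do: "{MISSING:#{state}}"

  defp pick([option | rest], state) do
    # parts: 2 keeps any further ":" characters inside the value.
    case String.split(option, ":", parts: 2) do
      [^state, value] -> value
      _ -> pick(rest, state)
    end
  end
end

SafeFormat.format_string("{,}/{np}/{do:DO,inv:INV}", "do")
# "{,}/{np}/DO"
```

Whether a bad placeholder should be kept, replaced by a marker or rejected outright depends on what the application needs.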

As an addendum, if there’s any sort of specification you can rely on for the formatting, e.g. that a plain placeholder and a state key will always be either 2 or 3 characters long, then you can improve the way it works further. If you know the length is always 2 or 3, you can skip the split entirely when there isn’t a list of values, and you don’t need to split on ":" to match the correct state.

defmodule Testing do
  def format_string(string, state) do
    Regex.replace(~r/\{(.*?)\}/, string, fn
      # 2- or 3-byte captures are plain placeholders such as {np} or {dst}
      (_, <<n_state :: binary-size(2)>>) -> "{" <> n_state <> "}"
      (_, <<n_state :: binary-size(3)>>) -> "{" <> n_state <> "}"
      (_, capture) -> apply_format(capture, state)
    end)
  end

  def apply_format(capture, state) do
    capture
    |> String.split(",", trim: true)
    |> extract_correct_state(state)
  end

  def extract_correct_state([], state), do: "{MISSING:#{state}}"
  # `state` appears twice in each head, so the key must equal the requested state
  def extract_correct_state([<<state :: binary-size(2)>> <> ":" <> rem | _t], state), do: rem
  def extract_correct_state([<<state :: binary-size(3)>> <> ":" <> rem | _t], state), do: rem
  def extract_correct_state([_ | t], state), do: extract_correct_state(t, state)
end

Even apply_format could be written without splitting, but it needs several clauses to cover both the 2- and 3-character cases, so without a macro it’s a bit tedious to write by hand.

yudy · March 16, 2019, 7:41am

@amnu3387 thank you again for the great explanation. It seems you are suggesting to avoid the splitting. I mean, from the previous example, [val] -> "{" <> val <> "}" is simple enough. Is there a performance consideration, or some other reason, for avoiding the split and matching on binary size instead?

As for the format, I think I will only process the correct format, which is the comma-separated {,} form, and simply not continue if the input is not correct.
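
A check like that could be sketched as below. The grammar is an assumption, since the thread does not define one: a placeholder is taken to be either a single name such as {np} or one or more key:value pairs separated by commas. The module name FormatCheck and the function validate/1 are hypothetical.

```elixir
defmodule FormatCheck do
  # Assumed grammar (not stated in the thread): a single name such as {np},
  # or one or more key:value pairs separated by commas.
  defp placeholder_regex do
    ~r/^\{(\w+|\w+:\w+(\s*,\s*\w+:\w+)*)\}$/
  end

  # Returns :ok, or {:error, bad} with the list of placeholders that do not
  # follow the assumed grammar. Unbalanced braces are not detected here.
  def validate(format) do
    bad =
      ~r/\{[^{}]*\}/
      |> Regex.scan(format)
      |> List.flatten()
      |> Enum.reject(&Regex.match?(placeholder_regex(), &1))

    case bad do
      [] -> :ok
      _ -> {:error, bad}
    end
  end
end

FormatCheck.validate("{np}-{dst}{don}/{do:DO,inv:INV}")
# :ok
FormatCheck.validate("{np}/{,}")
# {:error, ["{,}"]}
```

With this in place, the format would only be passed on for replacement when validate/1 returns :ok.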