Regex replace in string for chat

MarcusF · November 10, 2016, 5:37pm

I have a chat, works great, tracks users, rooms, etc. (Loving Elixir so far) but now I need to do some processing of the chat messages before publishing live.

I need to escape all HTML of course, but also implement a simple subset of “tags” such as [b]some bold text[/b], [i]Some italic text[/i] and so on.

Is there a recommended solution for something like this?

I’ve implemented it in a previous chat using a mixture of regex and counting matches, but I’m not sure how to do it in Elixir.

kip · November 11, 2016, 6:14am

Would you consider Markdown instead of HTML-ish tags? You could then use a Markdown library such as Earmark. That’s what this forum uses, and also what Slack, GitHub and others use, so it’s quite a familiar syntax. Of course that assumes you want to render to HTML.

A second approach might be to roll your own lexer/parser using leex and yecc, which are part of the Erlang toolchain. There is a little bit of a learning curve, but it produces fast code and removes the uncertainties that creep in with regex processing.

And of course you can do it with regexes in Elixir too.
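A minimal sketch of the regex route, assuming only the two tags from the question and a made-up mapping to `<strong>` and `<em>` (the module name, function names and tag table are all hypothetical). It escapes HTML first, then rewrites matched tag pairs, recursing into the body so nested pairs are handled:

```elixir
defmodule ChatMarkup.RegexSketch do
  # Hypothetical tag table (an assumption): chat tag name => HTML element.
  @tags %{"b" => "strong", "i" => "em"}

  # Escape first, so user-supplied HTML can never reach the page,
  # then turn matched [tag]...[/tag] pairs into HTML.
  def render(text), do: text |> escape() |> replace_tags()

  # Escape the characters that are significant in HTML.
  # "&" must be replaced first so later entities are not double-escaped.
  def escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end

  # \1 is a backreference: the closing tag must carry the same name
  # as the opening tag. The `s` modifier lets `.` match newlines.
  defp replace_tags(text) do
    Regex.replace(~r{\[(b|i)\](.*?)\[/\1\]}s, text, fn _whole, tag, inner ->
      html = Map.fetch!(@tags, tag)
      # Recurse into the body so pairs nested inside are rewritten too.
      "<#{html}>#{replace_tags(inner)}</#{html}>"
    end)
  end
end
```

Tags without a matching partner are simply left in the text as typed. The non-greedy match also means a tag nested inside the same tag (`[b]a[b]b[/b]c[/b]`) will not pair up the way a real parser would.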

MarcusF · November 11, 2016, 2:26pm

Thanks for the info kip, but I’m afraid I’m stuck with these tags as I’m replacing a chat for a client and they want to keep the existing use-format, otherwise I’d happily go with markdown.

The way I’ve done it before is to loop over the string and replace as needed, with a final check to make sure everything matches up (no stray <strong> tags etc.), but with immutable data I can’t figure out how to do it.

kip · November 11, 2016, 2:54pm

Depending on how regular your markup is you could split the string and then process the elements as required. Something like:

iex(17)> String.split("[i]some text[/i]", ~r{\[/?.\]}, include_captures: true, trim: true)
["[i]", "some text", "[/i]"]

Then you can recurse over the markup and do what you want. Maybe that helps a bit?
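To make the "recurse over the markup" step concrete, here is a rough sketch of how it can look without mutable state: the list of open tags and the output so far are passed along as arguments. The module name, the tag table and the policy for stray tags (a stray closer is shown as plain text, anything left open is closed at the end) are assumptions for illustration, not something from the original chat:

```elixir
defmodule ChatMarkup.StackSketch do
  # Hypothetical tag table (an assumption): chat tag name => HTML element.
  @tags %{"b" => "strong", "i" => "em"}

  def render(text) do
    text
    |> String.split(~r{\[/?[a-z]+\]}, include_captures: true, trim: true)
    |> walk([], [])
  end

  # End of input: close whatever is still open, innermost first
  # (the head of the stack is the most recently opened tag).
  defp walk([], stack, acc) do
    closers = Enum.map(stack, &close_html/1)
    IO.iodata_to_binary([Enum.reverse(acc), closers])
  end

  defp walk([token | rest], stack, acc) do
    case classify(token) do
      {:open, name} ->
        walk(rest, [name | stack], [open_html(name) | acc])

      {:close, name} ->
        case stack do
          # Closer matches the innermost open tag: pop it.
          [^name | outer] -> walk(rest, outer, [close_html(name) | acc])
          # Stray or mismatched closer: keep it as visible text.
          _ -> walk(rest, stack, [escape(token) | acc])
        end

      :text ->
        walk(rest, stack, [escape(token) | acc])
    end
  end

  # Only tokens that are exactly a known tag count as markup.
  defp classify(token) do
    with [_, slash, name] <- Regex.run(~r{\A\[(/?)([a-z]+)\]\z}, token),
         true <- Map.has_key?(@tags, name) do
      if slash == "/", do: {:close, name}, else: {:open, name}
    else
      _ -> :text
    end
  end

  defp open_html(name), do: "<#{Map.fetch!(@tags, name)}>"
  defp close_html(name), do: "</#{Map.fetch!(@tags, name)}>"

  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
  end
end
```

Each call to `walk/3` returns a new stack and accumulator instead of modifying the old ones, which is the usual replacement for the "loop and replace in place" pattern.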

MarcusF · November 11, 2016, 4:16pm

I have to try it, but that looks like what I need. Thanks kip, I appreciate it.

Qqwy · November 11, 2016, 8:16pm

One thing a simple regular expression will not help you with is making sure that opening tags match their closing tags. Also, if your syntax becomes even a little more complicated, the regex will become unmaintainable.

I would suggest looking into e.g. Combine, which lets you write parser combinators.

A parser combinator is a small function (or macro) that combines simpler parsers to create a new, slightly more complex one, for example by giving each parser the input left unparsed by the previous one, or by trying several parsers in turn on the same input until one matches. The more complex parsers can then be combined in exactly the same way.

In this case, you’d want a simple parser to parse a start tag like `[i]`, which could be built by trying to parse the single character `[`, then trying to parse one or more letters until the next character is not a letter, and then trying to parse the single character `]`. A similar one can be made for `[/i]`. Then, you could make a more complex parser that takes these two simple parsers, and matches anything in-between, as long as the start and end tags contain the same text.
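To illustrate the idea, here is a hand-rolled sketch in plain Elixir. It deliberately does not use Combine's API; every name below is made up for the example. A parser is assumed to be any function that takes the input and returns `{:ok, value, rest}` or `:error`:

```elixir
defmodule ChatMarkup.CombinatorSketch do
  # Parser that matches one exact string at the start of the input.
  def literal(str) do
    fn input ->
      if String.starts_with?(input, str) do
        {:ok, str, drop_prefix(input, str)}
      else
        :error
      end
    end
  end

  # Parser that matches one or more ASCII letters.
  def letters do
    fn input ->
      case Regex.run(~r/\A[a-zA-Z]+/, input) do
        [name] -> {:ok, name, drop_prefix(input, name)}
        nil -> :error
      end
    end
  end

  # Combinator: run parsers one after another, each on the remainder
  # left by the previous one; fail as soon as one of them fails.
  def sequence(parsers) do
    fn input ->
      parsers
      |> Enum.reduce_while({:ok, [], input}, fn parser, {:ok, acc, rest} ->
        case parser.(rest) do
          {:ok, value, rest2} -> {:cont, {:ok, [value | acc], rest2}}
          :error -> {:halt, :error}
        end
      end)
      |> case do
        {:ok, acc, rest} -> {:ok, Enum.reverse(acc), rest}
        :error -> :error
      end
    end
  end

  # Combinator: transform the value produced by a parser.
  def map(parser, fun) do
    fn input ->
      with {:ok, value, rest} <- parser.(input), do: {:ok, fun.(value), rest}
    end
  end

  # "[" + letters + "]", keeping only the tag name.
  def open_tag do
    sequence([literal("["), letters(), literal("]")])
    |> map(fn [_, name, _] -> name end)
  end

  # A start tag, then anything up to the closing tag with the same name.
  def element do
    fn input ->
      with {:ok, name, rest} <- open_tag().(input),
           closer = "[/" <> name <> "]",
           [body, remainder] <- String.split(rest, closer, parts: 2) do
        {:ok, {name, body}, remainder}
      else
        _ -> :error
      end
    end
  end

  defp drop_prefix(input, prefix) do
    binary_part(input, byte_size(prefix), byte_size(input) - byte_size(prefix))
  end
end
```

For example, `ChatMarkup.CombinatorSketch.element().("[i]some text[/i] tail")` yields the tag name, the body and the unparsed remainder. The body is returned as raw text here; a fuller version would run the parser on it again to handle nesting.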

Writing parsers like this has a little bit of a learning curve, but the end result will be a lot more readable/maintainable/extendable.