Regular expression required to split a string into parts to bold what the user has entered

annad · March 16, 2024, 10:18pm

After wasting 30 minutes trying to refine a request on two AI engines (including ChatGPT), I’m less worried about AI taking over the world. Wondering if a human could help me come up with a regular expression that will work for my needs.

I have your basic Type-to-Select component. I just want to split a string into parts so that I can bold what the user has entered. I need a regular expression to use with String.split/3. Here’s the result I’m looking for.

There is a search_string and an input_string.
If the search_string is found at the beginning or end of the input_string, then I’d get 2 parts from String.split/3
If the search_string is found in the middle of the input_string, then I’d get a maximum of 3 parts from String.split/3.
It needs to be case-insensitive

For example:

search_string: "san"
input_string: "San Francisco, California, US"

I would get two parts: ["San", " Francisco, California, US"]
NOTE: Spaces need to be preserved for when I join it back together

search_string: "fran"

produces 3 parts: ["San ", "Fran", "cisco, California, US"]

search_string: "US"

produces 2 parts: ["San Francisco, California, ", "US"]

annad · March 16, 2024, 10:48pm

I may have a solution from claude.ai. This seems to work:

String.split(display_string, ~r/(\s*#{Regex.escape(input)}\s*)/i, parts: 3, include_captures: true, trim: true)

al2o3cr · March 16, 2024, 10:55pm

This will satisfy all the requirements you listed, but it has bugs (see below):

String.split(input_string, ~r{#{search_string}}i, trim: true, include_captures: true)

trim: true ensures that a match at the start of a string doesn’t produce a leading ""
include_captures: true puts the matches in the output
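
As a rough illustration of what those two options do, here is a small sketch using the sample values from the question. The module and function names are made up for this example, and the search string is deliberately left unescaped to mirror the call above.

```elixir
defmodule SplitOptionsSketch do
  # Hypothetical helper for illustration only; not part of the original post.
  def split(input_string, search_string, opts) do
    # Unescaped on purpose, to match the first version shown above.
    String.split(input_string, ~r/#{search_string}/i, opts)
  end
end

input = "San Francisco, California, US"

# Without trim: true, a match at the very start leaves a leading "".
IO.inspect(SplitOptionsSketch.split(input, "san", include_captures: true))
# => ["", "San", " Francisco, California, US"]

# With both options, the match is kept and the empty piece is dropped.
IO.inspect(SplitOptionsSketch.split(input, "san", trim: true, include_captures: true))
# => ["San", " Francisco, California, US"]

IO.inspect(SplitOptionsSketch.split(input, "fran", trim: true, include_captures: true))
# => ["San ", "Fran", "cisco, California, US"]
```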

The first bug is that metacharacters in search_string will be interpreted normally, so searching for . splits the string into individual characters. This can be fixed with Regex.escape:

String.split(input_string, ~r/#{Regex.escape(search_string)}/i, trim: true, include_captures: true)
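
To make the difference concrete, this sketch compares the unescaped and escaped versions on a search for a literal dot. The "St. Louis" input and the module name are assumptions chosen for the example.

```elixir
defmodule EscapeSketch do
  # Hypothetical helpers for illustration only.

  # The search string is dropped into the regex as-is.
  def split_raw(input_string, search_string) do
    String.split(input_string, ~r/#{search_string}/i, trim: true, include_captures: true)
  end

  # The search string is escaped first, so metacharacters are matched literally.
  def split_escaped(input_string, search_string) do
    String.split(input_string, ~r/#{Regex.escape(search_string)}/i,
      trim: true,
      include_captures: true
    )
  end
end

IO.inspect(Regex.escape("."))
# => "\\."

# Unescaped, "." matches any character, so every character becomes its own part.
IO.inspect(EscapeSketch.split_raw("St. Louis", "."))

# Escaped, only the literal dot is matched.
IO.inspect(EscapeSketch.split_escaped("St. Louis", "."))
# => ["St", ".", " Louis"]
```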

The second bug is that a repeated match will split into more fields:

search_string = "a"

produces ["S", "a", "n Fr", "a", "ncisco, C", "a", "liforni", "a", ", US"]

Setting parts can help, but has corner-cases with matches at the beginning / end.

A way to avoid that second bug is to write a regex with exactly what you mean:

Regex.run(~r/^(.*)(#{Regex.escape(search_string)})(.*)$/i, input_string, capture: :all_but_first)

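This will require some cleanup for the leading and trailing cases, since .* can match "". Note also that the first .* is greedy, so when search_string occurs more than once, the last occurrence is the one captured.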
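
Here is a sketch of what that Regex.run call returns for the examples in the question, plus one way to do the cleanup. The module and function names are hypothetical, and parts_first/2 is an assumed variant that is not from the thread: it uses a lazy first group (.*?) so that the first occurrence is captured instead of the last.

```elixir
defmodule RunSketch do
  # The regex from the post: greedy first group, so the LAST occurrence is captured.
  def parts(input_string, search_string) do
    regex = ~r/^(.*)(#{Regex.escape(search_string)})(.*)$/i
    Regex.run(regex, input_string, capture: :all_but_first)
  end

  # Assumed variant: lazy first group, so the FIRST occurrence is captured.
  def parts_first(input_string, search_string) do
    regex = ~r/^(.*?)(#{Regex.escape(search_string)})(.*)$/i
    Regex.run(regex, input_string, capture: :all_but_first)
  end

  # Regex.run/3 returns nil when nothing matches; otherwise drop the "" parts.
  def clean(nil), do: []
  def clean(parts), do: Enum.reject(parts, &(&1 == ""))
end

input = "San Francisco, California, US"

IO.inspect(RunSketch.parts(input, "san"))
# => ["", "San", " Francisco, California, US"]
IO.inspect(RunSketch.parts(input, "US"))
# => ["San Francisco, California, ", "US", ""]
IO.inspect(RunSketch.parts(input, "a"))
# => ["San Francisco, Californi", "a", ", US"]
IO.inspect(RunSketch.parts_first(input, "a"))
# => ["S", "a", "n Francisco, California, US"]
IO.inspect(RunSketch.clean(RunSketch.parts(input, "san")))
# => ["San", " Francisco, California, US"]
```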

joelpaulkoch · March 16, 2024, 11:00pm

Do you absolutely need to solve it with a regex? Couldn’t you try pattern matching instead and then split accordingly?
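
One possible reading of this suggestion, sketched without any regex: find the byte position of the match in downcased copies of both strings, then cut the original string at that position. The module and function names are hypothetical, and the sketch assumes that downcasing does not change the byte length of the string (true for ASCII, not guaranteed for every Unicode character).

```elixir
defmodule BinarySplitSketch do
  # Hypothetical helper for illustration only.
  # Splits input_string around the first case-insensitive occurrence of search_string.

  # :binary.match/2 raises on an empty pattern, so handle that case up front.
  def split_on_first(input_string, ""), do: [input_string]

  def split_on_first(input_string, search_string) do
    haystack = String.downcase(input_string)
    needle = String.downcase(search_string)

    case :binary.match(haystack, needle) do
      {pos, len} ->
        # Assumption: byte offsets in the downcased copy line up with the original.
        before = binary_part(input_string, 0, pos)
        match = binary_part(input_string, pos, len)
        rest = binary_part(input_string, pos + len, byte_size(input_string) - pos - len)
        Enum.reject([before, match, rest], &(&1 == ""))

      :nomatch ->
        [input_string]
    end
  end
end

input = "San Francisco, California, US"

IO.inspect(BinarySplitSketch.split_on_first(input, "san"))
# => ["San", " Francisco, California, US"]
IO.inspect(BinarySplitSketch.split_on_first(input, "fran"))
# => ["San ", "Fran", "cisco, California, US"]
IO.inspect(BinarySplitSketch.split_on_first(input, "US"))
# => ["San Francisco, California, ", "US"]
```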

dimitarvp · March 16, 2024, 11:52pm

OFF-TOPIC:

I am willing to bet $50 that within 24 hours somebody is going to post this to Reddit or 9GAG.

I really was not prepared to burst out laughing on the first post I see on ElixirForum today.

annad · March 17, 2024, 11:48pm

That works great! I tested it on some of the cities with long names and special characters and it broke it up into correct parts. You’re right in that some parts are empty strings, but I just Enum.filter those out.

One of my previous regex expressions was doing exactly what you described in the second bug. It was breaking the string into a part for every letter. Your suggestion works perfectly. Thank you so much!

AI: 0 Human: 1
Not that I’m keeping score.

annad · March 18, 2024, 12:02am

Here is the final code for anyone else trying to bold just part of a string based on user input. Please see “Solution” above by @al2o3cr for explanation of the regular expression.

defp matches?(search_result, input) do
  String.contains?(String.downcase(search_result), String.downcase(input))
end

defp format_search_result(display_string, input) do
  regex = ~r/^(.*)(#{Regex.escape(input)})(.*)$/i

  # Skip empty input: it would match everywhere and there is nothing to bold.
  if input != "" and matches?(display_string, input) do
    case Regex.run(regex, display_string, capture: :all_but_first) do
      # Wrap only the middle capture. Re-testing each part against the regex
      # would also bold a prefix or suffix that happens to contain the input.
      [before, match, rest] -> "#{before}<strong>#{match}</strong>#{rest}"
      # No regex match (for example, the string contains a newline).
      _ -> display_string
    end
  else
    display_string
  end
end

al2o3cr · March 18, 2024, 1:16am

Oops, I skimmed past the part of your original post that mentioned why the string was being split apart.

The pattern “find matches and replace them” is common enough to have a name: String.replace.

Using String.replace, the result you’re looking for can be written:

String.replace(display_string, ~r/#{Regex.escape(input)}/i, "<strong>\\0</strong>", global: false)

Edit: forgot global: false!
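
A short sketch of the String.replace approach in use. It relies on \\0 in the replacement string, which refers to the whole match (a numbered reference such as \\1 would need a capture group in the regex). The module name and the empty-input clause are assumptions added for the example.

```elixir
defmodule HighlightSketch do
  # Hypothetical helper for illustration only.

  # Assumed guard: with empty input there is nothing to bold.
  def highlight(display_string, ""), do: display_string

  def highlight(display_string, input) do
    # global: false replaces only the first occurrence.
    String.replace(
      display_string,
      ~r/#{Regex.escape(input)}/i,
      "<strong>\\0</strong>",
      global: false
    )
  end
end

input = "San Francisco, California, US"

IO.puts(HighlightSketch.highlight(input, "fran"))
# San <strong>Fran</strong>cisco, California, US
IO.puts(HighlightSketch.highlight(input, "a"))
# S<strong>a</strong>n Francisco, California, US
IO.puts(HighlightSketch.highlight(input, "xyz"))
# San Francisco, California, US
```

Unlike the greedy Regex.run version, this bolds the first occurrence rather than the last, and a string with no match is returned unchanged.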