Inconsistencies between Elixir/Phoenix and Elixre

I am using a regular expression in the following call:

Regex.scan(~r/^\* (\S.*)$/m, text)

In Elixre, the regex matches what it should, e.g.,

* Google: parsec parser combinator
blah blah

However, in Phoenix, it does not. If I replace ‘*’ by ‘-’ in both the regex and the text, it works fine. What is wrong?

I can’t see any difference.

IEx:

iex(1)> text = "* Google: parsec parser combinator"
"* Google: parsec parser combinator"
iex(2)> Regex.scan(~r/^\* (\S.*)$/m, text)         
[["* Google: parsec parser combinator", "Google: parsec parser combinator"]]

Elixre shows the same output:

["* Google: parsec parser combinator",
 "Google: parsec parser combinator"]

Maybe the issue is not in the regex, but somewhere else in your code?


WFM:

iex(1)> text = """
...(1)> * Google: parsec parser combinator
...(1)> blah blah
...(1)> """
"* Google: parsec parser combinator\nblah blah\n"
iex(2)> Regex.scan(~r/^\* (\S.*)$/m, text)
[["* Google: parsec parser combinator", "Google: parsec parser combinator"]]

Which version of OTP and Elixir are you on?

edit

I just checked the sources of the application. Elixre itself requires Elixir ~> 1.4 in its mix.exs, so is it possible that you have an older version which has a bug in its unescape-map?


I am using Elixir 1.4. How do I check on the OTP version?

Here is a test I ran:

test "regex" do

  text = """
  foo + bar = foobar
  "* Google: parsec parser combinator"
    blah blah
 """

  result = Regex.scan(~r/^\* (\S.*)$/m, text)
  assert result != []
  [_| target] = result
  assert target == "Google: parsec parser combinator"

end

And here is the result:

  1) test regex (ExperimentalTest)
 test/lib/mu/experimental_test.exs:5
 Assertion with != failed, both sides are exactly equal
 code: result != []
 left: []
 stacktrace:
   test/lib/mu/experimental_test.exs:14: (test)

Finished in 0.05 seconds
1 test, 1 failure

"\"* Google: parsec parser combinator\"" does NOT match your given regex!


You have an error in your regexp, or the test data is wrong.
Based on the test data, I think you mean:

Regex.scan(~r/\* (\S.*)/m, text)

^ and $ match the beginning and the end of a line (with the m option), so a line that starts with a quote character cannot match ^\*.
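
To make the effect of the anchors concrete, here is a minimal sketch comparing the anchored and unanchored patterns. The variable names and the two sample strings are made up for illustration; the quoted sample only mimics the failing test data above.

```elixir
# Sample inputs (illustrative): the bullet line as intended, and the same
# line wrapped in double quotes, as in the failing test data.
plain = "foo + bar = foobar\n* Google: parsec parser combinator\nblah blah\n"
quoted = "foo + bar = foobar\n\"* Google: parsec parser combinator\"\nblah blah\n"

anchored = ~r/^\* (\S.*)$/m
unanchored = ~r/\* (\S.*)/m

# The anchored pattern matches only when "* " sits at the start of a line.
anchored_plain = Regex.scan(anchored, plain)
anchored_quoted = Regex.scan(anchored, quoted)

# The unanchored pattern matches anywhere in a line, so it also finds the
# quoted line, but the capture then includes the trailing quote.
unanchored_quoted = Regex.scan(unanchored, quoted)

IO.inspect(anchored_plain, label: "anchored, plain")
IO.inspect(anchored_quoted, label: "anchored, quoted")
IO.inspect(unanchored_quoted, label: "unanchored, quoted")
```

Dropping the anchors makes the quoted line match, but it would also match "* " in the middle of a line, which may not be what a list-item rule wants.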

Yes, my first test data was wrong. I revised it, but still get a “fail” (see below). My intent is to recognize lines which begin with “*” followed by a space – hence the ^ and the $

test "regex with *" do

  text1 = """
  foo + bar = foobar

  * Google: parsec parser combinator
    blah blah
  """

  text2 = "yada yada\n* Google: parsec parser combinator\nfoo, bar"
  result = Regex.scan(~r/^\* (\S.*)$/m, text2)
  assert result != []
  [_| target] = result
  assert target == "Google: parsec parser combinator"

end

The test result is

  1) test regex with * (ExperimentalTest)
 test/lib/mu/experimental_test.exs:6
 Assertion with == failed
 code:  target == "Google: parsec parser combinator"
 left:  []
 right: "Google: parsec parser combinator"
 stacktrace:
   test/lib/mu/experimental_test.exs:18: (test)

The Elixre test for ^\* (\S.*)$ with option m and text

yada
* Google: parsec parser combinator
blah blah

is

# yada
# * Google: parsec parser combinator
# blah blah
# 

["* Google: parsec parser combinator",
"Google: parsec parser combinator"]

Regex.scan returns a list of lists (here: [["* Google: parsec parser combinator", "Google: parsec parser combinator"]]). What you want to do is:

[[_| target]] = result
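
A small sketch of the different ways to take the result apart, using the text2 string from the test above. The variable names are illustrative. Note that matching with [[_ | target]] binds target to a one-element list, while [[_, target]] binds it to the captured string itself.

```elixir
text2 = "yada yada\n* Google: parsec parser combinator\nfoo, bar"
result = Regex.scan(~r/^\* (\S.*)$/m, text2)

# Splitting the OUTER list: the head is the first match (itself a list),
# the tail is the list of remaining matches, which is empty here.
[_first_match | rest] = result

# Matching one level deeper, tail form: the tail of the inner list is
# still a list (of captures).
[[_ | captures]] = result

# Matching one level deeper, element form: binds the captured string.
[[_full, target]] = result

# For any number of matches, collect the first capture of each.
targets = for [_full, capture] <- result, do: capture

IO.inspect(rest, label: "rest")
IO.inspect(captures, label: "captures")
IO.inspect(target, label: "target")
IO.inspect(targets, label: "targets")
```

The [[_full, target]] pattern only matches when there is exactly one match with exactly one capture group; the comprehension is the more forgiving option when the text may contain several list items.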

Ah – yes! Thank you!!

Or maybe you wanted to use Regex.run instead of scan? https://hexdocs.pm/elixir/Regex.html#run/3
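
For comparison, a minimal sketch of the difference between the two functions. The sample text with two list items is made up for illustration.

```elixir
text = "yada yada\n* First item\n* Second item\nfoo, bar"
regex = ~r/^\* (\S.*)$/m

# Regex.run/2 returns only the first match as a flat list, or nil if
# nothing matches.
first = Regex.run(regex, text)
none = Regex.run(regex, "no bullets here")

# Regex.scan/2 returns every match, each as its own list.
all = Regex.scan(regex, text)

IO.inspect(first, label: "run")
IO.inspect(none, label: "run, no match")
IO.inspect(all, label: "scan")
```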

Actually, scan was what I wanted – I’m just being stupid. I am using scan in the app, but there is some weird interaction between some of the regex functions I use.

I have a question. I have a little markup language in my app that runs a pipeline of regex-and-replace functions. Very crude, but it works pending building a real parser. I would like to move the regexes into a library. At the moment I have this:

defmodule MU.Regex do

  def unordered_list_item_regex do
    ~r/^\* (\S.*)$/m
  end

end

Is this the best way to proceed? My library of “constants” would basically be a module of constant functions.

Looks good to me, and I’m pretty sure the regexes will be compiled inline and become constants in the module BEAM, so it should work quite well.
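
As a rough sketch of how such a module of "constant functions" might be used in one step of a replace pipeline: the module name MU.RegexSketch, the function render_list_items/1, and the <li> output format are all hypothetical and not taken from the app. How and when the ~r sigil is compiled can differ between Elixir/OTP releases, so the performance remark above is best treated as version-dependent.

```elixir
defmodule MU.RegexSketch do
  # Hypothetical stand-in for the MU.Regex module from the post.

  # A "constant function": returns the regex for an unordered list item.
  def unordered_list_item_regex, do: ~r/^\* (\S.*)$/m

  # Hypothetical pipeline step: wrap each list item's text in <li> tags.
  # "\\1" in the replacement refers to the first capture group.
  def render_list_items(text) do
    Regex.replace(unordered_list_item_regex(), text, "<li>\\1</li>")
  end
end

html = MU.RegexSketch.render_list_items("intro\n* one\n* two\n")
IO.inspect(html)
```

Keeping each regex behind a named function means the patterns live in one place and the calling code never repeats a literal.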

Great – thanks!