Inconsistencies between Elixir/Phoenix and Elixre

jxxcarlson · April 11, 2017, 5:39pm

I am using a regular expression in the following

Regex.scan(~r/^\* (\S.*)$/m, text)

In Elixre, the regex recognizes what it should, e.g.,

* Google: parsec parser combinator
blah blah

However, in Phoenix, it does not. If I replace ‘*’ by ‘-’ in both the regex and the text, it works fine. What is wrong?

grych · April 11, 2017, 6:00pm

I can’t see any difference.

IEx:

iex(1)> text = "* Google: parsec parser combinator"
"* Google: parsec parser combinator"
iex(2)> Regex.scan(~r/^\* (\S.*)$/m, text)         
[["* Google: parsec parser combinator", "Google: parsec parser combinator"]]

Elixre shows the same output:

["* Google: parsec parser combinator",
 "Google: parsec parser combinator"]

Maybe the issue is not in the regex itself, but somewhere else in your code?

NobbZ · April 11, 2017, 6:00pm

WFM:

iex(1)> text = """
...(1)> * Google: parsec parser combinator
...(1)> blah blah
...(1)> """
"* Google: parsec parser combinator\nblah blah\n"
iex(2)> Regex.scan(~r/^\* (\S.*)$/m, text)
[["* Google: parsec parser combinator", "Google: parsec parser combinator"]]

which version of OTP and elixir are you on?

edit

I just checked the application's sources. Elixre itself requires Elixir version ~> 1.4 in its mix.exs, so it is quite likely that you have an older version with a bug in its unescape map?

jxxcarlson · April 11, 2017, 6:20pm

I am using Elixir 1.4. How do I check on the OTP version?

Here is a test I ran:

test "regex" do

  text = """
  foo + bar = foobar
  "* Google: parsec parser combinator"
    blah blah
 """

  result = Regex.scan(~r/^\* (\S.*)$/m, text)
  assert result != []
  [_| target] = result
  assert target == "Google: parsec parser combinator"

end

And here is the result:

  1) test regex (ExperimentalTest)
 test/lib/mu/experimental_test.exs:5
 Assertion with != failed, both sides are exactly equal
 code: result != []
 left: []
 stacktrace:
   test/lib/mu/experimental_test.exs:14: (test)

Finished in 0.05 seconds
1 test, 1 failure

NobbZ · April 11, 2017, 6:22pm

"\"* Google: parsec parser combinator\"" does NOT match your given regex!

grych · April 11, 2017, 6:25pm

You have an error in your regexp, or the test data is wrong.
Based on the test data, I think you mean:

Regex.scan(~r/\* (\S.*)/m, text)

^ and $ match the beginning and the end of the line, so a line that starts with a quote or a space before the * cannot match.
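To make the anchor behaviour concrete, here is a small sketch using the regex from this thread. The two sample strings are made up for illustration: one has a line that really starts with "* ", the other wraps that line in quotes as in the failing test.

```elixir
regex = ~r/^\* (\S.*)$/m

# The second line starts with "* ", so ^ (in multiline mode) can match there.
plain = Regex.scan(regex, "yada\n* Google: parsec parser combinator\nblah blah")
# plain == [["* Google: parsec parser combinator", "Google: parsec parser combinator"]]

# Here the second line starts with a double quote, so ^\* has nothing to match.
quoted = Regex.scan(regex, "yada\n\"* Google: parsec parser combinator\"\nblah blah")
# quoted == []
```

Dropping ^ and $ would make the quoted line match too, which is probably not what you want if only real list items should be recognized.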

jxxcarlson · April 11, 2017, 6:45pm

Yes, my first test data was wrong. I revised it, but still get a “fail” (see below). My intent is to recognize lines which begin with “*” followed by a space – hence the ^ and the $

 test "regex with *" do

   text1 = """

foo + bar = foobar

Google: parsec parser combinator
blah blah
"""

 text2 = "yada yada\n* Google: parsec parser combinator\nfoo, bar"
 result = Regex.scan(~r/^\* (\S.*)$/m, text2)
 assert result != []
 [_| target] = result
 assert target == "Google: parsec parser combinator"

end

The test result is

  1) test regex with * (ExperimentalTest)
 test/lib/mu/experimental_test.exs:6
 Assertion with == failed
 code:  target == "Google: parsec parser combinator"
 left:  []
 right: "Google: parsec parser combinator"
 stacktrace:
   test/lib/mu/experimental_test.exs:18: (test)

The Elixre test for ^\* (\S.*)$ with option m and text

yada
* Google: parsec parser combinator
blah blah

is

# yada
# * Google: parsec parser combinator
# blah blah
# 

["* Google: parsec parser combinator",
"Google: parsec parser combinator"]

grych · April 11, 2017, 6:52pm

Regex.scan returns a list of lists (here: [["* Google: parsec parser combinator", "Google: parsec parser combinator"]]). What you want to do is:

[[_| target]] = result
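A short sketch of the shapes involved, using the text2 string from the test above. Note that after [[_| target]] the target is still a one-element list rather than a string, so comparing it to a bare string would presumably still fail; the variable names below are only illustrative.

```elixir
text = "yada yada\n* Google: parsec parser combinator\nfoo, bar"
result = Regex.scan(~r/^\* (\S.*)$/m, text)

# One inner list per match: [whole_match, capture_1].
[[_whole | captures]] = result

# captures is still a list (one entry per capture group), so unwrap it too.
[target] = captures

# Matching both elements at once gives the same string.
[[_, same_target]] = result
```

Both target and same_target end up as "Google: parsec parser combinator". This pattern only works when there is exactly one match; with several list items you would map over result instead.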

jxxcarlson · April 11, 2017, 6:55pm

Ah – yes! Thank you!!

grych · April 11, 2017, 7:03pm

Or maybe you wanted to use Regex.run instead of scan? https://hexdocs.pm/elixir/Regex.html#run/3
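For comparison, a minimal sketch of how the two functions differ in what they return. The sample text with two list items is made up for illustration.

```elixir
regex = ~r/^\* (\S.*)$/m
text = "* one\n* two"

# Regex.run stops at the first match and returns a flat list (or nil).
first = Regex.run(regex, text)
# first == ["* one", "one"]

# Regex.scan returns every match, each wrapped in its own list.
all = Regex.scan(regex, text)
# all == [["* one", "one"], ["* two", "two"]]

# With no match, run returns nil while scan would return [].
none = Regex.run(regex, "no list items here")
# none == nil
```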

jxxcarlson · April 11, 2017, 7:11pm

Actually, scan was what I wanted – I’m just being stupid. I am using scan in the app, but there is some weird interaction between some of the regex functions I use.

I have a question. I have a little markup language in my app that runs a pipeline of regex-and-replace functions. Very crude, but it works pending building a real parser. I would like to move the regexes into a library. At the moment I have this:

defmodule MU.Regex do

  def unordered_list_item_regex do
    ~r/^\* (\S.*)$/m
  end

end

Is this the best way to proceed? My library of “constants” would basically be a module of constant functions.

OvermindDL1 · April 11, 2017, 7:15pm

Looks good to me, and I’m pretty sure the regexes will be compiled inline and become constants in the module's BEAM file, so it should work quite well.
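As a rough sketch of how one pipeline step might consume such a module: MU.Render, list_items/1 and the <li> output are made-up names and markup for illustration, not anything from your app.

```elixir
defmodule MU.Regex do
  # Constant function returning the regex for "* item" lines.
  def unordered_list_item_regex, do: ~r/^\* (\S.*)$/m
end

defmodule MU.Render do
  # Hypothetical pipeline step: wrap each list item in <li> tags.
  # "\\1" in the replacement refers to the first capture group.
  def list_items(text) do
    Regex.replace(MU.Regex.unordered_list_item_regex(), text, "<li>\\1</li>")
  end
end

html = MU.Render.list_items("yada\n* one\n* two\nblah")
# html == "yada\n<li>one</li>\n<li>two</li>\nblah"
```

Keeping each regex behind a named function also gives every pipeline step one place to look when two patterns start interfering with each other.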

jxxcarlson · April 11, 2017, 7:16pm

Great – thanks!