Parsing a single Elixir expression from a string which may contain extra content afterwards

Suppose I want to embed literal Elixir code inside some other file format that already exists. To make it more concrete, I’m parsing (or @kip is, but we’re kinda working together) the ICU message format. The ICU message format supports interpolation using the following syntax:

"Good day, {username}"

as well as more complex elements such as plurals:

"{n, plural, one {Un ejemplo} other {# ejemplos}}"

If I wanted to support something like

"Good day, {@username}"

or

"Good day, {get_username_from_context()}"

it would be helpful to have a function that parses the first valid Elixir expression out of a string and returns the rest of the string. For example:

iex> Code.read_first_expression("a + b(x) the rest is not valid Elixir")
{quote(do: a + b(x)), " the rest is not valid Elixir"}

iex> Code.read_first_expression("a + b(x), plural, ...}")
{quote(do: a + b(x)), ", plural, ...}"}

Currently the only way I can think of doing this is by performing a binary search on the string while using Code.string_to_quoted/1 as an oracle.
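A minimal sketch of the oracle idea, assuming a linear scan from the longest prefix downwards instead of a binary search, since prefix validity is not monotonic ("a" parses, "a +" does not, "a + b" does again). The module and function names are hypothetical and not part of Elixir's Code module. Note that it returns the longest prefix that parses, not the shortest, and that it calls the parser once per prefix, so it is only meant to illustrate the approach.

```elixir
defmodule FirstExpression do
  # Hypothetical helper, not an existing Elixir API.
  # Tries every prefix from longest to shortest and returns the first one
  # that Code.string_to_quoted/1 accepts, together with the unparsed rest.
  def read(string) do
    string
    |> String.length()
    |> Range.new(1, -1)
    |> Enum.find_value(:error, fn len ->
      {prefix, rest} = String.split_at(string, len)

      case Code.string_to_quoted(prefix) do
        {:ok, ast} -> {:ok, ast, rest}
        {:error, _reason} -> nil
      end
    end)
  end
end

{:ok, ast, rest} = FirstExpression.read("a + b(x), plural, ...}")
IO.puts(Macro.to_string(ast))
IO.inspect(rest)
```

Where the trailing whitespace ends up (in the prefix or in the rest) depends on which prefix parses first, so a real implementation would probably want to trim it explicitly.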

NB: I’m aware I can do much better with simple heuristics, I’m asking more out of curiosity. NO I CAN’T AND THIS IS STUPID

For simple expressions, match the dynamic bits with a regex and then evaluate them against a scope?

Hi {username}

"Hi " <> scope["username"]
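A rough sketch of that regex idea, assuming placeholders are plain word characters inside braces and the scope is a map with string keys. `SimpleInterpolation` is a made-up name for illustration.

```elixir
defmodule SimpleInterpolation do
  # Hypothetical helper: replaces every {name} with the value found in scope.
  # Raises KeyError if a placeholder has no entry in the scope.
  def render(template, scope) do
    Regex.replace(~r/\{(\w+)\}/, template, fn _whole, key ->
      to_string(Map.fetch!(scope, key))
    end)
  end
end

IO.puts(SimpleInterpolation.render("Hi {username}", %{"username" => "Ana"}))
```

This only covers bare variable names; anything like `{get_username_from_context()}` would not match the pattern.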

For complex ones define your own structure and parse accordingly?

These are guesses from templating stuff I did many years ago

How should it parse the following:

  • a sentence like this - this is a valid Elixir expression, equivalent to a(sentence(like(this)))
  • 1 + 2; maybe?
  • test a + b(x), plural, ... - test(a + b(x), plural, ...) is also valid Elixir

So in short, unless you impose some restrictions, it is not feasible to extract the “minimal leading valid Elixir expression” out of a string.

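To see the ambiguity concretely, here is a small sketch that parses a string as a whole and compares it with its explicitly parenthesized form after stripping metadata. `AmbiguityDemo` is a hypothetical name; only `Code.string_to_quoted!/1` and the `Macro` functions are standard.

```elixir
defmodule AmbiguityDemo do
  # Removes metadata (line numbers etc.) so two ASTs can be compared structurally.
  defp strip_meta(ast) do
    Macro.prewalk(ast, &Macro.update_meta(&1, fn _meta -> [] end))
  end

  # Returns true when both strings parse to the same AST shape.
  def same_ast?(left, right) do
    strip_meta(Code.string_to_quoted!(left)) == strip_meta(Code.string_to_quoted!(right))
  end
end

# A plain English-looking sentence is a chain of calls without parentheses.
IO.inspect(AmbiguityDemo.same_ast?("a sentence like this", "a(sentence(like(this)))"))
```

Because the whole sentence parses, any "first expression" extractor has to decide by some extra rule where the expression is supposed to stop.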

None of the languages I worked with allowed plain sentences as function calls, true, but unless you see handlebars, you would treat the expression as a string.

You might be overthinking it. Look at the examples on the page you linked:

messages_en.yaml

film_won_awards: >
    All said, {n, plural, 
        one {{film} won # award}
        other {{film} won # awards}}

plural-w-interpolation-view.js

// in UI
format('film_won_awards', { film: 'The Godfather', n: 27 });

// in English => "All said, The Godfather won 27 awards"

My notes

1st scan
... {integer, method, [args]}...

for args do
  key<space>{text}
  parse text -> extract variables with handlebars
end

create map, execute methods with the right variables

scope.<plural>(number: <integer>, key: <key>, [extracted_vars...])

Overriding CLDR Plural Rules

We can customize our messages using single number specifiers. We use the =42 syntax to provide these custom forms, which override CLDR locale forms like one or other.

message_en.yaml

film_won_awards: >
    All said, {n, plural, 
        =0 {{film} did not win any awards}
        one {{film} won # award}
        other {{film} won # awards}}

plural-w-overrides.js

// in UI
format('film_won_awards', { film: 'The Godfather', n: 0 });

// in English => "All said, The Godfather did not win any awards"
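The selection step in these examples could be sketched as follows, assuming the message has already been parsed into a map of plural forms. The plural rule here is hard-coded for English (1 is "one", everything else is "other"), which is an assumption for illustration; real CLDR rules differ per locale. `PluralSketch` and its functions are hypothetical names.

```elixir
defmodule PluralSketch do
  # English-only plural category, assumed for illustration.
  defp category(1), do: "one"
  defp category(_n), do: "other"

  # forms: map such as %{"=0" => ..., "one" => ..., "other" => ...}
  # An exact "=n" form overrides the locale category, as described above.
  def format(forms, n, bindings) do
    template =
      Map.get(forms, "=#{n}") || Map.get(forms, category(n)) || Map.fetch!(forms, "other")

    template
    |> String.replace("#", Integer.to_string(n))
    |> then(fn text ->
      Regex.replace(~r/\{(\w+)\}/, text, fn _whole, key ->
        to_string(Map.fetch!(bindings, key))
      end)
    end)
  end
end

forms = %{
  "=0" => "{film} did not win any awards",
  "one" => "{film} won # award",
  "other" => "{film} won # awards"
}

IO.puts(PluralSketch.format(forms, 0, %{"film" => "The Godfather"}))
IO.puts(PluralSketch.format(forms, 27, %{"film" => "The Godfather"}))
```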

I would start with the easiest example and slowly build up the cases on that page with tests.

I have never used internationalization, but gettext already does something similar for singular and plural forms; have a dig through its codebase too.

If you definitely need to use eval:

https://hexdocs.pm/elixir/Code.html#eval_string/3
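For reference, a minimal use of `Code.eval_string/3` with a binding looks roughly like this. The expression string and the `username` binding are made up for illustration, and evaluating strings from untrusted sources this way runs arbitrary code.

```elixir
# Evaluate an expression string with `username` bound to a value.
# eval_string returns the result together with the final binding.
{result, _binding} = Code.eval_string("String.upcase(username)", username: "ana")

IO.puts(result)
```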

Good luck!

I’m sorry, but I don’t get what you mean by your post.


I thought I was replying to you but in fact I was replying to someone else’s comment. Please ignore.

Well. IEx does something like this.
You split the input on newlines until a valid expression is found, and then it is evaluated. I think a similar approach could be applied to this case.

IEx does the opposite: it checks whether the given input as a whole is a valid Elixir expression, and if not, it requests more data. @tmbb wanted to extract the minimal prefix expression (at least that is how I understand it).

So in IEx, when you have a + b(x) the rest is not valid Elixir, it will fail to parse it, because it cannot parse it as a whole. @tmbb wanted to split it into a + b(x) and the rest of the string.
