Complicated binary pattern matching in function heads

I’m trying to implement some functionality by pattern matching, purely as a learning exercise to familiarize myself with bitstrings/charlists/binaries.

Imagine I want to write a function parse_symbol that takes the symbol for a security and parses it into its constituent parts.

A security symbol is a string that looks like this:
"NU 05/19/2023 4.50 P", or "AAPL 03/31/2023 145.00 C"

Each of those symbols is made up of {stock ticker} {expiration date} {strike price} {option type}

So AAPL 03/31/2023 145.00 C is an “Apple call option, with a $145 strike price and an expiration date of 03/31/2023”.

The stock ticker part is of variable length: it could be one character (F is the symbol for the Ford Motor Company) or multiple. The “Option type” could be "C" for a call option or "P" for a put.

It’s relatively simple to write a regex that parses out the components of the symbol. Is it possible to parse them out in a function head by pattern matching?

So something like:

def parse_symbol(<<ticker," ", month,"/",day,"/",year," ",strike," ",type>>) do
    %{ticker: ticker, strike: strike, type: type}
end

Warning: I know that the above code is probably broken gobbledygook in all sorts of ways; I’m just trying to grok how binaries work :)


It is not possible to write a single pattern that parses this whole string. It is possible, however, to split it first and then match the date, which has a fixed layout, with a binary pattern:

[ticker, date, price, option] = String.split(input, " ")
<<month :: binary-size(2), "/", day :: binary-size(2), "/", year :: binary-size(4)>> = date
price = String.to_float(price) # Or decimal parse
[year, month, day] = Enum.map([year, month, day], &String.to_integer/1)
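
To tie the steps above together, here is a minimal sketch that wraps them in one function. The module name `SplitParser` and the shape of the returned map are assumptions for illustration, not something from the original question, and the sketch assumes exactly four space-separated fields.

```elixir
defmodule SplitParser do
  # Hypothetical module name, used only for illustration.
  def parse_symbol(input) when is_binary(input) do
    # Assumes exactly four space-separated fields; anything else raises MatchError.
    [ticker, date, price, type] = String.split(input, " ")

    # The date has a fixed layout, so a binary pattern fits it well.
    <<month::binary-size(2), "/", day::binary-size(2), "/", year::binary-size(4)>> = date

    [year, month, day] = Enum.map([year, month, day], &String.to_integer/1)

    %{
      ticker: ticker,
      expiration: Date.new!(year, month, day),
      # String.to_float/1 needs a decimal point: "4.50" works, "4" raises.
      strike: String.to_float(price),
      type: type
    }
  end
end

parsed = SplitParser.parse_symbol("AAPL 03/31/2023 145.00 C")
IO.inspect(parsed)
```

For real prices a decimal library is probably a better fit than floats, as the comment in the snippet above already hints.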

Thank you!

What is it that makes it impossible to write a single pattern to parse the string? Is it the variable length of the ticker & price?

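Yes, it is the variable length. In a binary pattern, every segment except the last needs a size that is known when the match runs, so a ticker or price of unknown width cannot sit in the middle of the pattern. Binary pattern matching was designed to help in the development of fixed-offset binary protocols.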
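
A small sketch of what that restriction looks like in practice. The length-prefixed binary in the second match is a made-up format, used only to show that a size can come from a variable bound earlier in the same pattern.

```elixir
# A fixed-width prefix followed by an unsized trailing segment matches fine.
<<ticker::binary-size(4), " ", rest::binary>> = "AAPL 03/31/2023 145.00 C"

# A size can also be a variable, as long as it is bound before the segment
# that uses it. Here a length byte precedes the payload (hypothetical format).
<<len::8, payload::binary-size(len), tail::binary>> = <<4, "AAPL", " 03/31/2023">>

IO.inspect({ticker, rest})
IO.inspect({len, payload, tail})

# What the compiler rejects is an unsized binary segment anywhere but the end:
#
#   <<ticker::binary, " ", rest::binary>> = "AAPL 03/31/2023 145.00 C"
```

That is the fixed-offset style in action: the pattern works when each field's width is either constant or announced by an earlier field, which a space-delimited symbol does not do.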


I know you are doing this as an exercise, but to me this sounds like a good fit for a parser combinator. Look into how NimbleParsec handles it. You can view the compiled binary pattern matching clauses with debug: true. That might help you with your task.
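
As a rough sketch of what that could look like, assuming the `nimble_parsec` package can be fetched with `Mix.install/1`: the module name `SymbolParser`, the tag names, and the choice to keep the strike as a string are all illustrative assumptions.

```elixir
Mix.install([{:nimble_parsec, "~> 1.0"}])

defmodule SymbolParser do
  # Hypothetical module; tag names are chosen only for illustration.
  import NimbleParsec

  # One or more uppercase ASCII letters.
  ticker = ascii_string([?A..?Z], min: 1) |> unwrap_and_tag(:ticker)

  # MM/DD/YYYY as three integers, with the slashes dropped.
  date =
    integer(2)
    |> ignore(string("/"))
    |> integer(2)
    |> ignore(string("/"))
    |> integer(4)
    |> tag(:date)

  # Digits and dots, kept as a string so the caller decides how to convert it.
  strike = ascii_string([?0..?9, ?.], min: 1) |> unwrap_and_tag(:strike)

  # A single "C" or "P".
  type = ascii_string([?C, ?P], 1) |> unwrap_and_tag(:type)

  # Passing `debug: true` as a third argument prints the generated clauses.
  defparsec :symbol,
            ticker
            |> ignore(string(" "))
            |> concat(date)
            |> ignore(string(" "))
            |> concat(strike)
            |> ignore(string(" "))
            |> concat(type)
            |> eos()
end

{:ok, parts, "", _context, _line, _offset} = SymbolParser.symbol("AAPL 03/31/2023 145.00 C")
IO.inspect(parts)
```

The generated code consumes the variable-length ticker through recursive function clauses, which is why a combinator can handle what a single binary pattern cannot.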


You could if you wanted to… but you’d have to know the minimum and maximum number of characters in the ticker.

Example

# ticker of length = 2
<<ticker::binary-size(2)," ",rest::binary>> = "NU 05/19/2023 4.50 P"

# ticker of length = 4
<<ticker::binary-size(4)," ",rest::binary>> = "AAPL 03/31/2023 145.00 C"

So basically you’d have to create as many function heads for parse_symbol/1 as there are possible ticker lengths (assuming everything else is of constant length). That assumption probably doesn’t hold for the price, though (again, assuming you want to extract it by pattern matching), so you’d need to combine this with some other approach to read every value out of the security symbol.
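
A minimal sketch of the "one function head per ticker length" idea, generating the clauses at compile time instead of writing them by hand. The module name `TickerHeads`, the function `split_ticker/1`, and the 1 to 5 byte bound are assumptions for illustration only.

```elixir
defmodule TickerHeads do
  # Hypothetical module; the 1..5 bound on ticker length is an illustrative
  # assumption, not a rule taken from the thread.
  for size <- 1..5 do
    # Each iteration defines one clause with a fixed ticker width.
    def split_ticker(<<ticker::binary-size(unquote(size)), " ", rest::binary>>) do
      {:ok, ticker, rest}
    end
  end

  # Fallback for inputs whose ticker is not 1 to 5 bytes followed by a space.
  def split_ticker(_other), do: :error
end

{:ok, ticker, rest} = TickerHeads.split_ticker("NU 05/19/2023 4.50 P")

# The date is fixed width, so it can be peeled off the front of `rest`.
<<month::binary-size(2), "/", day::binary-size(2), "/", year::binary-size(4), " ", tail::binary>> =
  rest

IO.inspect({ticker, month, day, year, tail})
```

The clauses are tried in order, so the first space in the input decides which one matches. The remaining `tail` still holds the variable-width price and the option type, which is where another approach such as `String.split/2` comes in.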


Thank you so much for this! Not just because it’s very cool, but because I didn’t know about the concept of parser combinators at all. Super powerful.

After reading the docs a bit… it raises the question: if a parser combinator is an option, should one even bother writing a regex?!