String.split edge case on weird empty space. How can I solve for this?

The string I’m working with seems to have this weird space character in the text that makes the String.split not work properly.

IO.inspect("Size is: #{size_string}")
IO.inspect(String.length(size_string))
IO.inspect(String.graphemes(size_string))
IO.inspect(String.split(size_string, " "))
IO.inspect(String.split("483.64 MB", " "))

"Size is: 483.64 MB"
9
["4", "8", "3", ".", "6", "4", " ", "M", "B"]
["483.64 MB"]
["483.64", "MB"]

You can see the param string doesn’t split properly. But if I type it in manually, the split actually works.

The param comes from HTML and I’m using Floki to fetch it. In the HTML source it looks like this: 483.64&nbsp;MB

The fix here seems to be to split with a regex that matches any run of Unicode space separators (the \p{Zs} category), rather than a literal " ".

[size, unit] = String.split(size_string, ~r/\p{Zs}+/u)

Hi,

just for information: you had a non-breaking space (U+00A0) here, because of the &nbsp; in the HTML, not a regular space (U+0020). So, as you found out, you need to either split with a regex in Unicode mode (~r/\s+/u) or explicitly remove/replace the character (e.g. String.replace(string, "\u00A0", " ")).
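A minimal sketch of both options, in case it helps. It assumes the value Floki returns is exactly "483.64", a non-breaking space, then "MB"; the string is built with an explicit \u00A0 escape here so the example does not depend on copy-pasting an invisible character, and the variable names are only illustrative.

```elixir
# Assumed input: the text with a non-breaking space (U+00A0) between
# the number and the unit, written with an explicit escape.
size_string = "483.64\u00A0MB"

# Print the codepoints as plain integers. 160 (U+00A0) shows up where
# a regular space would be 32.
IO.inspect(String.to_charlist(size_string), charlists: :as_lists)

# Option 1: split on any run of whitespace, with the regex in Unicode mode.
[size, unit] = String.split(size_string, ~r/\s+/u)

# Option 2: replace the non-breaking space with a regular space first,
# then split on " " as in the original code.
[size2, unit2] =
  size_string
  |> String.replace("\u00A0", " ")
  |> String.split(" ")

IO.inspect({size, unit})
IO.inspect({size2, unit2})
```

Option 2 is probably the better fit if the cleaned-up string is reused elsewhere; option 1 is shorter if you only need the two parts. As far as I know, String.split/1 without a pattern will not help here either, since it deliberately does not split on no-break whitespace.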
