Phillipp

Cldr.Number.Parser.parse quite slow?

Hey,

I have an application that loads data from an XML API, parses that response using XmlToMap and then iterates over the parsed data to transform it into maps and generally a better structure for future processing.

Part of that transformation is to parse the values I get. They can be booleans, text, numbers and timestamps (also just numbers tho).

Here is what I did:

  def parse_value("true"), do: true
  def parse_value("True"), do: true
  def parse_value("TRUE"), do: true
  def parse_value("false"), do: false
  def parse_value("False"), do: false
  def parse_value("FALSE"), do: false
  def parse_value(value) do
    case Cldr.Number.Parser.parse(value) do
      {:ok, number} -> number
      _ -> value
    end
  end

Yes, not the prettiest, but it did the job. Unfortunately, Cldr.Number.Parser.parse slows everything down: the entire transformation takes 5 seconds on my dev system and 16 seconds on a Raspberry Pi. This is pretty bad, because my plan was to fetch the data every 5 seconds and store it in an InfluxDB database.

Is there a way to improve the performance of Cldr.Number.Parser.parse or can I use something else? I chose Cldr.Number.Parser.parse because I can just throw stuff at it and it returns it, no matter if it's an integer, or float, or whatever.

Comparison:

iex(23)> :timer.tc(MyApp, :parse_value, ["true"])
{3, true}
iex(21)> :timer.tc(MyApp, :parse_value, ["1"])
{8520, 1}

This function goes over my data map and returns a huge list of tuples.

iex(20)> :timer.tc(MyApp, :generate_influx_tuples, [data])
{584, [...]}
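
A single :timer.tc call can be noisy, so averaging over many calls usually gives a steadier figure. A minimal sketch, assuming a hypothetical helper module named TimingSketch (not part of the original application):

```elixir
defmodule TimingSketch do
  # Hypothetical helper for illustration only.
  # Runs `fun` `iterations` times and returns the mean time per call
  # in microseconds, as a float.
  def average_us(fun, iterations \\ 1_000)
      when is_function(fun, 0) and is_integer(iterations) and iterations > 0 do
    {total_us, _result} =
      :timer.tc(fn ->
        Enum.each(1..iterations, fn _ -> fun.() end)
      end)

    total_us / iterations
  end
end
```

It could be called as TimingSketch.average_us(fn -> MyApp.parse_value("1") end) to compare against the single-call numbers above.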

Marked As Solved

kip

ex_cldr Core Team

@Phillipp, I’ll take a look (I’m the author) and see if there is anything I can do to improve the performance. Feel free to add an issue so I can track it properly as well.

Note however that Cldr.Number.Parser.parse/2 is probably not the best tool for this job. If you know you are only receiving data that has no localisations in it (no separators, no localised decimal digits, …) then using the standard library tools would be better. For example the following code parses both integers and floats that have no formatting in them (no localisations) in about 1.46 μs versus 1.01 μs for Float.parse/1:

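def parse_number(x) do
  case Integer.parse(x) do
    # The whole string is an integer
    {integer, ""} ->
      integer

    # Not a clean integer, so try a float
    _other ->
      case Float.parse(x) do
        {float, ""} -> float
        # Neither: return the input unchanged
        _other -> x
      end
  end
end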

Cldr.Number.Parser.parse/2 is designed to be quite resilient in the face of localised and formatted numbers and that means there is definitely more work going on. Its typical use is to enhance user experience when parsing user-provided text input.
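
Combining the boolean clauses from the question with the standard-library approach above could look roughly like the following. This is a sketch only: the module name ValueParser is hypothetical, and it assumes the incoming values are unlocalised strings.

```elixir
defmodule ValueParser do
  # Hypothetical module name; assumes unlocalised input strings.
  @true_values ["true", "True", "TRUE"]
  @false_values ["false", "False", "FALSE"]

  def parse_value(value) when value in @true_values, do: true
  def parse_value(value) when value in @false_values, do: false

  def parse_value(value) when is_binary(value) do
    case Integer.parse(value) do
      # The whole string is an integer (this also covers numeric timestamps)
      {integer, ""} ->
        integer

      # Not a clean integer, so try a float
      _other ->
        case Float.parse(value) do
          {float, ""} -> float
          # Anything else is returned unchanged as text
          _other -> value
        end
    end
  end
end
```

Requiring an empty remainder in both matches means partially numeric text such as "12 apples" is kept as a string rather than truncated to 12.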

Also Liked

kip

ex_cldr Core Team

I have published ex_cldr version 2.17.1 which improves your example code by about 40x. The changelog entry is:

Bug Fixes

  • Significantly improve the performance of Cldr.default_locale/0. In previous releases, the default locale was being parsed on each access. In this release it is parsed once and cached in the application environment. This improves performance by about 40x. Thanks to @Phillipp who brought this to attention in Elixir Forum

kip

ex_cldr Core Team

I did some more digging here and the issue isn’t primarily number parsing. It’s related to repeatedly parsing the default locale which happens here if one is not supplied and if one wasn’t set with Cldr.put_locale/1.

You can probably get a speed up of 40x with the following addition to your original code:

  # Assumes you are only using one locale
  @locale Cldr.Locale.new!("en", MyApp.Cldr)

  def parse_value(value) do
    case Cldr.Number.Parser.parse(value, locale: @locale) do
      {:ok, number} -> number
      _ -> value
    end
  end

I will fix the underlying issue here and publish a new version of ex_cldr.

Nevertheless, my suggestions in the previous message still hold.

kip

ex_cldr Core Team

In this specific case it’s the system wide default so the process dictionary isn’t really suitable. The system wide default has always been in the app environment, but as a binary and therefore parsed on each access. The only change is to now also store the parsed version as well, which does save about 1ms per access which is a really big win.
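
The general idea of keeping the parsed form next to the raw binary in the application environment can be sketched as follows. This is not ex_cldr's actual code: the module, the app name, the keys and the stand-in parse/1 function are all hypothetical.

```elixir
defmodule LocaleCacheSketch do
  # Hypothetical app and key names, for illustration only.
  @app :locale_cache_sketch
  @raw_key :default_locale
  @parsed_key :parsed_default_locale

  # Stand-in for an expensive parse; NOT ex_cldr's locale parser.
  def parse(raw) when is_binary(raw) do
    [language | rest] = String.split(raw, "-")
    %{language: language, territory: List.first(rest)}
  end

  # Returns the parsed default. The raw binary is parsed only on the
  # first call; later calls read the cached result from the app env.
  def default do
    case Application.fetch_env(@app, @parsed_key) do
      {:ok, parsed} ->
        parsed

      :error ->
        parsed = @app |> Application.get_env(@raw_key, "en") |> parse()
        Application.put_env(@app, @parsed_key, parsed)
        parsed
    end
  end
end
```

Because the application environment is global to the node, every process sees the same cached value, which fits a system wide default better than the process dictionary would.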
