Hello!
So I currently have the following problem: I want to take a map and “sanitize” all of its integer fields. That means checking whether each such field's value really is an int and replacing it in the map with the parsed value; if the value is not a valid int, nil should be saved to the field instead. For example:
Input Map:
%{
  int_a: "1",
  int_b: "not_a_int!"
}
Output Map:
%{
  int_a: 1,
  int_b: nil
}
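To make the expected mapping concrete, here is a small sketch of how Integer.parse/1 (the function the code further down relies on) treats these two values. One thing worth noting: it accepts a leading numeric prefix, so a value such as "12abc" (a made-up example, not from my data) would count as a valid int under this approach.

```elixir
# Integer.parse/1 returns {integer, rest} on success and :error otherwise.
{1, ""} = Integer.parse("1")
:error = Integer.parse("not_a_int!")

# A leading numeric prefix is enough for a successful parse;
# the unparsed remainder comes back as the second tuple element.
{12, "abc"} = Integer.parse("12abc")
```

If partially numeric strings should become nil as well, matching only on `{parsed, ""}` would be one way to tighten the check.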
On the system I am currently working on, I know that all the integer fields in the map start with the int_ prefix. It is important to say that I don’t have full access to the struct of the map, so I can’t write a function that uses a specific field. So I wrote the following code:
def parse_map_int_fields(map) do
  ints_map =
    map
    # Filter all the Integers field
    |> Enum.filter(fn {k, _v} -> Atom.to_string(k) |> String.starts_with?("int_") end)
    # Remove all fields that are nil or ints
    |> Enum.filter(fn {_k, v} -> !is_integer(v) and !is_nil(v) end)
    # Parse the values to int, returning nil if invalid
    |> Enum.map(fn {key, val} ->
      case Integer.parse(val) do
        :error ->
          {key, nil}

        {parsed, _} ->
          {key, parsed}
      end
    end)
    |> Map.new()

  # Merge the original map with the "sanitized" map
  Map.merge(map, ints_map)
end
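For comparison, the same idea can be sketched as a single pass over the map using Map.new/2, without the intermediate lists and the final merge. This is only a minimal sketch, and I have not benchmarked it. The module name SanitizerSketch and the helper names sanitize/2 and parse_int/1 are hypothetical. It also differs slightly from the code above: non-atom keys and int_ values that are neither strings, integers nor nil are passed through untouched instead of raising.

```elixir
defmodule SanitizerSketch do
  # Hypothetical module name, used only for this illustration.

  # Rebuilds the map in one pass, rewriting only the int_ fields.
  def parse_map_int_fields(map) do
    Map.new(map, fn {key, val} -> {key, sanitize(key, val)} end)
  end

  # String values under an int_ key are parsed; other keys keep their value.
  defp sanitize(key, val) when is_atom(key) and is_binary(val) do
    case Atom.to_string(key) do
      "int_" <> _rest -> parse_int(val)
      _other -> val
    end
  end

  # Integers, nils and every other non-string value are left untouched.
  defp sanitize(_key, val), do: val

  # Returns the parsed integer, or nil when the string is not parsable.
  defp parse_int(val) do
    case Integer.parse(val) do
      {parsed, _rest} -> parsed
      :error -> nil
    end
  end
end

SanitizerSketch.parse_map_int_fields(%{int_a: "1", int_b: "not_a_int!", str_a: "text"})
# => %{int_a: 1, int_b: nil, str_a: "text"}
```

Pattern matching on the "int_" <> _rest binary prefix stands in for String.starts_with?/2 here; whether that makes any measurable difference is something the benchmark would have to show.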
Another point is that this code will be run on a lot of maps (5 to 10 million), with 15 fields per map on average, so I have two questions:
- Most important, can this code be simplified? From my perspective it looks a little bit convoluted, so I would appreciate any tips!
- How would one optimize this function? I wrote a simple benchmark using Benchee that gave me the following results:
Operating System: Linux
CPU Information: AMD Ryzen 7 2700X Eight-Core Processor
Number of Available Cores: 16
Available memory: 15.63 GB
Elixir 1.11.2
Erlang 23.2.3
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 7 s
Benchmarking parse_map_int_fields...
Name                           ips        average  deviation         median         99th %
parse_map_int_fields      857.59 K        1.17 μs  ±2149.16%        1.01 μs        2.12 μs
Code used for the benchmark:
defmodule Sanitizer do
  def parse_map_int_fields(map) do
    ints_map =
      map
      |> Enum.filter(fn {k, _v} -> Atom.to_string(k) |> String.starts_with?("int_") end)
      |> Enum.filter(fn {_k, v} -> !is_integer(v) and !is_nil(v) end)
      |> Enum.map(fn {key, val} ->
        case Integer.parse(val) do
          :error ->
            {key, nil}

          {parsed, _} ->
            {key, parsed}
        end
      end)
      |> Map.new()

    Map.merge(map, ints_map)
  end
end

test_map = %{
  int_a: "1",
  int_b: "not_a_int!",
  str_a: "This is a string field",
  str_b: "Another string field",
  fl_a: 0.0
}

Benchee.run(%{
  "parse_map_int_fields" => fn -> Sanitizer.parse_map_int_fields(test_map) end
})
Which isn’t all that bad for my use case, but in the interest of learning I would like to know if something could be done differently.
Thanks to all!