While looking for performance bottlenecks, I stumbled onto this, which is quite surprising.
@some_string "1632767814.9532351 00 0 0 1 0 0 0 0 0 0 0 0 0 0 00 00 0 00 00 00 00 00 0e 01 14 3c 3c 96 20 99 ba 1e 10 80 93 a7 1d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 0 0 0"

def bench_string_split_trim(n) do
  # "vanilla": let String.split/3 drop the empty strings itself.
  fun1 = fn ->
    Enum.map(1..n, fn _x ->
      String.split(@some_string, " ", trim: true)
    end)
  end

  # "filter": split without :trim, then drop the empty strings manually.
  fun2 = fn ->
    Enum.map(1..n, fn _x ->
      String.split(@some_string, " ") |> Enum.filter(fn x -> x != "" end)
    end)
  end

  Benchee.run(
    %{
      "vanilla" => fun1,
      "filter" => fun2
    },
    formatters: [
      {Benchee.Formatters.HTML, file: "samples_output/my.html"},
      Benchee.Formatters.Console
    ]
  )
end

Replacing `trim: true` with an explicit `Enum.filter` more than doubles the performance. Just my 2 cents; it might be handy for someone out there.
Cheers!
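Before comparing timings, it is worth confirming that the two variants really return the same list. A minimal sketch, using a made-up input (not the benchmark string) with leading, trailing, and consecutive separators:

```elixir
# Hypothetical input chosen so that splitting produces empty strings.
input = " 1632767814.95  00 0e  ff "

# Variant 1: let String.split/3 drop the empty strings.
with_trim = String.split(input, " ", trim: true)

# Variant 2: split first, then drop the empty strings manually.
with_filter = input |> String.split(" ") |> Enum.reject(&(&1 == ""))

IO.inspect(with_trim)
# ["1632767814.95", "00", "0e", "ff"]
IO.inspect(with_trim == with_filter)
# true
```

`Enum.reject/2` is used here instead of `Enum.filter/2` with a negated check; the result is the same.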
Nice find! I think it would make sense to open an issue on Elixir’s issue tracker. It seems there is room for a performance improvement in the standard library.
I think that’s related to the fact that `String.trim` does not simply check for empty strings after removing spaces; it probably deals with Unicode spaces as well, and this is where the performance difference is likely coming from.
I think this option has nothing to do with whitespace… It is a bit confusing, I know. From the docs:
* `:trim` (boolean) - if `true`, empty strings are removed from
the resulting list.
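To make the distinction concrete, here is a small sketch (the input string is made up for illustration) showing that `:trim` only drops empty parts and leaves surrounding whitespace alone:

```elixir
# Hypothetical comma-separated input with an empty field and padded values.
csv = "a,, b ,"

plain = String.split(csv, ",")
# ["a", "", " b ", ""]

trimmed = String.split(csv, ",", trim: true)
# ["a", " b "]  (empty strings removed, whitespace kept)

# Stripping whitespace is a separate step, e.g. with String.trim/1.
stripped = Enum.map(trimmed, &String.trim/1)
# ["a", "b"]
```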
Yep, I got tripped up by the naming then, sorry.
Don’t blame yourself for that! Look at the options for `:binary.split/3`:
Erlang -- binary
trim
Removes trailing empty parts of the result (as does trim in re:split/3).
trim_all
Removes all empty parts of the result.
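The difference between the two Erlang options is easy to see from Elixir. A minimal sketch with a made-up input that has empty parts at the start, in the middle, and at the end:

```elixir
# Hypothetical input: splitting on "," yields leading, inner, and trailing empties.
bin = ",a,,b,"

no_trim = :binary.split(bin, ",", [:global])
# ["", "a", "", "b", ""]

# :trim only removes the trailing empty part.
trailing_only = :binary.split(bin, ",", [:global, :trim])
# ["", "a", "", "b"]

# :trim_all removes every empty part, wherever it occurs.
all_empties = :binary.split(bin, ",", [:global, :trim_all])
# ["a", "b"]
```

So Elixir's `trim: true` corresponds to Erlang's `:trim_all`, not to `:trim`.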
If anything, this should be fixed upstream in OTP; `String.split/3` just delegates to `:binary.split/3` here:
{:infinity, false} ->
:binary.split(string, pattern, [:global])
{:infinity, true} ->
:binary.split(string, pattern, [:global, :trim_all])
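One way to check whether the cost sits in OTP rather than in the Elixir wrapper is to call `:binary.split/3` directly and time it. This is only a rough sketch: the input string and iteration count are made up, and `:timer.tc/1` is far less reliable than a proper Benchee run.

```elixir
# Hypothetical short input; substitute the real string when measuring.
s = "00 0  1 "

via_elixir = String.split(s, " ", trim: true)
via_erlang = :binary.split(s, " ", [:global, :trim_all])
same? = via_elixir == via_erlang
# same? is true; both return ["00", "0", "1"]

# Crude timing of the direct Erlang call with and without :trim_all.
# :timer.tc/1 returns {elapsed_microseconds, result}.
n = 10_000

{us_trim_all, _} =
  :timer.tc(fn ->
    Enum.each(1..n, fn _ -> :binary.split(s, " ", [:global, :trim_all]) end)
  end)

{us_plain, _} =
  :timer.tc(fn ->
    Enum.each(1..n, fn _ -> :binary.split(s, " ", [:global]) end)
  end)

IO.puts("trim_all: #{us_trim_all} µs, plain: #{us_plain} µs")
```

If the `:trim_all` call alone shows the same slowdown, that would support reporting it against OTP.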