Private functions: naming and additional parameter position

niels_bom · March 9, 2019, 7:19pm

While learning Elixir I’ve been writing and rewriting some practice functions. Some of the rewriting was to ensure recursive functions are tail-call-optimized (TCO) or to prevent expensive operations like concatenating lists or Enum.count(list).
(and yes mr Knuth I know about premature optimization being bad)

Some things I’ve noticed myself doing while rewriting functions is:

naming: for public function foo name the corresponding private function _foo
the private functions often have an extra parameter (so I can prepend to list for TCO)
prepended result lists are reversed before they’re returned
in public function heads pattern match for simple cases (as I can often do that without the need for the extra parameter used in the private function)

And I’d like to get some feedback on whether what I’m doing is idiomatic/common practice.

My code is here: https://gist.github.com/nielsbom/8aaec8838d927cac9f72b0e7c28770b2
Just $ elixir bla.exs to run the tests.

Thanks!

blatyo · March 9, 2019, 9:48pm

From what I’ve seen in other Elixir and Erlang code, it’s typical to name the private functions do_blah when it’s an implementation detail of blah. Since it’s private, it shouldn’t matter much what you choose though. I think it’s more important that it’s near he public function.

Very common for TCO. The extra argument is typically called the accumulator. It’s also a hint that you can turn most recursive operations into a form of reduce to make them TCO.

If it’s my own private code, I sometimes don’t bother with splitting the functions and just pass the accumulator as an optional argument with a default. For example, your faster filter could be defined like this:

  def ffilter(list, func, result \\ [])
  def ffilter([], _, result), do: Enum.reverse(result)
  def ffilter([head | tail], func, result), do:
    # every item that is 👍 is added to it (prepending to LL is cheap)
    # then when we reach the end, reverse and pass it back (reverse is cheap?)
    # Tail-call optimized 👍
    _ffilter(
      tail,
      func,
      (if(func.(head), do: [head | result], else: result))
    )

That’s common. Doing the work in order during recursion can easily recreate an entire list multiple times while prepending to the head is O(1).

Yes, this is a good practice. If you validate at your boundary it means all subsequent code can ignore bad values.

Qqwy · March 10, 2019, 3:14pm

Most of what you are doing is indeed idiomatic/common. do_foo is more common than _foo but since these are private, internal functions anyway there is no real problem in naming them anything else, since they can always be renamed later without problems.

I disagree with @blatvo about ‘just passing the accumulator as default’. I think it is bad style (because it exposes an implementation detail the rest of the world) and I would advise against it, even in your own personal code.

I do have a comment on your code example. Most notably in ffilter, you write an inline if-statement:

  # faster filter
  def ffilter(lst, func), do: _ffilter(lst, func, [])

  defp _ffilter([], _, result), do: Enum.reverse(result)
  defp _ffilter([head | tail], func, result), do:
    _ffilter(
      tail,
      func,
      (if(func.(head), do: [head | result], else: result))
)

I think that the following is more readable (and therefore, arguably, more idiomatic):

  # faster filter
  def ffilter(lst, func), do: _ffilter(lst, func, [])

  defp _ffilter([], _, result), do: Enum.reverse(result)
  defp _ffilter([head | tail], func, result) do
    if func.head() do
      _ffilter(tail, func, [head | result])
    else
      _ffilter(tail, func, result)
    end
  end

And as an aside: Is there a reason you used comments rather than @documentation in your example code?

Oh, and another interesting fact about tail-call recursive functions about enumerables, which I don’t think Elixir itself currently does, is that those reversal steps are only required if the next step of a sequence of enumeration-manipulations requires that order. In other words: If something does not rely on the order we can postpone the reverse, or maybe even skip it all-together. (Such as: some_list |> Enum.filter only_nice_elements |> Enum.count.
I know Haskell has a set of (very complicated and fiddly…) compiler rewrite-pragmas to optimize common cases of this.

peerreynders · March 10, 2019, 6:38pm

Is do_foo actually common for tail recursive implementations though?

Ultimately functions are not identified by name - they are identified by the combination of name and arity i.e. foo/1 and foo/2 are already different functions. Adopting either do_foo or _foo is equivalent to going down the Hungarian Notation rabbit hole - (many people would argue that if you need type/protection hints then use an IDE rather than burying that information in the function name).

So I’m more familiar with the foo/1 and foo/2 convention to highlight that these distinct functions are part of the same functionality behind the foo name.

I think there is some conflation going on here.

Being tail recursive and having private access is orthogonal. They may coincide with your choice of implementation. Alternately in Erlang it’s the public function that has to be exported i.e. all functions are private by default.
That “extra parameter” tends to be a differentiator between implementing body or tail recursion.
- Body recursion leaves intermediate results on the call stack.
- Tail recursion explicitly takes control of the “bag of intermediate results” so that it does need to leave additional data on the call stack. So that “extra parameter” is the “bag of intermediate results” that the tail recursion explicitly manages - in a way a faux call stack.

Edit: Changed “stack of intermediate results” to “bag of intermediate results” i.e. holder of intermediate data with whatever internal structure is appropriate to solving the problem.

blatyo · March 10, 2019, 6:52pm

Not defending the usage, as I sort of dislike do_, but if you search for do_ in these examples, it at least seems like a pattern.

github.com

elixir-lang/elixir/blob/v1.8.1/lib/elixir/lib/list.ex

defmodule List do
  @moduledoc """
  Functions that work on (linked) lists.

  Many of the functions provided for lists, which implement
  the `Enumerable` protocol, are found in the `Enum` module.

  Additionally, the following functions and operators for lists are
  found in `Kernel`:

    * `++/2`
    * `--/2`
    * `hd/1`
    * `tl/1`
    * `in/2`
    * `length/1`

  Lists in Elixir are specified between square brackets:

      iex> [1, "two", 3, :four]

This file has been truncated. show original

github.com

erlang/otp/blob/master/lib/stdlib/src/lists.erl

%%
%% %CopyrightBegin%
%%
%% Copyright Ericsson AB 1996-2023. All Rights Reserved.
%%
%% Licensed under the Apache License, Version 2.0 (the "License");
%% you may not use this file except in compliance with the License.
%% You may obtain a copy of the License at
%%
%%     http://www.apache.org/licenses/LICENSE-2.0
%%
%% Unless required by applicable law or agreed to in writing, software
%% distributed under the License is distributed on an "AS IS" BASIS,
%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
%% See the License for the specific language governing permissions and
%% limitations under the License.
%%
%% %CopyrightEnd%
%%
-module(lists).

This file has been truncated. show original

While you could technically have a public foo/1 and a private foo/2, there are cases where you want the public interface to have a foo/2 arity as well, but still need a private function for implementation details. I personally prefer some convention to separate the name of the public interface from the implementation detail.

peerreynders · March 10, 2019, 7:13pm

In the Elixir source there does seems to be a tendency to use do_ prefix for private functions. The question is whether it should be considered idiomatic when it isn’t mentioned in the naming conventions or style guide.

Due to already established conventions for leading underscores, _foo probably isn’t the greatest choice for a function name.

In the Erlang source do_flatten/2 seems more like an escape as flatten/2 was needed for export.

Qqwy · March 10, 2019, 7:17pm

No: It is common for private functions, which tail-recursive implementation functions often are. The do_* prefix is not the only one used, but definitely the most common, and the one adopted by e.g. the standard library.
EDIT: Above was written before seeing your last response. You are completely astute that this convention is not in the style guide and that other naming schemes are ‘just as good’. As well as that starting function names with a leading underscore is a BadIdea .

In many of the simpler cases, a tail-recursive solution does not need this ‘extra’ stack. But even if we have this extra parameter, there are many cases in which an accumulator is not a stack: Only when talking about linked lists (which are essentially stacks) will we end up with accumulators that are lists/stacks. (Of course there are other enumerables that would like to be transformed into a list and back during enumeration, but there are other enumerables, like trees, for which this is not the case).

clinton · September 10, 2020, 10:36pm

I picked up the defp _foo() naming convention from one of Dave Thomas’s books and I prefer it to defp do_foo() because having do_ in front of every private function name makes my private function area look like a sea of meaningless do_s and it is harder for me to read quickly.

I also prefer the underscore convention over defp foo() because the underscore makes my code more readable by letting me know in-line that the _foo() function is private and located in the current file. I don’t have to go look for the function definition and figure out where it came from. I instantly know at a glance.

The other nice thing about using defp _foo() for private functions is that underscored function names are not imported into other files. So, if I make _foo() a public function so I can fiddle with it in iex and then forget to make it private again, it still won’t get imported into other modules and accidentally used.

l00ker · September 11, 2020, 8:57pm

This is what I do as well for exactly the same reasons. Using the _foo() convention quickly tells me that the function is defined within the same module and not an imported function from another module.

If you use def _foo(), as you mentioned, it is basically hidden from TAB completion in iex etc., but remains callable from outside the module if you know it’s there. It acts like the Reflection functions in Ecto.Schema - i.e. __schema__(:fields) etc.

Maybe it’s not for everyone, but I’ve found it to be quite useful. I’m glad someone else sees it that way as well

wolf4earth · September 12, 2020, 7:30am

One thing of note is that tail-recursive functions are not strictly faster or more efficient than their body-recursive counterparts.

@PragTob wrote quite an interesting piece on that a while ago:

With a follow up here:

kwando · September 12, 2020, 8:12am

This is also mentioned in the Erlang documentation https://erlang.org/doc/efficiency_guide/myths.html