Style: distinguishing "entry point" into functionality from implementation details

I grok that when implementing the details of a public function foo, especially if we need to add arguments like initializing an accumulator, we’re encouraged to call the private function do_foo. Fair enough. But sometimes when I’m down a few levels in the details, implementing the details of defp bar that was called from elsewhere, it seems a bit over-verbose to always make it do_bar.

One thing I tried on a few recent http://exercism.io exercises is to simply add an argument – not just to distinguish, which would be a horrid kluge, but to encapsulate the needed initialized accumulator, and this also provided a distinguishable “entry point” into the functionality.

For instance, the normal Enum.zip ends when either of the enums ends, and I didn’t want that, so I made defp my_zip(first, second). This in turn called my_zip(first, second, []) (where of course the [] is the initialization of an accumulator), rather than do_my_zip. All usage of my_zip other than that /2 “entry point” is /3, and there are no default args, so there’s no ambiguity. I had initially thought of calling it with an initialized accumulator, as I’ve often seen done, but that struck me as leaking an implementation detail. I then toyed with the idea of making the initially called thing actually /3 with a default initial accumulator, but I just wanted to try this approach, and it works fine.
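A minimal sketch of that arrangement might look like the following. The module name ZipSketch is hypothetical, the /2 entry point is made public here only so it can be called from outside the module (in the post it was a defp), and keeping the leftover elements unpaired is an assumption about what the non-truncating zip should do.

```elixir
defmodule ZipSketch do
  # Entry point: takes only what the caller knows about and supplies
  # the initial accumulator itself.
  def my_zip(first, second), do: my_zip(first, second, [])

  # Both lists exhausted: the accumulator was built back to front, so reverse it.
  defp my_zip([], [], acc), do: Enum.reverse(acc)

  # Both lists still have elements: pair the heads and keep going.
  defp my_zip([x | xs], [y | ys], acc), do: my_zip(xs, ys, [{x, y} | acc])

  # One list ran out first (assumed behaviour): append the leftovers as they are.
  defp my_zip([], rest, acc), do: Enum.reverse(acc, rest)
  defp my_zip(rest, [], acc), do: Enum.reverse(acc, rest)
end
```

Because my_zip/2 and my_zip/3 are distinct functions in Elixir, the compiler has no trouble with one name covering both the entry point and the worker.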

So… your thoughts? What is generally preferred, in what situations, how strongly, and why? The alternatives I can identify offhand are:

  1. Have the caller actually know what sort of initialized accumulator is needed. This way, my_zip(one, two) would not even exist. Instead, the caller would call my_zip(one, two, []). I’ve seen this in most of the Elixir I’ve read so far, but it seems “leaky”. Then again, it’s mostly been in exercises rather than “real” code, so maybe they weren’t really considering “proper” style.

  2. Always prepend do_ for implementation details even of private functions. This way, my_zip(one, two) would call do_my_zip(one, two, []). This seems like making more names than necessary, and often longer.

  3. Encapsulate the initial accumulator value as I did, by first calling a function taking only the args the caller would know about, and have that add the accumulator. This way, my_zip(one, two) would call my_zip(one, two, []). (The difference from the above being the lack of do_.)

  4. Encapsulate the initial accumulator value with a default value. This way, my_zip(one, two) would actually be a call to defp my_zip(one, two, acc \\ []). This seems like it could run into trouble if there are other uses of the function with the lower arity, but for a private function in a smallish module it should be OK. It may also reduce the need for another function clause, but may also make it require a function head if there weren’t already any defaults.
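To make alternatives 2 and 4 concrete, here is a rough sketch using a simple running total instead of the zip. The module and function names (AccumulatorStyles, total_do, total_default) are made up for illustration, and the functions are public only so they can be tried from outside the module.

```elixir
defmodule AccumulatorStyles do
  # Alternative 2: the entry point delegates to a do_-prefixed helper.
  def total_do(list), do: do_total(list, 0)

  defp do_total([], acc), do: acc
  defp do_total([x | xs], acc), do: do_total(xs, acc + x)

  # Alternative 4: a default argument. With more than one clause, the
  # default has to be declared once in a bodiless function head.
  def total_default(list, acc \\ 0)
  def total_default([], acc), do: acc
  def total_default([x | xs], acc), do: total_default(xs, acc + x)
end
```

Note that with alternative 4 nothing stops a caller from writing total_default([1, 2, 3], 10), which is the kind of leak discussed above.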

Thanks,
Dave

I tend toward #3, with the addition of making my_zip/3 private.

Where did that notion come from? I can’t think of any Elixir code I’ve read that follows that pattern. Or maybe I’m just confused by the “do_” convention.

I tend to go for #2, although #3 is also fine. The do_ thing is a bit clumsy, but I prefer it to having functions of the same name but different arities with one being public, and the other private. Although that’s maybe fine here, since they are doing the same thing.

I wouldn’t advise #1 (unless you specifically want a caller to pass the accumulator). Same thing for #4. Basically what you say - it’s leaky.

The do_ thing is pretty common. I have no data to back that claim, but I’ve seen it around, and use it myself. That doesn’t mean it’s a convention. My impression is that some people hate it, while others use it. I’m not a fan of it myself, but didn’t find a better alternative so far :slight_smile:

As @sasajuric said, it’s pretty common – though like I said earlier, most of the Elixir I’ve seen has been academic/hobby exercises, not “real” (production) code, so maybe it isn’t as common as I thought, “in the wild”. I was also going to say something about the Elixir Style Guide I found at https://github.com/levionessa/elixir_style_guide but I think I misread when skimming that; what I was going to quote turns out not to be recommending implementing whatever's guts in do_whatever, unless whatever is public and do_whatever is private.

A quick search in Elixir master revealed 440 occurrences of defp do_, so I wouldn’t say it’s uncommon :slight_smile:

This is entirely the usual Erlang way when the argument count changes.

I do a combination of 1) and 3). I do 1) if the entry point is also private, 3) for public APIs.

I used to use the do_ style, which I inherited from Erlang. Today I usually avoid it, though, given that def do_... do ends up being too many do’s in a row.

David Brady would probably say it starts looking like a pile of “do do”. :wink:

Thinking about this further, perhaps using zip_into for the 2nd function might work. I never tried that approach, but I’ll give it a go next time I encounter this situation.

Hmmmm, good point, perhaps I was just subconsciously having to deal with CS Hard Thing #1. :wink:

Yeah, that’s a hard problem for all of us :slight_smile:

What do you do when you run out of elements in one of the lists?
What element do you use in the pairs?

Cheers,
Torben

In general |zip(A, B)| = min(|A|, |B|)
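For reference, a quick check of that property with the standard Enum.zip/2 (the sample lists are arbitrary):

```elixir
a = [1, 2, 3]
b = [:x, :y]

# Enum.zip/2 stops as soon as the shorter enumerable is exhausted.
zipped = Enum.zip(a, b)

# The result is as long as the shorter input.
same_length? = length(zipped) == min(length(a), length(b))
```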

I had it just use the elements left in the other list. There were a couple other customizations too, for the specific purpose I was using it for. If you’re curious, see http://exercism.io/submissions/a236d7e095a245cd855426338c122244 .

Not sure why your my_zip reverses the result at the end.

If we forget the reverse I would write

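# Both lists exhausted: nothing left to zip.
def my_zip([], []), do: []
# Pair the heads and build the result directly (no accumulator).
def my_zip([x | xs], [y | ys]) do
  [[x, y] | my_zip(xs, ys)]
end
# One list ran out first: return what is left of the other.
def my_zip([], ys), do: ys
def my_zip(xs, []), do: xs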

The Beam is optimised for this.

Cheers,
Torben

His my_zip needs a reverse as it is not directly returning the zipped list but pushing the values onto an accumulator so they are in the reverse order. Hence the reverse.

Which is better is debatable; even the efficiency is not always clear. Pushing the values onto an accumulator means that the function is tail-recursive, which is good, but you pay for it with an extra reverse. Directly returning the list is more straightforward. Take your pick.
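Side by side, the two styles could be sketched like this. For brevity this version truncates at the shorter list like Enum.zip rather than keeping leftovers, and the names ZipStyles, zip_body and zip_tail are hypothetical.

```elixir
defmodule ZipStyles do
  # Body-recursive: the result is built directly, so no reverse is needed.
  def zip_body([x | xs], [y | ys]), do: [{x, y} | zip_body(xs, ys)]
  def zip_body(_, _), do: []

  # Tail-recursive: pairs are pushed onto an accumulator in reverse order...
  def zip_tail(xs, ys), do: zip_tail(xs, ys, [])

  defp zip_tail([x | xs], [y | ys], acc), do: zip_tail(xs, ys, [{x, y} | acc])
  # ...so the accumulator has to be reversed once at the end.
  defp zip_tail(_, _, acc), do: Enum.reverse(acc)
end
```

Both functions return the same list for the same inputs; the difference is only in where the work happens.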

One time you do want to use the accumulator version is when you need to return multiple values in a tuple, then it can get messy if you return directly.
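As an illustration of the tuple case, here is a sketch of a zip that returns both the pairs and whatever was left over. The module ZipWithRest and the function zip_rest are invented for this example.

```elixir
defmodule ZipWithRest do
  # Returns {pairs, leftover}, where leftover is the unpaired tail of the longer list.
  def zip_rest(xs, ys), do: zip_rest(xs, ys, [])

  defp zip_rest([x | xs], [y | ys], acc), do: zip_rest(xs, ys, [{x, y} | acc])

  # At least one list is empty here, so xs ++ ys is just the leftover tail.
  defp zip_rest(xs, ys, acc), do: {Enum.reverse(acc), xs ++ ys}
end
```

A body-recursive version would have to take the tuple from each recursive call apart and rebuild it on the way back, which is where it tends to get messy.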