Proposal: Private modules (implementation specific) (closed)

josevalim · January 14, 2019, 10:44pm

The goal of private modules is to define a module that cannot be trivially accessed by modules it is not visible to.

In this proposal, private modules work by declaring exactly which other module prefixes can access it:

defmodulep MyApp.Private, visible_to: [MyApp] do
  def hello do
    IO.puts "hello world"
  end
end

In the definition above, only MyApp and modules nested under it can access MyApp.Private. In this other example:

defmodulep MyApp.Nested.Schema, visible_to: [MyApp.Nested] do
  def hello do
    IO.puts "hello world"
  end
end

only modules in MyApp.Nested and under it can access MyApp.Nested.Schema.
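To make the prefix rule concrete, here is a minimal sketch of how such a check could be expressed in plain Elixir. `VisibilitySketch` and `visible?/2` are hypothetical names used only for illustration; they are not part of the proposal or the proof of concept, and the comparison by module segments (rather than by raw string prefix) is an assumption.

```elixir
defmodule VisibilitySketch do
  # Returns true when `caller` is one of the `visible_to` prefixes
  # or is nested under one of them. Comparison is done per module
  # segment, so MyAppWeb is NOT considered to be under MyApp.
  def visible?(caller, visible_to) when is_atom(caller) and is_list(visible_to) do
    caller_parts = Module.split(caller)

    Enum.any?(visible_to, fn prefix ->
      List.starts_with?(caller_parts, Module.split(prefix))
    end)
  end
end

IO.inspect(VisibilitySketch.visible?(MyApp.Other, [MyApp]))
IO.inspect(VisibilitySketch.visible?(MyApp.Other, [MyApp.Nested]))
```

Under these assumptions the first call reports that MyApp.Other may access a module visible to MyApp, while the second reports that it may not access one visible only to MyApp.Nested.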

To access a private module, you must explicitly require and alias it:

defmodule MyApp.Other do
  require MyApp.Private, as: Private
  Private.hello
end

The require is necessary to validate the visibility rules. The alias is required to give the private module a proper name (as we will learn later on, private modules live in different namespaces).

Private modules can be arbitrarily nested too:

defmodulep MyApp.Private, visible_to: [MyApp] do
  defmodulep Nested, visible_to: [MyApp] do
    def hello do
      IO.puts "hello world"
    end
  end
end

Requiring MyApp.Private does not automatically require MyApp.Private.Nested. It still needs to be explicitly required. You can do so directly:

require MyApp.Private.Nested, as: Nested

If you have already required Private, you can also require Nested from the Private alias:

require MyApp.Private, as: Private
require Private.Nested, as: Nested

Nesting

defmodulep works like defmodule in that the module can be accessed directly after its definition:

defmodule Foo do
  defmodulep Bar, visible_to: [MyApp] do
    ...
  end

  Bar # We can access Bar here even if Foo is not in visible_to
end

In other words, a more correct description of defmodulep is that it is visible to any following module declared in the same file or to any module declared in visible_to. In fact, :visible_to may be skipped for nested private modules which means they are only accessible to the following modules in the same file.

Testing

In order to test a private module, you need to make sure the private module is visible to the test module. Since most private modules are visible to their own rootname, testing just works if you follow Elixir’s testing conventions. For instance, a private module MyApp.Foo.Bar is likely visible to MyApp or MyApp.Foo, which means the default test module, which is MyApp.Foo.BarTest, should have access to the private module. In other words, the following code should work just fine:

# lib/my_app/foo/bar.ex
defmodulep MyApp.Foo.Bar, visible_to: [MyApp.Foo] do
  ...
end

# test/my_app/foo/bar_test.exs
defmodule MyApp.Foo.BarTest do
  use ExUnit.Case

  require MyApp.Foo.Bar, as: Bar
  ...
end

Inspecting private modules

Private modules work by being assigned a different naming structure. If you define a private module Foo.Bar, it will actually be compiled as :"modulep_DDD_Elixir.Foo.Bar", where DDD is an arbitrarily assigned number, instead of the usual Elixir.Foo.Bar. The number is arbitrary to discourage developers from accessing the underlying module directly, as this number may change at any time. The only way to safely access a private module is by requiring and aliasing it first.
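The naming scheme can be illustrated with ordinary Elixir, without defmodulep. The sketch below is not the proof of concept: `ManglingSketch`, `mangle/2` and the fixed number 123 (standing in for DDD) are assumptions made for illustration. It only shows that a module can live under such an atom and be reached through an alias.

```elixir
defmodule ManglingSketch do
  # Builds the mangled atom for a module; `number` stands in for DDD.
  # Interpolating an alias such as Foo.Bar yields "Elixir.Foo.Bar".
  def mangle(module, number) when is_atom(module) and is_integer(number) do
    :"modulep_#{number}_#{module}"
  end
end

# A module can be defined directly under the mangled atom.
defmodule :"modulep_123_Elixir.Foo.Bar" do
  def hello, do: "hello world"
end

# An alias gives the mangled atom a usable name, which is roughly
# what `require ..., as: ...` would do in the proposal.
alias :"modulep_123_Elixir.Foo.Bar", as: Bar
IO.puts(Bar.hello())
```

Because the atom does not start with the usual Elixir. prefix, writing Foo.Bar in other code would not resolve to this module, which is why the proposal needs the alias.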

Proof of Concept

I have written a proof of concept that is "ready to use today"™ for those willing to try this idea out:

However, the proof of concept has certain limitations:

Since we can’t change the behaviour of require, the library introduces a requirep to require private modules.
If you define defmodulep Foo and then defmodule Foo, the proof of concept won’t warn.
If you invoke SomePrivateModule.foo without requiring it, the error message says the module does not exist, without giving any hints the module is actually private (this may or may not be a feature).
Private modules appear literally as :"modulep_DDD_Elixir.Foo.Bar", but they could show up as Foo.Bar when inspected by updating the Inspect implementation for atoms.
If you define a module defmodule Public nested inside defmodulep Private, Public cannot be accessed directly but only via requirep Private, as: Private and then by calling Private.Public. This will be fixed if we add this to Elixir by making Module.concat/1 aware of modulep_DDD_ prefixes.

All of those limitations could be addressed by adding defmodulep to Elixir.

Your turn

I would love to hear feedback on:

The feature
Implementation details and concerns on this area
The proof of concept

Also, I would love examples of how other languages tackle private modules. A common implementation is to have the visibility of your modules associated with the “idea of a package”, but Elixir does not quite have the concept of a package. Elixir does provide the idea of “applications”, but they are defined only after the code is compiled. That’s why the “package” approach has been ruled out in favor of an explicit visible_to control.

hauleth · January 14, 2019, 11:44pm

I have mixed feelings about it. On the one hand, private modules are a nice idea; on the other hand, I think this proposal is a bit “too much”. What I would like to see instead is a compiler warning and a module attribute (maybe even with an EEP), so this would not be a hard error but a compilation warning, something like:

defmodule MyApp.Private do
  @private [MyApp]

  def hello do
    IO.puts "hello world"
  end
end

This would be less strict than your proposal, but I think it would be more “in line” with the rest of the platform.
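As a rough illustration of the attribute-based idea, a module attribute can already be persisted into the compiled module and read back by tooling. The sketch below uses only existing Elixir features; the names `AttrSketch.Private`, `AttrSketch.Reader` and `visible_to/1` are hypothetical, and nothing in Elixir currently warns based on a @private attribute.

```elixir
defmodule AttrSketch.Private do
  # Persist the attribute so it is stored in the compiled module
  # and can be read later via __info__(:attributes).
  Module.register_attribute(__MODULE__, :private, persist: true)
  @private [AttrSketch]

  def hello, do: "hello world"
end

defmodule AttrSketch.Reader do
  # Returns the prefixes recorded under @private, or [] when the
  # module declares none. A compiler pass or linter could use this
  # to decide whether to emit a warning for a given caller.
  def visible_to(module) do
    module.__info__(:attributes)
    |> Keyword.get_values(:private)
    |> List.flatten()
  end
end

IO.inspect(AttrSketch.Reader.visible_to(AttrSketch.Private))
```

A warning built on top of this would still be best effort, since the module itself stays callable under its normal name.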

Eiji · January 15, 2019, 12:46am

This is a really interesting idea!

I’m not sure about it. On one hand it could be probably nicer for iex sessions (not sure about it), but on another I would like to see a module which is only accessible for specific library/scope.

josevalim:

In this proposal, private modules work by declaring exactly which other module prefixes can access it:
defmodulep MyApp.Private, visible_to: [MyApp] do
  def hello do
    IO.puts "hello world"
  end
end

What would happen if somebody would do:

defmodulep NotMyLibrary.Private, visible_to: [NotMyLibrary] do
  # …
end

and then I would do something like:

defmodule MyApp do
  # …
end
# …
defmodule NotMyLibrary.SimpleHack do
  require NotMyLibrary.Private, as: Private
  # …
end

Is it possible to add something like:

defmodulep NotMyLibrary.Private, ensure_app: [:not_my_library], visible_to: [NotMyLibrary] do
  # …
end

so that defining NotMyLibrary.SimpleHack at compile time in the my_app app would raise an error or warning, like the one emitted when overriding modules?

Even if DDD is different for each compilation, it’s still possible to find it like this:

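defmodule Example do
  # Finds the mangled atom of a loaded private module by scanning
  # the names of all currently loaded modules.
  def sample(module) do
    # Escape the name so the dots in "Elixir.Foo.Bar" match literally.
    regex = ~r/\Amodulep_[0-9]+_#{Regex.escape(to_string(module))}\Z/

    :code.all_loaded()
    |> Enum.map(fn {loaded, _path} -> Atom.to_string(loaded) end)
    |> Enum.find(&String.match?(&1, regex))
    |> String.to_existing_atom()
  end
end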

real_module_to_alias = Example.sample(Foo.Bar)

Of course it’s not as safe as depending on the official way, but I believe that this way we could guess the real full module atom. I hope I did not miss anything here.

I have also two questions here:

Would there be an easy way to use private modules inside an iex session? It would be awesome if it were possible only for an iex session started for the restricted app.
How Elixir (and therefore ex_doc) would handle @moduledoc and @doc attributes inside private modules? I see here a nice win-win to close @docp and related topics.

It would be awesome if this were introduced into Elixir core, as people would be forced to stop depending on private APIs.

joaoevangelista · January 15, 2019, 1:26am

This part might get affected, for example if you iterate in the REPL to quickly validate a function that now lives in a private module. Unless given an option, mix could compile your modules to be visible to iex if you are in the dev or test env.

I think most languages that have a VM or interpreter can be monkey patched somehow. In Java, for example, you can call a private method or read a private property using reflection, and I’ve also seen people create the same package structure alongside yours, emulating the dependency’s, so they could extend a protected class.

asummers · January 15, 2019, 1:40am

While I understand the technical reasons for doing so, I dislike the required alias in require. We generally do not alias things for greppability reasons, so having a way to be able to use the FQN would be nice. Up until now, alias has been opt in and composes independently with other features but this requires agreeing with two concepts to use one.

JEG2 · January 15, 2019, 2:38am

I find myself not wanting this change. It feels like a lot of special cases, for mild protection. I feel like people who use code with something like @moduledoc false on it today know what they are doing and will not be deterred by this feature. The protection is easily defeated as shown in this thread.

Maybe I just haven’t felt the pain this feature is targeted at though. Perhaps a good example use case would help me see the value.

blatyo · January 15, 2019, 2:56am

I agree with everything @JEG2 said.

I don’t know anyone who wants to call a private API; I know a lot of people who need to call one. I’d prefer a warning sign rather than a guard dog. For me, @moduledoc false/@doc false have been enough.

chrismccord · January 15, 2019, 5:57am

I’m generally for this feature since I think the latter case leads folks to reach into private APIs out of “necessity”, where they would otherwise ideally engage with library authors and team members on the API changes required to accomplish their needs. Over time these kinds of quick fixes bite folks and maintainers alike, so if things could be locked down as private I think it would stop a lot of pain in the long term, for maybe some effort in the short term.

I’ve seen it both ways. Some folks are simply unaware or miss the doc notation and blissfully call APIs they shouldn’t, while others go rogue and intentionally break the rules.

I think this feature would help in both cases. Newcomers wouldn’t accidentally ship brittle code against private APIs and more seasoned devs would avoid the lazy trap of calling the API because they can.

bottlenecked · January 15, 2019, 7:11am

A common implementation is to have the visibility of your modules associated with the “idea of a package”, but Elixir does not quite have the concept of a package. Elixir does provide the idea of “applications”, but they are defined only after the code is compiled. That’s why the “package” approach has been ruled out in favor of an explicit visible_to control.

I think a mix project is roughly analogous to a package in this case: a bundle of related code files, so restricting visibility to just the project (and by dropping the visible_to attribute) would be OK.

That would remind me of the internal keyword used in C# (class visible in current project/assembly only). Since it’s common in C# to create a separate assembly for tests, you would often declare internals visible to: other_assembly to be able to test the functionality.

As for the main question, I don’t think adding private modules will make things any clearer; it could just lead to a bigger language surface without clear benefits.

Again, from my experience coding in C#, we’d often come across library code that we had to extend but whose authors had decided that it should never be extended (private/final classes etc.). Since making things less accessible makes more sense in library code (which is meant to be shared), we often had to find workarounds (reflection, decompiling, forking repos etc.) just to be able to do our work.

All in all, I think the potential for confusion and abuse is there and it’s real, and I can’t say I’ve missed any of these things in Elixir. Perhaps a warning like others said is more than enough.

josevalim · January 15, 2019, 7:39am

Answering many comments at once…

First of all, in regards to the feature as a whole being necessary: it absolutely is. For example, just look at the Elixir v1.7 release, which broke many packages that were using Elixir private modules. When talking to developers who were using these APIs, most of the time they simply had not noticed the API wasn’t documented. On large systems, this leads to a cascade effect that makes it very hard to update only parts of the system, because in order to update Elixir, you also need to update dependency X, which may break Y and Z due to private APIs, and so on.

In regards to the feature having workarounds: yes, there are workarounds, and even simpler ones than those posted on this thread. But let’s be honest here: there is no implementation that will forbid someone from bypassing the visibility boundaries if someone really wants to do it. All languages that I have explored while writing this proposal have this “flaw”. In a nutshell, we mostly need a better way to document intent.

That said, I think @hauleth does raise a good point: all of the features above could be achieved with just a warning. So we need to make a choice between a hard failure (this proposal) or a warning (as mentioned by @hauleth). Implementing it via a warning would be much simpler, but such warnings would be “best effort” and they would be quite easy to bypass. For example, by doing:

mod = SomePrivateModule
mod.foo()

Is the warning worth it even if it is not guaranteed? Thoughts?
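The bypass above works because the callee is only known at runtime. The following sketch shows the same kind of dynamic dispatch with an existing module; `DynamicCallSketch` is a hypothetical name, and String merely stands in for a private module, since defmodulep does not exist in Elixir today.

```elixir
defmodule DynamicCallSketch do
  # `mod` is an ordinary variable here, so the compiler cannot know
  # which module will be called and has nothing to check or warn about.
  def shout(mod, text), do: mod.upcase(text)

  # apply/3 is another way to express the same runtime dispatch.
  def shout_with_apply(mod, text), do: apply(mod, :upcase, [text])
end

mod = String
IO.puts(DynamicCallSketch.shout(mod, "hello"))
IO.puts(DynamicCallSketch.shout_with_apply(mod, "hello"))
```

With a warning-only design, both calls would go through silently; with the mangled names of this proposal, the variable would have to hold the mangled atom for the call to succeed.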

josevalim · January 15, 2019, 7:42am

The compiler has no idea about applications. Apps are purely a build tool concern that are assembled after all modules are compiled. In a way, this is great, as the language core is small and it builds new concepts on top of existing ones, but it means this feature (and modules in general) cannot integrate with applications.

mkaszubowski · January 15, 2019, 8:19am

I’m absolutely for this change

Most people focus on library code, but I think it’s also important inside a single codebase, maintained by one team. Currently, we have no language support for setting up boundaries and establishing clear interfaces between different parts of the application, which can lead to issues with maintainability.

Sure, with enough discipline you can make sure that nobody uses these “private” modules, but the same could be said about private functions, and yet we have them in the language. Unfortunately, discipline and conventions fade when we face a deadline, are tired, inexperienced or new on the team. That’s why having help from the compiler is essential. If we care about the visibility of a function, why shouldn’t we care about the visibility of a module? After all, the purpose of both of them is to organise code.

dimitarvp · January 15, 2019, 8:26am

I am very much in favor of this proposal, for these reasons:

Clear communication of intent. As mentioned several times, most languages that we in this community are familiar with have a mechanism to bypass a private module / function boundary. The point is not to have perfectly private pieces of code. The point is to discourage people from using parts of your library that are supposed to be implementation details. Are there people dedicated enough to cross the boundaries? Of course. But by doing that they make a conscious decision to rely on internal and brittle APIs, and they are likely locking themselves to one version of your library. I would bet that most devs wouldn’t do it when faced with a compiler error, even if they can bypass it. People just want to get their work done and move on. They won’t reverse-engineer your library unless you leave them no other choice.
Maturity of the language and the ecosystem. Reading through HN and Reddit regularly, I get the impression that many still view Elixir as a toy language, and having the ability to poke in the guts of any of your dependencies at runtime is one of the reasons why they think so. IMO having a decent private module/function mechanism, such as this proposal, sends the message that this community and its tech are ready for even more serious work. (Personally, I was convinced the moment I found OTP but many others need more convincing.)
It helps with programming by the single-responsibility principle. Example: it has been pointed out many times in this forum that when an app grows enough, it’s a bad practice to directly use the Ecto schema modules. At a certain point your DB design trails behind your domain schemata and requirements, and it’s IMO much better for only the domain modules (e.g. Phoenix contexts) to have access to the schema modules.
It can help facilitate understanding of the app/library. Given this code:

defmodulep Internal, visible_to: [Public] do
end

…trying to use Internal anywhere else but its intended namespace can give you a compile-time error like this:

The `Internal` module is private. See `Public` for more information.

This can help people guide newcomers to the proper place to use their library (or even a singular module inside a company project).

In favor.

dimitarvp · January 15, 2019, 8:40am

I tend to agree with that one. I think the alias part should be optional – unless the feature couldn’t work without it?

LostKobrakai · January 15, 2019, 9:08am

I really like the intention, but also don’t really favor the require call. It would be nice if only the private module needed to say to whom it’s available and any module using it didn’t need to care (or just had some generic use Private). Needing to keep track of the relationship from both sides seems like a lot of boilerplate. For example, a Phoenix context might easily gather up quite a lot of private modules to access. On the other hand I also like the explicitness. It’s probably worth some exploration anyway.

I’d also add my vote for a way to have it just warn and not fail compilation. My ideal would be failing by default, but allowing compilation with warnings via a CLI flag. This way we don’t hinder discoverability. If I want to check out how some private code works, I can try out any implementation I like and be much more focused on making an effort to get parts or the whole functionality made public with the maintainers involved. Also, it’s local to your own project this way and wouldn’t compile e.g. as a hex package (besides maybe telling people to also use the flag, which is like a big flag that something is being done in a way it is not supposed to be).

josevalim · January 15, 2019, 9:15am

@LostKobrakai the require+alias are necessary if we want hard failures. If we want a warning, then it would be on a best effort fashion and it would be quite trivial to bypass it. For example, if we move it to a warning, I could bypass any visibility check like this:

 mod = SomethingPrivate
 mod.foo()

sztosz · January 15, 2019, 9:36am

I’m all against hard failures. Just look at Python: encapsulation is done by simple convention and name mangling. A quick search gave me this nice article explaining how things are done https://radek.io/2011/07/21/private-protected-and-public-in-python/ People who use private APIs have only themselves to blame. If they were not aware that a given API was private, then we can improve this, sure, and make it clearer that something is not to be used outside of a given app, mix project, whatever. But if someone has a strong need to use a private API for whatever reason, then they will do it anyway, but will have to write hacks for accessing private modules. Besides, even as the author of a given library, if I allow people to use it… who am I to say this part you can use but that one you can’t?

arkgil · January 15, 2019, 9:44am

I’m all for this feature, for the reasons mentioned by @mkaszubowski and @dimitarvp. It clearly demonstrates the intent of the author and helps to maintain discipline in larger codebases.

josevalim · January 15, 2019, 9:54am

The problem with this line of thought is that it gives an impression of instability, since packages break whenever there is a new release of something whose private API they were using. It also discourages communication in favor of quick workarounds. Well, if you need private functionality, why not start a discussion on the best way to expose it?

My experience coming from the Ruby community which (at the time) did not value contracts and visibility that much is that this leads to a lot of pain down the road, especially as systems grow in complexity. Updating only a small part of the system becomes impossible, because a minimal change breaks many unwarranted things along the way. It usually goes like this: let’s update Elixir! Unfortunately, updating Elixir breaks package X because X used a private API. So we have to update package X too but wait! That breaks Y and Z. And so on and so on.

Also beware of “truths made along the way”. It is very likely a community ends up accepting that “being able to call privates is a good thing” because this behaviour was there since the beginning and it is impossible to change it now, so the best they can do now is to focus on the pros despite the cons. Note this is not a criticism of Python, nor am I implying it is the case here, as I am not that familiar with the Python community, but it is an effect we see in all communities, including Elixir’s.