Elixir source code obfuscation

NotQuiteLagom · October 11, 2022, 8:58pm

I’m wondering what the best approach is to “obfuscating” Elixir source code that has to be available to a group of software developers: they need to use the components/functions, but only through the component API, since the internals should be a black box.
I recently had a case of a group leaving the company and taking source code with them.
So, I was thinking…

pre-compiled Deps?
Obfuscating source code like in JavaScript, but with the Mix format…
So, with a component/modular approach where each component/module has a single owner but is needed by the overall project, and developers can customise those components in their own code (for example in the presentation layer, like a button, but not only there), what would be the best approach?
Thank you in advance!

cmo · October 11, 2022, 9:13pm

You can give them a service to hit rather than the code. Most things can be decompiled these days.

Lawyers instead?

NotQuiteLagom · October 11, 2022, 9:16pm

Yes, but I’d prefer to avoid trouble.
And remember, I want to have an “internal/component” API so that they can customise the components they use in their source code.
I believe this is an interesting question: can we control which source code our extended developer team can access?

cmo · October 11, 2022, 9:25pm

Sounds to me like this topic is entirely about creating trouble for yourself.

hst337 · October 11, 2022, 9:50pm

Obfuscated code like in JavaScript can still be easily deobfuscated. Providing .beam files is not secure either, because they can easily be decompiled (even by hand).
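To make that concrete, here is a minimal sketch, assuming a module compiled with Elixir’s default compiler options (which keep the debug_info chunk in the .beam). SecretSauce and price/1 are hypothetical names; Code.compile_string/1, :beam_lib.chunks/2, :erl_syntax.form_list/1 and :erl_prettypr.format/1 are standard Elixir/OTP functions, and the debug_info/4 call goes through the backend module that Elixir records in the chunk.

```elixir
# Compile a small, hypothetical module in memory. Code.compile_string/1
# returns [{module, beam_binary}] and keeps debug_info by default.
[{mod, beam}] =
  Code.compile_string("""
  defmodule SecretSauce do
    def price(qty), do: qty * 42 + 7
  end
  """)

# Read the debug_info chunk straight out of the .beam binary.
{:ok, {^mod, [debug_info: {:debug_info_v1, backend, data}]}} =
  :beam_lib.chunks(beam, [:debug_info])

# Ask the backend (the Elixir compiler's Erlang backend) for Erlang abstract code.
{:ok, forms} = backend.debug_info(:erlang_v1, mod, data, [])

# Pretty-print the recovered forms as readable Erlang source.
forms
|> :erl_syntax.form_list()
|> :erl_prettypr.format()
|> IO.puts()
```

Setting elixirc_options: [debug_info: false] in the project definition in mix.exs should close this particular shortcut, but the remaining bytecode can still be disassembled, for example with :beam_disasm.file/1, which accepts a file name or a .beam binary.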

dimitarvp · October 11, 2022, 10:24pm

The BEAM languages are compiled into a fairly transparent bytecode so you absolutely cannot make it as opaque as you would like.

w0rd-driven · October 11, 2022, 11:42pm

I would definitely approach this from the angle of which decompilation tools are available and how user-friendly they are. With .NET (though this may no longer be true), I could decompile a release into fairly full-featured source files and their respective project containers. As with BEAM languages, the code is compiled to an intermediate bytecode (MSIL), but reading that bytecode directly or loading it into my IDE of choice is no simple task.

If the only tools available could just display MSIL, I’d personally consider it a small threat, because the average developer isn’t going to choose to work at that level. The same would be true for Erlang’s bytecode. If code can be decompiled into a relatively working project, that’s when I’d be concerned about obfuscation.

I suspect the average Elixir developer, confronted with a release and no source, is unlikely to bother reverse engineering intellectual property from bytecode. There will always be determined hackers, though. As an anecdote, JavaScript obfuscation absolutely does not stop me from switching Vue into development mode so I can use the Vue devtools extension to analyze the components of popular websites. JS obfuscation can mangle function names, but magic strings still leak through.

Taking your API example, anyone using your library would have to obfuscate their own code along with it. Otherwise they’d have to call functions with names like MyModule.c() or some other random name that would ideally change on every obfuscation run. Obfuscation in JS works because the source is bundled together and then minified. If I were a developer presented with a library that required me to obfuscate my application too, I’d likely look for a new library. What about connecting with iex? Would I have to run MyModule.my_function() or MyModule.c()? I’d use .c() exactly once before I ripped it out. You could be in the enterprise space, where I’d be forced to use your library, but I wouldn’t willingly choose something that put those kinds of stumbling blocks in my path. That’s only my 2 cents, but those are the things I’d wrestle with.

Having said all that, you may want to look at DockYard’s BeaconCMS (beacon/lib/beacon/loader at main · BeaconCMS/beacon · GitHub), since pages and components there are generated with unique names. It’s been a while since I analyzed how the modules in that directory work, but in the console you can see that your components have hashed names to prevent collisions. My guess is that this lets one CMS handle multi-tenancy or multiple sites without one page or component stepping on another. This may be what you’re looking for, but the word “obfuscate” obviously triggers some things for me.

mindok · October 12, 2022, 1:56am

IP protection is an org-level discussion, not just a technical one. Obfuscation is way down the list of things to think about: code is always exposed to being copied, decompiled, etc. by someone working on the codebase, and honestly, the vast majority of the time it really makes no difference.

Really, the focus should be on the “secret sauce” unique to your application (hopefully there is such a thing). Secret sauce can be algorithms, datasets (e.g. those used to train ML models), customer data, etc., so understanding where and how value is created is key to any IP protection strategy. A lot of wiring-up code is common to a large number of applications, and if folks steal it, well, whatever: it shouldn’t make or break your company if it’s taken.

How you protect the secret sauce is then a combination of legal, social & technical measures. I have seen setups where “inner-circle” developers, “motivated” (better pay, equity, proven trustworthiness, explicit legal threats or whatever) to “stay close”, are the only ones allowed to work on the core algorithms, while outer-circle developers (e.g. contractors, new hires) only access the core algorithms via well-defined APIs. This can be taken a step further if the algorithms are particularly valuable: for example, offer an alternative, inferior algorithm with the same API for building and testing integrations, UIs, etc., and deploy the real one only into production, to minimise the possibility of reverse engineering.
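The “same API, inferior stand-in” idea maps naturally onto an Elixir behaviour. A minimal sketch, assuming a hypothetical Pricing behaviour with a stub and a “real” implementation, selected at runtime through a hypothetical :pricing_impl key in a hypothetical :my_app application environment. In an actual setup, Pricing.Real would presumably live in a private dependency pulled in only for production builds, with something like config :my_app, pricing_impl: Pricing.Real in config/prod.exs.

```elixir
defmodule Pricing do
  # Public contract that every implementation must satisfy (hypothetical API).
  @callback quote_price(unit_cents :: non_neg_integer(), qty :: pos_integer()) ::
              non_neg_integer()

  # Callers only ever use this function; the implementation comes from config.
  def quote_price(unit_cents, qty), do: impl().quote_price(unit_cents, qty)

  # Defaults to the stub, so outer-circle builds never need the real module.
  defp impl, do: Application.get_env(:my_app, :pricing_impl, Pricing.Stub)
end

defmodule Pricing.Stub do
  # Deliberately simple stand-in: same API, no secret sauce.
  @behaviour Pricing

  @impl true
  def quote_price(unit_cents, qty), do: unit_cents * qty
end

defmodule Pricing.Real do
  # Hypothetical "secret" logic: 10% off orders of 100 or more.
  # Inlined here only so the sketch runs on its own.
  @behaviour Pricing

  @impl true
  def quote_price(unit_cents, qty) when qty >= 100, do: div(unit_cents * qty * 9, 10)
  def quote_price(unit_cents, qty), do: unit_cents * qty
end

# Outer-circle build: no config set, so the stub answers.
IO.inspect(Pricing.quote_price(250, 100))
# => 25000

# Production build: config points at the real implementation.
Application.put_env(:my_app, :pricing_impl, Pricing.Real)
IO.inspect(Pricing.quote_price(250, 100))
# => 22500
```

Because callers only ever see Pricing.quote_price/2, outer-circle developers can build and test integrations against the stub without the real module ever being in their checkout.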

Also, if an algorithm is valuable enough to want to prevent casual IP theft, it’s probably valuable enough to build some bits in a different language where the build artefacts are harder to reverse-engineer.