Vet - Elixir dependency security scanner

Vet is a dependency security scanner for Elixir. It detects supply chain attacks by walking the AST of every dependency in your lock file and flagging patterns that have no legitimate reason to appear in a library.

The Problem

On March 24th, someone compromised the PyPI publishing token for LiteLLM, an open-source AI gateway with 3.4 million daily downloads. They pushed a version with a .pth file — the kind Python executes automatically when the interpreter starts. The payload swept SSH keys, AWS tokens, and Kubernetes secrets, then exfiltrated them. The poisoned package was live for three hours. In that window, hundreds of thousands of systems downloaded it. Mercor, a $10 billion AI startup, lost 4TB of data — candidate records, source code, video interviews.

The entire attack was three lines of work: steal a publishing token, push a package, wait. The ecosystem did the rest.

In Elixir, the same pattern works. A package calls System.get_env("AWS_SECRET_ACCESS_KEY") inside a @before_compile hook and POSTs the result during mix deps.compile. Your application hasn’t started. Your tests haven’t run. The BEAM doesn’t distinguish between your code and your dependency’s code.
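To make that shape concrete, here is a harmless sketch of such a hook. All names are invented, and the "exfiltration" is just a print plus a local :persistent_term write:

```elixir
# Illustration only: the attack shape, defanged. A real payload would
# POST the value somewhere; this one just prints it and stashes it locally.
defmodule EvilHook do
  defmacro __before_compile__(_env) do
    # This body runs while the host module is being compiled,
    # before the application starts and before any test runs.
    secret = System.get_env("AWS_SECRET_ACCESS_KEY") || "(unset)"
    IO.puts("compile-time hook read: #{secret}")
    :persistent_term.put(:vet_demo_leak, secret)
    quote do: :ok
  end
end

System.put_env("AWS_SECRET_ACCESS_KEY", "hunter2")

# Merely compiling this module is enough to trigger the hook.
defmodule InnocentLooking do
  @before_compile EvilHook
end
```

Running this as a script prints the credential during compilation of InnocentLooking; no function in either module is ever called.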

What Vet Does

Vet walks the AST of every dependency in your lock file and flags patterns with no legitimate reason to appear in a library:

  • Compile-time system commands (System.cmd, :os.cmd, Port.open)

  • Credential access (environment variables containing SECRET, KEY, TOKEN, AWS_*)

  • Network calls to suspicious endpoints

  • Compile-time hooks (@before_compile, @after_compile)

  • Obfuscated payloads (high-entropy strings, Base64+eval patterns)

  • Atom exhaustion DoS attacks

  • Slopsquatted package names (attackers register names that LLMs commonly hallucinate)

mix vet.check catches these before mix deps.get — no execution, no risk.
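As one concrete illustration of the obfuscated-payload check above, "high entropy" can be approximated with plain Shannon entropy over bytes. This is a sketch, not Vet's actual scoring:

```elixir
# Shannon entropy in bits per byte: encoded or encrypted payloads
# score noticeably higher than ordinary source text.
entropy = fn string ->
  bytes = :binary.bin_to_list(string)
  len = length(bytes)

  bytes
  |> Enum.frequencies()
  |> Enum.reduce(0.0, fn {_byte, count}, acc ->
    p = count / len
    acc - p * :math.log2(p)
  end)
end

prose = entropy.("def hello, do: IO.puts(\"hello world\")")
blob = entropy.(Base.encode64(:crypto.strong_rand_bytes(64)))

# A scanner would flag string literals whose entropy crosses a threshold.
IO.puts("prose: #{Float.round(prose, 2)}, blob: #{Float.round(blob, 2)}")
```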

Why Elixir is Positioned Well

Elixir has the tools to do this properly. Code.string_to_quoted and Macro.prewalk let you walk dependency code with the same machinery the compiler uses. Python has no real equivalent: a setup.py reveals what it does only by being executed. That said, AST-level checks are ultimately a compiler concern — the compiler already walks every node, sees macro-expanded code, and can’t be skipped. What the compiler can’t do is check download counts, score dependency depth, or detect slopsquatting. The full solution is both.
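A minimal sketch of that technique, assuming nothing about Vet's internals: parse a dependency's source with Code.string_to_quoted/1 and collect System.cmd call sites with Macro.prewalk/3.

```elixir
# A malicious-looking snippet to scan. It is only parsed, never executed.
source = """
defmodule Dep do
  def build, do: System.cmd("curl", ["http://evil.example"])
end
"""

{:ok, ast} = Code.string_to_quoted(source)

# Walk every node; a remote call System.cmd(...) has the quoted shape
# {{:., _, [{:__aliases__, _, [:System]}, :cmd]}, meta, args}.
{_ast, findings} =
  Macro.prewalk(ast, [], fn
    {{:., _, [{:__aliases__, _, [:System]}, :cmd]}, meta, _args} = node, acc ->
      {node, [{:system_cmd, meta[:line]} | acc]}

    node, acc ->
      {node, acc}
  end)

IO.inspect(findings)  # [system_cmd: 2]
```

Because the dependency's code is only parsed, nothing in it ever runs.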


21 Likes

Hey, really cool idea! I’d love to use Vet in my projects.

Have you tried it on a bare-bones Phoenix app with Ecto? It returns quite a few errors.
Also, it treats mix aliases (mix precommit, mix ecto.setup, etc.) as if they were packages.

Thanks for actually trying it! You found two real bugs.

The aliases-as-packages issue: the AST walker was scanning the entire mix.exs looking for {atom, _} 2-tuples, which happily caught keyword pairs from your aliases/0 function (setup:, precommit:, "ecto.setup":). Those then got passed to mix vet.check, which looked them up on hex.pm, found nothing, and reported them as CRITICAL phantom packages. Embarrassing.

Fixed by scoping extraction to the body of the deps/0 function only. The new version also handles 3-tuple deps with options like {:phoenix_live_view, "~> 1.0", only: :dev} (which in AST is {:{}, meta, [name, version, opts]} — different shape from a plain 2-tuple, easy to miss).
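The two dep shapes mentioned above look like this in the AST (an illustrative sketch, not the actual extractor):

```elixir
# A deps/0 body containing both shapes: a plain 2-tuple dep and a
# 3-tuple dep with options.
source = """
defp deps do
  [
    {:phoenix, "~> 1.7"},
    {:phoenix_live_view, "~> 1.0", only: :dev}
  ]
end
"""

# Scope extraction to the body of deps/0 only, so aliases never leak in.
{:ok, {:defp, _, [{:deps, _, _}, [do: dep_list]]}} = Code.string_to_quoted(source)

names =
  Enum.map(dep_list, fn
    # 2-tuples are literals in the AST
    {name, _version} when is_atom(name) -> name
    # 3-tuples are quoted as {:{}, meta, [name, version, opts]}
    {:{}, _, [name, _version, _opts]} when is_atom(name) -> name
  end)

IO.inspect(names)  # [:phoenix, :phoenix_live_view]
```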

“Returns quite some errors”: the built-in allowlist only covered ~20 packages, but mix phx.new --database postgres pulls in ~30 deps, most of which legitimately trip Vet’s checks — bandit calls the network because it is the network, esbuild runs system commands, gettext reads files at compile time, plug_crypto uses :crypto. So you got a wall of false positives. Added ~50 entries covering the Phoenix 1.7+ ecosystem (bandit, thousand_island, phoenix_pubsub/html/live_view/live_dashboard/live_reload/ecto/template, postgrex, db_connection, telemetry_metrics/poller, gettext, swoosh, esbuild, tailwind, dart_sass, dns_cluster, plug_crypto, castore, nimble_pool, floki, websock_adapter, bcrypt_elixir, …).

Both fixes are in db64127. Added a regression test using a Phoenix-shaped mix.exs with both deps and aliases, plus a property test that generates random mix.exs files with both and proves no alias name leaks. Full suite is 62 properties + 457 tests, all green.

Pull main and try again — if anything is still noisy on your project I’d genuinely like to know. Real test cases beat synthetic ones every time.

2 Likes

Awesome! Thank you.

So how does the allowlist work with versioning over time?
If we allowlist some packages at time T, couldn’t they be compromised at time T + N (in the future)?

I am thinking of what you mentioned in:

On March 24th, someone compromised the PyPI publishing token for LiteLLM, an open-source AI gateway with 3.4 million daily downloads.

As a thought experiment, let’s assume that we have had vet and litellm allowlisted, how would vet have prevented the security issue then?

1 Like

Good question, Ryan. It exposed a real weakness that needed fixing.

Short answer: the allowlist would not have caught a LiteLLM-style attack on an allowlisted package. But thanks to your question, as of commit 00d87ad, Vet now automatically diffs every dependency against its previous version on Hex, and those findings bypass the allowlist entirely.

How it works: Hex keeps every published version permanently. When Vet scans your lock file, for each dependency it fetches the previous version from Hex and compares the two. If the version transition introduced new dangerous patterns (compile-time env access, network exfiltration, new file categories), those get flagged as [VERSION DIFF] findings. These findings are not subject to the allowlist. The allowlist says “we reviewed this package’s existing behavior.” The version diff says “the behavior changed.”
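Conceptually, the diff reduces to a set difference over finding categories; a simplified sketch with invented category names:

```elixir
# Sketch: findings for the previous and current version of a package.
# Category names are invented for illustration.
old_findings = MapSet.new([:reads_app_config])
new_findings = MapSet.new([:reads_app_config, :compile_time_env_access, :network_post])

# Anything present now but absent before is a [VERSION DIFF] candidate,
# and is reported regardless of the allowlist.
introduced =
  MapSet.difference(new_findings, old_findings)
  |> MapSet.to_list()
  |> Enum.sort()

IO.inspect(introduced)  # [:compile_time_env_access, :network_post]
```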

Your thought experiment: If litellm were allowlisted and version 1.82.8 was compromised, Vet would fetch 1.82.7 from Hex, diff the two, detect the new @before_compile hook reading AWS_SECRET_ACCESS_KEY and POSTing it, and flag it as a CRITICAL profile shift. The user never needed to have 1.82.7 installed. Hex has it.

This works for first-time installs too. If you install litellm for the first time at 1.82.8, Vet still diffs against 1.82.7 (fetched from Hex) and catches the transition.

Vet also runs a lookback diff (3c616df), comparing the current version against the version from 10 releases ago (or the earliest available version if the package has fewer than 10 releases). This catches gradual introduction of malicious code across multiple small versions where no single step looks suspicious but the aggregate does.

Your earlier feedback about the aliases bug and the Phoenix false positives already led to two significant fixes. This question led to two more. The project is measurably better because you tried it on a real project and asked hard questions. You rock.

Pull main and try mix vet on your project. You should see [VERSION DIFF] entries for any dependency where the version transition looks unusual. mix vet --no-diff disables it if you want faster scans.

3 Likes

Awesome!

Feel free to QA that tool on famous Elixir repos (Ecto, Phoenix, Ash, Oban, etc.) and backtest it against known vulnerabilities so you can provide devs with a set of proofpoints / guardrails against attacks.

I love to see more of those meta code analysis tools in Elixir, whether it’s for refactoring, performance or security.

1 Like

Seems nice, but from a quick check I see that it would be trivial for a malicious party to avoid any detection, as I can simply do:

    mod = System
    func = :cmd

    apply(mod, func, ["rm -rf /"])

And nothing will be detected, as to Vet it will look perfectly fine.

1 Like

Thanks for poking at it. Vet does detect that pattern. The obfuscation check flags apply/3 and Kernel.apply/3 calls where the module or function argument is a variable rather than a literal atom (apps/vet_core/lib/vet_core/checks/obfuscation.ex, match_apply_pattern). Your example assigns mod and func as variables, which is exactly the shape that triggers the finding.

The reasoning: legitimate code rarely needs dynamic dispatch with a variable module and function, so the indirection itself is the signal. Vet doesn’t need to resolve what mod is at scan time; the fact that someone wrote code to hide what function gets called is enough. A correlation rule then promotes it to critical severity if the same dependency also contains network access, on the theory that dynamic dispatch plus network egress is an exfiltration shape.
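The shape of that check can be sketched as follows (an assumed reconstruction, not the code in obfuscation.ex):

```elixir
# The snippet from the post above, parsed but never executed.
source = """
mod = System
func = :cmd

apply(mod, func, ["rm -rf /"])
"""

{:ok, ast} = Code.string_to_quoted(source)

# A variable appears in the AST as {name, meta, context} where context
# is an atom (or nil); a literal module or function atom does not.
variable? = fn
  {name, _, ctx} when is_atom(name) and is_atom(ctx) -> true
  _ -> false
end

{_ast, hits} =
  Macro.prewalk(ast, [], fn
    {:apply, meta, [m, f, _args]} = node, acc ->
      if variable?.(m) or variable?.(f),
        do: {node, [{:dynamic_apply, meta[:line]} | acc]},
        else: {node, acc}

    node, acc ->
      {node, acc}
  end)

IO.inspect(hits)  # [dynamic_apply: 4]
```

Note that the checker never needs to resolve what mod holds; the variable in call position is itself the finding.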

Static analysis is a tripwire, not a wall. A determined attacker can always add another layer of indirection, so the goal is to make evasion itself look suspicious so the cost of hiding rises faster than the cost of detecting.

If you find a variant that slips through, let me know. Curious to hear what you try.

1 Like

Do you want separate GH issues, a single one, or should I provide the list here?

I have opened some GH issues for low-hanging security bypasses that aren’t caught by Vet:

1 Like

This is a great idea, and extremely timely considering the rise in supply chain attacks!

1 Like

Thanks! Fireship just published a video on a massive supply chain attack:

A rich hacker just penetrated 31 WordPress plugins…

I am hoping to get a lot of critical feedback from the Elixir community on Vet - I think it can be a key ecosystem advantage for Elixir to be highly resistant to supply chain attacks.

So please: attack Vet! Help me make it stronger!

1 Like

I see you provided the list below. Great! My preference is that it’s given in this thread in some fashion, as you have done, but otherwise whatever is most convenient for you. I appreciate the critical feedback, hauleth.

Thank you for filing #4 through #9. All six were real gaps. Fixes are live in commit 9d8d621.

The reasoning behind each:

  • #4 (:compile.file/forms): the code_eval check only covered Elixir’s Code.* family, which left the Erlang :compile module as a free bypass for dynamic code execution. The threat is identical, so the five main entry points now fire with the same severity as Code.eval_string.
  • #5 (File.open): the function list had read/write/stream but not open. That meant the pipe File.open!(path) |> IO.read(:eof) was invisible, which is worse than the direct File.read! form it was presumably bypassing.
  • #6 (Erlang :file): same attack, different wrapper. Elixir’s File.* ultimately calls :file.*, so checking one without the other just moves the evasion one character to the left.
  • #7 (:gen_tcp.listen): I had written the check with exfiltration as the threat model, which is why only connect was there. But a malicious package opening a listener for command-and-control is the same class in reverse, so listen, accept, and controlling_process now fire too.
  • #8 (UDP and SCTP): I simply had not considered them. Supply-chain attacks don’t owe us the courtesy of using TCP.
  • #9 (:socket): the low-level OTP socket API lets you do everything :gen_tcp, :gen_udp, and :ssl do while sidestepping all three. Added as a wildcard, because on a library the mere use of :socket.* is already anomalous.

The deeper issue your reports exposed: the detection was whatever patterns I happened to think of, not a reasoned map of the actual attack surface. A targeted unit test for each pattern couldn’t catch the class of bug you found, because the failure mode is a silently absent pattern, not a broken one. So I also added three layers that together bound the detection surface exactly.

First, a coverage sweep test that drives every declared pattern through the check engine. If someone removes or mistypes an entry in the check module, the test fails loudly instead of quietly shrinking what gets detected.

Second, a symmetric equivalence assertion: each check module now exposes its target list, and the test asserts declared-set equals swept-set. Adding a new pattern without also adding a sweep row now fails, because the other direction would go uncovered. That assertion caught its first regression the moment I wrote it, flagging 10 :file functions I had declared but forgotten to sweep.

Third, a StreamData precision property. For each of the three checks, 100 random {module, function} pairs are generated from a pool that excludes every declared pattern, and the test asserts zero findings. 300 adversarial inputs per run. The sweep proves recall, this proves precision, the equivalence keeps them aligned.
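The equivalence layer boils down to a symmetric set assertion; a sketch with invented target lists:

```elixir
# `declared` stands in for the check module's exposed target list;
# `swept` for the rows actually driven through the engine by the test.
declared = MapSet.new([{:gen_tcp, :connect}, {:gen_tcp, :listen}, {:gen_udp, :open}])
swept = MapSet.new([{:gen_udp, :open}, {:gen_tcp, :listen}, {:gen_tcp, :connect}])

# Both directions must be empty: a pattern deleted from the check, and a
# sweep row added without a declaration, each fails loudly.
only_declared = MapSet.difference(declared, swept) |> MapSet.to_list()
only_swept = MapSet.difference(swept, declared) |> MapSet.to_list()

IO.inspect({only_declared, only_swept})  # {[], []}
```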

If you find more, tell me. Curious what you try next.

I have some:

  • :ssh and :ssh_sftp aren’t checked - I can freely open a connection to a remote server or open a remote shell
  • :ftp is not protected, so I can download or upload any data I want
  • :httpd is not checked, so a second “hidden” HTTP server can be run
  • I can use the :prim_file module - while it isn’t a public API, I think an attacker will not give a damn about breaking API boundaries (and once upon a time I genuinely needed to use that module, because the “stable” API does not provide such functionality)
  • I can run any external executable using Mix.shell().cmd(…) (which, by the way, is the approach that should be used in Mix tasks)
  • :erlang.open_port is free to be used
  • I can use Tesla or hackney or any other HTTP client just fine, and the same goes for sockets, where I can use the procket library to open any unsafe socket
  • I can use defdelegate to circumvent any function call check
    defmodule TestMod do
      defdelegate foo(str), to: String, as: :to_atom
    
      def convert(str) do
        foo(str)
      end
    end
    
  • While I cannot use apply, I can still write a function like def call(mod), do: mod.connect(~c"i.will.not.to.anything.promise.dev", []) and be able to connect to a remote server
  • I can use def "$handle_undefined_function"(name, args) to do more nasty things
  • I can use :inet_res functions to leak data by sending prepared DNS requests containing private data
  • I can use :epp.scan_file to read file content (it will be mangled, but in many cases it may be possible to extract the required data from there)

I appreciate your effort; the problem I see is that the Erlang VM is just an unsafe VM, and I do not believe there is a way to do this that will not create a false sense of security. I strongly believe that a tool that provides a false sense of security is worse than no tool at all.

What this tool currently provides is more of a sanity check against accidental data leakage than a check against an actually malicious party trying to obscure their actions. It may be useful anyway, just not as a defence against malicious library providers (I deliberately do not use the term “supply chain”, as open source authors aren’t a “supply chain”).

If you want to secure against actively malicious parties, then:

  1. You need to think like a malicious attacker. If someone wants to hide what they are doing, they will certainly not fall into a trap where you can simply use “grep on steroids” to see that their code is doing bonkers stuff
  2. In my opinion, if someone wants to hide what they are doing, then without a higher-level sandbox (either Erlang VM level or OS level) you cannot really catch it. This library can already be circumvented in a super simple way: just write some Erlang module in your project, and you are free to do anything, as it scans only Elixir code.
2 Likes

This sounds like a decent use case for LLMs? We definitely can’t get enough human eyes to audit Hex packages, but AI eyes are well within reach, although not free. It would make total sense for companies to run some sort of audit like that on CI privately, but maybe there’s room for that at the ecosystem level :thinking:

2 Likes

That falls very much under

There are approaches for “community driven reviews” like Crev, however these also have trust issues - how to implement a library of trusted reviews.

Unfortunately that isn’t a simple thing to do, because that is not technical challenge, but social one.

1 Like

Fair point. Although I think at this point it’s worth taking a closer look at how false the sense of security can be. One extreme is “grep on steroids”, which we know is pretty false because we know exactly how to bypass it. Another extreme is deep human review, which is probably the best audit one can get, but it also can be classified as false: humans are prone to error and either have limited knowledge (if you do the review yourself) or rely on trust (if review is done by someone else).

My intuition is that Opus will be quite good at spotting suspicious stuff that doesn’t belong in a library. I would put it a lot closer to humans on the false-sense-of-security scale.

How hard do you think it is to obfuscate malicious code in a way that would pass security review from an LLM?

1 Like

IMO the mere fact of trying to obfuscate it would make it easier to flag. The hacker would have to be a godlike genius to write completely innocuous-looking code that passes cursory review.

…And even then, one prompt like “This library must strictly adhere to the activities X and Y. Do a thorough audit for outliers” will give you 80-90% certainty that even that genius’s innocuous-looking insecure code would be found.

I, like @hauleth, don’t have complete trust in LLMs (and nobody should) but I’ve also seen Opus do stuff that’s close to magic and I have gradually learned to guide it in a way that makes the magic more likely to be produced.

It often does not. I do not have access to a sensible AI reviewer, but I think it could hiccup on code that looks innocent to people in isolation but is malicious when used together. For example, neither Vet (nor, I think, any other AI agent) will see anything particularly wrong in using :ranch_tcp.listen or the like, especially when it is spread across different libraries.

Obfuscating the attack does not mean making the code less readable; it means making the code flow harder to comprehend.

One of the possible obfuscation vectors I see is that someone writes code where, in mix.exs, they make an “innocent” mistake of the form:

    defp elixirc_paths(:dev), do: ["lib"]
    defp elixirc_paths(_), do: ["lib", "test/support"]

And now they will put malicious code in test/support/file_used_only_in_tests_i_promise.ex. It will contain a module named, for example, NoHarmPromise.FooControler, while lib/no_harm_promise/foo_controller.ex contains NoHarmPromise.FooController. In their router they made another silly mistake and wrote post "/nothing-imporant", FooControler. But now, to both an LLM and a human, it may look like there is no harm. The code compiles, the tests pass, but the code has slightly “unexpected” behaviour.


It is getting off topic here, but I have tried to assess known and intentionally malicious code from the Underhanded C Competition from 2015 (so any LLM should have it in its training corpus). It failed terribly with both Gemini and ChatGPT (I do not have accounts for other LLMs, so I didn’t bother). Both noticed some small problems (like NaN handling or potential division by zero), but neither caught the elephant in the room.

I do not really believe that, with a determined enough attacker (who also has access to all these tools), it could be caught. Especially with Jia Tan-level dedication to preparing the attack.

1 Like