What is the average profile of an Elixir developer?

Background

In a recent discussion with my team, someone argued that we should probably not rely too much on Hex.pm, because some of the packages there might be written by people who know very little about programming, which makes it a viable attack vector. This person was drawing a direct comparison between Hex.pm and NPM.

As we know, basically every week we get notifications from GitHub’s bot telling us that a security issue was found in some JS library that one of our projects uses.

NPM has, over time, become quite infamous for the quality of its content.

My counter-argument to this was that, in general, Elixir developers have more experience than JS developers. As an example I mentioned the teams behind Phoenix and Ecto, where many of the people involved have previous professional experience in Ruby. Similarly, a considerable portion of the Elixir community came from Ruby.

However, even though I believe this to be the common profile for someone doing Elixir these days (after several talks I had with members of the community) I lack real data to make a point.

I also understand that I cannot generalize the opinions of a select few individuals from this forum and apply them to the whole of the community.

So basically my argument is quite poor. I argue that:

Elixir developers are usually people with more experience than JS people, and most Elixir developers come from other languages, like Ruby. For this reason we should not worry that a toddler writing a package for NPM is going to do the same for Hex, because the developer’s profile for Elixir is quite different, and by default more experienced.

My idea of Elixir’s developer profile needs a citation.

Research

While I was able to find a Stack Overflow survey in which Elixir developers are overall better paid than other developers, I could not find a direct link that says: “Better salary means you also have more professional experience”.
For this reason I cannot support my claim either.

Question

  • Are there any studies or articles that describe the average profile of an Elixir developer (in terms of years of experience)?
  • Do you think it is fair to compare NPM with Hex, issues included? (Do they suffer from the same problems?)
2 Likes

I would likely sidestep the immediate questions and wonder if it actually makes sense to have answers for them. There are, imo, different ways to look at the problem you seem to be facing.

Are third party dependencies possible attack vectors?
Yes – no matter how one turns the coin.

Having acknowledged that, there are two options:

  1. Not using third party code and writing everything on your own.
    Works, but isn’t necessarily effective. (Also there’s no guarantee you’ll do better)
  2. Using third party code, accepting the risk, but doing the best to mitigate it.

For option 2, what can be done to lessen the chance of accidentally pulling in code that poses a security risk to whatever it was pulled into?

You seem to want to argue about the people building Hex packages. I don’t think one can make a good argument about all developers with packages on Hex as a whole. Also, every one of us makes errors and mistakes – even the most senior/experienced of developers. So imo your argument will always be kinda moot.

There is another means of reducing risk though: review. Whenever a library is added, it goes through a review process (however fancy that may be; it could even include how much “professional experience” you expect of the author) and is only used once reviewed. Whenever the library is updated, you can use e.g. diff.hex.pm to make sure nothing changed in a way that violates previous review checks. Some companies even run their own internal registries, which hold only reviewed packages. One actual benefit of the Elixir ecosystem over the Node one is that it’s likely much easier to go the review route, as you’re not running with 1000s of dependencies. I’d expect that number to be more in the 100-200 range for most projects.
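
As a rough illustration of that review-gate idea (a minimal sketch in Python, not a feature of Hex or Mix; the function name `needs_review`, the data layout, and the version numbers are all made up for the example): keep a record of which versions of each package have been reviewed, and flag anything currently locked that is new or has changed since.

```python
# Hypothetical review gate. Nothing here is part of Hex or Mix; the names,
# the data layout, and the versions are illustrative assumptions.

def needs_review(reviewed, locked):
    """Return the locked packages whose exact version has not been reviewed.

    reviewed: dict of package name -> set of version strings already reviewed
    locked:   dict of package name -> version string currently locked
    """
    pending = {}
    for name, version in sorted(locked.items()):
        # A package is pending if it was never reviewed at all, or if the
        # locked version differs from every version that was reviewed.
        if version not in reviewed.get(name, set()):
            pending[name] = version
    return pending


# Example data (made up): what the team has signed off on so far ...
reviewed = {
    "phoenix": {"1.6.0", "1.6.2"},
    "ecto": {"3.7.1"},
}
# ... versus what the project currently locks.
locked = {
    "phoenix": "1.6.2",  # already reviewed
    "ecto": "3.7.2",     # updated since the last review
    "jason": "1.2.2",    # new dependency, never reviewed
}

for name, version in needs_review(reviewed, locked).items():
    print(f"{name} {version}: review before use")
```

A flagged entry is then the cue to read the diff between the last reviewed version and the new one before updating the record.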

GitHub also recently added Elixir/Erlang to its security advisory database. That can be another way of being made aware of issues.

9 Likes

So, in your opinion, Hex is not really that different from NPM, despite the community differences, correct?

One factor is probably also that JavaScript is slightly more common than Elixir. The bigger the community, the bigger the chance you have of infecting someone. It’s the same reason why there are a lot of viruses etc. for Windows: it’s just so much more common than, for example, Linux (for desktops at least). If I wanted to attack the maximum number of projects, and we assume JS and Elixir devs are equally good when it comes to security (probably not true, but let’s assume), I would still choose JS because there are simply more potential victims.

4 Likes

Deciding in general whether you should use packages from a certain ecosystem based on the perceived quality of the average developer in that ecosystem seems strange. Not using Hex packages and rewriting them from scratch seems like a huge amount of wasted effort and will likely lead to less reliable software. How long do you think it would take to reach feature parity with packages such as Phoenix or Ecto, not to mention fix all the bugs that have already been discovered by their thousands of users?

You need to make your decision on a package by package basis to determine if you can trust it. We have some tools to help you with that such as https://diff.hex.pm and https://preview.hex.pm

12 Likes

I’m going to put my “Pointy Haired Boss” hat on and echo some of what the others have said, but hopefully add some food for thought.

I think there’s a problem with the premise that underlies your posting, which I read as: generalizations about a community are useful in making software library choices. Part of the problem is one of timing. Before you adopt a technology (the BEAM and its languages, the JVM, etc.), I think generalizations about how robust the community and its resources for finding ready-made libraries are can help guide your decision about whether to bother taking the first step on the path of adoption. However, once you’re on that path, as @ericmj has said, you can only choose a library using information discovered on a case-by-case basis; the generalizations and estimates become useless because individual libraries vary in quality. If you find the ecosystem to be by and large reliable, but somehow manage to pick the one library that is managed unprofessionally or maliciously: you’re screwed. If you’re trying to use the generalization to avoid the heavy lifting of case-by-case evaluation, you’ve already set yourself on a risky path. You might rationalize an argument that a more professional community would be less risky than a less professional one, but you’d still have the fundamental problem of inadequate diligence by your own team, and you still wouldn’t know if you were relying on reasonably solid libraries or stinkers.

So once you’re in a technology and you find a need for a library, you’d be foolish to discount the service where most libraries are found: for Elixir it’s Hex… at least as a starting point. Once you find a library, you then decide whether or not it’s going to be useful and trustworthy. There are ways to disqualify libraries, or make them candidates, that don’t rely on line-by-line evaluation. Look at how the library is managed. Has a community clustered around the library? Is there only a single maintainer? Do the maintainers have some credible reputation in the community? How long has it been around? How are PRs handled? Is it keeping up with the rest of the ecosystem (e.g. does it support recent Elixir versions)? Are there meaningful code reviews of PRs? Etc. All of that can help you save time prior to jumping in. Also know what you’d do if that library became unreliable or unavailable. Of course, it goes without saying that you shouldn’t use libraries at all without a compelling reason. If using a library is going to save you a couple of hours… it may not be worth it. If it’s going to save you a month, or it provides something your team isn’t expert in (maybe encryption, for example), then it may well be worth it.

Finally, there is one general evaluation and re-evaluation I think is useful: how well is Hex itself being managed? Much of the problem with NPM was how the service itself was managed, and the insecurities or policies that allowed libraries to be compromised, or to simply disappear in ways destructive to the projects using them; this isn’t a problem of community but of the service’s management team. This can change over time, which is why I think you should check it every so often.

4 Likes

Wait until they find out about programming languages. And operating systems. And embedded firmware.

If there’s code running or hardware hardware-ing that YOU didn’t design, there’s some level of supply-chain risk. The question is deciding what level is acceptable for your organization, and then verifying that things meet your requirements.

“Well I’ve heard this community’s developers are smarter / better / faster / longer than $OTHER_COMMUNITY” isn’t a part of that process.

7 Likes

I understand. Even steelmanning my argument, and assuming Elixir developers are the best in the world, this would still not account for developers who create Hex packages specifically to serve their malicious intentions.

I also agree with this.

Overall, I think the best argument here is: “Use 3rd party packages, but be mindful of their code”.
I understand this means you have to 100% understand what you are using.

Now, to counter this argument, I will say that the additional overhead any developer would incur when using a library this way would be gigantic. I have never met anyone in my professional life who knew Phoenix inside out, never mind Phoenix + Ecto + LiveView.

Sure, they grasp the basic concepts. But pretty much everyone using them does (they literally have to, it is the barrier to entry).

Now scale this to apps using those frameworks, plus 100s of other dependencies. It becomes an impossible exercise.

At this point, some level of trust must be given. This is my main problem here. I don’t know the answer to: “when do you start applying the principle of trust?”

So I choose to trust Hex packages by default until proven otherwise.

Is this the best course of action?
I am not sure. Hence this discussion.

@sbuttgereit makes some very good remarks using his boss hat.
However, I have some questions regarding this. Let’s take, for example, cowboy (ninenines/cowboy on GitHub), a small, fast, modern HTTP server for Erlang/OTP.

Many would say cowboy is a staple app in Elixir’s ecosystem today. I agree. But if we check, the most recent changes were made 14 months ago and some of them 2 years ago. Would you, based on the lack of apparent updates, discard this library?

I guess what I am trying to say is: how do you, @sbuttgereit, decide when to use (or not to use) a library? What are your personal checkpoints? (Do you only accept a library if it has been updated in the last year? If it has more than X thousand users? If the estimated time saved surpasses some metric Y? Or do you and your team have a more informal discussion about this?)

To summarize, I believe I have not yet found an algorithm to reliably answer the question of:

If you guys know of any blog posts or online resources discussing this topic, please feel free to share.
I did find this anecdotal story of someone who was brilliant and went in the direction of not using any dependencies whatsoever:

2 Likes

No, you wouldn’t discard it on those grounds, if only because the community still considers it a staple component and it is used in clearly well-maintained packages. I’ve offered some indicators of library health that might save an in-depth technical commitment to evaluation, not a pass/fail checklist. All the shortcuts I offer are mere tea leaves to be read that, in their totality, may give you a shortcut to a decision, assuming that the original premise seeks to avoid the resource commitment of a detailed analysis prior to a decision to use. And any shortcut/estimate will be imprecise. Estimates will occasionally be wrong, too. So it’s possible you might dismiss Cowboy depending on how you judge those shortcuts… that’s the cost of taking shortcuts. Of course, dismissing Hex generally suffers from the same flaw as my approach, except the blast area is bigger.

My answer here will be unsatisfactory to many looking for hard-and-fast criteria to apply. The kinds of shortcuts I think can have practical value in deciding whether to commit resources to a more thorough evaluation are at best fuzzy. This is why I say I made my comments with my manager’s hat on. I ask those questions, and others, to decide if the totality of the picture I get from the answers I find, filtered through my experience, justifies going deeper. Ultimately it boils down to a gut feeling about “am I wasting my time here or not”. Every other, less fuzzy, valid approach will take time and dedication, which is what I’m suggesting can be saved in many cases if the original motivation was to avoid the time commitment that relying on Hex was going to take.

2 Likes

I think what he’s trying to say is that you, as a team, have to make that judgement for yourself, and then depending on how you feel perhaps look at the options he suggested.


I also agree with what has already been said that you really must carry out due diligence and ascertain for yourself how trustworthy a package might be either based on the code itself or how you feel about the author/team who wrote it.

I do think you make an interesting point about how communities differ tho, and again, aside from you having to make that judgement yourself I would like to add some thoughts.

  • In a way, we do have a type of community review process here as many library authors post their packages on the forum and you often get comments and suggestions from others. More recently, as the community has been growing, we’ve also added a ‘Seeking Feedback’ section as well as a Mentoring section for when people want to start sending PRs to established packages or tools. How many other language related forums/communities are continually adapting or have things like this? Is this ‘worth’ something to you and your team?

  • Something that might help you weigh things up is to look at community and culture as a whole. Perhaps specifically, the culture of learning. It’s no surprise that publishers like PragProg have released so many Elixir books - it’s because they sell. That means a lot of people are using professionally published ‘high quality’ resources to learn the language and the tooling around it. In fact, as a community we go a step further. I’m sure you’ve noticed we have given away a lot of books here on the forum over the years and have often mentioned that even when you don’t win yourself - you still do - because that knowledge, even in the hands of others, contributes to more informed Elixir developers, which can lead to better libraries and tools, blog posts or even answers to questions in places like this very forum. There’s also another reason why these giveaways have, in my opinion, been highly beneficial: many people entering them may not traditionally have been book readers or aware of just how valuable books are. I am a huge fan of professionally published books and naturally, when you value something so much you want to tell all your friends about it because you feel they can benefit too - and what better way to get people to appreciate the value of something than by giving them a taste of it! I have no doubt that our efforts have helped many people realise the benefit of high quality learning material, and I have no doubt this has had a positive impact on the knowledge-base of the community.

So in answer to your question, yes, I personally feel more confident about the standard of Hex packages and the Elixir and Erlang community than I do about those of many other languages and ecosystems, for all of the reasons above (and more!)… but like everyone has been saying, you need to make your own mind up about this.

6 Likes

Hot take here, but as a 15 year Ruby developer, I don’t think this is furthering your case…

My background is in C/C++, and I’m currently very interested in Rust… but even still, I consider my 15 years in Ruby to basically be “the dark ages”.

The Ruby VM is bad, and because it’s such an “easy” language, it’s filled with the same type of people you’re claiming are bad for the Node/JS community.

I know that sounds gatekeep-y, but hey… I was one of them for 15 years. And I’m not trying to shut anyone out, hell, I’m evangelizing functional programming and trying to share what I’ve learned by getting into Elixir.