API url security

I work as a Developer Advocate for API and Mobile security, therefore I feel that I have the duty to inform you about this:

Please don’t use numerical IDs to access resources on your backend, because all it takes for an attacker to enumerate all of them is to increment an ID by one.

The use of an ID, also known as Broken Object Level Authorization, is the first risk listed in the OWASP API TOP 10:

I know that Phoenix and a lot of other frameworks, in all programming languages, encourage the use of the ID, but that is only to make the frameworks more appealing to developers, because it makes it so damn easy to build a CRUD demo.


To build a :slug I use this:

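defmodule UtilsFor.Text.Slug do

  # Builds a URL slug from `text`, using `symbol` (for example "-") as the
  # word separator.
  def slugify(text, symbol)
    when is_binary(text)
    and byte_size(text) > 0
    and is_binary(symbol)
    and byte_size(symbol) > 0 do

    # Escape the separator so that symbols such as "." or "+" are matched
    # literally inside the regular expressions below.
    escaped = Regex.escape(symbol)

    text
    |> UtilsFor.Text.Trim.to_single_whitespace()
    |> String.downcase()
    |> String.replace(~r/[^\w#{escaped}]+/u, symbol)
    |> String.replace(~r/(?:#{escaped}){2,}/, symbol)
    |> String.trim(symbol)
  end

end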

I use the above to build a slug in my web app: https://video-hub.exadra37.com/watch/exadra37/why-we-built-our-own-distributed-column-store-by-sam-stokes
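
For anyone who wants to try the idea without the rest of the `UtilsFor` library, here is a minimal self-contained sketch. The module name `SlugSketch` is made up for illustration, and it assumes the whitespace-collapsing helper is not needed because runs of non-word characters are replaced in one pass anyway.

```elixir
defmodule SlugSketch do
  # Hypothetical stand-alone variant of the slugify function above.
  def slugify(text, symbol \\ "-") when is_binary(text) and is_binary(symbol) do
    # Escape the separator so it is matched literally in the regexes.
    sep = Regex.escape(symbol)

    text
    |> String.downcase()
    # Replace every run of characters that are neither word characters
    # nor the separator with a single separator.
    |> String.replace(~r/[^\w#{sep}]+/u, symbol)
    # Collapse repeated separators.
    |> String.replace(~r/(?:#{sep}){2,}/, symbol)
    # Drop separators left at the start or end.
    |> String.trim(symbol)
  end
end

IO.puts(SlugSketch.slugify("Why We Built Our Own Distributed Column Store, by Sam Stokes"))
```

With that input it should print the same slug that appears in the URL above.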


What would be the attack vector in this case though? Of course, for certain resources, don’t leak any kind of unnecessary details, but in this case I don’t see how this applies.

StackOverflow and lots of other services follow the /:id/:slug pattern.

Take your StackOverflow link, for example:

https://stackoverflow.com/users/6454622/exadra37

  • change the slug part and it will always redirect you to the correct url with the current slug
  • this allows for infinite slug changes without having to keep a history to do the redirects
  • you simply check if the slug matches the current slug, otherwise do a 301 Redirect to the correct path
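
The check described in the list above can be sketched roughly like this. It is only an illustration: the module name, the shape of the resource map, and the `/users/...` path are assumptions, not how StackOverflow actually implements it.

```elixir
defmodule SlugRedirectSketch do
  # Hypothetical helper for a /:id/:slug route: the resource is looked up
  # by id only, and the slug is compared afterwards.
  def resolve(%{id: id, slug: current_slug}, requested_slug) do
    if requested_slug == current_slug do
      # The URL is already canonical, serve the page.
      :ok
    else
      # Stale or wrong slug: send a permanent redirect to the current path.
      {:redirect, 301, "/users/#{id}/#{current_slug}"}
    end
  end
end

user = %{id: 6_454_622, slug: "exadra37"}
IO.inspect(SlugRedirectSketch.resolve(user, "some-old-name"))
```

Because the lookup never depends on the slug, no history of old slugs has to be stored.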

And increment my ID by one https://stackoverflow.com/users/6454623:

And you get the next user after me in StackOverflow.

Now I can write an automated script and easily scrape the data of all StackOverflow users and build a database with it, whereas if the links contained only the username and not the ID, it would not be possible to do it.

You can read in more detail about Broken Object Level Authorization and why it is the first risk in the OWASP API Top 10:


True, but these are all public facing pages. Discourse uses a similar pattern /:slug/:id. If something needs authorization, it needs to go through proper authorization. Security by obfuscation is not security.

I’ve read the guide and it makes the point about “sensitive data”. My point was that this scenario (an article URL) is not such a case. I would personally count the StackOverflow user pages as “sensitive data” though, even if SO doesn’t seem to think so.


This is highly contextual. In many cases it’s perfectly ok, and even useful, to be able to enumerate resources. In case of blog posts, which are already public and do not constitute a secret, I don’t see any risk in being able to enumerate them (but again, it depends on the specific application).

Security is a lot about understanding context :slight_smile: What you advise is very important in some specific contexts, but should not be “mindlessly” applied as a rule.

That said, this is going a bit off topic.


Just because they are public-facing pages doesn’t mean they should be easy to scrape, with my data ending up in databases outside the service I am using.

Oh, the old myth :slight_smile:

Security is all about layers of defense, as many as you can afford and need. Just as medieval castles or prisons have many layers of security, applications should have them too, even the ones that many like to call security by obscurity.

I agree that it’s about context, but the default of using IDs everywhere creates the habit of using IDs by default, even in situations where they should not be used, as I see regularly in my line of work.

In other words, using IDs by default becomes muscle memory, so developers use them without thinking about the consequences, whereas if the industry default was to not use them in the first place, they would only be used in the situations where they are really needed, which is rarely.


Do “binary IDs” follow a predictable sequence? I am talking about these: mix phx.gen.context — Phoenix v1.5.8

I don’t know, but even if they don’t, you don’t want to have a URL with this type of value: sample_binary_id: "11111111-1111-1111-1111-111111111111"

Sure, but it’s probably better to have ugly URLs instead of insecure URLs.
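
On the predictability question: as far as I know, the binary IDs Phoenix generates are random (version 4) UUIDs produced by Ecto, but treat that as an assumption and check the Ecto docs for your adapter. The sketch below does not use Ecto at all; it only illustrates why a version 4 UUID gives an attacker no "next" value to increment to. The module name is made up.

```elixir
defmodule RandomIdSketch do
  # Builds a version 4 UUID string from 16 bytes of cryptographically
  # strong randomness. Only the 4 version bits and 2 variant bits are
  # fixed; the remaining 122 bits are random.
  def uuid4 do
    <<a::48, _::4, b::12, _::2, c::62>> = :crypto.strong_rand_bytes(16)
    format(<<a::48, 4::4, b::12, 2::2, c::62>>)
  end

  # Renders the 16 bytes in the usual 8-4-4-4-12 hexadecimal layout.
  defp format(<<a::binary-size(4), b::binary-size(2), c::binary-size(2),
                d::binary-size(2), e::binary-size(6)>>) do
    [a, b, c, d, e]
    |> Enum.map(&Base.encode16(&1, case: :lower))
    |> Enum.join("-")
  end
end

IO.puts(RandomIdSketch.uuid4())
IO.puts(RandomIdSketch.uuid4())
```

Two consecutive calls return unrelated values, so knowing one ID tells you nothing about the neighbouring records.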

Instead you use slugs in the url, like I show in the topic:

Plus this type of URL is very SEO friendly… It’s a win-win :slight_smile:

I seriously don’t understand what’s the problem. When the DB resources are hidden behind authentication (maybe also authorization) then what can an attacker actually do? They change the URL and receive a redirect to a root page; this is what’s happening in 99.9% of the web apps I’ve written.

So let them bang their heads against the wall, fine with me.

If they can’t penetrate the authentication wall then good luck to them. If they can steal somebody else’s account then well, nothing much can be done at that time.

Broken authentication/authorization mechanisms are what we find most often when pen-testing any backend system, which is why it’s number one in the OWASP API Top 10 risks, where it is called Broken Object Level Authorization.

For example, I am a logged-in user in your application and my sensitive data is under https://example/user/1234, an endpoint protected with user authentication. It’s common to be able to switch the 1234 to 1235 and get data that we are not authorized to see, because the backend has a broken authorization system. In other words, it checks that the user is logged in, but not that the user is authorized to see the record.
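
To make the difference concrete, here is a small sketch with an in-memory map standing in for the database. Everything in it (module name, record shape, owner IDs) is hypothetical; it is only meant to contrast "is somebody logged in?" with "does this record belong to the caller?".

```elixir
defmodule RecordAccessSketch do
  # Hypothetical "database": record id => record with its owner's user id.
  @records %{
    1234 => %{owner_id: 1, data: "my data"},
    1235 => %{owner_id: 2, data: "someone else's data"}
  }

  # Broken: a logged-in user is required, but ownership is never checked,
  # so any record id can be read by any user.
  def fetch_authenticated_only(%{id: _user_id}, record_id) do
    Map.fetch(@records, record_id)
  end

  # Object-level authorization: the record must also belong to the caller.
  def fetch_authorized(%{id: user_id}, record_id) do
    case Map.fetch(@records, record_id) do
      {:ok, %{owner_id: ^user_id} = record} -> {:ok, record}
      {:ok, _someone_elses_record} -> {:error, :forbidden}
      :error -> {:error, :not_found}
    end
  end
end

me = %{id: 1}
IO.inspect(RecordAccessSketch.fetch_authenticated_only(me, 1235))
IO.inspect(RecordAccessSketch.fetch_authorized(me, 1235))
```

The first call hands over the other user's record, the second refuses it. Note that the fix works the same whether the ID is sequential or random.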

In such a broken authorization system I can just write a simple script and dump all the sensitive data that I was not supposed to be able to access.

The mHealth study that the company I work for ran on 30 mobile health apps and their backends found this exact problem: Alissa Knight (a famous security researcher) was able to access other patients’ data records, exams, etc. just by incrementing the ID. She was a logged-in user and was not authorized to see those records, but she could see them anyway, just by incrementing the ID by one. This was not seen in only one backend, but in pretty much all of them.


I understand that security should be layered.

But you’re describing really rookie mistakes. If my authentication system is broken I’d not sleep until I get it right. Not to mention that I’ll definitely test it properly to remove the possibility of the most trivial attacks.

Sure, using UUIDs mitigates the data exposure… somewhat. But do you really think that scraping data would be the attacker’s first priority? I would think they will try to revoke all other users’ privileges first and gain complete access.

I don’t have experience in this area. Have you seen this happen? Somebody gaining partial access and just jumping to scraping data usually hidden behind an authentication wall without first trying to do a complete takeover?

Rookie mistakes? It may be in some cases, but in the majority of them they are done by senior developers.

The study that I linked was done against mobile apps and backends that have millions of users and are from big corporations or startups.

You can have authentication and authorization working correctly, but if they aren’t invoked in the right places, then you expose data that you should not be exposing, and this happens a lot in complex backends, despite the authentication/authorization code itself being well tested.

For example, you add an endpoint to your application and put it behind authentication, but then forget to check whether the logged-in user is authorized to access it. Another scenario is when some code is touched and a dev accidentally removes the authorization check or relocates it to a place where it is no longer effective.

I am pretty sure that you know that authentication is not the same as authorization, but this is often a source of confusion for some developers, so anyone reading this and getting confused can just google authentication vs authorization.

That’s what we call a data breach, and it happens every week on a very regular basis; it is one of the reasons why GDPR now exists.

Health data from people is very valuable in the dark markets :slight_smile:

Slugged URLs either break when content is updated (and the slug changes with the update) or can leak information that was subsequently redacted (if the slug doesn’t change and the redacted material is in it).

A sufficiently-smart slug implementation can track those changes and issue redirects, but it’s a lot of extra work.
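
As a rough idea of what that extra work looks like, here is a sketch that keeps every slug an article has ever had and redirects old ones to the current one. The module and its in-memory store are invented for illustration; a real application would keep this in a database table.

```elixir
defmodule SlugHistorySketch do
  # Hypothetical store: `by_slug` maps every slug ever used to an article
  # id, `current` maps each article id to its current slug.
  def new, do: %{by_slug: %{}, current: %{}}

  # Records a new slug for the article and makes it the current one.
  def put_slug(store, article_id, slug) do
    %{
      store
      | by_slug: Map.put(store.by_slug, slug, article_id),
        current: Map.put(store.current, article_id, slug)
    }
  end

  # Serves the current slug, redirects an old slug, rejects unknown ones.
  def resolve(store, requested_slug) do
    case Map.fetch(store.by_slug, requested_slug) do
      :error ->
        :not_found

      {:ok, article_id} ->
        case Map.fetch!(store.current, article_id) do
          ^requested_slug -> {:ok, article_id}
          current_slug -> {:redirect, 301, current_slug}
        end
    end
  end
end

store =
  SlugHistorySketch.new()
  |> SlugHistorySketch.put_slug(1, "old-title")
  |> SlugHistorySketch.put_slug(1, "new-title")

IO.inspect(SlugHistorySketch.resolve(store, "old-title"))
```

This handles renames, but not redaction: as long as the old slug stays in the history it keeps resolving, and removing it breaks old links.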

The redaction scenario is unlikely but possible; depending on your application, it may be more likely than the “authorization system completely fails” scenario.

Sorry, but I am not able to follow you here. Can you explain in other words?

@Exadra37 thanks for reaching out here, but I’d like to correct things, as I’ve also done security compliance covering this!


It is incorrect to say “using IDs in the URL is unsafe”. For example, in your example the “slug” is now the “id” :wink:, isn’t it?

It’s rather this: using predictable IDs may contribute to more information disclosure when other parts are broken, by allowing an attacker to guess and iterate over resources easily when bad things happen.

There are several reasons to avoid numeric, auto-incremental IDs, and I prefer UUIDs or similar random IDs over integers. However, calling auto-incremental IDs insecure is incorrect.


Let’s get back to the item “API1:2019 Broken Object Level Authorization”. It doesn’t say “don’t use numerical ids”. It’s basically saying “do the object level authorization”!

  • API1:2019 Broken Object Level Authorization: APIs tend to expose endpoints that handle object identifiers, creating a wide attack surface Level Access Control issue. Object level authorization checks should be considered in every function that accesses a data source using an input from the user.

Also see the “How To Prevent” section

  • Implement a proper authorization mechanism that relies on the user policies and hierarchy.
  • Use an authorization mechanism to check if the logged-in user has access to perform the requested action on the record in every function that uses an input from the client to access a record in the database.
  • Prefer to use random and unpredictable values as GUIDs for records’ IDs.
  • Write tests to evaluate the authorization mechanism. Do not deploy vulnerable changes that break the tests.
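
The last bullet can be as simple as a test that fails the moment someone removes or weakens the ownership check. The following is a sketch only: `RecordPolicy` and its rule ("a user may read a record only if they own it") are hypothetical, and a real suite would exercise the actual endpoints.

```elixir
ExUnit.start()

defmodule RecordPolicy do
  # Hypothetical policy: a user may read a record only if they own it.
  def can_read?(%{id: user_id}, %{owner_id: owner_id}), do: user_id == owner_id
end

defmodule RecordPolicyTest do
  use ExUnit.Case, async: true

  # Five records, each owned by the user with the same id.
  @records for id <- 1..5, do: %{id: id, owner_id: id}

  test "a user can read only their own records" do
    user = %{id: 3}
    readable = Enum.filter(@records, &RecordPolicy.can_read?(user, &1))
    assert Enum.map(readable, & &1.id) == [3]
  end
end
```

Walking over every record ID in the test mirrors what an attacker would do by incrementing the ID.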

See - it says “prefer to use random…”, not “do not use …”.

Technically… GDPR adds a few requirements “related” to security (e.g. notification of data breach) but it is about data protection in the context of privacy, not general security. Nitpick!


Thanks for your excellent post and explanation :slight_smile:

I mentioned numerical ids :wink:

I know it says that, but as I said:

We disagree here. In my opinion they are insecure by nature. I agree that there may be use cases where they do no harm, but as I said above, a developer gets into the habit of using them and then uses them everywhere without knowing the consequences.

I know about it, and I also know that in real life these systems are easy to misconfigure, as I say here:

Don’t get me wrong here, and don’t take it personally, but security compliance is often about ticking boxes to get certified to be in business.

Being secure by meeting only security compliance standards is not enough, and hacker studies and breaches prove that.

And in practice one of the main reasons was the constant growth in data breaches, which violate your privacy, because your data is not private anymore :slight_smile: