GraphQL gives away too much info and this hurts its security

Today I saw this newsletter:

The interesting bit is the Clairvoyance part of the newsletter, which introduces a new tool for reverse engineering a GraphQL API when introspection is disabled:

Clairvoyance allows us to get GraphQL API schema when introspection is disabled. It produces schema in JSON format suitable for other tools like GraphQL Voyager, InQL or [graphql-path-enum](https://gitlab.com/dee-see/graphql-path-enum).

Is the Absinthe library also vulnerable to this?


Absinthe doesn’t have built-in functionality to disable schema introspection, and it doesn’t suggest field names the way the article describes. If you disable introspection with your own mechanism (read more here: How to disable Absinthe GraphQL introspection), then you should be safe, or at least safer than Apollo Server users. That said, it is always possible to brute force any API to reveal field or endpoint names.
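
To make the field-suggestion issue concrete, here is a toy Python sketch of the idea. It is not Absinthe or Apollo code: the field names, the error wording, and the `probe` helper are all made up for illustration, and the "server" is just a local function. It shows how a "Did you mean …?" hint lets near-miss guesses recover real field names, while a server without hints only confirms exact hits.

```python
import difflib

# Hypothetical schema fields; a real attacker would not know these up front.
HIDDEN_FIELDS = ["username", "email", "passwordHash"]

def server_error_for(field, suggest):
    """Simulate a server's reply to a queried field (None means the field exists)."""
    if field in HIDDEN_FIELDS:
        return None
    message = f'Cannot query field "{field}".'
    if suggest:
        close = difflib.get_close_matches(field, HIDDEN_FIELDS, n=3, cutoff=0.6)
        if close:
            message += " Did you mean " + ", ".join(f'"{c}"' for c in close) + "?"
    return message

def probe(wordlist, suggest):
    """Collect every field name confirmed or leaked while trying a wordlist."""
    found = set()
    for guess in wordlist:
        error = server_error_for(guess, suggest)
        if error is None:
            found.add(guess)  # exact hit
        elif "Did you mean" in error:
            hint = error.split("Did you mean", 1)[1]
            found.update(hint.split('"')[1::2])  # the quoted names in the hint
    return found

guesses = ["user_name", "emial", "password"]
leaky = probe(guesses, suggest=True)
quiet = probe(guesses, suggest=False)
```

With hints switched off the same wordlist learns nothing here, but an exact guess would still be confirmed, which is the brute-force point made above.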

That’s a poor design decision in terms of security, or even a sign that security and its implications were not thought through while designing the library :frowning:

That may be, but it is not enough to convince me to use the library.

It depends how you build them. Some APIs are also very easy, like when they follow the json-ld or hal specification, but I never used them, for the same reason I don’t use GraphQL: security.

If I write an API that only has my own apps as clients, I return just two status codes, 200 and 400. The 400 just says Whoops, something went wrong!!! and carries a unique log id string so that I can look up the logs for that specific error. The exception is an error that the client needs to show to the user, like a form validation error :wink:
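
A minimal Python sketch of that approach, assuming a framework-free handler wrapper. The names `handle` and `UserFacingError` are invented for the example; a real app would hook this into its own error handling.

```python
import logging
import uuid

logger = logging.getLogger("api")

class UserFacingError(Exception):
    """An error the client should show to the user, e.g. form validation."""

def handle(request_fn):
    """Run a handler and collapse every outcome into a 200 or a 400."""
    try:
        return 200, {"data": request_fn()}
    except UserFacingError as err:
        return 400, {"error": str(err)}
    except Exception:
        # Details go to the server log only; the client gets an opaque id.
        log_id = uuid.uuid4().hex
        logger.exception("unhandled error, log_id=%s", log_id)
        return 400, {"error": "Whoops, something went wrong!!!", "log_id": log_id}

def bad_form():
    raise UserFacingError("email is required")

ok = handle(lambda: "saved")
validation = handle(bad_form)
crash = handle(lambda: 1 / 0)
```

The stack trace for `crash` ends up in the log next to the `log_id`, and nothing about the failure itself is sent to the client.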

And when writing an API I can use Swagger or RAML to write its blueprint and then use that blueprint to enforce security when ingesting requests, i.e. I will not let any request through that doesn’t comply with the blueprint contract :wink:
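
Roughly, contract enforcement at the edge looks like the sketch below. This is a hand-rolled stand-in, not Swagger or RAML tooling: the `CONTRACT` table and the `complies` function are assumptions made for the example.

```python
# Hypothetical contract: the only operations and body fields the API accepts.
CONTRACT = {
    ("POST", "/users"): {"name": str, "age": int},
    ("GET", "/users"): {},
}

def complies(method, path, body):
    """Return True only if the request matches the contract exactly."""
    spec = CONTRACT.get((method, path))
    if spec is None:
        return False  # endpoint not in the blueprint
    if set(body) != set(spec):
        return False  # missing or unexpected fields
    return all(isinstance(body[key], kind) for key, kind in spec.items())

accepted = complies("POST", "/users", {"name": "Ann", "age": 30})
extra_field = complies("POST", "/users", {"name": "Ann", "age": 30, "admin": True})
wrong_type = complies("POST", "/users", {"name": "Ann", "age": "30"})
unknown = complies("DELETE", "/users", {})
```

Anything that fails `complies` would be dropped before it reaches application code.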


The entire GraphQL thing is mostly aimed at frontend developer convenience IMO. I was heavily involved in working on GraphQL API gateways in two separate projects and while Absinthe itself is an absolutely excellent tool, the need for GraphQL itself always eluded me and to this day I can’t see why it is useful or even necessary.

Especially in terms of caching; REST APIs allow for heavy and deep caching while with GraphQL all bets are off. It dramatically increases the load on your DB (I mean in a default setup, obviously; in your own app you can utilize all sorts of custom internal backend caching and that helps matters… but that is just one more argument against GraphQL – you must work hard to compensate for its deficiencies – at one point the whole thing becomes too much complexity).

I am very sure I am missing something but at least in my practice I failed to ever find a good use of GraphQL.


All APIs are vulnerable to this. If you do not authenticate ids, you run into trouble.

Introspection has nothing to do with their issue. The browser network tab is enough to see the API calls being made. They could do this with old school HTTP POST + JSON and it wouldn’t make any difference.

Right, and Absinthe / GraphQL won’t let you send inputs that aren’t the right type or shape either. Swagger, JSON-RPC, etc, none of them have a way at the API schema level to enforce that you are authorized to use an ID. This must be done at the application layer.


I might be missing something here, but I don’t see the security issue here.
You should always authenticate and authorize all calls that are being made to your endpoint and not rely on hidden APIs.


It seems that you have not read the article with enough attention or you missed this in my topic:

The tool is meant to be used when introspection is disabled :wink:

It’s not only you, a lot of us developers lack the security mindset from a hacker’s perspective, and that is one of the reasons our industry is so flawed at the security level.

Nowadays I work as a Developer Advocate for mobile app and API security, but until 2014 I was one of those developers who lacked a security perspective from the other side of the fence. What opened my eyes was the video course Hack Yourself First: How to go on the Cyber-Offense from the well-known security researcher Troy Hunt. It completely changed how I approach code, and I strongly recommend any developer take this course.

Thanks for the resource, I haven’t watched that yet, will put it on my list.

Now, I do have a tiny bit of background in hacking and have worked with pentesters on some of the applications I’ve developed on, so I can see that Clairvoyance gives you an unwanted insight into your API.
But still it feels like depending on disabling introspection is just Security through Obscurity. So definitely not something to rely on.


I am not advocating relying on Security through Obscurity as the sole mechanism of defence, but contrary to some I think that security by obscurity should always be one of the layers of defence in your perimeter. Remember medieval castles: they were built with layers of defence that put more obstacles in the enemy’s way before they could reach the core.

So, in my opinion software development should employ security through obscurity as much as possible, but never rely on it as the sole security mechanism.

Developers need to keep in mind that attackers will chain exploits, weaknesses, conveniences and apparently harmless stuff to build their attack to get where they want to go.

So, while one thing in isolation may seem innocent, it can be of significant importance when combined with others, and anyone working in Security can confirm this :slight_smile:

Well, that just isn’t true :slight_smile: The DataLoader pattern is well known and widely used with GraphQL, and that provides an easy entry point to caching beyond per-query.
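
For anyone who hasn’t met the pattern, here is a rough Python sketch of the idea only. It is not the API of Absinthe, Dataloader, or any other library; `BatchLoader` and `fetch_users` are made up. Resolvers queue the keys they need, and one batched call fetches them all, with results cached so repeated keys cost nothing.

```python
class BatchLoader:
    """Queue keys while resolving, then fetch them all with one batched call."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # takes a list of keys, returns {key: value}
        self.pending = []
        self.cache = {}

    def load(self, key):
        # Only queue keys we have not fetched or queued already.
        if key not in self.cache and key not in self.pending:
            self.pending.append(key)

    def run(self):
        # One backend round trip for everything queued so far.
        if self.pending:
            self.cache.update(self.batch_fn(self.pending))
            self.pending = []

    def get(self, key):
        return self.cache[key]


calls = []

def fetch_users(ids):
    """Stand-in for a single `WHERE id IN (...)` query."""
    calls.append(list(ids))
    return {i: f"user-{i}" for i in ids}

loader = BatchLoader(fetch_users)
for author_id in [1, 2, 1, 3, 2]:  # e.g. the authors of five posts
    loader.load(author_id)
loader.run()
```

Five lookups turn into one backend call here, and because the cache lives on the loader it can be scoped per query or kept longer, which is the entry point to caching mentioned above.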

That aside … security!

So, the web is, in a word, fucked. It is probably one of the least well-designed sets of technologies in such wide use, and that says a lot as there are some truly bad ideas out there in wide usage. It has such an absurdly large attack surface ranging from client-side code that is dynamically loaded from potentially random locations with little to no sandboxing (so many patchwork bandages have been deployed over the years to address that…) to server-side APIs that are neither protocols nor designed with basic concepts like secure federated authentication / authorization or progressive discovery in mind. It’s a dumpster fire which we collectively limp by on.

Just look at venerated Internet protocols which offer no API (let alone advertise capabilities!) until authentication is performed. You don’t get “you aren’t allowed to access that” messages, you just don’t get anything at all, socket closed. And even after auth, they will expose different views of their capabilities depending on what authorization follows. This is fairly basic security stuff … which most web technologies lack.

It isn’t all darkness, though. It’s one reason I love things like LiveView. The user authenticates, opens a socket over which authentication and authorization can be performed and then based on that capabilities can be provided. No peeking, no leakage required.

Before I moved an app recently to LiveView uploads, anyone could find out that there were various upload endpoints. Returning 404s on incorrect uploads to pretend nothing was there would have made error recovery harder for legitimate usage (aka my own front-end apps!), and even then the latency differences between that and a “true” 404 would probably have been perceptible. Now that everything happens over a websocket and upload capabilities are only ever advertised to users with sufficient authorization? Yeah, that hole is closed with no hackiness. (… and yes, I do use rate limiting)

As @dimitarvp noted, GraphQL is a front-end developer convenience (and, given the mess REST APIs can be, and often are, for non-trivial access to hierarchical data, a rather useful convenience), but the world ought to just own up to the fact that HTTP based services do not provide much in the way of security or privacy.

I have serious flashbacks to telnet, ftp, and pop3 … which, btw, were all things alive and going when HTTP was initially designed.


I have dumped the use of APIs in favor of websockets via LiveView for this same reason. It doesn’t give away what your backend is capable of, like an API does, be it GraphQL powered or not.

If giving away what your API is capable of is a security risk, you have failed some very basic security requirements imo.

It’s like saying your house is more secure if people don’t know whether or not you have a back door.

Having a lock on both your front and back door is what makes you safe.
Any GQL design that is not enforcing a kind of authorization as a parameter on every call going from a resolver to your business layer (for non-public functionality) has most likely been designed wrong.
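
As a sketch of what that looks like (plain Python rather than Absinthe; `INVOICES`, `get_invoice` and `resolve_invoice` are invented for the example), the resolver stays thin and the business-layer function cannot be called without saying who is asking:

```python
class Forbidden(Exception):
    pass

# Hypothetical data store.
INVOICES = {
    10: {"owner": "alice", "total": 120},
    11: {"owner": "bob", "total": 75},
}

def get_invoice(current_user, invoice_id):
    """Business-layer function: the caller's identity is a required argument."""
    invoice = INVOICES.get(invoice_id)
    # Same answer for "does not exist" and "not yours", so ids cannot be probed.
    if invoice is None or invoice["owner"] != current_user:
        raise Forbidden("invoice not available")
    return invoice

def resolve_invoice(context, args):
    """Resolver: forwards the authenticated user and arguments, no logic of its own."""
    return get_invoice(context["current_user"], args["id"])

own = resolve_invoice({"current_user": "alice"}, {"id": 10})
```

Asking for invoice 11 as alice raises `Forbidden`, even though the id is a perfectly valid integer as far as the schema is concerned.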

And GQL certainly solves a bucket load more than ‘frontend convenience’
Off the top of my head:

  • Documentation that lives INSIDE your api (Swagger that doesn’t suck and is harder to diverge)
  • Very easy tool to design api contracts and shape of data before you implement anything in your business layer
  • Allows you to think in data more than business actions. That is an insanely flexible feature. It is a lot easier to nail your data shape than it is to up front nail your business exposure. This increases the possibility space for innovation while still adhering to a strict data shape so you don’t get a cancerous beast of an API that falls under its own weight.
  • Composability is a lot easier in GQL as the query and mutation submission through a document is a lot easier on the developer than composing REST apis, even the more modern ones that supply _self and other links you can navigate by.

A client wanted my frontend dev to build out a big data dump to CSV because they were losing track of their master data records. It took him half a day to build a big query and collate the result into a table view with a simple CSV export button.

Building that in SQL (backend) or through composing REST apis (client) would have been a nightmare.

Edit: GraphQL is actually so powerful in terms of query composability that I am still using it even though I’m coding LiveView, because it’s easier to just re-run a query (as in the client doing refetchQuery in the web browser) than it is to do anything else. So my CUD is talking to the Service (context) modules directly, while all R is done through an Absinthe runner. Works beautifully.

I expose a split schema since we have some external dependants so one Schema is only internal and one is exposed for clients over API. They share all types and enums so it’s super easy to stay in sync.


Thing is that these technologies do not just publish their API, the API is generally available on the network. So if there is a flaw in any part of the auth or other data management, it falls apart to anyone who has network access to it.

I read a blog the other week from a fellow who found a way to get at private Youtube videos (one frame at a time, no audio, …) by just such a flaw in a non-Youtube Google API.

Compare with protocols like SSH or IMAP: there just isn’t any access to the security-relevant API, full-stop, until authentication and/or authorization has occurred. IMAP doesn’t even have to tell you what the server is capable of until then (and usually is configured not to), but more importantly will reject any attempt to utilize that API until auth happens simply because the API is, by design, not even available until those earlier steps are taken.
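
A toy model of that behaviour in Python, to show the shape of it. This is not real SSH or IMAP; the `Session` class and its single `LIST` command are made up. The command table is simply empty until login succeeds, and anything unexpected closes the session without explanation.

```python
class Session:
    """Connection-oriented session whose commands only exist after login."""

    def __init__(self, credentials):
        self._credentials = credentials  # hypothetical {user: password} table
        self._commands = {}              # nothing is available before auth
        self.closed = False

    def login(self, user, password):
        if self.closed or self._credentials.get(user) != password:
            self.closed = True           # no error details, just hang up
            return False
        # The API is attached to the session only now.
        self._commands = {"LIST": lambda: ["inbox", "sent"]}
        return True

    def call(self, name):
        handler = self._commands.get(name)
        if self.closed or handler is None:
            self.closed = True
            return None
        return handler()

anonymous = Session({"alice": "s3cret"})
before_auth = anonymous.call("LIST")

trusted = Session({"alice": "s3cret"})
logged_in = trusted.login("alice", "s3cret")
after_auth = trusted.call("LIST")
```

The unauthenticated caller cannot tell whether `LIST` exists at all, because there is no handler to reach until `login` has run.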

This is obviously much easier to do with connection-oriented protocols rather than message-oriented ones like HTTP, but it isn’t impossible and also says something about the reliance on message-oriented protocols for security-critical infrastructure.

All that said, not advertising capabilities does make the intruder’s life more difficult and can help avoid exposing details about your infrastructure that can become the next targets for attack. It’s why we typically don’t put our databases on publicly accessible ports, and why people looking for exploits will look at what an API leaks (“oh, an S3 upload URL … let’s look at the security on that bucket, shall we?”) … it is defense in depth, though indeed not something that can be relied on. It just helps.

Yes, all those things are great. I really do prefer (at least for the right sorts of data) GraphQL to REST for those reasons. It’s not that one can not accomplish the same things with REST, it’s just easier and less error prone … for me that’s convenience :slight_smile:

This, however, that you note does go beyond convenience for me:

This is a really interesting point … GraphQL is protocol-independent. REST is pretty tied to concepts in HTTP … while you could duct-tape it to another underlying protocol, it smells deeply of HTTP, and it would likely be easier to take the principles behind it and reimplement it for the target non-HTTP protocol. GraphQL on the other hand works over whatever transport equally well.

That definitely does go beyond convenience.


Could you expand on that, please? Haven’t followed Elixir for like half of 2020 – even if I worked with it a lot. :smiley:

Yeah, if this isn’t a good description of the modern web then I don’t know what is. :003:

An API giving too much away is like putting locks on your doors and then hiding a spare key nearby. Or, if you prefer, like installing an alarm and then displaying its brand and model outside.

Some years ago, I was where you are now, where for me developer convenience was above security, but I have seen too many things go wrong because of this mentality in our industry, so nowadays I value security more than developer convenience.

Unfortunately our industry only seems to pay attention when it is hit with a data breach, and even then I have seen some stay in denial and continue down the same path.

I recommend you to watch the Hack Yourself First course, because it will be an eye opener :slight_smile:

Sure :slight_smile: I’ll use the file upload example I mentioned earlier, in part because it is still fresh in my mind… before using LiveView for that, there was a page with an upload drop area, powered by a javascript library of course, that pointed to an upload endpoint so the user could attach files to another entity.

That endpoint was wired to a function in the related Controller … and when that upload came in it obviously had no idea who was uploading the file or what the context was, so it had to look up the resource that the upload was supposed to be related to, check the user authentication, and then make sure that the authenticated user had the rights to access that resource. Also, since different users in the system have different upload limits, it had to check and enforce that as well.

Anyone with a network connection could hit that endpoint with a file. Think about that: it means that anyone could consume bandwidth and be sending rando file data to the server. So, that upload endpoint had to be very careful about the auth, needed rate-limiting (but not in a way to annoy allowed users!), needed to allow fairly large files through even though that data may end up being entirely discarded as later authorization based on the data may fail. And it had to do that on every request … and again, anyone on the internet could initiate a file upload to that website. Not … great.

Some of the logic for upload size was in Endpoint configuration, some in a wrapper for the Multipart upload plug, and some in the Controller. Ugh.

Enter LiveView uploads … when the LiveView page is loaded, authorization occurs and the stateful websocket connection enters a ‘trusted’ state: the app has confirmed who is on the other end of that connection. Even better, it looks up that user account’s rights: can they upload to this item? (Some users can access, or even edit, but not upload!) What is their file upload allowance? etc. It does this once on websocket setup (the “mount” in LiveView). Based on that information, it then allow_upload/3's all the uploads along with details like max upload size … it can even vary things like timeouts and data chunk sizes (not that I do in this case).

It does this once. And only after authorization (implying authentication) has occurred. Nobody can get at the file upload code, at all, in that app until they have authenticated. There is no public endpoint for it anymore. Instead, it is a function on the server-side that accepts data (or not!) over the stateful websocket connection.
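
The shape of that, as a language-neutral sketch in Python rather than actual LiveView code. `UploadChannel` and the user fields `can_upload` and `upload_allowance` are assumptions made for the example: the per-user decision is made once when the connection is set up, and every later chunk is checked against that stored state.

```python
class UploadChannel:
    """Per-connection upload state, configured once when the connection is set up."""

    def __init__(self, user):
        # Decided once at "mount", after authentication; None means uploads
        # were never offered to this user at all.
        self.max_bytes = user["upload_allowance"] if user["can_upload"] else None
        self.received = 0

    def receive_chunk(self, chunk: bytes) -> bool:
        """Accept a chunk only if uploads were allowed and the allowance holds."""
        if self.max_bytes is None:
            return False
        if self.received + len(chunk) > self.max_bytes:
            return False  # over this user's limit, chunk is discarded
        self.received += len(chunk)
        return True

editor = UploadChannel({"can_upload": True, "upload_allowance": 10})
viewer = UploadChannel({"can_upload": False, "upload_allowance": 10})

first = editor.receive_chunk(b"12345678")   # 8 of 10 bytes used
second = editor.receive_chunk(b"abcd")      # would exceed the allowance
denied = viewer.receive_chunk(b"x")
```

No rights lookup happens per chunk; the channel only consults what was decided at setup.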

The security surface just dropped from configuration+upload plug+controller accessible to anyone on the Internet, plus a 3rd party javascript library that I also hope isn’t security trash, to a self-contained function that one must first authenticate to before having any access to it at all.

It’s more efficient (one time per page, no matter how many uploads, and a smaller page size due to dropping that JS lib and its CSS), has fewer things to get wrong (it isn’t spread out everywhere), and has a significantly smaller security surface.


Hiding a spare key nearby is sloppy security work and analogous to committing your keys to github etc. I am not preaching convenience over security, I am preaching a security approach where being open and explicit forces you to consider the security risks and mitigate in a pre-emptive way.

I get what you are saying, and it isn’t wrong, but there is also a reason it is best practice to not expose services one doesn’t need to, such as dropping an SQL server on a public port. The idea is not to rely on obscurity, but to make it harder for the attacker, or even better not giving them a larger security surface to explore.


Thank you for the good explanation. :heart:

But doesn’t your example allow for a malicious client to just dump 10GB of file into the un-authenticated socket? What’s to stop them? Is there, like, a load balancer or a reverse proxy that will cut them off before they even get to 1MB if they are not authenticated?