Why/When Not To Use GraphQL

I have an app that will have multiple nodes running in parallel from different locations in the world.

3rd parties will query the node network through an API.

Nodes will be querying 3rd party Rest and FIX APIs.

Nodes will be making some overlapping & some unique queries to each other.

If 2 or more nodes are querying the same 3rd party data I want them to share that data between themselves in a shared cache.

Some data will expire in milliseconds. Some as high as 5-10 minutes. Some APIs have significant hit rate constraints, others don’t.

Shared cache is to add speed & robustness to the network, leveraging the queries of all the nodes.

With that basic outline, is there any reason why not to use GraphQL for the query language for both node-to-node communication & for 3rd parties?

Edit: added more details.

2 Likes

different locations in the world

Network would become the bottleneck then, probably. So you’d want to minimize the amount of data you send between the nodes. GraphQL uses JSON by default, which is not as optimized as a custom binary protocol.

for 3rd parties

They would have to use GraphQL clients, which is extra learning for them if they haven’t used it before.

node-to-node communication

The main advantage of GraphQL, at least for me, is that it is somewhat future-proof, such that I can change the code on the backend and my clients (iPhone app) won’t need to be updated; their queries will still work. You probably won’t need that for node-to-node communication since you control the nodes’ code versions.
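
As a rough illustration of that future-proofing, here is a minimal graphql-js sketch (the field names are made up for illustration, not anyone’s actual schema): the server grows a new field, and an older client query that never selects it keeps working unchanged.

```ts
import { buildSchema, graphql } from "graphql";

// "v2" of the schema: `spread` is newly added; `bid`/`ask` predate it.
const schema = buildSchema(`
  type Quote { symbol: String!, bid: Float!, ask: Float!, spread: Float }
  type Query { quote(symbol: String!): Quote }
`);

const rootValue = {
  quote: ({ symbol }: { symbol: string }) => ({
    symbol, bid: 1.1000, ask: 1.1002, spread: 0.0002,
  }),
};

async function main() {
  // An "old" client query written before `spread` existed: it only selects
  // bid/ask, so the schema change doesn't affect it.
  const result = await graphql({
    schema,
    source: `{ quote(symbol: "EURUSD") { bid ask } }`,
    rootValue,
  });
  console.log(result.data); // { quote: { bid: 1.1, ask: 1.1002 } }
}

main();
```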

4 Likes

For the clients it should work fine; as idiot says, it makes changing things easier, as you can just add fields or deprecate others, and the clients will continue to work until they decide to use the new features.

Depending on what you are sending between nodes, it might be a bit too much overhead.
If it is a shared cache type thing, could you not use something like Mnesia or Riak? Or if you need to send actual messages, maybe just simple JSON, or Protobuf/Cap’n Proto if you want something smaller.
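
For the “deprecate” half of that, GraphQL has a built-in `@deprecated` directive. A small sketch (field names invented for illustration):

```ts
import { buildSchema } from "graphql";

// Deprecating a field doesn't remove it: clients that still select `price`
// keep getting data, while introspection/tooling nudges new clients to bid/ask.
const schema = buildSchema(`
  type Quote {
    symbol: String!
    price: Float @deprecated(reason: "Use bid/ask instead")
    bid: Float
    ask: Float
  }
  type Query { quote(symbol: String!): Quote }
`);
```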

1 Like

If we go the binary approach and send each node only the data that overlaps with its queries, we would have to code a layer to manage the data parsing. Or we could keep it simple & send everything to every other node, but then we’d lose the benefit binary gave us.

With GraphQL we’d have that built in, right? (I have not yet built a GraphQL API, and only have a cursory understanding of it right now.)

What exactly do you mean you’d have built in with GraphQL? Getting a query on one node from a client and fetching the missing parts from the other nodes?

BTW to add to @benperiton, if you end up using mnesia, don’t forget about https://github.com/uwiger/unsplit.

1 Like

I’m thinking to use GraphQL for node-to-node because the overlapping node data changes fast. From a cursory look, it seems GraphQL would save a lot both in code burden & in data sent between nodes, unless I’m missing something.

GraphQL can use a schema covering multiple nodes.

In fact, GraphQL can merge multiple APIs at the same time into a bigger one.

GraphQL can use a schema covering multiple nodes.

Yeah, but so can REST as well …

GET /users can return data from many distinct nodes.

It’s more that you may not need the nodes to talk to each other, and could have a main entry point that would manage all the queries. But it’s true REST can do it as well 🙂

GraphQL adds a lot of flexibility when building the client side.

Not in a built in way. The cutting edge in GraphQL these days is what’s called “schema stitching”. The way it works is that you have Services [1, 2, …] and they each have a GraphQL schema that tackles some set of functionality. You can then create Service X that sits in front of all of these and, using introspection, automatically stitch together the schemas from Services [1, 2, …]. Then clients make a graphql request to X, and X transparently hits all the subservices to get the various bits.
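
A hedged sketch of that stitching setup using @graphql-tools (two in-process subschemas stand in for Services 1 and 2; in a real deployment the subschemas would be remote and discovered via introspection plus an HTTP executor):

```ts
import { makeExecutableSchema } from "@graphql-tools/schema";
import { stitchSchemas } from "@graphql-tools/stitch";
import { graphql } from "graphql";

// Service 1: quotes
const quotesSchema = makeExecutableSchema({
  typeDefs: `type Query { quote(symbol: String!): Float }`,
  resolvers: { Query: { quote: () => 1.1002 } },
});

// Service 2: order-book depth
const depthSchema = makeExecutableSchema({
  typeDefs: `type Query { depth(symbol: String!): Int }`,
  resolvers: { Query: { depth: () => 42 } },
});

// "Service X": one gateway schema stitched together from the subservices.
const gatewaySchema = stitchSchemas({
  subschemas: [{ schema: quotesSchema }, { schema: depthSchema }],
});

async function main() {
  // One client query against X, transparently answered by both subschemas.
  const result = await graphql({
    schema: gatewaySchema,
    source: `{ quote(symbol: "EURUSD") depth(symbol: "EURUSD") }`,
  });
  console.log(result.data); // { quote: 1.1002, depth: 42 }
}

main();
```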

Everything that GraphQL does can totally be accomplished by writing it all yourself and exposing it via REST. In the same way, everything SQL gives you can be accomplished by just maintaining some CSVs locally. It’s nice not to need to wire that on your own though…

7 Likes

GraphQL is a BFF concept (Backend for Frontend): https://www.thoughtworks.com/insights/blog/bff-soundcloud
I would use GraphQL for Client (Mobile/Web) <-> Server communication.

But I would not use GraphQL for service-to-service communication, especially within the same network as in a microservice architecture. I think there are better options, like Google’s gRPC (https://grpc.io/) or Apache Kafka.

2 Likes

Can you elaborate more on why not to use GraphQL for the node-to-node communication? (Keep in mind the data shared between nodes is only the overlapping calls, and it changes second to second.)

Just my opinion…

Ask vs. Tell

Tell, Don’t Ask
Alec Sharp, in the recent book Smalltalk by Example, points up a very valuable lesson in few words:

Procedural code gets information then makes decisions. Object-oriented code tells objects to do things.
Alec Sharp

That is, you should endeavor to tell objects what you want them to do; do not ask them questions about their state, make a decision, and then tell them what to do.

from Tell, Don’t Ask

Now I think this can be extended to distributed processing as:

You should endeavor to Tell nodes what you want them to do; do not Ask them questions about their state.

GraphQL is an inherently Ask-style protocol, and given that it was primarily designed for extracting data of interest from a business data model for the purpose of rendering it in a view, it is justified there to simply Ask for the data.

However when it comes to distributed systems it usually scales better to have the nodes communicate through Tell-style interaction patterns.

will have multiple nodes running in parallel from different locations in the world.

Geographical separation typically translates to latency between nodes. So you want to keep the number of trips between nodes “per operation” to an absolute minimum …

Some data will expire in milliseconds.

… otherwise your data may be stale before you even have a chance to aggregate it …

Nodes will be making some overlapping & some unique queries to each other.

… and this wouldn’t be helping the situation either.

If 2 or more nodes are querying the same 3rd party data …

How would this orchestration fare in the face of global-magnitude latencies?

share that data between themselves in a shared cache.

Again a shared cache would add more global trips.

It seems that with these types of constraints each node needs its own active dataset:

  • If an incoming API query lands outside of the active dataset the node queries the necessary 3rd party APIs and aggregates the result.
  • Any freshly aggregated result is communicated to the sibling nodes either via direct communication or via a distribution node.
  • Any derived data (analytics?) is distributed in a similar fashion in order to make multi-node queries unnecessary.

This is more of a Tell-style operation - i.e. nodes distributing information to other nodes that may need it - so that they don’t have to waste time Ask-ing around.
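
A minimal sketch of that Tell-style flow in plain TypeScript (`ActiveDataset`, `fetchFromThirdParty` and the sibling list are all hypothetical names; the real push would go over whatever node-to-node transport is chosen):

```ts
type Entry = { value: unknown; expiresAt: number };

// Per-node cache with per-entry TTLs (milliseconds up to minutes).
class ActiveDataset {
  private data = new Map<string, Entry>();
  get(key: string): unknown | undefined {
    const e = this.data.get(key);
    if (!e || e.expiresAt < Date.now()) return undefined;
    return e.value;
  }
  put(key: string, value: unknown, ttlMs: number): void {
    this.data.set(key, { value, expiresAt: Date.now() + ttlMs });
  }
}

const local = new ActiveDataset();
// Callbacks standing in for "push this to a sibling node".
const siblings: Array<(key: string, value: unknown, ttlMs: number) => void> = [];

async function fetchFromThirdParty(key: string): Promise<unknown> {
  return { key, fetchedAt: Date.now() }; // stand-in for the real REST/FIX call
}

// Handle an incoming API query on this node.
async function handleQuery(key: string, ttlMs: number): Promise<unknown> {
  const cached = local.get(key);
  if (cached !== undefined) return cached;       // inside the active dataset
  const fresh = await fetchFromThirdParty(key);  // outside: go to the 3rd party
  local.put(key, fresh, ttlMs);
  // Tell: push the fresh result to siblings so they never have to Ask for it.
  for (const push of siblings) push(key, fresh, ttlMs);
  return fresh;
}
```

In this shape each node only ever Asks the 3rd-party APIs; siblings are Told about fresh results, so a query landing on any node is served from that node’s own active dataset whenever possible.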

  1. In a strange way this is somewhat inspired by the Datomic architecture where each client program has an active data set (cache) that is kept in sync by the storage server with the latest writes.

  2. Querying across multiple nodes could be viewed as a variation on the Reach-in Reporting Antipattern (Page 19: Chapter 4 - Microservices AntiPatterns and Pitfalls).

9 Likes

///wisdom///

1 Like

Hmm, GraphQL is a query language; nothing is preventing you from doing queries over NATS, Kafka, or whatever else you might find useful for a given use case.
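
For example, here is a rough sketch of serving GraphQL over NATS request/reply instead of HTTP (nats.js; the subject name and schema are made up, and error handling is omitted):

```ts
import { connect, StringCodec } from "nats";
import { buildSchema, graphql } from "graphql";

const schema = buildSchema(`type Query { quote(symbol: String!): Float }`);
const rootValue = { quote: ({ symbol }: { symbol: string }) => 1.1002 };
const sc = StringCodec();

// One node answers GraphQL queries arriving on a NATS subject...
async function serve() {
  const nc = await connect({ servers: "nats://localhost:4222" });
  for await (const msg of nc.subscribe("graphql.query")) {
    const result = await graphql({ schema, source: sc.decode(msg.data), rootValue });
    msg.respond(sc.encode(JSON.stringify(result)));
  }
}

// ...and another node issues them as request/reply messages.
async function ask() {
  const nc = await connect({ servers: "nats://localhost:4222" });
  const reply = await nc.request("graphql.query", sc.encode(`{ quote(symbol: "EURUSD") }`));
  console.log(sc.decode(reply.data)); // {"data":{"quote":1.1002}}
}
```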

2 Likes

You can use subscription-based GraphQL between nodes.
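
A minimal sketch of what that could look like in graphql-js terms (the quote feed and field names are invented; in practice the events would come from a node’s own data refreshes and travel over a real transport such as WebSockets or a message bus):

```ts
import { parse, subscribe, ExecutionResult } from "graphql";
import { makeExecutableSchema } from "@graphql-tools/schema";

const typeDefs = `
  type Quote { symbol: String!, bid: Float!, ask: Float! }
  type Query { _placeholder: Boolean }
  type Subscription { quoteUpdated(symbol: String!): Quote! }
`;

// Stand-in for "this node just refreshed the quote" events.
async function* fakeQuoteFeed(symbol: string) {
  for (let i = 0; i < 3; i++) {
    yield { quoteUpdated: { symbol, bid: 1.1 + i * 0.0001, ask: 1.2 + i * 0.0001 } };
  }
}

const schema = makeExecutableSchema({
  typeDefs,
  resolvers: {
    Subscription: {
      quoteUpdated: {
        subscribe: (_src: unknown, args: { symbol: string }) => fakeQuoteFeed(args.symbol),
      },
    },
  },
});

async function main() {
  const stream = await subscribe({
    schema,
    document: parse(`subscription { quoteUpdated(symbol: "EURUSD") { bid ask } }`),
  });

  // `subscribe` returns an async iterator of results, one per pushed event.
  for await (const result of stream as AsyncIterable<ExecutionResult>) {
    console.log(result.data);
  }
}

main();
```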