Why/When Not To Use GraphQL

peerreynders · February 5, 2018, 2:12am

Just my opinion…

Ask vs. Tell

Tell, Don’t Ask
Alec Sharp, in the recent book Smalltalk by Example, points up a very valuable lesson in few words:

Procedural code gets information then makes decisions. Object-oriented code tells objects to do things.
— Alec Sharp

That is, you should endeavor to tell objects what you want them to do; do not ask them questions about their state, make a decision, and then tell them what to do.

from Tell, Don’t Ask
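
To make the distinction concrete, here is a minimal Python sketch of the same operation written both ways. The `Account` class and both helper functions are hypothetical names made up for illustration; they are not taken from the quoted article.

```python
class Account:
    """Hypothetical object used only to contrast the two styles."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> bool:
        # Tell-style: the object owns the decision about its own state.
        if amount > self.balance:
            return False
        self.balance -= amount
        return True


def ask_style_withdraw(account: Account, amount: int) -> bool:
    # Ask-style: the caller asks for state, decides, then mutates the object.
    if account.balance >= amount:
        account.balance -= amount
        return True
    return False


def tell_style_withdraw(account: Account, amount: int) -> bool:
    # Tell-style: the caller only states what it wants done.
    return account.withdraw(amount)
```

Both versions behave the same here, but in the Ask version the rule about overdrafts lives in every caller instead of in the object that owns the balance.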

Now I think this can be extended to distributed processing as:

You should endeavor to Tell nodes what you want them to do; do not Ask them questions about their state.

GraphQL is an inherently Ask-style protocol. Given that it was primarily designed for extracting data-of-interest from a business data model for the purpose of rendering it on a view, it is justified to simply Ask for the data in that setting.

However, when it comes to distributed systems, it usually scales better to have the nodes communicate through Tell-style interaction patterns.

will have multiple nodes running in parallel from different locations in the world.

Geographical separation typically translates to latency between nodes. So you want to keep the number of trips between nodes “per operation” to an absolute minimum …

Some data will expire in milliseconds.

… otherwise your data may be stale before you even have a chance to aggregate it …

Nodes will be making some overlapping & some unique queries to each other.

… and this wouldn’t be helping the situation either.

If 2 or more nodes are querying the same 3rd party data …

How would this orchestration fare in the face of global-magnitude latencies?

share that data between themselves in a shared cache.

Again a shared cache would add more global trips.

It seems that with these types of constraints each node needs its own active dataset:

If an incoming API query lands outside of the active dataset the node queries the necessary 3rd party APIs and aggregates the result.
Any freshly aggregated result is communicated to the sibling nodes, either via direct communication or via a distribution node.
Any derived data (analytics?) is distributed in a similar fashion in order to make multi-node queries unnecessary.
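
The three points above can be sketched roughly as follows. This is only an in-process illustration under loud assumptions: `Node`, `fetch`, `query` and `receive` are hypothetical names, the siblings are plain Python objects rather than remote machines, the push is a synchronous method call standing in for whatever messaging the real system would use, and data expiry is ignored entirely.

```python
from typing import Callable, Dict, List


class Node:
    """Hypothetical node that keeps its own active dataset."""

    def __init__(self, name: str, fetch: Callable[[str], dict]) -> None:
        self.name = name
        self.fetch = fetch                  # stand-in for the 3rd-party query + aggregation
        self.active: Dict[str, dict] = {}   # this node's own active dataset
        self.siblings: List["Node"] = []
        self.fetch_count = 0                # how often this node had to go to the 3rd party

    def query(self, key: str) -> dict:
        # Incoming API query: answer locally when the key is in the active dataset.
        if key not in self.active:
            self.fetch_count += 1
            result = self.fetch(key)
            self.active[key] = result
            # Tell the siblings about the fresh result so they need not Ask later.
            for sibling in self.siblings:
                sibling.receive(key, result)
        return self.active[key]

    def receive(self, key: str, result: dict) -> None:
        # A sibling pushed a freshly aggregated result to this node.
        self.active[key] = result


def fake_third_party(key: str) -> dict:
    # Assumed placeholder for the real 3rd-party APIs.
    return {"key": key, "value": len(key)}


tokyo = Node("tokyo", fake_third_party)
berlin = Node("berlin", fake_third_party)
tokyo.siblings = [berlin]
berlin.siblings = [tokyo]

tokyo.query("weather")    # miss: tokyo fetches, then tells berlin
berlin.query("weather")   # hit: answered from berlin's own active dataset
```

In this sketch the second query never leaves `berlin`, which is the point: the only cross-node trip is the one-way push after a miss.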

This is more of a Tell-style of operation - i.e. nodes distributing information to other nodes that may need it - so that they don’t have to waste time Ask-ing around.

In a strange way this is somewhat inspired by the Datomic architecture, where each client program has an active data set (cache) that is kept in sync by the storage server with the latest writes.
Querying across multiple nodes could be viewed as a variation on the Reach-in Reporting Antipattern (Page 19: Chapter 4 - Microservices AntiPatterns and Pitfalls).