In short: Mongo or NoSQL databases are something you should seek out only when you have a problem that you can’t address with a SQL database.
I’ve been following Mongo since the very beginning. One of their biggest early users was Server Density, who for years ran the largest Mongo deployment in existence. They blogged about it a lot.
Mongo, like most NoSQL solutions, is really a solution to niche use cases. For Mongo those use cases are:
- Huge DB write bursts
- Unstructured data
- Data sharding
Mongo jumped on the scene early because it won a lot of benchmarks. It won them because it allowed async DB writes, which were buffered in RAM rather than having to be written to disk first. When the RAM buffer fills up, Mongo slows down dramatically, so if the write volume is closer to a constant level than to short bursts, Mongo will struggle. Winning benchmarks gets a lot of developer eyes though.
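To make the burst versus constant-load point concrete, here is a toy simulation. It is not a model of Mongo's actual storage engine: the buffer capacity, the drain rate and the `simulate` function are all made-up numbers and names, picked only to show why a RAM buffer absorbs bursts but eventually stalls under sustained load that exceeds what the disk can take.

```python
def simulate(arrivals, capacity, drain_per_tick):
    """Count the ticks in which some incoming writes could not be buffered.

    arrivals       -- writes arriving on each tick (hypothetical workload)
    capacity       -- how many writes fit in the RAM buffer (assumed)
    drain_per_tick -- how many buffered writes reach disk per tick (assumed)
    """
    buffered = 0
    stalled_ticks = 0
    for incoming in arrivals:
        # The disk drains part of the buffer at the start of each tick.
        buffered = max(0, buffered - drain_per_tick)
        # Accept as many new writes as there is free space for.
        accepted = min(incoming, capacity - buffered)
        buffered += accepted
        if accepted < incoming:
            # The buffer is full, so some writers have to wait this tick.
            stalled_ticks += 1
    return stalled_ticks


# Bursty load: 500 writes every 10th tick, quiet in between (50 per tick on average).
bursty = [500 if tick % 10 == 0 else 0 for tick in range(100)]
# Constant load: 150 writes on every tick, more than the disk drains per tick.
constant = [150] * 100

bursty_stalls = simulate(bursty, capacity=1000, drain_per_tick=100)
constant_stalls = simulate(constant, capacity=1000, drain_per_tick=100)
print("bursty stalls:", bursty_stalls)
print("constant stalls:", constant_stalls)
```

With these made-up numbers the bursty workload never fills the buffer, while the constant one fills it after a short while and then stalls on every tick after that.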
Unstructured JSON data is good in certain scenarios, but it complicates functionality like indexing and it takes significant extra storage space because the keys are repeated in every document. Most situations where it is used aren’t well justified and would be better served by defining your consistent fields as columns and storing the unstructured bits in a separate field, since most SQL databases now offer some variation of a JSON column.
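As a rough sketch of that hybrid layout, the example below uses Python's built-in `sqlite3` module. The table name `agent_reports`, its columns and the sample payloads are invented for illustration. To keep the sketch portable, the unstructured part is stored as JSON text and parsed in Python; a real deployment would more likely use the database's native JSON column type.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Consistent fields get real columns; only the free-form part goes into `extra`.
conn.execute(
    """
    CREATE TABLE agent_reports (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        reported_at TEXT    NOT NULL,
        extra       TEXT    NOT NULL  -- JSON-encoded, shape varies per agent
    )
    """
)
# Ordinary index on an ordinary column, with no special JSON indexing needed.
conn.execute("CREATE INDEX idx_reports_customer ON agent_reports (customer_id)")


def save_report(customer_id, reported_at, extra):
    """Store one report; `extra` is any JSON-serializable dict."""
    conn.execute(
        "INSERT INTO agent_reports (customer_id, reported_at, extra) VALUES (?, ?, ?)",
        (customer_id, reported_at, json.dumps(extra)),
    )


def reports_for(customer_id):
    """Return (reported_at, extra) pairs for one customer, oldest first."""
    rows = conn.execute(
        "SELECT reported_at, extra FROM agent_reports "
        "WHERE customer_id = ? ORDER BY reported_at",
        (customer_id,),
    ).fetchall()
    return [(reported_at, json.loads(extra)) for reported_at, extra in rows]


save_report(1, "2024-01-01T00:00:00", {"nginx": {"requests": 120}})
save_report(1, "2024-01-01T00:01:00", {"redis": {"keys": 5000}, "load": 0.7})
save_report(2, "2024-01-01T00:00:00", {"postgres": {"connections": 12}})

for reported_at, extra in reports_for(1):
    print(reported_at, extra)
```

The structured columns can be filtered, indexed and joined like any other SQL data, and only the part that varies per record is left schemaless.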
Data sharding remains the real selling point for NoSQL databases because the forced lack of joins simplifies spreading data across multiple servers.
Today, if you still care about benchmarks and you need everything that Mongo offers…you’re better off using Couchbase…but very, very, very few people actually need either of these.
Server Density was the actual use case that legitimately made sense for Mongo. They ran agents on your machine that would track data from every random piece of software that you were running and then feed it back to them every minute. The data getting sent back could be different for EVERY single agent that was running and they were getting bursts every minute.
Usually, the need for those types of features applies to only a small subset of your overall data structure, and even Server Density acknowledged that: they only used Mongo for their collection system and used a SQL DB for everything else.
Sharding is also an abstract problem that is totally dependent on the type of data that you have. If you’re running a huge system that lumps data together in a big bucket, using something like Mongo might make sense. For a lot of SaaS systems, where the data is really associated with a customer who’s signed up, a solution like Citus will let you shard straight out of SQL to scale your database horizontally…without losing all of the capabilities of SQL.
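The per-customer idea can be sketched in a few lines. This is only an illustration of routing by a customer key, not how Citus or any other product implements it; the shard names, the `shard_for` and `route` helpers and the sample rows are all hypothetical.

```python
import hashlib

# Hypothetical shard names; in practice these would be separate database servers.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(customer_id):
    """Map a customer ID to a shard, stably across processes and restarts."""
    # A cryptographic hash is used because Python's built-in hash() can be
    # randomized per process, which would break stable routing.
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def route(rows):
    """Group rows by the shard that owns their customer."""
    placement = {}
    for row in rows:
        placement.setdefault(shard_for(row["customer_id"]), []).append(row)
    return placement


rows = [
    {"customer_id": 7, "table": "invoices", "amount": 120},
    {"customer_id": 7, "table": "users", "name": "alice"},
    {"customer_id": 8, "table": "invoices", "amount": 45},
]

placement = route(rows)
for shard, shard_rows in sorted(placement.items()):
    print(shard, shard_rows)
```

Because every row carrying the same customer ID lands on the same shard, joins between one customer's tables never have to cross servers, which is what makes this kind of data comparatively easy to shard while staying in SQL.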
What you’re left with is the in-memory focus of the database for speed’s sake…but that entirely depends on what you’re using it for, which could be better served by a number of other options.
Personally, I’d classify all NoSQL databases, including Mongo, as something that you only reach for when you hit a very specific limitation of a SQL database…and when you reach that limitation, the solution you reach for is going to be based on the specific problem for the specific data, not the specific solution or all of the data. You could just as easily end up with Cassandra, Hadoop, CouchDB, Couchbase, Citus, some type of columnar data store or even something like GenStage as with Mongo.