What do you think about MongoDB as a technology?

mongodb
database

#1

At first I was going to ask about MongoDB support in Phoenix/Elixir, because the internet is flooded with warnings not to use MongoDB with Phoenix. But after some research on this forum, I realized that question has already been answered: it’s mainly a lack of development resources to maintain it.

So the question I had in mind has changed. I am currently learning both Elixir and Phoenix, and I am in love with the design principles, documentation, performance, history and everything. This ecosystem has earned a lot of my trust in the foundation (company) and the development community behind these technologies, which makes me ask: what do you honestly think about MongoDB as a technology? Why is the support weak? It makes me think there is a general consensus of disbelief in the technology.

I am one of the people who were relieved to learn that the default is PostgreSQL. Not because I am an expert in it, but because I understand how a relational database works, whereas I have tried many times to understand how a collection-based database works in terms of query accuracy and had a hard time. Besides that, it is well known that Facebook tried Mongo as a means of scalability and it failed them, so that is another turn-off.

By giving your honest opinion about MongoDB and how mature/stable it is, you may actually be helping many others avoid this decision. I am curious to know your opinion, presumably based on technical experience or a technical standpoint. I think honest feedback will also help the MongoDB project improve on what it lacks, so it’s not a bad thing.


#2

You may be interested in the opinions expressed in this topic:


#3

In short: Mongo or NoSQL databases are something you should seek out only when you have a problem that you can’t address with a SQL database.

In detail:
I’ve been following Mongo since the very beginning. One of their biggest early users was Server Density, who for years had the largest Mongo deployment that existed. They blogged about it a lot.

What Mongo and most NoSQL databases really are is solutions to niche use cases. For Mongo those use cases are:

  1. Huge DB write bursts
  2. Unstructured data
  3. Data sharding

Mongo jumped on the scene early because it won a lot of benchmarks. It won them because it allowed async DB writes, which were buffered in RAM rather than having to be written to disk first. When the RAM buffer fills up, Mongo slows down dramatically, so if the write volume is closer to a constant level than to short bursts, Mongo will struggle. Winning benchmarks gets a lot of developer eyes though.
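
To make the buffer argument concrete, here is a toy model in Python. It is only a sketch and not how MongoDB's storage engine actually works: the buffer size, the flush rate and the `simulate` function are all made-up assumptions. Writes that fit in the RAM buffer are acknowledged immediately, and anything that doesn't fit is counted as stalled (not retried).

```python
def simulate(arrivals, buffer_capacity, flush_per_tick):
    """Return, per tick, how many writes did not fit in the RAM buffer."""
    buffered = 0
    stalled = []
    for incoming in arrivals:
        # The disk drains a fixed number of buffered writes each tick.
        buffered = max(0, buffered - flush_per_tick)
        # Only writes that fit in the remaining buffer space are taken at once.
        accepted = min(incoming, buffer_capacity - buffered)
        buffered += accepted
        stalled.append(incoming - accepted)
    return stalled


# Hypothetical workloads: two bursts with quiet time in between,
# versus a constant rate that is higher than the disk can drain.
bursty = [300, 0, 0, 0, 0] * 2
steady = [100] * 10

bursty_stalls = sum(simulate(bursty, buffer_capacity=300, flush_per_tick=60))
steady_stalls = sum(simulate(steady, buffer_capacity=300, flush_per_tick=60))
print(bursty_stalls, steady_stalls)
```

In this model the bursts never stall because the buffer empties between them, while the steady load starts stalling once the buffer is full.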

Unstructured JSON data is good in certain scenarios, but it complicates functionality like indexing and it adds significant extra storage space by repeating the keys for every field in every document. Most situations where it is used aren’t really well justified and can be better served by defining your consistent fields as columns and then storing the unstructured bits in a separate field, since most SQL databases now offer some variation of a JSON column.
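
A minimal sketch of that "columns plus one unstructured field" layout, using Python's built-in `sqlite3` as a stand-in for a real SQL database. The `events` table, its columns and the helper functions are hypothetical, and the JSON is stored as plain text here; with Postgres you would presumably use a JSONB column instead.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL,"      # consistent field: a real column
    " kind TEXT NOT NULL,"                # consistent field: a real column
    " extras TEXT NOT NULL DEFAULT '{}')" # unstructured bits as JSON text
)
# The consistent fields can be indexed like any other column.
conn.execute("CREATE INDEX events_by_customer ON events (customer_id)")


def insert_event(customer_id, kind, **extras):
    conn.execute(
        "INSERT INTO events (customer_id, kind, extras) VALUES (?, ?, ?)",
        (customer_id, kind, json.dumps(extras)),
    )


def events_for(customer_id):
    rows = conn.execute(
        "SELECT kind, extras FROM events WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    )
    return [(kind, json.loads(extras)) for kind, extras in rows]


insert_event(1, "login", browser="firefox")
insert_event(1, "upload", size_bytes=2048, mime="image/png")
insert_event(2, "login")
print(events_for(1))
```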

Data sharding remains the real selling point for NoSQL databases, because the forced lack of joins simplifies spreading data across multiple servers.
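
Roughly why the lack of joins helps: if every read and write touches exactly one document by key, routing is just a hash of that key. A small Python sketch, assuming three hypothetical servers and in-memory dicts standing in for them:

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical server names

# One dict per shard stands in for a separate database server.
stores = {name: {} for name in SHARDS}


def shard_for(key: str) -> str:
    """Pick a shard from a stable hash of the document key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def put(key, document):
    stores[shard_for(key)][key] = document


def get(key):
    # A single-key lookup only ever talks to one shard; no cross-server join.
    return stores[shard_for(key)].get(key)


put("user:1", {"name": "Ann"})
put("user:2", {"name": "Bob"})
print(shard_for("user:1"), get("user:1"))
```

Simple modulo routing like this reshuffles most keys when a shard is added, so treat it as an illustration of the routing idea only.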

Today, if you still care about benchmarks and you need everything that Mongo offers…you’re better off using Couchbase…but very, very, very few people actually need either of these.

Server Density was the actual use case that legitimately made sense for Mongo. They ran agents on your machine that would track data from every random piece of software that you were running and then feed it back to them every minute. The data getting sent back could be different for EVERY single agent that was running and they were getting bursts every minute.

Usually if you need those types of features it reflects a small subset of your overall data structure and even Server Density acknowledged that, only using Mongo for their collection system and using a SQL DB for everything else.

Sharding is also an abstract problem that is totally dependent on the type of data that you have. If you’re running a huge system that lumps data together in a big bucket, using something like that might make sense. For a lot of systems that are SaaS where data is really associated with a customer who’s signed up, a solution like Citus will let you shard straight out of SQL to scale your database horizontally…without losing all of the capabilities of SQL.

What you’re left with is the in-memory focus of the database for speed’s sake…but that entirely depends on what you’re using it for, which could be better served by a number of other options.

Personally, I’d classify all NoSQL databases including Mongo as something that you only reach for when you have a very specific limitation of a SQL database…and when you reach that limitation, the solution you reach for is going to be based on the specific problem for the specific data; not the specific solution or all of the data. You could just as easily end up with Cassandra, Hadoop, CouchDB, Couchbase, Citus, some type of columnar data store or even something like GenStage as you could end up with Mongo.


#4

Thanks for your detailed response. Any reason why you have not mentioned Firebase?


#5

No reason. Wasn’t trying to list everything.


#6

I think @brightball made a great summary so there’s not much to add.

What I want to add is that I just don’t really trust MongoDB. Imo one point cannot be highlighted enough: the async writes that are buffered in RAM rather than written to disk.

I can’t find the post/presentation where I first saw this highlighted at the moment, but this essentially means your change isn’t really there. It sits in RAM. It’s not written to disk or anything yet. If MongoDB crashes right then and there, or the server goes down, that change is lost although MongoDB has already acknowledged the write as “complete”, which is not what you expect from a database. To the best of my knowledge this was the same even if you distributed MongoDB, i.e. the write was acknowledged even if the change had been written to none of the replicas (not even the primary), which doesn’t give you a lot of durability.

I’m somewhat sure things have changed or there’s at least an option, or maybe there always was an option, but to the best of my knowledge this was the default, which is terrible. Looks great in benchmarks, not so great in real life when you lose data. The one thing I don’t want my database to do.
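
A toy illustration in Python of what "acknowledged but not durable" means. This is not MongoDB code; `ToyStore` and its `ack_before_flush` flag are invented for the sketch, and the two lists simply stand in for RAM and disk.

```python
class ToyStore:
    def __init__(self, ack_before_flush):
        self.ack_before_flush = ack_before_flush
        self.ram = []   # pending writes, lost on a crash
        self.disk = []  # durable writes, survive a crash

    def write(self, value):
        self.ram.append(value)
        if not self.ack_before_flush:
            self.flush()  # make it durable before answering
        return "ok"       # the caller sees "ok" either way

    def flush(self):
        self.disk.extend(self.ram)
        self.ram.clear()

    def crash(self):
        self.ram.clear()  # whatever only lived in RAM is gone


fast = ToyStore(ack_before_flush=True)
safe = ToyStore(ack_before_flush=False)
for store in (fast, safe):
    store.write("order-1")
    store.write("order-2")
    store.crash()

print(fast.disk, safe.disk)
```

Both stores answered "ok" to both writes, but only the one that flushed before acknowledging still has them after the crash.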

There’s also a rather popular and good blog post by Sarah Mei, “Why You Should Never Use MongoDB”, that goes into other details why it might be a bad idea. Specifically, because of the nested nature of the documents (and no joins), modelling data effectively gets hard and you can end up with lots of duplicates, which is the opposite of fun (especially for updating).

Another problem you run into here is if you want to display data that you have embedded in documents based on some filter that isn’t on the primary document - you basically have to scan through all documents.
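
To show what I mean, a sketch with plain Python dicts standing in for documents. The `posts` data and `comments_by` are made up, and the model ignores any index a real database might offer on embedded fields.

```python
# Comments are embedded inside their post documents.
posts = [
    {"_id": 1, "title": "Intro", "comments": [
        {"author": "ann", "text": "Nice"},
        {"author": "bob", "text": "+1"},
    ]},
    {"_id": 2, "title": "Sharding", "comments": []},
    {"_id": 3, "title": "JSONB", "comments": [
        {"author": "ann", "text": "Thanks"},
    ]},
]


def comments_by(author):
    """Find one author's comments; returns (matches, documents scanned)."""
    found = []
    scanned = 0
    for post in posts:  # every top-level document has to be visited
        scanned += 1
        for comment in post["comments"]:
            if comment["author"] == author:
                found.append((post["_id"], comment["text"]))
    return found, scanned


print(comments_by("ann"))
```

With a separate comments table you would filter on the author column directly instead of walking every post.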

As for unstructured data, at first it sounds cool but then remember that your code has to deal with that data structure. I don’t wanna have millions of ifs checking whether a field is there or not and having my application code deal with all of this.
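
A small Python sketch of the kind of code I mean. The field names and `display_name` are hypothetical; the point is only that each optional shape of the document needs its own branch.

```python
def display_name(doc):
    """Work out a display name from a document with no guaranteed fields."""
    if "name" in doc and isinstance(doc["name"], str):
        return doc["name"]
    if "first_name" in doc or "last_name" in doc:
        parts = [doc.get("first_name", ""), doc.get("last_name", "")]
        return " ".join(p for p in parts if p)
    if "email" in doc:
        return doc["email"].split("@")[0]
    return "unknown"


print(display_name({"first_name": "Bob", "last_name": "Ray"}))
```

With a fixed schema most of these branches would be a NOT NULL column instead.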

If you need some unstructured data - Ecto’s support for Postgres’ JSONB is excellent.


#7

This somehow reminds me of the Optimistic UI used by frameworks like Meteor. The user receives feedback that their actions are done, but in reality they are just cached in a minimongo cache on the client side until the server responds with confirmation.

Thanks for your feedback Prag!


#8

For UI I think optimistic updates are ok, because most of the time things work/go well. It’s an entirely different thing when the thing I trust my entire data to does it to me. Maybe I’m just too used to working with ACID compliant SQL databases though :slight_smile:


#9

Well, the movie Battleship is a good example. When the aliens invaded us, they threw EMP bombs and destroyed all military electronics. What worked? The ancient World War 2 battleships that were analog, driven by 70 year old veterans :smiley: Sometimes old school just rocks!


#10

I think this is more a failing of Meteor’s implementation than the idea of Optimistic UI: Don’t block the user, get out of the way.

  • Visually keep the distinction between a pending task and a fulfilled one subtle as it will likely succeed (and don’t block the next action)
  • However a compensating action needs to be viable and available in case the task does fail.
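
A minimal sketch of those two points in Python, not tied to Meteor or any real framework. `OptimisticList` and the `send` callback are invented for illustration; `send` is assumed to return True when the server accepts the change.

```python
class OptimisticList:
    def __init__(self, send):
        self.send = send      # hypothetical server call, True on success
        self.items = []       # what the user sees right away
        self.pending = set()  # shown, but not yet confirmed by the server

    def add(self, item):
        # Show the item immediately and don't block the next action.
        self.items.append(item)
        self.pending.add(item)

    def confirm(self, item):
        ok = self.send(item)
        self.pending.discard(item)
        if not ok:
            # Compensating action: undo the optimistic change.
            self.items.remove(item)
        return ok


ui = OptimisticList(send=lambda item: item != "bad")
ui.add("good")
ui.add("bad")
shown_before = list(ui.items)
ui.confirm("good")
ui.confirm("bad")
print(shown_before, ui.items)
```

A real UI would also tell the user that the rolled-back action failed; the sketch only covers the state handling.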

#11

I think that MongoDB is a good tool if you want to create a quick proof of concept, just like e.g. Ruby is a general-purpose programming language that is great for proofs of concept.

But I’d like neither for creating resilient applications that should be maintained, extended and potentially scaled over a prolonged time-span. Why? Exactly because of the performance, resilience and features they give up because of their dynamic, implicit nature.


#12

I personally think MongoDB is great–have been running it in production for 5 years serving 1,000,000+ users daily. Schemaless DB is a joy to work with, no need to fuss with migrations. The JSON document structure (maps, arrays, etc.) means your data in your DB matches fairly closely to how it’s represented in your application. You can store arrays of foreign keys in a document; no need to ever use join tables. I am aware Postgres has JSONB but it’s just not treated like a first-class citizen, especially in ORM libraries.

…speaking of which, the MongoDB ORM for Ruby (Mongoid) is the best ORM library I’ve ever worked with, it fits Mongo like a glove.

Re: concerns like writes being buffered in RAM, in most real-world web applications, if your database crashes, you have bigger problems to worry about than dropping a few writes that were ack’ed by the DB (considering that your DB will now no longer be receiving new writes.) In cases where the DB ack matters a lot, such as financial transaction systems, I would not recommend MongoDB.