I’ve had a couple of times in the past where we went full-out TDD and YAGNI, and ended up with no database. Just some code dumping data out to disk in whatever the native format was (in this world, :erlang.term_to_binary is your friend). As someone remarked, a database is especially useful if you need to coordinate multiple instances of your app, so in modern systems it often serves more as a distribution tool than a persistence tool, if you want that categorization. To that end, it is often the simplest (most readily available, best documented) tool for the job.
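For anyone curious what "dumping data out to disk in the native format" can look like, here is a minimal sketch (the module name `TermStore` is mine, not from any real project): any Erlang term goes through `:erlang.term_to_binary/1` on the way out and `:erlang.binary_to_term/1` on the way back.

```elixir
defmodule TermStore do
  # Serialize any term with the BEAM's native format and write it to disk.
  def save(path, term) do
    File.write(path, :erlang.term_to_binary(term))
  end

  # Read the file back and decode it into the original term.
  def load(path) do
    with {:ok, binary} <- File.read(path) do
      {:ok, :erlang.binary_to_term(binary)}
    end
  end
end
```

Usage is just `TermStore.save("users.bin", state)` and `{:ok, state} = TermStore.load("users.bin")` — no schema, no migrations, and the round trip preserves maps, tuples, atoms, whatever you hand it.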
Not to forget the things DBs do beyond pure persistence. SQL might very well be easier and quicker for building reports than querying all that data out of various GenServers. Also, rolling your own data indexing might not be your thing.
But don’t underestimate the raw search power of modern CPUs. In 2006 I built a real estate site which kept the whole dataset in memory; a search was basically just a traversal through the ~125k properties listed, including doing stupid stuff like substring matching as a proxy for text search. Queries came back in well under 100ms, the customer was happy, and we never optimized it or considered adding a “proper” search engine.
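The “stupid” approach above is roughly this shape (a hypothetical sketch, assuming each listing is a map with a `:description` field): a plain linear scan with case-insensitive substring matching, no index at all.

```elixir
defmodule PropertySearch do
  # Naive in-memory search: walk the whole list and keep every listing
  # whose description contains the query, case-insensitively.
  def search(properties, query) do
    q = String.downcase(query)

    Enum.filter(properties, fn %{description: desc} ->
      String.contains?(String.downcase(desc), q)
    end)
  end
end
```

On the order of 10^5 small records, a scan like this is a few milliseconds of CPU time, which is why the site never needed anything fancier.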
Yeah, modern DBs have a lot of features that often come in handy, I guess.
I’ve gone full circle in the one day this post has been up. I’m back to thinking that user authentication, search capabilities, and so on make it easier for me to just stick to the beaten path for now and use Postgres.
I think another main reason people often use a database is that, in an interpreted language, code that runs on the database (selection, filtering, ordering, reducing) is significantly faster than doing the same work in the interpreted language, since relational databases are written in a compiled language.
However, when working in a language that is already compiled, this speed difference is not nearly as significant. What @LostKobrakai says is true: databases have been fine-tuned to do what they are best at, and they will probably be faster. But there are two other factors not to forget:
Developer efficiency: is the added complexity, and the extra moving parts it brings to my app, worth the setup and maintenance it will take?
Communication: because the database lives in a separate OS process, you’re forced to use OS pipes or sockets to communicate. This is obviously a lot slower than in-process memory access (again, talking about an OS process here). For many queries, especially larger ones that return a lot of results, the overhead of serialization + sending + receiving + deserialization might be significant.
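You can get a rough feel for that encode/decode cost without a database at all. The snippet below (an illustration, not a benchmark: it uses `term_to_binary`/`binary_to_term` as a stand-in for the wire encoding any out-of-process protocol must do) times touching a result set in-process versus round-tripping it through serialization first.

```elixir
# Build a result set of 100k small "rows".
rows = for i <- 1..100_000, do: %{id: i, name: "row #{i}"}

# In-process: the data is already a live term; just walk it.
{in_process_us, count} = :timer.tc(fn -> Enum.count(rows) end)

# Out-of-process stand-in: serialize, deserialize, then walk it.
{roundtrip_us, _} =
  :timer.tc(fn ->
    rows |> :erlang.term_to_binary() |> :erlang.binary_to_term() |> Enum.count()
  end)

IO.puts("rows: #{count}, in-process: #{in_process_us}µs, round-trip: #{roundtrip_us}µs")
```

The real gap is larger still, since a socket hop adds copying and context switches on top of the encode/decode step shown here.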
As for the Twitter example: I do not think every tweet should be kept in random access memory indefinitely: At some point they could and should be archived, resulting in a more scalable system. Furthermore, I think that a tweet definitely should not be a process, as I believe it is not a machine, but only a mostly static piece of data. We can filter/sort/order pieces of data, but we cannot filter/sort/order processes directly in a meaningful way.
I see a tweet as a letter: a piece of paper with information written on it. It is not a creature, a clock, or any other automaton, in the original Greek etymological sense of the word ‘automaton’: something that acts of its own will.
Something to keep in mind about that Joe quote: he probably means an “actor” in the Actor Model sense far more than an “Erlang process”. From the Actor Model’s abstract point of view, making everything an actor can totally make sense, because message passing is your main way to deal with memory access.
In a real-life BEAM application, we may want to collapse several of those actors into a single process for “optimisation” and “implementation” purposes.
All of these things can be represented as processes and communicate via messaging. A batch of a given recipe would have a brew date and so forth.
Be very careful here. If I’m reading your remarks correctly, it sounds like you are trying to model entities as processes (a practice with a “bad smell” common to Object Oriented programmers, myself included, who move into the functional world).
I wonder, perhaps, if you are misinterpreting something that I perceive as a main theme of Phoenix 1.3.
I gather that in Ruby on Rails, there is a tendency for the persistence mechanism (ActiveRecord) to pervade an application in a way that entangles the logic of the domain model with the persistence code. As a result, I have been led to believe that it is common, when writing a Rails app, to begin solving any problem by figuring out how to squeeze that problem into the database (as with most things, probably true in some cases and untrue in many).
I gather from the 1.3 presentation that, for various reasons (one of which was the code that Phoenix generated), some Phoenix 1.2 apps started to show this unhealthy trait of persistence driving the domain model. As I understand it, one of the main aims of Phoenix 1.3 is to help developers keep a healthy separation between their domain-modeling code and the code used to persist elements of that model.
In that case the warning is “Don’t start solving your problem by designing your database”. Start by creating code that models your domain and solves the problem. Then, if you need to persist data, find a way to do that. A database might be one good choice among many.
Lance and I are working with some great reviewers to make sure we address this feedback. We’re going to shift the approach for the two chapters in question, the chapter on modeling and the chapter on wrapping the result in OTP.
Since this is a popular book in the community, we want to make sure that the guidance we give is sound.
This sounds fantastic! Now we just have to wait for the changes.
I think it’s good to have some perspective: outside of some fairly exotic scenarios, you really do want a database. You are very unlikely to be able to build a performant alternative at reasonable cost outside of those niche cases.