Do you want/need a database? Do you want to serve a small number
of users or make it scalable to billions of users? Do you want it
fault-tolerant? Do you want to serve static or dynamic pages?
Do you want to authenticate the users? What are the protocols?
Do you want predictable latency? etc etc
You need to think about questions like these before making the application.
The design of a scalable system for billions of users is completely different
to the design of one for a small number of users.
The word “fast” also doesn’t mean much - some people interpret this
as milliseconds, others as seconds. The tax authorities are “fast” if they reply within a couple of months.
You need to think about the required latency (for example, 95% of all requests should be satisfied within 200 ms), the number of requests per second, and the number of users.
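To make a requirement like "95% of all requests within 200 ms" concrete, here is a minimal sketch of how you might check it against measured latencies. It is Python rather than Erlang, the sample numbers are invented, and the function names and the nearest-rank percentile method are my assumptions, not something prescribed above.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Returns the smallest sample such that at least pct% of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_latency_target(latencies_ms, pct=95, limit_ms=200):
    """True if pct% of the requests completed within limit_ms."""
    return percentile(latencies_ms, pct) <= limit_ms


# Hypothetical measurements in milliseconds (made up for illustration).
latencies_ms = [12, 35, 48, 51, 60, 75, 90, 110, 150, 180,
                20, 22, 31, 44, 58, 66, 70, 95, 130, 900]

p95 = percentile(latencies_ms, 95)
ok = meets_latency_target(latencies_ms)
```

Note that the target is met here even though one request took 900 ms. That is the point of stating the requirement as a percentile: it says how many slow requests you are prepared to tolerate, which a bare "fast" does not.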
Fast is relative to the task being performed. Both of those things can be optimized, and you will get a different benefit from each, but in most apps querying the database and loading data is the heaviest part of the request time. Here is a Jeff Atwood tweet that sums up the majority of application performance optimization.
SQL is usually hundreds to thousands of times slower than the server-side code. I always start performance tuning with SQL. Every application, regardless of the language, has issues in this area, and it’s where you will see 90% of your performance gains in most applications. There are exceptions to the rule, though.
You have to remember that any SQL database will be a bottleneck, since there is an impedance mismatch between the way data is represented in a database and the way it is represented in the BEAM VM.
Databases are basically rectangular tables of cells, where the cells contain very simple types like strings and integers - every time you access a row of an external database, this list of cells has to be converted to BEAM internal data structures - this conversion is extremely expensive.
The best way to persist data is in a process - then no conversion is needed, but this is not fault tolerant - so you need to keep a trail of updates to the data and store this on disk.
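As a rough sketch of the "keep a trail of updates" idea: the state lives in memory, every update is appended to a log file before it is applied, and on restart the log is replayed to rebuild the state. This is Python standing in for an Erlang process, and the `LoggedStore` class, the JSON-lines log format and the file name are all assumptions made for illustration.

```python
import json
import os
import tempfile


class LoggedStore:
    """In-memory dict whose updates are appended to a log file first."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        self._replay()  # rebuild the state from any earlier run
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    key, value = json.loads(line)
                    self.state[key] = value

    def put(self, key, value):
        # Write the update to disk before changing the in-memory state.
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.state[key] = value

    def get(self, key, default=None):
        # Reads never touch the disk.
        return self.state.get(key, default)

    def close(self):
        self._log.close()


path = os.path.join(tempfile.mkdtemp(), "updates.log")
store = LoggedStore(path)
store.put("alice", {"visits": 1})
store.put("alice", {"visits": 2})
store.close()

recovered = LoggedStore(path)  # simulates a restart after a crash
```

The sketch leaves out things a real system would need, such as compacting the log and coping with a half-written last line after a crash.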
Often you don’t need a database. For example, you might have a system where you store all the user data in the file system with “one file per user”. This will scale very nicely - just move the files to a new machine if you need more capacity.
Erlang has two primitives, term_to_binary and its inverse binary_to_term, that serialise any term and reconstruct it - so storing complex terms on disk is really easy.
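Here is a minimal sketch of "one file per user", with Python's `pickle.dumps` and `pickle.loads` playing roughly the role that term_to_binary and binary_to_term play in Erlang. The helper names, the `.bin` suffix and the directory layout are hypothetical, and the user id is assumed to be safe to use as a file name.

```python
import os
import pickle
import tempfile


def user_path(root, user_id):
    """Path of the single file holding everything known about one user."""
    return os.path.join(root, f"{user_id}.bin")


def save_user(root, user_id, data):
    """Serialise the whole user record and write it to the user's file."""
    final = user_path(root, user_id)
    tmp = final + ".tmp"
    with open(tmp, "wb") as f:
        f.write(pickle.dumps(data))
    # Rename over the old file so readers never see a half-written record.
    os.replace(tmp, final)


def load_user(root, user_id):
    """Read the user's file and reconstruct the original structure."""
    with open(user_path(root, user_id), "rb") as f:
        return pickle.loads(f.read())


root = tempfile.mkdtemp()
profile = {"name": "alice", "tags": ("admin", "beta"), "scores": [1, 2.5]}
save_user(root, "alice", profile)
loaded = load_user(root, "alice")
```

The nested structure comes back exactly as it went in, with no mapping to rows and columns along the way. As with any such serialisation format, only load files your own system wrote.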
I have mixed feelings about databases: they are great for aggregate operations (for example, find all users that have these attributes) but terrible for operations on individual users (where a single file per user is far better).
If I were designing a new system I’d go for ‘one file per user’ as much as possible and try to limit databases to operations over all users.
If you look at how many programs are designed you’ll see they follow this principle. Apple stores all images in the file system (hidden a bit) and has a database with metadata about the files. This is good since the database is small
and many operations can be performed with minimal use of the database. What they do not do is put all the data in a database - there are good reasons for this.
The problem with files is that this approach is inherently not highly available, since the file has to live on some machine. And sure, you can go into distributed systems, Riak Core and stuff, but it’s a lot of mental overhead - a database is just so much simpler (as long as your data needs don’t exceed a single database node; once they do, you have no choice).
This is all very fascinating to me! I’ve always thought that using the file system was bad for this kind of thing, but I’m finding a lot of things I learned for another language simply go out of the window when it comes to BEAM languages
Joe, I’m not sure how much Elixir you are doing, but I’d LOVE to see a course from you like the one from PragDave. Better still, I’d love for you and @rvirding to do it together (where’s Mike these days? I reckon a reunion is called for).