Elixir ecosystem is a dream come true and future proof, except...for...,

PJextra · April 29, 2020, 10:29pm

…databases.
Pipe operator, pattern matching, Supervisors, Flow, Phoenix channels, Live view, Releases,…
But SQL feels so “deja vu”, past, old. Ok, for existing developers it’s present and future but for new web developers it’s sooo 60’s. I mean, human readable queries are the future, out-of-the-box horizontal scaling a basic need.
Graph and document databases are what a new developer looks for, not SQL.
Sure, I like the way Ecto handles that. Makes everything so simple. But it seems I’m just fooling myself. I don’t want to become an expert in something of the past.
So, this is my way of saying that everything in the Elixir ecosystem is years ahead of our time, except when it comes to databases.
I wish, now that all the other pieces are there, Elixir’s team could apply the same magic into the future way of persisting data. And analyse them. And make data easy to handle, IoT way. Human way.
What do you think?

PS: I really, really appreciate how Elixir makes software development so crystal clear and future proof. I learned Ecto and PostgreSQL but I feel that’s my Achilles heel.

lucaong · April 29, 2020, 10:39pm

I personally disagree with this. Graph databases, document databases, etc. definitely have a place in some applications, but relational databases are a powerful general purpose solution backed by a well developed theory (relational algebra) and lots of development.

Modern relational databases like Postgres even offer document based storage, full-text search, and a lot more for cases when something like that is needed.

I am interested in your argument though. What is making relational databases a thing of the past in your view? What do you mean when you say that SQL feels old? I assume there is something deeper in your argument than just SQL not being trendy

benwilson512 · April 29, 2020, 10:42pm

Ehhhh sorta. They were new and cool in the late 00’s through say 2014. Since then there has been a major resurgence in the use of RDBMSs, both traditional and new (think Spanner). Postgres in particular has developed substantially in ways that improve performance and open up new, quite impactful ways of using it. We’re using Postgres for both time-series data and as an event store for event sourcing to great effect, for example.

The key here is persisting data. Persisting data reliably is flat out hard, and doing so in a way that improves on existing solutions is harder still. Elixir’s characteristics aren’t silver bullets to distributed computing problems either.

A lot of your argument here seems to be oriented around what feels new, what feels old, and what feels cool. There is a lot of fascinating stuff happening in the database world, to be sure. I highly recommend Jepsen, for example, if you want some additional ways of evaluating database technologies beyond whether or not they provide a SQL interface.

PJextra · April 29, 2020, 10:51pm

First, let me tell you I’m a true admirer of your skills and libraries.
I know SQL and relational databases are important. I made the effort to learn about them first and I personally like PostgreSQL. I have to say that I like it even more thanks to its ability to store “document-like” data, and I like Ecto embedded schemas.
But the future is about IoT and graph relationships N levels deep, not “joins” and pre-defined schemas.
And SQL is powerful, but every time I look at a Mongo query or a RethinkDB one or a…I fall in love. Because I like them. I don’t like SQL. I have to live with it.
But apart from personal preferences, Elixir is all about distributed systems and built-in horizontal scaling. Would you bet on relational DBs for that?

ityonemo · April 29, 2020, 10:52pm

Elixir’s team could apply the same magic into the future way of persisting data.

boy do I wish that were true, too, but state is hard, and the elixir team has so much on its plate as it is… I wouldn’t wish that on them. Now if only I had a bunch of cash I could just throw at a skunk team of a couple of smart academics and a couple of smart programmers to work on interesting things…

PJextra · April 29, 2020, 10:59pm

Again, I would just be repeating my previous answer.
You are an inspiring member of this ecosystem, with Absinthe and the way you love simple and efficient code.
I’m a new software developer who didn’t learn SQL 15 years ago, and now I look at GraphQL and fall in love with it, so why REST? But that’s, honestly, a minor preference and efficiency issue compared with database query languages, distribution and scaling capabilities and, of course, human readability.
The problem is not SQL or relational databases in itself.
It’s when you compare them with the new possibilities.
Just like when you compare Java with Elixir.
The majority will say Java is the best option and, bla, bla, but it’s not even a discussion for me, someone who started to code recently.
It’s the same with databases.
And I’m saying this because Elixir “just works”, Phoenix is “peace of mind”, and so on.
SQL…well…relational DBs are a dark cloud in this dream, simplicity-wise, for a newcomer. How the hell should I think “data first”? I think “business first”.

PJextra · April 29, 2020, 11:03pm

LOL, true, but as far as I know everything is very stable and with low need of attention except for LiveView as of today. But the next big game changer could be this one: databases.
Even authentication, which was a no-go 2 years ago, is close to being handled…so…
“Where there’s a will there’s a way…”

MrDoops · April 29, 2020, 11:58pm

The issue isn’t necessarily databases, they’re an implementation detail for persistence and query requirements. Databases solve the real problem of making sure your data is still there throughout restarts of the system. Databases do a lot and they’re hard. Want to learn more? The issue might instead be an architecture that couples directly to the database through misuse of CRUD semantics. So maybe it would help to discuss the misuse of CRUD?

Have a problem? Let’s add a field! Have a bigger problem? Let’s make a new table! Have a huge problem? Let’s make another service with its own database so it can have the same problems, just somewhere else and with more coordination costs! Ha ha! Microservices.

The issue isn’t CRUD, but usually exposure of CRUD semantics as the interface rather than domain semantics.

With CRUD we have an entity and it gets created, updated, deleted or read. If that’s our interface, what handles the business logic? Maybe the CRUD is exposed over JSON via HTTP or even…GraphQL. What? I thought GraphQL was fancy and new and modern and solved all our problems? Well, not if you just do CRUD over GraphQL… the same issues can still surface, because instead of exposing a service that does business things we’re exposing a service that does database things. Because our business logic isn’t involved in the interface to the consumers, our consumers have to do the business logic! As a result, the business logic, the whole thing you built the app for, tends to get scattered all over the stack.

This isn’t always a bad thing. Sometimes exposing the CRUD is a great way to get moving fast when the real requirements of the problem are unclear. A reasonably designed table can operate like a slightly better spreadsheet. You monitor post-deployment the repeated efforts of some user to discover the real “business logic”. However this benefit of CRUD is a double-edged sword - what if you never notice that real requirement, or the developer leaves and finds a new job? Business has to business and changes will still be made, but will they be good changes?

So the real issue is change. More fields? New picklist options? New tables? Migrations? For the database or the existing data? Both?! Good luck migrating that unvalidated text field you tossed in there because it was the quickest option. You’ll just need to write an ETL script that covers every single possible string your sales team could’ve thought up on the spot over the last year. Maybe we should’ve just used a spreadsheet…

So why does everyone build CRUD systems? Why do we have Phoenix generators for CRUD? The reason is that it’s extremely time-efficient, and if we know what we’re doing we can minimize most of the main issues. Instead of create_user we make register_user. Now our interface to potential downstream consumers is domain-driven rather than CRUD-driven! Problem solved. Mostly. We still have to handle change and data migrations over time. But if the database model is well defined and we’re validating our data, that change doesn’t have to be that bad.
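To make the create_user vs. register_user distinction concrete, here is a minimal sketch in Python with the standard-library sqlite3 module (not Elixir/Ecto, just to keep it self-contained). The table layout, the email normalisation rule and the 'pending_confirmation' status are assumptions invented for illustration, not anything from a real app.

```python
import sqlite3


def create_user(conn, fields):
    """CRUD-style interface: a thin wrapper over INSERT.

    Every caller has to know (and repeat) the business rules.
    """
    conn.execute(
        "INSERT INTO users (email, status) VALUES (:email, :status)", fields
    )


def register_user(conn, email):
    """Domain-style interface: the business rules live behind the function."""
    email = email.strip().lower()          # hypothetical normalisation rule
    if "@" not in email:                   # hypothetical validation rule
        raise ValueError("invalid email")
    conn.execute(
        "INSERT INTO users (email, status) VALUES (?, 'pending_confirmation')",
        (email,),
    )
    return email


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, status TEXT NOT NULL)")

# The consumer says what it wants in business terms; it never picks a status.
register_user(conn, "  Ada@Example.com ")
```

With create_user, a consumer could insert any status it likes; with register_user, the only way in is through the rules.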

Finally, just think of all the engineering hours poured into frameworks, Postgres, etc. over the decades. At this point we can run those Phoenix generators and do what would’ve taken teams of database administrators years ago. We’ve taken something that was hard, thrown hundreds of thousands of engineering hours at it, and now we have mix phx.gen.live.

Can we do the same for other architectures than CRUD?

From a technical perspective we could argue that CQRS and event sourcing are “simply better”. It scales! We get an audit log! We can torch that poorly designed read model and make a new one. That’s awesome. But if it takes more developer time to implement a given CQRS/ES feature, it will almost always lose out from the business perspective until it burns through significantly fewer hours than CRUD. We can run a large profitable business on a $10 / month server using free open source software. Hooray! Infrastructure is cheap! But the developer hours are going to cost you thousands per week. Have a fancy new architecture? Now you have to pay for fancy developers and they cost even more.
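For anyone who hasn’t seen the “audit log” and “torch the read model” ideas in code, a rough sketch follows, in plain Python rather than any real CQRS/ES library. The event shapes, the account names and both projections are made up for illustration, and a real system would persist the log instead of keeping it in a list.

```python
from collections import defaultdict

# Write side: an append-only log of facts. It doubles as the audit trail.
events = []


def deposit(account, amount):
    events.append({"type": "deposited", "account": account, "amount": amount})


def withdraw(account, amount):
    events.append({"type": "withdrawn", "account": account, "amount": amount})


# Read side: read models are derived data, rebuilt by replaying the log.
def project_balances(log):
    balances = defaultdict(int)
    for event in log:
        sign = 1 if event["type"] == "deposited" else -1
        balances[event["account"]] += sign * event["amount"]
    return dict(balances)


def project_activity(log):
    """A second read model added later, built from the very same events."""
    counts = defaultdict(int)
    for event in log:
        counts[event["account"]] += 1
    return dict(counts)


deposit("alice", 100)
deposit("bob", 20)
withdraw("alice", 30)

balances = project_balances(events)   # can be thrown away and rebuilt any time
activity = project_activity(events)
```

The extra moving parts (commands, events, projections) are exactly the developer-time cost being weighed against CRUD here.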

So this is a long winded way of saying that instead of investing engineering hours into the diminishing returns of improving CRUD we should be looking at making new and improved architectures as or more productive than CRUD is currently.

/rant

lucaong · April 30, 2020, 7:34am

I think there is some truth in this. It’s not easy to horizontally scale a relational database in the way that distributed systems do. Indeed, Erlang native solutions like Mnesia adopt a different model. Also, solutions like CouchDB or Riak (both written largely in Erlang) adopted a distributed model from the start.

That said, solid relational databases can take a lot (I am always amazed how much!) before one needs to scale horizontally. Even when scaling horizontally, one still has a choice in what to shard and what not.

On top of that, I believe that most data is inherently relational, and consistency is often more desirable for core data than full distribution. It depends on the case of course, but relational databases, when following established practices, are very versatile (you can model graphs or document-based access with them), and adapt very well to changing requirements as they impose less “sticky” decisions upfront.
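As a small illustration of modelling a graph in a relational database, the sketch below stores edges in an ordinary table and walks them to any depth with a recursive common table expression. It uses Python’s built-in sqlite3 module so it runs anywhere (Postgres supports WITH RECURSIVE as well); the follows table and the names in it are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follows ("
    " src TEXT NOT NULL, dst TEXT NOT NULL, PRIMARY KEY (src, dst))"
)
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
     ("dan", "ann"), ("eve", "ann")],   # note the ann -> ... -> ann cycle
)


def reachable(conn, start):
    """Everyone reachable from `start` by following edges, at any depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM follows WHERE src = ?
            UNION                      -- UNION drops duplicates, so cycles end
            SELECT f.dst
            FROM follows AS f
            JOIN reach AS r ON f.src = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


from_eve = reachable(conn, "eve")
```

It is not as terse as a dedicated graph query language, but “N levels deep” traversals are well within reach of plain SQL.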

You are right that classic relational databases embrace a different model than the Erlang distributed system philosophy. But I don’t think the two are at odds, and in my personal experience distribution at the database level is something very tricky to deal with, that I would only use when necessary, not as a default.

It’s a good conversation to have, though. I do love to look into different solutions. If you haven’t read it yet, this book is really enjoyable: Seven Databases in Seven Weeks

ityonemo · April 30, 2020, 7:37am

I’ve always wondered why there isn’t a relational database in the BEAM. You could use OTP for the tricky parts like shard management and distributed locks, and drop to NIFs for parts that need to be low level like disk or block access.

lpil · April 30, 2020, 9:13am

Creating an RDBMS takes an incredible amount of investment. PostgreSQL has well over 30 years of history with some of the brightest minds in computer science and database implementation working on it during that time, which is reflected in its capabilities.

Many newer NoSQL databases are much smaller in scope, and even then they take a long time to create. Neo4j is 13 years old and is still lacking many features that we might expect of an RDBMS, especially in the open source version.

As someone who’s used these and also RDBMSs in prod, I would pick the relational database unless I have a clear technical requirement that the RDBMS can’t handle (for example complex search, for which I would likely use Elasticsearch).

My experience is that NoSQL databases excel at the thing they are made for, but are much less suited to other tasks, so you have to think extremely carefully about your schema design and access patterns, requiring much more design work up front.

Relational databases allow a much more iterative and “agile” style of development as it’s trivial to change the schema in production at any point using migrations, and the optimisation for ad-hoc queries means you don’t need to determine and design for all your access patterns before you start development.
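A tiny sketch of that iterative style, again using Python’s sqlite3 so it is self-contained (in an Elixir project this would be an Ecto migration against Postgres instead). The orders table, its columns and the 'paid' / 'refunded' statuses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1 of the schema, already holding production-like data.
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " total_cents INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
    [("ann", 1200), ("bob", 800), ("ann", 500)],
)

# Later requirement: track order status. A one-line migration adds the
# column, and existing rows pick up the default.
conn.execute("ALTER TABLE orders ADD COLUMN status TEXT NOT NULL DEFAULT 'paid'")
conn.execute("UPDATE orders SET status = 'refunded' WHERE id = 2")

# An ad-hoc question nobody planned for when the table was designed.
paid_per_customer = conn.execute(
    "SELECT customer, SUM(total_cents) FROM orders"
    " WHERE status = 'paid' GROUP BY customer ORDER BY customer"
).fetchall()
```

Neither the new column nor the new query required deciding on access patterns up front.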

hauleth · April 30, 2020, 9:24am

And in this case it means:

It is battle tested
It is optimised
It has a lot of tooling

SQL queries are pretty readable. Of course, QUEL was much better, but SQL won. I would say that SQL queries are more readable than 90% of “NoSQL” novelties.

Nah, especially as SQL can work as a graph or document DB with success. At the same time, most problems are better described with relational algebra.

Yet here we are, discussing a language that is based on technology which is 40 years old (Erlang).

That is an impossible dream. Data handling will always be hard, because every kind of data is different.

LostKobrakai · April 30, 2020, 9:26am

There are many databases built on the BEAM/Erlang. To those mentioned I’d add AntidoteDB and BarrelDB, and there are probably quite a few more. Both are quite interesting, but certainly not as popular as e.g. CouchDB or Riak. So the BEAM is full of options in that space. What you perceive as favoritism toward SQL (Postgres) is more a reflection of the fact that databases are very complex systems, which need time to be developed and require some hard questions to be answered competently. People are therefore even more likely to stick to proven tech that they know, and especially that they know works, than they are with e.g. a runtime or programming language. Postgres seems to work for a lot of people and use cases, and the adoption gets even bigger if you count all SQL-based databases, including MySQL, MSSQL, ….

gregvaughn · April 30, 2020, 4:17pm

No. It’s so 80’s (with numerous updates). SQL was not standardized until the late 80’s. I know that 20-year difference may not mean much to you personally, but when you’re arguing that something is outdated, it hurts your argument when that thing is 20 years younger than you claim.

PJextra · April 30, 2020, 6:33pm

You’re right.
This means I’m old, because that’s how my generation described the old way of doing things.
Nevertheless, on a more serious note, my point is that Elixir is all about simplicity and convenience, and relational DBs are not, by themselves, either of those, especially when it comes to the new unstructured data, human ways of interacting with software, and above all distributed and scalable systems.

PJextra · April 30, 2020, 6:37pm

Yes, I agree with a lot of those arguments but the truth is that the future is more and more about unstructured data, relationships, distributed architectures, more than simple interfaces,…
Of course I recognise the excellent work done by Ecto to simplify the way we interact with databases. But the fundamental problem is still there…

PJextra · April 30, 2020, 6:42pm

Yes, but those new projects seem to be paused or stopped…and making it easy to support a lot of existing relational DBs is a worrying sign, as it means you’re locked into that model.
And, like it or not, new developers (not literally) don’t know SQL and don’t think in as structured a way as developers used to 20 years ago, mostly due to the JavaScript and Node.js explosion, which makes it easy to copy and paste a lot of code and have things work. Until they don’t.
But that’s exactly why I think Elixir could embrace that problem and make persistence and manipulating data as simple as it is to tackle real-time, for example.

lpil · April 30, 2020, 6:44pm

Are there particular databases you would like to see clients for in Elixir? Which ones stand out as being suitable to you?

hauleth · April 30, 2020, 7:04pm

The rise of strongly typed languages makes this statement weird. I do not think that this is true; we still operate mostly on highly structured data.

And what makes SQL not fit in that place? It also depends on what kind of relations we are speaking of. Relations in the sense of relations from relational algebra, or joins from relational algebra? People often confuse these two.

And how is that relevant to SQL? There are a hell of a lot of so-called “NewSQL” DBs that work perfectly fine in distributed environments. All of that in the end is about CAP, where “traditional SQLs” put all their cards on CP. If Google managed with regular SQL for so long, then you, with your toy service, will also manage.

In fact I see Ecto queries as a return to QUEL, which was THE query language for relational algebra (it was designed for Ingres and modeled on Codd’s own ALPHA language). It is important to remember that PostgreSQL started as POSTGRES, which used POSTQUEL as its query language. So this is more like going back to where relational algebra was done by someone who knows it well rather than something novel.

And I do not see that “fundamental problem”. Not at all.

I would say that this is problem with “new developers” not with SQL…

I would say that this is a reason to keep SQL, not to abandon it…

It cannot, because persisting and manipulating data is hard. It is so hard that there is a whole branch of engineering built around it. That branch is called programming. There is no way to make it simple, as there is no simple way to tell what “data” is, you cannot describe what you want to do with “data”, and you often cannot define how that “data” will be encoded nor how it will be stored.

In the end, do you know how Data works?

Sorry if this feels a little like Linus on C++, but it has been said that SQL is the worst form of data query language except all the others that have been tried.

gregvaughn · April 30, 2020, 7:40pm

You might want to be cautious about generational … generalizations. Your overall point that relational databases are obsolete sounds like things I hear from people half my age.

I would counter that Elixir builds atop the foundations of Erlang for fault tolerance and stability – invented by the fine team at Ericsson not long after SQL was first ratified. It’s incongruous to laud one while denigrating the other. “Simple ain’t Easy” requires a solid foundation.

By all means, I’d love to see the Elixir community explore alternate data stores, but not at the expense of first having Postgres as a solid/safe default choice.