Can someone please give me an example of an application where Erlang's term_to_binary and binary_to_term have been used for state management with OTP instead of a database?
Generally, the way I (and maybe most others) have been modelling applications is by thinking in terms of database tables and their relationships. This has been going on for a long time and is often easier than thinking in DDD terms, especially for junior developers. The way hardware prices have come down, I think many applications (not internet scale) can easily fit their data in RAM. So it would be great if someone could advise me on how to use OTP and files for state management of smaller applications.
Also, I think it's very difficult to rule out the utility of a database (mainly Postgres) because of reporting requirements. If we go for the file-based approach, how do we tackle this?
I am more confused by reading Doing without database in the 21st century.
Wouldn't going the databaseless route be very hard? ACID etc.
If someone can shed some light on this it would be really enlightening and helpful.
Thanks
Everything depends on the use case.
I am pretty sure you have some concrete examples in mind while asking. It might be useful to share your particular scenario.
In the book Functional Web Development with Elixir, OTP, and Phoenix, the author shows how to make an interactive game. When the game is ready and works in memory, he adds saving state in ETS in case of crashes and mentions that it could use DETS (disk-based term storage).
He does exactly the thing you describe as confusing: starts without touching the DB and ends up with an excellent model decoupled from storage. I highly encourage reading it!
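A minimal sketch of the "back up state to ETS in case of crashes" idea, not the book's actual code. The module name `BackedUpGame`, the table name `:game_backup`, and the list-of-moves state are all assumptions made for illustration. The table has to be created by a process that outlives the game process, otherwise it disappears together with the crashed process.

```elixir
defmodule BackedUpGame do
  use GenServer

  # Hypothetical table name; any atom works.
  @table :game_backup

  # Call this from a long-lived process (e.g. at application start),
  # because an ETS table is deleted when its owner process exits.
  def create_table, do: :ets.new(@table, [:named_table, :public, :set])

  def start(game_id), do: GenServer.start(__MODULE__, game_id)
  def add_move(pid, move), do: GenServer.call(pid, {:add_move, move})

  @impl true
  def init(game_id) do
    # On (re)start, restore the last saved state if there is one.
    moves =
      case :ets.lookup(@table, game_id) do
        [{^game_id, saved_moves}] -> saved_moves
        [] -> []
      end

    {:ok, %{id: game_id, moves: moves}}
  end

  @impl true
  def handle_call({:add_move, move}, _from, state) do
    new_state = %{state | moves: state.moves ++ [move]}
    # Save a copy after every change so a restart can pick it up.
    :ets.insert(@table, {state.id, new_state.moves})
    {:reply, new_state.moves, new_state}
  end
end
```

ETS only survives process crashes, not a node restart; swapping in DETS or a file would be the next step if the state has to survive that too.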
[EDIT] I would also treat Why Relational Databases Are So Bad with a grain of salt. The timesheet example might ring a bell for people. Adding sequence ids to rows of data and the work needed to produce a report from SQL are repetitive. The same goes for retrieving the employee name from a different table.
What the author doesn't mention is that storing employees and hours separately and normalising the database makes it easy to reverse the query: "who worked at a given time?" Try that with timesheets saved as a list of hours in a file.
Again: everything depends on the use case. Maybe you will never need that reversed query?
I recently tried this approach building a new production service. It managed lists of contacts with some additional business logic, each list getting its own GenServer with state backed up to S3. Each GenServer would gracefully shut down after a few minutes of inactivity to avoid memory leaks. In short, we scrapped the whole thing and reimplemented with postgres.
Building without a database allows you to write some of the most expressive code you'll ever create, at the expense of having to write a lot of things you take for granted in traditional design.
Big issues I faced:
- Needing sagas almost immediately. Simple pieces of information had to be duplicated in a few places, and these updates had to be atomic.
- Data migrations. My initial version directly wrote the struct with `term_to_binary`, but obviously this gets hairy if data needs to change. Lacking the time to implement a proper migration strategy due to deadlines, I ultimately decided to abandon the whole approach.
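One way the migration problem could be approached is to store a version number next to the data and upgrade old shapes on load. This is only a sketch under assumptions: the module name `ContactListStore`, the two state shapes, and the version numbers are hypothetical and not what the poster's service used. Only `:erlang.term_to_binary/1`, `:erlang.binary_to_term/2`, and the `File` functions are real APIs.

```elixir
defmodule ContactListStore do
  # Hypothetical: version 1 stored a bare list of contacts,
  # version 2 stores a map with :contacts and :tags.
  @current_version 2

  # Wrap the state in a {version, data} tuple before serialising.
  def encode(state), do: :erlang.term_to_binary({@current_version, state})

  def decode(binary) do
    # :safe makes decoding fail instead of creating new atoms
    # from data we do not fully trust.
    {version, data} = :erlang.binary_to_term(binary, [:safe])
    migrate(version, data)
  end

  def save!(path, state), do: File.write!(path, encode(state))
  def load!(path), do: path |> File.read!() |> decode()

  # Each clause upgrades one version step, then continues migrating
  # until the current version is reached.
  defp migrate(1, contacts) when is_list(contacts) do
    migrate(2, %{contacts: contacts, tags: []})
  end

  defp migrate(@current_version, data), do: data
end
```

Plain maps are used here instead of structs so that a decoded term does not depend on the current definition of a struct module; that is a design choice for the sketch, not a requirement.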
It's entirely possible I implemented it wrong, and I do intend to try it again in the future. Ultimately I felt like the system I had built wasn't nearly as stable or simple as a database, and given the scheduling deadlines, stability and simplicity had to come first.
As a side note, the original business requirements had us under the impression that contact lists would be no greater than 10K entries. We had a 45K list day one in production, with the expectation to have multiple 100K+ lists in the coming weeks.
I tried a similar approach last summer, but with SQLite files stored on Google storage. Each user had an SQLite database. I also switched back to Postgres, since the Ecto adapter for SQLite wasn't quite as nice to work with as the one for Postgres. I also worried a bit about race conditions, where two clients would start at the same time and both download and start writing to the same database.
I mostly did that to try out hosting the application entirely on preemptible/spot instances in the cloud.
The no-DB approach sounds awesome when you're building exercise apps, but unless you need barely any reporting or joining of data, it doesn't scale beyond the first two weeks of active development.
The problem is never "what is the exact way my app is persisting information?". The problem always has been "how do I query and aggregate it?".
IMO the way relational DBs build indices and joins (and the internal storage mechanism in general, including transactions) needs to be ported to an embedded DB. People around here periodically attempt to write apps using only (D)ETS as storage, and the conclusion always seems to be "it's too hard to arbitrarily query data".
If I had the time and was paid for it I'd seriously attempt writing such an embedded DB engine.
...Or, it might be worth it to contribute to the SQLite Elixir adapter. But then again, SQLite only allows a single write operation at a time.
I know you didn't reply to me, but let me leave some notes on how I'd handle some of this:
I planned to use some ad-hoc approach (and later switch to Spark) to build materialised views. I never got to do it, but I don't see much reason why it wouldn't work.
...Or, it might be worth it to contribute to the sqlite Elixir adapter. But then again, sqlite only allows a single write operation at a time.
With WAL enabled, so as not to block reads, it's enough if used carefully. For example, in my setup all users had their own databases (colocated with the user processes at execution time), and all requests took less than 1 ms. That could eventually get worse, since more users means more databases, more files, and more filesystem thrashing; there's a way around it by storing multiple SQLite databases in a single LMDB, but I never really looked into it. After switching to Postgres, requests took about 30 ms. The app was a simple voice messaging app, so the requests were "do I have new messages?", "get message history with X", "send message to Y", "get prekeys for convo with Z".
IMHO if the purpose of the application is to store and manage data, then you can either use a database or build a database, and you probably don't want to build a database.
However, there are applications whose job isn't storing and managing data. For example, we have applications which route data and others that serve as kiosk systems. For these, a database was superfluous. Just my $0.02.
This is exactly right. Unless your company sells a database, don't build a database.
To add on to this, I don't know of many companies that don't need to store something. My advice is to put your state into a reliable database (Postgres is a good default). As other people have said, this empowers reporting, ETL, and a whole bunch of other benefits.
It's not the only reason, but the main reason to build stateful systems, meaning bringing your application's state into processes, is to reduce latency. I have a lot of empirical evidence to support that a "stateless" Elixir service backed by a database will take you a really long way. IMO you need to prove that Postgres or your DB of choice won't be fast enough before you start bringing more state into your application.
Unless your company develops databases, the database probably isn't your application. That's the basis of opinion pieces like No DB:
- The database is just a detail that you don't need to figure out right away.
- The center of your application is its use cases (not the database).
i.e. relational databases can be very useful but their existence shouldn't dominate the architecture, possibly to the point where the UI is assembling dynamic SQL.
And in some circumstances other approaches to managing/handling data can be more effective:
"Turning the database inside out with Apache Samza" by Martin Kleppmann
I've worked on multiple systems that democratized their data through Kafka and other mechanisms. Based on those anecdotes I'm more than comfortable suggesting that most companies should not do this.
All that aside, most companies are information systems. Meaning they take data, store data, and present that data to users as information. That being the case, the database absolutely does matter. It's not "just a detail". It's an integral part of your business and you should choose databases that have the tradeoffs your business needs.
The message is that your core problem should dictate which data handling technology is appropriate.
Before the emergence of NewSQL and stream processing it wasn't uncommon for "the" relational database to be the foundation and crown jewel of all business processing: the core of a brittle and tightly coupled BBoM. Unfortunately technology-centric design can happen all too easily with any technology.
Thanks for your response.
I am trying to create a small feedback application where students can give feedback/rating to their teachers and need to show average feedback/rating for each teacher etc.
I read Functional Web Development with Elixir OTP, and Phoenix and was wondering can this strategy be used for something which is not a game. I get the use case of the game when sometimes you just need to persist the final score/result and the running game can be managed in memory.
But can we use similar strategy for apps where everything needs to be persisted permanently?
Thank you so much for sharing the practical problems you faced.
Using a database gives us so many nice guarantees that we appreciate deeply when they are not available.
At the most basic level that would only require that the individual feedback/rating submissions are persisted (preferably as structured records) to an append only file. That file can then later be processed by a separate program to aggregate the per teacher results.
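A rough sketch of that append-only approach, under assumptions: the module name `FeedbackLog`, the one-line-per-submission `teacher_id,rating` format, and the 1 to 5 rating range are all made up for illustration, and teacher ids are assumed not to contain commas or newlines.

```elixir
defmodule FeedbackLog do
  # Append one submission as a single line: "teacher_id,rating\n".
  # Opening the file in :append mode never rewrites earlier records.
  def append(path, teacher_id, rating) when rating in 1..5 do
    File.write!(path, "#{teacher_id},#{rating}\n", [:append])
  end

  # The "separate program" step: read the whole log line by line and
  # compute the average rating per teacher.
  def averages(path) do
    path
    |> File.stream!()
    |> Enum.reduce(%{}, fn line, acc ->
      [teacher_id, rating] = line |> String.trim() |> String.split(",")
      rating = String.to_integer(rating)
      # Keep a running {sum, count} pair per teacher.
      Map.update(acc, teacher_id, {rating, 1}, fn {sum, n} -> {sum + rating, n + 1} end)
    end)
    |> Map.new(fn {teacher_id, {sum, n}} -> {teacher_id, sum / n} end)
  end
end
```

Authorization and duplicate detection are deliberately left out here; the sketch only covers persisting submissions and aggregating them afterwards.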
Keeping an in memory representation of the data and processing the submissions in realtime only makes sense if all feedback submissions are made in a fairly short time period (hours to a day, rather than weeks) and there is a need to amend the displayed results as the submissions are coming in (like for an election coverage).
Things get a bit more interesting once you are trying to ensure that all submissions are authorized and not duplicated, but again there are different ways to go about that.
So the priority is to identify your application's use cases in detail, not which technology will be used to persist your data.
That use case is compelling! If you have time to try, you could follow the development process like in the book:
a) Write the logic for use cases like rating, querying averages and so on in pure Elixir
b) Try storing the results in a GenServer for each teacher (similarly to having one game process)
c) Figure out in the end what you want from the storage.
I assume you would want to do some aggregation in the end. How many votes there were, who was the best teacher and having three entities (student, teacher and score) begs for a relational database.
The point of DDD is not "do not use a relational database". It is "do not pollute your logic with DB details". E.g. your code might model a teacher as a struct that has a list of scores. It is a pure Elixir model, and your application logic does not care if it uses left join or right join or stores the scores as JSONB or whatever else you think is optimal.
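As a small illustration of such a storage-agnostic model, here is a sketch; the `Teacher` module, its field names, and the 1 to 5 score range are assumptions for the example, not something prescribed by DDD or by the book.

```elixir
defmodule Teacher do
  # A teacher is just a name plus the scores received so far.
  defstruct name: nil, scores: []

  def new(name), do: %Teacher{name: name}

  # Recording a score returns a new struct; nothing here knows
  # whether the data will end up in Postgres, a file, or nowhere.
  def rate(%Teacher{} = teacher, score) when score in 1..5 do
    %{teacher | scores: [score | teacher.scores]}
  end

  # No scores yet means there is no meaningful average.
  def average(%Teacher{scores: []}), do: nil
  def average(%Teacher{scores: scores}), do: Enum.sum(scores) / length(scores)
end
```

Loading and saving can then live in a separate module that converts between this struct and whatever storage is chosen later.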
An added benefit of starting with the in-memory model is more interactivity. Maybe you could send a push notification to a teacher when his score drops below three stars?
Starting from the database makes you think on the wrong level. E.g. you will need some authorisation and authentication both for teachers and students. If you start from the database, you immediately think: "should I create another users table and have foreign keys in the teachers and students tables? Maybe teachers and students should be in one table and have a `role` field?".
Starting from the use case, you create a struct called `User` or `Account`, implement the feature and defer retrieving the users for later. Maybe you'll end up getting users from LDAP or a private school students database, and you won't have to persist them at all?
In general, I disagree with the author of "Doing without database...". DBs are there. They solve problems almost everyone has. Let's use them. I agree with the DDD approach of not caring about the DB for as long as possible.
Developers sometimes say stuff like: "you should abstract the database! What if you need to change it?" That happens so rarely that I would dismiss it. Abstracting the database away has other benefits: your business logic gets simpler and easier to test in isolation. And in case you do a big migration, you've decoupled the code that performs business logic from the code that loads stuff, and you can change only the latter without touching the "business layer".
TL;DR DDD=good stuff; NO_DB movement=meh
Without c) this basically sounds like what Commanded allows you to do. To an extent even c), but you'd need to either use a supported event store or write one.
There are 2 no-DB ideas that some are using: persisting data in a git repo and using a hosted spreadsheet service.
Some static site generators will read markdown and other files before compiling into a static site. NetlifyCMS is an open source project that allows people to create markdown files using a CMS, and each save is actually a git commit to master. A hook automatically rebuilds the site after each commit. So it's great for sites that don't change super often or sites that aren't massive. It can include large file storage and an identity system so authors don't need access to the git repo.
The second one is using a hosted spreadsheet like Google Sheets or Airtable as the DB. https://sheety.co lets you turn your Google Sheet into an easy API.
That is insightful. Thanks.
Maybe I'm missing the point, but nobody here is trying to NOT use a database.
The first line of the Wikipedia "database" page is:
Then, if your application manages data, you will have a database. And as has always been the case, the real question is "which database should I use?", along with the subsidiary questions I've read all along this post:
- Do I want to rely on an external system?
- Do I want to use a new language for data querying?
- What about (write / query / both) performances?
- Do I have will / time / skills to build my own database?
- Do I want to decouple my business logic from my storage?
- What kind of data do I want to store?
- Do I need to store this data?
- Can I afford data loss?
- What about legal? (GDPR vs append-only DB )
etc.
These are pure tech questions, imho and to summarize a bit fast, DDD adds another question:
- Do I want to express my need in a data way or a more user friendly (that even non-tech could understand) way?
And ultimately, the mother of all questions is:
- What are my use cases?
It's strange that almost every time I read something about a DB (system or not), it tends to turn into an apology for this DB system against one (or all) others. All DB systems are good if they fit their use cases: you don't use MongoDB if you care about your data, you don't use an RDBMS if you want to model a social network, you don't use a graph database to compute stats, and so on.
Like half the points you enumerated are political or, at least, are not decided by the programmers writing the code on the ground. They are decided by at least team leaders, more likely CTOs.
As for DB-vs-DB: there are unquestionable benefits of using one DB as opposed to another, but that's a huge topic on its own.
As for "should we even use a DB": this discussion is not monopolized at all. But most people naturally come to the conclusion that the DB helps.