A Database-less App Framework?

That train of thought often leads to:

  • Apply “the change” to application state in memory
  • Eventually write “the change” (preserving sequence) to an append-only log
  • Replay “changes” to recreate last known state

leading people to Event Sourcing (which sometimes doesn’t end well especially once CQRS gets involved).
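
A minimal sketch of that loop in Elixir - a hypothetical Counter process with {:add, n} events and a newline-delimited, Base64-encoded log file:

```elixir
defmodule Counter do
  use GenServer

  def start_link(log_path), do: GenServer.start_link(__MODULE__, log_path)
  def record(pid, event), do: GenServer.cast(pid, {:event, event})

  def init(log_path) do
    # 3. Replay "changes" to recreate the last known state.
    state =
      case File.read(log_path) do
        {:ok, data} ->
          data
          |> String.split("\n", trim: true)
          |> Enum.map(&(&1 |> Base.decode64!() |> :erlang.binary_to_term()))
          |> Enum.reduce(0, &apply_event/2)

        {:error, :enoent} ->
          0
      end

    {:ok, {log_path, state}}
  end

  def handle_cast({:event, event}, {log_path, state}) do
    # 1. Apply "the change" to application state in memory.
    new_state = apply_event(event, state)
    # 2. Write "the change" (preserving sequence) to the append-only log.
    File.write!(log_path, Base.encode64(:erlang.term_to_binary(event)) <> "\n", [:append])
    {:noreply, {log_path, new_state}}
  end

  defp apply_event({:add, n}, acc), do: acc + n
end
```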

4 Likes

leading people to Event Sourcing (which sometimes doesn’t end well especially once CQRS gets involved).

Oooh, why? Tbh, I have been totally sold on event sourcing…

2 Likes

A database is a tool that we sometimes overuse, but I think dropping it completely would solve nothing and would probably cause more harm than good.

I agree with @cmkarlsson in that we must differentiate between the “permanent” kind of data that is meant to be held in the database and will probably outlive the app, and other kinds that do not belong there. If an application only has the “other kinds” of data, then it’s a good case for going database-less, though I’ve never had a chance to work on such an app yet :slight_smile:

2 Likes

I don’t think event-sourced applications typically apply the change to application state before it is written to durable storage. Typically a write to the log (e.g. Kafka) is the first thing that happens.
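
In the Counter sketch above, that log-first ordering just means swapping the two steps in handle_cast (same hypothetical names):

```elixir
def handle_cast({:event, event}, {log_path, state}) do
  # Make the event durable first...
  File.write!(log_path, Base.encode64(:erlang.term_to_binary(event)) <> "\n", [:append])
  # ...and only then update the in-memory state.
  {:noreply, {log_path, apply_event(event, state)}}
end
```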

2 Likes

Which makes sense if consistency is the primary concern (e.g. bank accounts) - waiting for the write to complete is going to delay updating application state - as always, it’s about making tradeoffs.

Usually because they go for CQRS/ES - Martin Fowler comments on it in the CQRS portion of his talk.

3 Likes

There are ways to build OLAP (analytical) and OLTP (transactional) systems, and each serves a different purpose. And by the way, most of the time the indexing on both uses a B-tree, which is good enough; other index types are fine if used correctly. We can think of a relational database as a safe zone to store our data in before we contemplate using other, faster, special-purpose storage. That is, I prefer to look really carefully at my tools before I scale, and a relational database is probably the safest (although not the fastest) approach, because:

  • It offers ACID guarantees, full-featured transactions, and lots of useful constraints,
  • It has the most supporting tools available,
  • It is one of the oldest technologies,
  • It is backed by giant corporations and by math :wink:.

I do say that an RDBMS is the safe zone because, compared to other storage products, it has the smallest chance of dying off.

I think that unless we are building the next Alibaba or LinkedIn, an RDBMS is good enough.

That said, I also use a cache in my apps (Redis or ETS) to speed things up.

N.B. That’s why I’m thankful that Ecto is more explicit than ActiveRecord: I’m no longer stuck fixing N+1 queries in Phoenix. Avoiding them is a basic thing to do, but it gives dramatic performance improvements in some cases.
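
For illustration, with a hypothetical Repo and Post/Comment schemas, the explicitness looks like this:

```elixir
import Ecto.Query

# Without the preload, post.comments is an Ecto.Association.NotLoaded
# struct rather than a hidden lazy query, so an N+1 can't sneak in -
# the associations load in a single extra query, stated up front.
posts = Repo.all(from p in Post, preload: [:comments])
```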

3 Likes

There are a few more cases beyond the “massive systems” you have in mind where I’d say it’s not good enough, but generally I agree with all that you are saying. Highly event-driven systems (such as those backing IoT, messaging, or game servers) don’t really have to be massive for it to make sense to keep state in memory or in some other type of storage than an RDBMS.

2 Likes

Or Akka has it out of the box → Akka Persistence

2 Likes

I wonder what’s the difference between the Erleans ‘Grain’ approach and a garden-variety NoSQL database… It seems to be kind of reinventing the wheel, no?

In my head, something like Joe Armstrong’s save-binary-to-disk idea seems more natural (though also more restrictive in terms of the amount of storage), but it isn’t reinventing the wheel either.

1 Like

In a way that’s what riak_core and now lasp are - your application is your database.

I had never heard of Lasp before - thanks for this.

They both have an emphasis on distribution - I’d rather have something simpler. Like SQLite, but just for binary data…

Sometimes stateful solutions are very useful and powerful, and they can simplify a lot of things for modern and future apps.
Say we are building a React app, which would typically have a data store on the front end; the views observe that store and render from it.
If you change something in the data store, the views update automatically, instead of you manually updating the DOM like 10 or 15 years ago.
However, if you call fetch to update a resource, you may have to refetch some resources and invalidate caches case by case, according to your business rules.
If you move the observable-store concept to the back end, compute the diffs there (which could be done with a generic library similar to redux and react-redux), and push the updated data to the views over WebSocket, all of that case-by-case refetch logic could be deleted.
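
A rough sketch of such a back-end store in Elixir, assuming a Phoenix app (the Store module, the “store:lobby” topic, and MyAppWeb.Endpoint are all hypothetical):

```elixir
defmodule Store do
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  def update(key, value), do: GenServer.call(__MODULE__, {:update, key, value})

  def init(state), do: {:ok, state}

  def handle_call({:update, key, value}, _from, state) do
    # Broadcast only the changed key; connected clients patch their
    # local store instead of refetching case by case.
    MyAppWeb.Endpoint.broadcast("store:lobby", "diff", %{key => value})
    {:reply, :ok, Map.put(state, key, value)}
  end
end
```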

I feel there should be a simpler solution than event sourcing: pretty much just automatic snapshot backups of actor state, without the event log.

Event sourcing lets you easily map past events onto different projections, which can give you a great advantage in exploring business possibilities.

It might be overkill for most apps, though, even if it’s a great and beautiful functional architecture.
However, as a consultant, I keep hearing about organizations failing to manage the problems caused by event sourcing, including but not limited to performance issues when bulk-loading projections, missing events, and mental overhead. Ideally there would be an excellent library that manages everything automatically and magically, since the core of event sourcing is pretty simple, but we are not there yet. Several implementations, like Akka Persistence, Eventsourced, and other awesome libraries, are really great and take care of most things for you, but they still don’t let you hand off and stop worrying.

My thought is that there should be a lightweight actor behaviour that just automatically backs up and persists actor state, and resumes from that state after a failure. Then you wouldn’t have to worry about projections, event versioning, the differences between commands and events, and so on.
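
A rough sketch of what I mean, assuming plain periodic snapshots are acceptable (the module name, file path, and interval are all made up):

```elixir
defmodule SnapshottedCounter do
  use GenServer

  @snapshot_path "counter.snapshot"
  @interval :timer.seconds(30)

  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  def add(n), do: GenServer.cast(__MODULE__, {:add, n})

  def init(_) do
    # Resume from the last snapshot if one exists (e.g. after a crash).
    state =
      case File.read(@snapshot_path) do
        {:ok, bin} -> :erlang.binary_to_term(bin)
        {:error, :enoent} -> 0
      end

    Process.send_after(self(), :snapshot, @interval)
    {:ok, state}
  end

  def handle_cast({:add, n}, state), do: {:noreply, state + n}

  def handle_info(:snapshot, state) do
    # Persist the whole state; no events, no projections, no versioning.
    File.write!(@snapshot_path, :erlang.term_to_binary(state))
    Process.send_after(self(), :snapshot, @interval)
    {:noreply, state}
  end
end
```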

I’ve been curious about riak_core recently but haven’t yet had time to spike anything with it. I’m not sure whether it gives you distributed storage or distributed actors.
I believe the ideal way of modeling is to bind callbacks to actors, instead of storing them in a database as cold data and reviving them on use, in which case they have no power to do things actively.

2 Likes

That’s exactly my current thinking. Keep things super simple - but very useful, perhaps.

2 Likes

DBs are very good at many things, but they are also a little bit complicated.

I suspect many apps never grow to the point of needing DB-sized amounts of data.

For that niche, it’d be nice to have a simpler solution, eh?

Maybe it’s important to take a step back and consider what a ‘database’ actually is:

It’s just some place in which you store (some of) your data.

Whether this is inside your (OS) application or outside of it does not really matter that much from a nomenclature point of view.

Historically, database frameworks have grown into their own separate (OS) applications because this meant they could be started, stopped, and versioned separately from the main (OS) application.

In a world like the BEAM, which is its own operating-system-like environment, it makes a lot of sense to work with a database that runs inside this system, because:

  • No separate serialization/deserialization steps are necessary when reading from or writing to the database.
  • Distribution works the same way as for the rest of your application.
  • No two (OS) processes compete for the attention of the (OS) schedulers; rather, everything can be (semi-)preemptively scheduled by the BEAM schedulers.
  • Less mental complexity than when using two separate technologies that have to talk to one another.

However, the idea of running a database inside the BEAM is somewhat new, or at least widely unexplored territory: there is of course (D)ETS, and Mnesia built on top of that, and then we have riak_core (but using the normal Riak database inside your own BEAM instances is not possible, AFAIK). CouchDB was also not built with in-BEAM use in mind.

So Mnesia, riak_core, and Lasp are our only current contenders for distributed in-app databases, and none of them is easy to get started with.

But if you don’t need to go to distributed scale, you can just use ETS, or, even simpler, a process that periodically backs up its state using :erlang.term_to_binary and writes it to a file.
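
For the ETS route, OTP even ships the persistence half already; a minimal sketch (table and file names are hypothetical):

```elixir
# Build a named table and put something in it.
:my_cache = :ets.new(:my_cache, [:set, :public, :named_table])
true = :ets.insert(:my_cache, {:user_42, %{name: "Alice"}})

# Periodic backup, e.g. triggered by a timer in an owning process:
:ok = :ets.tab2file(:my_cache, ~c"my_cache.ets")

# After a restart (while the table doesn't exist yet), restore it:
{:ok, :my_cache} = :ets.file2tab(~c"my_cache.ets")
```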

2 Likes

That’s what I am thinking about.

Everyone focuses on planetary-scale distribution, but in many cases the usual use case is quite small…

2 Likes

Do you just mean how grains are stored?

There isn’t any; you can just as easily use a NoSQL store for grain persistence. Postgres is what it supports right now just because that is what we use at work. Orleans supports a number of different databases for grain persistence.

3 Likes

Nope, when reading up about it, it seemed that Orleans might be getting close to NoSQL land.

I.e. indexed, but largely unstructured storage of data.

1 Like

The default Phoenix generator will create an application with Ecto - the reason is to give the user something “with batteries included”. You can easily remove this by passing the --no-ecto flag to the mix phoenix.new generator.

Extracted from:

1 Like

Ah, true, there is work on indexed actors.

1 Like

Hi, I know this is a 3-year-old topic, but I have a similar issue. I need a small “in-house” data storage that is distributed between multiple apps (two kinds of apps, with multiple instances each) to share state. I’m thinking about Mnesia and Cachex, for example. I wonder how you ended up and what solutions are available these days.
The main reason I’m looking for this kind of storage is that I have two apps:

  1. api app
  2. show app

You can create a structure with the api app, and once it’s created you can display it with the show app. We use PostgreSQL right now to store this information. It works well until the PostgreSQL server becomes unavailable, and we also have to prepare for bigger traffic. For example, when we lose the connection to PostgreSQL for 3 seconds, we can lose about 300-500 API calls. Right now I see only one solution: distributed in-house (in-BEAM) storage that handles the writes, with asynchronous storing into PostgreSQL for other uses. I’m not sure whether I need persistence for that kind of in-house DB. I see only one reason for it: when some entries are not yet saved into PostgreSQL and an app crashes. But distribution can help with that.

Cachex is a nice cache solution, but I have a feeling that using it for my use case is not right and could cause me other problems. This kind of cache is meant for already-stored data, not as a pre-save buffer, I think.

Mnesia is battle-tested, but I need only key-value storage where I store some information: I take a token from the URL, decrypt the IDs, load data by those IDs, and use that data to call 3rd-party APIs and show the information.
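
Roughly, the Mnesia version of the key-value flow I’m picturing would look like this (the :tokens table and the sample data are hypothetical):

```elixir
:ok = :mnesia.start()

# A RAM-only table, replicated across all connected nodes.
{:atomic, :ok} =
  :mnesia.create_table(:tokens,
    attributes: [:id, :data],
    ram_copies: [node() | Node.list()]
  )

# Writes go through transactions, so every replica stays consistent.
:mnesia.activity(:transaction, fn ->
  :mnesia.write({:tokens, "url-token", %{ids: [1, 2, 3]}})
end)

# Any app instance can then read the shared state locally.
[{:tokens, "url-token", data}] =
  :mnesia.activity(:transaction, fn -> :mnesia.read({:tokens, "url-token"}) end)

IO.inspect(data.ids) # => [1, 2, 3]
```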

1 Like