Opinion on file & memory based event sourcing system

I think that disabling the Linux write cache could help here:

Not all systems belong to the same “turn on write-back caching” recommendation group, as write-back caching carries a risk of data loss in events such as power failure. If the power fails, data residing in the hard drive’s cache never gets a chance to be stored and is lost. This is especially important for database systems. To disable write-back caching, set write-caching to 0:

# hdparm -W0 /dev/sda

/dev/sda:
setting drive write-caching to 0 (off)
write-caching =  0 (off)
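One caveat worth noting: on most distributions the hdparm setting does not persist across reboots. A hedged sketch of one common way to re-apply it at boot with a udev rule (the file path, rule file name, and hdparm location are illustrative and vary by distribution):

```shell
# hdparm -W0 is not persistent across reboots on most distributions.
# One common approach (illustrative; paths and device names vary) is
# to install a udev rule that re-applies the setting when the disk appears:
cat > /etc/udev/rules.d/69-hdparm.rules <<'EOF'
ACTION=="add|change", KERNEL=="sda", RUN+="/usr/sbin/hdparm -W0 /dev/%k"
EOF
```

You can confirm the current setting at any time with `hdparm -W /dev/sda` (no number), which reads the state without changing it.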

Reading the Kafka design decisions about going with a filesystem approach, while taking advantage of all the low-level facilities Linux has to offer, may help you make some good decisions.


Thanks for the info and links! I’ve tabled this project whilst I work on other things — bills to pay and all that… — but the idea still keeps nagging at me. I had intended to look at the guts of systems like Kafka and EventStore properly when I revisit this.

Do you have any insight on how the Erlang VM might impact this approach? When I looked at this before, it seemed that access to the low-level OS facilities from inside the VM is more limited than with other systems, but that may just be my unfamiliarity with the guts of the VM.

This approach of using files to persist the data needs to take into consideration that the Erlang library disk_log by default has a delay of 2 seconds or 64 KB before writing to the disk, as I say here:

This setting can be tuned, but care needs to be taken to find the right balance in terms of disk IO performance, otherwise you may create a bottleneck when writing to the disk.
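To make that concrete, here is a minimal sketch using Erlang's disk_log (the log name `mylog`, the file path, and the logged term are made up for the example). Writes can sit in disk_log's buffer for up to the default delay, but `disk_log:sync/1` forces a flush to the file, at the cost of the throughput benefit the buffering provides:

```erlang
%% Minimal sketch: open a halt log, append a term, and force a flush.
%% The log name, file path, and payload are illustrative only.
{ok, Log} = disk_log:open([{name, mylog},
                           {file, "events.log"},
                           {type, halt},
                           {format, internal}]),
ok = disk_log:log(Log, {event, erlang:system_time(), some_payload}),
%% By default the write may sit in disk_log's buffer (up to ~2 s / 64 KB);
%% sync/1 flushes it to the file, trading away the batching benefit.
ok = disk_log:sync(Log),
ok = disk_log:close(Log).
```

Calling sync after every log entry gives the strongest durability but the worst disk IO profile; batching several entries between syncs is the usual middle ground.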

I am not that familiar with it, but @dimitarvp may have something to say here.

From what I remember when reading the Kafka design decisions, you don’t need to interact with the OS from the BEAM; you just need to tune the machine for disk IO as they recommend.