Immutable filesystems

I was directed by Ben Tyler (Elixir meetup Amsterdam, where he gave an excellent talk) to this interesting paper, “Immutability Changes Everything”. There are even immutable filesystems!

Conclusion
Designs are driving towards immutability. We need immutability to coordinate at ever increasing distances. We can afford immutability given room to store data for a long time. Versioning gives us a changing view of things while the underlying data is expressed with new contents bound to a unique identifier.

Copy-on-Write: Many emerging systems leverage copy-on-write semantics to provide a façade of change while writing immutable files to an underlying store. In turn, the underlying store offers robustness and scalability because it is storing immutable files. For instance, there are many key-value systems implemented with log-structured merge trees (e.g. HBase, BigTable, & LevelDB).
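To make the "façade of change" concrete, here is a minimal Python sketch of a copy-on-write key-value store. It is a toy under stated assumptions, not how HBase, BigTable, or LevelDB are implemented: the `CowStore` class and its method names are hypothetical, and it copies the whole map on each write instead of using a log-structured merge tree.

```python
from types import MappingProxyType


class CowStore:
    """Toy copy-on-write store (hypothetical): each write adds a new read-only snapshot."""

    def __init__(self):
        # Version 0 is the empty snapshot; snapshots are never modified after creation.
        self._versions = [MappingProxyType({})]

    def put(self, key, value):
        """Copy the latest snapshot, apply the change, and return the new version number."""
        snapshot = dict(self._versions[-1])
        snapshot[key] = value
        self._versions.append(MappingProxyType(snapshot))
        return len(self._versions) - 1

    def get(self, key, version=-1):
        """Read a key from the given version (the latest one by default)."""
        return self._versions[version][key]


store = CowStore()
v1 = store.put("title", "draft")
v2 = store.put("title", "final")
```

Readers of `v1` still see `"draft"` after the second write, while the default read sees `"final"`. The caller experiences change, but no stored snapshot is ever overwritten.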

Clean Replication: When data is immutable and has a unique identifier, many different challenges with replication are eased. There’s never a worry about finding a stale version of the data because there are no stale versions. Consequently, the replication system may be more fluid and less picky about where it allows a replica to land. There are fewer replication bugs, too.
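A rough Python sketch of why replication gets simpler: if the identifier is derived from the content, two replicas that hold the same identifier necessarily hold the same bytes, so syncing reduces to copying whatever is missing. The `Replica` class and the `content_id` and `sync` functions are hypothetical names for illustration, and the choice of SHA-256 is an assumption, not something the paper prescribes.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive the identifier from the content itself (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class Replica:
    """Hypothetical replica holding immutable blobs keyed by their content id."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        # Storing the same content twice is a no-op: the id already maps to these bytes.
        self.blobs.setdefault(cid, data)
        return cid

    def get(self, cid: str) -> bytes:
        data = self.blobs[cid]
        # Any replica can be verified locally, so it does not matter where the copy came from.
        if content_id(data) != cid:
            raise ValueError("corrupt blob")
        return data


def sync(src: Replica, dst: Replica) -> int:
    """Copy blobs that dst lacks; there is no 'newer version' check because ids never change meaning."""
    missing = src.blobs.keys() - dst.blobs.keys()
    for cid in missing:
        dst.blobs[cid] = src.blobs[cid]
    return len(missing)


a, b = Replica(), Replica()
cid = a.put(b"hello")
copied_first = sync(a, b)
copied_second = sync(a, b)
```

The second `sync` call copies nothing, and there is no conflict resolution anywhere in the code, because a given id can only ever refer to one value.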

Immutable DataSets: Immutable DataSets can be combined by reference with transactional database data and offer clean semantics when the DataSets project relational schema and tables. We can look at the semantics projected by an immutable DataSet and create a new version of it optimized for a different usage pattern but still projecting the same semantics. Projections, redundant copies, denormalization, indexing, and column stores are all examples of optimizing immutable data while preserving its semantics.

Parallelism and Fault Tolerance: Immutability and functional computation are the key to implementing “Big Data”.

Immutability does change everything!

Full paper here: http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf


:cool: and concise paper…Thank You! :smile:

I’m especially interested in this topic, as I’ve gone through most of the listed aspects while evolving the ibGib framework.

Disclaimer: I’m going to talk about ibGib here. I’m not hijacking the thread! It’s because ibGib is more of a concept than just a framework, and its aspects are tightly related to everything mentioned in the linked paper!

Each ibGib datum is immutable and has a hash (gib) that maintains the datum’s internal integrity. The data are related to each other via rel8ns, which provide not just a “single” DAG of a “single” thing evolving through time, but multiple monotonically increasing rel8ns that allow n-dimensional DAG projections over the same data construct.

I especially liked his look at Copy On Write (COW) and its relationship with the physical substrates of SSDs and spindle HDDs. It is certainly interesting to relate these things to immutability via COW, and this touches on one of the novel things about ibGib, which also addresses his naming slippery slope. Much like what I’ve recently learned about IPFS, I designed ibGib to have ib^gib pointers, which are the combination of an ib “name” and a gib “hash”. Together, they provide a unique URL very similar to the IPFS merkle links. If you change the ib “name”, then you are creating a new ibGib, not mutating the “same” one. This is by definition of what an ib is. Each ibGib also has an internal data JSON construct that is meant for this type of thing.

So if you have a “filename”, then this is an aspect that is internal to that “thing”, so that is what you are “changing” (by creating a new ibGib and relating it to this previous version). This is an important distinction and it partially comes from the fact that I’m approaching these concepts from the ground up and not using the “file” paradigm like many (most/all?) others. So in ibGib, the “filename” is actually part of the content of the “file”. And really, the “file” itself is both a “file” and a “folder” as each ibGib has this internal content as well as rel8ns to other ibGib.
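As a rough illustration of the idea (not ibGib's actual implementation), here is a Python sketch of a name-plus-hash pointer and of "changing" a filename by creating a new related record. Everything in it is an assumption made for the example: the function names, the record layout, the use of SHA-256 over sorted JSON, and the `"past"` rel8n name are all hypothetical.

```python
import hashlib
import json


def make_record(ib, data, rel8ns=None):
    """Build a record whose gib hashes its ib, data, and rel8ns (assumed scheme)."""
    rel8ns = rel8ns or {}
    payload = json.dumps({"ib": ib, "data": data, "rel8ns": rel8ns}, sort_keys=True)
    gib = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"ib": ib, "gib": gib, "data": data, "rel8ns": rel8ns}


def pointer(record):
    """Combine the ib "name" and the gib "hash" into a single ib^gib style pointer."""
    return f'{record["ib"]}^{record["gib"]}'


def with_filename(record, new_filename):
    """'Rename' by creating a new record that points back at the previous version."""
    past = record["rel8ns"].get("past", []) + [pointer(record)]
    new_data = {**record["data"], "filename": new_filename}
    new_rel8ns = {**record["rel8ns"], "past": past}
    return make_record(record["ib"], new_data, new_rel8ns)


v1 = make_record("notes", {"filename": "notes.txt", "text": "hello"})
v2 = with_filename(v1, "journal.txt")
```

Here `v1` is left untouched, `v2` gets a different gib because its content differs, and `v2` carries a pointer to `v1` in its `"past"` rel8n. Python dicts are themselves mutable, so this sketch only treats the records as immutable by convention.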

Much of this really brings out the delicacies that abound when thinking of what a “single” thing is across branching timelines, showing that there exist many strategies that could be used to resolve these branches into a single thing. The writer’s concept of a “file” (and the file-based implementations of these types of systems) combines this resolving strategy implicitly, whereas ibGib allows for the resolution as a separate step. And any resolution itself can be persisted as another ibGib in the strange and wonderful world of supersymmetrical data constructs.

What an exciting paper! Thanks again :smile:


I need to queue up that paper to read. Since I haven’t read it yet, I’m not sure if this will be relevant to your interests, but 20 years ago the Apple Newton PDA basically had an immutable filesystem (at least during app runtime it was immutable; you went into a special “install” mode to install or upgrade apps), with a separate data structure store called “soups” that was similar to what Redis is now.


I love antiques! Not so much Apple and Steve Jobs, but beauty is amoral.


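Remember that even PostgreSQL is immutable in a sense: under MVCC, an UPDATE writes a new row version rather than overwriting the old one in place. :slight_smile: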


ZFS tadadadadadada ZFS tadadadadadada

You can also look at things like IPFS or Datomic.