ElixirProto - A simple Elixir-native alternative to Protobuf

mindreframer · September 14, 2025, 8:30am

ElixirProto: Protobuf-Inspired Serialization for Elixir Events

I wanted an Elixir-native serialization format for events that also supports some Protobuf features:

compact format
backwards-compatible schema evolution
no direct coupling to Elixir struct names

If you’ve ever worked with event sourcing or audit logs, you know the pain: thousands of events piling up in storage, each carrying redundant schema names and field information. A simple event becomes 80+ bytes when it could be 30.

ElixirProto borrows Protobuf’s core insight—use numeric schema IDs and positional fields instead of names. But it stays in Elixir-land, using the robust :erlang.term_to_binary format you already trust.

This is the initial version, with limited support for real production demands. Future features will be added carefully and slowly. The good news: the storage format will stay stable. I’m doubling down on :erlang.term_to_binary. This won’t change.

{struct_index, [{1: "value1", 2: "value2"}] } feels like a robust yet flexible format for storing Elixir structs directly, with the guarantee that you can still read them decades from now.
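To make the idea concrete, here is a minimal sketch of index-plus-position encoding on top of :erlang.term_to_binary. It is not ElixirProto's actual implementation or wire layout; the module names SketchOrderCreated and PositionalSketch, the schema index 1, and the {index, [{position, value}]} shape are all assumptions chosen for illustration.

```elixir
# Hypothetical illustration only; not ElixirProto's real code or storage layout.
defmodule SketchOrderCreated do
  defstruct [:order_id, :customer_id, :total, :currency, :timestamp]
end

defmodule PositionalSketch do
  # Assumed mapping: schema index 1 <-> SketchOrderCreated, with a fixed field order.
  @index 1
  @fields [:order_id, :customer_id, :total, :currency, :timestamp]

  def encode(%SketchOrderCreated{} = event) do
    # Keep only non-nil fields, keyed by 1-based position instead of by name.
    pairs =
      @fields
      |> Enum.with_index(1)
      |> Enum.flat_map(fn {field, pos} ->
        case Map.fetch!(event, field) do
          nil -> []
          value -> [{pos, value}]
        end
      end)

    :erlang.term_to_binary({@index, pairs})
  end

  def decode(binary) do
    # The payload contains no atoms, so the :safe option is enough here.
    {@index, pairs} = :erlang.binary_to_term(binary, [:safe])
    attrs = Enum.map(pairs, fn {pos, value} -> {Enum.at(@fields, pos - 1), value} end)
    struct!(SketchOrderCreated, attrs)
  end
end

event = %SketchOrderCreated{order_id: "123", customer_id: "456", total: 99.99}
plain = :erlang.term_to_binary(event)
compact = PositionalSketch.encode(event)
IO.puts("plain: #{byte_size(plain)} bytes, positional: #{byte_size(compact)} bytes")
```

The struct name and field names never reach the binary, which is why renaming either leaves stored data readable as long as the index and field order stay fixed.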

Pros:

you don’t need to add complex libraries and binaries to your dev setup
this library is tiny and very transparent
very compact final result
not coupled to your Elixir struct names (feel free to rename them!)
not coupled to your Elixir struct attributes names (feel free to rename them!)
previous versions of a struct with fewer fields can be deserialized without any issues
a big deal for real production apps: it guarantees that your data stays compact and accessible in the future

Cons

Elixir-only
no explicit types for attributes
no support for nested types
no typespecs

Some of these cons might be addressed in future updates.

Example

defmodule OrderCreated do
  use ElixirProto.Schema, name: "orders.created", index: 1
  defschema [:order_id, :customer_id, :total, :currency, :timestamp]
end

event = %OrderCreated{order_id: "123", customer_id: "456", total: 99.99}
encoded = ElixirProto.encode(event)  # ~40% smaller typically

When It Matters

Event stores: Those millions of domain events add up fast in storage costs
Message queues: Smaller payloads mean better throughput and lower AWS bills
Audit logs: Compliance data you can’t delete but rarely access
Analytics pipelines: Moving lots of similar events between services

Like Protobuf, you get schema evolution for free—append new fields and old data still works.
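A small sketch of why appending fields keeps old data readable, under the assumption that payloads store {position, value} pairs as described earlier. SketchUserV2, its fields, and the schema index 2 are hypothetical and not part of ElixirProto.

```elixir
# Hypothetical illustration only; not ElixirProto's real code.
defmodule SketchUserV2 do
  # :nickname was appended later; positions 1..3 are unchanged from the older version.
  @fields [:id, :name, :email, :nickname]
  defstruct @fields

  def decode(binary) do
    {_schema_index, pairs} = :erlang.binary_to_term(binary, [:safe])
    attrs = for {pos, value} <- pairs, do: {Enum.at(@fields, pos - 1), value}
    # Positions missing from the payload keep the struct default (nil).
    struct!(__MODULE__, attrs)
  end
end

# Payload written before :nickname existed: only positions 1..3 are present.
old_payload = :erlang.term_to_binary({2, [{1, 42}, {2, "Ada"}, {3, "ada@example.com"}]})
user = SketchUserV2.decode(old_payload)
IO.inspect(user)
```

The same reasoning shows why fields should only ever be appended: reordering or reusing a position would silently change the meaning of existing payloads.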

The Trade-offs

More setup: You manage schema indices manually (like Protobuf field numbers). Pick wrong and you’re stuck with them.

Another dependency: Sometimes Jason.encode! or plain :erlang.term_to_binary is simpler and good enough.

Overkill for small volumes: If you’re not storing thousands of events daily and watching storage costs climb, the built-ins work fine.

Worth It?

If you’re storing thousands of events daily and watching storage costs climb, probably yes. If you’re building a simple CRUD app, probably no.

It’s Protobuf’s space efficiency without leaving Elixir’s type system. Whether that trade-off makes sense depends on how much you’re paying for those extra bytes.

Some benchmark results follow. Please be careful and evaluate them against your own data requirements.

ElixirProto Benchmark

Performance Results

Encoding Performance (operations per second)

Sparse user: ElixirProto 237K ops/s vs Plain 177K ops/s (34% faster)
Product: ElixirProto 122K ops/s vs Plain 117K ops/s (4% faster)
Large sparse struct: ElixirProto 116K ops/s vs Plain 91K ops/s (28% faster)
Single user (full): ElixirProto 102K ops/s vs Plain 102K ops/s (equivalent)
Large full struct: ElixirProto 84K ops/s vs Plain 72K ops/s (16% faster)

Memory Usage Per Operation

Plain serialization: ~0.26 KB per operation
ElixirProto: 1.75-9.37 KB per operation (7-36x more memory usage)

ElixirProto uses significantly more memory during encoding due to intermediate data structure creation.

Collection Performance (individual encoding)

ElixirProto 100 sparse users: 2.32K collections/s
ElixirProto 50 products: 2.32K collections/s
Plain 100 sparse users: 1.69K collections/s (ElixirProto is 38% faster)
Plain 50 products: 2.12K collections/s (ElixirProto is 9% faster)

Payload Size Analysis

ElixirProto compression ratio vs original data:

Sparse user: 27.2% of original size
Large sparse struct: 7.3% of original size

Field Count Impact Analysis

ElixirProto space savings with varying field density:

5 fields: 161 bytes saved (78.2% savings)
10 fields: 164 bytes saved (72.6% savings)
20 fields: 166 bytes saved (64.6% savings)
30 fields: 168 bytes saved (60.4% savings)
50 fields: 160 bytes saved (54.2% savings)

Key insight: ElixirProto maintains strong savings even as field count increases, due to index-based field representation and nil omission.
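The effect of nil omission can be reproduced with plain :erlang.term_to_binary. This is only a sketch with made-up data (50 slots, three of them set), not the benchmark's code, so the byte counts will differ from the numbers above.

```elixir
# 50 positional values where only three are set, mimicking a sparse struct.
values = for pos <- 1..50, do: if(pos in [1, 7, 42], do: "value_#{pos}", else: nil)

# Dense positional encoding: every slot is written, nils included.
dense = :erlang.term_to_binary(List.to_tuple(values))

# Sparse encoding: only {position, value} pairs for the non-nil slots.
sparse_pairs = for {value, pos} <- Enum.with_index(values, 1), value != nil, do: {pos, value}
sparse = :erlang.term_to_binary(sparse_pairs)

IO.puts("dense: #{byte_size(dense)} bytes, sparse: #{byte_size(sparse)} bytes")
```

Each omitted nil saves a few bytes, while each present field pays a small cost for its position, so the benefit is largest when most fields are empty.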

Collection Size Analysis

Individual Struct Encoding (100 full users)

ElixirProto total: 28,949 bytes
Plain total: 33,214 bytes
Savings: 4,265 bytes (12.8%)

Collection vs Individual Comparison

Plain collection (100 users as list): 1,896 bytes
Individual sum: 33,214 bytes
Collection advantage: 31,318 bytes saved (94.3% reduction)

For bulk data, plain collection serialization is extremely space-efficient compared to individual struct encoding.
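One plausible contributor to this gap is compression: the test environment lists zlib for both approaches, and a compressor sees far more repetition in one large term than in many small ones. The sketch below uses invented map data and the :compressed option of :erlang.term_to_binary; it is not the benchmark's code and its numbers will not match the ones above.

```elixir
# 100 similar records, standing in for the benchmark's users (made-up data).
users =
  for id <- 1..100 do
    %{id: id, name: "user_#{id}", email: "user_#{id}@example.com", active: true}
  end

# Each item compressed on its own: zlib sees only one small term at a time.
individual_total =
  users
  |> Enum.map(&byte_size(:erlang.term_to_binary(&1, [:compressed])))
  |> Enum.sum()

# Whole list compressed at once: repeated keys and similar values compress together.
collection = byte_size(:erlang.term_to_binary(users, [:compressed]))

IO.puts("individual sum: #{individual_total} bytes, collection: #{collection} bytes")
```

If events must stay individually addressable, as in an event store, this advantage is not available, which is where per-item savings matter more.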

Performance vs Payload Trade-offs

ElixirProto Advantages

Sparse data: Up to 71% space savings
Schema evolution: Explicit indices enable backward compatibility
Bandwidth efficiency: Smaller payloads for network transfer
Storage optimization: Reduced disk/memory footprint for persistent data
Field omission: Automatic nil field exclusion

Plain Serialization Advantages

Memory efficiency: 7-36x less memory usage during encoding
Collection handling: Extremely efficient for bulk data
Simplicity: No schema management required
Compatibility: Works with any Elixir data structure
Development speed: No setup overhead

When to Use Each Approach

Use ElixirProto For:

Event sourcing with sparse events
API responses with optional fields
Database records with many nullable columns
Message queues with payload size limits
Mobile/IoT applications with bandwidth constraints
Long-term data storage where compression matters

Use Plain Serialization For:

Hot path encoding/decoding (performance critical)
In-memory caching
Collections of mixed data types
Temporary data structures
Development and debugging
Applications where simplicity > space optimization

Test Environment

Hardware: Apple M3 Ultra
Software: Elixir 1.18.3, Erlang 27.3, JIT enabled
Compression: zlib for both approaches
Benchmark Tool: Benchee with 3s runtime + 2s warmup

Reproducing Results

Run mix run benchmarks/basic.exs to generate current results.

Sample Structures

User: 7 fields (id, name, email, age, active, created_at, metadata)
Product: 8 fields (id, name, description, price, category, in_stock, tags, specs)
LargeStruct: 50 fields (field_01 through field_50)

Conclusion

ElixirProto is optimized for space-efficient serialization of structured data with predictable schemas. It excels with sparse data and provides substantial space savings at the cost of higher memory usage during encoding. Plain serialization remains the better choice for performance-critical paths and mixed data types.

elixir_proto | Hex

mindreframer · September 14, 2025, 8:42am

In the meantime I have added support for nested serialization and TypedStruct-inspired spec types (as an alternative implementation).

The serialization stays the same, but with ElixirProto.TypedSchema you also get typespecs.