ElixirProto - A simple Elixir-native alternative to Protobuf

ElixirProto: Protobuf-Inspired Serialization for Elixir Events

I wanted an Elixir-native serialization format for events that also supports some Protobuf features:

  • compact format
  • backwards-compatible schema evolution
  • no direct coupling to Elixir struct names

If you’ve ever worked with event sourcing or audit logs, you know the pain: thousands of events piling up in storage, each carrying redundant schema names and field information. A simple event becomes 80+ bytes when it could be 30.

ElixirProto borrows Protobuf’s core insight—use numeric schema IDs and positional fields instead of names. But it stays in Elixir-land, using the robust :erlang.term_to_binary format you already trust.

This is an initial version with limited support for real production demands. Future features will be added carefully and slowly. The good news: the storage format will stay stable. I’m doubling down on :erlang.term_to_binary, and this won’t change.

{struct_index, [{1, "value1"}, {2, "value2"}]} feels like a robust yet flexible format for storing Elixir structs directly, with the guarantee that you can still read them decades from now.
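To make that shape concrete, here is a minimal sketch of how a struct could be flattened into {schema_index, [{position, value}, ...]} and handed to :erlang.term_to_binary. It is not ElixirProto's implementation. The CompactSketch module, its hard-coded @schemas registry, and the bare OrderCreated struct are hypothetical stand-ins for what `use ElixirProto.Schema` and `defschema` provide. Skipping nil fields mirrors the nil omission described in the benchmarks.

```elixir
# Hypothetical event struct, standing in for one declared with ElixirProto.Schema.
defmodule OrderCreated do
  defstruct [:order_id, :customer_id, :total, :currency, :timestamp]
end

defmodule CompactSketch do
  # Hypothetical registry: struct module => {schema index, fields in positional order}.
  # Positions are 1-based and must never be reordered or reused.
  @schemas %{
    OrderCreated => {1, [:order_id, :customer_id, :total, :currency, :timestamp]}
  }

  # Build {schema_index, [{position, value}, ...]}, leaving out nil fields.
  def to_compact(%module{} = struct) do
    {index, fields} = Map.fetch!(@schemas, module)

    pairs =
      fields
      |> Enum.with_index(1)
      |> Enum.reject(fn {field, _pos} -> is_nil(Map.get(struct, field)) end)
      |> Enum.map(fn {field, pos} -> {pos, Map.get(struct, field)} end)

    {index, pairs}
  end

  # Serialize the compact tuple with the standard external term format.
  def encode(struct), do: struct |> to_compact() |> :erlang.term_to_binary()
end
```

Only positions are written, never field or module names. Renaming :order_id or the struct module therefore leaves the stored bytes unchanged, as long as the registry keeps the same index and order.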

Pros:

  • you don’t need to add complex libraries and binaries to your dev setup
  • this library is tiny and very transparent
  • very compact final result
  • not coupled to your Elixir struct names (feel free to rename them!)
  • not coupled to your Elixir struct attributes names (feel free to rename them!)
  • older versions of a struct with fewer fields can be deserialized without any issues
  • a big deal for real production apps: your data stays compact and readable in the future

Cons:

  • Elixir-only
  • no explicit types for attributes
  • no support for nested types
  • no typespecs

Some of these cons might be addressed in future updates.

Example

```elixir
defmodule OrderCreated do
  use ElixirProto.Schema, name: "orders.created", index: 1
  defschema [:order_id, :customer_id, :total, :currency, :timestamp]
end

event = %OrderCreated{order_id: "123", customer_id: "456", total: 99.99}
# Smaller than plain :erlang.term_to_binary; how much depends on how many fields are nil
encoded = ElixirProto.encode(event)
```

When It Matters

  • Event stores: Those millions of domain events add up fast in storage costs
  • Message queues: Smaller payloads mean better throughput and lower AWS bills
  • Audit logs: Compliance data you can’t delete but rarely access
  • Analytics pipelines: Moving lots of similar events between services

Like Protobuf, you get schema evolution for free: append new fields, and old data still decodes.
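Here is a sketch of the decoding side under the same assumptions. Positions missing from an older payload come back as nil. Positions the reader does not know about are ignored. PaymentReceived, EvolutionSketch, and schema index 7 are hypothetical, and ElixirProto's actual decoder may behave differently.

```elixir
# Version 2 of a hypothetical event: :currency was appended later as position 4.
defmodule PaymentReceived do
  defstruct [:payment_id, :amount, :paid_at, :currency]
end

defmodule EvolutionSketch do
  # Hypothetical registry: schema index => {struct module, fields in positional order}.
  @schemas %{7 => {PaymentReceived, [:payment_id, :amount, :paid_at, :currency]}}

  def decode(binary) do
    # :safe refuses to create new atoms from untrusted input.
    {index, pairs} = :erlang.binary_to_term(binary, [:safe])
    {module, fields} = Map.fetch!(@schemas, index)
    by_position = Map.new(pairs)

    # Walk the known fields only: absent positions become nil, unknown ones are dropped.
    attrs =
      fields
      |> Enum.with_index(1)
      |> Enum.map(fn {field, pos} -> {field, Map.get(by_position, pos)} end)

    struct(module, attrs)
  end
end
```

A payload written before :currency existed, such as :erlang.term_to_binary({7, [{1, "p-1"}, {2, 1500}]}), decodes into a PaymentReceived with paid_at and currency set to nil.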

The Trade-offs

More setup: You manage schema indices manually (like Protobuf field numbers). Pick wrong and you’re stuck with them.
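A reused or changed index silently breaks old payloads, so it can help to check all assignments in one place. The helper below is hypothetical and not part of ElixirProto. It flags any index claimed by more than one schema name.

```elixir
defmodule IndexRegistrySketch do
  # Given {schema_name, index} pairs, return %{index => [names]} for indices used more than once.
  def duplicate_indices(entries) do
    entries
    |> Enum.group_by(fn {_name, index} -> index end, fn {name, _index} -> name end)
    |> Enum.filter(fn {_index, names} -> length(names) > 1 end)
    |> Map.new()
  end

  # Raise early (e.g. at boot or in a test) instead of discovering a collision in stored data.
  def validate!(entries) do
    case duplicate_indices(entries) do
      dupes when map_size(dupes) == 0 -> :ok
      dupes -> raise ArgumentError, "schema indices reused: #{inspect(dupes)}"
    end
  end
end
```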

Another dependency: Sometimes Jason.encode! or plain :erlang.term_to_binary is simpler and good enough.

Overkill for small volumes: If you’re not storing thousands of events daily and watching storage costs climb, the built-ins work fine.

Worth It?

If you’re storing thousands of events daily and watching storage costs climb, probably yes. If you’re building a simple CRUD app, probably no.

It’s Protobuf-style space efficiency without leaving Elixir’s native term format. Whether that trade-off makes sense depends on how much you’re paying for those extra bytes.

Some benchmark results follow. Please treat them with care and evaluate them against your own data requirements.

ElixirProto Benchmark

Performance Results

Encoding Performance (operations per second)

  • Sparse user: ElixirProto 237K ops/s vs Plain 177K ops/s (34% faster)

  • Product: ElixirProto 122K ops/s vs Plain 117K ops/s (4% faster)

  • Large sparse struct: ElixirProto 116K ops/s vs Plain 91K ops/s (28% faster)

  • Single user (full): ElixirProto 102K ops/s vs Plain 102K ops/s (equivalent)

  • Large full struct: ElixirProto 84K ops/s vs Plain 72K ops/s (16% faster)

Memory Usage Per Operation

  • Plain serialization: ~0.26 KB per operation

  • ElixirProto: 1.75-9.37 KB per operation (7-36x more memory usage)

ElixirProto uses significantly more memory during encoding due to intermediate data structure creation.

Collection Performance (individual encoding)

  • ElixirProto 100 sparse users: 2.32K collections/s

  • ElixirProto 50 products: 2.32K collections/s

  • Plain 100 sparse users: 1.69K collections/s (27% slower)

  • Plain 50 products: 2.12K collections/s (9% slower)

Payload Size Analysis

| Scenario | Plain (uncompressed) | Plain + gzip | ElixirProto | Saved vs Plain + gzip | % Saved |
|----------|----------------------|--------------|-------------|-----------------------|---------|
| Single User (full) | 478 bytes | 332 bytes | 289 bytes | 43 bytes | 13.0% |
| Single User (sparse) | 125 bytes | 111 bytes | 34 bytes | 77 bytes | 69.4% |
| Single Product | 349 bytes | 254 bytes | 196 bytes | 58 bytes | 22.8% |
| Large Struct (50/50 fields) | 1,279 bytes | 301 bytes | 136 bytes | 165 bytes | 54.8% |
| Large Struct (10/50 fields) | 879 bytes | 225 bytes | 64 bytes | 161 bytes | 71.6% |

ElixirProto size relative to the original (uncompressed plain) data:

  • Sparse user: 27.2% of original size

  • Large sparse struct: 7.3% of original size
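The savings columns are measured against Plain + gzip, while the ratios above divide the ElixirProto size by the uncompressed plain size. The small check below re-derives both from the published byte counts. The module name is arbitrary, and the numbers are copied from the table.

```elixir
defmodule SavingsCheck do
  # {scenario, plain_gzip_bytes, elixir_proto_bytes}, copied from the payload table
  @rows [
    {"Single User (full)", 332, 289},
    {"Single User (sparse)", 111, 34},
    {"Single Product", 254, 196},
    {"Large Struct (50/50 fields)", 301, 136},
    {"Large Struct (10/50 fields)", 225, 64}
  ]

  # Bytes saved and percent saved relative to Plain + gzip, rounded to one decimal
  def savings({_scenario, plain_gzip, proto}) do
    saved = plain_gzip - proto
    {saved, Float.round(saved / plain_gzip * 100, 1)}
  end

  def all_savings, do: Enum.map(@rows, fn {scenario, _, _} = row -> {scenario, savings(row)} end)

  # ElixirProto size as a percentage of the uncompressed plain encoding
  def share_of_original(uncompressed, proto), do: Float.round(proto / uncompressed * 100, 1)
end
```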

Field Count Impact Analysis

ElixirProto space savings with varying field density:

  • 5 fields: 161 bytes saved (78.2% savings)

  • 10 fields: 164 bytes saved (72.6% savings)

  • 20 fields: 166 bytes saved (64.6% savings)

  • 30 fields: 168 bytes saved (60.4% savings)

  • 50 fields: 160 bytes saved (54.2% savings)

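Key insight: ElixirProto saves a roughly constant 160-168 bytes regardless of field count, thanks to index-based field representation and nil omission. As a percentage, the savings shrink from 78.2% at 5 fields to 54.2% at 50 fields but remain substantial.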
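This hypothetical sketch isolates the nil-omission part of that explanation. It encodes a 50-position record as {index, [{position, value}]} pairs twice, once keeping nil positions and once dropping them. It models only nil omission, not ElixirProto itself. With 10 of 50 positions filled, dropping nils shrinks the payload. With all 50 filled, it changes nothing.

```elixir
defmodule NilOmissionSketch do
  # Hypothetical 50-position record: only the first `filled` positions hold a value.
  def values(filled) do
    for pos <- 1..50, do: if(pos <= filled, do: "value_#{pos}", else: nil)
  end

  # Keep every position, nils included.
  def with_nils(values) do
    values |> Enum.with_index(1) |> Enum.map(fn {v, pos} -> {pos, v} end)
  end

  # Drop nil positions before encoding.
  def without_nils(values) do
    values |> with_nils() |> Enum.reject(fn {_pos, v} -> is_nil(v) end)
  end

  # Encoded size of {schema_index, pairs} in the external term format.
  def size(pairs), do: byte_size(:erlang.term_to_binary({1, pairs}))
end
```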

Collection Size Analysis

Individual Struct Encoding (100 full users)

  • ElixirProto total: 28,949 bytes

  • Plain total: 33,214 bytes

  • Savings: 4,265 bytes (12.8%)

Collection vs Individual Comparison

  • Plain collection (100 users as list): 1,896 bytes

  • Individual sum: 33,214 bytes

  • Collection advantage: 31,318 bytes saved (94.3% reduction)

For bulk data, plain collection serialization is extremely space-efficient compared to individual struct encoding.
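The same effect can be reproduced with plain term_to_binary alone. This minimal sketch assumes each individual encoding is compressed too, as the 33,214-byte total (about 100 × 332 bytes, the Plain + gzip single-user size) suggests. The map-based records are hypothetical stand-ins for the benchmark's User struct.

```elixir
defmodule CollectionSketch do
  # Hypothetical user records with identical keys, like a list of same-shaped structs.
  def users(n) do
    for i <- 1..n do
      %{id: i, name: "user #{i}", email: "user#{i}@example.com", age: 30, active: true}
    end
  end

  # Encode each record on its own (compressed) and sum the sizes.
  def individual_total(records) do
    records
    |> Enum.map(&byte_size(:erlang.term_to_binary(&1, [:compressed])))
    |> Enum.sum()
  end

  # Encode the whole list at once, so zlib can exploit the keys repeated across records.
  def collection_size(records), do: byte_size(:erlang.term_to_binary(records, [:compressed]))
end
```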

Performance vs Payload Trade-offs

ElixirProto Advantages

  • Sparse data: Up to 71% space savings

  • Schema evolution: Explicit indices enable backward compatibility

  • Bandwidth efficiency: Smaller payloads for network transfer

  • Storage optimization: Reduced disk/memory footprint for persistent data

  • Field omission: Automatic nil field exclusion

Plain Serialization Advantages

  • Memory efficiency: 7-36x less memory usage during encoding

  • Collection handling: Extremely efficient for bulk data

  • Simplicity: No schema management required

  • Compatibility: Works with any Elixir data structure

  • Development speed: No setup overhead

When to Use Each Approach

Use ElixirProto For:

  • Event sourcing with sparse events

  • API responses with optional fields

  • Database records with many nullable columns

  • Message queues with payload size limits

  • Mobile/IoT applications with bandwidth constraints

  • Long-term data storage where compression matters

Use Plain Serialization For:

  • Hot-path encoding/decoding where memory allocation matters (ElixirProto uses 7-36x more memory per operation)

  • In-memory caching

  • Collections of mixed data types

  • Temporary data structures

  • Development and debugging

  • Applications where simplicity > space optimization

Test Environment

  • Hardware: Apple M3 Ultra

  • Software: Elixir 1.18.3, Erlang 27.3, JIT enabled

  • Compression: zlib for both approaches

  • Benchmark Tool: Benchee with 3s runtime + 2s warmup

Reproducing Results

Run mix run benchmarks/basic.exs to generate current results.

Sample Structures

  • User: 7 fields (id, name, email, age, active, created_at, metadata)

  • Product: 8 fields (id, name, description, price, category, in_stock, tags, specs)

  • LargeStruct: 50 fields (field_01 through field_50)

Conclusion

ElixirProto is optimized for space-efficient serialization of structured data with predictable schemas. It excels with sparse data and provides substantial space savings at the cost of higher memory usage during encoding. Plain serialization remains the better choice for performance-critical paths and mixed data types.


In the meantime I have added support for nested serialization and TypedStruct-inspired spec types (as an alternative implementation).

The serialization format stays the same, but with ElixirProto.TypedSchema you also get typespecs.
