Announcing the first stable release of ExZarr - a pure Elixir implementation of the Zarr specification for compressed, chunked, N-dimensional arrays!
What is ExZarr?
ExZarr brings Zarr to the Elixir ecosystem, enabling high-performance storage and processing of large scientific and machine learning datasets. It's fully compatible with zarr-python, the reference Python implementation, allowing seamless data exchange between Elixir and Python applications.
Why Zarr?
Zarr is a format for chunked, compressed, N-dimensional arrays designed for parallel computing. It’s widely used in:
- Scientific computing (climate, genomics, astronomy)
- Machine learning (large training datasets, model checkpoints)
- Data engineering (data lakes, ETL pipelines)
Zarr is an open, cloud-native storage format designed for working with very large, multidimensional array data. Instead of storing data in a single monolithic file, Zarr breaks arrays into many independently compressed chunks, each of which can be read or written on its own. This design makes it particularly well suited to modern workflows where data lives in object storage (like S3), is accessed in parallel by many workers, or is processed incrementally rather than all at once. Zarr organizes data hierarchically with simple, human-readable metadata, and it is supported across a growing ecosystem of languages and tools, especially in Python-based scientific and data engineering stacks.
The main strength of Zarr lies in performance and scalability: you can efficiently stream just the slices of data you need, process them in parallel, and avoid the I/O bottlenecks common in traditional file formats. This makes it a natural fit for domains like climate science, remote sensing, bioimaging, genomics, and machine learning, where datasets are often terabytes in size and accessed by distributed compute. The trade-offs are that Zarr requires some care in choosing chunk sizes to get good performance, its ecosystem is still maturing compared to long-established formats like HDF5 or NetCDF, and it is not ideal for non-array-centric data models. When your problem is fundamentally about large numerical arrays at scale—especially in the cloud—Zarr tends to shine.
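To make the chunk layout concrete, here is a small self-contained sketch (plain Elixir, no ExZarr required; the module name is made up for illustration) of how an element's coordinates map to a chunk: each index is integer-divided by the chunk size along its axis, and in the Zarr v2 layout the resulting indices are joined with dots to form the chunk's storage key.

```elixir
defmodule ChunkMath do
  # Map an element's coordinates to the indices of the chunk holding it.
  # Element {250, 999} in a {100, 100}-chunked array lives in chunk {2, 9}.
  def chunk_indices(coords, chunk_shape) do
    coords
    |> Tuple.to_list()
    |> Enum.zip(Tuple.to_list(chunk_shape))
    |> Enum.map(fn {c, size} -> div(c, size) end)
    |> List.to_tuple()
  end

  # Zarr v2 stores each chunk under a dot-separated key, e.g. "2.9".
  def chunk_key(coords, chunk_shape) do
    chunk_indices(coords, chunk_shape)
    |> Tuple.to_list()
    |> Enum.map_join(".", &Integer.to_string/1)
  end

  # Total number of chunks: ceil(extent / chunk_size) along every axis.
  def chunk_count(shape, chunk_shape) do
    shape
    |> Tuple.to_list()
    |> Enum.zip(Tuple.to_list(chunk_shape))
    |> Enum.map(fn {n, size} -> div(n + size - 1, size) end)
    |> Enum.product()
  end
end
```

A 1000x1000 array with 100x100 chunks, for example, works out to 100 independently compressed chunks, each readable on its own.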
Key Features
- N-dimensional arrays with 10 data types
- Zarr v2 and v3 support with automatic version detection
- High performance: up to 26x faster multi-chunk reads
- Multiple storage backends: filesystem, S3, GCS, Azure, MongoDB, etc.
- Chunk streaming with lazy evaluation and parallel processing
- Full Python compatibility - read/write arrays created by zarr-python
Production-Ready Quality
v1.0.0 represents months of hardening:
- 1,713 tests (100% passing)
- 80.3% code coverage
- 65 property-based tests
- Zero warnings (compilation, credo, dialyzer)
- Comprehensive security documentation
- Sobelow security analysis (0 high/medium warnings)
Quick Example
# Create a 2D array backed by the local filesystem
{:ok, array} =
  ExZarr.create(
    shape: {1000, 1000},
    chunks: {100, 100},
    dtype: :float64,
    compressor: :zlib,
    storage: :filesystem,
    path: "/tmp/my_array"
  )

# Write a 100x100 block (`data` must match the slice's shape and dtype)
:ok = ExZarr.set_item(array, [0..99, 0..99], data)

# Read it back
{:ok, data} = ExZarr.get_item(array, [0..99, 0..99])
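A note on the slice above: because 0..99 x 0..99 lines up exactly with one 100x100 chunk, that read only has to decompress a single chunk. As a rough illustration (plain Elixir, independent of ExZarr's internals; module name is hypothetical), the set of chunks any slice touches can be computed like this:

```elixir
defmodule SliceChunks do
  # Per axis, element indices first..last fall into chunk indices
  # div(first, size)..div(last, size).
  def touched(ranges, chunk_shape) do
    ranges
    |> Enum.zip(Tuple.to_list(chunk_shape))
    |> Enum.map(fn {%Range{first: f, last: l}, size} ->
      div(f, size)..div(l, size)
    end)
  end

  # Total number of chunks a read has to decompress.
  def count(ranges, chunk_shape) do
    touched(ranges, chunk_shape)
    |> Enum.map(&Range.size/1)
    |> Enum.product()
  end
end
```

The aligned slice [0..99, 0..99] touches 1 chunk, while the misaligned [50..149, 50..149] touches 4, which is why choosing chunk sizes (and slice boundaries) that match your access pattern matters for performance.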
Installation
def deps do
  [
    {:ex_zarr, "~> 1.0"}
  ]
end
Resources
- Hex: ex_zarr
- Docs: ExZarr v1.0.0 documentation
- Changelog: CHANGELOG.md in the repository
- GitHub: thanos/ExZarr
Looking Forward
Planned for v1.1.0:
- Nx integration
- Cloud storage optimizations
- Enhanced v3 format support
- Performance improvements for >1TB arrays