Contributing#

👋 Hi! Thanks for your interest in contributing to Icechunk!

Icechunk is an open source (Apache 2.0) project and welcomes contributions in the form of:

Usage questions - open a GitHub issue
Bug reports - open a GitHub issue
Feature requests - open a GitHub issue
Documentation improvements - open a GitHub pull request
Bug fixes and enhancements - open a GitHub pull request

Development#

Python Development Workflow#

The Python code is developed in the icechunk-python subdirectory. To make changes, first enter that directory:

cd icechunk-python

Create / activate a virtual environment:

Venv:

python3 -m venv .venv
source .venv/bin/activate

Conda / Mamba:

mamba create -n icechunk python=3.12 rust zarr
mamba activate icechunk

uv:

uv sync
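Before going further, it can help to confirm which of the tools named in this guide are available on your PATH. The following is a minimal sketch: the helper name `have` is hypothetical (it is not part of the Icechunk repository), and the tool list is an assumption based on the commands used on this page.

```shell
# Sketch: report which tools mentioned in this guide are on PATH.
# `have` is a hypothetical helper name, not part of the Icechunk repo.
have() {
  command -v "$1" >/dev/null 2>&1
}

for tool in python3 cargo maturin uv; do
  if have "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```

Not every tool is needed for every workflow; for example, uv is only required if you choose the uv option.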

Install maturin:

Venv:

pip install maturin

Build the project in dev mode:

maturin develop

# or with the optional dependencies
maturin develop --extras=test,benchmark

Or build the project in editable mode:

pip install -e icechunk@.

uv:

uv manages rebuilding as needed, so it runs the Maturin build automatically whenever you use uv run.

To invoke Maturin explicitly, install it globally:

uv tool install maturin

Maturin may need to be told that it should work with uv; if so, add --uv to the command:

maturin develop --uv --extras=test,benchmark
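After a build, a quick import check can confirm that the compiled extension landed in the active environment. This is a sketch, not part of the project tooling: `check_import` is a hypothetical helper, and the `PYTHON` variable is an assumption that lets you point it at a specific interpreter.

```shell
# Sketch: confirm that a Python module imports in the active environment.
# `check_import` is a hypothetical helper; PYTHON defaults to python3.
check_import() {
  "${PYTHON:-python3}" -c "import $1" 2>/dev/null
}

if check_import icechunk; then
  echo "icechunk imports correctly"
else
  echo "icechunk is not importable; try re-running maturin develop" >&2
fi
```

With the uv workflow, the equivalent check would be run through uv run so that uv's environment is used.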

Testing#

The full Python test suite depends on S3 and Azure compatible object stores.

These object stores can be started locally from the root of the repo with docker compose up. Once done, press Ctrl-C and then run docker compose down to clean up.

uv:

uv run pytest
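The start, test, and clean-up steps above can be chained so that the containers are always torn down, even when tests fail. The following is a rough sketch under several assumptions: it is called from the repository root, the containers are started in the background with -d instead of in the foreground, and it does not wait for the stores to become ready. The names `run`, `run_python_tests`, and `DRY_RUN` are hypothetical.

```shell
# Sketch: start the object stores, run the Python tests, always clean up.
# Assumes the current directory is the repository root.

run() {
  # Print the command instead of executing it when DRY_RUN=1.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run_python_tests() (
  # The body runs in a subshell, so the trap and the cd stay local.
  root=$(pwd)
  trap 'cd "$root" && run docker compose down' EXIT
  run docker compose up -d || exit 1   # -d: run containers in the background
  run cd icechunk-python || exit 1
  run uv run pytest "$@"               # extra arguments are passed to pytest
)

# Example (not executed here): run_python_tests -x
```

Setting DRY_RUN=1 before calling run_python_tests prints the commands without running them, which is a cheap way to check the sequence first.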

Running Xarray Backend Tests#

Icechunk includes integration tests that verify compatibility with Xarray's zarr backend API. These tests require the Xarray repository to be cloned locally.

Set the environment variables (adjust XARRAY_DIR to point to your local Xarray clone):

export ICECHUNK_XARRAY_BACKENDS_TESTS=1
export XARRAY_DIR=~/Documents/dev/xarray  # or your xarray location

Run the Xarray backend tests:

python -m pytest -xvs tests/run_xarray_backends_tests.py \
  -c $XARRAY_DIR/pyproject.toml \
  -W ignore \
  --override-ini="addopts="
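Because the command above reads pyproject.toml from the Xarray clone, a wrong or unset XARRAY_DIR tends to fail with a confusing pytest error. A small guard can catch that earlier. This is a sketch only; `check_xarray_dir` is a hypothetical helper name.

```shell
# Sketch: verify that XARRAY_DIR points at a directory with a pyproject.toml.
# `check_xarray_dir` is a hypothetical helper, not part of the Icechunk repo.
check_xarray_dir() {
  dir=${1:-${XARRAY_DIR:-}}   # explicit argument wins over the env variable
  if [ -z "$dir" ]; then
    echo "XARRAY_DIR is not set" >&2
    return 1
  fi
  if [ ! -f "$dir/pyproject.toml" ]; then
    echo "no pyproject.toml found in $dir" >&2
    return 1
  fi
}

# Example (not executed here):
#   check_xarray_dir && python -m pytest -xvs tests/run_xarray_backends_tests.py ...
```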

To run a specific Xarray test, first specify a class defined in icechunk-python/tests/run_xarray_backends_tests.py and then the name of the Xarray test. For example:

python -m pytest -xvs tests/run_xarray_backends_tests.py::TestIcechunkStoreFilesystem::test_pickle \
  -c $XARRAY_DIR/pyproject.toml \
  -W ignore \
  --override-ini="addopts="
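The test selector used above follows pytest's node ID form, file::Class::test. As an illustration, the ID can be assembled from its parts; `xarray_node_id` is a hypothetical helper name used only for this sketch.

```shell
# Sketch: build a pytest node ID for one Xarray backend test.
# `xarray_node_id` is a hypothetical helper name.
xarray_node_id() {
  # $1: class name from run_xarray_backends_tests.py, $2: Xarray test name
  printf 'tests/run_xarray_backends_tests.py::%s::%s\n' "$1" "$2"
}

xarray_node_id TestIcechunkStoreFilesystem test_pickle
# prints tests/run_xarray_backends_tests.py::TestIcechunkStoreFilesystem::test_pickle
```

The result can be substituted for the path argument in the pytest command shown above.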

Rust Development Workflow#

Prerequisites#

Install the just command runner (used for build tasks and pre-commit hooks):

cargo install just

Or using other package managers:

macOS: brew install just
Ubuntu: snap install --edge --classic just

Building#

Build the Rust workspace:

# Build all packages
just build

# Build release version
just build-release

# Compile tests without running them
just compile-tests

Testing#

# Run all tests
just test

# Run tests with logs enabled
just test-logs debug

# Run only specific tests
cargo test test_name

Code Quality#

We use a tiered pre-commit system for fast development:

# Fast checks (~3 seconds) - format and lint only
just pre-commit-fast

# Medium checks (~2-3 minutes) - includes compilation and deps
just pre-commit

# Full CI checks (~5+ minutes) - includes all tests and examples
just pre-commit-ci
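If you prefer to think in terms of tiers, the mapping from tier to just recipe can be written down explicitly. This is only a sketch of the three tiers listed above; the function name `precommit_recipe` and the tier names fast, medium, and ci are hypothetical labels.

```shell
# Sketch: map a tier name to the just recipe listed above.
# `precommit_recipe` and the tier labels are hypothetical.
precommit_recipe() {
  case "$1" in
    fast)   echo "pre-commit-fast" ;;
    medium) echo "pre-commit" ;;
    ci)     echo "pre-commit-ci" ;;
    *)      echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Example (not executed here): just "$(precommit_recipe fast)"
```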

Individual checks:

# Format code
just format

# Check formatting without changing files
just format --check

# Lint with clippy
just lint

# Check dependencies for security issues
just check-deps

Pre-commit Hooks#

We use pre-commit to automatically run checks. Install it:

pip install pre-commit
pre-commit install

The pre-commit configuration automatically runs:

Every commit: Fast Python and Rust checks (~2 seconds total)
Before push: Medium Rust checks (compilation + dependencies)
Manual: Full CI-level checks when needed
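The "Before push" tier only runs if a pre-push Git hook is actually installed. A quick way to see which hooks are in place is to look for the hook scripts. The sketch below assumes the default hooks location (.git/hooks, with no core.hooksPath override); `hook_installed` is a hypothetical helper name.

```shell
# Sketch: check whether a Git hook script is installed in a repository.
# `hook_installed` is a hypothetical helper; assumes the default .git/hooks path.
hook_installed() {
  # $1: hook name (e.g. pre-commit, pre-push); $2: repo root (default: .)
  [ -x "${2:-.}/.git/hooks/$1" ]
}

for hook in pre-commit pre-push; do
  if hook_installed "$hook"; then
    echo "installed: $hook"
  else
    echo "missing:   $hook"
  fi
done
```

If the pre-push hook is reported missing, pre-commit install --hook-type pre-push installs it; whether this extra step is needed depends on the repository's pre-commit configuration.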

To run manually:

# Run on changed files only
pre-commit run

# Run on all files
pre-commit run --all-files

# Run full CI checks manually
pre-commit run rust-pre-commit-ci --hook-stage manual

Roadmap#

Features#

Support more object stores and more of their custom features
Better Python API and helper functions
Bindings to other languages: C, Wasm
Better, faster, more secure distributed sessions
Savepoints and persistent sessions
Chunk and repo level statistics and metrics
More powerful conflict detection and resolution
Efficient move operation
Telemetry
Zarr-less usage from Python and other languages
Better documentation and examples

Performance#

Lower changeset memory footprint
Optimize virtual dataset prefixes
Bring back manifest joining for small arrays
Improve performance of ancestry, garbage_collect, get_size and other metrics
More flexible caching hierarchy
Better I/O pipeline
Better GIL management
Request batching and splitting
Bringing parts of the codec pipeline to the Rust side
Chunk compaction

Zarr-related#

We’re very excited about a number of extensions to Zarr that would work great with Icechunk.
