Contributing

👋 Hi! Thanks for your interest in contributing to Icechunk!

Icechunk is an open source (Apache 2.0) project and welcomes contributions in the form of:

Development

Python Development Workflow

The Python code is developed in the icechunk-python subdirectory. To make changes, first enter that directory:

cd icechunk-python

Create and activate a virtual environment using one of the following tools.

With venv:

python3 -m venv .venv
source .venv/bin/activate

With conda/mamba:

mamba create -n icechunk python=3.12 rust zarr
mamba activate icechunk

With uv:

uv sync

Install maturin:

pip install maturin

Build the project in dev mode:

maturin develop

# or with the optional dependencies
maturin develop --extras=test,benchmark

or build the project in editable mode:

pip install -e icechunk@.

If you use uv, it manages rebuilding as needed and runs the Maturin build automatically when you use uv run.

To invoke Maturin explicitly, install it globally:

uv tool install maturin

Maturin may need to be told to install into the uv-managed environment, so pass the --uv flag:

maturin develop --uv --extras=test,benchmark
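After building, it can be worth confirming that Python imports the freshly built package rather than an older copy installed elsewhere. The sketch below is a minimal check that uses only the standard library (importlib.util.find_spec and importlib.metadata.version). The helper name describe_install is hypothetical and not part of Icechunk.

```python
from __future__ import annotations

import importlib.metadata
import importlib.util


def describe_install(package: str = "icechunk") -> dict | None:
    """Report where `package` would be imported from and its installed version.

    Returns None if the package cannot be found on sys.path.
    """
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a stdlib module).
        version = None
    return {"location": spec.origin, "version": version}


if __name__ == "__main__":
    info = describe_install()
    if info is None:
        print("icechunk is not importable; did the build step succeed?")
    else:
        print(f"icechunk {info['version']} imported from {info['location']}")
```

You can save it as a script (for example check_install.py, a hypothetical name) and run it with uv run python check_install.py or from the activated environment. The printed location shows which copy of icechunk Python will actually import.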

Testing

The full Python test suite depends on S3- and Azure-compatible object stores. Start them from the root of the repo with docker compose up (when you are done, press Ctrl-C, then run docker compose down to clean up), and then run the tests:

uv run pytest

Running Xarray Backend Tests

Icechunk includes integration tests that verify compatibility with Xarray's zarr backend API. These tests require the Xarray repository to be cloned locally.

Set the environment variables (adjust XARRAY_DIR to point to your local Xarray clone):

export ICECHUNK_XARRAY_BACKENDS_TESTS=1
export XARRAY_DIR=~/Documents/dev/xarray  # or your xarray location

Run the Xarray backend tests:

python -m pytest -xvs tests/run_xarray_backends_tests.py \
  -c $XARRAY_DIR/pyproject.toml \
  -W ignore \
  --override-ini="addopts="

To run a specific Xarray test, first specify a test class defined in icechunk-python/tests/run_xarray_backends_tests.py, then the name of an Xarray test. For example:

python -m pytest -xvs tests/run_xarray_backends_tests.py::TestIcechunkStoreFilesystem::test_pickle \
  -c $XARRAY_DIR/pyproject.toml \
  -W ignore \
  --override-ini="addopts="
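If you prefer to script these runs, the sketch below builds the same pytest invocation from Python and sets the two environment variables. The helper names xarray_backend_test_command and xarray_backend_test_env are hypothetical and not part of Icechunk. The flags are copied from the commands above, and the relative test path assumes you run it from icechunk-python.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path

TEST_FILE = "tests/run_xarray_backends_tests.py"


def xarray_backend_test_command(
    xarray_dir: str | os.PathLike[str],
    test_class: str | None = None,
    test_name: str | None = None,
) -> list[str]:
    """Build the pytest argv for the Xarray backend tests.

    Optionally narrows the run to <TestClass>::<test_name> and points pytest
    at Xarray's pyproject.toml, mirroring the documented shell commands.
    """
    if test_name is not None and test_class is None:
        raise ValueError("a specific Xarray test needs a test class from the Icechunk test file")
    target = TEST_FILE
    if test_class is not None:
        target += f"::{test_class}"
        if test_name is not None:
            target += f"::{test_name}"
    pyproject = Path(xarray_dir).expanduser() / "pyproject.toml"
    return [
        sys.executable, "-m", "pytest", "-xvs", target,
        "-c", str(pyproject),
        "-W", "ignore",
        # Shell form: --override-ini="addopts=" (quotes removed by the shell).
        "--override-ini=addopts=",
    ]


def xarray_backend_test_env(xarray_dir: str | os.PathLike[str]) -> dict[str, str]:
    """Copy the current environment and set the variables the tests expect."""
    env = dict(os.environ)
    env["ICECHUNK_XARRAY_BACKENDS_TESTS"] = "1"
    env["XARRAY_DIR"] = str(Path(xarray_dir).expanduser())
    return env


# Example (hypothetical usage, run from icechunk-python/):
#   import subprocess
#   xdir = "~/Documents/dev/xarray"
#   cmd = xarray_backend_test_command(xdir, "TestIcechunkStoreFilesystem", "test_pickle")
#   subprocess.run(cmd, env=xarray_backend_test_env(xdir), check=True)
```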

Rust Development Workflow

Prerequisites

Install the just command runner (used for build tasks and pre-commit hooks):

cargo install just

Or using other package managers:

  • macOS: brew install just
  • Ubuntu: snap install --edge --classic just

Building

Build the Rust workspace:

# Build all packages
just build

# Build release version
just build-release

# Compile tests without running them
just compile-tests

Testing

# Run all tests
just test

# Run tests with logs enabled
just test-logs debug

# Run only specific tests
cargo test test_name

Code Quality

We use a tiered pre-commit system for fast development:

# Fast checks (~3 seconds) - format and lint only
just pre-commit-fast

# Medium checks (~2-3 minutes) - includes compilation and deps
just pre-commit

# Full CI checks (~5+ minutes) - includes all tests and examples
just pre-commit-ci

Individual checks:

# Format code
just format

# Check formatting without changing files
just format --check

# Lint with clippy
just lint

# Check dependencies for security issues
just check-deps

Pre-commit Hooks

We use pre-commit to automatically run checks. Install it:

pip install pre-commit
pre-commit install

The pre-commit configuration automatically runs:

  • Every commit: Fast Python and Rust checks (~2 seconds total)
  • Before push: Medium Rust checks (compilation + dependencies)
  • Manual: Full CI-level checks when needed

To run manually:

# Run on changed files only
pre-commit run

# Run on all files
pre-commit run --all-files

# Run full CI checks manually
pre-commit run rust-pre-commit-ci --hook-stage manual

Roadmap

Features

  • Support more object stores and more of their custom features
  • Better Python API and helper functions
  • Bindings to other languages: C, Wasm
  • Better, faster, more secure distributed sessions
  • Savepoints and persistent sessions
  • Chunk and repo level statistics and metrics
  • More powerful conflict detection and resolution
  • Efficient move operation
  • Telemetry
  • Zarr-less usage from Python and other languages
  • Better documentation and examples

Performance

  • Lower changeset memory footprint
  • Optimize virtual dataset prefixes
  • Bring back manifest joining for small arrays
  • Improve performance of ancestry, garbage_collect, get_size and other metrics
  • More flexible caching hierarchy
  • Better I/O pipeline
  • Better GIL management
  • Request batching and splitting
  • Bring parts of the codec pipeline to the Rust side
  • Chunk compaction

We’re very excited about a number of extensions to Zarr that would work great with Icechunk.