Contributing#
👋 Hi! Thanks for your interest in contributing to Icechunk!
Icechunk is an open source (Apache 2.0) project and welcomes contributions in the form of:
- Usage questions - open a GitHub issue
- Bug reports - open a GitHub issue
- Feature requests - open a GitHub issue
- Documentation improvements - open a GitHub pull request
- Bug fixes and enhancements - open a GitHub pull request
Development#
Python Development Workflow#
The Python code is developed in the icechunk-python subdirectory. To make changes, first enter that directory:
Create / activate a virtual environment:
Install maturin:
Build the project in dev mode:
or build the project in editable mode:
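The commands for these steps are not reproduced here. As a minimal sketch, assuming a POSIX shell, `pip` as the installer, and a virtual environment named `.venv` (a hypothetical name), the sequence might look like the following. It is wrapped in a hypothetical function, `setup_icechunk_python`, so that nothing runs until you call it from the root of the repo.

```shell
# Sketch only: the function name and the .venv directory name are assumptions.
# Usage (from the root of the repo): setup_icechunk_python [editable]
setup_icechunk_python() {
    # Enter the Python package directory.
    cd icechunk-python || return 1

    # Create and activate a virtual environment.
    python3 -m venv .venv || return 1
    . .venv/bin/activate

    # Install maturin, the build tool for the Rust extension.
    pip install maturin || return 1

    # Build in dev mode by default; pass "editable" for an editable install.
    if [ "${1:-dev}" = "editable" ]; then
        pip install -e .
    else
        maturin develop
    fi
}
```

Because a shell function runs in the current shell, the directory change and the activated environment remain in effect after it returns.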
Testing#
The full Python test suite depends on S3- and Azure-compatible object stores.
These stores can be started from the root of the repo with docker compose up. Once you are done, press ctrl-c and run docker compose down to clean up.
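One way to script this cycle, as a sketch: start the containers in the background, run a command, and tear the containers down even if that command fails. The function name `with_object_stores` is hypothetical and not part of the repo.

```shell
# Sketch only: with_object_stores is a hypothetical helper.
# Usage (from the root of the repo): with_object_stores <command> [args...]
with_object_stores() {
    # Start the object store containers in the background.
    docker compose up -d || return 1

    # Run the given command and remember its exit status.
    "$@"
    status=$?

    # Always clean up the containers, then report the command's status.
    docker compose down
    return "$status"
}
```

For example, `with_object_stores sh -c 'cd icechunk-python && python -m pytest'` would run the suite against the local stores, assuming the tests are invoked with plain `pytest` from that directory.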
Running Xarray Backend Tests#
Icechunk includes integration tests that verify compatibility with Xarray's zarr backend API. These tests require the Xarray repository to be cloned locally.
Set the environment variables (adjust XARRAY_DIR to point to your local Xarray clone):
export ICECHUNK_XARRAY_BACKENDS_TESTS=1
export XARRAY_DIR=~/Documents/dev/xarray # or your xarray location
Run the Xarray backend tests:
python -m pytest -xvs tests/run_xarray_backends_tests.py \
-c $XARRAY_DIR/pyproject.toml \
-W ignore \
--override-ini="addopts="
To run a specific Xarray test, first specify a class defined in icechunk-python/tests/run_xarray_backends_tests.py and then the name of an Xarray test. For example:
python -m pytest -xvs tests/run_xarray_backends_tests.py::TestIcechunkStoreFilesystem::test_pickle \
-c $XARRAY_DIR/pyproject.toml \
-W ignore \
--override-ini="addopts="
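The selector in this command follows pytest's `file::Class::test` node ID form. As a sketch, two hypothetical helpers (`xarray_test_target` and `xarray_backend_test`, neither part of the repo) can assemble the node ID and reuse the flags shown above. They assume you run them from the icechunk-python directory.

```shell
# Sketch only: both function names are hypothetical helpers.

# Print the pytest node ID for an optional class and an optional test name.
xarray_test_target() {
    target="tests/run_xarray_backends_tests.py"
    if [ -n "${1:-}" ]; then target="$target::$1"; fi
    if [ -n "${2:-}" ]; then target="$target::$2"; fi
    echo "$target"
}

# Run the selected Xarray backend tests.
# Usage: xarray_backend_test [Class [test_name]]
xarray_backend_test() {
    # Fail early if the location of the Xarray clone is not set.
    if [ -z "${XARRAY_DIR:-}" ]; then
        echo "XARRAY_DIR is not set" >&2
        return 1
    fi
    ICECHUNK_XARRAY_BACKENDS_TESTS=1 python -m pytest -xvs \
        "$(xarray_test_target "$@")" \
        -c "$XARRAY_DIR/pyproject.toml" \
        -W ignore \
        --override-ini="addopts="
}
```

With these defined, `xarray_backend_test TestIcechunkStoreFilesystem test_pickle` is equivalent to the example command, and `xarray_backend_test TestIcechunkStoreFilesystem` runs every test in that class.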
Rust Development Workflow#
Prerequisites#
Install the just command runner (used for build tasks and pre-commit hooks):
Or using other package managers:
- macOS: brew install just
- Ubuntu: snap install --edge --classic just
Building#
Build the Rust workspace:
# Build all packages
just build
# Build release version
just build-release
# Compile tests without running them
just compile-tests
Testing#
# Run all tests
just test
# Run tests with logs enabled
just test-logs debug
# Run only specific tests
cargo test test_name
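The argument to `cargo test` is a substring filter, so `test_name` selects every test whose path contains that text. As a sketch, a hypothetical helper (`rust_test`, not part of the repo) can switch between substring and exact matching while showing the tests' printed output:

```shell
# Sketch only: rust_test is a hypothetical helper.
# Usage: rust_test <filter> [exact]
rust_test() {
    if [ "${2:-}" = "exact" ]; then
        # Run only the test whose full path equals the filter.
        cargo test "$1" -- --exact --nocapture
    else
        # Run every test whose path contains the filter.
        cargo test "$1" -- --nocapture
    fi
}
```

Arguments after `--` go to the test binary rather than to cargo. `--nocapture` prints test output as it is produced, and `--exact` requires the filter to match the whole test path.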
Code Quality#
We use a tiered pre-commit system for fast development:
# Fast checks (~3 seconds) - format and lint only
just pre-commit-fast
# Medium checks (~2-3 minutes) - includes compilation and deps
just pre-commit
# Full CI checks (~5+ minutes) - includes all tests and examples
just pre-commit-ci
Individual checks:
# Format code
just format
# Check formatting without changing files
just format --check
# Lint with clippy
just lint
# Check dependencies for security issues
just check-deps
Pre-commit Hooks#
We use pre-commit to automatically run checks. Install it:
The pre-commit configuration automatically runs:
- Every commit: Fast Python and Rust checks (~2 seconds total)
- Before push: Medium Rust checks (compilation + dependencies)
- Manual: Full CI-level checks when needed
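The installation commands are not reproduced here. As a minimal sketch, assuming `pip` as the installer, the following hypothetical helper (`install_hooks`) installs the tool and registers both the commit-stage and push-stage hooks. The explicit pre-push step is an assumption; it may be unnecessary if the repo's pre-commit configuration already declares its default hook types.

```shell
# Sketch only: install_hooks is a hypothetical helper.
# Usage (from the root of the repo): install_hooks
install_hooks() {
    # Install the pre-commit tool itself.
    pip install pre-commit || return 1

    # Register the hooks that run on every commit.
    pre-commit install || return 1

    # Register the hooks that run before each push.
    pre-commit install --hook-type pre-push
}
```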
To run manually:
# Run on changed files only
pre-commit run
# Run on all files
pre-commit run --all-files
# Run full CI checks manually
pre-commit run rust-pre-commit-ci --hook-stage manual
Roadmap#
Features#
- Support more object stores and more of their custom features
- Better Python API and helper functions
- Bindings to other languages: C, Wasm
- Better, faster, more secure distributed sessions
- Savepoints and persistent sessions
- Chunk and repo level statistics and metrics
- More powerful conflict detection and resolution
- Efficient move operation
- Telemetry
- Zarr-less usage from Python and other languages
- Better documentation and examples
Performance#
- Lower changeset memory footprint
- Optimize virtual dataset prefixes
- Bring back manifest joining for small arrays
- Improve performance of ancestry, garbage_collect, get_size and other metrics
- More flexible caching hierarchy
- Better I/O pipeline
- Better GIL management
- Request batching and splitting
- Bringing parts of the codec pipeline to the Rust side
- Chunk compaction
Zarr-related#
We’re very excited about a number of extensions to Zarr that would work great with Icechunk.