# Arraylake
Both products are developed by Earthmover: Icechunk is a free, open-source (Apache 2.0) transactional storage engine for Zarr, and Arraylake is a managed cloud platform built on top of Icechunk.
Icechunk is a fully capable, standalone product; you never need Arraylake to use it. Arraylake adds operational, collaboration, and data-delivery features on top of Icechunk for teams that want a managed experience.
Migrating from Icechunk to Arraylake is straightforward: any existing Icechunk repository can be imported directly into Arraylake.
## Feature Comparison
### Open-Source Storage Engine
Arraylake is built on Icechunk, so these core capabilities ship with Icechunk and are available in both Icechunk and Arraylake.
| Feature | Icechunk | Arraylake |
|---|---|---|
| Transactional storage engine | ✓ | ✓ |
| Version control (branches, tags, time travel) | ✓ | ✓ |
| ACID transactions with serializable isolation | ✓ | ✓ |
| Virtual chunk references (HDF5, NetCDF, GRIB, TIFF) | ✓ | ✓ |
| Parallel / distributed writes | ✓ | ✓ |
| Cloud storage (S3, GCS, Azure, R2, etc.) | Self-managed | Earthmover-managed or bring your own bucket |
### Collaboration & Access Control
Arraylake adds team-oriented security and identity management on top of Icechunk's storage layer.
| Feature | Icechunk | Arraylake |
|---|---|---|
| Role-based access control (RBAC) | Relies on cloud IAM | Org-level and repo-level roles |
| SSO / SAML authentication | ✗ | Google, GitHub, Microsoft AD |
| Credential vending | You manage credentials | Automatic temporary credential delegation |
| API keys for service accounts | ✗ | Scoped permissions with expiration |
| Virtual chunk security | Every reader must manage credentials for external data sources | Org-level policies control which external sources are accessible; readers never handle credentials |
### Data Catalog & Sharing
Arraylake provides a central catalog for scientific data with native understanding of multidimensional arrays, making it easy to discover, explore, and share datasets within and across organizations.
| Feature | Icechunk | Arraylake |
|---|---|---|
| Repository catalog & web UI | ✗ | Browse, search, and inspect repos |
| Repository metadata & tagging | ✗ | Classify and filter repos with arbitrary metadata |
| Organization-level dashboards | ✗ | Aggregated view across all repos |
| Cross-organization sharing | ✗ | Share datasets between organizations with read-only mirrors |
| Data marketplace | ✗ | Publish and subscribe to datasets (free or paid) |
| Filtered subscriptions | ✗ | Data providers can gate access to subsets of a dataset behind a paywall |
### Data Delivery
Arraylake's Flux service exposes your data through industry-standard protocols, with no additional infrastructure to manage.
| Feature | Icechunk | Arraylake |
|---|---|---|
| EDR (Environmental Data Retrieval) | ✗ | OGC-compliant |
| Map Tiles API | ✗ | OGC Tiles |
| WMS (Web Map Service) | ✗ | OGC v1.3.0 + ncWMS extensions |
| OPeNDAP / DAP2 | ✗ | ✓ |
### Operations & Monitoring
Arraylake automates routine maintenance and gives visibility into repository health.
| Feature | Icechunk | Arraylake |
|---|---|---|
| Garbage collection & data expiration | You run it | Scheduled, runs on managed compute |
| Monitoring & metrics dashboards | ✗ | Repo-level and org-level |
| Webhooks & Slack notifications | ✗ | Commit events |
| Performance tuning | Manual configuration | `arraylake repo tune` benchmarking |
### Support & Pricing
| Feature | Icechunk | Arraylake |
|---|---|---|
| Pricing | Free forever (Apache 2.0) | Free tier (read-only) + Professional tier |
| Support | Community (GitHub, Slack) | Priority support |
## When to Use Which
Use Icechunk on its own if you are comfortable managing your own cloud infrastructure, don't need a web UI or access control beyond cloud IAM, want full control with no additional cost or vendor dependency, and only need access to free and open data sources.
Use Arraylake if you need team collaboration with role-based access, want a web UI for managing repositories, need to serve data via standard protocols (OGC, OPeNDAP), want managed operations such as garbage collection, credential vending, and monitoring, or want access to paid datasets on Arraylake.
## No Lock-in
Arraylake stores your data in Icechunk format in your own object storage (bring your own bucket), following the open Icechunk Format Specification. If you discontinue your Arraylake subscription, you can still read and write all of your data using Icechunk.