Arraylake

Icechunk is a free, open-source (Apache 2.0) transactional storage engine for Zarr. Arraylake is Earthmover's managed cloud platform, built on top of Icechunk. Both are made by Earthmover.

Icechunk is a fully capable, standalone product: you never need Arraylake to use it. Arraylake adds operational, collaboration, and data-delivery features on top of Icechunk for teams that want a managed experience.

Migrating from Icechunk to Arraylake is easy, as any Icechunk Repository can be directly imported into Arraylake.

Feature Comparison

Open-Source Storage Engine

Arraylake is built on Icechunk, so these core capabilities are available in both products.

| Feature | Icechunk | Arraylake |
|---|---|---|
| Transactional storage engine | Yes | Yes |
| Version control (branches, tags, time travel) | Yes | Yes |
| ACID transactions with serializable isolation | Yes | Yes |
| Virtual chunk references (HDF5, NetCDF, GRIB, TIFF) | Yes | Yes |
| Parallel / distributed writes | Yes | Yes |
| Cloud storage (S3, GCS, Azure, R2, etc.) | Self-managed | Earthmover-managed or bring your own bucket |

Collaboration & Access Control

Arraylake adds team-oriented security and identity management on top of Icechunk's storage layer.

| Feature | Icechunk | Arraylake |
|---|---|---|
| Role-based access control (RBAC) | Relies on cloud IAM | Org-level and repo-level roles |
| SSO / SAML authentication | No | Google, GitHub, Microsoft AD |
| Credential vending | You manage credentials | Automatic temporary credential delegation |
| API keys for service accounts | No | Scoped permissions with expiration |
| Virtual chunk security | Every reader must manage credentials for external data sources | Org-level policies control which external sources are accessible; readers never handle credentials |

Data Catalog & Sharing

Arraylake provides a central catalog for scientific data with native understanding of multidimensional arrays, making it easy to discover, explore, and share datasets within and across organizations.

| Feature | Icechunk | Arraylake |
|---|---|---|
| Repository catalog & web UI | No | Browse, search, and inspect repos |
| Repository metadata & tagging | No | Classify and filter repos with arbitrary metadata |
| Organization-level dashboards | No | Aggregated view across all repos |
| Cross-organization sharing | No | Share datasets between organizations with read-only mirrors |
| Data marketplace | No | Publish and subscribe to datasets (free or paid) |
| Filtered subscriptions | No | Data providers can gate access to subsets of a dataset behind a paywall |

Data Delivery

Arraylake's Flux service exposes your data through industry-standard protocols, with no additional infrastructure to manage.

| Feature | Icechunk | Arraylake |
|---|---|---|
| EDR (Environmental Data Retrieval) | No | OGC-compliant |
| Map Tiles API | No | OGC Tiles |
| WMS (Web Map Service) | No | OGC v1.3.0 + ncWMS extensions |
| OPeNDAP / DAP2 | No | Yes |

Operations & Monitoring

Arraylake automates routine maintenance and gives visibility into repository health.

| Feature | Icechunk | Arraylake |
|---|---|---|
| Garbage collection & data expiration | You run it | Scheduled, runs on managed compute |
| Monitoring & metrics dashboards | No | Repo-level and org-level |
| Webhooks & Slack notifications | No | Commit events |
| Performance tuning | Manual configuration | `arraylake repo tune` benchmarking |

Support & Pricing

| Feature | Icechunk | Arraylake |
|---|---|---|
| Pricing | Free forever (Apache 2.0) | Free tier (read-only) + Professional tier |
| Support | Community (GitHub, Slack) | Priority support |

When to Use Which

Use Icechunk on its own if you are comfortable managing your own cloud infrastructure, don't need a web UI or access control beyond cloud IAM, only need access to free and open data sources, and want full control with no additional cost or vendor dependency.

Use Arraylake if you need team collaboration with role-based access, want a web UI for managing repositories, need to serve data via standard protocols (OGC, OPeNDAP), want managed operations like garbage collection, credential vending, and monitoring, or want access to paid datasets on Arraylake.
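The two rules of thumb above can be read as a simple checklist: any one of the Arraylake criteria is enough to point toward Arraylake, and otherwise Icechunk on its own suffices. The sketch below encodes that reading in plain Python. The `Needs` class and `recommend` function are hypothetical names invented for this illustration and are not part of either product.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical checklist of the Arraylake criteria listed above."""

    role_based_access: bool = False   # team collaboration with RBAC
    web_ui: bool = False              # web UI for managing repositories
    protocol_serving: bool = False    # OGC services or OPeNDAP
    managed_operations: bool = False  # GC, credential vending, monitoring
    paid_datasets: bool = False       # access to paid datasets on Arraylake


def recommend(needs: Needs) -> str:
    """Return "Arraylake" if any criterion applies, otherwise "Icechunk"."""
    reasons = [name for name, wanted in vars(needs).items() if wanted]
    return "Arraylake" if reasons else "Icechunk"


self_managed_team = recommend(Needs())
team_serving_wms = recommend(Needs(protocol_serving=True))
```

Treating the criteria as an any-of check is an interpretation of the guidance, not an official rule; a real decision would also weigh cost and how much infrastructure the team wants to operate.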

No Lock-in

Arraylake stores your data in Icechunk format in your own object storage (bring your own bucket), following the open Icechunk Format Specification. If you discontinue your Arraylake subscription, you can still read and write all of your data using Icechunk.