Sample Datasets

This page lists example Icechunk datasets. All of them are open-access and hosted in anonymous-access buckets, so you can read them right away without credentials.

The examples require only icechunk and xarray as dependencies.

Earthmover-hosted examples

Note

These repositories have been upgraded to Icechunk v2, but remain openable with Icechunk v1 as well. Upgrading is a metadata-only change — the repositories contain metadata files for both versions, and the underlying chunks are the same.
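Because the note above distinguishes Icechunk v1 from v2, it can help to confirm which versions are installed before opening a repository. The following is a minimal sketch using only the standard library; the helper names (`installed_version`, `major_version`, `describe`) are made up for this illustration and are not part of icechunk or xarray.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version: str) -> int:
    """Extract the leading major number from a version string such as '1.1.4'."""
    return int(version.split(".", 1)[0])


def describe(package: str) -> str:
    """Build a one-line, human-readable summary of the installed version."""
    version = installed_version(package)
    if version is None:
        return f"{package}: not installed"
    return f"{package}: {version} (major version {major_version(version)})"


# Report on the two dependencies used throughout this page.
for package in ("icechunk", "xarray"):
    print(describe(package))
```

The output depends on your environment, so no expected output is shown here.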

Weatherbench2 ERA5 (native, Icechunk v2)

A subset of the Weatherbench2 copy of the ERA5 reanalysis dataset.

import icechunk as ic
import xarray as xr

storage = ic.s3_storage(
    bucket="icechunk-public-data",
    prefix="v1/era5_weatherbench2",
    region="us-east-1",
    anonymous=True,
)

repo = ic.Repository.open(storage=storage)
session = repo.readonly_session("main")
ds = xr.open_dataset(
    session.store, group="1x721x1440", engine="zarr", chunks=None, consolidated=False
)

The same repository can also be opened from R2 storage:

import icechunk as ic
import xarray as xr

storage = ic.r2_storage(
    prefix="v1/era5_weatherbench2",
    endpoint_url="https://data.icechunk.cloud",
    anonymous=True,
)

repo = ic.Repository.open(storage=storage)
session = repo.readonly_session("main")
ds = xr.open_dataset(
    session.store, group="1x721x1440", engine="zarr", chunks=None, consolidated=False
)

GLAD Land Cover Land Use (native, Icechunk v2)

A copy of the GLAD Land Cover Land Use dataset distributed under a Creative Commons Attribution 4.0 International License.

See source.

import icechunk as ic
import xarray as xr

storage = ic.s3_storage(
    bucket="icechunk-public-data",
    prefix="v1/glad",
    region="us-east-1",
    anonymous=True,
)
repo = ic.Repository.open(storage=storage)
session = repo.readonly_session("main")
ds = xr.open_dataset(
    session.store, chunks=None, consolidated=False, engine="zarr"
)
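Every example on this page opens the tip of the `main` branch. As a rough sketch of how one might look at what else a repository contains, the code below lists branches, tags, and the most recent snapshots of the GLAD repository. It assumes the `Repository.list_branches`, `Repository.list_tags`, and `Repository.ancestry` methods behave as in recent icechunk releases and that `ancestry` yields snapshots newest first. `open_glad_repo`, `recent_snapshots`, and `main` are illustrative names, not part of the library.

```python
from itertools import islice


def open_glad_repo():
    """Open the public GLAD repository with the same storage settings as above."""
    # Imported lazily so recent_snapshots() can be reused without icechunk installed.
    import icechunk as ic

    storage = ic.s3_storage(
        bucket="icechunk-public-data",
        prefix="v1/glad",
        region="us-east-1",
        anonymous=True,
    )
    return ic.Repository.open(storage=storage)


def recent_snapshots(repo, branch="main", limit=5):
    """Return (id, message) pairs for up to `limit` snapshots on `branch`."""
    return [
        (snapshot.id, snapshot.message)
        for snapshot in islice(repo.ancestry(branch=branch), limit)
    ]


def main():
    """Print branches, tags, and recent snapshots (needs network access)."""
    repo = open_glad_repo()
    print("branches:", sorted(repo.list_branches()))
    print("tags:", sorted(repo.list_tags()))
    for snapshot_id, message in recent_snapshots(repo):
        print(snapshot_id, message)
```

Calling `main()` contacts the public bucket. The same helper should work for the other repositories on this page if you swap in their storage configuration.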

3rd-party examples

NOAA GFS archive (native, Icechunk v1)

A copy of the NOAA GFS analysis dataset distributed under a Creative Commons Attribution 4.0 International License.

Provided by dynamical.org, see source.

import icechunk as ic
import xarray as xr

storage = ic.s3_storage(
    bucket="dynamical-noaa-gfs",
    prefix="noaa-gfs-analysis/v0.1.0.icechunk",
    region="us-west-2",
    anonymous=True,
)
repo = ic.Repository.open(storage=storage)
session = repo.readonly_session("main")
ds = xr.open_zarr(session.store, chunks=None)

NASA RASI (virtual, Icechunk v1)

A copy of the NASA RASI dataset distributed under a Creative Commons Attribution 4.0 International License.

Provided by Development Seed, see https://github.com/virtual-zarr/rasi-icechunk.

import icechunk as ic
import xarray as xr

storage = ic.s3_storage(
    bucket="nasa-waterinsight",
    # Replace HISTORICAL with SSP245 or SSP585 for the future scenarios.
    prefix="virtual-zarr-store/icechunk/RASI/HISTORICAL",
    anonymous=True,
    region="us-west-2",
)

chunk_url = "s3://nasa-waterinsight/RASI/"
virtual_credentials = ic.credentials.containers_credentials({
    chunk_url: ic.credentials.s3_anonymous_credentials()
})

repo = ic.Repository.open(
    storage=storage,
    authorize_virtual_chunk_access=virtual_credentials,
)

session = repo.readonly_session("main")
ds = xr.open_zarr(session.store, chunks=None)
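The RASI example mentions that HISTORICAL can be swapped for SSP245 or SSP585. One way to make that switch less error-prone is to build the prefix from a validated scenario name, as sketched below. `RASI_SCENARIOS`, `rasi_prefix`, and `open_rasi` are hypothetical helpers written for this page. The sketch also assumes that the same virtual chunk container URL (`s3://nasa-waterinsight/RASI/`) applies to all three scenarios, which the original example does not state explicitly.

```python
# Scenario names taken from the comment in the example above.
RASI_SCENARIOS = ("HISTORICAL", "SSP245", "SSP585")


def rasi_prefix(scenario: str) -> str:
    """Return the repository prefix for a RASI scenario, validating the name."""
    name = scenario.upper()
    if name not in RASI_SCENARIOS:
        raise ValueError(
            f"unknown scenario {scenario!r}; expected one of {RASI_SCENARIOS}"
        )
    return f"virtual-zarr-store/icechunk/RASI/{name}"


def open_rasi(scenario: str = "HISTORICAL"):
    """Open one RASI scenario as an xarray Dataset (needs network access)."""
    # Imported lazily so rasi_prefix() can be used without these packages.
    import icechunk as ic
    import xarray as xr

    storage = ic.s3_storage(
        bucket="nasa-waterinsight",
        prefix=rasi_prefix(scenario),
        anonymous=True,
        region="us-west-2",
    )
    # Assumption: every scenario references chunks under this container URL.
    virtual_credentials = ic.credentials.containers_credentials(
        {"s3://nasa-waterinsight/RASI/": ic.credentials.s3_anonymous_credentials()}
    )
    repo = ic.Repository.open(
        storage=storage,
        authorize_virtual_chunk_access=virtual_credentials,
    )
    session = repo.readonly_session("main")
    return xr.open_zarr(session.store, chunks=None)
```

For example, `open_rasi("SSP245")` would open the SSP245 scenario, while a misspelled name raises a `ValueError` before any network request is made.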