Quickstart
Icechunk is designed to stay mostly in the background. As a Python user, you'll interact primarily with Zarr. If you're not familiar with Zarr, you may want to start with the Zarr Tutorial.
Installation
Install Icechunk with pip:
pip install icechunk
Note
Icechunk is currently designed to support the Zarr V3 Specification. Using it today requires installing the latest pre-release of Zarr Python 3.
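As a quick sanity check after installing, the following sketch reports which versions of the two packages are present in the current environment. It assumes the distributions are named `icechunk` and `zarr`, and it uses only the standard library; `installed_version` and `report_versions` are illustrative helper names, not part of Icechunk.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report_versions(distributions=("icechunk", "zarr")):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in distributions}


# Print one line per package so a missing install is easy to spot.
for name, version in report_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If `zarr` is reported with a version that does not start with `3`, the Zarr Python 3 requirement in the note above is probably not met.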
Create a new Icechunk repository
To get started, let's create a new Icechunk repository. We recommend creating your repo on S3 to get the most out of Icechunk's cloud-native design. However, you can also create a repo on your local filesystem.
Accessing the Icechunk store
Once the repository is created, we can use Sessions to read and write data. Since there is no data in the repository yet, let's create a writable session on the default main branch.
Now that we have a session, we can access the IcechunkStore from it to interact with the underlying data using zarr:
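The two steps above can be wrapped in a small helper. This sketch relies only on the `writable_session` and `store()` calls that appear later on this page; `open_writable_store` is a hypothetical name, and `repo` is assumed to be the repository object created in the previous step.

```python
def open_writable_store(repo, branch="main"):
    """Start a writable session on `branch` and return it with its store.

    `repo` is assumed to be an Icechunk repository object; only the
    `writable_session` and `store` calls shown in this quickstart are used.
    """
    session = repo.writable_session(branch)
    store = session.store()
    # Keep the session around: it is what you commit with later.
    return session, store
```

Typical use would be `session, store = open_writable_store(repo)`, after which `store` can be handed to Zarr.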
Write some data and commit
We can now use our Icechunk store with Zarr. Let's first create a group and an array within it.
Now let's write some data.
Now let's commit our update using the session.
🎉 Congratulations! You just made your first Icechunk snapshot.
Note
Once a writable Session has been successfully committed, it becomes read-only to ensure that all writing is done explicitly.
Make a second commit
At this point, we have already committed using our session, so we need to get a new session and store to make more changes.
session_2 = repo.writable_session("main")
store_2 = session_2.store()
group = zarr.open_group(store_2)
array = group["my_array"]
Let's now put some new data into our array, overwriting the first five elements.
...and commit the changes
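Because each commit needs a fresh writable session, the open, write, commit cycle can be sketched as one function. This is an illustration, not part of Icechunk: `commit_update` and its `update` callback are hypothetical, and the sketch assumes that `session.commit(message)` records the snapshot and returns its ID.

```python
def commit_update(repo, message, update, branch="main"):
    """Apply `update(store)` in a fresh writable session, then commit.

    ASSUMPTION: `session.commit(message)` returns the new snapshot ID.
    """
    # A committed session is read-only, so always start a new one.
    session = repo.writable_session(branch)
    store = session.store()
    update(store)  # caller writes through Zarr, e.g. array[:5] = 2
    return session.commit(message)
```

Under those assumptions, the second commit could be written as `snapshot_id_2 = commit_update(repo, "overwrite some values", overwrite_first_five)`, where `overwrite_first_five(store)` is a hypothetical function that opens the group with `zarr.open_group(store)` and assigns the new values.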
Explore version history
We can see the full version history of our repo:
# materialize the history as a list so it can be indexed later
hist = list(repo.ancestry(snapshot_id_2))
for anc in hist:
    print(anc.id, anc.message, anc.written_at)
# Output:
# AHC3TSP5ERXKTM4FCB5G overwrite some values 2024-10-14 14:07:27.328429+00:00
# Q492CAPV7SF3T1BC0AA0 first commit 2024-10-14 14:07:26.152193+00:00
# T7SMDT9C5DZ8MP83DNM0 Repository initialized 2024-10-14 14:07:22.338529+00:00
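Since every entry in the history carries an `id` and a `message`, looking up a snapshot by its commit message is a short loop. This is a minimal sketch, assuming `repo.ancestry` yields entries from newest to oldest as in the output above; `find_snapshot_id` is a hypothetical helper, not an Icechunk function.

```python
def find_snapshot_id(repo, tip_snapshot_id, message):
    """Return the ID of the newest ancestor whose message equals `message`.

    Walks the history returned by `repo.ancestry(tip_snapshot_id)` and
    returns None when no entry matches.
    """
    for ancestor in repo.ancestry(tip_snapshot_id):
        if ancestor.message == message:
            return ancestor.id
    return None
```

With the history printed above, `find_snapshot_id(repo, snapshot_id_2, "first commit")` would return the ID shown on the "first commit" line.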
...and we can go back in time to the earlier version.
# latest version
assert array[0] == 2
# check out earlier snapshot
earlier_session = repo.readonly_session(snapshot_id=hist[1].id)
store = earlier_session.store()
# get the array
group = zarr.open_group(store)
array = group["my_array"]
# verify data matches first version
assert array[0] == 1
That's it! You now know how to use Icechunk! For your next step, dig deeper into configuration, explore the version control system, or learn how to use Icechunk with Xarray.