Quickstart
Icechunk is designed to stay mostly in the background. As a Python user, you'll interact primarily with Zarr. If you're not familiar with Zarr, you may want to start with the Zarr Tutorial.
Installation
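Icechunk can be installed from PyPI using pip (`pip install icechunk`) or from conda-forge using conda (`conda install -c conda-forge icechunk`).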
Note: Icechunk is currently designed to support the Zarr V3 Specification. Using it today requires installing Zarr Python 3.
Create a new Icechunk repository
To get started, let's create a new Icechunk repository. We recommend creating your repo on a cloud storage platform to get the most out of Icechunk's cloud-native design. However, you can also create a repo on your local filesystem.
Accessing the Icechunk store
Once the repository is created, we can use `Session`s to read and write data. Since there is no data in the repository yet, let's create a writable session on the default `main` branch.
Now that we have a session, we can access the `IcechunkStore` from it to interact with the underlying data using `zarr`.
Write some data and commit
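We can now use our Icechunk `store` with Zarr. Let's first create a group and an array within it.

```python
import zarr

# Create the root group in the store, then a 10-element int32 array split into two chunks of 5.
group = zarr.group(store)
array = group.create("my_array", shape=10, dtype='int32', chunks=(5,))
```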
Now let's write some data.
Now let's commit our update using the session.
🎉 Congratulations! You just made your first Icechunk snapshot.
Note: Once a writable `Session` has been successfully committed, it becomes read-only to ensure that all writing is done explicitly.
Make a second commit
At this point, we have already committed using our session, so we need to get a new session and store to make more changes.
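```python
# The first session is now read-only, so open a new writable session on "main".
session_2 = repo.writable_session("main")
store_2 = session_2.store
# Reopen the group and the array created before the first commit.
group = zarr.open_group(store_2)
array = group["my_array"]
```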
Let's now put some new data into our array, overwriting the first five elements.
...and commit the changes.
Explore version history
We can see the full version history of our repo:
```python
# Walk the history backwards, starting from the second snapshot.
hist = repo.ancestry(snapshot_id=snapshot_id_2)
for ancestor in hist:
    print(ancestor.id, ancestor.message, ancestor.written_at)
```
...and we can go back in time to the earlier version.
```python
# latest version
assert array[0] == 2
# check out earlier snapshot
earlier_session = repo.readonly_session(snapshot_id=snapshot_id_1)
store = earlier_session.store
# get the array
group = zarr.open_group(store, mode="r")
array = group["my_array"]
# verify data matches first version
assert array[0] == 1
```
That's it! You now know the basics of using Icechunk. As a next step, dig deeper into configuration, explore the version control system, or learn how to use Icechunk with Xarray.