Transactions and Version Control#

Icechunk carries over concepts from other version control software (e.g. Git) to multidimensional arrays. Doing so helps ease the burden of managing multiple versions of your data, and helps you be precise about which version of your dataset is being used for downstream purposes.

Core concepts of Icechunk's version control system are:

  • A snapshot bundles together related data and metadata changes in a single "transaction".
  • A branch points to the latest snapshot in a series of snapshots. Multiple branches can co-exist at a given time, and multiple users can add snapshots to a single branch. One common pattern is to use dev, stage, and prod branches to separate versions of a dataset.
  • A tag is an immutable reference to a snapshot, usually used to represent an "important" version of the dataset such as a release.

Snapshots, branches, and tags all refer to specific versions of your dataset. You can time-travel back to any version of your data by passing a snapshot ID, a branch name, or a tag name when creating a new Session.
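The relationship between the three kinds of reference can be sketched with a small in-memory model. This is a toy illustration only, not Icechunk's implementation: `Snapshot`, `ToyRepo`, and `resolve` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """One immutable version of the dataset (toy stand-in, not Icechunk's type)."""
    id: str
    parent_id: Optional[str]
    message: str


@dataclass
class ToyRepo:
    snapshots: dict = field(default_factory=dict)  # snapshot id -> Snapshot
    branches: dict = field(default_factory=dict)   # branch name -> snapshot id (can move)
    tags: dict = field(default_factory=dict)       # tag name -> snapshot id (never moves)

    def resolve(self, *, branch=None, tag=None, snapshot_id=None) -> Snapshot:
        """Turn exactly one kind of reference into a concrete snapshot."""
        given = [ref for ref in (branch, tag, snapshot_id) if ref is not None]
        if len(given) != 1:
            raise ValueError("pass exactly one of branch, tag, or snapshot_id")
        if branch is not None:
            snapshot_id = self.branches[branch]
        elif tag is not None:
            snapshot_id = self.tags[tag]
        return self.snapshots[snapshot_id]


toy = ToyRepo()
toy.snapshots["s0"] = Snapshot("s0", None, "Repository initialized")
toy.snapshots["s1"] = Snapshot("s1", "s0", "First change")
toy.branches["main"] = "s1"
toy.tags["v1"] = "s0"

print(toy.resolve(branch="main").message)
print(toy.resolve(tag="v1").message)
print(toy.resolve(snapshot_id="s0").message)
```

Whatever reference is used, the result is always a single snapshot, which is why any of the three can be used for time travel.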

Setup#

To get started, we can create a new Repository.

Note

This example uses an in-memory storage backend, but you can also use any other storage backend instead.

import icechunk as ic

repo = ic.Repository.create(ic.in_memory_storage())

Creating a new Repository automatically creates a main branch with an initial snapshot. We can take a look at the ancestry of the main branch to confirm this.

for ancestor in repo.ancestry(branch="main"):
    print(ancestor)

Note

The ancestry method can be used to inspect the ancestry of any branch, snapshot, or tag.

We get back an iterator of SnapshotInfo objects, which contain information about the snapshot, including its ID, the ID of its parent snapshot, and the time it was written.
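Conceptually, walking an ancestry means following parent pointers from a starting snapshot until a snapshot with no parent is reached. A minimal sketch of that idea, using a hypothetical `Info` tuple and a plain dictionary instead of a real repository:

```python
from typing import Iterator, NamedTuple, Optional


class Info(NamedTuple):
    """Toy stand-in for snapshot information (hypothetical, for illustration)."""
    id: str
    parent_id: Optional[str]
    message: str


def walk_ancestry(snapshots: dict, start_id: str) -> Iterator[Info]:
    """Yield snapshots from start_id back to the root, newest first."""
    current = start_id
    while current is not None:
        info = snapshots[current]
        yield info
        current = info.parent_id


toy_snapshots = {
    "root": Info("root", None, "Repository initialized"),
    "c1": Info("c1", "root", "First change"),
    "c2": Info("c2", "c1", "Second change"),
}

for info in walk_ancestry(toy_snapshots, "c2"):
    print(info.id, info.parent_id, info.message)
```

Because the walk is lazy, a caller can stop after the first few entries without visiting the whole history.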

Creating a snapshot#

Now that we have a Repository with a main branch, we can modify the data in the repository and create a new snapshot. First we need to create a writable Session from the main branch.

Note

Writable Session objects are required to create new snapshots, and can only be created from the tip of a branch. Checking out tags or other snapshots is read-only.

session = repo.writable_session("main")

We can now access the zarr.Store from the Session and create a new root group. Then we can modify the attributes of the root group and create a new snapshot.

import zarr

root = zarr.create_group(session.store)
root.attrs["foo"] = "bar"
print(session.commit(message="Add foo attribute to root group"))
FF7S4SK6PYZJGBG7GP60

Success! We've created a new snapshot with a new attribute on the root group.

Once we've committed the snapshot, the Session will become read-only, and we can no longer modify the data using our existing Session. If we want to modify the data again, we need to create a new writable Session from the branch. Notice that we don't have to refresh the Repository to get the updates from the main branch. Instead, the Repository will automatically fetch the latest snapshot from the branch when we create a new writable Session from it.

session = repo.writable_session("main")
root = zarr.open_group(session.store)
root.attrs["foo"] = "baz"
print(session.commit(message="Update foo attribute on root group"))
NQ0HP5SFBE2MQP1RC480
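The session lifecycle described above (write, commit, become read-only, open a fresh session that sees the new tip) can be mimicked with a toy model. `ToyRepo` and `ToySession` are hypothetical classes written for this sketch and do not reflect how Icechunk stores data.

```python
class ToyRepo:
    def __init__(self):
        self.snapshots = {"s0": {}}      # snapshot id -> attributes at that version
        self.branches = {"main": "s0"}   # branch name -> tip snapshot id

    def writable_session(self, branch):
        return ToySession(self, branch)


class ToySession:
    def __init__(self, repo, branch):
        self.repo = repo
        self.branch = branch
        # Start from whatever the branch tip is at the moment the session opens
        self.attrs = dict(repo.snapshots[repo.branches[branch]])
        self.read_only = False

    def set_attr(self, key, value):
        if self.read_only:
            raise PermissionError("session is read-only after commit")
        self.attrs[key] = value

    def commit(self):
        new_id = f"s{len(self.repo.snapshots)}"
        self.repo.snapshots[new_id] = dict(self.attrs)
        self.repo.branches[self.branch] = new_id
        self.read_only = True   # no further writes through this session
        return new_id


toy_repo = ToyRepo()
first = toy_repo.writable_session("main")
first.set_attr("foo", "bar")
print(first.commit())

try:
    first.set_attr("foo", "baz")   # rejected: this session was already committed
except PermissionError as err:
    print(err)

second = toy_repo.writable_session("main")   # picks up the state committed above
print(second.attrs)
```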

With a few snapshots committed, we can take a look at the ancestry of the main branch:

for snapshot in repo.ancestry(branch="main"):
    print(snapshot)
<icechunk.snapshots.SnapshotInfo>
id: NQ0HP5SFBE2MQP1RC480
parent_id: FF7S4SK6PYZJGBG7GP60
written_at: datetime.datetime(2026,4,29,22,19,38,161831, tzinfo=datetime.timezone.utc)
message: Update foo attribute on root group
metadata: PySnapshotProperties({})

<icechunk.snapshots.SnapshotInfo>
id: FF7S4SK6PYZJGBG7GP60
parent_id: 1CECHNKREP0F1RSTCMT0
written_at: datetime.datetime(2026,4,29,22,19,38,150655, tzinfo=datetime.timezone.utc)
message: Add foo attribute to root group
metadata: PySnapshotProperties({})

<icechunk.snapshots.SnapshotInfo>
id: 1CECHNKREP0F1RSTCMT0
parent_id: None
written_at: datetime.datetime(2026,4,29,22,19,38,124802, tzinfo=datetime.timezone.utc)
message: Repository initialized
metadata: PySnapshotProperties({"__icechunk": JsonValue(Object {"is_root": Bool(true)})})

Visually, this looks like:

print(repo.ancestry_graph(branch="main", plain=True))
● NQ0HP5SF (main) Update foo attribute on root group
● FF7S4SK6 Add foo attribute to root group
● 1CECHNKR Repository initialized

Empty Snapshots#

Set the allow_empty kwarg to create an "empty" snapshot: one with no changes, just metadata and a message.

session = repo.writable_session("main")
snap = session.commit(
    "added an empty commit", metadata={"moo": "zoo"}, allow_empty=True
)
print(repo.lookup_snapshot(snap))
<icechunk.snapshots.SnapshotInfo>
id: Q536777YCBYZV2Z3PM1G
parent_id: NQ0HP5SFBE2MQP1RC480
written_at: datetime.datetime(2026,4,29,22,19,38,172288, tzinfo=datetime.timezone.utc)
message: added an empty commit
metadata: PySnapshotProperties({"moo": JsonValue(String("zoo"))})

Amending a snapshot#

Instead of adding a new snapshot, a session can amend the snapshot at the tip of a branch:

session = repo.writable_session("main")
root = zarr.open_group(session.store)
root.attrs["foo"] = "quux"
print(session.amend("amended commit"))
WCT2X75KR4Q2PX1QTVQG

This rewrites the history so that the amended snapshot replaces the previous tip:

print(repo.ancestry_graph(branch="main", plain=True))
● WCT2X75K (main) amended commit
● NQ0HP5SF Update foo attribute on root group
● FF7S4SK6 Add foo attribute to root group
● 1CECHNKR Repository initialized

Note that the snapshot ID has now changed.
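The difference between committing and amending can be sketched with parent pointers. In this toy model (not Icechunk's implementation; all names are hypothetical) a commit adds a child of the current tip, while an amend adds a sibling of the current tip and moves the branch to it, which matches the shape of the history shown above.

```python
toy_parents = {"root": None, "a": "root", "b": "a"}   # snapshot id -> parent id
toy_branches = {"main": "b"}                          # branch name -> tip id


def history(branch):
    """List snapshot ids reachable from the branch tip, newest first."""
    out, current = [], toy_branches[branch]
    while current is not None:
        out.append(current)
        current = toy_parents[current]
    return out


def commit(branch, new_id):
    """New snapshot whose parent is the current tip."""
    toy_parents[new_id] = toy_branches[branch]
    toy_branches[branch] = new_id


def amend(branch, new_id):
    """New snapshot that takes the place of the current tip."""
    old_tip = toy_branches[branch]
    toy_parents[new_id] = toy_parents[old_tip]   # reuse the old tip's parent
    toy_branches[branch] = new_id


commit("main", "c")
print(history("main"))   # ['c', 'b', 'a', 'root']

amend("main", "c2")
print(history("main"))   # ['c2', 'b', 'a', 'root']
```

The amended snapshot gets a new ID because it is a new snapshot; the branch simply stops pointing at the old one.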

Set the allow_empty kwarg to edit just the message:

session = repo.writable_session("main")
session.amend("i have edited this message", allow_empty=True)
for snapshot in repo.ancestry(branch="main"):
    print(snapshot)
<icechunk.snapshots.SnapshotInfo>
id: F6XJJB03B0WWBJGZNHVG
parent_id: NQ0HP5SFBE2MQP1RC480
written_at: datetime.datetime(2026,4,29,22,19,38,186154, tzinfo=datetime.timezone.utc)
message: i have edited this message
metadata: PySnapshotProperties({})

<icechunk.snapshots.SnapshotInfo>
id: NQ0HP5SFBE2MQP1RC480
parent_id: FF7S4SK6PYZJGBG7GP60
written_at: datetime.datetime(2026,4,29,22,19,38,161831, tzinfo=datetime.timezone.utc)
message: Update foo attribute on root group
metadata: PySnapshotProperties({})

<icechunk.snapshots.SnapshotInfo>
id: FF7S4SK6PYZJGBG7GP60
parent_id: 1CECHNKREP0F1RSTCMT0
written_at: datetime.datetime(2026,4,29,22,19,38,150655, tzinfo=datetime.timezone.utc)
message: Add foo attribute to root group
metadata: PySnapshotProperties({})

<icechunk.snapshots.SnapshotInfo>
id: 1CECHNKREP0F1RSTCMT0
parent_id: None
written_at: datetime.datetime(2026,4,29,22,19,38,124802, tzinfo=datetime.timezone.utc)
message: Repository initialized
metadata: PySnapshotProperties({"__icechunk": JsonValue(Object {"is_root": Bool(true)})})

Transaction Context Manager#

To simplify the process of updating a repo, Icechunk provides a transaction context manager which yields an IcechunkStore object directly:

with repo.transaction("main", message="updated from context manager") as store:
    root = zarr.open_group(store)
    root.attrs["foo"] = "qux"

The context manager creates a Session in the background and automatically commits it (provided there are no errors within the context).
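The commit-on-success behaviour is a standard context-manager pattern. The following is a minimal sketch of that pattern using `contextlib`, with a hypothetical `ToyStoreRepo` class and a plain dictionary standing in for the store. It illustrates the idea and is not Icechunk's code.

```python
from contextlib import contextmanager


class ToyStoreRepo:
    def __init__(self):
        self.committed = {}   # state at the branch tip
        self.log = []         # commit messages, oldest first

    @contextmanager
    def transaction(self, message):
        staged = dict(self.committed)   # work on a private copy
        yield staged                    # the with-block mutates the copy
        # Only reached if the with-block finished without raising
        self.committed = staged
        self.log.append(message)


toy = ToyStoreRepo()

with toy.transaction("set foo") as store:
    store["foo"] = "qux"

try:
    with toy.transaction("broken update") as store:
        store["foo"] = "oops"
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass   # the failed transaction left no trace

print(toy.committed)   # {'foo': 'qux'}
print(toy.log)         # ['set foo']
```

Because the commit step sits after the `yield`, an exception inside the `with` block skips it and the staged changes are discarded.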

Time Travel#

Now that we've created a few snapshots, we can time-travel back to the previous snapshot using the snapshot ID.

Note

Because the zarr Store of a read-only Session is read-only, we need to pass mode="r" to the zarr.open_group function.

session = repo.readonly_session(snapshot_id=list(repo.ancestry(branch="main"))[1].id)
root = zarr.open_group(session.store, mode="r")
print(root.attrs["foo"])
quux

Branches#

Creating Branches#

If we want to modify the data starting from a particular snapshot, we can create a new branch at that snapshot with create_branch. Here we use the snapshot that main currently points to.

main_branch_snapshot_id = repo.lookup_branch("main")
repo.create_branch("dev", snapshot_id=main_branch_snapshot_id)

We can now create a new writable Session from the dev branch and modify the data.

session = repo.writable_session("dev")
root = zarr.open_group(session.store)
root.attrs["foo"] = "balogna"
print(session.commit(message="Update foo attribute on root group"))
Y5ZHTX71K4S0XJN80S7G

We can also create a new branch from the tip of the main branch if we want to modify our current working branch without modifying the main branch.

repo.create_branch("feature", snapshot_id=main_branch_snapshot_id)

session = repo.writable_session("feature")
root = zarr.open_group(session.store)
root.attrs["foo"] = "cherry"
print(session.commit(message="Update foo attribute on root group"))
W64Z9V0QMBFC22V9QM2G

With these branches created, the hierarchy of the repository now looks like:

print(repo.ancestry_graph(plain=True))
    ● W64Z9V0Q (feature) Update foo attribute on root group
  ● │ Y5ZHTX71 (dev) Update foo attribute on root group
  │ │ 
●─╯─╯ WWDHWM9Z (main) updated from context manager
●     F6XJJB03 i have edited this message
●     NQ0HP5SF Update foo attribute on root group
●     FF7S4SK6 Add foo attribute to root group
●     1CECHNKR Repository initialized

Listing and Looking Up Branches#

We can list all branches in the repository.

print(repo.list_branches())
{'dev', 'feature', 'main'}

If we need to find the snapshot that a branch currently points to, we can use the lookup_branch method.

print(repo.lookup_branch("feature"))
W64Z9V0QMBFC22V9QM2G

Deleting and Resetting Branches#

We can delete a branch with delete_branch.

repo.delete_branch("feature")

We can also reset a branch to a previous snapshot with reset_branch. This immediately modifies the branch tip to the specified snapshot, changing the history of the branch.

repo.reset_branch("dev", snapshot_id=main_branch_snapshot_id)

Creating Anonymous Snapshots#

Sometimes you want to save your work without committing to any branch. The flush method creates a new snapshot from the session's changes but does not update any branch pointer. The resulting snapshot is "anonymous" — it exists in the store but no branch or tag points to it.

This can be useful for:

  • Exploratory work: saving intermediate results without cluttering a branch's history.
  • Deferred decisions: creating several candidate snapshots and deciding later which one to keep.

session = repo.writable_session("main")
root = zarr.open_group(session.store)
root.attrs["experiment"] = "trial-1"

snapshot_id = session.flush(message="Exploratory change")
print(snapshot_id)
EJKWAJY7NTY1H4Y2X0Z0

After a flush the session becomes read-only, just like after a commit. The returned snapshot ID can be used later to attach the snapshot to a branch with reset_branch, or simply kept as a reference for time-travel.

# Adopt the flushed snapshot on the dev branch
repo.reset_branch("dev", snapshot_id=snapshot_id)

Note

Unlike commit, flush does not support rebasing or amending. It is a simple, one-step save operation.

Tip

The pattern of flush followed by reset_branch effectively gives you temporary branch semantics — you can do exploratory work without committing to a branch, then adopt the result onto a branch only if you're happy with it.
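The "flush now, adopt later" workflow can be pictured with a toy model in which snapshots and branch pointers are kept in two separate dictionaries. The function names mirror the methods discussed above, but the bodies are hypothetical and only illustrate the pointer mechanics.

```python
toy_parents = {"s0": None}                     # snapshot id -> parent id
toy_branches = {"main": "s0", "dev": "s0"}     # branch name -> tip id


def flush(base_id, new_id):
    """Store a snapshot on top of base_id without moving any branch."""
    toy_parents[new_id] = base_id
    return new_id


def reset_branch(name, snapshot_id):
    """Point an existing branch at an existing snapshot."""
    if snapshot_id not in toy_parents:
        raise KeyError(f"unknown snapshot {snapshot_id}")
    toy_branches[name] = snapshot_id


def anonymous_snapshots():
    """Snapshots that no branch tip points to directly."""
    return set(toy_parents) - set(toy_branches.values())


# Two candidate results, both based on the same starting point
trial_1 = flush("s0", "trial-1")
trial_2 = flush("s0", "trial-2")
print(toy_branches)                     # unchanged: both branches still at s0
print(sorted(anonymous_snapshots()))    # ['trial-1', 'trial-2']

# Decide later that the second candidate is the keeper
reset_branch("dev", trial_2)
print(toy_branches["dev"])              # trial-2
```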

Tags#

Tags are immutable references to a snapshot. They are created with create_tag.

For example, to tag the second-most-recent commit in main's history:

repo.create_tag("v1.0.0", snapshot_id=list(repo.ancestry(branch="main"))[1].id)

Because tags are immutable, we need to use a readonly Session to access the data referenced by a tag.

session = repo.readonly_session(tag="v1.0.0")
root = zarr.open_group(session.store, mode="r")
print(root.attrs["foo"])
quux
print(repo.ancestry_graph(branch="main", plain=True))
● WWDHWM9Z (main) updated from context manager
● F6XJJB03 (v1.0.0) i have edited this message
● NQ0HP5SF Update foo attribute on root group
● FF7S4SK6 Add foo attribute to root group
● 1CECHNKR Repository initialized

We can also list all tags in the repository.

print(repo.list_tags())
{'v1.0.0'}

We can look up the snapshot that a tag points to with lookup_tag.

print(repo.lookup_tag("v1.0.0"))
F6XJJB03B0WWBJGZNHVG

And then finally delete a tag with delete_tag.

Note

Tags are immutable and once a tag is deleted, it can never be recreated.

repo.delete_tag("v1.0.0")
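One way to picture the rule that a deleted tag name can never be reused is a registry that remembers every name it has ever held. This is a sketch under that assumption: `TagRegistry` and its tombstone set are hypothetical, and how Icechunk records deleted tags internally is not described here.

```python
class TagRegistry:
    def __init__(self):
        self._tags = {}         # live tag name -> snapshot id
        self._deleted = set()   # names that were deleted and stay reserved

    def create(self, name, snapshot_id):
        if name in self._tags:
            raise ValueError(f"tag {name!r} already exists")
        if name in self._deleted:
            raise ValueError(f"tag {name!r} was deleted and cannot be recreated")
        self._tags[name] = snapshot_id

    def lookup(self, name):
        return self._tags[name]

    def delete(self, name):
        del self._tags[name]
        self._deleted.add(name)

    def names(self):
        return set(self._tags)


registry = TagRegistry()
registry.create("v1.0.0", "F6XJJB03B0WWBJGZNHVG")
print(registry.lookup("v1.0.0"))

registry.delete("v1.0.0")
try:
    registry.create("v1.0.0", "NQ0HP5SFBE2MQP1RC480")
except ValueError as err:
    print(err)
```

The practical consequence is that a tag name, once published, always means the same snapshot or nothing at all.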

Conflict Resolution#

Icechunk is a serverless distributed system, and as such, it is possible to have multiple users or processes modifying the same data at the same time. Icechunk relies on the consistency guarantees of the underlying storage backends to ensure that the data is always consistent. In situations where two users or processes attempt to modify the same data at the same time, Icechunk will detect the conflict and raise an exception at commit time. This can be illustrated with the following example.

Let's create a fresh repository, add some attributes to the root group and create an array named data.

import icechunk as ic
import numpy as np
import zarr

repo = ic.Repository.create(ic.in_memory_storage())
session = repo.writable_session("main")
root = zarr.create_group(session.store)
root.attrs["foo"] = "bar"
root.create_dataset("data", shape=(10, 10), chunks=(1, 1), dtype=np.int32)
print(session.commit(message="Add foo attribute and data array"))
DBD1F0C8WTRWVMFERQCG

Let's try to modify the data array in two different sessions, both created from the main branch.

session1 = repo.writable_session("main")
session2 = repo.writable_session("main")

root1 = zarr.group(session1.store)
root2 = zarr.group(session2.store)

root1["data"][0,0] = 1
root2["data"][0,:] = 2

and then try to commit the changes.

print(session1.commit(message="Update first element of data array"))
print(session2.commit(message="Update first row of data array"))

# AE9XS2ZWXT861KD2JGHG
# ---------------------------------------------------------------------------
# ConflictError                             Traceback (most recent call last)
# Cell In[7], line 11
#      8 root2.attrs["foo"] = "baz"
#      10 print(session1.commit(message="Update foo attribute on root group"))
# ---> 11 print(session2.commit(message="Update foo attribute on root group"))

# File ~/Developer/icechunk/icechunk-python/python/icechunk/session.py:224, in Session.commit(self, message, metadata)
#     222     return self._session.commit(message, metadata)
#     223 except PyConflictError as e:
# --> 224     raise ConflictError(e) from None

# ConflictError: Failed to commit, expected parent: Some("BG0W943WSNFMMVD1FXJ0"), actual parent: Some("AE9XS2ZWXT861KD2JGHG")

The first session committed successfully, but the second session failed with a ConflictError. The second session's changes were made relative to the tip of the main branch at the time the session was created, and by the time it tried to commit, the first session had already moved that tip.
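The check behind this error is a form of optimistic concurrency: a commit succeeds only if the branch tip is still the snapshot the session started from. A minimal sketch, with a hypothetical `ToyBranch` class and a locally defined `ToyConflictError` (not Icechunk's exception):

```python
class ToyConflictError(Exception):
    pass


class ToyBranch:
    def __init__(self, tip):
        self.tip = tip

    def commit(self, expected_parent, new_id):
        """Move the tip to new_id only if nobody else moved it first."""
        if self.tip != expected_parent:
            raise ToyConflictError(
                f"expected parent: {expected_parent}, actual parent: {self.tip}"
            )
        self.tip = new_id
        return new_id


main = ToyBranch(tip="base")

# Both sessions record the tip they started from
session1_parent = main.tip
session2_parent = main.tip

print(main.commit(session1_parent, "s1"))   # first writer wins

try:
    main.commit(session2_parent, "s2")      # stale parent, rejected
except ToyConflictError as err:
    print(err)
```

Nothing is locked while the sessions are open; the conflict only surfaces at commit time, which is why the second writer needs a way to catch up afterwards.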

To resolve this conflict, we can use the rebase functionality.

Rebasing#

To update the second session so it is based off the tip of the main branch, we can use the rebase method.

First, we can try to rebase, without merging any conflicting changes:

session2.rebase(ic.conflicts.ConflictDetector())

# ---------------------------------------------------------------------------
# RebaseFailedError                         Traceback (most recent call last)
# Cell In[8], line 1
# ----> 1 session2.rebase(icechunk.ConflictDetector())

# File ~/Developer/icechunk/icechunk-python/python/icechunk/session.py:247, in Session.rebase(self, solver)
#     245     self._session.rebase(solver)
#     246 except PyRebaseFailedError as e:
# --> 247     raise RebaseFailedError(e) from None

# RebaseFailedError: Rebase failed on snapshot AE9XS2ZWXT861KD2JGHG: 1 conflicts found

This fails, however, because both sessions wrote to the same chunk of the data array. We can use the RebaseFailedError to get more information about the conflict.

try:
    session2.rebase(ic.conflicts.ConflictDetector())
except ic.RebaseFailedError as e:
    for conflict in e.conflicts:
        print(f"Conflict at {conflict.path}: {conflict.conflicted_chunks}")

# Conflict at /data: [[0, 0]]
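The reported chunk can be reproduced by hand. With a chunk shape of (1, 1), every array element is its own chunk, so the conflicting chunks are the intersection of the elements each session wrote. The helper below is a hypothetical illustration of that arithmetic, not Icechunk's detection code.

```python
def touched_chunks(rows, cols, chunk_shape):
    """Chunk grid coordinates covered by a write to the given rows and columns."""
    chunk_rows, chunk_cols = chunk_shape
    return {(r // chunk_rows, c // chunk_cols) for r in rows for c in cols}


chunk_shape = (1, 1)

# One session wrote data[0, 0]; the other wrote data[0, :] on a 10x10 array
single_element = touched_chunks(range(0, 1), range(0, 1), chunk_shape)
first_row = touched_chunks(range(0, 1), range(0, 10), chunk_shape)
print(sorted(single_element & first_row))   # [(0, 0)]

# Writes to different rows share no chunks, so there is nothing to resolve
fourth_row = touched_chunks(range(3, 4), range(0, 10), chunk_shape)
fifth_row = touched_chunks(range(4, 5), range(0, 10), chunk_shape)
print(sorted(fourth_row & fifth_row))       # []
```

With larger chunks, two writes to different elements can still land in the same chunk, so chunk shape directly affects how often concurrent writers conflict.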

We get a clear indication of the conflict and of the chunks that are conflicting. In this case we have decided that the first session's changes are correct, so we rebase the second session with a BasicConflictSolver that keeps the already-committed version ("theirs") of any conflicting chunk.

session2.rebase(ic.conflicts.BasicConflictSolver(on_chunk_conflict=ic.conflicts.VersionSelection.UseTheirs))
session2.commit(message="Update first row of data array")

# 'R4WXW2CYNAZTQ3HXTNK0'

Success! We have now resolved the conflict and committed the changes.
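What a chunk-level resolution does can be sketched with plain dictionaries that map chunk coordinates to values. The `rebase` function and its `prefer` argument below are hypothetical and exist only to show the two possible outcomes for the conflicting chunk.

```python
def rebase(session_writes, committed_writes, prefer):
    """Replay session_writes on top of committed_writes.

    prefer decides who wins when both wrote the same chunk:
    "committed" keeps the value already on the branch,
    "session" keeps the value from the session being rebased.
    """
    if prefer not in ("committed", "session"):
        raise ValueError("prefer must be 'committed' or 'session'")
    merged = dict(committed_writes)
    for chunk, value in session_writes.items():
        if chunk in committed_writes and prefer == "committed":
            continue   # conflicting chunk: keep what is already on the branch
        merged[chunk] = value
    return merged


committed = {(0, 0): 1}                            # already on the branch
pending = {(0, col): 2 for col in range(10)}       # still waiting to commit

keep_committed = rebase(pending, committed, prefer="committed")
keep_session = rebase(pending, committed, prefer="session")

print([keep_committed[(0, col)] for col in range(10)])   # [1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
print([keep_session[(0, col)] for col in range(10)])     # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

Only the chunk written by both sides is affected by the choice; every other chunk from the pending session is applied either way.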

Let's look at the value of the data array to confirm that the conflict was resolved correctly.

session = repo.readonly_session("main")
root = zarr.open_group(session.store, mode="r")
root["data"][0,:]

# array([1, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)

As you can see, readonly_session accepts a string for a branch name, or you can also write:

session = repo.readonly_session(branch="main")

Lastly, if you make changes to non-conflicting chunks or attributes, you can rebase without having to resolve any conflicts. This time we will show how to use rebase automatically during the commit call:

session1 = repo.writable_session("main")
session2 = repo.writable_session("main")

root1 = zarr.group(session1.store)
root2 = zarr.group(session2.store)

root1["data"][3,:] = 3
root2["data"][4,:] = 4

session1.commit(message="Update fourth row of data array")
session2.commit(message="Update fifth row of data array", rebase_with=ic.conflicts.ConflictDetector())
print("Rebase+commit succeeded")

And now we can see the data in the data array to confirm that the changes were committed correctly.

session = repo.readonly_session(branch="main")
root = zarr.open_group(session.store, mode="r")
root["data"][:,:]

# array([[1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
#        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
#        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
#        [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
#        [4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
#        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
#        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
#        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
#        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
#        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)
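Committing with an automatic rebase can be thought of as: try the commit, and if the branch has moved, check whether the newer commits touched any of the same chunks before applying the changes on top. The sketch below models that flow with integer versions in place of snapshot IDs. `ToyBranch`, `auto_rebase`, and `ToyConflictError` are hypothetical names, and the real implementation may differ.

```python
class ToyConflictError(Exception):
    pass


class ToyBranch:
    def __init__(self):
        self.tip = 0             # version counter standing in for a snapshot ID
        self.chunks = {}         # committed chunk values
        self.written = [set()]   # chunk keys written by each version

    def commit(self, base, writes, auto_rebase=False):
        if base != self.tip:
            if not auto_rebase:
                raise ToyConflictError(
                    f"expected parent {base}, actual parent {self.tip}"
                )
            # Chunks written by the commits that landed after our base
            newer = set().union(*self.written[base + 1:])
            overlap = newer & set(writes)
            if overlap:
                raise ToyConflictError(
                    f"rebase failed, conflicting chunks: {sorted(overlap)}"
                )
        self.chunks.update(writes)
        self.written.append(set(writes))
        self.tip += 1
        return self.tip


branch = ToyBranch()
base = branch.tip   # both sessions start from the same version

branch.commit(base, {(3, col): 3 for col in range(10)})
branch.commit(base, {(4, col): 4 for col in range(10)}, auto_rebase=True)
print(branch.tip)                                 # 2
print(branch.chunks[(3, 0)], branch.chunks[(4, 0)])   # 3 4
```

A third session that started from the same base and wrote to row 3 again would be rejected even with `auto_rebase=True`, because its chunks overlap with a commit it has not seen.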

Limitations#

At the moment, the rebase functionality can only resolve conflicts between chunks in arrays. Icechunk cannot yet resolve other types of conflicts automatically; they must be resolved manually.