Expiring Data

Over time, an Icechunk Repository will accumulate many snapshots, not all of which need to be kept around.

"Expiration" allows you to mark snapshots as expired, and "garbage collection" deletes all data (manifests, chunks, snapshots, etc.) associated with expired snapshots.

First create a Repository, configured so that there are no "inline" chunks. This will help illustrate that data is actually deleted.

import icechunk as ic

repo = ic.Repository.create(
    ic.in_memory_storage(),
    config=ic.config.RepositoryConfig(inline_chunk_threshold_bytes=0),
)

Generate a few snapshots

Let us generate a sequence of snapshots, each overwriting the same array with a new value:

import zarr
import time

for i in range(10):
    session = repo.writable_session("main")
    array = zarr.create_array(
        session.store, name="array", shape=(10,), fill_value=-1, dtype=int, overwrite=True
    )
    array[:] = i
    session.commit(f"snap {i}")
    time.sleep(0.1)

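The ancestry now holds 11 snapshots: the 10 we just committed plus the initial snapshot created along with the repository.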

ancestry = list(repo.ancestry(branch="main"))
print("\n\n".join([str((a.id, a.written_at)) for a in ancestry]))

('17GFW36NG2PAMYRJ94ZG', datetime.datetime(2026, 6, 17, 20, 11, 41, 854796, tzinfo=datetime.timezone.utc))

('DVQ722YQSVH4R6ZHESF0', datetime.datetime(2026, 6, 17, 20, 11, 41, 748460, tzinfo=datetime.timezone.utc))

('VJ89VHSCME0CKR4707G0', datetime.datetime(2026, 6, 17, 20, 11, 41, 642255, tzinfo=datetime.timezone.utc))

('D8YYMFME8C712Q4NQWPG', datetime.datetime(2026, 6, 17, 20, 11, 41, 535861, tzinfo=datetime.timezone.utc))

('WNXGSM3TDKTJHR0937V0', datetime.datetime(2026, 6, 17, 20, 11, 41, 429571, tzinfo=datetime.timezone.utc))

('A734MRNMXANCKSVHCM7G', datetime.datetime(2026, 6, 17, 20, 11, 41, 323420, tzinfo=datetime.timezone.utc))

('39CBK8BNDHS585KYSJN0', datetime.datetime(2026, 6, 17, 20, 11, 41, 216890, tzinfo=datetime.timezone.utc))

('2VAFZES16T9DW104RSFG', datetime.datetime(2026, 6, 17, 20, 11, 41, 110607, tzinfo=datetime.timezone.utc))

('QDGC5GDK8Q941P30N2M0', datetime.datetime(2026, 6, 17, 20, 11, 41, 4474, tzinfo=datetime.timezone.utc))

('WS935Z7CH05QPAXV7690', datetime.datetime(2026, 6, 17, 20, 11, 40, 896839, tzinfo=datetime.timezone.utc))

('1CECHNKREP0F1RSTCMT0', datetime.datetime(2026, 6, 17, 20, 11, 40, 884950, tzinfo=datetime.timezone.utc))

Expire snapshots

Danger

Expiring snapshots is an irreversible operation. Use it with care.

First we must expire snapshots. Here we will expire every snapshot written before ancestry[5], the sixth entry in the listing above (the commit "snap 4").

expiry_time = ancestry[5].written_at
print(expiry_time)

2026-06-17 20:11:41.323420+00:00

expired = repo.expire_snapshots(older_than=expiry_time)
print(expired)

{'QDGC5GDK8Q941P30N2M0', 'WS935Z7CH05QPAXV7690', '2VAFZES16T9DW104RSFG', '39CBK8BNDHS585KYSJN0'}

This prints out the set of snapshots that were expired.
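In practice the cutoff is often a retention window relative to the current time rather than a specific snapshot's timestamp. The following is a minimal sketch of that idea; the helper name `cutoff_for_retention` and the 30-day window are assumptions made for illustration and are not part of Icechunk. It builds a timezone-aware UTC datetime, matching the timezone-aware `written_at` values shown above.

```python
from datetime import datetime, timedelta, timezone


def cutoff_for_retention(retention, now=None):
    """Hypothetical helper: the timezone-aware UTC instant `retention` before `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - retention


# Keep roughly the last 30 days of history (an arbitrary example window).
cutoff = cutoff_for_retention(timedelta(days=30))
print(cutoff.isoformat())

# Assumed usage, mirroring the call shown above:
# expired = repo.expire_snapshots(older_than=cutoff)
```

Passing `now` explicitly makes the helper deterministic, which is convenient when testing a retention policy before running an irreversible expiration.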

Note

The initial snapshot (the last entry in the ancestry listing) is never expired, even though it is older than the cutoff!

The cutoff is exclusive

older_than is an exclusive bound: a snapshot is expired only if its written_at is strictly earlier than the cutoff. A snapshot whose written_at equals the cutoff is kept. The same holds for garbage_collect: an object is deleted only if it was created strictly before the cutoff and no surviving snapshot references it. This means you can pass a snapshot's own written_at as the cutoff to expire everything older than it while keeping that snapshot itself.

Confirm that these are the right snapshots (remember that ancestry lists commits in decreasing order of written_at time, so the slice below skips the initial snapshot at the end):

print([a.id for a in ancestry[-5:-1]])

['39CBK8BNDHS585KYSJN0', '2VAFZES16T9DW104RSFG', 'QDGC5GDK8Q941P30N2M0', 'WS935Z7CH05QPAXV7690']
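The exclusive bound can also be checked without a repository. The sketch below copies the ids and timestamps from the ancestry listing above and applies a strict `<` comparison. The helper `select_expired` is hypothetical and only models the selection rule described in this section (strictly older than the cutoff, initial snapshot always kept); it is not Icechunk's implementation.

```python
from datetime import datetime, timezone


def utc(second, microsecond):
    # Every snapshot in this walkthrough was written on 2026-06-17 at 20:11 UTC.
    return datetime(2026, 6, 17, 20, 11, second, microsecond, tzinfo=timezone.utc)


# (id, written_at) pairs copied from the ancestry listing, newest first.
snapshots = [
    ("17GFW36NG2PAMYRJ94ZG", utc(41, 854796)),
    ("DVQ722YQSVH4R6ZHESF0", utc(41, 748460)),
    ("VJ89VHSCME0CKR4707G0", utc(41, 642255)),
    ("D8YYMFME8C712Q4NQWPG", utc(41, 535861)),
    ("WNXGSM3TDKTJHR0937V0", utc(41, 429571)),
    ("A734MRNMXANCKSVHCM7G", utc(41, 323420)),
    ("39CBK8BNDHS585KYSJN0", utc(41, 216890)),
    ("2VAFZES16T9DW104RSFG", utc(41, 110607)),
    ("QDGC5GDK8Q941P30N2M0", utc(41, 4474)),
    ("WS935Z7CH05QPAXV7690", utc(40, 896839)),
    ("1CECHNKREP0F1RSTCMT0", utc(40, 884950)),  # initial snapshot
]


def select_expired(snapshots, cutoff):
    """Hypothetical model: ids strictly older than `cutoff`, never the initial snapshot."""
    *rest, _initial = snapshots
    return {snap_id for snap_id, written_at in rest if written_at < cutoff}


# Use the written_at of ancestry[5] as the cutoff, as in the text.
cutoff = snapshots[5][1]
expired_ids = select_expired(snapshots, cutoff)
print(sorted(expired_ids))
```

Because the comparison is strict, the snapshot whose `written_at` equals the cutoff (`A734MRNMXANCKSVHCM7G`) is not selected, which matches the set printed by `expire_snapshots`.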

Note that ancestry is now shorter:

new_ancestry = list(repo.ancestry(branch="main"))
print("\n\n".join([str((a.id, a.written_at)) for a in new_ancestry]))

('17GFW36NG2PAMYRJ94ZG', datetime.datetime(2026, 6, 17, 20, 11, 41, 854796, tzinfo=datetime.timezone.utc))

('DVQ722YQSVH4R6ZHESF0', datetime.datetime(2026, 6, 17, 20, 11, 41, 748460, tzinfo=datetime.timezone.utc))

('VJ89VHSCME0CKR4707G0', datetime.datetime(2026, 6, 17, 20, 11, 41, 642255, tzinfo=datetime.timezone.utc))

('D8YYMFME8C712Q4NQWPG', datetime.datetime(2026, 6, 17, 20, 11, 41, 535861, tzinfo=datetime.timezone.utc))

('WNXGSM3TDKTJHR0937V0', datetime.datetime(2026, 6, 17, 20, 11, 41, 429571, tzinfo=datetime.timezone.utc))

('A734MRNMXANCKSVHCM7G', datetime.datetime(2026, 6, 17, 20, 11, 41, 323420, tzinfo=datetime.timezone.utc))

('1CECHNKREP0F1RSTCMT0', datetime.datetime(2026, 6, 17, 20, 11, 40, 884950, tzinfo=datetime.timezone.utc))
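One way to picture the shorter ancestry is as a chain of parent pointers in which expired snapshots are skipped over. The sketch below is a toy model under that assumption, using plain dictionaries and the ids from this page. The functions `walk` and `skip_expired` are hypothetical and say nothing about how Icechunk stores or rewrites history internally.

```python
# Snapshot ids from the original ancestry listing, newest first.
ids_newest_first = [
    "17GFW36NG2PAMYRJ94ZG",
    "DVQ722YQSVH4R6ZHESF0",
    "VJ89VHSCME0CKR4707G0",
    "D8YYMFME8C712Q4NQWPG",
    "WNXGSM3TDKTJHR0937V0",
    "A734MRNMXANCKSVHCM7G",
    "39CBK8BNDHS585KYSJN0",
    "2VAFZES16T9DW104RSFG",
    "QDGC5GDK8Q941P30N2M0",
    "WS935Z7CH05QPAXV7690",
    "1CECHNKREP0F1RSTCMT0",  # initial snapshot
]

# Toy model: map each snapshot to its parent; the initial snapshot has none.
parents = dict(zip(ids_newest_first, ids_newest_first[1:]))
parents[ids_newest_first[-1]] = None

# The set returned by expire_snapshots in the text.
expired = {
    "QDGC5GDK8Q941P30N2M0",
    "WS935Z7CH05QPAXV7690",
    "2VAFZES16T9DW104RSFG",
    "39CBK8BNDHS585KYSJN0",
}


def walk(tip, parents):
    """Follow parent links from `tip` back to the initial snapshot."""
    chain = []
    current = tip
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def skip_expired(tip, parents, expired):
    """Point every surviving snapshot at its nearest non-expired ancestor."""
    new_parents = {}
    for snap in walk(tip, parents):
        if snap in expired:
            continue
        parent = parents[snap]
        while parent in expired:
            parent = parents[parent]
        new_parents[snap] = parent
    return new_parents


tip = ids_newest_first[0]
new_ancestry_ids = walk(tip, skip_expired(tip, parents, expired))
print(new_ancestry_ids)
```

In this model the oldest surviving snapshot ends up with the initial snapshot as its parent, which reproduces the seven-entry ancestry shown above.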

Delete expired data

Danger

Garbage collection is an irreversible operation that deletes data. Use it with care.

Use Repository.garbage_collect, passing the same cutoff, to delete the data associated with the expired snapshots:

results = repo.garbage_collect(expiry_time)
print(results)

bytes_deleted: 3155 chunks_deleted: 4 manifests_deleted: 4 snapshots_deleted: 4 attributes_deleted: 0 transaction_logs_deleted: 0