Expiring Data#
Over time, an Icechunk Repository will accumulate many snapshots, not all of which need to be kept around.
"Expiration" allows you to mark snapshots as expired, and "garbage collection" deletes all data (manifests, chunks, snapshots, etc.) associated with expired snapshots.
First create a Repository, configured so that there are no "inline" chunks. This will help illustrate that data is actually deleted.
import icechunk as ic
repo = ic.Repository.create(
    ic.in_memory_storage(),
    config=ic.config.RepositoryConfig(inline_chunk_threshold_bytes=0),
)
Generate a few snapshots#
Let us generate a sequence of snapshots, each overwriting the same array:
import zarr
import time
for i in range(10):
    session = repo.writable_session("main")
    array = zarr.create_array(
        session.store, name="array", shape=(10,), fill_value=-1, dtype=int, overwrite=True
    )
    array[:] = i
    session.commit(f"snap {i}")
    time.sleep(0.1)
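There are now 11 snapshots: the 10 commits made above plus the initial snapshot that was created together with the repository.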
ancestry = list(repo.ancestry(branch="main"))
print("\n\n".join([str((a.id, a.written_at)) for a in ancestry]))
('17GFW36NG2PAMYRJ94ZG', datetime.datetime(2026, 6, 17, 20, 11, 41, 854796, tzinfo=datetime.timezone.utc))
('DVQ722YQSVH4R6ZHESF0', datetime.datetime(2026, 6, 17, 20, 11, 41, 748460, tzinfo=datetime.timezone.utc))
('VJ89VHSCME0CKR4707G0', datetime.datetime(2026, 6, 17, 20, 11, 41, 642255, tzinfo=datetime.timezone.utc))
('D8YYMFME8C712Q4NQWPG', datetime.datetime(2026, 6, 17, 20, 11, 41, 535861, tzinfo=datetime.timezone.utc))
('WNXGSM3TDKTJHR0937V0', datetime.datetime(2026, 6, 17, 20, 11, 41, 429571, tzinfo=datetime.timezone.utc))
('A734MRNMXANCKSVHCM7G', datetime.datetime(2026, 6, 17, 20, 11, 41, 323420, tzinfo=datetime.timezone.utc))
('39CBK8BNDHS585KYSJN0', datetime.datetime(2026, 6, 17, 20, 11, 41, 216890, tzinfo=datetime.timezone.utc))
('2VAFZES16T9DW104RSFG', datetime.datetime(2026, 6, 17, 20, 11, 41, 110607, tzinfo=datetime.timezone.utc))
('QDGC5GDK8Q941P30N2M0', datetime.datetime(2026, 6, 17, 20, 11, 41, 4474, tzinfo=datetime.timezone.utc))
('WS935Z7CH05QPAXV7690', datetime.datetime(2026, 6, 17, 20, 11, 40, 896839, tzinfo=datetime.timezone.utc))
('1CECHNKREP0F1RSTCMT0', datetime.datetime(2026, 6, 17, 20, 11, 40, 884950, tzinfo=datetime.timezone.utc))
Expire snapshots#
Danger
Expiring snapshots is an irreversible operation. Use it with care.
First we must expire snapshots. Here we expire every snapshot older than the 5th commit made above ("snap 4"), using that snapshot's written_at as the cutoff.
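A minimal sketch of this step, assuming `ancestry` is the newest-first list built earlier and that `Repository.expire_snapshots` takes the cutoff as `older_than` and returns the set of expired snapshot IDs, as the output below suggests. The helper name `expire_older_than_snapshot` is hypothetical and not part of Icechunk.

```python
from datetime import datetime


def expire_older_than_snapshot(repo, ancestry, index):
    """Expire every snapshot written strictly before ancestry[index].

    Hypothetical helper: `repo` is expected to be an icechunk.Repository and
    `ancestry` the result of list(repo.ancestry(branch=...)), newest first.
    Only `expire_snapshots` is a library call.
    """
    # Use the chosen snapshot's own timestamp as the (exclusive) cutoff.
    expiry_time: datetime = ancestry[index].written_at
    # Assumption: returns the set of IDs of the snapshots that were expired.
    expired = repo.expire_snapshots(older_than=expiry_time)
    return expiry_time, expired
```

With the repository above, `expiry_time, expired = expire_older_than_snapshot(repo, ancestry, 5)` followed by `print(expiry_time)` and `print(expired)` would print the cutoff and the expired set: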
2026-06-17 20:11:41.323420+00:00
{'QDGC5GDK8Q941P30N2M0', 'WS935Z7CH05QPAXV7690', '2VAFZES16T9DW104RSFG', '39CBK8BNDHS585KYSJN0'}
This prints out the set of snapshots that were expired.
Note
The first snapshot is never expired!
The cutoff is exclusive
older_than is an exclusive bound: a snapshot is expired only if its written_at is strictly earlier than the cutoff. A snapshot whose written_at equals the cutoff is kept. The same holds for garbage_collect: an object is deleted only if it was created strictly before the cutoff and no surviving snapshot references it. This means you can pass a snapshot's own written_at as the cutoff to expire everything older than it while keeping that snapshot itself.
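To make the rule concrete, here is a plain-Python model of it using four of the timestamps printed earlier. It does not call Icechunk: `utc` and `select_expired` are hypothetical names used only for illustration, and the model just mirrors the behavior described above (strict comparison, initial snapshot always kept).

```python
from datetime import datetime, timezone


def utc(second, microsecond):
    """Build one of the timestamps from the ancestry listing (2026-06-17 20:11 UTC)."""
    return datetime(2026, 6, 17, 20, 11, second, microsecond, tzinfo=timezone.utc)


# (id, written_at) pairs, newest first, copied from the ancestry output.
snapshots = [
    ("WNXGSM3TDKTJHR0937V0", utc(41, 429571)),
    ("A734MRNMXANCKSVHCM7G", utc(41, 323420)),
    ("39CBK8BNDHS585KYSJN0", utc(41, 216890)),
    ("1CECHNKREP0F1RSTCMT0", utc(40, 884950)),  # initial snapshot
]


def select_expired(snapshots, older_than):
    """Return the IDs this model would expire for the given cutoff."""
    # The last entry is the initial snapshot, which is never expired.
    *candidates, _initial = snapshots
    # Strict "<": a snapshot written exactly at the cutoff is kept.
    return {sid for sid, written_at in candidates if written_at < older_than}


cutoff = snapshots[1][1]  # written_at of A734MRNMXANCKSVHCM7G
expired = select_expired(snapshots, cutoff)
print(expired)  # {'39CBK8BNDHS585KYSJN0'}
```

The snapshot whose timestamp equals the cutoff survives, and so does the initial snapshot even though it is older than the cutoff.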
Confirm that these are the right snapshots (remember that ancestry lists commits in decreasing order of written_at):
['39CBK8BNDHS585KYSJN0', '2VAFZES16T9DW104RSFG', 'QDGC5GDK8Q941P30N2M0', 'WS935Z7CH05QPAXV7690']
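The snippet that produced this list is not shown here. One way to reproduce the check, sketched with the snapshot IDs copied from the output above rather than read from a live repository, is to slice the original ancestry between the cutoff snapshot and the initial snapshot and compare the result with the expired set.

```python
# Snapshot IDs from the ancestry printed before expiration, newest first.
ancestry_ids = [
    "17GFW36NG2PAMYRJ94ZG",
    "DVQ722YQSVH4R6ZHESF0",
    "VJ89VHSCME0CKR4707G0",
    "D8YYMFME8C712Q4NQWPG",
    "WNXGSM3TDKTJHR0937V0",
    "A734MRNMXANCKSVHCM7G",  # cutoff snapshot: kept
    "39CBK8BNDHS585KYSJN0",
    "2VAFZES16T9DW104RSFG",
    "QDGC5GDK8Q941P30N2M0",
    "WS935Z7CH05QPAXV7690",
    "1CECHNKREP0F1RSTCMT0",  # initial snapshot: never expired
]

# The set reported by the expiration step.
expired = {
    "QDGC5GDK8Q941P30N2M0",
    "WS935Z7CH05QPAXV7690",
    "2VAFZES16T9DW104RSFG",
    "39CBK8BNDHS585KYSJN0",
}

cutoff_index = ancestry_ids.index("A734MRNMXANCKSVHCM7G")
# Everything after the cutoff snapshot is older; drop the final, initial snapshot.
expected = ancestry_ids[cutoff_index + 1 : -1]
print(expected)
assert set(expected) == expired
```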
Note that ancestry is now shorter:
new_ancestry = list(repo.ancestry(branch="main"))
print("\n\n".join([str((a.id, a.written_at)) for a in new_ancestry]))
('17GFW36NG2PAMYRJ94ZG', datetime.datetime(2026, 6, 17, 20, 11, 41, 854796, tzinfo=datetime.timezone.utc))
('DVQ722YQSVH4R6ZHESF0', datetime.datetime(2026, 6, 17, 20, 11, 41, 748460, tzinfo=datetime.timezone.utc))
('VJ89VHSCME0CKR4707G0', datetime.datetime(2026, 6, 17, 20, 11, 41, 642255, tzinfo=datetime.timezone.utc))
('D8YYMFME8C712Q4NQWPG', datetime.datetime(2026, 6, 17, 20, 11, 41, 535861, tzinfo=datetime.timezone.utc))
('WNXGSM3TDKTJHR0937V0', datetime.datetime(2026, 6, 17, 20, 11, 41, 429571, tzinfo=datetime.timezone.utc))
('A734MRNMXANCKSVHCM7G', datetime.datetime(2026, 6, 17, 20, 11, 41, 323420, tzinfo=datetime.timezone.utc))
('1CECHNKREP0F1RSTCMT0', datetime.datetime(2026, 6, 17, 20, 11, 40, 884950, tzinfo=datetime.timezone.utc))
Delete expired data#
Danger
Garbage collection is an irreversible operation that deletes data. Use it with care.
Use Repository.garbage_collect to delete the data associated with expired snapshots.
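A hedged sketch of that call, assuming `garbage_collect` accepts the cutoff datetime as its first argument and returns a summary of what was deleted. The wrapper `collect_expired_data` is hypothetical. Its timezone check is a design choice of this sketch, not a documented Icechunk requirement: the written_at values shown earlier are timezone-aware, and Python refuses to compare naive and aware datetimes.

```python
from datetime import datetime


def collect_expired_data(repo, cutoff: datetime):
    """Run garbage collection for objects created strictly before `cutoff`.

    Hypothetical wrapper: `repo` is expected to be an icechunk.Repository;
    only `garbage_collect` is a library call.
    """
    # Insist on a timezone-aware cutoff so it is comparable with the
    # UTC-aware written_at timestamps.
    if cutoff.tzinfo is None or cutoff.utcoffset() is None:
        raise ValueError("cutoff must be a timezone-aware datetime")
    # Assumption: the return value summarizes the deleted objects.
    return repo.garbage_collect(cutoff)
```

Reusing the cutoff from the expiration step, for example `print(collect_expired_data(repo, expiry_time))`, keeps the two operations consistent: only objects older than the cutoff and unreferenced by any surviving snapshot are candidates for deletion.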