Moving Chunks#

With Zarr alone, reordering chunks means rewriting them: shifting a 1 TB array requires reading every chunk and writing it to a new location. For cloud data, that means downloading terabytes, transforming locally, and re-uploading everything.

Icechunk adds a layer of indirection: chunks are addressed through a manifest, not by their position. Shifting chunks just updates the manifest; the actual data never moves. A 1 TB shift completes in milliseconds, transferring only the small metadata update.
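The indirection described above can be illustrated with a toy model. This is only a sketch: a plain dict stands in for the manifest, the object keys are made-up strings, and `shift_manifest` is a hypothetical helper, not part of Icechunk's API or its actual manifest format.

```python
# Toy model only: a dict stands in for the manifest, strings for object keys.
def shift_manifest(manifest, offset, n_chunks):
    """Return a new manifest with every chunk index moved by `offset`.

    Entries that land outside [0, n_chunks) are dropped. The stored objects
    (the dict values) are never read or copied; only the index changes.
    """
    shifted = {}
    for index, object_key in manifest.items():
        new_index = index + offset
        if 0 <= new_index < n_chunks:
            shifted[new_index] = object_key
    return shifted


manifest = {i: f"chunks/obj-{i}" for i in range(5)}
shifted = shift_manifest(manifest, offset=-2, n_chunks=5)
print(shifted)
# {0: 'chunks/obj-2', 1: 'chunks/obj-3', 2: 'chunks/obj-4'}
```

The work done is proportional to the number of manifest entries, not to the size of the objects they point to.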

This enables rolling time windows—continuously updating datasets like forecasts or sensor streams where you discard old data and append new, without ever copying a single byte.

Moving Chunks vs Moving Nodes

This page covers moving chunks within an array (reordering data). To move or rename arrays and groups in the hierarchy, see Moving and Renaming Nodes.

Choosing an API#

| Method | Best For | Flexibility |
|---|---|---|
| shift_array | Uniform shifts | Simple: just specify offset |
| reindex_array | Custom transformations | Maximum: you control every chunk |

Offsets Are in Chunks, Not Elements#

Both methods work with chunk indices, not array indices. If your array has chunk_size=2, then an offset of (-1,) shifts by 1 chunk, which is 2 elements:

# With chunk_size=2:
session.shift_array("/arr", (-1,))  # → shifts by 1 chunk = 2 elements
session.shift_array("/arr", (-2,))  # → shifts by 2 chunks = 4 elements

Why chunks instead of elements? Because these are metadata-only operations. Shifting by partial chunks would require splitting and rewriting chunk data.
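If you think in elements, a small conversion step can catch shifts that are not a whole number of chunks before you call either method. The sketch below uses a hypothetical helper, `elements_to_chunk_offset`, which is not part of Icechunk.

```python
def elements_to_chunk_offset(element_shift, chunk_shape):
    """Convert a per-dimension element shift into a chunk offset.

    Raises ValueError if any shift is not a whole number of chunks,
    since such a shift cannot be done as a metadata-only operation.
    """
    if len(element_shift) != len(chunk_shape):
        raise ValueError("element_shift and chunk_shape must have the same length")
    offset = []
    for shift, size in zip(element_shift, chunk_shape):
        quotient, remainder = divmod(shift, size)
        if remainder != 0:
            raise ValueError(
                f"shift of {shift} elements is not a multiple of chunk size {size}"
            )
        offset.append(quotient)
    return tuple(offset)


print(elements_to_chunk_offset((-4,), (2,)))  # (-2,)
```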

shift_array#

The shift_array method moves all chunks by a fixed offset per dimension (negative to shift toward index 0, positive toward higher indices). Out-of-bounds chunks are discarded, and vacated positions are cleared (reset to fill value).

import numpy as np
import icechunk as ic
import zarr

np.set_printoptions(formatter={'int': lambda x: f'{x:3d}'})

repo = ic.Repository.create(ic.in_memory_storage())
session = repo.writable_session("main")
arr = zarr.create(
    store=session.store,
    path="arr",
    shape=(10,),
    chunks=(2,),
    dtype="i4",
    fill_value=-1,
)
arr[:] = np.arange(10)
print("Before:", arr[:])

session.shift_array("/arr", (-2,))  # Shift left by 2 chunks
print("After: ", arr[:])
Before: [  0   1   2   3   4   5   6   7   8   9]
After:  [  4   5   6   7   8   9  -1  -1  -1  -1]

The chunks containing [0, 1, 2, 3] were discarded. The vacated end is reset to the fill value (-1).

Preserving Data with Resize#

Chunks that shift out of bounds are lost. To preserve all data when shifting, resize first to make room:

repo = ic.Repository.create(ic.in_memory_storage())
session = repo.writable_session("main")
arr = zarr.create(
    store=session.store,
    path="arr",
    shape=(10,),
    chunks=(2,),
    dtype="i4",
    fill_value=-1,
)
arr[:] = np.arange(10)
print("Before:", arr[:])

arr.resize((14,))  # Add space for 2 more chunks
session.shift_array("/arr", (2,))
print("After: ", arr[:])
Before: [  0   1   2   3   4   5   6   7   8   9]
After:  [ -1  -1  -1  -1   0   1   2   3   4   5   6   7   8   9]
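The new shape passed to resize can be computed from the chunk offset. The helper below is a hypothetical sketch, not an Icechunk function. It only handles non-negative offsets, because growing an array adds room at the high-index end and so cannot protect chunks that a negative offset pushes past index 0.

```python
def shape_with_room(shape, chunk_shape, chunk_offset):
    """Return the shape needed so a non-negative chunk shift loses no data."""
    new_shape = []
    for n, size, offset in zip(shape, chunk_shape, chunk_offset):
        if offset < 0:
            raise ValueError("resizing cannot preserve data for negative offsets")
        # Each chunk of offset needs `size` extra elements along this axis.
        new_shape.append(n + offset * size)
    return tuple(new_shape)


print(shape_with_room((10,), (2,), (2,)))  # (14,)
```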

Multi-dimensional Arrays#

For N-dimensional arrays, provide an offset for each dimension:

repo = ic.Repository.create(ic.in_memory_storage())
session = repo.writable_session("main")
arr = zarr.create(
    store=session.store,
    path="arr2d",
    shape=(6, 4),
    chunks=(2, 2),
    dtype="i4",
    fill_value=-1,
)
arr[:] = np.arange(24).reshape(6, 4)
print("Original 6x4 array:")
print(arr[:])

session.shift_array("/arr2d", (1, 0))  # Shift down 1 chunk
print("\nAfter shift (1, 0):")
print(arr[:])
Original 6x4 array:
[[  0   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]
 [ 12  13  14  15]
 [ 16  17  18  19]
 [ 20  21  22  23]]

After shift (1, 0):
[[ -1  -1  -1  -1]
 [ -1  -1  -1  -1]
 [  0   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]
 [ 12  13  14  15]]
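To predict what a shift will produce, or to check it in a test, the semantics shown above can be modeled in plain NumPy. This is a reference model only, assuming the array shape is an exact multiple of the chunk shape. Unlike Icechunk it copies data, and `shift_model` is a hypothetical name.

```python
import numpy as np


def shift_model(data, chunk_shape, chunk_offset, fill_value):
    """Model a chunk shift: data moves by offset * chunk size per axis,
    out-of-bounds data is dropped, vacated positions get the fill value."""
    result = np.full_like(data, fill_value)
    src, dst = [], []
    for n, size, offset in zip(data.shape, chunk_shape, chunk_offset):
        shift = offset * size
        if abs(shift) >= n:
            return result  # everything moved out of bounds on this axis
        if shift >= 0:
            src.append(slice(0, n - shift))
            dst.append(slice(shift, n))
        else:
            src.append(slice(-shift, n))
            dst.append(slice(0, n + shift))
    result[tuple(dst)] = data[tuple(src)]
    return result


shifted_1d = shift_model(np.arange(10), (2,), (-2,), -1)
print(shifted_1d)  # [ 4  5  6  7  8  9 -1 -1 -1 -1]

shifted_2d = shift_model(np.arange(24).reshape(6, 4), (2, 2), (1, 0), -1)
print(shifted_2d[:3])  # two rows of -1, then [0 1 2 3]
```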

Example: Rolling Time Window#

Imagine a sensor array storing the last 7 days of hourly readings—shape (168,) with one chunk per day (24,). Each day, you want to discard the oldest day and make room for new data:

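# Assumes `session` is a writable session on the repository, and that
# `todays_readings` (24 values) and `today` are supplied by the caller.
arr = zarr.open_array(store=session.store, path="sensors/temperature")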
chunk_offset = (-1,)

# Compute the element-space shift from the chunk offset and chunk shape
element_shift = tuple(o * c for o, c in zip(chunk_offset, arr.chunks))
# element_shift = (-24,) — the shift in element space

# Shift left by 1 chunk, discarding the oldest
session.shift_array("/sensors/temperature", chunk_offset)

# Write new day's data to the vacated region
arr[element_shift[0]:] = todays_readings

session.commit(f"Updated sensor data for {today}")

Computing the index shift in element space is straightforward: multiply each chunk offset by the corresponding chunk size. This tells you exactly where to write new data.

This pattern works identically whether your array is 1 KB or 1 PB, and whether it's on local disk or cloud object storage: the shift updates only the manifest, and no chunk data is transferred.
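The "where to write new data" step can be written as a small helper that turns a chunk offset into the slice vacated along each axis. This is a sketch with a hypothetical name, `vacated_slices`. It reports each axis independently and returns None for axes that were not shifted.

```python
def vacated_slices(chunk_offset, chunk_shape):
    """For each axis, return the element slice vacated by the shift."""
    slices = []
    for offset, size in zip(chunk_offset, chunk_shape):
        shift = offset * size
        if shift < 0:
            slices.append(slice(shift, None))  # shifted toward 0: tail is vacated
        elif shift > 0:
            slices.append(slice(0, shift))     # shifted toward the end: head is vacated
        else:
            slices.append(None)                # axis not shifted
    return tuple(slices)


print(vacated_slices((-1,), (24,)))  # (slice(-24, None, None),)
```

For the sensor example, `arr[vacated_slices((-1,), arr.chunks)[0]]` selects the same region as `arr[-24:]`.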

reindex_array#

For transformations that shift_array can't express, reindex_array gives you complete control. You provide a forward function that maps each chunk's old position to its new position. Your function receives a chunk index (as a list) and returns a new index to move the chunk there, or None to skip it (leave it in place).

However, reindex_array only visits chunk positions that contain data — empty (fill value) positions are skipped. This means that if an empty chunk would shift into an occupied position, the occupied position retains stale data. To handle this, provide a backward function — the inverse of forward. For each existing chunk position, the backward function determines whether a real chunk should have moved there; if not, the position is cleared to the fill value. See Providing a Backward Function for an example.

repo = ic.Repository.create(ic.in_memory_storage())
session = repo.writable_session("main")
arr = zarr.create(
    store=session.store,
    path="arr",
    shape=(10,),
    chunks=(2,),
    dtype="i4",
    fill_value=-1,
)
arr[:] = np.arange(10)
print("Before:", arr[:])

def shift_and_filter(idx):
    """Shift left by 2 chunks; skip chunks that would land at a negative index."""
    new_idx = idx[0] - 2
    return None if new_idx < 0 else [new_idx]

session.reindex_array("/arr", forward=shift_and_filter)
print("After: ", arr[:])
Before: [  0   1   2   3   4   5   6   7   8   9]
After:  [  4   5   6   7   8   9   6   7   8   9]

When all chunks contain data, the forward function alone places every moved chunk correctly. However, source positions that are not also destinations retain stale data: here the last two chunks still hold [6, 7, 8, 9]. Provide a backward function to clear them.
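The forward-only behavior can be reproduced with a toy model in which a dict maps populated chunk positions to labels. This is a sketch of the behavior described above, not Icechunk's implementation, and `reindex_forward_only` is a hypothetical name.

```python
def reindex_forward_only(chunks, forward):
    """chunks maps chunk index tuples to payloads (populated positions only)."""
    result = dict(chunks)  # every position keeps its old payload unless overwritten
    for index, payload in chunks.items():
        target = forward(list(index))
        if target is not None:  # None means "skip": the chunk is not moved
            result[tuple(target)] = payload
    return result


def shift_and_filter(idx):
    new_idx = idx[0] - 2
    return None if new_idx < 0 else [new_idx]


chunks = {(i,): f"c{i}" for i in range(5)}
after = reindex_forward_only(chunks, shift_and_filter)
print([after[(i,)] for i in range(5)])
# ['c2', 'c3', 'c4', 'c3', 'c4']
```

Positions 3 and 4 are sources but not destinations, so they keep their original chunks, which matches the repeated [6, 7, 8, 9] in the output above.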

Providing a Backward Function#

When your array has empty (fill value) chunks, a forward-only reindex can leave stale data behind. The backward function is the inverse of forward: given a chunk position, it returns the position that would have mapped there. This lets icechunk detect and clear positions that should now be empty.

Here's a concrete example. We create an array with a gap — chunk 0 has value 1, chunk 1 is empty (fill value -1), and chunk 2 has value 3:

# Forward only: the empty chunk at index 1 doesn't shift into index 2
repo = ic.Repository.create(ic.in_memory_storage())
session = repo.writable_session("main")
arr = zarr.create(
    store=session.store, path="arr", shape=(3,), chunks=(1,),
    dtype="i4", fill_value=-1,
)
arr[0] = 1
arr[2] = 3
session.commit("init")

session = repo.writable_session("main")
arr = zarr.open_array(session.store, path="arr")
print("Before:      ", arr[:])  # [ 1, -1,  3]

n_chunks = 3
offset = 1

def fwd(idx):
    new = idx[0] + offset
    return [new] if 0 <= new < n_chunks else None

session.reindex_array("/arr", forward=fwd)
print("Forward only:", arr[:])  # [ 1,  1,  3]: indices 0 and 2 are stale!
Before:       [  1  -1   3]
Forward only: [  1   1   3]

After a one-chunk shift, indices 0 and 2 should both be empty. Index 0 is a source that nothing moved into, and the empty chunk at index 1 was never visited, so nothing cleared index 2. Adding a backward function fixes this:

repo = ic.Repository.create(ic.in_memory_storage())
session = repo.writable_session("main")
arr = zarr.create(
    store=session.store, path="arr", shape=(3,), chunks=(1,),
    dtype="i4", fill_value=-1,
)
arr[0] = 1
arr[2] = 3
session.commit("init")

session = repo.writable_session("main")
arr = zarr.open_array(session.store, path="arr")
print("Before:       ", arr[:])  # [ 1, -1,  3]

n_chunks = 3
offset = 1

def fwd(idx):
    new = idx[0] + offset
    return [new] if 0 <= new < n_chunks else None

def bwd(idx):
    new = idx[0] - offset
    return [new] if 0 <= new < n_chunks else None

session.reindex_array("/arr", forward=fwd, backward=bwd)
print("With backward:", arr[:])  # [-1,  1, -1]: indices 0 and 2 correctly cleared
Before:        [  1  -1   3]
With backward: [ -1   1  -1]

Tip

shift_array always provides both forward and backward functions internally, so it handles empty chunks correctly without any extra work.
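As an illustration, a uniform 1-D shift can be expressed as a forward/backward pair, and the clearing rule can be checked without touching any data. This is a sketch, not necessarily how shift_array is implemented. `make_shift_pair` and `positions_to_clear` are hypothetical helpers, and the rule "clear a position when backward returns None or points at an empty position" is an assumption drawn from the description above.

```python
def make_shift_pair(offset, n_chunks):
    """Build forward and backward functions for a 1-D shift by `offset` chunks."""
    def forward(idx):
        new = idx[0] + offset
        return [new] if 0 <= new < n_chunks else None

    def backward(idx):
        old = idx[0] - offset
        return [old] if 0 <= old < n_chunks else None

    return forward, backward


def positions_to_clear(populated, n_chunks, backward):
    """Positions whose backward source is out of bounds or empty."""
    cleared = []
    for i in range(n_chunks):
        source = backward([i])
        if source is None or source[0] not in populated:
            cleared.append(i)
    return cleared


fwd, bwd = make_shift_pair(offset=1, n_chunks=3)
print(positions_to_clear({0, 2}, 3, bwd))  # [0, 2]
```

With chunks 0 and 2 populated, the model clears positions 0 and 2, matching the "With backward" output above.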

Custom Transformations#

With reindex_array, you can implement any chunk permutation:

repo = ic.Repository.create(ic.in_memory_storage())
session = repo.writable_session("main")
arr = zarr.create(
    store=session.store,
    path="arr",
    shape=(10,),
    chunks=(2,),
    dtype="i4",
    fill_value=-1,
)
arr[:] = np.arange(10)
print("Before:", arr[:])

def reverse_chunks(idx):
    """Reverse the order of all chunks."""
    return [4 - idx[0]]  # 0↔4, 1↔3, 2 stays

session.reindex_array("/arr", forward=reverse_chunks)
print("After: ", arr[:])
Before: [  0   1   2   3   4   5   6   7   8   9]
After:  [  8   9   6   7   4   5   2   3   0   1]
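Before running a forward-only reindex, it can be worth checking that the function really is a permutation of the chunk grid, meaning every position is hit exactly once. In that case every source is also a destination, so, assuming all chunks are populated, no stale positions remain. The checker below is a hypothetical helper, not part of Icechunk.

```python
from itertools import product


def is_permutation(forward, grid_shape):
    """True if `forward` maps the chunk grid onto itself one-to-one."""
    positions = list(product(*(range(n) for n in grid_shape)))
    targets = []
    for pos in positions:
        target = forward(list(pos))
        if target is None:
            return False  # a skipped chunk means this is not a full permutation
        targets.append(tuple(target))
    # product() yields positions in sorted order, so a bijection sorts back to it.
    return sorted(targets) == positions


def reverse_chunks(idx):
    return [4 - idx[0]]


print(is_permutation(reverse_chunks, (5,)))  # True
```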

Multi-dimensional Example#

repo = ic.Repository.create(ic.in_memory_storage())
session = repo.writable_session("main")
arr = zarr.create(
    store=session.store,
    path="arr2d",
    shape=(4, 4),
    chunks=(2, 2),
    dtype="i4",
    fill_value=-1,
)
arr[:] = np.arange(16).reshape(4, 4)
print("Original:")
print(arr[:])

def swap_quadrants(idx):
    """Swap diagonal quadrants (top-left ↔ bottom-right, etc.)."""
    row, col = idx
    return [(row + 1) % 2, (col + 1) % 2]

session.reindex_array("/arr2d", forward=swap_quadrants)
print("\nAfter swapping quadrants:")
print(arr[:])
Original:
[[  0   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]
 [ 12  13  14  15]]

After swapping quadrants:
[[ 10  11   8   9]
 [ 14  15  12  13]
 [  2   3   0   1]
 [  6   7   4   5]]