Python API Reference

The icechunk package exposes `Repository` (the main entry point) and a few commonly used exceptions and utilities directly in the top-level namespace. Everything else is organized into submodules.

import icechunk as ic

# Top-level: Repository, storage factories, exceptions, utilities
repo = ic.Repository.create(ic.s3_storage(bucket="my-bucket", prefix="my-prefix", from_env=True))

# Submodules for everything else
config = ic.config.RepositoryConfig(...)
solver = ic.conflicts.BasicConflictSolver(...)

Submodules

Module	Description
`icechunk.config`	Repository configuration, manifest settings, compression, caching
`icechunk.conflicts`	Conflict detection and resolution
`icechunk.credentials`	Credential types and factories for S3, GCS, Azure
`icechunk.ops`	Operation types: updates, garbage collection summaries
`icechunk.session`	Sessions for reading and writing data
`icechunk.snapshots`	Snapshot metadata, diffs, manifest file info
`icechunk.storage`	Storage backends and configuration
`icechunk.virtual`	Virtual chunk containers
`icechunk.xarray`	Xarray integration
`icechunk.dask`	Dask integration

Top-level API

The following classes, exceptions, and utilities are available directly in the icechunk namespace and are not part of any submodule.

Name	Kind	Description
`Repository`	class	Main entry point for creating and opening repositories
`IcechunkStore`	class	Zarr-compatible store backed by an Icechunk session
`IcechunkError`	exception	Base exception for Icechunk errors
`ConflictError`	exception	Raised on conflicting concurrent writes
`RebaseFailedError`	exception	Raised when a rebase cannot be completed
`print_debug_info`	function	Print versions of icechunk and related packages
`upgrade_icechunk_repository`	function	Migrate a repository to the latest spec version
`supported_spec_versions`	function	List supported spec versions

`icechunk.Repository`

An Icechunk repository.

Methods:

Name	Description
`ancestry`	Get the ancestry of a snapshot.
`ancestry_graph`	Build a visual representation of the commit history.
`ancestry_graph_async`	Build a visual representation of the commit history (async version).
`async_ancestry`	Get the ancestry of a snapshot (async version).
`chunk_storage_stats`	Calculate the total storage used for chunks, in bytes.
`chunk_storage_stats_async`	Calculate the total storage used for chunks, in bytes (async version).
`create`	Create a new Icechunk repository.
`create_async`	Create a new Icechunk repository asynchronously.
`create_branch`	Create a new branch at the given snapshot.
`create_branch_async`	Create a new branch at the given snapshot (async version).
`create_tag`	Create a new tag at the given snapshot.
`create_tag_async`	Create a new tag at the given snapshot (async version).
`default_commit_metadata`	Get the current configured default commit metadata for the repository.
`delete_branch`	Delete a branch.
`delete_branch_async`	Delete a branch (async version).
`delete_tag`	Delete a tag.
`delete_tag_async`	Delete a tag (async version).
`diff`	Compute an overview of the operations executed from version `from` to version `to`.
`diff_async`	Compute an overview of the operations executed from version `from` to version `to` (async version).
`disabled_feature_flags`	Get feature flags that are currently disabled.
`disabled_feature_flags_async`	Get feature flags that are currently disabled (async version).
`enabled_feature_flags`	Get feature flags that are currently enabled.
`enabled_feature_flags_async`	Get feature flags that are currently enabled (async version).
`exists`	Check if a repository exists at the given storage location.
`exists_async`	Check if a repository exists at the given storage location (async version).
`expire_snapshots`	Expire all snapshots older than a threshold.
`expire_snapshots_async`	Expire all snapshots older than a threshold (async version).
`feature_flags`	Get all feature flags and their current state.
`feature_flags_async`	Get all feature flags and their current state (async version).
`fetch_config`	Fetch the configuration for the repository saved in storage.
`fetch_config_async`	Fetch the configuration for the repository saved in storage (async version).
`fetch_spec_version`	Fetch the spec version of a repository without fully opening it.
`fetch_spec_version_async`	Fetch the spec version of a repository without fully opening it (async version).
`garbage_collect`	Delete any objects no longer accessible from any branches or tags.
`garbage_collect_async`	Delete any objects no longer accessible from any branches or tags (async version).
`get_metadata`	Get the current configured repository metadata.
`get_metadata_async`	Get the current configured repository metadata (async version).
`get_status`	Get the current repository status.
`get_status_async`	Get the current repository status (async version).
`inspect_manifest`	Return chunk storage statistics for a manifest.
`inspect_manifest_async`	Return chunk storage statistics for a manifest (async version).
`inspect_repo_info`	Return the top-level repository metadata.
`inspect_repo_info_async`	Return the top-level repository metadata (async version).
`inspect_snapshot`	Return the node tree stored in a snapshot.
`inspect_snapshot_async`	Return the node tree stored in a snapshot (async version).
`inspect_transaction_log`	Return the record of what changed in a single commit.
`inspect_transaction_log_async`	Return the record of what changed in a single commit (async version).
`list_branches`	List the branches in the repository.
`list_branches_async`	List the branches in the repository (async version).
`list_manifest_files`	Get the manifest files used by the given snapshot ID.
`list_manifest_files_async`	Get the manifest files used by the given snapshot ID (async version).
`list_tags`	List the tags in the repository.
`list_tags_async`	List the tags in the repository (async version).
`lookup_branch`	Get the tip snapshot ID of a branch.
`lookup_branch_async`	Get the tip snapshot ID of a branch (async version).
`lookup_snapshot`	Get the `SnapshotInfo` for a given snapshot ID.
`lookup_snapshot_async`	Get the `SnapshotInfo` for a given snapshot ID (async version).
`lookup_tag`	Get the snapshot ID of a tag.
`lookup_tag_async`	Get the snapshot ID of a tag (async version).
`open`	Open an existing Icechunk repository.
`open_async`	Open an existing Icechunk repository asynchronously.
`open_or_create`	Open an existing Icechunk repository or create a new one if it does not exist.
`open_or_create_async`	Open an existing Icechunk repository or create a new one if it does not exist (async version).
`ops_log`	Get a summary of changes to the repository.
`ops_log_async`	Get a summary of changes to the repository (async version).
`readonly_session`	Create a read-only session.
`readonly_session_async`	Create a read-only session (async version).
`rearrange_session`	Create a session to move or rename nodes in the Zarr hierarchy.
`rearrange_session_async`	Create a session to move or rename nodes in the Zarr hierarchy (async version).
`reopen`	Reopen the repository with new configuration or credentials.
`reopen_async`	Reopen the repository with new configuration or credentials (async version).
`reset_branch`	Reset a branch to a specific snapshot.
`reset_branch_async`	Reset a branch to a specific snapshot (async version).
`rewrite_manifests`	Rewrite manifests for all arrays.
`rewrite_manifests_async`	Rewrite manifests for all arrays (async version).
`save_config`	Save the repository configuration to storage so that it is used in future calls to `Repository.open`.
`save_config_async`	Save the repository configuration to storage (async version).
`set_default_commit_metadata`	Set the default commit metadata for the repository.
`set_feature_flag`	Set a feature flag.
`set_feature_flag_async`	Set a feature flag (async version).
`set_metadata`	Set the repository metadata. The passed dict replaces the complete metadata.
`set_metadata_async`	Set the repository metadata. The passed dict replaces the complete metadata (async version).
`set_status`	Set the repository status.
`set_status_async`	Set the repository status (async version).
`total_chunks_storage`	Calculate the total storage used for chunks, in bytes.
`total_chunks_storage_async`	Calculate the total storage used for chunks, in bytes (async version).
`transaction`	Create a transaction on a branch.
`update_metadata`	Update the repository metadata.
`update_metadata_async`	Update the repository metadata (async version).
`writable_session`	Create a writable session on a branch.
`writable_session_async`	Create a writable session on a branch (async version).
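
The docstring for `set_default_commit_metadata` says that default metadata is merged with the metadata passed at commit time, and that the commit's values win on duplicate keys. The sketch below illustrates that rule with plain dictionaries. It is not Icechunk's implementation; `merge_commit_metadata` and the sample keys are hypothetical names chosen for illustration.

```python
from typing import Any, Optional


def merge_commit_metadata(
    default: dict[str, Any], commit: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Combine default metadata with per-commit metadata.

    Hypothetical helper: keys present in both take the per-commit value.
    """
    # In a dict display, later mappings override earlier ones,
    # so the commit's values replace the defaults on duplicate keys.
    return {**default, **(commit or {})}


# Hypothetical default metadata and per-commit metadata.
default_metadata = {"pipeline": "nightly-ingest", "author": "service-account"}
commit_metadata = {"author": "alice", "ticket": "DATA-123"}

merged = merge_commit_metadata(default_metadata, commit_metadata)
print(merged)
```

The helper returns a new dictionary and leaves both inputs unchanged, which mirrors the idea that the defaults stay in place for later commits.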

Attributes:

Name	Type	Description
`authorized_virtual_container_prefixes`	`set[str]`	Get all authorized virtual chunk container prefixes.
`config`	`RepositoryConfig`	Get a copy of this repository's config.
`metadata`	`dict[str, Any]`	Get the current configured repository metadata.
`status`	`RepoStatus`	Get the current repository status.
`storage`	`Storage`	Get a copy of this repository's Storage instance.
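
The `authorize_virtual_chunk_access` argument is described as a mapping from container URL prefix to credentials, with access blocked for any container that is not authorized. A minimal sketch of that kind of prefix gate, using only the standard library, might look like the following. It is an illustration under assumptions, not Icechunk's actual matching logic; `is_authorized` and the example URLs are hypothetical.

```python
def is_authorized(chunk_url: str, authorized_prefixes: set[str]) -> bool:
    """Return True if chunk_url starts with any authorized container prefix."""
    return any(chunk_url.startswith(prefix) for prefix in authorized_prefixes)


# Hypothetical prefixes, shaped like the set[str] exposed by
# Repository.authorized_virtual_container_prefixes.
prefixes = {"s3://public-data/", "file:///mnt/archive/"}

allowed = is_authorized("s3://public-data/era5/2020.nc", prefixes)
blocked = is_authorized("s3://private-bucket/secret.nc", prefixes)
print(allowed, blocked)
```

With an empty set of prefixes every URL is rejected, which matches the documented default of blocking virtual chunk access unless a container is explicitly authorized.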

Source code in icechunk-python/python/icechunk/repository.py

class Repository:
    """An Icechunk repository."""

    _repository: PyRepository

    def __init__(self, repository: PyRepository):
        self._repository = repository

    def __repr__(self) -> str:
        return repr(self._repository)

    def __str__(self) -> str:
        return str(self._repository)

    def _repr_html_(self) -> str:
        return self._repository._repr_html_()

    @classmethod
    def create(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        spec_version: SpecVersion | int | None = None,
        check_clean_root: bool = True,
    ) -> Self:
        """
        Create a new Icechunk repository.
        If one already exists at the given store location, an error will be raised.

        !!! warning
            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository configuration. If not provided, a default configuration will be used.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. Each value should be an explicit credential or no-auth
            sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
            for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
            ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
            ``http(s)://`` containers. Passing ``None`` is deprecated and will be
            unsupported in a future release: it silently reads credentials from the
            environment (or uses anonymous access), which can expose private credentials.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        spec_version : SpecVersion | int, optional
            Use this version of the spec for the new repository. If not passed, the latest
            spec version available when this library version was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
        return cls(
            PyRepository.create(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                spec_version=spec_version,
                check_clean_root=check_clean_root,
            )
        )

    @classmethod
    async def create_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        spec_version: SpecVersion | int | None = None,
        check_clean_root: bool = True,
    ) -> Self:
        """
        Create a new Icechunk repository asynchronously.
        If one already exists at the given store location, an error will be raised.

        !!! warning
            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository configuration. If not provided, a default configuration will be used.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. Each value should be an explicit credential or no-auth
            sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
            for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
            ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
            ``http(s)://`` containers. Passing ``None`` is deprecated and will be
            unsupported in a future release: it silently reads credentials from the
            environment (or uses anonymous access), which can expose private credentials.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        spec_version : SpecVersion | int, optional
            Use this version of the spec for the new repository. If not passed, the latest
            spec version available when this library version was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
        return cls(
            await PyRepository.create_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                spec_version=spec_version,
                check_clean_root=check_clean_root,
            )
        )

    @classmethod
    def open(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository.

        If no repository exists at the given storage location, an error will be raised.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. Each value should be an explicit credential or no-auth
            sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
            for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
            ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
            ``http(s)://`` containers. Passing ``None`` is deprecated and will be
            unsupported in a future release: it silently reads credentials from the
            environment (or uses anonymous access), which can expose private credentials.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
        return cls(
            PyRepository.open(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    @classmethod
    async def open_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository asynchronously.

        If no repository exists at the given storage location, an error will be raised.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. Each value should be an explicit credential or no-auth
            sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
            for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
            ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
            ``http(s)://`` containers. Passing ``None`` is deprecated and will be
            unsupported in a future release: it silently reads credentials from the
            environment (or uses anonymous access), which can expose private credentials.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
        return cls(
            await PyRepository.open_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    @classmethod
    def open_or_create(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        create_version: SpecVersion | int | None = None,
        check_clean_root: bool = True,
    ) -> Self:
        """
        Open an existing Icechunk repository or create a new one if it does not exist.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. Each value should be an explicit credential or no-auth
            sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
            for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
            ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
            ``http(s)://`` containers. Passing ``None`` is deprecated and will be
            unsupported in a future release: it silently reads credentials from the
            environment (or uses anonymous access), which can expose private credentials.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        create_version : SpecVersion | int, optional
            Use this version of the spec for the new repository, if it needs to be created.
            If not passed, the latest spec version available when this library version
            was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
        return cls(
            PyRepository.open_or_create(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                create_version=create_version,
                check_clean_root=check_clean_root,
            )
        )

    @classmethod
    async def open_or_create_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        create_version: SpecVersion | int | None = None,
        check_clean_root: bool = True,
    ) -> Self:
        """
        Open an existing Icechunk repository or create a new one if it does not exist (async version).

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. Each value should be an explicit credential or no-auth
            sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
            for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
            ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
            ``http(s)://`` containers. Passing ``None`` is deprecated and will be
            unsupported in a future release: it silently reads credentials from the
            environment (or uses anonymous access), which can expose private credentials.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        create_version : SpecVersion | int, optional
            Use this version of the spec for the new repository, if it needs to be created.
            If not passed, the latest spec version available when this library version
            was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
        return cls(
            await PyRepository.open_or_create_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                create_version=create_version,
                check_clean_root=check_clean_root,
            )
        )

    @staticmethod
    def exists(
        storage: Storage,
        storage_settings: StorageSettings | None = None,
    ) -> bool:
        """
        Check if a repository exists at the given storage location.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        storage_settings : StorageSettings | None
            Optional storage settings to use for the initial storage call.

        Returns
        -------
        bool
            True if the repository exists, False otherwise.
        """
        return PyRepository.exists(storage, storage_settings)

    @staticmethod
    async def exists_async(
        storage: Storage,
        storage_settings: StorageSettings | None = None,
    ) -> bool:
        """
        Check if a repository exists at the given storage location (async version).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        storage_settings : StorageSettings | None
            Optional storage settings to use for the initial storage call.

        Returns
        -------
        bool
            True if the repository exists, False otherwise.
        """
        return await PyRepository.exists_async(storage, storage_settings)

    @staticmethod
    def fetch_spec_version(
        storage: Storage,
        storage_settings: StorageSettings | None = None,
    ) -> SpecVersion | None:
        """
        Fetch the spec version of a repository without fully opening it.

        This is useful for checking the repository format version before opening,
        for example to know what version of the library is needed to open it.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        storage_settings : StorageSettings | None
            Optional storage settings to use for the initial storage call.

        Returns
        -------
        SpecVersion | None
            The spec version of the repository if it exists, None if no repository
            exists at the given location.
        """
        return PyRepository.fetch_spec_version(storage, storage_settings)

    @staticmethod
    async def fetch_spec_version_async(
        storage: Storage,
        storage_settings: StorageSettings | None = None,
    ) -> SpecVersion | None:
        """
        Fetch the spec version of a repository without fully opening it (async version).

        This is useful for checking the repository format version before opening,
        for example to know what version of the library is needed to open it.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        storage_settings : StorageSettings | None
            Optional storage settings to use for the initial storage call.

        Returns
        -------
        SpecVersion | None
            The spec version of the repository if it exists, None if no repository
            exists at the given location.
        """
        return await PyRepository.fetch_spec_version_async(storage, storage_settings)

    def __getstate__(self) -> object:
        return {
            "_repository": self._repository.as_bytes(),
        }

    def __setstate__(self, state: object) -> None:
        if not isinstance(state, dict):
            raise ValueError("Invalid repository state")
        self._repository = PyRepository.from_bytes(state["_repository"])

    @staticmethod
    def fetch_config(storage: Storage) -> RepositoryConfig | None:
        """
        Fetch the configuration for the repository saved in storage.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        RepositoryConfig | None
            The repository configuration if it exists, None otherwise.
        """
        return PyRepository.fetch_config(storage)

    @staticmethod
    async def fetch_config_async(storage: Storage) -> RepositoryConfig | None:
        """
        Fetch the configuration for the repository saved in storage (async version).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        RepositoryConfig | None
            The repository configuration if it exists, None otherwise.
        """
        return await PyRepository.fetch_config_async(storage)

    def save_config(self) -> None:
        """
        Save the repository configuration to storage. This configuration will be used
        in future calls to Repository.open.

        Returns
        -------
        None
        """
        return self._repository.save_config()

    async def save_config_async(self) -> None:
        """
        Save the repository configuration to storage (async version).

        Returns
        -------
        None
        """
        return await self._repository.save_config_async()

    @property
    def config(self) -> RepositoryConfig:
        """
        Get a copy of this repository's config.

        Returns
        -------
        RepositoryConfig
            The repository configuration.
        """
        return self._repository.config()

    @property
    def storage(self) -> Storage:
        """
        Get a copy of this repository's Storage instance.

        Returns
        -------
        Storage
            The repository storage instance.
        """
        return self._repository.storage()

    @property
    def authorized_virtual_container_prefixes(self) -> set[str]:
        """
        Get all authorized virtual chunk container prefixes.

        Returns
        -------
        url_prefixes : set[str]
            The set of authorized URL prefixes, one for each virtual chunk container.
        """
        return self._repository.authorized_virtual_container_prefixes

    def reopen(
        self,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Reopen the repository with new configuration or credentials.

        Parameters
        ----------
        config : RepositoryConfig, optional
            The new repository configuration. If not provided, uses the existing configuration.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            New virtual chunk access credentials.

        Returns
        -------
        Self
            A new Repository instance with the updated configuration.
        """
        _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
        return self.__class__(
            self._repository.reopen(
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    async def reopen_async(
        self,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Reopen the repository with new configuration or credentials (async version).

        Parameters
        ----------
        config : RepositoryConfig, optional
            The new repository configuration. If not provided, uses the existing configuration.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            New virtual chunk access credentials.

        Returns
        -------
        Self
            A new Repository instance with the updated configuration.
        """
        _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
        return self.__class__(
            await self._repository.reopen_async(
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    def set_default_commit_metadata(self, metadata: dict[str, Any]) -> None:
        """
        Set the default commit metadata for the repository. This is useful for attaching
        additional static, system-context metadata to all commits.

        When a commit is made, the default metadata is merged with the metadata provided in
        the commit; for duplicate keys, the value provided in the commit wins.

        !!! warning
            This metadata is only applied to sessions that are created after this call. Any open
            writable sessions will not be affected and will not use the new default metadata.

        Parameters
        ----------
        metadata : dict[str, Any]
            The default commit metadata. Pass an empty dict to clear the default metadata.
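
        Examples
        --------
        A minimal sketch of the merge rule described above, using plain dicts rather
        than a real repository. The key names and values are hypothetical, and the
        dict-unpacking merge is an assumption about the semantics, not the library's
        implementation.

        ```python
        # Hypothetical defaults, as would be passed to set_default_commit_metadata.
        default_metadata = {"pipeline": "nightly-ingest", "author": "system"}

        # Hypothetical metadata passed to one individual commit.
        commit_metadata = {"author": "alice", "ticket": "DATA-42"}

        # Later entries win, so commit-level keys override the defaults.
        effective = {**default_metadata, **commit_metadata}
        print(effective)
        ```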
        """
        return self._repository.set_default_commit_metadata(metadata)

    def default_commit_metadata(self) -> dict[str, Any]:
        """
        Get the current configured default commit metadata for the repository.

        Returns
        -------
        dict[str, Any]
            The default commit metadata.
        """
        return self._repository.default_commit_metadata()

    def get_metadata(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata.

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return self._repository.get_metadata()

    @property
    def metadata(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata.

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return self._repository.get_metadata()

    async def get_metadata_async(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata (async version).

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return await self._repository.get_metadata_async()

    def set_metadata(self, metadata: dict[str, Any]) -> None:
        """
        Set the repository metadata. The passed dict replaces the complete metadata.

        To update only some metadata values, use `Repository.update_metadata`.

        Parameters
        ----------
        metadata : dict[str, Any]
            The value to use as repository metadata.
        """
        self._repository.set_metadata(metadata)

    async def set_metadata_async(self, metadata: dict[str, Any]) -> None:
        """
        Set the repository metadata (async version). The passed dict replaces the complete metadata.

        To update only some metadata values, use `Repository.update_metadata_async`.

        Parameters
        ----------
        metadata : dict[str, Any]
            The value to use as repository metadata.
        """
        await self._repository.set_metadata_async(metadata)

    def update_metadata(self, metadata: dict[str, Any]) -> dict[str, Any]:
        """
        Update the repository metadata.

        The passed dict will be merged with the current metadata, overriding existing keys.

        Parameters
        ----------
        metadata : dict[str, Any]
            The dict to merge into the repository metadata.
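
        Examples
        --------
        A small sketch contrasting the replace behavior of `set_metadata` with the
        merge behavior described here. It uses plain dicts with hypothetical keys and
        only models the documented semantics; it does not call the library.

        ```python
        current = {"owner": "team-a", "stage": "dev"}
        incoming = {"stage": "prod", "region": "us-east-1"}

        # set_metadata semantics: the passed dict replaces everything.
        after_set = dict(incoming)

        # update_metadata semantics: merge, with incoming keys overriding existing ones.
        after_update = {**current, **incoming}

        print(after_set)
        print(after_update)
        ```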
        """
        return self._repository.update_metadata(metadata)

    async def update_metadata_async(self, metadata: dict[str, Any]) -> dict[str, Any]:
        """
        Update the repository metadata (async version).

        The passed dict will be merged with the current metadata, overriding existing keys.

        Parameters
        ----------
        metadata : dict[str, Any]
            The dict to merge into the repository metadata.
        """
        return await self._repository.update_metadata_async(metadata)

    def get_status(self) -> RepoStatus:
        """
        Get the current repository status.

        Returns
        -------
        RepoStatus
            The current status of the repository.
        """
        return self._repository.get_status()

    @property
    def status(self) -> RepoStatus:
        """
        Get the current repository status.

        Returns
        -------
        RepoStatus
            The current status of the repository.
        """
        return self._repository.get_status()

    async def get_status_async(self) -> RepoStatus:
        """
        Get the current repository status (async version).

        Returns
        -------
        RepoStatus
            The current status of the repository.
        """
        return await self._repository.get_status_async()

    def set_status(self, status: RepoStatus) -> None:
        """
        Set the repository status.

        Parameters
        ----------
        status : RepoStatus
            The new status for the repository.
        """
        self._repository.set_status(status)

    async def set_status_async(self, status: RepoStatus) -> None:
        """
        Set the repository status (async version).

        Parameters
        ----------
        status : RepoStatus
            The new status for the repository.
        """
        await self._repository.set_status_async(status)

    def feature_flags(self) -> list[FeatureFlag]:
        """
        Get all feature flags and their current state.

        Returns
        -------
        list[FeatureFlag]
            All feature flags with their id, name, default, setting, and effective state.
        """
        return self._repository.feature_flags()

    async def feature_flags_async(self) -> list[FeatureFlag]:
        """
        Get all feature flags and their current state (async version).

        Returns
        -------
        list[FeatureFlag]
            All feature flags with their id, name, default, setting, and effective state.
        """
        return await self._repository.feature_flags_async()

    def enabled_feature_flags(self) -> list[FeatureFlag]:
        """
        Get feature flags that are currently enabled.

        Returns
        -------
        list[FeatureFlag]
            Feature flags whose effective state is enabled.
        """
        return self._repository.enabled_feature_flags()

    async def enabled_feature_flags_async(self) -> list[FeatureFlag]:
        """
        Get feature flags that are currently enabled (async version).

        Returns
        -------
        list[FeatureFlag]
            Feature flags whose effective state is enabled.
        """
        return await self._repository.enabled_feature_flags_async()

    def disabled_feature_flags(self) -> list[FeatureFlag]:
        """
        Get feature flags that are currently disabled.

        Returns
        -------
        list[FeatureFlag]
            Feature flags whose effective state is disabled.
        """
        return self._repository.disabled_feature_flags()

    async def disabled_feature_flags_async(self) -> list[FeatureFlag]:
        """
        Get feature flags that are currently disabled (async version).

        Returns
        -------
        list[FeatureFlag]
            Feature flags whose effective state is disabled.
        """
        return await self._repository.disabled_feature_flags_async()

    def set_feature_flag(self, name: str, setting: bool | None) -> None:
        """
        Set a feature flag.

        Parameters
        ----------
        name : str
            The name of the feature flag.
        setting : bool | None
            True to enable, False to disable, None to reset to default.
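
        Examples
        --------
        A sketch of the three-valued `setting` described above. `FlagSketch` is a
        hypothetical stand-in, not the icechunk `FeatureFlag` class, and the rule that
        an explicit setting overrides the default is an assumption inferred from
        "None to reset to default".

        ```python
        from dataclasses import dataclass
        from typing import Optional


        @dataclass
        class FlagSketch:
            # Hypothetical stand-in, not the icechunk FeatureFlag class.
            name: str
            default: bool
            setting: Optional[bool] = None  # None means "no explicit setting"

            @property
            def enabled(self) -> bool:
                # Assumption: an explicit setting wins, otherwise the default applies.
                return self.default if self.setting is None else self.setting


        flag = FlagSketch(name="example_flag", default=True)
        states = [flag.enabled]  # no setting yet, so the default applies
        flag.setting = False  # models set_feature_flag(name, False)
        states.append(flag.enabled)
        flag.setting = None  # models set_feature_flag(name, None)
        states.append(flag.enabled)
        print(states)
        ```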
        """
        self._repository.set_feature_flag(name, setting)

    async def set_feature_flag_async(self, name: str, setting: bool | None) -> None:
        """
        Set a feature flag (async version).

        Parameters
        ----------
        name : str
            The name of the feature flag.
        setting : bool | None
            True to enable, False to disable, None to reset to default.
        """
        await self._repository.set_feature_flag_async(name, setting)

    def ancestry(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
    ) -> Iterator[SnapshotInfo]:
        """
        Get the ancestry of a snapshot.

        Parameters
        ----------
        branch : str, optional
            The branch to get the ancestry of.
        tag : str, optional
            The tag to get the ancestry of.
        snapshot_id : str, optional
            The snapshot ID to get the ancestry of.

        Returns
        -------
        Iterator[SnapshotInfo]
            An iterator over the snapshots in the ancestry and their metadata.

        Notes
        -----
        Only one of the arguments can be specified.
        """

        # the returned object is both an Async and Sync iterator
        res = cast(
            Iterator[SnapshotInfo],
            self._repository.async_ancestry(
                branch=branch, tag=tag, snapshot_id=snapshot_id
            ),
        )
        return res

    def async_ancestry(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
    ) -> AsyncCloseableIterator[SnapshotInfo]:
        """
        Get the ancestry of a snapshot (async version).

        Parameters
        ----------
        branch : str, optional
            The branch to get the ancestry of.
        tag : str, optional
            The tag to get the ancestry of.
        snapshot_id : str, optional
            The snapshot ID to get the ancestry of.

        Returns
        -------
        AsyncCloseableIterator[SnapshotInfo]
            An async iterator over the snapshots in the ancestry and their metadata.

        Notes
        -----
        Only one of the arguments can be specified.
        """
        return self._repository.async_ancestry(
            branch=branch, tag=tag, snapshot_id=snapshot_id
        )

    def ancestry_graph(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
        plain: bool = False,
    ) -> AncestryGraph:
        """
        Build a visual representation of the commit history.

        When called with no arguments, shows all branches as a tree.
        When called with one of branch/tag/snapshot_id, shows that ref's linear history.

        Parameters
        ----------
        branch : str, optional
            Show history for this branch.
        tag : str, optional
            Show history from this tag.
        snapshot_id : str, optional
            Show history from this snapshot.
        plain : bool, optional
            If True, render without colors (no ANSI codes in text, no fill colors
            in SVG). Useful for CI logs, piping to files, or LLM agents.

        Returns
        -------
        AncestryGraph
            A displayable object. Use print() for colored terminal output,
            or display in Jupyter for an SVG diagram.
        """
        return self._repository.ancestry_graph(
            branch=branch, tag=tag, snapshot_id=snapshot_id, plain=plain
        )

    async def ancestry_graph_async(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
        plain: bool = False,
    ) -> AncestryGraph:
        """
        Async version of :meth:`ancestry_graph`.
        """
        return await self._repository.ancestry_graph_async(
            branch=branch, tag=tag, snapshot_id=snapshot_id, plain=plain
        )

    def ops_log(self) -> Iterator[Update]:
        """
        Get a summary of changes to the repository.
        """

        # the returned object is both an Async and Sync iterator
        res = cast(
            Iterator[Update],
            self._repository.async_ops_log(),
        )
        return res

    def ops_log_async(self) -> AsyncCloseableIterator[Update]:
        """
        Get a summary of changes to the repository (async version).
        """

        # the returned object is both an Async and Sync iterator
        return self._repository.async_ops_log()

    def create_branch(self, branch: str, snapshot_id: str) -> None:
        """
        Create a new branch at the given snapshot.

        Parameters
        ----------
        branch : str
            The name of the branch to create.
        snapshot_id : str
            The snapshot ID to create the branch at.

        Returns
        -------
        None
        """
        self._repository.create_branch(branch, snapshot_id)

    async def create_branch_async(self, branch: str, snapshot_id: str) -> None:
        """
        Create a new branch at the given snapshot (async version).

        Parameters
        ----------
        branch : str
            The name of the branch to create.
        snapshot_id : str
            The snapshot ID to create the branch at.

        Returns
        -------
        None
        """
        await self._repository.create_branch_async(branch, snapshot_id)

    def list_branches(self) -> set[str]:
        """
        List the branches in the repository.

        Returns
        -------
        set[str]
            A set of branch names.
        """
        return self._repository.list_branches()

    async def list_branches_async(self) -> set[str]:
        """
        List the branches in the repository (async version).

        Returns
        -------
        set[str]
            A set of branch names.
        """
        return await self._repository.list_branches_async()

    def lookup_branch(self, branch: str) -> str:
        """
        Get the tip snapshot ID of a branch.

        Parameters
        ----------
        branch : str
            The branch to get the tip of.

        Returns
        -------
        str
            The snapshot ID of the tip of the branch.
        """
        return self._repository.lookup_branch(branch)

    async def lookup_branch_async(self, branch: str) -> str:
        """
        Get the tip snapshot ID of a branch (async version).

        Parameters
        ----------
        branch : str
            The branch to get the tip of.

        Returns
        -------
        str
            The snapshot ID of the tip of the branch.
        """
        return await self._repository.lookup_branch_async(branch)

    def lookup_snapshot(self, snapshot_id: str) -> SnapshotInfo:
        """
        Get the SnapshotInfo given a snapshot ID

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to look up

        Returns
        -------
        SnapshotInfo
        """
        return self._repository.lookup_snapshot(snapshot_id)

    async def lookup_snapshot_async(self, snapshot_id: str) -> SnapshotInfo:
        """
        Get the SnapshotInfo given a snapshot ID (async version)

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to look up

        Returns
        -------
        SnapshotInfo
        """
        return await self._repository.lookup_snapshot_async(snapshot_id)

    def list_manifest_files(self, snapshot_id: str) -> list[ManifestFileInfo]:
        """
        Get the manifest files used by the given snapshot ID

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to get information for

        Returns
        -------
        list[ManifestFileInfo]
        """
        return self._repository.list_manifest_files(snapshot_id)

    async def list_manifest_files_async(self, snapshot_id: str) -> list[ManifestFileInfo]:
        """
        Get the manifest files used by the given snapshot ID (async version).

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to get information for

        Returns
        -------
        list[ManifestFileInfo]
        """
        return await self._repository.list_manifest_files_async(snapshot_id)

    def reset_branch(
        self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
    ) -> None:
        """
        Reset a branch to a specific snapshot.

        This will permanently alter the history of the branch such that the tip of
        the branch is the specified snapshot.

        Parameters
        ----------
        branch : str
            The branch to reset.
        snapshot_id : str
            The snapshot ID to reset the branch to.
        from_snapshot_id : str | None
            If passed, the reset will only be executed if the branch currently
            points to from_snapshot_id.

        Returns
        -------
        None
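
        Examples
        --------
        A sketch of how `from_snapshot_id` acts as a compare-and-swap guard. The
        in-memory `branches` dict, the `reset_branch_sketch` helper, and the
        `RuntimeError` are all hypothetical; the exception the library actually raises
        is not specified here.

        ```python
        def reset_branch_sketch(branches, branch, snapshot_id, from_snapshot_id=None):
            # Hypothetical in-memory model of a conditional branch reset.
            current = branches[branch]
            if from_snapshot_id is not None and current != from_snapshot_id:
                # The branch moved since the caller last looked, so refuse to reset.
                raise RuntimeError(
                    f"{branch} points to {current}, expected {from_snapshot_id}"
                )
            branches[branch] = snapshot_id


        branches = {"main": "SNAP_B"}

        # The guard matches the current tip, so the reset goes through.
        reset_branch_sketch(branches, "main", "SNAP_A", from_snapshot_id="SNAP_B")

        # The guard is now stale (main points to SNAP_A), so this reset is rejected.
        try:
            reset_branch_sketch(branches, "main", "SNAP_C", from_snapshot_id="SNAP_B")
            stale_reset_rejected = False
        except RuntimeError:
            stale_reset_rejected = True
        ```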
        """
        self._repository.reset_branch(branch, snapshot_id, from_snapshot_id)

    async def reset_branch_async(
        self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
    ) -> None:
        """
        Reset a branch to a specific snapshot (async version).

        This will permanently alter the history of the branch such that the tip of
        the branch is the specified snapshot.

        Parameters
        ----------
        branch : str
            The branch to reset.
        snapshot_id : str
            The snapshot ID to reset the branch to.
        from_snapshot_id : str | None
            If passed, the reset will only be executed if the branch currently
            points to from_snapshot_id.

        Returns
        -------
        None
        """
        await self._repository.reset_branch_async(branch, snapshot_id, from_snapshot_id)

    def delete_branch(self, branch: str) -> None:
        """
        Delete a branch.

        Parameters
        ----------
        branch : str
            The branch to delete.

        Returns
        -------
        None
        """
        self._repository.delete_branch(branch)

    async def delete_branch_async(self, branch: str) -> None:
        """
        Delete a branch (async version).

        Parameters
        ----------
        branch : str
            The branch to delete.

        Returns
        -------
        None
        """
        await self._repository.delete_branch_async(branch)

    def delete_tag(self, tag: str) -> None:
        """
        Delete a tag.

        Parameters
        ----------
        tag : str
            The tag to delete.

        Returns
        -------
        None
        """
        self._repository.delete_tag(tag)

    async def delete_tag_async(self, tag: str) -> None:
        """
        Delete a tag (async version).

        Parameters
        ----------
        tag : str
            The tag to delete.

        Returns
        -------
        None
        """
        await self._repository.delete_tag_async(tag)

    def create_tag(self, tag: str, snapshot_id: str) -> None:
        """
        Create a new tag at the given snapshot.

        Parameters
        ----------
        tag : str
            The name of the tag to create.
        snapshot_id : str
            The snapshot ID to create the tag at.

        Returns
        -------
        None
        """
        self._repository.create_tag(tag, snapshot_id)

    async def create_tag_async(self, tag: str, snapshot_id: str) -> None:
        """
        Create a new tag at the given snapshot (async version).

        Parameters
        ----------
        tag : str
            The name of the tag to create.
        snapshot_id : str
            The snapshot ID to create the tag at.

        Returns
        -------
        None
        """
        await self._repository.create_tag_async(tag, snapshot_id)

    def list_tags(self) -> set[str]:
        """
        List the tags in the repository.

        Returns
        -------
        set[str]
            A set of tag names.
        """
        return self._repository.list_tags()

    async def list_tags_async(self) -> set[str]:
        """
        List the tags in the repository (async version).

        Returns
        -------
        set[str]
            A set of tag names.
        """
        return await self._repository.list_tags_async()

    def lookup_tag(self, tag: str) -> str:
        """
        Get the snapshot ID of a tag.

        Parameters
        ----------
        tag : str
            The tag to get the snapshot ID of.

        Returns
        -------
        str
            The snapshot ID of the tag.
        """
        return self._repository.lookup_tag(tag)

    async def lookup_tag_async(self, tag: str) -> str:
        """
        Get the snapshot ID of a tag (async version).

        Parameters
        ----------
        tag : str
            The tag to get the snapshot ID of.

        Returns
        -------
        str
            The snapshot ID of the tag.
        """
        return await self._repository.lookup_tag_async(tag)

    def diff(
        self,
        *,
        from_branch: str | None = None,
        from_tag: str | None = None,
        from_snapshot_id: str | None = None,
        to_branch: str | None = None,
        to_tag: str | None = None,
        to_snapshot_id: str | None = None,
    ) -> Diff:
        """
        Compute an overview of the operations executed from version `from` to version `to`.

        Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
        The styles used to identify the `from` and `to` versions can be different.

        The `from` version must be a member of the `ancestry` of `to`.

        Returns
        -------
        Diff
            The operations executed between the two versions
        """
        return self._repository.diff(
            from_branch=from_branch,
            from_tag=from_tag,
            from_snapshot_id=from_snapshot_id,
            to_branch=to_branch,
            to_tag=to_tag,
            to_snapshot_id=to_snapshot_id,
        )

    async def diff_async(
        self,
        *,
        from_branch: str | None = None,
        from_tag: str | None = None,
        from_snapshot_id: str | None = None,
        to_branch: str | None = None,
        to_tag: str | None = None,
        to_snapshot_id: str | None = None,
    ) -> Diff:
        """
        Compute an overview of the operations executed from version `from` to version `to` (async version).

        Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
        The styles used to identify the `from` and `to` versions can be different.

        The `from` version must be a member of the `ancestry` of `to`.

        Returns
        -------
        Diff
            The operations executed between the two versions
        """
        return await self._repository.diff_async(
            from_branch=from_branch,
            from_tag=from_tag,
            from_snapshot_id=from_snapshot_id,
            to_branch=to_branch,
            to_tag=to_tag,
            to_snapshot_id=to_snapshot_id,
        )

    def readonly_session(
        self,
        branch: str | None = None,
        *,
        tag: str | None = None,
        snapshot_id: str | None = None,
        as_of: datetime.datetime | None = None,
    ) -> Session:
        """
        Create a read-only session.

        This can be thought of as a read-only checkout of the repository at a given snapshot.
        When branch or tag are provided, the session will be based on the tip of the branch or
        the snapshot ID of the tag.

        Parameters
        ----------
        branch : str, optional
            If provided, the branch to create the session on.
        tag : str, optional
            If provided, the tag to create the session on.
        snapshot_id : str, optional
            If provided, the snapshot ID to create the session on.
        as_of : datetime.datetime, optional
            When combined with the branch argument, it will open the session at the last
            snapshot that is at or before this datetime.

        Returns
        -------
        Session
            The read-only session, pointing to the specified snapshot, tag, or branch.

        Notes
        -----
        Only one of `branch`, `tag`, or `snapshot_id` can be specified. The `as_of`
        argument is only used in combination with `branch`.
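
        Examples
        --------
        A sketch of the `as_of` selection rule: pick the last snapshot written at or
        before the given datetime. The `history` list and snapshot ids are
        hypothetical, and the newest-first ordering is an assumption about how a
        branch's history is walked.

        ```python
        import datetime


        def snapshot_as_of(history, as_of):
            # history: hypothetical (written_at, snapshot_id) pairs, newest first.
            for written_at, snapshot_id in history:
                if written_at <= as_of:
                    return snapshot_id
            # Every snapshot is newer than as_of.
            return None


        utc = datetime.timezone.utc
        history = [
            (datetime.datetime(2024, 3, 1, tzinfo=utc), "SNAP_3"),
            (datetime.datetime(2024, 2, 1, tzinfo=utc), "SNAP_2"),
            (datetime.datetime(2024, 1, 1, tzinfo=utc), "SNAP_1"),
        ]

        between = snapshot_as_of(history, datetime.datetime(2024, 2, 15, tzinfo=utc))
        exact = snapshot_as_of(history, datetime.datetime(2024, 2, 1, tzinfo=utc))
        too_early = snapshot_as_of(history, datetime.datetime(2023, 12, 1, tzinfo=utc))
        ```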
        """
        return Session(
            self._repository.readonly_session(
                branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
            )
        )

    async def readonly_session_async(
        self,
        branch: str | None = None,
        *,
        tag: str | None = None,
        snapshot_id: str | None = None,
        as_of: datetime.datetime | None = None,
    ) -> Session:
        """
        Create a read-only session (async version).

        This can be thought of as a read-only checkout of the repository at a given snapshot.
        When branch or tag are provided, the session will be based on the tip of the branch or
        the snapshot ID of the tag.

        Parameters
        ----------
        branch : str, optional
            If provided, the branch to create the session on.
        tag : str, optional
            If provided, the tag to create the session on.
        snapshot_id : str, optional
            If provided, the snapshot ID to create the session on.
        as_of : datetime.datetime, optional
            When combined with the branch argument, it will open the session at the last
            snapshot that is at or before this datetime.

        Returns
        -------
        Session
            The read-only session, pointing to the specified snapshot, tag, or branch.

        Notes
        -----
        Only one of `branch`, `tag`, or `snapshot_id` can be specified. The `as_of`
        argument is only used in combination with `branch`.
        """
        return Session(
            await self._repository.readonly_session_async(
                branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
            )
        )

    def writable_session(self, branch: str) -> Session:
        """
        Create a writable session on a branch.

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(self._repository.writable_session(branch))

    async def writable_session_async(self, branch: str) -> Session:
        """
        Create a writable session on a branch (async version).

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(await self._repository.writable_session_async(branch))

    def rearrange_session(self, branch: str) -> Session:
        """
        Create a session to move/rename nodes in the Zarr hierarchy.

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        This session only allows changes made through `Session.move`. If you want to modify data, and
        not only move nodes, use `Repository.writable_session` instead.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The rearrange session on the branch.
        """
        return Session(self._repository.rearrange_session(branch))

    async def rearrange_session_async(self, branch: str) -> Session:
        """
        Create a session to move/rename nodes in the Zarr hierarchy (async version).

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        This session only allows changes made through `Session.move`. If you want to modify data, and
        not only move nodes, use `Repository.writable_session_async` instead.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The rearrange session on the branch.
        """
        return Session(await self._repository.rearrange_session_async(branch))

    @contextmanager
    def transaction(
        self,
        branch: str,
        *,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
    ) -> Iterator[IcechunkStore]:
        """
        Create a transaction on a branch.

        This is a context manager that creates a writable session on the specified branch.
        When the context is exited, the session will be committed to the branch
        using the specified message.

        Parameters
        ----------
        branch : str
            The branch to create the transaction on.
        message : str
            The commit message to use when committing the session.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        rebase_with : ConflictSolver | None, optional
            If another session committed while the current session was writing, use Session.rebase with this solver.
        rebase_tries : int, optional
            If another session committed while the current session was writing, retry Session.rebase up to this many times in a loop.

        Yields
        ------
        store : IcechunkStore
            A Zarr Store which can be used to interact with the data in the repository.
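
        Examples
        --------
        A sketch of the control flow of this context manager: the store is yielded,
        and the commit runs only if the block finishes without raising.
        `SessionSketch` and `transaction_sketch` are hypothetical stand-ins that
        mirror the shape of the method body; they do not touch a real repository.

        ```python
        from contextlib import contextmanager


        class SessionSketch:
            # Hypothetical stand-in for a writable session; records commits in memory.
            def __init__(self):
                self.store = {}
                self.commits = []

            def commit(self, message):
                self.commits.append(message)


        @contextmanager
        def transaction_sketch(session, *, message):
            # Same shape as the method body: yield the store, then commit.
            yield session.store
            session.commit(message)


        ok = SessionSketch()
        with transaction_sketch(ok, message="write a") as store:
            store["a"] = 1

        failed = SessionSketch()
        try:
            with transaction_sketch(failed, message="never committed") as store:
                store["b"] = 2
                raise ValueError("simulated failure inside the block")
        except ValueError:
            # The exception propagates out of the with block and the commit is skipped.
            pass
        ```

        Because there is no try/finally around the yield, an exception inside the
        block skips the commit rather than committing partial changes.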
        """
        session = self.writable_session(branch)
        yield session.store
        session.commit(
            message=message,
            metadata=metadata,
            rebase_with=rebase_with,
            rebase_tries=rebase_tries,
        )

    def expire_snapshots(
        self,
        older_than: datetime.datetime,
        *,
        delete_expired_branches: bool = False,
        delete_expired_tags: bool = False,
    ) -> set[str]:
        """Expire all snapshots older than a threshold.

        This processes snapshots found by navigating all references in
        the repo, tags first, branches later, both in lexicographical order.

        Returns the ids of all snapshots considered expired and skipped
        from history. Note that these snapshots are not necessarily
        available for garbage collection; they could still be pointed to by
        other refs.

        If `delete_expired_*` is set to True, branches or tags that, after the
        expiration process, point to expired snapshots directly, will be
        deleted.

        Danger
        ------
        This is an administrative operation, it should be run
        carefully. The repository can still operate concurrently while
        `expire_snapshots` runs, but other readers can get inconsistent
        views of the repository history.

        Parameters
        ----------
        older_than: datetime.datetime
            Expire snapshots older than this time. The bound is exclusive: a
            snapshot whose ``written_at`` equals ``older_than`` is kept. The root
            snapshot and the main branch tip are never expired. Other branch and
            tag tips are kept unless ``delete_expired_branches`` /
            ``delete_expired_tags`` are True.
        delete_expired_branches: bool, optional
            Whether to delete branches whose tip points at an expired snapshot.
            The main branch is never deleted.
        delete_expired_tags: bool, optional
            Whether to delete tags whose tip points at an expired snapshot.

        Returns
        -------
        set[str]
            The IDs of the expired snapshots.
        """
        return self._repository.expire_snapshots(
            older_than,
            delete_expired_branches=delete_expired_branches,
            delete_expired_tags=delete_expired_tags,
        )

    async def expire_snapshots_async(
        self,
        older_than: datetime.datetime,
        *,
        delete_expired_branches: bool = False,
        delete_expired_tags: bool = False,
    ) -> set[str]:
        """Expire all snapshots older than a threshold (async version).

        This processes snapshots found by navigating all references in
        the repo, tags first, branches later, both in lexicographical order.

        Returns the ids of all snapshots considered expired and skipped
        from history. Note that these snapshots are not necessarily
        available for garbage collection; they could still be pointed to by
        other refs.

        If `delete_expired_*` is set to True, branches or tags that, after the
        expiration process, point to expired snapshots directly, will be
        deleted.

        Danger
        ------
        This is an administrative operation, it should be run
        carefully. The repository can still operate concurrently while
        `expire_snapshots` runs, but other readers can get inconsistent
        views of the repository history.

        Parameters
        ----------
        older_than: datetime.datetime
            Expire snapshots older than this time. The bound is exclusive: a
            snapshot whose ``written_at`` equals ``older_than`` is kept. The root
            snapshot and the main branch tip are never expired. Other branch and
            tag tips are kept unless ``delete_expired_branches`` /
            ``delete_expired_tags`` are True.
        delete_expired_branches: bool, optional
            Whether to delete branches whose tip points at an expired snapshot.
            The main branch is never deleted.
        delete_expired_tags: bool, optional
            Whether to delete tags whose tip points at an expired snapshot.

        Returns
        -------
        set[str]
            The IDs of the expired snapshots.
        """
        return await self._repository.expire_snapshots_async(
            older_than,
            delete_expired_branches=delete_expired_branches,
            delete_expired_tags=delete_expired_tags,
        )

    def rewrite_manifests(
        self,
        message: str,
        *,
        branch: str,
        metadata: dict[str, Any] | None = None,
        commit_method: CommitMethod = "new_commit",
    ) -> str:
        """
        Rewrite manifests for all arrays.

        This method will start a new writable session on the specified branch,
        rewrite manifests for all arrays, and then commit with the specified ``message``
        and ``metadata``.

        A JSON representation of the currently active splitting configuration will be
        stored in the commit's metadata under the key `"splitting_config"`.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        branch: str
            The branch to commit to.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        commit_method : CommitMethod, optional
            The commit method to use. Defaults to ``"new_commit"``.
            Use ``"amend"`` to replace the previous commit.
            Note that ``"amend"`` is only supported for spec version 2
            repositories.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        """
        return self._repository.rewrite_manifests(
            message, branch=branch, metadata=metadata, commit_method=commit_method
        )

    async def rewrite_manifests_async(
        self,
        message: str,
        *,
        branch: str,
        metadata: dict[str, Any] | None = None,
        commit_method: CommitMethod = "new_commit",
    ) -> str:
        """
        Rewrite manifests for all arrays (async version).

        This method will start a new writable session on the specified branch,
        rewrite manifests for all arrays, and then commit with the specified ``message``
        and ``metadata``.

        A JSON representation of the currently active splitting configuration will be
        stored in the commit's metadata under the key `"splitting_config"`.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        branch: str
            The branch to commit to.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        commit_method : CommitMethod, optional
            The commit method to use. Defaults to ``"new_commit"``.
            Use ``"amend"`` to replace the previous commit.
            Note that ``"amend"`` is only supported for spec version 2
            repositories.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        """
        return await self._repository.rewrite_manifests_async(
            message, branch=branch, metadata=metadata, commit_method=commit_method
        )

    def garbage_collect(
        self,
        delete_object_older_than: datetime.datetime,
        *,
        dry_run: bool = False,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> GCSummary:
        """Delete any objects no longer accessible from any branches or tags.

        Danger
        ------
        This is an administrative operation, it should be run
        carefully. The repository can still operate concurrently while
        `garbage_collect` runs, but other readers can get inconsistent
        views if they are trying to access the expired snapshots.

        Parameters
        ----------
        delete_object_older_than: datetime.datetime
            Delete objects older than this time. The bound is exclusive. An
            object is deleted only if it is also not referenced by any surviving
            (non-expired) snapshot.
        dry_run : bool
            Report results but don't delete any objects.
        max_snapshots_in_memory : int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.

        Returns
        -------
        GCSummary
            Summary of objects deleted.
        """

        return self._repository.garbage_collect(
            delete_object_older_than,
            dry_run=dry_run,
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    async def garbage_collect_async(
        self,
        delete_object_older_than: datetime.datetime,
        *,
        dry_run: bool = False,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> GCSummary:
        """Delete any objects no longer accessible from any branches or tags (async version).

        Danger
        ------
        This is an administrative operation, it should be run
        carefully. The repository can still operate concurrently while
        `garbage_collect` runs, but other readers can get inconsistent
        views if they are trying to access the expired snapshots.

        Parameters
        ----------
        delete_object_older_than: datetime.datetime
            Delete objects older than this time. The bound is exclusive. An
            object is deleted only if it is also not referenced by any surviving
            (non-expired) snapshot.
        dry_run : bool
            Report results but don't delete any objects.
        max_snapshots_in_memory : int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.

        Returns
        -------
        GCSummary
            Summary of objects deleted.
        """

        return await self._repository.garbage_collect_async(
            delete_object_older_than,
            dry_run=dry_run,
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    def chunk_storage_stats(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> ChunkStorageStats:
        """Calculate the total storage used for chunks, in bytes.

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result is a dataclass with attributes for storage consumed by different
        types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """
        return self._repository.chunk_storage_stats(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    async def chunk_storage_stats_async(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> ChunkStorageStats:
        """Calculate the total storage used for chunks, in bytes (async version).

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result is a dataclass with attributes for storage consumed by different
        types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """
        return await self._repository.chunk_storage_stats_async(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    def total_chunks_storage(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> int:
        """Calculate the total storage used for chunks, in bytes.

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result counts only native chunks; virtual and inline chunks are excluded.

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """

        warnings.warn(
            "The ``total_chunks_storage`` method has been deprecated in favour of the ``chunk_storage_stats`` method. "
            "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
            "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
            "please replace your existing ``.total_chunks_storage(**kwargs)`` method call with ``.chunk_storage_stats(**same_kwargs).native_bytes``.",
            DeprecationWarning,
            stacklevel=2,
        )

        stats = self._repository.chunk_storage_stats(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )
        return stats.native_bytes

    async def total_chunks_storage_async(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> int:
        """Calculate the total storage used for chunks, in bytes (async version).

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result counts only native chunks; virtual and inline chunks are excluded.

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """

        warnings.warn(
            "The ``total_chunks_storage_async`` method has been deprecated in favour of the ``chunk_storage_stats_async`` method. "
            "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
            "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
            "please replace your existing ``.total_chunks_storage_async(**kwargs)`` method call with ``.chunk_storage_stats_async(**same_kwargs).native_bytes``.",
            DeprecationWarning,
            stacklevel=2,
        )

        stats = await self._repository.chunk_storage_stats_async(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )
        return stats.native_bytes

    def inspect_snapshot(self, snapshot_id: str) -> dict[str, Any]:
        """
        Return the node tree stored in a snapshot.

        The result contains every node's path, node ID, type (array or group),
        and manifest references. Useful for verifying node identity across
        commits or inspecting what a snapshot contains.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        snapshot_id : str
            The snapshot to inspect.

        Returns
        -------
        dict[str, Any]
            Keys: ``id``, ``flushed_at``, ``commit_message``, ``metadata``,
            ``manifests``, ``nodes``.
        """
        result: dict[str, Any] = json.loads(
            self._repository.inspect_snapshot(snapshot_id, pretty=False)
        )
        return result

    async def inspect_snapshot_async(self, snapshot_id: str) -> dict[str, Any]:
        """
        Return the node tree stored in a snapshot.

        The result contains every node's path, node ID, type (array or group),
        and manifest references. Useful for verifying node identity across
        commits or inspecting what a snapshot contains.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        snapshot_id : str
            The snapshot to inspect.

        Returns
        -------
        dict[str, Any]
            Keys: ``id``, ``flushed_at``, ``commit_message``, ``metadata``,
            ``manifests``, ``nodes``.
        """
        result: dict[str, Any] = json.loads(
            await self._repository.inspect_snapshot_async(snapshot_id, pretty=False)
        )
        return result

    def inspect_repo_info(self) -> dict[str, Any]:
        """
        Return the top-level repository metadata.

        Includes the branch-to-snapshot mapping, tags, snapshot ancestry,
        and the recent update log.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Returns
        -------
        dict[str, Any]
            Keys: ``spec_version``, ``branches``, ``tags``, ``deleted_tags``,
            ``snapshots``, ``metadata``, ``latest_updates``.
        """
        result: dict[str, Any] = json.loads(
            self._repository.inspect_repo_info(pretty=False)
        )
        return result

    async def inspect_repo_info_async(self) -> dict[str, Any]:
        """
        Return the top-level repository metadata.

        Includes the branch-to-snapshot mapping, tags, snapshot ancestry,
        and the recent update log.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Returns
        -------
        dict[str, Any]
            Keys: ``spec_version``, ``branches``, ``tags``, ``deleted_tags``,
            ``snapshots``, ``metadata``, ``latest_updates``.
        """
        result: dict[str, Any] = json.loads(
            await self._repository.inspect_repo_info_async(pretty=False)
        )
        return result

    def inspect_manifest(self, manifest_id: str) -> dict[str, Any]:
        """
        Return chunk storage statistics for a manifest.

        Shows per-array chunk counts broken down by storage type
        (inline, native, virtual) and compression details.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        manifest_id : str
            The manifest to inspect. Manifest IDs can be found in the
            ``manifest_refs`` of array nodes returned by
            :meth:`inspect_snapshot`.

        Returns
        -------
        dict[str, Any]
            Keys: ``id``, ``size_bytes``, ``num_arrays``,
            ``total_chunk_refs``, ``total_inline``, ``total_native``,
            ``total_virtual``, ``arrays``, ``compression``.
        """
        result: dict[str, Any] = json.loads(
            self._repository.inspect_manifest(manifest_id, pretty=False)
        )
        return result

    async def inspect_manifest_async(self, manifest_id: str) -> dict[str, Any]:
        """
        Return chunk storage statistics for a manifest.

        Shows per-array chunk counts broken down by storage type
        (inline, native, virtual) and compression details.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        manifest_id : str
            The manifest to inspect. Manifest IDs can be found in the
            ``manifest_refs`` of array nodes returned by
            :meth:`inspect_snapshot_async`.

        Returns
        -------
        dict[str, Any]
            Keys: ``id``, ``size_bytes``, ``num_arrays``,
            ``total_chunk_refs``, ``total_inline``, ``total_native``,
            ``total_virtual``, ``arrays``, ``compression``.
        """
        result: dict[str, Any] = json.loads(
            await self._repository.inspect_manifest_async(manifest_id, pretty=False)
        )
        return result

    def inspect_transaction_log(self, snapshot_id: str) -> dict[str, Any]:
        """
        Return the record of what changed in a single commit.

        Lists the node IDs of every created, deleted, and updated node,
        the chunk coordinates that were written, and any move operations.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        snapshot_id : str
            The snapshot whose transaction log to inspect.

        Returns
        -------
        dict[str, Any]
            Keys: ``new_groups``, ``new_arrays``, ``deleted_groups``,
            ``deleted_arrays``, ``updated_groups``, ``updated_arrays``,
            ``updated_chunks``, ``moved_nodes``.

            When the snapshot's ancestry was collapsed by expiration, an
            additional ``synthetic_composite`` key is present. It shows
            the transaction log is not a single on-disk file but a
            synthetic merge. Its keys are ``note`` (a free-text explanation),
            ``merged_pruned_ancestor_tx_logs`` (the pruned-ancestor
            transaction logs merged into this one, oldest first), and
            ``missing_tx_logs`` (referenced pruned-ancestor logs absent from
            storage, expected only when an older GC deleted them).
        """
        result: dict[str, Any] = json.loads(
            self._repository.inspect_transaction_log(snapshot_id, pretty=False)
        )
        return result

    async def inspect_transaction_log_async(self, snapshot_id: str) -> dict[str, Any]:
        """
        Return the record of what changed in a single commit.

        Lists the node IDs of every created, deleted, and updated node,
        the chunk coordinates that were written, and any move operations.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        snapshot_id : str
            The snapshot whose transaction log to inspect.

        Returns
        -------
        dict[str, Any]
            Keys: ``new_groups``, ``new_arrays``, ``deleted_groups``,
            ``deleted_arrays``, ``updated_groups``, ``updated_arrays``,
            ``updated_chunks``, ``moved_nodes``.

            When the snapshot's ancestry was collapsed by expiration, an
            additional ``synthetic_composite`` key is present. It shows
            the transaction log is not a single on-disk file but a
            synthetic merge. Its keys are ``note`` (a free-text explanation),
            ``merged_pruned_ancestor_tx_logs`` (the pruned-ancestor
            transaction logs merged into this one, oldest first), and
            ``missing_tx_logs`` (referenced pruned-ancestor logs absent from
            storage, expected only when an older GC deleted them).
        """
        raw = await self._repository.inspect_transaction_log_async(
            snapshot_id, pretty=False
        )
        result: dict[str, Any] = json.loads(raw)
        return result

    @property
    def spec_version(self) -> SpecVersion:
        return self._repository.spec_version
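
The maintenance methods in the source above (`expire_snapshots` and `garbage_collect`) are often run as a pair: expire old history first, then preview or perform the deletion. The following is a minimal sketch, assuming `repo` behaves like an open `Repository`. The helper name `expire_and_collect` and the 30-day retention window are illustrative and not part of icechunk; only the two method signatures shown above are relied on.

```python
import datetime
from typing import Any


def expire_and_collect(
    repo: Any,
    keep: datetime.timedelta = datetime.timedelta(days=30),
    *,
    dry_run: bool = True,
) -> tuple[set[str], Any]:
    """Expire snapshots older than `keep`, then run garbage collection.

    Hypothetical helper: `repo` is expected to expose expire_snapshots()
    and garbage_collect() with the signatures documented above.
    """
    # Use one timezone-aware cutoff for both steps so they agree on "old".
    cutoff = datetime.datetime.now(datetime.timezone.utc) - keep
    # Step 1: drop snapshots older than the cutoff from the history.
    expired = repo.expire_snapshots(cutoff)
    # Step 2: with dry_run=True this only reports what would be deleted.
    summary = repo.garbage_collect(cutoff, dry_run=dry_run)
    return expired, summary
```

Running once with the default `dry_run=True` and inspecting the returned summary before repeating with `dry_run=False` keeps the destructive step explicit.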

authorized_virtual_container_prefixes `property` #

authorized_virtual_container_prefixes

Get all authorized virtual chunk container prefixes.

Returns:

Name	Type	Description
`url_prefixes`	`set[str]`	The set of authorized url prefixes for each virtual chunk container

config `property` #

config

Get a copy of this repository's config.

Returns:

Type	Description
`RepositoryConfig`	The repository configuration.

metadata `property` #

metadata

Get the current configured repository metadata.

Returns:

Type	Description
`dict[str, Any]`	The repository level metadata.

status `property` #

status

Get the current repository status.

Returns:

Type	Description
`RepoStatus`	The current status of the repository.

storage `property` #

storage

Get a copy of this repository's Storage instance.

Returns:

Type	Description
`Storage`	The repository storage instance.

ancestry #

ancestry(*, branch=None, tag=None, snapshot_id=None)

Get the ancestry of a snapshot.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to get the ancestry of.	`None`
`tag`	`str`	The tag to get the ancestry of.	`None`
`snapshot_id`	`str`	The snapshot ID to get the ancestry of.	`None`

Returns:

Type	Description
`Iterator[SnapshotInfo]`	The ancestry of the snapshot, listing out the snapshots and their metadata.

Notes

Only one of the arguments can be specified.
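
Because `ancestry` takes exactly one of `branch`, `tag`, or `snapshot_id` and returns an iterator, a small wrapper can enforce the rule and stop early. This is a sketch, not icechunk code: `recent_commits` is a hypothetical helper, and it assumes each `SnapshotInfo` exposes `id` and `message` attributes and that iteration starts at the given ref and walks toward the root.

```python
from itertools import islice
from typing import Any


def recent_commits(repo: Any, n: int = 5, **ref: str) -> list[tuple[str, str]]:
    """Return (snapshot id, message) for the first n snapshots of one ref.

    Hypothetical helper. Exactly one of branch=, tag= or snapshot_id=
    must be passed, mirroring the note above.
    """
    allowed = {"branch", "tag", "snapshot_id"}
    if len(ref) != 1 or not set(ref) <= allowed:
        raise ValueError("pass exactly one of branch, tag or snapshot_id")
    # ancestry() yields lazily, so islice avoids walking the full history.
    return [(s.id, s.message) for s in islice(repo.ancestry(**ref), n)]
```

A call such as `recent_commits(repo, 3, branch="main")` would then list the three snapshots nearest the branch tip, under the ordering assumption stated above.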

Source code in icechunk-python/python/icechunk/repository.py

def ancestry(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
) -> Iterator[SnapshotInfo]:
    """
    Get the ancestry of a snapshot.

    Parameters
    ----------
    branch : str, optional
        The branch to get the ancestry of.
    tag : str, optional
        The tag to get the ancestry of.
    snapshot_id : str, optional
        The snapshot ID to get the ancestry of.

    Returns
    -------
    Iterator[SnapshotInfo]
        The ancestry of the snapshot, listing out the snapshots and their metadata.

    Notes
    -----
    Only one of the arguments can be specified.
    """

    # the returned object is both an Async and Sync iterator
    res = cast(
        Iterator[SnapshotInfo],
        self._repository.async_ancestry(
            branch=branch, tag=tag, snapshot_id=snapshot_id
        ),
    )
    return res

ancestry_graph #

ancestry_graph(
    *, branch=None, tag=None, snapshot_id=None, plain=False
)

Build a visual representation of the commit history.

When called with no arguments, shows all branches as a tree. When called with one of branch/tag/snapshot_id, shows that ref's linear history.

Parameters:

Name	Type	Description	Default
`branch`	`str`	Show history for this branch.	`None`
`tag`	`str`	Show history from this tag.	`None`
`snapshot_id`	`str`	Show history from this snapshot.	`None`
`plain`	`bool`	If True, render without colors (no ANSI codes in text, no fill colors in SVG). Useful for CI logs, piping to files, or LLM agents.	`False`

Returns:

Type	Description
`AncestryGraph`	A displayable object. Use print() for colored terminal output, or display in Jupyter for an SVG diagram.

Source code in icechunk-python/python/icechunk/repository.py

def ancestry_graph(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
    plain: bool = False,
) -> AncestryGraph:
    """
    Build a visual representation of the commit history.

    When called with no arguments, shows all branches as a tree.
    When called with one of branch/tag/snapshot_id, shows that ref's linear history.

    Parameters
    ----------
    branch : str, optional
        Show history for this branch.
    tag : str, optional
        Show history from this tag.
    snapshot_id : str, optional
        Show history from this snapshot.
    plain : bool, optional
        If True, render without colors (no ANSI codes in text, no fill colors
        in SVG). Useful for CI logs, piping to files, or LLM agents.

    Returns
    -------
    AncestryGraph
        A displayable object. Use print() for colored terminal output,
        or display in Jupyter for an SVG diagram.
    """
    return self._repository.ancestry_graph(
        branch=branch, tag=tag, snapshot_id=snapshot_id, plain=plain
    )

ancestry_graph_async `async` #

ancestry_graph_async(
    *, branch=None, tag=None, snapshot_id=None, plain=False
)

Async version of ancestry_graph.

Source code in icechunk-python/python/icechunk/repository.py

async def ancestry_graph_async(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
    plain: bool = False,
) -> AncestryGraph:
    """
    Async version of :meth:`ancestry_graph`.
    """
    return await self._repository.ancestry_graph_async(
        branch=branch, tag=tag, snapshot_id=snapshot_id, plain=plain
    )

async_ancestry #

async_ancestry(*, branch=None, tag=None, snapshot_id=None)

Get the ancestry of a snapshot.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to get the ancestry of.	`None`
`tag`	`str`	The tag to get the ancestry of.	`None`
`snapshot_id`	`str`	The snapshot ID to get the ancestry of.	`None`

Returns:

Type	Description
`AsyncCloseableIterator[SnapshotInfo]`	An async iterator over the ancestry of the snapshot, yielding each snapshot and its metadata.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py

def async_ancestry(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
) -> AsyncCloseableIterator[SnapshotInfo]:
    """
    Get the ancestry of a snapshot.

    Parameters
    ----------
    branch : str, optional
        The branch to get the ancestry of.
    tag : str, optional
        The tag to get the ancestry of.
    snapshot_id : str, optional
        The snapshot ID to get the ancestry of.

    Returns
    -------
    AsyncCloseableIterator[SnapshotInfo]
        An async iterator over the ancestry of the snapshot, yielding each snapshot and its metadata.

    Notes
    -----
    Only one of the arguments can be specified.
    """
    return self._repository.async_ancestry(
        branch=branch, tag=tag, snapshot_id=snapshot_id
    )
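
Because the signature returns an async iterator rather than a list, the result is consumed with `async for`. A minimal sketch, assuming the iterator supports `async for`, yields the newest snapshot first, and that each `SnapshotInfo` exposes an `id` attribute (the helper name and the `limit` parameter are hypothetical):

```python
import asyncio


async def recent_snapshot_ids(repo, branch="main", limit=5):
    """Collect up to `limit` snapshot IDs from a branch's ancestry."""
    ids = []
    if limit <= 0:
        return ids
    # async_ancestry is a plain method that returns an async iterator,
    # so it is iterated directly rather than awaited.
    async for snapshot in repo.async_ancestry(branch=branch):
        ids.append(snapshot.id)  # assumption: SnapshotInfo has an `id` attribute
        if len(ids) >= limit:
            break
    return ids
```

From synchronous code this could be driven with `asyncio.run(recent_snapshot_ids(repo))`.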

chunk_storage_stats #

chunk_storage_stats(
    *,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Calculate the total storage used for chunks, in bytes.

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

The result is a dataclass with attributes for storage consumed by different types of chunks (e.g. native_bytes, virtual_bytes, total_bytes).

Parameters:

Name	Type	Description	Default
`max_snapshots_in_memory`	`int`	Don't prefetch more than this many Snapshots to memory.	`50`
`max_compressed_manifest_mem_bytes`	`int`	Don't use more than this memory to store compressed in-flight manifests.	`512 * 1024 * 1024`
`max_concurrent_manifest_fetches`	`int`	Don't run more than this many concurrent manifest fetches.	`500`

Source code in icechunk-python/python/icechunk/repository.py

def chunk_storage_stats(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> ChunkStorageStats:
    """Calculate the total storage used for chunks, in bytes.

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result is a dataclass with attributes for storage consumed by different
    types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

    Parameters
    ----------
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """
    return self._repository.chunk_storage_stats(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
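
The returned dataclass reports raw byte counts, so a small conversion step is often useful for reporting. A minimal sketch, assuming `repo` is an opened `Repository`; the helper name `storage_summary` and the GiB conversion are illustrative choices, while the three attribute names come from the description above:

```python
def storage_summary(repo, max_concurrent_manifest_fetches=500):
    """Return chunk storage usage in GiB, keyed by chunk type."""
    stats = repo.chunk_storage_stats(
        # Lower this value to reduce load on the object store.
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
    gib = 1024**3
    return {
        "native_gib": stats.native_bytes / gib,
        "virtual_gib": stats.virtual_bytes / gib,
        "total_gib": stats.total_bytes / gib,
    }
```

Only snapshots reachable from a branch or tag are counted, so the figures can drop after `reset_branch` or `expire_snapshots` even before any chunks are physically deleted.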

chunk_storage_stats_async `async` #

chunk_storage_stats_async(
    *,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Calculate the total storage used for chunks, in bytes (async version).

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

The result is a dataclass with attributes for storage consumed by different types of chunks (e.g. native_bytes, virtual_bytes, total_bytes).

Parameters:

Name	Type	Description	Default
`max_snapshots_in_memory`	`int`	Don't prefetch more than this many Snapshots to memory.	`50`
`max_compressed_manifest_mem_bytes`	`int`	Don't use more than this memory to store compressed in-flight manifests.	`512 * 1024 * 1024`
`max_concurrent_manifest_fetches`	`int`	Don't run more than this many concurrent manifest fetches.	`500`

Source code in icechunk-python/python/icechunk/repository.py

async def chunk_storage_stats_async(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> ChunkStorageStats:
    """Calculate the total storage used for chunks, in bytes (async version).

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result is a dataclass with attributes for storage consumed by different
    types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

    Parameters
    ----------
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """
    return await self._repository.chunk_storage_stats_async(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )

create `classmethod` #

create(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
    spec_version=None,
    check_clean_root=True,
)

Create a new Icechunk repository. If one already exists at the given store location, an error will be raised.

Warning

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required
`config`	`RepositoryConfig`	The repository configuration. If not provided, a default configuration will be used.	`None`
`authorize_virtual_chunk_access`	`dict[str, AnyCredential \| None]`	Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. Each value should be an explicit credential or no-auth sentinel: e.g. `S3Credentials.FromEnv()` / `s3_anonymous_credentials()` for S3, or the `icechunk.credentials.LocalFileSystemAccess` / `icechunk.credentials.HttpAccess` sentinels for `file://` and `http(s)://` containers. Passing `None` is deprecated and will be unsupported in a future release: it silently reads credentials from the environment (or uses anonymous access), which can expose private credentials. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.	`None`
`spec_version`	`SpecVersion \| int`	Use this version of the spec for the new repository. If not passed, the latest spec version available when this library version was released is used.	`None`

Returns:

Type	Description
`Self`	An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py

@classmethod
def create(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    spec_version: SpecVersion | int | None = None,
    check_clean_root: bool = True,
) -> Self:
    """
    Create a new Icechunk repository.
    If one already exists at the given store location, an error will be raised.

    !!! warning
        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository configuration. If not provided, a default configuration will be used.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. Each value should be an explicit credential or no-auth
        sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
        for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
        ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
        ``http(s)://`` containers. Passing ``None`` is deprecated and will be
        unsupported in a future release: it silently reads credentials from the
        environment (or uses anonymous access), which can expose private credentials.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    spec_version : SpecVersion | int, optional
        Use this version of the spec for the new repository. If not passed, the latest
        spec version available when this library version was released is used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
    return cls(
        PyRepository.create(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            spec_version=spec_version,
            check_clean_root=check_clean_root,
        )
    )
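
Since passing `None` as a credential value is deprecated, a caller may want to reject it before the repository is created. A minimal sketch of such a guard: the helper name is hypothetical, and `repository_cls` stands in for `icechunk.Repository`, passed as an argument so the sketch does not depend on how the library is imported:

```python
def create_with_virtual_containers(repository_cls, storage, container_credentials):
    """Create a repository, insisting on an explicit credential per container."""
    # Reject None values up front instead of relying on the deprecated
    # fallback that reads credentials from the environment.
    missing = [prefix for prefix, cred in container_credentials.items() if cred is None]
    if missing:
        raise ValueError(f"explicit credentials required for: {missing}")
    return repository_cls.create(
        storage,
        authorize_virtual_chunk_access=dict(container_credentials),
    )
```

The mapping keys are container URL prefixes, and each value would be one of the explicit credentials or no-auth sentinels listed in the parameter description.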

create_async `async` `classmethod` #

create_async(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
    spec_version=None,
    check_clean_root=True,
)

Create a new Icechunk repository asynchronously. If one already exists at the given store location, an error will be raised.

Warning

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required
`config`	`RepositoryConfig`	The repository configuration. If not provided, a default configuration will be used.	`None`
`authorize_virtual_chunk_access`	`dict[str, AnyCredential \| None]`	Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. Each value should be an explicit credential or no-auth sentinel: e.g. `S3Credentials.FromEnv()` / `s3_anonymous_credentials()` for S3, or the `icechunk.credentials.LocalFileSystemAccess` / `icechunk.credentials.HttpAccess` sentinels for `file://` and `http(s)://` containers. Passing `None` is deprecated and will be unsupported in a future release: it silently reads credentials from the environment (or uses anonymous access), which can expose private credentials. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.	`None`
`spec_version`	`SpecVersion \| int`	Use this version of the spec for the new repository. If not passed, the latest spec version available when this library version was released is used.	`None`

Returns:

Type	Description
`Self`	An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py

@classmethod
async def create_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    spec_version: SpecVersion | int | None = None,
    check_clean_root: bool = True,
) -> Self:
    """
    Create a new Icechunk repository asynchronously.
    If one already exists at the given store location, an error will be raised.

    !!! warning
        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository configuration. If not provided, a default configuration will be used.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. Each value should be an explicit credential or no-auth
        sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
        for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
        ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
        ``http(s)://`` containers. Passing ``None`` is deprecated and will be
        unsupported in a future release: it silently reads credentials from the
        environment (or uses anonymous access), which can expose private credentials.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    spec_version : SpecVersion | int, optional
        Use this version of the spec for the new repository. If not passed, the latest
        spec version available when this library version was released is used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
    return cls(
        await PyRepository.create_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            spec_version=spec_version,
            check_clean_root=check_clean_root,
        )
    )

create_branch #

create_branch(branch, snapshot_id)

Create a new branch at the given snapshot.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The name of the branch to create.	required
`snapshot_id`	`str`	The snapshot ID to create the branch at.	required

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

def create_branch(self, branch: str, snapshot_id: str) -> None:
    """
    Create a new branch at the given snapshot.

    Parameters
    ----------
    branch : str
        The name of the branch to create.
    snapshot_id : str
        The snapshot ID to create the branch at.

    Returns
    -------
    None
    """
    self._repository.create_branch(branch, snapshot_id)

create_branch_async `async` #

create_branch_async(branch, snapshot_id)

Create a new branch at the given snapshot (async version).

Parameters:

Name	Type	Description	Default
`branch`	`str`	The name of the branch to create.	required
`snapshot_id`	`str`	The snapshot ID to create the branch at.	required

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

async def create_branch_async(self, branch: str, snapshot_id: str) -> None:
    """
    Create a new branch at the given snapshot (async version).

    Parameters
    ----------
    branch : str
        The name of the branch to create.
    snapshot_id : str
        The snapshot ID to create the branch at.

    Returns
    -------
    None
    """
    await self._repository.create_branch_async(branch, snapshot_id)

create_tag #

create_tag(tag, snapshot_id)

Create a new tag at the given snapshot.

Parameters:

Name	Type	Description	Default
`tag`	`str`	The name of the tag to create.	required
`snapshot_id`	`str`	The snapshot ID to create the tag at.	required

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

def create_tag(self, tag: str, snapshot_id: str) -> None:
    """
    Create a new tag at the given snapshot.

    Parameters
    ----------
    tag : str
        The name of the tag to create.
    snapshot_id : str
        The snapshot ID to create the tag at.

    Returns
    -------
    None
    """
    self._repository.create_tag(tag, snapshot_id)
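
`create_tag` and `create_branch` take the same `(name, snapshot_id)` pair, so one snapshot can be pinned and branched from in two calls. A minimal sketch, assuming `repo` is an opened `Repository` and `snapshot_id` was obtained elsewhere (the helper name `tag_and_branch` is hypothetical):

```python
def tag_and_branch(repo, snapshot_id, tag, branch):
    """Pin a snapshot with a tag, then start a branch from the same snapshot."""
    # The tag records the snapshot under a stable name.
    repo.create_tag(tag, snapshot_id)
    # The branch starts at that snapshot and can move as new commits are made.
    repo.create_branch(branch, snapshot_id)
```

The two calls are independent: if the second raises, the tag created by the first remains in place.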

create_tag_async `async` #

create_tag_async(tag, snapshot_id)

Create a new tag at the given snapshot (async version).

Parameters:

Name	Type	Description	Default
`tag`	`str`	The name of the tag to create.	required
`snapshot_id`	`str`	The snapshot ID to create the tag at.	required

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

async def create_tag_async(self, tag: str, snapshot_id: str) -> None:
    """
    Create a new tag at the given snapshot (async version).

    Parameters
    ----------
    tag : str
        The name of the tag to create.
    snapshot_id : str
        The snapshot ID to create the tag at.

    Returns
    -------
    None
    """
    await self._repository.create_tag_async(tag, snapshot_id)

default_commit_metadata #

default_commit_metadata()

Get the current configured default commit metadata for the repository.

Returns:

Type	Description
`dict[str, Any]`	The default commit metadata.

Source code in icechunk-python/python/icechunk/repository.py

def default_commit_metadata(self) -> dict[str, Any]:
    """
    Get the current configured default commit metadata for the repository.

    Returns
    -------
    dict[str, Any]
        The default commit metadata.
    """
    return self._repository.default_commit_metadata()

delete_branch #

delete_branch(branch)

Delete a branch.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to delete.	required

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

def delete_branch(self, branch: str) -> None:
    """
    Delete a branch.

    Parameters
    ----------
    branch : str
        The branch to delete.

    Returns
    -------
    None
    """
    self._repository.delete_branch(branch)

delete_branch_async `async` #

delete_branch_async(branch)

Delete a branch (async version).

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to delete.	required

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

async def delete_branch_async(self, branch: str) -> None:
    """
    Delete a branch (async version).

    Parameters
    ----------
    branch : str
        The branch to delete.

    Returns
    -------
    None
    """
    await self._repository.delete_branch_async(branch)

delete_tag #

delete_tag(tag)

Delete a tag.

Parameters:

Name	Type	Description	Default
`tag`	`str`	The tag to delete.	required

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

def delete_tag(self, tag: str) -> None:
    """
    Delete a tag.

    Parameters
    ----------
    tag : str
        The tag to delete.

    Returns
    -------
    None
    """
    self._repository.delete_tag(tag)

delete_tag_async `async` #

delete_tag_async(tag)

Delete a tag (async version).

Parameters:

Name	Type	Description	Default
`tag`	`str`	The tag to delete.	required

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

async def delete_tag_async(self, tag: str) -> None:
    """
    Delete a tag (async version).

    Parameters
    ----------
    tag : str
        The tag to delete.

    Returns
    -------
    None
    """
    await self._repository.delete_tag_async(tag)

diff #

diff(
    *,
    from_branch=None,
    from_tag=None,
    from_snapshot_id=None,
    to_branch=None,
    to_tag=None,
    to_snapshot_id=None,
)

Compute an overview of the operations executed from version `from` to version `to`.

Both versions, `from` and `to`, must be identified, each by a branch, a tag, or a snapshot ID. The two versions need not be identified the same way: for example, `from` can be a tag while `to` is a branch.

The `from` version must be a member of the `ancestry` of `to`.

Returns:

Type	Description
`Diff`	The operations executed between the two versions

Source code in icechunk-python/python/icechunk/repository.py

def diff(
    self,
    *,
    from_branch: str | None = None,
    from_tag: str | None = None,
    from_snapshot_id: str | None = None,
    to_branch: str | None = None,
    to_tag: str | None = None,
    to_snapshot_id: str | None = None,
) -> Diff:
    """
    Compute an overview of the operations executed from version `from` to version `to`.

    Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
    The styles used to identify the `from` and `to` versions can be different.

    The `from` version must be a member of the `ancestry` of `to`.

    Returns
    -------
    Diff
        The operations executed between the two versions
    """
    return self._repository.diff(
        from_branch=from_branch,
        from_tag=from_tag,
        from_snapshot_id=from_snapshot_id,
        to_branch=to_branch,
        to_tag=to_tag,
        to_snapshot_id=to_snapshot_id,
    )
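
With six keyword arguments it is easy to pass the wrong combination. A minimal sketch of a wrapper that builds the call from two `(kind, value)` pairs, assuming each side is identified by exactly one of branch, tag, or snapshot ID (the helper name `diff_between` and the pair convention are hypothetical):

```python
_REF_KINDS = ("branch", "tag", "snapshot_id")


def diff_between(repo, from_ref, to_ref):
    """Call repo.diff with one selector per side, e.g. ("tag", "v1")."""
    kwargs = {}
    for side, (kind, value) in (("from", from_ref), ("to", to_ref)):
        if kind not in _REF_KINDS:
            raise ValueError(f"unknown ref kind {kind!r}; expected one of {_REF_KINDS}")
        # Builds keyword names such as from_tag or to_branch.
        kwargs[f"{side}_{kind}"] = value
    return repo.diff(**kwargs)
```

For example, `diff_between(repo, ("tag", "v1"), ("branch", "main"))` would expand to `repo.diff(from_tag="v1", to_branch="main")`.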

diff_async `async` #

diff_async(
    *,
    from_branch=None,
    from_tag=None,
    from_snapshot_id=None,
    to_branch=None,
    to_tag=None,
    to_snapshot_id=None,
)

Compute an overview of the operations executed from version `from` to version `to` (async version).

Both versions, `from` and `to`, must be identified, each by a branch, a tag, or a snapshot ID. The two versions need not be identified the same way: for example, `from` can be a tag while `to` is a branch.

The `from` version must be a member of the `ancestry` of `to`.

Returns:

Type	Description
`Diff`	The operations executed between the two versions

Source code in icechunk-python/python/icechunk/repository.py

async def diff_async(
    self,
    *,
    from_branch: str | None = None,
    from_tag: str | None = None,
    from_snapshot_id: str | None = None,
    to_branch: str | None = None,
    to_tag: str | None = None,
    to_snapshot_id: str | None = None,
) -> Diff:
    """
    Compute an overview of the operations executed from version `from` to version `to` (async version).

    Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
    The styles used to identify the `from` and `to` versions can be different.

    The `from` version must be a member of the `ancestry` of `to`.

    Returns
    -------
    Diff
        The operations executed between the two versions
    """
    return await self._repository.diff_async(
        from_branch=from_branch,
        from_tag=from_tag,
        from_snapshot_id=from_snapshot_id,
        to_branch=to_branch,
        to_tag=to_tag,
        to_snapshot_id=to_snapshot_id,
    )

disabled_feature_flags #

disabled_feature_flags()

Get feature flags that are currently disabled.

Returns:

Type	Description
`list[FeatureFlag]`	Feature flags whose effective state is disabled.

Source code in icechunk-python/python/icechunk/repository.py

def disabled_feature_flags(self) -> list[FeatureFlag]:
    """
    Get feature flags that are currently disabled.

    Returns
    -------
    list[FeatureFlag]
        Feature flags whose effective state is disabled.
    """
    return self._repository.disabled_feature_flags()

disabled_feature_flags_async `async` #

disabled_feature_flags_async()

Get feature flags that are currently disabled (async version).

Returns:

Type	Description
`list[FeatureFlag]`	Feature flags whose effective state is disabled.

Source code in icechunk-python/python/icechunk/repository.py

async def disabled_feature_flags_async(self) -> list[FeatureFlag]:
    """
    Get feature flags that are currently disabled (async version).

    Returns
    -------
    list[FeatureFlag]
        Feature flags whose effective state is disabled.
    """
    return await self._repository.disabled_feature_flags_async()

enabled_feature_flags #

enabled_feature_flags()

Get feature flags that are currently enabled.

Returns:

Type	Description
`list[FeatureFlag]`	Feature flags whose effective state is enabled.

Source code in icechunk-python/python/icechunk/repository.py

def enabled_feature_flags(self) -> list[FeatureFlag]:
    """
    Get feature flags that are currently enabled.

    Returns
    -------
    list[FeatureFlag]
        Feature flags whose effective state is enabled.
    """
    return self._repository.enabled_feature_flags()

enabled_feature_flags_async `async` #

enabled_feature_flags_async()

Get feature flags that are currently enabled (async version).

Returns:

Type	Description
`list[FeatureFlag]`	Feature flags whose effective state is enabled.

Source code in icechunk-python/python/icechunk/repository.py

async def enabled_feature_flags_async(self) -> list[FeatureFlag]:
    """
    Get feature flags that are currently enabled (async version).

    Returns
    -------
    list[FeatureFlag]
        Feature flags whose effective state is enabled.
    """
    return await self._repository.enabled_feature_flags_async()
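
The async variants can be awaited together when both lists are needed. A minimal sketch, assuming the two calls may safely run concurrently on the same `Repository` (the helper name `feature_flag_report` is hypothetical):

```python
import asyncio


async def feature_flag_report(repo):
    """Fetch enabled and disabled feature flags in one concurrent step."""
    enabled, disabled = await asyncio.gather(
        repo.enabled_feature_flags_async(),
        repo.disabled_feature_flags_async(),
    )
    return {"enabled": list(enabled), "disabled": list(disabled)}
```

If concurrent access turns out not to be appropriate for a given setup, awaiting the two calls one after the other gives the same result.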

exists `staticmethod` #

exists(storage, storage_settings=None)

Check if a repository exists at the given storage location.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required
`storage_settings`	`StorageSettings \| None`	Optional storage settings to use for the initial storage call.	`None`

Returns:

Type	Description
`bool`	True if the repository exists, False otherwise.

Source code in icechunk-python/python/icechunk/repository.py

@staticmethod
def exists(
    storage: Storage,
    storage_settings: StorageSettings | None = None,
) -> bool:
    """
    Check if a repository exists at the given storage location.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    storage_settings : StorageSettings | None
        Optional storage settings to use for the initial storage call.

    Returns
    -------
    bool
        True if the repository exists, False otherwise.
    """
    return PyRepository.exists(storage, storage_settings)

exists_async `async` `staticmethod` #

exists_async(storage, storage_settings=None)

Check if a repository exists at the given storage location (async version).

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required
`storage_settings`	`StorageSettings \| None`	Optional storage settings to use for the initial storage call.	`None`

Returns:

Type	Description
`bool`	True if the repository exists, False otherwise.

Source code in icechunk-python/python/icechunk/repository.py

@staticmethod
async def exists_async(
    storage: Storage,
    storage_settings: StorageSettings | None = None,
) -> bool:
    """
    Check if a repository exists at the given storage location (async version).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    storage_settings : StorageSettings | None
        Optional storage settings to use for the initial storage call.

    Returns
    -------
    bool
        True if the repository exists, False otherwise.
    """
    return await PyRepository.exists_async(storage, storage_settings)

expire_snapshots #

expire_snapshots(
    older_than,
    *,
    delete_expired_branches=False,
    delete_expired_tags=False,
)

Expire all snapshots older than a threshold.

This processes snapshots found by navigating all references in the repo, tags first, branches later, both in lexicographical order.

Returns the IDs of all snapshots considered expired and skipped from history. Note that these snapshots are not necessarily available for garbage collection; other refs could still point to them.

If delete_expired_branches or delete_expired_tags is True, branches or tags that point directly to an expired snapshot after the expiration process are deleted.

Danger

This is an administrative operation and should be run carefully. The repository can still operate concurrently while expire_snapshots runs, but other readers can get inconsistent views of the repository history.

Parameters:

Name	Type	Description	Default
`older_than`	`datetime`	Expire snapshots older than this time. The bound is exclusive: a snapshot whose `written_at` equals `older_than` is kept. The root snapshot and the main branch tip are never expired. Other branch and tag tips are kept unless `delete_expired_branches` / `delete_expired_tags` are True.	required
`delete_expired_branches`	`bool`	Whether to delete branches whose tip points at an expired snapshot. The main branch is never deleted.	`False`
`delete_expired_tags`	`bool`	Whether to delete tags whose tip points at an expired snapshot.	`False`

Returns:

Type	Description
`set[str]`	The IDs of the expired snapshots.

Source code in icechunk-python/python/icechunk/repository.py

def expire_snapshots(
    self,
    older_than: datetime.datetime,
    *,
    delete_expired_branches: bool = False,
    delete_expired_tags: bool = False,
) -> set[str]:
    """Expire all snapshots older than a threshold.

    This processes snapshots found by navigating all references in
    the repo, tags first, branches later, both in lexicographical order.

    Returns the IDs of all snapshots considered expired and skipped
    from history. Note that these snapshots are not necessarily
    available for garbage collection; other refs could still point
    to them.

    If `delete_expired_branches` or `delete_expired_tags` is True, branches
    or tags that point directly to an expired snapshot after the
    expiration process are deleted.

    Danger
    ------
    This is an administrative operation and should be run
    carefully. The repository can still operate concurrently while
    `expire_snapshots` runs, but other readers can get inconsistent
    views of the repository history.

    Parameters
    ----------
    older_than: datetime.datetime
        Expire snapshots older than this time. The bound is exclusive: a
        snapshot whose ``written_at`` equals ``older_than`` is kept. The root
        snapshot and the main branch tip are never expired. Other branch and
        tag tips are kept unless ``delete_expired_branches`` /
        ``delete_expired_tags`` are True.
    delete_expired_branches: bool, optional
        Whether to delete branches whose tip points at an expired snapshot.
        The main branch is never deleted.
    delete_expired_tags: bool, optional
        Whether to delete tags whose tip points at an expired snapshot.

    Returns
    -------
    set[str]
        The IDs of the expired snapshots.
    """
    return self._repository.expire_snapshots(
        older_than,
        delete_expired_branches=delete_expired_branches,
        delete_expired_tags=delete_expired_tags,
    )
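
The `older_than` threshold is usually derived from a retention period. A minimal sketch, assuming a timezone-aware UTC datetime is the appropriate input and leaving both `delete_expired_*` flags at their default of `False` (the helper name and the `now` override are hypothetical):

```python
import datetime


def expire_older_than(repo, days, *, now=None):
    """Expire snapshots written more than `days` days ago."""
    # `now` can be injected for reproducible runs; defaults to current UTC time.
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    cutoff = now - datetime.timedelta(days=days)
    # The bound is exclusive: a snapshot written exactly at `cutoff` is kept.
    expired_ids = repo.expire_snapshots(cutoff)
    return cutoff, expired_ids
```

The returned IDs are the snapshots dropped from history; as noted above, some of them may still be referenced by other refs and so are not necessarily eligible for garbage collection.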

expire_snapshots_async `async` #

expire_snapshots_async(
    older_than,
    *,
    delete_expired_branches=False,
    delete_expired_tags=False,
)

Expire all snapshots older than a threshold (async version).

This processes snapshots found by navigating all references in the repo, tags first, branches later, both in lexicographical order.

Returns the IDs of all snapshots considered expired and skipped from history. Note that these snapshots are not necessarily available for garbage collection; other refs could still point to them.

If delete_expired_branches or delete_expired_tags is True, branches or tags that point directly to an expired snapshot after the expiration process are deleted.

Danger

This is an administrative operation, it should be run carefully. The repository can still operate concurrently while expire_snapshots runs, but other readers can get inconsistent views of the repository history.

Parameters:

Name	Type	Description	Default
`older_than`	`datetime`	Expire snapshots older than this time. The bound is exclusive: a snapshot whose `written_at` equals `older_than` is kept. The root snapshot and the main branch tip are never expired. Other branch and tag tips are kept unless `delete_expired_branches` / `delete_expired_tags` are True.	required
`delete_expired_branches`	`bool`	Whether to delete branches whose tip points at an expired snapshot. The main branch is never deleted.	`False`
`delete_expired_tags`	`bool`	Whether to delete tags whose tip points at an expired snapshot.	`False`

Returns:

Type	Description
`set[str]`	Set of expired snapshot IDs.

Source code in icechunk-python/python/icechunk/repository.py

async def expire_snapshots_async(
    self,
    older_than: datetime.datetime,
    *,
    delete_expired_branches: bool = False,
    delete_expired_tags: bool = False,
) -> set[str]:
    """Expire all snapshots older than a threshold (async version).

    This processes snapshots found by navigating all references in
    the repo, tags first, branches later, both in lexicographical order.

    Returns the ids of all snapshots considered expired and skipped
    from history. Note that these snapshots are not necessarily
    available for garbage collection; they could still be pointed to by
    other refs.

    If `delete_expired_*` is set to True, branches or tags that, after the
    expiration process, point to expired snapshots directly, will be
    deleted.

    Danger
    ------
    This is an administrative operation, it should be run
    carefully. The repository can still operate concurrently while
    `expire_snapshots` runs, but other readers can get inconsistent
    views of the repository history.

    Parameters
    ----------
    older_than: datetime.datetime
        Expire snapshots older than this time. The bound is exclusive: a
        snapshot whose ``written_at`` equals ``older_than`` is kept. The root
        snapshot and the main branch tip are never expired. Other branch and
        tag tips are kept unless ``delete_expired_branches`` /
        ``delete_expired_tags`` are True.
    delete_expired_branches: bool, optional
        Whether to delete branches whose tip points at an expired snapshot.
        The main branch is never deleted.
    delete_expired_tags: bool, optional
        Whether to delete tags whose tip points at an expired snapshot.

    Returns
    -------
    set of expired snapshot IDs
    """
    return await self._repository.expire_snapshots_async(
        older_than,
        delete_expired_branches=delete_expired_branches,
        delete_expired_tags=delete_expired_tags,
    )
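
A sketch of how the `delete_expired_*` flags might be used from async code, assuming `repo` is an open `Repository`. The helper `expire_and_prune_refs` is hypothetical. It compares the branch and tag sets before and after the call to report which refs were deleted, which is only reliable if no other writer changes refs in the meantime.

```python
async def expire_and_prune_refs(repo, cutoff):
    """Expire old snapshots and report which refs disappeared as a result."""
    branches_before = await repo.list_branches_async()
    tags_before = await repo.list_tags_async()

    expired_ids = await repo.expire_snapshots_async(
        cutoff,
        delete_expired_branches=True,
        delete_expired_tags=True,
    )

    # Set difference: refs present before the call but absent afterwards.
    deleted_branches = branches_before - await repo.list_branches_async()
    deleted_tags = tags_before - await repo.list_tags_async()
    return expired_ids, deleted_branches, deleted_tags
```

From synchronous code this could be driven with `asyncio.run(expire_and_prune_refs(repo, cutoff))`.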

feature_flags #

feature_flags()

Get all feature flags and their current state.

Returns:

Type	Description
`list[FeatureFlag]`	All feature flags with their id, name, default, setting, and effective state.

Source code in icechunk-python/python/icechunk/repository.py

def feature_flags(self) -> list[FeatureFlag]:
    """
    Get all feature flags and their current state.

    Returns
    -------
    list[FeatureFlag]
        All feature flags with their id, name, default, setting, and effective state.
    """
    return self._repository.feature_flags()

feature_flags_async `async` #

feature_flags_async()

Get all feature flags and their current state (async version).

Returns:

Type	Description
`list[FeatureFlag]`	All feature flags with their id, name, default, setting, and effective state.

Source code in icechunk-python/python/icechunk/repository.py

async def feature_flags_async(self) -> list[FeatureFlag]:
    """
    Get all feature flags and their current state (async version).

    Returns
    -------
    list[FeatureFlag]
        All feature flags with their id, name, default, setting, and effective state.
    """
    return await self._repository.feature_flags_async()

fetch_config `staticmethod` #

fetch_config(storage)

Fetch the configuration for the repository saved in storage.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required

Returns:

Type	Description
`RepositoryConfig \| None`	The repository configuration if it exists, None otherwise.

Source code in icechunk-python/python/icechunk/repository.py

@staticmethod
def fetch_config(storage: Storage) -> RepositoryConfig | None:
    """
    Fetch the configuration for the repository saved in storage.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    RepositoryConfig | None
        The repository configuration if it exists, None otherwise.
    """
    return PyRepository.fetch_config(storage)

fetch_config_async `async` `staticmethod` #

fetch_config_async(storage)

Fetch the configuration for the repository saved in storage (async version).

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required

Returns:

Type	Description
`RepositoryConfig \| None`	The repository configuration if it exists, None otherwise.

Source code in icechunk-python/python/icechunk/repository.py

@staticmethod
async def fetch_config_async(storage: Storage) -> RepositoryConfig | None:
    """
    Fetch the configuration for the repository saved in storage (async version).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    RepositoryConfig | None
        The repository configuration if it exists, None otherwise.
    """
    return await PyRepository.fetch_config_async(storage)

fetch_spec_version `staticmethod` #

fetch_spec_version(storage, storage_settings=None)

Fetch the spec version of a repository without fully opening it.

This is useful for checking the repository format version before opening, for example to know what version of the library is needed to open it.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required
`storage_settings`	`StorageSettings \| None`	Optional storage settings to use for the initial storage call.	`None`

Returns:

Type	Description
`SpecVersion \| None`	The spec version of the repository if it exists, None if no repository exists at the given location.

Source code in icechunk-python/python/icechunk/repository.py

@staticmethod
def fetch_spec_version(
    storage: Storage,
    storage_settings: StorageSettings | None = None,
) -> SpecVersion | None:
    """
    Fetch the spec version of a repository without fully opening it.

    This is useful for checking the repository format version before opening,
    for example to know what version of the library is needed to open it.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    storage_settings : StorageSettings | None
        Optional storage settings to use for the initial storage call.

    Returns
    -------
    SpecVersion | None
        The spec version of the repository if it exists, None if no repository
        exists at the given location.
    """
    return PyRepository.fetch_spec_version(storage, storage_settings)
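
A small sketch of checking a location before opening it. The helper `describe_location` is hypothetical; `repository_cls` is expected to be `icechunk.Repository` and is passed in so the snippet does not depend on how the package is imported. How a `SpecVersion` renders as text is not specified here, so the message format is only illustrative.

```python
def describe_location(repository_cls, storage):
    """Report whether a repository exists at `storage` and its spec version."""
    version = repository_cls.fetch_spec_version(storage)
    if version is None:
        # Documented behaviour: None means no repository at this location.
        return "no repository at this location"
    return f"repository found, spec version {version}"
```

A caller could use the result to decide between `Repository.create` and `Repository.open`, or to report that a newer library version is needed.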

fetch_spec_version_async `async` `staticmethod` #

fetch_spec_version_async(storage, storage_settings=None)

Fetch the spec version of a repository without fully opening it (async version).

This is useful for checking the repository format version before opening, for example to know what version of the library is needed to open it.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required
`storage_settings`	`StorageSettings \| None`	Optional storage settings to use for the initial storage call.	`None`

Returns:

Type	Description
`SpecVersion \| None`	The spec version of the repository if it exists, None if no repository exists at the given location.

Source code in icechunk-python/python/icechunk/repository.py

@staticmethod
async def fetch_spec_version_async(
    storage: Storage,
    storage_settings: StorageSettings | None = None,
) -> SpecVersion | None:
    """
    Fetch the spec version of a repository without fully opening it (async version).

    This is useful for checking the repository format version before opening,
    for example to know what version of the library is needed to open it.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    storage_settings : StorageSettings | None
        Optional storage settings to use for the initial storage call.

    Returns
    -------
    SpecVersion | None
        The spec version of the repository if it exists, None if no repository
        exists at the given location.
    """
    return await PyRepository.fetch_spec_version_async(storage, storage_settings)

garbage_collect #

garbage_collect(
    delete_object_older_than,
    *,
    dry_run=False,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Delete any objects no longer accessible from any branches or tags.

Danger

This is an administrative operation, it should be run carefully. The repository can still operate concurrently while garbage_collect runs, but other readers can get inconsistent views if they are trying to access the expired snapshots.

Parameters:

Name	Type	Description	Default
`delete_object_older_than`	`datetime`	Delete objects older than this time. The bound is exclusive. An object is deleted only if it is also not referenced by any surviving (non-expired) snapshot.	required
`dry_run`	`bool`	Report results but don't delete any objects	`False`
`max_snapshots_in_memory`	`int`	Don't prefetch more than this many Snapshots to memory.	`50`
`max_compressed_manifest_mem_bytes`	`int`	Don't use more than this memory to store compressed in-flight manifests.	`512 * 1024 * 1024`
`max_concurrent_manifest_fetches`	`int`	Don't run more than this many concurrent manifest fetches.	`500`

Returns:

Type	Description
`GCSummary`	Summary of objects deleted.

Source code in icechunk-python/python/icechunk/repository.py

def garbage_collect(
    self,
    delete_object_older_than: datetime.datetime,
    *,
    dry_run: bool = False,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> GCSummary:
    """Delete any objects no longer accessible from any branches or tags.

    Danger
    ------
    This is an administrative operation, it should be run
    carefully. The repository can still operate concurrently while
    `garbage_collect` runs, but other readers can get inconsistent
    views if they are trying to access the expired snapshots.

    Parameters
    ----------
    delete_object_older_than: datetime.datetime
        Delete objects older than this time. The bound is exclusive. An
        object is deleted only if it is also not referenced by any surviving
        (non-expired) snapshot.
    dry_run : bool
        Report results but don't delete any objects
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.

    Returns
    -------
    GCSummary
        Summary of objects deleted.
    """

    return self._repository.garbage_collect(
        delete_object_older_than,
        dry_run=dry_run,
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
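
Because garbage collection deletes data, one cautious pattern is to run it with `dry_run=True` first and delete only after the summary has been reviewed. The sketch below assumes `repo` is an open `Repository`; the helper `collect_garbage` and its `confirm` callback are hypothetical.

```python
def collect_garbage(repo, cutoff, *, confirm):
    """Preview a garbage collection, then run it only if `confirm` agrees.

    `confirm` is a caller-supplied function that receives the dry-run
    summary and returns True to proceed with the real deletion.
    """
    preview = repo.garbage_collect(cutoff, dry_run=True)
    if not confirm(preview):
        # Nothing was deleted; hand back the preview for inspection.
        return preview, False
    return repo.garbage_collect(cutoff), True
```

The second element of the returned tuple tells the caller whether objects were actually deleted.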

garbage_collect_async `async` #

garbage_collect_async(
    delete_object_older_than,
    *,
    dry_run=False,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Delete any objects no longer accessible from any branches or tags (async version).

Danger

This is an administrative operation, it should be run carefully. The repository can still operate concurrently while garbage_collect runs, but other readers can get inconsistent views if they are trying to access the expired snapshots.

Parameters:

Name	Type	Description	Default
`delete_object_older_than`	`datetime`	Delete objects older than this time. The bound is exclusive. An object is deleted only if it is also not referenced by any surviving (non-expired) snapshot.	required
`dry_run`	`bool`	Report results but don't delete any objects	`False`
`max_snapshots_in_memory`	`int`	Don't prefetch more than this many Snapshots to memory.	`50`
`max_compressed_manifest_mem_bytes`	`int`	Don't use more than this memory to store compressed in-flight manifests.	`512 * 1024 * 1024`
`max_concurrent_manifest_fetches`	`int`	Don't run more than this many concurrent manifest fetches.	`500`

Returns:

Type	Description
`GCSummary`	Summary of objects deleted.

Source code in icechunk-python/python/icechunk/repository.py

async def garbage_collect_async(
    self,
    delete_object_older_than: datetime.datetime,
    *,
    dry_run: bool = False,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> GCSummary:
    """Delete any objects no longer accessible from any branches or tags (async version).

    Danger
    ------
    This is an administrative operation, it should be run
    carefully. The repository can still operate concurrently while
    `garbage_collect` runs, but other readers can get inconsistent
    views if they are trying to access the expired snapshots.

    Parameters
    ----------
    delete_object_older_than: datetime.datetime
        Delete objects older than this time. The bound is exclusive. An
        object is deleted only if it is also not referenced by any surviving
        (non-expired) snapshot.
    dry_run : bool
        Report results but don't delete any objects
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.

    Returns
    -------
    GCSummary
        Summary of objects deleted.
    """

    return await self._repository.garbage_collect_async(
        delete_object_older_than,
        dry_run=dry_run,
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
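
Expiration and garbage collection are often run back to back. The following sketch chains the two async calls, assuming `repo` is an open `Repository`. The helper `expire_then_collect` is hypothetical, and reusing one cutoff for both steps is a simplification rather than a recommendation.

```python
async def expire_then_collect(repo, cutoff, *, dry_run=True):
    """Expire old snapshots, then garbage collect what became unreachable."""
    expired_ids = await repo.expire_snapshots_async(cutoff)
    # dry_run defaults to True here so the sketch reports without deleting.
    summary = await repo.garbage_collect_async(cutoff, dry_run=dry_run)
    return expired_ids, summary
```

From synchronous code this could be driven with `asyncio.run(expire_then_collect(repo, cutoff))`.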

get_metadata #

get_metadata()

Get the current configured repository metadata.

Returns:

Type	Description
`dict[str, Any]`	The repository level metadata.

Source code in icechunk-python/python/icechunk/repository.py

def get_metadata(self) -> dict[str, Any]:
    """
    Get the current configured repository metadata.

    Returns
    -------
    dict[str, Any]
        The repository level metadata.
    """
    return self._repository.get_metadata()

get_metadata_async `async` #

get_metadata_async()

Get the current configured repository metadata (async version).

Returns:

Type	Description
`dict[str, Any]`	The repository level metadata.

Source code in icechunk-python/python/icechunk/repository.py

async def get_metadata_async(self) -> dict[str, Any]:
    """
    Get the current configured repository metadata (async version).

    Returns
    -------
    dict[str, Any]
        The repository level metadata.
    """
    return await self._repository.get_metadata_async()

get_status #

get_status()

Get the current repository status.

Returns:

Type	Description
`RepoStatus`	The current status of the repository.

Source code in icechunk-python/python/icechunk/repository.py

def get_status(self) -> RepoStatus:
    """
    Get the current repository status.

    Returns
    -------
    RepoStatus
        The current status of the repository.
    """
    return self._repository.get_status()

get_status_async `async` #

get_status_async()

Get the current repository status (async version).

Returns:

Type	Description
`RepoStatus`	The current status of the repository.

Source code in icechunk-python/python/icechunk/repository.py

async def get_status_async(self) -> RepoStatus:
    """
    Get the current repository status (async version).

    Returns
    -------
    RepoStatus
        The current status of the repository.
    """
    return await self._repository.get_status_async()

inspect_manifest #

inspect_manifest(manifest_id)

Return chunk storage statistics for a manifest.

Shows per-array chunk counts broken down by storage type (inline, native, virtual) and compression details.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name	Type	Description	Default
`manifest_id`	`str`	The manifest to inspect. Manifest IDs can be found in the `manifest_refs` of array nodes returned by `inspect_snapshot`.	required

Returns:

Type	Description
`dict[str, Any]`	Keys: `id`, `size_bytes`, `num_arrays`, `total_chunk_refs`, `total_inline`, `total_native`, `total_virtual`, `arrays`, `compression`.

Source code in icechunk-python/python/icechunk/repository.py

def inspect_manifest(self, manifest_id: str) -> dict[str, Any]:
    """
    Return chunk storage statistics for a manifest.

    Shows per-array chunk counts broken down by storage type
    (inline, native, virtual) and compression details.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    manifest_id : str
        The manifest to inspect. Manifest IDs can be found in the
        ``manifest_refs`` of array nodes returned by
        :meth:`inspect_snapshot`.

    Returns
    -------
    dict[str, Any]
        Keys: ``id``, ``size_bytes``, ``num_arrays``,
        ``total_chunk_refs``, ``total_inline``, ``total_native``,
        ``total_virtual``, ``arrays``, ``compression``.
    """
    result: dict[str, Any] = json.loads(
        self._repository.inspect_manifest(manifest_id, pretty=False)
    )
    return result
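
The totals in the returned dictionary can be turned into a per-storage-type breakdown. This sketch assumes the `total_*` values are integer counts, which the documentation above does not state explicitly; the helper `storage_breakdown` and the sample values are made up for illustration.

```python
def storage_breakdown(manifest_info):
    """Return the fraction of chunk refs stored inline, natively, and virtually."""
    total = manifest_info["total_chunk_refs"]
    counts = {
        kind: manifest_info[f"total_{kind}"]
        for kind in ("inline", "native", "virtual")
    }
    if total == 0:
        # Avoid dividing by zero for an empty manifest.
        return {kind: 0.0 for kind in counts}
    return {kind: count / total for kind, count in counts.items()}


# Hypothetical result shaped like the documented keys.
sample = {
    "total_chunk_refs": 8,
    "total_inline": 2,
    "total_native": 4,
    "total_virtual": 2,
}
print(storage_breakdown(sample))
# {'inline': 0.25, 'native': 0.5, 'virtual': 0.25}
```

Since this is a debugging utility whose structure may change, code like this is better suited to notebooks and tests than to production pipelines.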

inspect_manifest_async `async` #

inspect_manifest_async(manifest_id)

Return chunk storage statistics for a manifest (async version).

Shows per-array chunk counts broken down by storage type (inline, native, virtual) and compression details.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name	Type	Description	Default
`manifest_id`	`str`	The manifest to inspect. Manifest IDs can be found in the `manifest_refs` of array nodes returned by `inspect_snapshot_async`.	required

Returns:

Type	Description
`dict[str, Any]`	Keys: `id`, `size_bytes`, `num_arrays`, `total_chunk_refs`, `total_inline`, `total_native`, `total_virtual`, `arrays`, `compression`.

Source code in icechunk-python/python/icechunk/repository.py

async def inspect_manifest_async(self, manifest_id: str) -> dict[str, Any]:
    """
    Return chunk storage statistics for a manifest (async version).

    Shows per-array chunk counts broken down by storage type
    (inline, native, virtual) and compression details.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    manifest_id : str
        The manifest to inspect. Manifest IDs can be found in the
        ``manifest_refs`` of array nodes returned by
        :meth:`inspect_snapshot_async`.

    Returns
    -------
    dict[str, Any]
        Keys: ``id``, ``size_bytes``, ``num_arrays``,
        ``total_chunk_refs``, ``total_inline``, ``total_native``,
        ``total_virtual``, ``arrays``, ``compression``.
    """
    result: dict[str, Any] = json.loads(
        await self._repository.inspect_manifest_async(manifest_id, pretty=False)
    )
    return result

inspect_repo_info #

inspect_repo_info()

Return the top-level repository metadata.

Includes the branch-to-snapshot mapping, tags, snapshot ancestry, and the recent update log.

This is a testing/debugging utility. The return type and structure may change in future versions.

Returns:

Type	Description
`dict[str, Any]`	Keys: `spec_version`, `branches`, `tags`, `deleted_tags`, `snapshots`, `metadata`, `latest_updates`.

Source code in icechunk-python/python/icechunk/repository.py

def inspect_repo_info(self) -> dict[str, Any]:
    """
    Return the top-level repository metadata.

    Includes the branch-to-snapshot mapping, tags, snapshot ancestry,
    and the recent update log.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Returns
    -------
    dict[str, Any]
        Keys: ``spec_version``, ``branches``, ``tags``, ``deleted_tags``,
        ``snapshots``, ``metadata``, ``latest_updates``.
    """
    result: dict[str, Any] = json.loads(
        self._repository.inspect_repo_info(pretty=False)
    )
    return result
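
Because the structure of the result may change between versions, a test can compare the keys it receives against the documented ones before relying on them. The helper `diff_keys` and the constant `REPO_INFO_KEYS` are hypothetical names; the key list is copied from the Returns section above.

```python
REPO_INFO_KEYS = frozenset(
    {
        "spec_version",
        "branches",
        "tags",
        "deleted_tags",
        "snapshots",
        "metadata",
        "latest_updates",
    }
)


def diff_keys(result, expected=REPO_INFO_KEYS):
    """Return (missing, unexpected) key names, each sorted alphabetically."""
    present = set(result)
    return sorted(expected - present), sorted(present - expected)
```

Calling `diff_keys(repo.inspect_repo_info())` on a matching version would be expected to return two empty lists.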

inspect_repo_info_async `async` #

inspect_repo_info_async()

Return the top-level repository metadata (async version).

Includes the branch-to-snapshot mapping, tags, snapshot ancestry, and the recent update log.

This is a testing/debugging utility. The return type and structure may change in future versions.

Returns:

Type	Description
`dict[str, Any]`	Keys: `spec_version`, `branches`, `tags`, `deleted_tags`, `snapshots`, `metadata`, `latest_updates`.

Source code in icechunk-python/python/icechunk/repository.py

async def inspect_repo_info_async(self) -> dict[str, Any]:
    """
    Return the top-level repository metadata (async version).

    Includes the branch-to-snapshot mapping, tags, snapshot ancestry,
    and the recent update log.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Returns
    -------
    dict[str, Any]
        Keys: ``spec_version``, ``branches``, ``tags``, ``deleted_tags``,
        ``snapshots``, ``metadata``, ``latest_updates``.
    """
    result: dict[str, Any] = json.loads(
        await self._repository.inspect_repo_info_async(pretty=False)
    )
    return result

inspect_snapshot #

inspect_snapshot(snapshot_id)

Return the node tree stored in a snapshot.

The result contains every node's path, node ID, type (array or group), and manifest references. Useful for verifying node identity across commits or inspecting what a snapshot contains.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name	Type	Description	Default
`snapshot_id`	`str`	The snapshot to inspect.	required

Returns:

Type	Description
`dict[str, Any]`	Keys: `id`, `flushed_at`, `commit_message`, `metadata`, `manifests`, `nodes`.

Source code in icechunk-python/python/icechunk/repository.py

def inspect_snapshot(self, snapshot_id: str) -> dict[str, Any]:
    """
    Return the node tree stored in a snapshot.

    The result contains every node's path, node ID, type (array or group),
    and manifest references. Useful for verifying node identity across
    commits or inspecting what a snapshot contains.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    snapshot_id : str
        The snapshot to inspect.

    Returns
    -------
    dict[str, Any]
        Keys: ``id``, ``flushed_at``, ``commit_message``, ``metadata``,
        ``manifests``, ``nodes``.
    """
    result: dict[str, Any] = json.loads(
        self._repository.inspect_snapshot(snapshot_id, pretty=False)
    )
    return result

inspect_snapshot_async `async` #

inspect_snapshot_async(snapshot_id)

Return the node tree stored in a snapshot (async version).

The result contains every node's path, node ID, type (array or group), and manifest references. Useful for verifying node identity across commits or inspecting what a snapshot contains.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name	Type	Description	Default
`snapshot_id`	`str`	The snapshot to inspect.	required

Returns:

Type	Description
`dict[str, Any]`	Keys: `id`, `flushed_at`, `commit_message`, `metadata`, `manifests`, `nodes`.

Source code in icechunk-python/python/icechunk/repository.py

async def inspect_snapshot_async(self, snapshot_id: str) -> dict[str, Any]:
    """
    Return the node tree stored in a snapshot (async version).

    The result contains every node's path, node ID, type (array or group),
    and manifest references. Useful for verifying node identity across
    commits or inspecting what a snapshot contains.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    snapshot_id : str
        The snapshot to inspect.

    Returns
    -------
    dict[str, Any]
        Keys: ``id``, ``flushed_at``, ``commit_message``, ``metadata``,
        ``manifests``, ``nodes``.
    """
    result: dict[str, Any] = json.loads(
        await self._repository.inspect_snapshot_async(snapshot_id, pretty=False)
    )
    return result

inspect_transaction_log #

inspect_transaction_log(snapshot_id)

Return the record of what changed in a single commit.

Lists the node IDs of every created, deleted, and updated node, the chunk coordinates that were written, and any move operations.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name	Type	Description	Default
`snapshot_id`	`str`	The snapshot whose transaction log to inspect.	required

Returns:

Type	Description
`dict[str, Any]`	Keys: `new_groups`, `new_arrays`, `deleted_groups`, `deleted_arrays`, `updated_groups`, `updated_arrays`, `updated_chunks`, `moved_nodes`. When the snapshot's ancestry was collapsed by expiration, an additional `synthetic_composite` key is present. It indicates that the transaction log is not a single on-disk file but a synthetic merge. Its keys are `note` (a text field with an explanation), `merged_pruned_ancestor_tx_logs` (the pruned-ancestor transaction logs merged into this one, oldest first), and `missing_tx_logs` (referenced pruned-ancestor logs absent from storage, expected only when an older GC deleted them).

Source code in icechunk-python/python/icechunk/repository.py

def inspect_transaction_log(self, snapshot_id: str) -> dict[str, Any]:
    """
    Return the record of what changed in a single commit.

    Lists the node IDs of every created, deleted, and updated node,
    the chunk coordinates that were written, and any move operations.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    snapshot_id : str
        The snapshot whose transaction log to inspect.

    Returns
    -------
    dict[str, Any]
        Keys: ``new_groups``, ``new_arrays``, ``deleted_groups``,
        ``deleted_arrays``, ``updated_groups``, ``updated_arrays``,
        ``updated_chunks``, ``moved_nodes``.

        When the snapshot's ancestry was collapsed by expiration, an
        additional ``synthetic_composite`` key is present. It shows
        the transaction log is not a single on-disk file but a
        synthetic merge. Its keys: ``note`` (a text field with an explanation),
        ``merged_pruned_ancestor_tx_logs`` (the pruned-ancestor
        transaction logs merged into this one, oldest first), and
        ``missing_tx_logs`` (referenced pruned-ancestor logs absent from
        storage, expected only when an older GC deleted them).
    """
    result: dict[str, Any] = json.loads(
        self._repository.inspect_transaction_log(snapshot_id, pretty=False)
    )
    return result
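
A sketch of how a caller might detect a synthetic transaction log and list the sections that recorded changes. It assumes that each change section is a container that is empty when nothing changed, and that `missing_tx_logs` is a list; neither is stated above. The helper `summarize_tx_log` and the constant `CHANGE_KEYS` are hypothetical names.

```python
CHANGE_KEYS = (
    "new_groups",
    "new_arrays",
    "deleted_groups",
    "deleted_arrays",
    "updated_groups",
    "updated_arrays",
    "updated_chunks",
    "moved_nodes",
)


def summarize_tx_log(log):
    """Summarize a dict shaped like the result of inspect_transaction_log."""
    composite = log.get("synthetic_composite")
    return {
        # True when the log was merged from pruned-ancestor logs.
        "is_synthetic": composite is not None,
        "missing_tx_logs": list(composite.get("missing_tx_logs", [])) if composite else [],
        # Sections with at least one entry.
        "changed_sections": [key for key in CHANGE_KEYS if log.get(key)],
    }
```

A non-empty `missing_tx_logs` entry would suggest that the merged view is incomplete.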

inspect_transaction_log_async `async` #

inspect_transaction_log_async(snapshot_id)

Return the record of what changed in a single commit (async version).

Lists the node IDs of every created, deleted, and updated node, the chunk coordinates that were written, and any move operations.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name	Type	Description	Default
`snapshot_id`	`str`	The snapshot whose transaction log to inspect.	required

Returns:

Type	Description
`dict[str, Any]`	Keys: `new_groups`, `new_arrays`, `deleted_groups`, `deleted_arrays`, `updated_groups`, `updated_arrays`, `updated_chunks`, `moved_nodes`. When the snapshot's ancestry was collapsed by expiration, an additional `synthetic_composite` key is present. It indicates that the transaction log is not a single on-disk file but a synthetic merge. Its keys are `note` (a text field with an explanation), `merged_pruned_ancestor_tx_logs` (the pruned-ancestor transaction logs merged into this one, oldest first), and `missing_tx_logs` (referenced pruned-ancestor logs absent from storage, expected only when an older GC deleted them).

Source code in icechunk-python/python/icechunk/repository.py

async def inspect_transaction_log_async(self, snapshot_id: str) -> dict[str, Any]:
    """
    Return the record of what changed in a single commit (async version).

    Lists the node IDs of every created, deleted, and updated node,
    the chunk coordinates that were written, and any move operations.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    snapshot_id : str
        The snapshot whose transaction log to inspect.

    Returns
    -------
    dict[str, Any]
        Keys: ``new_groups``, ``new_arrays``, ``deleted_groups``,
        ``deleted_arrays``, ``updated_groups``, ``updated_arrays``,
        ``updated_chunks``, ``moved_nodes``.

        When the snapshot's ancestry was collapsed by expiration, an
        additional ``synthetic_composite`` key is present. It shows
        the transaction log is not a single on-disk file but a
        synthetic merge. Its keys: ``note`` (a text field with an explanation),
        ``merged_pruned_ancestor_tx_logs`` (the pruned-ancestor
        transaction logs merged into this one, oldest first), and
        ``missing_tx_logs`` (referenced pruned-ancestor logs absent from
        storage, expected only when an older GC deleted them).
    """
    raw = await self._repository.inspect_transaction_log_async(
        snapshot_id, pretty=False
    )
    result: dict[str, Any] = json.loads(raw)
    return result

list_branches #

list_branches()

List the branches in the repository.

Returns:

Type	Description
`set[str]`	A set of branch names.

Source code in icechunk-python/python/icechunk/repository.py

def list_branches(self) -> set[str]:
    """
    List the branches in the repository.

    Returns
    -------
    set[str]
        A set of branch names.
    """
    return self._repository.list_branches()

list_branches_async `async` #

list_branches_async()

List the branches in the repository (async version).

Returns:

Type	Description
`set[str]`	A set of branch names.

Source code in icechunk-python/python/icechunk/repository.py

async def list_branches_async(self) -> set[str]:
    """
    List the branches in the repository (async version).

    Returns
    -------
    set[str]
        A set of branch names.
    """
    return await self._repository.list_branches_async()

list_manifest_files #

list_manifest_files(snapshot_id)

Get the manifest files used by the given snapshot ID.

Parameters:

Name	Type	Description	Default
`snapshot_id`	`str`	The id of the snapshot to get information for	required

Returns:

Type	Description
`list[ManifestFileInfo]`
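
As a rough illustration, the list returned by `list_manifest_files` can be aggregated to see how much manifest data a snapshot references. This sketch assumes each `ManifestFileInfo` exposes `size_bytes` and `num_chunk_refs` attributes, which are not shown in this section. The helper name `manifest_totals` is hypothetical.

```python
def manifest_totals(repo, snapshot_id: str) -> dict[str, int]:
    """Aggregate manifest statistics for one snapshot.

    `repo` is expected to be an icechunk Repository. The attribute names
    `size_bytes` and `num_chunk_refs` are assumptions about ManifestFileInfo.
    """
    files = repo.list_manifest_files(snapshot_id)
    return {
        "manifests": len(files),
        "total_bytes": sum(f.size_bytes for f in files),
        "total_chunk_refs": sum(f.num_chunk_refs for f in files),
    }
```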

Source code in icechunk-python/python/icechunk/repository.py

def list_manifest_files(self, snapshot_id: str) -> list[ManifestFileInfo]:
    """
    Get the manifest files used by the given snapshot ID.

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to get information for

    Returns
    -------
    list[ManifestFileInfo]
    """
    return self._repository.list_manifest_files(snapshot_id)

list_manifest_files_async `async` #

list_manifest_files_async(snapshot_id)

Get the manifest files used by the given snapshot ID (async version).

Parameters:

Name	Type	Description	Default
`snapshot_id`	`str`	The id of the snapshot to get information for	required

Returns:

Type	Description
`list[ManifestFileInfo]`

Source code in icechunk-python/python/icechunk/repository.py

async def list_manifest_files_async(self, snapshot_id: str) -> list[ManifestFileInfo]:
    """
    Get the manifest files used by the given snapshot ID (async version).

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to get information for

    Returns
    -------
    list[ManifestFileInfo]
    """
    return await self._repository.list_manifest_files_async(snapshot_id)

list_tags #

list_tags()

List the tags in the repository.

Returns:

Type	Description
`set[str]`	A set of tag names.

Source code in icechunk-python/python/icechunk/repository.py

def list_tags(self) -> set[str]:
    """
    List the tags in the repository.

    Returns
    -------
    set[str]
        A set of tag names.
    """
    return self._repository.list_tags()

list_tags_async `async` #

list_tags_async()

List the tags in the repository (async version).

Returns:

Type	Description
`set[str]`	A set of tag names.

Source code in icechunk-python/python/icechunk/repository.py

async def list_tags_async(self) -> set[str]:
    """
    List the tags in the repository (async version).

    Returns
    -------
    set[str]
        A set of tag names.
    """
    return await self._repository.list_tags_async()

lookup_branch #

lookup_branch(branch)

Get the tip snapshot ID of a branch.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to get the tip of.	required

Returns:

Type	Description
`str`	The snapshot ID of the tip of the branch.
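
`list_branches` and `lookup_branch` combine naturally into a map from each branch to its tip snapshot. A minimal sketch, using only the two methods documented here. The helper name `branch_tips` is hypothetical.

```python
def branch_tips(repo) -> dict[str, str]:
    """Map every branch name to the snapshot ID at its tip.

    `repo` is expected to be an icechunk Repository.
    """
    # Sort the names so the resulting dict has a stable iteration order.
    return {name: repo.lookup_branch(name) for name in sorted(repo.list_branches())}
```

Each lookup is a separate call, so the result is not an atomic view: a branch written between two lookups may already point at a newer snapshot.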

Source code in icechunk-python/python/icechunk/repository.py

def lookup_branch(self, branch: str) -> str:
    """
    Get the tip snapshot ID of a branch.

    Parameters
    ----------
    branch : str
        The branch to get the tip of.

    Returns
    -------
    str
        The snapshot ID of the tip of the branch.
    """
    return self._repository.lookup_branch(branch)

lookup_branch_async `async` #

lookup_branch_async(branch)

Get the tip snapshot ID of a branch (async version).

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to get the tip of.	required

Returns:

Type	Description
`str`	The snapshot ID of the tip of the branch.
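
In async code the per-branch lookups can be issued concurrently. The sketch below uses `asyncio.gather` with the two async methods documented here. The helper name `branch_tips_async` is hypothetical, and it assumes that concurrent lookups on one repository object are acceptable.

```python
import asyncio


async def branch_tips_async(repo) -> dict[str, str]:
    """Map every branch name to its tip snapshot ID, looking tips up concurrently.

    `repo` is expected to be an icechunk Repository.
    """
    names = sorted(await repo.list_branches_async())
    # gather preserves argument order, so tips line up with names.
    tips = await asyncio.gather(*(repo.lookup_branch_async(n) for n in names))
    return dict(zip(names, tips))
```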

Source code in icechunk-python/python/icechunk/repository.py

async def lookup_branch_async(self, branch: str) -> str:
    """
    Get the tip snapshot ID of a branch (async version).

    Parameters
    ----------
    branch : str
        The branch to get the tip of.

    Returns
    -------
    str
        The snapshot ID of the tip of the branch.
    """
    return await self._repository.lookup_branch_async(branch)

lookup_snapshot #

lookup_snapshot(snapshot_id)

Get the SnapshotInfo for a given snapshot ID.

Parameters:

Name	Type	Description	Default
`snapshot_id`	`str`	The id of the snapshot to look up	required

Returns:

Type	Description
`SnapshotInfo`
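
A returned `SnapshotInfo` can be turned into a one-line description for logs. This sketch assumes `SnapshotInfo` exposes `id`, `written_at` (a `datetime`), and `message` attributes, none of which are shown in this section. The helper name `describe_snapshot` is hypothetical.

```python
def describe_snapshot(repo, snapshot_id: str) -> str:
    """Format a snapshot as '<id> <ISO timestamp> <message>'.

    `repo` is expected to be an icechunk Repository. The attribute names
    used on the result are assumptions about SnapshotInfo.
    """
    info = repo.lookup_snapshot(snapshot_id)
    return f"{info.id} {info.written_at.isoformat()} {info.message}"
```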

Source code in icechunk-python/python/icechunk/repository.py

def lookup_snapshot(self, snapshot_id: str) -> SnapshotInfo:
    """
    Get the SnapshotInfo for a given snapshot ID.

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to look up

    Returns
    -------
    SnapshotInfo
    """
    return self._repository.lookup_snapshot(snapshot_id)

lookup_snapshot_async `async` #

lookup_snapshot_async(snapshot_id)

Get the SnapshotInfo for a given snapshot ID (async version).

Parameters:

Name	Type	Description	Default
`snapshot_id`	`str`	The id of the snapshot to look up	required

Returns:

Type	Description
`SnapshotInfo`

Source code in icechunk-python/python/icechunk/repository.py

async def lookup_snapshot_async(self, snapshot_id: str) -> SnapshotInfo:
    """
    Get the SnapshotInfo for a given snapshot ID (async version).

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to look up

    Returns
    -------
    SnapshotInfo
    """
    return await self._repository.lookup_snapshot_async(snapshot_id)

lookup_tag #

lookup_tag(tag)

Get the snapshot ID of a tag.

Parameters:

Name	Type	Description	Default
`tag`	`str`	The tag to get the snapshot ID of.	required

Returns:

Type	Description
`str`	The snapshot ID of the tag.
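
When a caller holds a name that may be either a branch or a tag, the listing and lookup methods can be combined to resolve it to a snapshot ID. A minimal sketch follows. The helper name `resolve_ref` is hypothetical, and preferring branches over tags when both exist is a choice of this sketch, not documented library behavior.

```python
def resolve_ref(repo, name: str) -> str:
    """Resolve a branch or tag name to a snapshot ID.

    `repo` is expected to be an icechunk Repository. Branches are checked
    first; that ordering is an assumption made for this example.
    """
    if name in repo.list_branches():
        return repo.lookup_branch(name)
    if name in repo.list_tags():
        return repo.lookup_tag(name)
    raise KeyError(f"no branch or tag named {name!r}")
```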

Source code in icechunk-python/python/icechunk/repository.py

def lookup_tag(self, tag: str) -> str:
    """
    Get the snapshot ID of a tag.

    Parameters
    ----------
    tag : str
        The tag to get the snapshot ID of.

    Returns
    -------
    str
        The snapshot ID of the tag.
    """
    return self._repository.lookup_tag(tag)

lookup_tag_async `async` #

lookup_tag_async(tag)

Get the snapshot ID of a tag (async version).

Parameters:

Name	Type	Description	Default
`tag`	`str`	The tag to get the snapshot ID of.	required

Returns:

Type	Description
`str`	The snapshot ID of the tag.

Source code in icechunk-python/python/icechunk/repository.py

async def lookup_tag_async(self, tag: str) -> str:
    """
    Get the snapshot ID of a tag (async version).

    Parameters
    ----------
    tag : str
        The tag to get the snapshot ID of.

    Returns
    -------
    str
        The snapshot ID of the tag.
    """
    return await self._repository.lookup_tag_async(tag)

open `classmethod` #

open(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
)

Open an existing Icechunk repository.

If no repository exists at the given storage location, an error will be raised.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required
`config`	`RepositoryConfig`	The repository settings. If not provided, a default configuration will be loaded from the repository.	`None`
`authorize_virtual_chunk_access`	`dict[str, AnyCredential \| None]`	Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. Each value should be an explicit credential or no-auth sentinel: e.g. `S3Credentials.FromEnv()` / `s3_anonymous_credentials()` for S3, or the `icechunk.credentials.LocalFileSystemAccess` / `icechunk.credentials.HttpAccess` sentinels for `file://` and `http(s)://` containers. Passing `None` is deprecated and will be unsupported in a future release: it silently reads credentials from the environment (or uses anonymous access), which can expose private credentials. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.	`None`

Returns:

Type	Description
`Self`	An instance of the Repository class.
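
Because passing `None` as a credential is deprecated, callers may want to reject such entries before opening a repository. The following is a sketch of that check, with a hypothetical helper name `open_with_explicit_credentials`. The repository class is passed in as an argument (in practice `icechunk.Repository`) so the helper stays free of imports.

```python
def open_with_explicit_credentials(repository_cls, storage, virtual_credentials):
    """Open a repository, refusing deprecated `None` credential entries.

    `repository_cls` is expected to be icechunk.Repository, and
    `virtual_credentials` maps container url_prefix to an explicit
    credential or no-auth sentinel.
    """
    missing = sorted(
        prefix for prefix, cred in virtual_credentials.items() if cred is None
    )
    if missing:
        raise ValueError(f"explicit credentials required for: {missing}")
    return repository_cls.open(
        storage,
        authorize_virtual_chunk_access=virtual_credentials,
    )
```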

Source code in icechunk-python/python/icechunk/repository.py

@classmethod
def open(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Open an existing Icechunk repository.

    If no repository exists at the given storage location, an error will be raised.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. Each value should be an explicit credential or no-auth
        sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
        for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
        ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
        ``http(s)://`` containers. Passing ``None`` is deprecated and will be
        unsupported in a future release: it silently reads credentials from the
        environment (or uses anonymous access), which can expose private credentials.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
    return cls(
        PyRepository.open(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

open_async `async` `classmethod` #

open_async(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
)

Open an existing Icechunk repository asynchronously.

If no repository exists at the given storage location, an error will be raised.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required
`config`	`RepositoryConfig`	The repository settings. If not provided, a default configuration will be loaded from the repository.	`None`
`authorize_virtual_chunk_access`	`dict[str, AnyCredential \| None]`	Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. Each value should be an explicit credential or no-auth sentinel: e.g. `S3Credentials.FromEnv()` / `s3_anonymous_credentials()` for S3, or the `icechunk.credentials.LocalFileSystemAccess` / `icechunk.credentials.HttpAccess` sentinels for `file://` and `http(s)://` containers. Passing `None` is deprecated and will be unsupported in a future release: it silently reads credentials from the environment (or uses anonymous access), which can expose private credentials. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.	`None`

Returns:

Type	Description
`Self`	An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py

@classmethod
async def open_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Open an existing Icechunk repository asynchronously.

    If no repository exists at the given storage location, an error will be raised.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. Each value should be an explicit credential or no-auth
        sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
        for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
        ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
        ``http(s)://`` containers. Passing ``None`` is deprecated and will be
        unsupported in a future release: it silently reads credentials from the
        environment (or uses anonymous access), which can expose private credentials.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
    return cls(
        await PyRepository.open_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

open_or_create `classmethod` #

open_or_create(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
    create_version=None,
    check_clean_root=True,
)

Open an existing Icechunk repository or create a new one if it does not exist.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required
`config`	`RepositoryConfig`	The repository settings. If not provided, a default configuration will be loaded from the repository.	`None`
`authorize_virtual_chunk_access`	`dict[str, AnyCredential \| None]`	Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. Each value should be an explicit credential or no-auth sentinel: e.g. `S3Credentials.FromEnv()` / `s3_anonymous_credentials()` for S3, or the `icechunk.credentials.LocalFileSystemAccess` / `icechunk.credentials.HttpAccess` sentinels for `file://` and `http(s)://` containers. Passing `None` is deprecated and will be unsupported in a future release: it silently reads credentials from the environment (or uses anonymous access), which can expose private credentials. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.	`None`
`create_version`	`SpecVersion \| int`	Use this version of the spec for the new repository, if it needs to be created. If not passed, the latest spec version available when this library version was released is used.	`None`

Returns:

Type	Description
`Self`	An instance of the Repository class.
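
The warning above suggests a two-phase pattern: one coordinating process creates the repository, and workers only ever open it. A minimal sketch of that split follows. The function names are hypothetical, and the repository class is passed in as an argument (in practice `icechunk.Repository`).

```python
def prepare_repository(repository_cls, storage):
    """Run once, in the coordinating process, before any worker starts.

    This is the only place where the repository may be created.
    """
    return repository_cls.open_or_create(storage)


def worker_repository(repository_cls, storage):
    """Run in each worker: open the existing repository, never create it."""
    return repository_cls.open(storage)
```

Keeping creation out of the worker path means that concurrent workers never race to create a repository at the same location.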

Source code in icechunk-python/python/icechunk/repository.py

@classmethod
def open_or_create(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    create_version: SpecVersion | int | None = None,
    check_clean_root: bool = True,
) -> Self:
    """
    Open an existing Icechunk repository or create a new one if it does not exist.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. Each value should be an explicit credential or no-auth
        sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
        for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
        ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
        ``http(s)://`` containers. Passing ``None`` is deprecated and will be
        unsupported in a future release: it silently reads credentials from the
        environment (or uses anonymous access), which can expose private credentials.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    create_version : SpecVersion | int, optional
        Use this version of the spec for the new repository, if it needs to be created.
        If not passed, the latest spec version available when this library version
        was released is used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
    return cls(
        PyRepository.open_or_create(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            create_version=create_version,
            check_clean_root=check_clean_root,
        )
    )

open_or_create_async `async` `classmethod` #

open_or_create_async(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
    create_version=None,
    check_clean_root=True,
)

Open an existing Icechunk repository or create a new one if it does not exist (async version).

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name	Type	Description	Default
`storage`	`Storage`	The storage configuration for the repository.	required
`config`	`RepositoryConfig`	The repository settings. If not provided, a default configuration will be loaded from the repository.	`None`
`authorize_virtual_chunk_access`	`dict[str, AnyCredential \| None]`	Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. Each value should be an explicit credential or no-auth sentinel: e.g. `S3Credentials.FromEnv()` / `s3_anonymous_credentials()` for S3, or the `icechunk.credentials.LocalFileSystemAccess` / `icechunk.credentials.HttpAccess` sentinels for `file://` and `http(s)://` containers. Passing `None` is deprecated and will be unsupported in a future release: it silently reads credentials from the environment (or uses anonymous access), which can expose private credentials. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.	`None`
`create_version`	`SpecVersion \| int`	Use this version of the spec for the new repository, if it needs to be created. If not passed, the latest spec version available when this library version was released is used.	`None`

Returns:

Type	Description
`Self`	An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py

@classmethod
async def open_or_create_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    create_version: SpecVersion | int | None = None,
    check_clean_root: bool = True,
) -> Self:
    """
    Open an existing Icechunk repository or create a new one if it does not exist (async version).

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. Each value should be an explicit credential or no-auth
        sentinel: e.g. ``S3Credentials.FromEnv()`` / ``s3_anonymous_credentials()``
        for S3, or the ``icechunk.credentials.LocalFileSystemAccess`` /
        ``icechunk.credentials.HttpAccess`` sentinels for ``file://`` and
        ``http(s)://`` containers. Passing ``None`` is deprecated and will be
        unsupported in a future release: it silently reads credentials from the
        environment (or uses anonymous access), which can expose private credentials.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    create_version : SpecVersion | int, optional
        Use this version of the spec for the new repository, if it needs to be created.
        If not passed, the latest spec version available when this library version
        was released is used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
    return cls(
        await PyRepository.open_or_create_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            create_version=create_version,
            check_clean_root=check_clean_root,
        )
    )

ops_log #

ops_log()

Get a summary of changes to the repository.
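
Since `ops_log` returns an iterator, entries can be consumed lazily instead of being loaded all at once. The sketch below takes at most `limit` entries with `itertools.islice`. The helper name `first_updates` is hypothetical, and the sketch makes no assumption about the order in which entries are yielded or about the fields of `Update`.

```python
from itertools import islice


def first_updates(repo, limit: int = 10) -> list:
    """Return at most `limit` entries from the repository's operations log.

    `repo` is expected to be an icechunk Repository.
    """
    # islice stops pulling from the iterator once `limit` items are read.
    return list(islice(repo.ops_log(), limit))
```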

Source code in icechunk-python/python/icechunk/repository.py

def ops_log(self) -> Iterator[Update]:
    """
    Get a summary of changes to the repository.
    """

    # the returned object is both an Async and Sync iterator
    res = cast(
        Iterator[Update],
        self._repository.async_ops_log(),
    )
    return res

ops_log_async #

ops_log_async()

Get a summary of changes to the repository (async version).

Source code in icechunk-python/python/icechunk/repository.py

def ops_log_async(self) -> AsyncCloseableIterator[Update]:
    """
    Get a summary of changes to the repository (async version).
    """

    # the returned object is both an Async and Sync iterator
    return self._repository.async_ops_log()

readonly_session #

readonly_session(
    branch=None, *, tag=None, snapshot_id=None, as_of=None
)

Create a read-only session.

This can be thought of as a read-only checkout of the repository at a given snapshot. When branch or tag are provided, the session will be based on the tip of the branch or the snapshot ID of the tag.

Parameters:

Name	Type	Description	Default
`branch`	`str`	If provided, the branch to create the session on.	`None`
`tag`	`str`	If provided, the tag to create the session on.	`None`
`snapshot_id`	`str`	If provided, the snapshot ID to create the session on.	`None`
`as_of`	`datetime \| None`	When combined with the branch argument, opens the session at the last snapshot written at or before this datetime.	`None`

Returns:

Type	Description
`Session`	The read-only session, pointing to the specified snapshot, tag, or branch.

Notes

Only one of branch, tag, or snapshot_id can be specified. as_of can only be combined with branch.
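
The argument rules can be checked up front so that mistakes surface as clear errors before the library is called. The following is a sketch with a hypothetical wrapper name `open_readonly`. Requiring exactly one of `branch`, `tag`, or `snapshot_id` is a stricter policy chosen for this example, and the `as_of` rule follows the parameter description above.

```python
import datetime
from typing import Optional


def open_readonly(
    repo,
    *,
    branch: Optional[str] = None,
    tag: Optional[str] = None,
    snapshot_id: Optional[str] = None,
    as_of: Optional[datetime.datetime] = None,
):
    """Validate the reference arguments, then create a read-only session.

    `repo` is expected to be an icechunk Repository.
    """
    refs = [r for r in (branch, tag, snapshot_id) if r is not None]
    if len(refs) != 1:
        raise ValueError("specify exactly one of branch, tag, or snapshot_id")
    if as_of is not None and branch is None:
        raise ValueError("as_of can only be combined with branch")
    return repo.readonly_session(
        branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
    )
```

For example, `open_readonly(repo, branch="main", as_of=some_datetime)` would open the branch as it stood at that time, while `open_readonly(repo, tag="v1", as_of=some_datetime)` is rejected by the wrapper.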

Source code in icechunk-python/python/icechunk/repository.py

def readonly_session(
    self,
    branch: str | None = None,
    *,
    tag: str | None = None,
    snapshot_id: str | None = None,
    as_of: datetime.datetime | None = None,
) -> Session:
    """
    Create a read-only session.

    This can be thought of as a read-only checkout of the repository at a given snapshot.
    When branch or tag are provided, the session will be based on the tip of the branch or
    the snapshot ID of the tag.

    Parameters
    ----------
    branch : str, optional
        If provided, the branch to create the session on.
    tag : str, optional
        If provided, the tag to create the session on.
    snapshot_id : str, optional
        If provided, the snapshot ID to create the session on.
    as_of : datetime.datetime, optional
        When combined with the branch argument, opens the session at the last
        snapshot written at or before this datetime.

    Returns
    -------
    Session
        The read-only session, pointing to the specified snapshot, tag, or branch.

    Notes
    -----
    Only one of ``branch``, ``tag``, or ``snapshot_id`` can be specified.
    ``as_of`` can only be combined with ``branch``.
    """
    return Session(
        self._repository.readonly_session(
            branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
        )
    )

readonly_session_async `async` #

readonly_session_async(
    branch=None, *, tag=None, snapshot_id=None, as_of=None
)

Create a read-only session (async version).

This can be thought of as a read-only checkout of the repository at a given snapshot. When branch or tag are provided, the session will be based on the tip of the branch or the snapshot ID of the tag.

Parameters:

Name	Type	Description	Default
`branch`	`str`	If provided, the branch to create the session on.	`None`
`tag`	`str`	If provided, the tag to create the session on.	`None`
`snapshot_id`	`str`	If provided, the snapshot ID to create the session on.	`None`
`as_of`	`datetime \| None`	When combined with the branch argument, opens the session at the last snapshot written at or before this datetime.	`None`

Returns:

Type	Description
`Session`	The read-only session, pointing to the specified snapshot, tag, or branch.

Notes

Only one of branch, tag, or snapshot_id can be specified. as_of can only be combined with branch.

Source code in icechunk-python/python/icechunk/repository.py

async def readonly_session_async(
    self,
    branch: str | None = None,
    *,
    tag: str | None = None,
    snapshot_id: str | None = None,
    as_of: datetime.datetime | None = None,
) -> Session:
    """
    Create a read-only session (async version).

    This can be thought of as a read-only checkout of the repository at a given snapshot.
    When branch or tag are provided, the session will be based on the tip of the branch or
    the snapshot ID of the tag.

    Parameters
    ----------
    branch : str, optional
        If provided, the branch to create the session on.
    tag : str, optional
        If provided, the tag to create the session on.
    snapshot_id : str, optional
        If provided, the snapshot ID to create the session on.
    as_of : datetime.datetime, optional
        When combined with the branch argument, opens the session at the last
        snapshot written at or before this datetime.

    Returns
    -------
    Session
        The read-only session, pointing to the specified snapshot, tag, or branch.

    Notes
    -----
    Only one of ``branch``, ``tag``, or ``snapshot_id`` can be specified.
    ``as_of`` can only be combined with ``branch``.
    """
    return Session(
        await self._repository.readonly_session_async(
            branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
        )
    )

rearrange_session #

rearrange_session(branch)

Create a session to move/rename nodes in the Zarr hierarchy.

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

This session only allows changes made through Session.move. To modify data, and not only move nodes, use Repository.writable_session instead.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to create the session on.	required

Returns:

Type	Description
`Session`	The writable session on the branch.
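
A typical use of a rearrange session is a single rename followed by a commit. The sketch below assumes `Session.move` takes a source path and a destination path, and that `Session.commit` takes a message and returns the new snapshot ID. Neither signature is shown in this section, and the helper name `rename_node` is hypothetical.

```python
def rename_node(repo, branch: str, from_path: str, to_path: str, message: str) -> str:
    """Move one node in the hierarchy and commit the change.

    `repo` is expected to be an icechunk Repository. The `move` and `commit`
    signatures used here are assumptions about the Session API.
    """
    session = repo.rearrange_session(branch)
    session.move(from_path, to_path)
    # After the commit, the session becomes read-only on the new snapshot.
    return session.commit(message)
```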

Source code in icechunk-python/python/icechunk/repository.py

def rearrange_session(self, branch: str) -> Session:
    """
    Create a session to move/rename nodes in the Zarr hierarchy.

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    This session only allows changes made through `Session.move`. To modify data, and
    not only move nodes, use `Repository.writable_session` instead.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(self._repository.rearrange_session(branch))

rearrange_session_async `async` #

rearrange_session_async(branch)

Create a session to move/rename nodes in the Zarr hierarchy (async version).

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

This session only allows changes made through Session.move. To modify data, and not only move nodes, use Repository.writable_session instead.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to create the session on.	required

Returns:

Type	Description
`Session`	The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py

async def rearrange_session_async(self, branch: str) -> Session:
    """
    Create a session to move/rename nodes in the Zarr hierarchy (async version).

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    This session only allows changes made through `Session.move`. To modify data, and
    not only move nodes, use `Repository.writable_session` instead.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(await self._repository.rearrange_session_async(branch))

reopen #

reopen(config=None, authorize_virtual_chunk_access=None)

Reopen the repository with new configuration or credentials.

Parameters:

Name	Type	Description	Default
`config`	`RepositoryConfig`	The new repository configuration. If not provided, uses the existing configuration.	`None`
`authorize_virtual_chunk_access`	`dict[str, AnyCredential \| None]`	New virtual chunk access credentials.	`None`

Returns:

Type	Description
`Self`	A new Repository instance with the updated configuration.

Source code in icechunk-python/python/icechunk/repository.py

def reopen(
    self,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Reopen the repository with new configuration or credentials.

    Parameters
    ----------
    config : RepositoryConfig, optional
        The new repository configuration. If not provided, uses the existing configuration.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        New virtual chunk access credentials.

    Returns
    -------
    Self
        A new Repository instance with the updated configuration.
    """
    _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
    return self.__class__(
        self._repository.reopen(
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

reopen_async `async` #

reopen_async(
    config=None, authorize_virtual_chunk_access=None
)

Reopen the repository with new configuration or credentials (async version).

Parameters:

Name	Type	Description	Default
`config`	`RepositoryConfig`	The new repository configuration. If not provided, uses the existing configuration.	`None`
`authorize_virtual_chunk_access`	`dict[str, AnyCredential \| None]`	New virtual chunk access credentials.	`None`

Returns:

Type	Description
`Self`	A new Repository instance with the updated configuration.

Source code in icechunk-python/python/icechunk/repository.py

async def reopen_async(
    self,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Reopen the repository with new configuration or credentials (async version).

    Parameters
    ----------
    config : RepositoryConfig, optional
        The new repository configuration. If not provided, uses the existing configuration.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        New virtual chunk access credentials.

    Returns
    -------
    Self
        A new Repository instance with the updated configuration.
    """
    _warn_on_none_virtual_chunk_credentials(authorize_virtual_chunk_access)
    return self.__class__(
        await self._repository.reopen_async(
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

reset_branch #

reset_branch(branch, snapshot_id, *, from_snapshot_id=None)

Reset a branch to a specific snapshot.

This will permanently alter the history of the branch such that the tip of the branch is the specified snapshot.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to reset.	required
`snapshot_id`	`str`	The snapshot ID to reset the branch to.	required
`from_snapshot_id`	`str \| None`	If passed, the reset will only be executed if the branch currently points to from_snapshot_id.	`None`

Returns:

Type	Description
`None`
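
The effect of from_snapshot_id can be pictured as a compare-and-set on the branch tip. The sketch below is a toy model over a plain dict, not icechunk code: the `toy_reset_branch` helper, the `BranchMovedError` exception, and the snapshot ID strings are hypothetical. The description above does not say how the real method reports a failed check, so raising an error here is an assumption.

```python
class BranchMovedError(Exception):
    """Hypothetical error for a conditional reset whose check failed."""


def toy_reset_branch(branches, branch, snapshot_id, *, from_snapshot_id=None):
    """Point `branch` at `snapshot_id`, optionally only if it is still at `from_snapshot_id`."""
    current = branches[branch]
    if from_snapshot_id is not None and current != from_snapshot_id:
        raise BranchMovedError(f"{branch} is at {current}, expected {from_snapshot_id}")
    branches[branch] = snapshot_id


# Toy branch table: branch name -> snapshot ID of the tip.
branches = {"main": "snap-3"}

# Unconditional reset: always moves the tip.
toy_reset_branch(branches, "main", "snap-1")

# Conditional reset: refused, because the tip is no longer "snap-3".
try:
    toy_reset_branch(branches, "main", "snap-2", from_snapshot_id="snap-3")
    conditional_reset_applied = True
except BranchMovedError:
    conditional_reset_applied = False
```

Passing from_snapshot_id is a way to avoid discarding a commit that another writer added after the tip was last inspected.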

Source code in icechunk-python/python/icechunk/repository.py

def reset_branch(
    self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
) -> None:
    """
    Reset a branch to a specific snapshot.

    This will permanently alter the history of the branch such that the tip of
    the branch is the specified snapshot.

    Parameters
    ----------
    branch : str
        The branch to reset.
    snapshot_id : str
        The snapshot ID to reset the branch to.
    from_snapshot_id : str | None
        If passed, the reset will only be executed if the branch currently
        points to from_snapshot_id.

    Returns
    -------
    None
    """
    self._repository.reset_branch(branch, snapshot_id, from_snapshot_id)

reset_branch_async `async` #

reset_branch_async(
    branch, snapshot_id, *, from_snapshot_id=None
)

Reset a branch to a specific snapshot (async version).

This will permanently alter the history of the branch such that the tip of the branch is the specified snapshot.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to reset.	required
`snapshot_id`	`str`	The snapshot ID to reset the branch to.	required
`from_snapshot_id`	`str \| None`	If passed, the reset will only be executed if the branch currently points to from_snapshot_id.	`None`

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

async def reset_branch_async(
    self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
) -> None:
    """
    Reset a branch to a specific snapshot (async version).

    This will permanently alter the history of the branch such that the tip of
    the branch is the specified snapshot.

    Parameters
    ----------
    branch : str
        The branch to reset.
    snapshot_id : str
        The snapshot ID to reset the branch to.
    from_snapshot_id : str | None
        If passed, the reset will only be executed if the branch currently
        points to from_snapshot_id.

    Returns
    -------
    None
    """
    await self._repository.reset_branch_async(branch, snapshot_id, from_snapshot_id)

rewrite_manifests #

rewrite_manifests(
    message,
    *,
    branch,
    metadata=None,
    commit_method="new_commit",
)

Rewrite manifests for all arrays.

This method starts a new writable session on the specified branch, rewrites manifests for all arrays, and then commits with the specified message and metadata.

A JSON representation of the currently active splitting configuration will be stored in the commit's metadata under the key "splitting_config".

Parameters:

Name	Type	Description	Default
`message`	`str`	The message to write with the commit.	required
`branch`	`str`	The branch to commit to.	required
`metadata`	`dict[str, Any] \| None`	Additional metadata to store with the commit snapshot.	`None`
`commit_method`	`CommitMethod`	The commit method to use. Defaults to `"new_commit"`. Use `"amend"` to replace the previous commit. Note that `"amend"` is only supported for spec version 2 repositories.	`'new_commit'`

Returns:

Type	Description
`str`	The snapshot ID of the new commit.
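
The two commit methods and the "splitting_config" metadata key can be pictured with a toy history list. This is only a sketch: `toy_commit`, the snapshot ID strings, and the shape of the splitting configuration are hypothetical stand-ins rather than icechunk's API or data model. Storing the configuration as a JSON string is also an assumption.

```python
import json


def toy_commit(history, message, *, metadata=None, splitting_config=None,
               commit_method="new_commit"):
    """Append a commit to `history`, or replace the last one when amending."""
    merged = dict(metadata or {})
    # Record the active splitting configuration alongside the user metadata.
    merged["splitting_config"] = json.dumps(splitting_config, sort_keys=True)
    snapshot = {"id": f"snap-{len(history) + 1}", "message": message, "metadata": merged}
    if commit_method == "amend":
        if not history:
            raise ValueError("nothing to amend")
        # Amending replaces the previous commit instead of adding a new one.
        snapshot["id"] = f"{history[-1]['id']}-amended"
        history[-1] = snapshot
    elif commit_method == "new_commit":
        history.append(snapshot)
    else:
        raise ValueError(f"unknown commit method: {commit_method!r}")
    return snapshot["id"]


history = []
config = {"hypothetical_split_size": 1000}  # made-up configuration shape
first_id = toy_commit(history, "rewrite manifests", splitting_config=config)
amended_id = toy_commit(history, "rewrite manifests again", metadata={"run": 2},
                        splitting_config=config, commit_method="amend")
```

After both calls the toy history still holds a single commit, which is the practical difference between "amend" and "new_commit".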

Source code in icechunk-python/python/icechunk/repository.py

def rewrite_manifests(
    self,
    message: str,
    *,
    branch: str,
    metadata: dict[str, Any] | None = None,
    commit_method: CommitMethod = "new_commit",
) -> str:
    """
    Rewrite manifests for all arrays.

    This method will start a new writable session on the specified branch,
    rewrite manifests for all arrays, and then commit with the specified ``message``
    and ``metadata``.

    A JSON representation of the currently active splitting configuration will be
    stored in the commit's metadata under the key `"splitting_config"`.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    branch: str
        The branch to commit to.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    commit_method : CommitMethod, optional
        The commit method to use. Defaults to ``"new_commit"``.
        Use ``"amend"`` to replace the previous commit.
        Note that ``"amend"`` is only supported for spec version 2
        repositories.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    """
    return self._repository.rewrite_manifests(
        message, branch=branch, metadata=metadata, commit_method=commit_method
    )

rewrite_manifests_async `async` #

rewrite_manifests_async(
    message,
    *,
    branch,
    metadata=None,
    commit_method="new_commit",
)

Rewrite manifests for all arrays (async version).

This method starts a new writable session on the specified branch, rewrites manifests for all arrays, and then commits with the specified message and metadata.

A JSON representation of the currently active splitting configuration will be stored in the commit's metadata under the key "splitting_config".

Parameters:

Name	Type	Description	Default
`message`	`str`	The message to write with the commit.	required
`branch`	`str`	The branch to commit to.	required
`metadata`	`dict[str, Any] \| None`	Additional metadata to store with the commit snapshot.	`None`
`commit_method`	`CommitMethod`	The commit method to use. Defaults to `"new_commit"`. Use `"amend"` to replace the previous commit. Note that `"amend"` is only supported for spec version 2 repositories.	`'new_commit'`

Returns:

Type	Description
`str`	The snapshot ID of the new commit.

Source code in icechunk-python/python/icechunk/repository.py

async def rewrite_manifests_async(
    self,
    message: str,
    *,
    branch: str,
    metadata: dict[str, Any] | None = None,
    commit_method: CommitMethod = "new_commit",
) -> str:
    """
    Rewrite manifests for all arrays (async version).

    This method will start a new writable session on the specified branch,
    rewrite manifests for all arrays, and then commit with the specified ``message``
    and ``metadata``.

    A JSON representation of the currently active splitting configuration will be
    stored in the commit's metadata under the key `"splitting_config"`.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    branch: str
        The branch to commit to.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    commit_method : CommitMethod, optional
        The commit method to use. Defaults to ``"new_commit"``.
        Use ``"amend"`` to replace the previous commit.
        Note that ``"amend"`` is only supported for spec version 2
        repositories.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    """
    return await self._repository.rewrite_manifests_async(
        message, branch=branch, metadata=metadata, commit_method=commit_method
    )

save_config #

save_config()

Save the repository configuration to storage. This configuration will be used in future calls to Repository.open.

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

def save_config(self) -> None:
    """
    Save the repository configuration to storage. This configuration will be used in future calls to Repository.open.

    Returns
    -------
    None
    """
    return self._repository.save_config()

save_config_async `async` #

save_config_async()

Save the repository configuration to storage (async version).

Returns:

Type	Description
`None`

Source code in icechunk-python/python/icechunk/repository.py

async def save_config_async(self) -> None:
    """
    Save the repository configuration to storage (async version).

    Returns
    -------
    None
    """
    return await self._repository.save_config_async()

set_default_commit_metadata #

set_default_commit_metadata(metadata)

Set the default commit metadata for the repository. This is useful for attaching additional static, system-context metadata to all commits.

When a commit is made, the default metadata is merged with the metadata provided in the commit. For duplicate keys, the values provided in the commit take precedence.

Warning

This metadata is only applied to sessions that are created after this call. Any open writable sessions will not be affected and will not use the new default metadata.

Parameters:

Name	Type	Description	Default
`metadata`	`dict[str, Any]`	The default commit metadata. Pass an empty dict to clear the default metadata.	required
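
The merge rule described above behaves like a dict merge in which the commit's own metadata is applied last. The snippet is a plain-Python illustration with made-up keys and values. It does not call icechunk, and whether the real merge is shallow, as it is here, is an assumption.

```python
# Hypothetical defaults, as might be passed to set_default_commit_metadata.
default_metadata = {"author": "pipeline-bot", "host": "worker-1"}

# Hypothetical metadata supplied with one particular commit.
commit_metadata = {"author": "alice", "ticket": "DATA-42"}

# The later mapping wins on duplicate keys, so the commit's values take precedence.
effective_metadata = {**default_metadata, **commit_metadata}
```

Here "author" comes from the commit, while "host" is inherited from the defaults.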

Source code in icechunk-python/python/icechunk/repository.py

def set_default_commit_metadata(self, metadata: dict[str, Any]) -> None:
    """
    Set the default commit metadata for the repository. This is useful for attaching
    additional static, system-context metadata to all commits.

    When a commit is made, the default metadata is merged with the metadata provided in the
    commit. For duplicate keys, the values provided in the commit take precedence.

    !!! warning
        This metadata is only applied to sessions that are created after this call. Any open
        writable sessions will not be affected and will not use the new default metadata.

    Parameters
    ----------
    metadata : dict[str, Any]
        The default commit metadata. Pass an empty dict to clear the default metadata.
    """
    return self._repository.set_default_commit_metadata(metadata)

set_feature_flag #

set_feature_flag(name, setting)

Set a feature flag.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the feature flag.	required
`setting`	`bool \| None`	True to enable, False to disable, None to reset to default.	required
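
The three possible values of setting can be read as "explicit on", "explicit off", and "no override". The toy model below illustrates that reading. The flag name, its default value, and the helper functions are hypothetical and are not taken from icechunk.

```python
# Hypothetical flag name and default, for illustration only.
DEFAULTS = {"example_flag": True}
overrides = {}


def toy_set_flag(name, setting):
    """True or False stores an explicit override; None removes it."""
    if setting is None:
        overrides.pop(name, None)
    else:
        overrides[name] = setting


def toy_is_enabled(name):
    """An explicit override wins; otherwise the default applies."""
    return overrides.get(name, DEFAULTS[name])


toy_set_flag("example_flag", False)
disabled = toy_is_enabled("example_flag")

toy_set_flag("example_flag", None)  # reset to the default
restored = toy_is_enabled("example_flag")
```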

Source code in icechunk-python/python/icechunk/repository.py

def set_feature_flag(self, name: str, setting: bool | None) -> None:
    """
    Set a feature flag.

    Parameters
    ----------
    name : str
        The name of the feature flag.
    setting : bool | None
        True to enable, False to disable, None to reset to default.
    """
    self._repository.set_feature_flag(name, setting)

set_feature_flag_async `async` #

set_feature_flag_async(name, setting)

Set a feature flag (async version).

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the feature flag.	required
`setting`	`bool \| None`	True to enable, False to disable, None to reset to default.	required

Source code in icechunk-python/python/icechunk/repository.py

async def set_feature_flag_async(self, name: str, setting: bool | None) -> None:
    """
    Set a feature flag (async version).

    Parameters
    ----------
    name : str
        The name of the feature flag.
    setting : bool | None
        True to enable, False to disable, None to reset to default.
    """
    await self._repository.set_feature_flag_async(name, setting)

set_metadata #

set_metadata(metadata)

Set the repository metadata. The passed dict replaces the existing metadata entirely.

To update only some metadata values, use Repository.update_metadata.

Parameters:

Name	Type	Description	Default
`metadata`	`dict[str, Any]`	The value to use as repository metadata.	required
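
The difference between replacing and updating repository metadata can be sketched with two small functions over plain dicts. These helpers are hypothetical stand-ins, not the icechunk methods, and the merge is assumed to be shallow.

```python
stored = {"owner": "team-a", "project": "climate"}


def toy_set_metadata(current, new):
    """Replace: nothing from `current` survives."""
    return dict(new)


def toy_update_metadata(current, new):
    """Merge: keys in `new` override, all other keys are kept."""
    return {**current, **new}


replaced = toy_set_metadata(stored, {"owner": "team-b"})
merged = toy_update_metadata(stored, {"owner": "team-b"})
```

After the replace, the "project" key is gone. After the merge, it is still present.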

Source code in icechunk-python/python/icechunk/repository.py

def set_metadata(self, metadata: dict[str, Any]) -> None:
    """
    Set the repository metadata. The passed dict replaces the existing metadata entirely.

    To update only some metadata values, use Repository.update_metadata.

    Parameters
    ----------
    metadata : dict[str, Any]
        The value to use as repository metadata.
    """
    self._repository.set_metadata(metadata)

set_metadata_async `async` #

set_metadata_async(metadata)

Set the repository metadata. The passed dict replaces the existing metadata entirely.

To update only some metadata values, use Repository.update_metadata.

Parameters:

Name	Type	Description	Default
`metadata`	`dict[str, Any]`	The value to use as repository metadata.	required

Source code in icechunk-python/python/icechunk/repository.py

async def set_metadata_async(self, metadata: dict[str, Any]) -> None:
    """
    Set the repository metadata. The passed dict replaces the existing metadata entirely.

    To update only some metadata values, use Repository.update_metadata.

    Parameters
    ----------
    metadata : dict[str, Any]
        The value to use as repository metadata.
    """
    await self._repository.set_metadata_async(metadata)

set_status #

set_status(status)

Set the repository status.

Parameters:

Name	Type	Description	Default
`status`	`RepoStatus`	The new status for the repository.	required

Source code in icechunk-python/python/icechunk/repository.py

def set_status(self, status: RepoStatus) -> None:
    """
    Set the repository status.

    Parameters
    ----------
    status : RepoStatus
        The new status for the repository.
    """
    self._repository.set_status(status)

set_status_async `async` #

set_status_async(status)

Set the repository status (async version).

Parameters:

Name	Type	Description	Default
`status`	`RepoStatus`	The new status for the repository.	required

Source code in icechunk-python/python/icechunk/repository.py

async def set_status_async(self, status: RepoStatus) -> None:
    """
    Set the repository status (async version).

    Parameters
    ----------
    status : RepoStatus
        The new status for the repository.
    """
    await self._repository.set_status_async(status)

total_chunks_storage #

total_chunks_storage(
    *,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Calculate the total storage used for chunks, in bytes.

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

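The result counts only native chunks; virtual and inline chunks are excluded.

This method is deprecated in favour of chunk_storage_stats, which also accounts for inline and virtual chunks. To keep the existing behaviour, call chunk_storage_stats with the same keyword arguments and read native_bytes from the result.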

Parameters:

Name	Type	Description	Default
`max_snapshots_in_memory`	`int`	Don't prefetch more than this many Snapshots to memory.	`50`
`max_compressed_manifest_mem_bytes`	`int`	Don't use more than this memory to store compressed in-flight manifests.	`512 * 1024 * 1024`
`max_concurrent_manifest_fetches`	`int`	Don't run more than this many concurrent manifest fetches.	`500`
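
According to the source listing for this method, each call emits a DeprecationWarning and returns the native_bytes field of the newer statistics object. The sketch below mimics that pattern with stand-in classes so that it runs without a repository. `FakeRepo`, `FakeStats`, and the byte count are hypothetical; only the method names and the native_bytes attribute come from the listing.

```python
import warnings
from dataclasses import dataclass


@dataclass
class FakeStats:
    # Stand-in for the stats object; only native_bytes is modelled here.
    native_bytes: int


class FakeRepo:
    """Hypothetical stand-in that mimics the deprecation pattern."""

    def chunk_storage_stats(self, **kwargs):
        return FakeStats(native_bytes=1024)

    def total_chunks_storage(self, **kwargs):
        warnings.warn(
            "total_chunks_storage is deprecated; use chunk_storage_stats",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.chunk_storage_stats(**kwargs).native_bytes


repo = FakeRepo()
kwargs = {"max_snapshots_in_memory": 50}

# Old call: record warnings so the deprecation is visible to the caller.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_result = repo.total_chunks_storage(**kwargs)

# Migrated call: same keyword arguments, then read native_bytes.
new_result = repo.chunk_storage_stats(**kwargs).native_bytes
deprecation_seen = any(issubclass(w.category, DeprecationWarning) for w in caught)
```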

Source code in icechunk-python/python/icechunk/repository.py

def total_chunks_storage(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> int:
    """Calculate the total storage used for chunks, in bytes.

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result counts only native chunks; virtual and inline chunks are excluded.

    Parameters
    ----------
    max_snapshots_in_memory: int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """

    warnings.warn(
        "The ``total_chunks_storage`` method has been deprecated in favour of the ``chunk_storage_stats`` method. "
        "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
        "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
        "please replace your existing ``.total_chunks_storage(**kwargs)`` method call with ``.chunk_storage_stats(**same_kwargs).native_bytes``.",
        DeprecationWarning,
        stacklevel=2,
    )

    stats = self._repository.chunk_storage_stats(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
    return stats.native_bytes

total_chunks_storage_async `async` #

total_chunks_storage_async(
    *,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Calculate the total storage used for chunks, in bytes (async version).

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

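The result counts only native chunks; virtual and inline chunks are excluded.

This method is deprecated in favour of chunk_storage_stats_async, which also accounts for inline and virtual chunks. To keep the existing behaviour, await chunk_storage_stats_async with the same keyword arguments and read native_bytes from the awaited result.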

Parameters:

Name	Type	Description	Default
`max_snapshots_in_memory`	`int`	Don't prefetch more than this many Snapshots to memory.	`50`
`max_compressed_manifest_mem_bytes`	`int`	Don't use more than this memory to store compressed in-flight manifests.	`512 * 1024 * 1024`
`max_concurrent_manifest_fetches`	`int`	Don't run more than this many concurrent manifest fetches.	`500`

Source code in icechunk-python/python/icechunk/repository.py

async def total_chunks_storage_async(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> int:
    """Calculate the total storage used for chunks, in bytes (async version).

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result counts only native chunks; virtual and inline chunks are excluded.

    Parameters
    ----------
    max_snapshots_in_memory: int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """

    warnings.warn(
        "The ``total_chunks_storage_async`` method has been deprecated in favour of the ``chunk_storage_stats_async`` method. "
        "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
        "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
        "please replace your existing ``.total_chunks_storage_async(**kwargs)`` method call with ``.chunk_storage_stats_async(**same_kwargs).native_bytes``.",
        DeprecationWarning,
        stacklevel=2,
    )

    stats = await self._repository.chunk_storage_stats_async(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
    return stats.native_bytes

transaction #

transaction(
    branch,
    *,
    message,
    metadata=None,
    rebase_with=None,
    rebase_tries=1000,
)

Create a transaction on a branch.

This is a context manager that creates a writable session on the specified branch. When the context is exited, the session will be committed to the branch using the specified message.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to create the transaction on.	required
`message`	`str`	The commit message to use when committing the session.	required
`metadata`	`dict[str, Any] \| None`	Additional metadata to store with the commit snapshot.	`None`
`rebase_with`	`ConflictSolver \| None`	If another session committed while the current session was writing, use Session.rebase with this solver.	`None`
`rebase_tries`	`int`	If another session committed while the current session was writing, retry Session.rebase up to this many times.	`1000`

Yields:

Name	Type	Description
`store`	`IcechunkStore`	A Zarr Store which can be used to interact with the data in the repository.
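
In the source listing for this method, the commit call comes after the `yield` and there is no try/finally around it. With `contextlib.contextmanager`, this means the commit runs only when the with-block finishes without raising. The sketch below reproduces that control flow with a stand-in session. `FakeSession`, `toy_transaction`, and the returned snapshot ID are hypothetical and not part of icechunk.

```python
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in for a writable session."""

    def __init__(self):
        self.store = {}
        self.committed_message = None

    def commit(self, message):
        self.committed_message = message
        return "snap-1"


@contextmanager
def toy_transaction(session, *, message):
    yield session.store
    # Reached only if the with-block finished without raising.
    session.commit(message=message)


# Clean exit: the changes are committed when the block ends.
ok = FakeSession()
with toy_transaction(ok, message="add data") as store:
    store["a"] = 1

# Exception inside the block: the commit is skipped and the error propagates.
failed = FakeSession()
try:
    with toy_transaction(failed, message="never committed") as store:
        store["a"] = 1
        raise RuntimeError("write failed")
except RuntimeError:
    pass
```

A failed write therefore leaves the branch untouched, and the caller sees the original exception.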

Source code in icechunk-python/python/icechunk/repository.py

@contextmanager
def transaction(
    self,
    branch: str,
    *,
    message: str,
    metadata: dict[str, Any] | None = None,
    rebase_with: ConflictSolver | None = None,
    rebase_tries: int = 1_000,
) -> Iterator[IcechunkStore]:
    """
    Create a transaction on a branch.

    This is a context manager that creates a writable session on the specified branch.
    When the context is exited, the session will be committed to the branch
    using the specified message.

    Parameters
    ----------
    branch : str
        The branch to create the transaction on.
    message : str
        The commit message to use when committing the session.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    rebase_with : ConflictSolver | None, optional
        If another session committed while the current session was writing, use Session.rebase with this solver.
    rebase_tries : int, optional
        If another session committed while the current session was writing, retry Session.rebase up to this many times.

    Yields
    ------
    store : IcechunkStore
        A Zarr Store which can be used to interact with the data in the repository.
    """
    session = self.writable_session(branch)
    yield session.store
    session.commit(
        message=message,
        metadata=metadata,
        rebase_with=rebase_with,
        rebase_tries=rebase_tries,
    )

update_metadata #

update_metadata(metadata)

Update the repository metadata.

The passed dict will be merged with the current metadata, overriding existing keys.

Parameters:

Name	Type	Description	Default
`metadata`	`dict[str, Any]`	The dict to merge into the repository metadata.	required

Source code in icechunk-python/python/icechunk/repository.py

def update_metadata(self, metadata: dict[str, Any]) -> dict[str, Any]:
    """
    Update the repository metadata.

    The passed dict will be merged with the current metadata, overriding existing keys.

    Parameters
    ----------
    metadata : dict[str, Any]
        The dict to merge into the repository metadata.
    """
    return self._repository.update_metadata(metadata)

update_metadata_async `async` #

update_metadata_async(metadata)

Update the repository metadata.

The passed dict will be merged with the current metadata, overriding existing keys.

Parameters:

Name	Type	Description	Default
`metadata`	`dict[str, Any]`	The dict to merge into the repository metadata.	required

Source code in icechunk-python/python/icechunk/repository.py

async def update_metadata_async(self, metadata: dict[str, Any]) -> dict[str, Any]:
    """
    Update the repository metadata.

    The passed dict will be merged with the current metadata, overriding existing keys.

    Parameters
    ----------
    metadata : dict[str, Any]
        The dict to merge into the repository metadata.
    """
    return await self._repository.update_metadata_async(metadata)

writable_session #

writable_session(branch)

Create a writable session on a branch.

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to create the session on.	required

Returns:

Type	Description
`Session`	The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py

def writable_session(self, branch: str) -> Session:
    """
    Create a writable session on a branch.

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(self._repository.writable_session(branch))

writable_session_async `async` #

writable_session_async(branch)

Create a writable session on a branch (async version).

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

Parameters:

Name	Type	Description	Default
`branch`	`str`	The branch to create the session on.	required

Returns:

Type	Description
`Session`	The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py

async def writable_session_async(self, branch: str) -> Session:
    """
    Create a writable session on a branch (async version).

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(await self._repository.writable_session_async(branch))

`icechunk.IcechunkStore`#

icechunk.store.IcechunkStore #

Bases: Store, SyncMixin

Methods:

Name	Description
`__init__`	Create a new IcechunkStore.
`array_chunk_iterator`	Async iterator yielding columnar batches of chunk references for one array.
`clear`	Clear the store.
`delete`	Remove a key from the store.
`delete_dir`	Delete a prefix.
`exists`	Check if a key exists in the store.
`get`	Retrieve the value associated with a given key.
`get_partial_values`	Retrieve possibly partial values from given key_ranges.
`is_empty`	Check if the directory is empty.
`list`	Retrieve all keys in the store.
`list_dir`	Retrieve all keys and prefixes with a given prefix and which do not contain the character
`list_prefix`	Retrieve all keys in the store that begin with a given prefix. Keys are returned relative
`set`	Store a (key, value) pair.
`set_if_not_exists`	Store a key to `value` if the key is not already present.
`set_partial_values`	Store values at a given key, starting at byte range_start.
`set_virtual_ref`	Store a virtual reference to a chunk.
`set_virtual_ref_async`	Store a virtual reference to a chunk asynchronously.
`set_virtual_refs`	Store multiple virtual references for the same array.
`set_virtual_refs_arr`	Store virtual references for an array from flat arrays of locations, offsets, and lengths.
`set_virtual_refs_arr_async`	Store virtual references for an array from flat arrays (async).
`set_virtual_refs_async`	Store multiple virtual references for the same array asynchronously.
`sync_clear`	Clear the store.

Attributes:

Name	Type	Description
`supports_listing`	`bool`	Does the store support listing?
`supports_partial_writes`	`Literal[False]`	Does the store support partial writes?
`supports_writes`	`bool`	Does the store support writes?

Source code in icechunk-python/python/icechunk/store.py

class IcechunkStore(Store, SyncMixin):
    _store: PyStore

    def __init__(
        self,
        store: PyStore,
        read_only: bool | None = None,
        *args: Any,
        **kwargs: Any,
    ):
        """Create a new IcechunkStore.

        This should not be called directly. Instead, use the `create`, `open_existing` or `open_or_create` class methods.
        """
        # Validate `store` before reading attributes from it, so that a missing
        # store raises the intended ValueError rather than an AttributeError.
        if store is None:
            raise ValueError(
                "An IcechunkStore should not be created with the default constructor, instead use either the create or open_existing class methods."
            )
        read_only = read_only if read_only is not None else store.read_only
        super().__init__(read_only=read_only)
        self._store = store
        self._is_open = True

    def __eq__(self, value: object) -> bool:
        if not isinstance(value, IcechunkStore):
            return False
        return self._store == value._store

    def __repr__(self) -> str:
        return repr(self._store)

    def __str__(self) -> str:
        return str(self._store)

    def _repr_html_(self) -> str:
        return self._store._repr_html_()

    def __getstate__(self) -> object:
        # for read_only sessions we allow pickling, this allows distributed reads without forking
        session = self._store.session
        if not session.read_only and not session.is_fork:
            raise ValueError(
                "You must opt-in to pickle writable sessions in a distributed context "
                "using Session.fork(). "
                "See https://icechunk.io/en/stable/icechunk-python/parallel/#cooperative-distributed-writes. "
                "If you are using xarray's `Dataset.to_zarr` method to write dask arrays, "
                "please use `icechunk.xarray.to_icechunk` instead. "
            )
        d = self.__dict__.copy()
        # we serialize the Rust store as bytes
        d["_store"] = self._store.as_bytes()
        return d

    def __setstate__(self, state: Any) -> None:
        # we have to deserialize the bytes of the Rust store
        store_repr = state["_store"]
        state["_store"] = PyStore.from_bytes(store_repr)
        self.__dict__ = state

    def with_read_only(self, read_only: bool = False) -> Store:
        new_store = IcechunkStore(store=self._store, read_only=read_only)
        new_store._is_open = False
        return new_store

    @property
    def session(self) -> "Session":
        from icechunk.session import ForkSession, Session

        if self._store.session.is_fork:
            return ForkSession(self._store.session)
        else:
            return Session(self._store.session)

    async def clear(self) -> None:
        """Clear the store.

        This will remove all contents from the current session,
        including all groups and all arrays. But it will not modify the repository history.
        """
        return await self._store.clear()

    def sync_clear(self) -> None:
        """Clear the store.

        This will remove all contents from the current session,
        including all groups and all arrays. But it will not modify the repository history.
        """
        return self._store.sync_clear()

    async def is_empty(self, prefix: str) -> bool:
        """
        Check if the directory is empty.

        Parameters
        ----------
        prefix : str
            Prefix of keys to check.

        Returns
        -------
        bool
            True if the store is empty, False otherwise.
        """
        return await self._store.is_empty(prefix)

    async def get(
        self,
        key: str,
        prototype: BufferPrototype,
        byte_range: ByteRequest | None = None,
    ) -> Buffer | None:
        """Retrieve the value associated with a given key.

        Parameters
        ----------
        key : str
        byte_range : ByteRequest, optional

            ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

            - RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
            - OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
            - SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.

        Returns
        -------
        Buffer | None
            The requested bytes, or None if the key does not exist.
        """

        try:
            result = await self._store.get(key, _byte_request_to_tuple(byte_range))
        except KeyError as _e:
            # Zarr python expects None to be returned if the key does not exist
            # but an IcechunkStore returns an error if the key does not exist
            return None

        return prototype.buffer.from_bytes(result)

    async def get_partial_values(
        self,
        prototype: BufferPrototype,
        key_ranges: Iterable[tuple[str, ByteRequest | None]],
    ) -> list[Buffer | None]:
        """Retrieve possibly partial values from given key_ranges.

        Parameters
        ----------
        key_ranges : Iterable[tuple[str, ByteRequest | None]]
            Ordered set of key, range pairs, a key may occur multiple times with different ranges

        Returns
        -------
        list of values, in the order of the key_ranges, may contain null/none for missing keys
        """
        # NOTE: pyo3 has no implicit conversion from an Iterable to a rust iterable. So we convert it
        # to a list here first. Possible opportunity for optimization.
        ranges = [(k[0], _byte_request_to_tuple(k[1])) for k in key_ranges]
        result = await self._store.get_partial_values(list(ranges))
        return [prototype.buffer.from_bytes(r) for r in result]

    async def exists(self, key: str) -> bool:
        """Check if a key exists in the store.

        Parameters
        ----------
        key : str

        Returns
        -------
        bool
        """
        return await self._store.exists(key)

    @property
    def supports_writes(self) -> bool:
        """Does the store support writes?"""
        return self._store.supports_writes

    async def set(self, key: str, value: Buffer) -> None:
        """Store a (key, value) pair.

        Parameters
        ----------
        key : str
        value : Buffer
        """
        if not isinstance(value, Buffer):
            raise TypeError(
                f"IcechunkStore.set(): `value` must be a Buffer instance. Got an instance of {type(value)} instead."
            )
        return await self._store.set(key, value.to_bytes())

    async def set_if_not_exists(self, key: str, value: Buffer) -> None:
        """
        Store a key to ``value`` if the key is not already present.

        Parameters
        -----------
        key : str
        value : Buffer
        """
        return await self._store.set_if_not_exists(key, value.to_bytes())

    def set_virtual_ref(
        self,
        key: str,
        location: str,
        *,
        offset: int,
        length: int,
        checksum: str | datetime | None = None,
        validate_container: bool = True,
    ) -> None:
        """Store a virtual reference to a chunk.

        Parameters
        ----------
        key : str
            The chunk to store the reference under. This is the fully qualified zarr key eg: 'array/c/0/0/0'
        location : str
            The location of the chunk in storage, as a URL. This is the absolute path to the chunk in storage eg: 's3://bucket/path/to/file.nc'.
            The object key is the URL path, used verbatim (``//`` and ``.``/``..`` are preserved). Characters reserved in a URL that are part of the key must be percent-encoded, in particular ``?`` -> ``%3F``, ``#`` -> ``%23`` and ``%`` -> ``%25``.
        offset : int
            The offset in bytes from the start of the file location in storage the chunk starts at
        length : int
            The length of the chunk in bytes, measured from the given offset
        checksum : str | datetime | None
            The etag or last_modified_at field of the object
        validate_container: bool
            If set to true, fail for locations that don't match any existing virtual chunk container
        """
        return self._store.set_virtual_ref(
            key, location, offset, length, checksum, validate_container
        )

    def array_chunk_iterator(
        self,
        array_path: str,
        batch_size: int = 100_000,
    ) -> AsyncCloseableIterator[
        tuple[
            "np.ndarray[tuple[int, int], np.dtype[np.uint32]]",  # coords (n, ndim)
            "np.ndarray[tuple[int], np.dtype[np.uint8]]",  # kinds (n,)
            list[str],  # paths
            "np.ndarray[tuple[int], np.dtype[np.uint64]]",  # offsets (n,)
            "np.ndarray[tuple[int], np.dtype[np.uint64]]",  # lengths (n,)
            dict[int, bytes],  # inlined
        ]
    ]:
        """Async iterator yielding columnar batches of chunk references for one array.

        Each batch is a 6-tuple; row ``i`` across the columns describes one
        chunk (columns are aligned in lock-step)::

            coords:   np.ndarray[uint32, (n, ndim)]  chunk grid coordinates
            kinds:    np.ndarray[uint8]              values of icechunk.ChunkType
                                                      (native=1, virtual=2, inline=3)
            paths:    list[str]                      URL (virtual) | chunk_id (native) | "" (inline)
            offsets:  np.ndarray[uint64]             byte offset within the source
                                                      URL (virtual) or chunk file
                                                      (native); 0 for inline
            lengths:  np.ndarray[uint64]             byte length; equals
                                                      ``len(bytes)`` for inline
            inlined:  dict[int, bytes]               *Inline rows only*. Keyed by
                                                      the row index ``i`` in this
                                                      batch. Rows whose kind is
                                                      virtual, native, or missing
                                                      are NOT present in this dict.

        Native and virtual rows share the same ``(path, offset, length)`` shape;
        the only structural difference is that ``paths[i]`` holds a bare
        ``chunk_id`` for native rows vs a fully-resolved URL for virtual rows.
        Consumers can therefore treat both uniformly as virtual references
        after prepending a URL prefix to the chunk_id.

        Virtual locations are passed through the session's resolver before
        yielding — relative ``vcc://name/path`` forms expand to absolute URLs;
        absolute URLs pass through unchanged. Missing chunks are not yielded.

        Parameters
        ----------
        array_path : str
            Zarr path to the array (e.g. ``"a"`` or ``"/group/var"``).
        batch_size : int
            Maximum number of rows per batch.
        """
        return self._store.array_chunk_iterator(array_path, batch_size)

    async def set_virtual_ref_async(
        self,
        key: str,
        location: str,
        *,
        offset: int,
        length: int,
        checksum: str | datetime | None = None,
        validate_container: bool = True,
    ) -> None:
        """Store a virtual reference to a chunk asynchronously.

        Parameters
        ----------
        key : str
            The chunk to store the reference under. This is the fully qualified zarr key eg: 'array/c/0/0/0'
        location : str
            The location of the chunk in storage, as a URL. This is the absolute path to the chunk in storage eg: 's3://bucket/path/to/file.nc'.
            The object key is the URL path, used verbatim (``//`` and ``.``/``..`` are preserved). Characters reserved in a URL that are part of the key must be percent-encoded, in particular ``?`` -> ``%3F``, ``#`` -> ``%23`` and ``%`` -> ``%25``.
        offset : int
            The offset in bytes from the start of the file location in storage the chunk starts at
        length : int
            The length of the chunk in bytes, measured from the given offset
        checksum : str | datetime | None
            The etag or last_modified_at field of the object
        validate_container: bool
            If set to true, fail for locations that don't match any existing virtual chunk container
        """
        return await self._store.set_virtual_ref_async(
            key, location, offset, length, checksum, validate_container
        )

    def set_virtual_refs(
        self,
        array_path: str,
        chunks: list[VirtualChunkSpec],
        *,
        validate_containers: bool = True,
    ) -> list[tuple[int, ...]] | None:
        """Store multiple virtual references for the same array.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
        chunks : list[VirtualChunkSpec],
            The list of virtual chunks to add
        validate_containers: bool
            If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


        Returns
        -------
        list[tuple[int, ...]] | None

            If all virtual references were successfully updated, it returns None.
            If there were validation errors, it returns the chunk indices of all failed references.
        """
        return self._store.set_virtual_refs(array_path, chunks, validate_containers)

    async def set_virtual_refs_async(
        self,
        array_path: str,
        chunks: list[VirtualChunkSpec],
        *,
        validate_containers: bool = True,
    ) -> list[tuple[int, ...]] | None:
        """Store multiple virtual references for the same array asynchronously.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
        chunks : list[VirtualChunkSpec],
            The list of virtual chunks to add
        validate_containers: bool
            If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


        Returns
        -------
        list[tuple[int, ...]] | None

            If all virtual references were successfully updated, it returns None.
            If there were validation errors, it returns the chunk indices of all failed references.
        """
        return await self._store.set_virtual_refs_async(
            array_path, chunks, validate_containers
        )

    def set_virtual_refs_arr(
        self,
        array_path: str,
        chunk_grid_shape: tuple[int, ...],
        locations: list[str],
        offsets: "np.ndarray[Any, np.dtype[np.uint64]]",
        lengths: "np.ndarray[Any, np.dtype[np.uint64]]",
        *,
        validate_containers: bool = True,
        arr_offset: tuple[int, ...] | None = None,
        checksum: datetime | str | None = None,
    ) -> list[tuple[int, ...]] | None:
        """Store virtual references for an array from flat arrays of locations, offsets, and lengths.

        More efficient than ``set_virtual_refs`` as it avoids creating
        per-chunk ``VirtualChunkSpec`` Python objects. The locations list
        is iterated in Rust (borrowing strings directly from CPython),
        and the offset/length numpy arrays are accessed via zero-copy.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store.
            Example: "/groupA/groupB/outputs/my-array"
        chunk_grid_shape : tuple[int, ...]
            Shape of the chunk grid (number of chunks per dimension).
            The product must equal the length of the arrays.
            Arrays are assumed to be flattened in C (row-major) order.
        locations : list[str]
            URLs to external files containing chunk data. Empty strings
            represent missing chunks and are silently skipped.
            Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]
            Each is a URL; the object key is its path, used verbatim (``//`` and
            ``.``/``..`` are preserved). Characters reserved in a URL that are part
            of the key must be percent-encoded (``?`` -> ``%3F``, ``#`` -> ``%23``,
            ``%`` -> ``%25``).
        offsets : np.ndarray
            1-D uint64 array of byte offsets within each file.
        lengths : np.ndarray
            1-D uint64 array of byte lengths of each chunk.
        validate_containers : bool
            If True, validate that locations match registered virtual
            chunk containers. Default is True.
        arr_offset : tuple[int, ...] | None
            Optional offset to add to computed chunk indices. Useful for
            append operations where new chunks should be written at an
            offset from (0, 0, ...). Must have the same length as
            chunk_grid_shape. Default is None.
        checksum : datetime | str | None
            Optional checksum for all chunks. Can be a datetime
            (last modified time) or a string (ETag). Default is None.

        Returns
        -------
        list[tuple[int, ...]] | None
            If all virtual references were successfully updated, returns None.
            If there were validation errors, returns the chunk indices of
            all failed references.
        """
        return self._store.set_virtual_refs_arr(
            array_path,
            list(chunk_grid_shape),
            locations,
            offsets,
            lengths,
            validate_containers=validate_containers,
            arr_offset=list(arr_offset) if arr_offset is not None else None,
            checksum=checksum,
        )

    async def set_virtual_refs_arr_async(
        self,
        array_path: str,
        chunk_grid_shape: tuple[int, ...],
        locations: list[str],
        offsets: "np.ndarray[Any, np.dtype[np.uint64]]",
        lengths: "np.ndarray[Any, np.dtype[np.uint64]]",
        *,
        validate_containers: bool = True,
        arr_offset: tuple[int, ...] | None = None,
        checksum: datetime | str | None = None,
    ) -> list[tuple[int, ...]] | None:
        """Store virtual references for an array from flat arrays (async).

        Async variant of ``set_virtual_refs_arr``. The vref construction
        still requires the GIL (to borrow strings from the Python list),
        but the store insertion releases it. Use ``asyncio.gather()`` to
        overlap vref building for one array with store insertion for another.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store.
            Example: "/groupA/groupB/outputs/my-array"
        chunk_grid_shape : tuple[int, ...]
            Shape of the chunk grid (number of chunks per dimension).
            The product must equal the length of the arrays.
            Arrays are assumed to be flattened in C (row-major) order.
        locations : list[str]
            URLs to external files containing chunk data. Empty strings
            represent missing chunks and are silently skipped.
            Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]
            Each is a URL; the object key is its path, used verbatim (``//`` and
            ``.``/``..`` are preserved). Characters reserved in a URL that are part
            of the key must be percent-encoded (``?`` -> ``%3F``, ``#`` -> ``%23``,
            ``%`` -> ``%25``).
        offsets : np.ndarray
            1-D uint64 array of byte offsets within each file.
        lengths : np.ndarray
            1-D uint64 array of byte lengths of each chunk.
        validate_containers : bool
            If True, validate that locations match registered virtual
            chunk containers. Default is True.
        arr_offset : tuple[int, ...] | None
            Optional offset to add to computed chunk indices. Useful for
            append operations where new chunks should be written at an
            offset from (0, 0, ...). Must have the same length as
            chunk_grid_shape. Default is None.
        checksum : datetime | str | None
            Optional checksum for all chunks. Can be a datetime
            (last modified time) or a string (ETag). Default is None.

        Returns
        -------
        list[tuple[int, ...]] | None
            If all virtual references were successfully updated, returns None.
            If there were validation errors, returns the chunk indices of
            all failed references.
        """
        return await self._store.set_virtual_refs_arr_async(
            array_path,
            list(chunk_grid_shape),
            locations,
            offsets,
            lengths,
            validate_containers=validate_containers,
            arr_offset=list(arr_offset) if arr_offset is not None else None,
            checksum=checksum,
        )

    async def delete(self, key: str) -> None:
        """Remove a key from the store

        Parameters
        ----------
        key : str
        """
        return await self._store.delete(key)

    async def delete_dir(self, prefix: str) -> None:
        """Delete a prefix

        Parameters
        ----------
        prefix : str
        """
        return await self._store.delete_dir(prefix)

    @property
    def supports_partial_writes(self) -> Literal[False]:
        """Does the store support partial writes?

        Partial writes are no longer used by Zarr, so this is always false.
        """
        return self._store.supports_partial_writes  # type: ignore[return-value]

    async def set_partial_values(
        self, key_start_values: Iterable[tuple[str, int, BytesLike]]
    ) -> None:
        """Store values at a given key, starting at byte range_start.

        Parameters
        ----------
        key_start_values : list[tuple[str, int, BytesLike]]
            set of key, range_start, values triples, a key may occur multiple times with different
            range_starts, range_starts (considering the length of the respective values) must not
            specify overlapping ranges for the same key
        """
        # NOTE: pyo3 has no implicit conversion from an Iterable to a rust iterable. So we convert it
        # to a list here first. Possible opportunity for optimization.
        # NOTE: currently we only implement the case where the values are bytes
        return await self._store.set_partial_values(list(key_start_values))  # type: ignore[arg-type]

    @property
    def supports_listing(self) -> bool:
        """Does the store support listing?"""
        return self._store.supports_listing

    @property
    def supports_consolidated_metadata(self) -> bool:
        return self._store.supports_consolidated_metadata

    @property
    def supports_deletes(self) -> bool:
        return self._store.supports_deletes

    def list(self) -> AsyncCloseableIterator[str]:
        """Retrieve all keys in the store.

        Returns
        -------
        AsyncCloseableIterator[str]
        """
        # This method should be async, like overridden methods in child classes.
        # However, that's not straightforward:
        # https://stackoverflow.com/questions/68905848

        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list()

    def list_prefix(self, prefix: str) -> AsyncCloseableIterator[str]:
        """Retrieve all keys in the store that begin with a given prefix. Keys are returned relative
        to the root of the store.

        Parameters
        ----------
        prefix : str

        Returns
        -------
        AsyncCloseableIterator[str]
        """
        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list_prefix(prefix)

    def list_dir(self, prefix: str) -> AsyncCloseableIterator[str]:
        """
        Retrieve all keys and prefixes with a given prefix and which do not contain the character
        “/” after the given prefix.

        Parameters
        ----------
        prefix : str

        Returns
        -------
        AsyncCloseableIterator[str]
        """
        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list_dir(prefix)

    async def getsize(self, key: str) -> int:
        return await self._store.getsize(key)

    async def getsize_prefix(self, prefix: str) -> int:
        return await self._store.getsize_prefix(prefix)
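
The `location` arguments of the `set_virtual_ref*` methods in the source above are URLs whose path is used verbatim as the object key, and the docstrings require `?`, `#`, and `%` inside the key to be percent-encoded. The sketch below escapes only those three characters and leaves everything else, including `//`, `.`, and `..`, untouched. The helper names `encode_key` and `virtual_location` are hypothetical, and whether other characters also need escaping is not covered by the docstrings.

```python
def encode_key(object_key):
    """Percent-encode the three characters named in the docstrings."""
    # "%" is handled first so the escapes added afterwards are not re-encoded.
    for char, escape in (("%", "%25"), ("?", "%3F"), ("#", "%23")):
        object_key = object_key.replace(char, escape)
    return object_key


def virtual_location(scheme, bucket, object_key):
    """Build a location URL from a scheme, a bucket, and a raw object key."""
    return f"{scheme}://{bucket}/{encode_key(object_key)}"


location = virtual_location("s3", "bucket", "data/run#1/50%?.nc")
print(location)
```

A string built this way could then be passed as `location` to `set_virtual_ref`, or collected into the `locations` list for `set_virtual_refs_arr`.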

supports_listing `property` #

supports_listing

Does the store support listing?

supports_partial_writes `property` #

supports_partial_writes

Does the store support partial writes?

Partial writes are no longer used by Zarr, so this is always false.

supports_writes `property` #

supports_writes

Does the store support writes?

`__init__` #

__init__(store, read_only=None, *args, **kwargs)

Create a new IcechunkStore.

This should not be called directly; instead, use the create, open_existing or open_or_create class methods.

Source code in icechunk-python/python/icechunk/store.py

def __init__(
    self,
    store: PyStore,
    read_only: bool | None = None,
    *args: Any,
    **kwargs: Any,
):
    """Create a new IcechunkStore.

    This should not be called directly; instead, use the `create`, `open_existing` or `open_or_create` class methods.
    """
    read_only = read_only if read_only is not None else store.read_only
    super().__init__(read_only=read_only)
    if store is None:
        raise ValueError(
            "An IcechunkStore should not be created with the default constructor, instead use either the create or open_existing class methods."
        )
    self._store = store
    self._is_open = True

array_chunk_iterator #

array_chunk_iterator(array_path, batch_size=100000)

Async iterator yielding columnar batches of chunk references for one array.

Each batch is a 6-tuple; row i across the columns describes one chunk (columns are aligned in lock-step):

coords:   np.ndarray[uint32, (n, ndim)]  chunk grid coordinates
kinds:    np.ndarray[uint8]              values of icechunk.ChunkType
                                          (native=1, virtual=2, inline=3)
paths:    list[str]                      URL (virtual) | chunk_id (native) | "" (inline)
offsets:  np.ndarray[uint64]             byte offset within the source
                                          URL (virtual) or chunk file
                                          (native); 0 for inline
lengths:  np.ndarray[uint64]             byte length; equals
                                          ``len(bytes)`` for inline
inlined:  dict[int, bytes]               *Inline rows only*. Keyed by
                                          the row index ``i`` in this
                                          batch. Rows whose kind is
                                          virtual, native, or missing
                                          are NOT present in this dict.

Native and virtual rows share the same (path, offset, length) shape; the only structural difference is that paths[i] holds a bare chunk_id for native rows vs a fully-resolved URL for virtual rows. Consumers can therefore treat both uniformly as virtual references after prepending a URL prefix to the chunk_id.

Virtual locations are passed through the session's resolver before yielding — relative vcc://name/path forms expand to absolute URLs; absolute URLs pass through unchanged. Missing chunks are not yielded.
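
As a rough illustration of treating native and virtual rows uniformly, the sketch below turns one batch into a mapping from chunk coordinates to either a `(url, offset, length)` reference or the inline bytes. It assumes the kind codes given in the docstring (native=1, virtual=2, inline=3). The function `to_references` and the `native_prefix` value are hypothetical, and the sample batch uses plain lists where a real batch would hold NumPy arrays.

```python
NATIVE, VIRTUAL, INLINE = 1, 2, 3  # kind codes as listed in the docstring


def to_references(batch, native_prefix):
    """Map chunk coordinates to (url, offset, length) tuples or inline bytes."""
    coords, kinds, paths, offsets, lengths, inlined = batch
    refs = {}
    for i, kind in enumerate(kinds):
        coord = tuple(int(c) for c in coords[i])
        kind = int(kind)
        if kind == INLINE:
            # Inline payloads are keyed by the row index within this batch.
            refs[coord] = inlined[i]
        elif kind == NATIVE:
            # paths[i] is a bare chunk_id; prepend the (assumed) URL prefix.
            refs[coord] = (native_prefix + paths[i], int(offsets[i]), int(lengths[i]))
        elif kind == VIRTUAL:
            # paths[i] is already a fully resolved URL.
            refs[coord] = (paths[i], int(offsets[i]), int(lengths[i]))
        else:
            raise ValueError(f"unexpected chunk kind {kind} in row {i}")
    return refs


# Hand-written sample batch with one native, one virtual, and one inline row.
sample_batch = (
    [[0, 0], [0, 1], [1, 0]],
    [1, 2, 3],
    ["ABC123", "s3://bucket/file.nc", ""],
    [0, 4096, 0],
    [100, 200, 3],
    {2: b"abc"},
)
refs = to_references(sample_batch, native_prefix="s3://my-bucket/my-repo/chunks/")
```

The prefix used here is only a placeholder; the actual location of native chunk files depends on how the repository's storage is configured.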

Parameters:

Name	Type	Description	Default
`array_path`	`str`	Zarr path to the array (e.g. `"a"` or `"/group/var"`).	required
`batch_size`	`int`	Maximum number of rows per batch.	`100000`
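
Because the method returns an async iterator, batches are consumed with `async for`. The sketch below totals the byte length per chunk kind across batches. To keep it runnable without a repository, it drives the loop with a stand-in async generator, `fake_chunk_batches`, which is hypothetical; in real use the argument would be the result of `store.array_chunk_iterator(array_path, batch_size)`.

```python
import asyncio
from collections import Counter

# Kind codes as listed in the docstring.
KIND_NAMES = {1: "native", 2: "virtual", 3: "inline"}


async def fake_chunk_batches():
    """Stand-in for store.array_chunk_iterator(...), yielding two small batches."""
    yield (
        [[0, 0], [0, 1]],
        [1, 2],
        ["ABC123", "s3://bucket/file.nc"],
        [0, 4096],
        [100, 200],
        {},
    )
    yield ([[1, 0]], [3], [""], [0], [3], {0: b"abc"})


async def bytes_per_kind(batches):
    """Sum chunk byte lengths per kind over every batch of an async iterator."""
    totals = Counter()
    async for coords, kinds, paths, offsets, lengths, inlined in batches:
        for kind, length in zip(kinds, lengths):
            totals[KIND_NAMES[int(kind)]] += int(length)
    return dict(totals)


totals = asyncio.run(bytes_per_kind(fake_chunk_batches()))
print(totals)
```

A smaller `batch_size` bounds the number of rows held in memory per iteration, at the cost of more iterations.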

Source code in icechunk-python/python/icechunk/store.py

def array_chunk_iterator(
    self,
    array_path: str,
    batch_size: int = 100_000,
) -> AsyncCloseableIterator[
    tuple[
        "np.ndarray[tuple[int, int], np.dtype[np.uint32]]",  # coords (n, ndim)
        "np.ndarray[tuple[int], np.dtype[np.uint8]]",  # kinds (n,)
        list[str],  # paths
        "np.ndarray[tuple[int], np.dtype[np.uint64]]",  # offsets (n,)
        "np.ndarray[tuple[int], np.dtype[np.uint64]]",  # lengths (n,)
        dict[int, bytes],  # inlined
    ]
]:
    """Async iterator yielding columnar batches of chunk references for one array.

    Each batch is a 6-tuple; row ``i`` across the columns describes one
    chunk (columns are aligned in lock-step)::

        coords:   np.ndarray[uint32, (n, ndim)]  chunk grid coordinates
        kinds:    np.ndarray[uint8]              values of icechunk.ChunkType
                                                  (native=1, virtual=2, inline=3)
        paths:    list[str]                      URL (virtual) | chunk_id (native) | "" (inline)
        offsets:  np.ndarray[uint64]             byte offset within the source
                                                  URL (virtual) or chunk file
                                                  (native); 0 for inline
        lengths:  np.ndarray[uint64]             byte length; equals
                                                  ``len(bytes)`` for inline
        inlined:  dict[int, bytes]               *Inline rows only*. Keyed by
                                                  the row index ``i`` in this
                                                  batch. Rows whose kind is
                                                  virtual, native, or missing
                                                  are NOT present in this dict.

    Native and virtual rows share the same ``(path, offset, length)`` shape;
    the only structural difference is that ``paths[i]`` holds a bare
    ``chunk_id`` for native rows vs a fully-resolved URL for virtual rows.
    Consumers can therefore treat both uniformly as virtual references
    after prepending a URL prefix to the chunk_id.

    Virtual locations are passed through the session's resolver before
    yielding — relative ``vcc://name/path`` forms expand to absolute URLs;
    absolute URLs pass through unchanged. Missing chunks are not yielded.

    Parameters
    ----------
    array_path : str
        Zarr path to the array (e.g. ``"a"`` or ``"/group/var"``).
    batch_size : int
        Maximum number of rows per batch.
    """
    return self._store.array_chunk_iterator(array_path, batch_size)

clear `async` #

clear()

Clear the store.

This will remove all contents from the current session, including all groups and all arrays. But it will not modify the repository history.

Source code in icechunk-python/python/icechunk/store.py

async def clear(self) -> None:
    """Clear the store.

    This will remove all contents from the current session,
    including all groups and all arrays. But it will not modify the repository history.
    """
    return await self._store.clear()

delete `async` #

delete(key)

Remove a key from the store

Parameters:

Name	Type	Description	Default
`key`	`str`		required

Source code in icechunk-python/python/icechunk/store.py

async def delete(self, key: str) -> None:
    """Remove a key from the store

    Parameters
    ----------
    key : str
    """
    return await self._store.delete(key)

delete_dir `async` #

delete_dir(prefix)

Delete a prefix

Parameters:

Name	Type	Description	Default
`prefix`	`str`		required

Source code in icechunk-python/python/icechunk/store.py

async def delete_dir(self, prefix: str) -> None:
    """Delete all keys in the store that begin with the given prefix.

    Parameters
    ----------
    prefix : str
    """
    return await self._store.delete_dir(prefix)

exists `async` #

exists(key)

Check if a key exists in the store.

Parameters:

Name	Type	Description	Default
`key`	`str`		required

Returns:

Type	Description
`bool`

Source code in icechunk-python/python/icechunk/store.py

async def exists(self, key: str) -> bool:
    """Check if a key exists in the store.

    Parameters
    ----------
    key : str

    Returns
    -------
    bool
    """
    return await self._store.exists(key)

get `async` #

get(key, prototype, byte_range=None)

Retrieve the value associated with a given key.

Parameters:

Name	Type	Description	Default
`key`	`str`		required
`prototype`	`BufferPrototype`	Prototype whose buffer class is used to wrap the returned bytes.	required
`byte_range`	`ByteRequest`	ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved. RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned. OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header. SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.	`None`

Returns:

Type	Description
`Buffer \| None`	The requested bytes, or `None` if the key does not exist.

Source code in icechunk-python/python/icechunk/store.py

async def get(
    self,
    key: str,
    prototype: BufferPrototype,
    byte_range: ByteRequest | None = None,
) -> Buffer | None:
    """Retrieve the value associated with a given key.

    Parameters
    ----------
    key : str
    prototype : BufferPrototype
        Prototype whose buffer class is used to wrap the returned bytes.
    byte_range : ByteRequest, optional

        ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

        - RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
        - OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
        - SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.

    Returns
    -------
    Buffer | None
        The requested bytes, or None if the key does not exist.
    """

    try:
        result = await self._store.get(key, _byte_request_to_tuple(byte_range))
    except KeyError as _e:
        # Zarr python expects None to be returned if the key does not exist
        # but an IcechunkStore returns an error if the key does not exist
        return None

    return prototype.buffer.from_bytes(result)
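
The three byte-request shapes described for `byte_range` can be pinned down with a small pure-Python sketch that applies them to an in-memory `bytes` object. The classes below are local stand-ins written for this example (they are not the zarr or Icechunk types), and the error type raised for an invalid range is an assumption.

```python
from dataclasses import dataclass


# Local stand-ins mirroring the three request shapes described above.
@dataclass(frozen=True)
class RangeRequest:
    start: int
    end: int  # exclusive


@dataclass(frozen=True)
class OffsetRequest:
    offset: int


@dataclass(frozen=True)
class SuffixRequest:
    suffix: int  # number of trailing bytes, not an offset


def apply_byte_request(data: bytes, request=None) -> bytes:
    """Return the slice of ``data`` selected by ``request`` (all of it if None)."""
    if request is None:
        return data
    if isinstance(request, RangeRequest):
        # Zero-length ranges and ranges starting past the end are errors.
        if request.end <= request.start or request.start > len(data):
            raise ValueError(f"invalid range: {request}")
        # Slicing clamps an over-long end, returning the remainder.
        return data[request.start:request.end]
    if isinstance(request, OffsetRequest):
        return data[request.offset:]
    if isinstance(request, SuffixRequest):
        return data[max(0, len(data) - request.suffix):]
    raise TypeError(f"unsupported request: {request!r}")


payload = b"0123456789"
middle = apply_byte_request(payload, RangeRequest(2, 5))
clamped = apply_byte_request(payload, RangeRequest(8, 100))
tail_from_offset = apply_byte_request(payload, OffsetRequest(7))
last_three = apply_byte_request(payload, SuffixRequest(3))
```

Note that `OffsetRequest(7)` and `SuffixRequest(3)` select the same bytes here only because the payload is ten bytes long; the first counts from the start, the second from the end.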

get_partial_values `async` #

get_partial_values(prototype, key_ranges)

Retrieve possibly partial values from given key_ranges.

Parameters:

Name	Type	Description	Default
`prototype`	`BufferPrototype`	Prototype whose buffer class is used to wrap each returned value.	required
`key_ranges`	`Iterable[tuple[str, ByteRequest \| None]]`	Ordered set of (key, range) pairs. A key may occur multiple times with different ranges.	required

Returns:

Type	Description
`list[Buffer \| None]`	Values in the order of `key_ranges`; entries may be `None` for missing keys.

Source code in icechunk-python/python/icechunk/store.py

async def get_partial_values(
    self,
    prototype: BufferPrototype,
    key_ranges: Iterable[tuple[str, ByteRequest | None]],
) -> list[Buffer | None]:
    """Retrieve possibly partial values from given key_ranges.

    Parameters
    ----------
    prototype : BufferPrototype
        Prototype whose buffer class is used to wrap each returned value.
    key_ranges : Iterable[tuple[str, ByteRequest | None]]
        Ordered set of (key, range) pairs. A key may occur multiple times with different ranges.

    Returns
    -------
    list[Buffer | None]
        Values in the order of ``key_ranges``; entries may be None for missing keys.
    """
    # NOTE: pyo3 has no implicit conversion from an Iterable to a Rust iterable, so we
    # convert it to a list here first. Possible opportunity for optimization.
    ranges = [(k[0], _byte_request_to_tuple(k[1])) for k in key_ranges]
    result = await self._store.get_partial_values(list(ranges))
    return [prototype.buffer.from_bytes(r) for r in result]

is_empty `async` #

is_empty(prefix)

Check if the directory is empty.

Parameters:

Name	Type	Description	Default
`prefix`	`str`	Prefix of keys to check.	required

Returns:

Type	Description
`bool`	True if no keys exist under the given prefix, False otherwise.

Source code in icechunk-python/python/icechunk/store.py

async def is_empty(self, prefix: str) -> bool:
    """
    Check if the directory is empty.

    Parameters
    ----------
    prefix : str
        Prefix of keys to check.

    Returns
    -------
    bool
        True if no keys exist under the given prefix, False otherwise.
    """
    return await self._store.is_empty(prefix)

list #

list()

Retrieve all keys in the store.

Returns:

Type	Description
`AsyncCloseableIterator[str]`

Source code in icechunk-python/python/icechunk/store.py

def list(self) -> AsyncCloseableIterator[str]:
    """Retrieve all keys in the store.

    Returns
    -------
    AsyncCloseableIterator[str]
    """
    # This method should be async, like overridden methods in child classes.
    # However, that's not straightforward:
    # https://stackoverflow.com/questions/68905848

    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list()
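
Because `list` is a plain (non-`async`) method that returns an async iterator, the call itself is not awaited; the result is consumed with `async for`. The sketch below shows that consumption pattern using a stand-in async generator, `fake_list`, which is hypothetical and only plays the role of `store.list()` so the example runs without a repository.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_list() -> AsyncIterator[str]:
    """Stand-in for ``store.list()``: yields keys one at a time."""
    for key in ("zarr.json", "a/zarr.json", "a/c/0/0"):
        yield key


async def collect_keys(keys: AsyncIterator[str]) -> list[str]:
    # An async comprehension drains the iterator without awaiting the call
    # that produced it.
    return [key async for key in keys]


all_keys = asyncio.run(collect_keys(fake_list()))
```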

list_dir #

list_dir(prefix)

Retrieve all keys and prefixes with a given prefix and which do not contain the character “/” after the given prefix.

Parameters:

Name	Type	Description	Default
`prefix`	`str`		required

Returns:

Type	Description
`AsyncCloseableIterator[str]`

Source code in icechunk-python/python/icechunk/store.py

def list_dir(self, prefix: str) -> AsyncCloseableIterator[str]:
    """
    Retrieve all keys and prefixes with a given prefix and which do not contain the character
    “/” after the given prefix.

    Parameters
    ----------
    prefix : str

    Returns
    -------
    AsyncCloseableIterator[str]
    """
    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list_dir(prefix)

list_prefix #

list_prefix(prefix)

Retrieve all keys in the store that begin with a given prefix. Keys are returned relative to the root of the store.

Parameters:

Name	Type	Description	Default
`prefix`	`str`		required

Returns:

Type	Description
`AsyncCloseableIterator[str]`

Source code in icechunk-python/python/icechunk/store.py

def list_prefix(self, prefix: str) -> AsyncCloseableIterator[str]:
    """Retrieve all keys in the store that begin with a given prefix. Keys are returned relative
    to the root of the store.

    Parameters
    ----------
    prefix : str

    Returns
    -------
    AsyncCloseableIterator[str]
    """
    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list_prefix(prefix)
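
The difference between `list_prefix` (every key under a prefix) and `list_dir` (only the immediate children of a prefix) is easy to see on a small in-memory key set. The two helper functions below are simplified models written for this example, not the store's implementation; in particular, the sorted output and the exact form of the returned names are assumptions.

```python
KEYS = ["zarr.json", "a/zarr.json", "a/c/0/0", "a/c/0/1", "b/zarr.json"]


def model_list_prefix(keys, prefix):
    """All keys beginning with ``prefix``, relative to the store root."""
    return [key for key in keys if key.startswith(prefix)]


def model_list_dir(keys, prefix):
    """Immediate children of ``prefix``: no '/' after the prefix."""
    base = prefix.rstrip("/")
    base = base + "/" if base else ""
    children = set()
    for key in keys:
        if key.startswith(base) and len(key) > len(base):
            # Keep only the first path segment after the prefix.
            children.add(key[len(base):].split("/", 1)[0])
    return sorted(children)


under_a = model_list_prefix(KEYS, "a/")
children_of_a = model_list_dir(KEYS, "a")
children_of_root = model_list_dir(KEYS, "")
```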

set `async` #

set(key, value)

Store a (key, value) pair.

Parameters:

Name	Type	Description	Default
`key`	`str`		required
`value`	`Buffer`		required

Source code in icechunk-python/python/icechunk/store.py

async def set(self, key: str, value: Buffer) -> None:
    """Store a (key, value) pair.

    Parameters
    ----------
    key : str
    value : Buffer
    """
    if not isinstance(value, Buffer):
        raise TypeError(
            f"IcechunkStore.set(): `value` must be a Buffer instance. Got an instance of {type(value)} instead."
        )
    return await self._store.set(key, value.to_bytes())

set_if_not_exists `async` #

set_if_not_exists(key, value)

Store a key to value if the key is not already present.

Parameters:

Name	Type	Description	Default
`key`	`str`		required
`value`	`Buffer`		required

Source code in icechunk-python/python/icechunk/store.py

async def set_if_not_exists(self, key: str, value: Buffer) -> None:
    """
    Store a key to ``value`` if the key is not already present.

    Parameters
    ----------
    key : str
    value : Buffer
    """
    return await self._store.set_if_not_exists(key, value.to_bytes())

set_partial_values `async` #

set_partial_values(key_start_values)

Store values at a given key, starting at byte range_start.

Parameters:

Name	Type	Description	Default
`key_start_values`	`Iterable[tuple[str, int, BytesLike]]`	Set of (key, range_start, value) triples. A key may occur multiple times with different range_starts, but the resulting ranges (taking the length of each value into account) must not overlap for the same key.	required

Source code in icechunk-python/python/icechunk/store.py

async def set_partial_values(
    self, key_start_values: Iterable[tuple[str, int, BytesLike]]
) -> None:
    """Store values at a given key, starting at byte range_start.

    Parameters
    ----------
    key_start_values : Iterable[tuple[str, int, BytesLike]]
        Set of (key, range_start, value) triples. A key may occur multiple times with
        different range_starts, but the resulting ranges (taking the length of each value
        into account) must not overlap for the same key.
    """
    # NOTE: pyo3 has no implicit conversion from an Iterable to a Rust iterable, so we
    # convert it to a list here first. Possible opportunity for optimization.
    # NOTE: currently we only implement the case where the values are bytes
    return await self._store.set_partial_values(list(key_start_values))  # type: ignore[arg-type]
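
The non-overlap requirement on `key_start_values` can be checked before calling the method. The helper below, `find_overlapping_keys`, is a hypothetical pre-flight check written for this example; it is not provided by Icechunk, and it assumes each value supports `len()`.

```python
from collections import defaultdict


def find_overlapping_keys(key_start_values):
    """Return the keys whose (start, start + len(value)) ranges overlap."""
    spans_by_key = defaultdict(list)
    for key, start, value in key_start_values:
        spans_by_key[key].append((start, start + len(value)))

    overlapping = []
    for key, spans in spans_by_key.items():
        spans.sort()
        # After sorting by start, any overlap shows up between neighbours.
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start < prev_end:
                overlapping.append(key)
                break
    return overlapping


disjoint = [("a/c/0", 0, b"abcd"), ("a/c/0", 4, b"ef"), ("b/c/0", 0, b"xy")]
clashing = [("a/c/0", 0, b"abcd"), ("a/c/0", 2, b"zz")]
```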

set_virtual_ref #

set_virtual_ref(
    key,
    location,
    *,
    offset,
    length,
    checksum=None,
    validate_container=True,
)

Store a virtual reference to a chunk.

Parameters:

Name	Type	Description	Default
`key`	`str`	The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'.	required
`location`	`str`	The location of the chunk in storage, as an absolute URL, e.g. 's3://bucket/path/to/file.nc'. The object key is the URL path, used verbatim (`//` and `.`/`..` are preserved). Characters reserved in a URL that are part of the key must be percent-encoded, in particular `?` -> `%3F`, `#` -> `%23` and `%` -> `%25`.	required
`offset`	`int`	Byte offset, from the start of the object at `location`, at which the chunk starts.	required
`length`	`int`	The length of the chunk in bytes, measured from the given offset.	required
`checksum`	`str \| datetime \| None`	The etag or last_modified_at field of the object.	`None`
`validate_container`	`bool`	If set to true, fail for locations that don't match any existing virtual chunk container.	`True`

Source code in icechunk-python/python/icechunk/store.py

def set_virtual_ref(
    self,
    key: str,
    location: str,
    *,
    offset: int,
    length: int,
    checksum: str | datetime | None = None,
    validate_container: bool = True,
) -> None:
    """Store a virtual reference to a chunk.

    Parameters
    ----------
    key : str
        The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'.
    location : str
        The location of the chunk in storage, as an absolute URL, e.g. 's3://bucket/path/to/file.nc'.
        The object key is the URL path, used verbatim (``//`` and ``.``/``..`` are preserved). Characters reserved in a URL that are part of the key must be percent-encoded, in particular ``?`` -> ``%3F``, ``#`` -> ``%23`` and ``%`` -> ``%25``.
    offset : int
        Byte offset, from the start of the object at ``location``, at which the chunk starts.
    length : int
        The length of the chunk in bytes, measured from the given offset.
    checksum : str | datetime | None
        The etag or last_modified_at field of the object.
    validate_container : bool
        If set to true, fail for locations that don't match any existing virtual chunk container.
    """
    return self._store.set_virtual_ref(
        key, location, offset, length, checksum, validate_container
    )
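
The percent-encoding rule for `location` is easy to get wrong, so here is a minimal sketch that applies exactly the three substitutions named above. The helper names `encode_object_key` and `virtual_location` are made up for this example, and the sketch deliberately leaves every other character untouched, since the documentation says the path is otherwise used verbatim.

```python
def encode_object_key(key: str) -> str:
    """Percent-encode the URL-reserved characters called out in the docs."""
    # '%' must be handled first, otherwise the escapes added for '?' and '#'
    # would themselves be encoded a second time.
    return key.replace("%", "%25").replace("?", "%3F").replace("#", "%23")


def virtual_location(base_url: str, key: str) -> str:
    """Join a base such as 's3://bucket' with an encoded object key."""
    return f"{base_url}/{encode_object_key(key)}"


location = virtual_location("s3://bucket", "path/to/run#1/50%?.nc")
# The result can then be passed as ``location`` to ``set_virtual_ref``.
```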

set_virtual_ref_async `async` #

set_virtual_ref_async(
    key,
    location,
    *,
    offset,
    length,
    checksum=None,
    validate_container=True,
)

Store a virtual reference to a chunk asynchronously.

Parameters:

Name	Type	Description	Default
`key`	`str`	The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'.	required
`location`	`str`	The location of the chunk in storage, as an absolute URL, e.g. 's3://bucket/path/to/file.nc'. The object key is the URL path, used verbatim (`//` and `.`/`..` are preserved). Characters reserved in a URL that are part of the key must be percent-encoded, in particular `?` -> `%3F`, `#` -> `%23` and `%` -> `%25`.	required
`offset`	`int`	Byte offset, from the start of the object at `location`, at which the chunk starts.	required
`length`	`int`	The length of the chunk in bytes, measured from the given offset.	required
`checksum`	`str \| datetime \| None`	The etag or last_modified_at field of the object.	`None`
`validate_container`	`bool`	If set to true, fail for locations that don't match any existing virtual chunk container.	`True`

Source code in icechunk-python/python/icechunk/store.py

async def set_virtual_ref_async(
    self,
    key: str,
    location: str,
    *,
    offset: int,
    length: int,
    checksum: str | datetime | None = None,
    validate_container: bool = True,
) -> None:
    """Store a virtual reference to a chunk asynchronously.

    Parameters
    ----------
    key : str
        The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'.
    location : str
        The location of the chunk in storage, as an absolute URL, e.g. 's3://bucket/path/to/file.nc'.
        The object key is the URL path, used verbatim (``//`` and ``.``/``..`` are preserved). Characters reserved in a URL that are part of the key must be percent-encoded, in particular ``?`` -> ``%3F``, ``#`` -> ``%23`` and ``%`` -> ``%25``.
    offset : int
        Byte offset, from the start of the object at ``location``, at which the chunk starts.
    length : int
        The length of the chunk in bytes, measured from the given offset.
    checksum : str | datetime | None
        The etag or last_modified_at field of the object.
    validate_container : bool
        If set to true, fail for locations that don't match any existing virtual chunk container.
    """
    return await self._store.set_virtual_ref_async(
        key, location, offset, length, checksum, validate_container
    )

set_virtual_refs #

set_virtual_refs(
    array_path, chunks, *, validate_containers=True
)

Store multiple virtual references for the same array.

Parameters:

Name	Type	Description	Default
`array_path`	`str`	The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"	required
`chunks`	`list[VirtualChunkSpec]`	The list of virtual chunks to add.	required
`validate_containers`	`bool`	If set to true, ignore virtual references for locations that don't match any existing virtual chunk container	`True`

Returns:

Type	Description
`list[tuple[int, ...]] \| None`	If all virtual references were successfully updated, returns None. If there were validation errors, returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py

def set_virtual_refs(
    self,
    array_path: str,
    chunks: list[VirtualChunkSpec],
    *,
    validate_containers: bool = True,
) -> list[tuple[int, ...]] | None:
    """Store multiple virtual references for the same array.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
    chunks : list[VirtualChunkSpec]
        The list of virtual chunks to add.
    validate_containers : bool
        If set to true, ignore virtual references for locations that don't match any existing virtual chunk container.

    Returns
    -------
    list[tuple[int, ...]] | None
        If all virtual references were successfully updated, returns None.
        If there were validation errors, returns the chunk indices of all failed references.
    """
    return self._store.set_virtual_refs(array_path, chunks, validate_containers)

set_virtual_refs_arr #

set_virtual_refs_arr(
    array_path,
    chunk_grid_shape,
    locations,
    offsets,
    lengths,
    *,
    validate_containers=True,
    arr_offset=None,
    checksum=None,
)

Store virtual references for an array from flat arrays of locations, offsets, and lengths.

More efficient than set_virtual_refs as it avoids creating per-chunk VirtualChunkSpec Python objects. The locations list is iterated in Rust (borrowing strings directly from CPython), and the offset/length numpy arrays are accessed via zero-copy.

Parameters:

Name	Type	Description	Default
`array_path`	`str`	The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"	required
`chunk_grid_shape`	`tuple[int, ...]`	Shape of the chunk grid (number of chunks per dimension). The product must equal the length of the arrays. Arrays are assumed to be flattened in C (row-major) order.	required
`locations`	`list[str]`	URLs to external files containing chunk data. Empty strings represent missing chunks and are silently skipped. Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"] Each is a URL; the object key is its path, used verbatim (`//` and `.`/`..` are preserved). Characters reserved in a URL that are part of the key must be percent-encoded (`?` -> `%3F`, `#` -> `%23`, `%` -> `%25`).	required
`offsets`	`ndarray`	1-D uint64 array of byte offsets within each file.	required
`lengths`	`ndarray`	1-D uint64 array of byte lengths of each chunk.	required
`validate_containers`	`bool`	If True, validate that locations match registered virtual chunk containers. Default is True.	`True`
`arr_offset`	`tuple[int, ...] \| None`	Optional offset to add to computed chunk indices. Useful for append operations where new chunks should be written at an offset from (0, 0, ...). Must have the same length as chunk_grid_shape. Default is None.	`None`
`checksum`	`datetime \| str \| None`	Optional checksum for all chunks. Can be a datetime (last modified time) or a string (ETag). Default is None.	`None`

Returns:

Type	Description
`list[tuple[int, ...]] \| None`	If all virtual references were successfully updated, returns None. If there were validation errors, returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py

def set_virtual_refs_arr(
    self,
    array_path: str,
    chunk_grid_shape: tuple[int, ...],
    locations: list[str],
    offsets: "np.ndarray[Any, np.dtype[np.uint64]]",
    lengths: "np.ndarray[Any, np.dtype[np.uint64]]",
    *,
    validate_containers: bool = True,
    arr_offset: tuple[int, ...] | None = None,
    checksum: datetime | str | None = None,
) -> list[tuple[int, ...]] | None:
    """Store virtual references for an array from flat arrays of locations, offsets, and lengths.

    More efficient than ``set_virtual_refs`` as it avoids creating
    per-chunk ``VirtualChunkSpec`` Python objects. The locations list
    is iterated in Rust (borrowing strings directly from CPython),
    and the offset/length numpy arrays are accessed via zero-copy.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store.
        Example: "/groupA/groupB/outputs/my-array"
    chunk_grid_shape : tuple[int, ...]
        Shape of the chunk grid (number of chunks per dimension).
        The product must equal the length of the arrays.
        Arrays are assumed to be flattened in C (row-major) order.
    locations : list[str]
        URLs to external files containing chunk data. Empty strings
        represent missing chunks and are silently skipped.
        Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]
        Each is a URL; the object key is its path, used verbatim (``//`` and
        ``.``/``..`` are preserved). Characters reserved in a URL that are part
        of the key must be percent-encoded (``?`` -> ``%3F``, ``#`` -> ``%23``,
        ``%`` -> ``%25``).
    offsets : np.ndarray
        1-D uint64 array of byte offsets within each file.
    lengths : np.ndarray
        1-D uint64 array of byte lengths of each chunk.
    validate_containers : bool
        If True, validate that locations match registered virtual
        chunk containers. Default is True.
    arr_offset : tuple[int, ...] | None
        Optional offset to add to computed chunk indices. Useful for
        append operations where new chunks should be written at an
        offset from (0, 0, ...). Must have the same length as
        chunk_grid_shape. Default is None.
    checksum : datetime | str | None
        Optional checksum for all chunks. Can be a datetime
        (last modified time) or a string (ETag). Default is None.

    Returns
    -------
    list[tuple[int, ...]] | None
        If all virtual references were successfully updated, returns None.
        If there were validation errors, returns the chunk indices of
        all failed references.
    """
    return self._store.set_virtual_refs_arr(
        array_path,
        list(chunk_grid_shape),
        locations,
        offsets,
        lengths,
        validate_containers=validate_containers,
        arr_offset=list(arr_offset) if arr_offset is not None else None,
        checksum=checksum,
    )
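
The relationship between the flat input arrays, `chunk_grid_shape`, and `arr_offset` can be illustrated with a small pure-Python model. This is a sketch of the documented convention (C order, offset added per dimension) as read from the parameter descriptions; the function `chunk_indices` is hypothetical and is not how the Rust side is implemented.

```python
import math


def chunk_indices(chunk_grid_shape, n_refs, arr_offset=None):
    """Chunk index for each flat position, in C (row-major) order."""
    if math.prod(chunk_grid_shape) != n_refs:
        raise ValueError("product of chunk_grid_shape must equal the array length")
    if arr_offset is None:
        arr_offset = (0,) * len(chunk_grid_shape)
    if len(arr_offset) != len(chunk_grid_shape):
        raise ValueError("arr_offset must have the same length as chunk_grid_shape")

    indices = []
    for flat in range(n_refs):
        remainder, index = flat, []
        # Peel off the fastest-varying (last) dimension first.
        for dim in reversed(chunk_grid_shape):
            remainder, position = divmod(remainder, dim)
            index.append(position)
        index.reverse()
        indices.append(tuple(i + o for i, o in zip(index, arr_offset)))
    return indices


fresh = chunk_indices((2, 3), 6)
appended = chunk_indices((2, 3), 6, arr_offset=(4, 0))
```

In the append case, the same six references land in rows 4 and 5 of the chunk grid instead of rows 0 and 1.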

set_virtual_refs_arr_async `async` #

set_virtual_refs_arr_async(
    array_path,
    chunk_grid_shape,
    locations,
    offsets,
    lengths,
    *,
    validate_containers=True,
    arr_offset=None,
    checksum=None,
)

Store virtual references for an array from flat arrays (async).

Async variant of set_virtual_refs_arr. The vref construction still requires the GIL (to borrow strings from the Python list), but the store insertion releases it. Use asyncio.gather() to overlap vref building for one array with store insertion for another.

Parameters:

Name	Type	Description	Default
`array_path`	`str`	The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"	required
`chunk_grid_shape`	`tuple[int, ...]`	Shape of the chunk grid (number of chunks per dimension). The product must equal the length of the arrays. Arrays are assumed to be flattened in C (row-major) order.	required
`locations`	`list[str]`	URLs to external files containing chunk data. Empty strings represent missing chunks and are silently skipped. Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"] Each is a URL; the object key is its path, used verbatim (`//` and `.`/`..` are preserved). Characters reserved in a URL that are part of the key must be percent-encoded (`?` -> `%3F`, `#` -> `%23`, `%` -> `%25`).	required
`offsets`	`ndarray`	1-D uint64 array of byte offsets within each file.	required
`lengths`	`ndarray`	1-D uint64 array of byte lengths of each chunk.	required
`validate_containers`	`bool`	If True, validate that locations match registered virtual chunk containers. Default is True.	`True`
`arr_offset`	`tuple[int, ...] \| None`	Optional offset to add to computed chunk indices. Useful for append operations where new chunks should be written at an offset from (0, 0, ...). Must have the same length as chunk_grid_shape. Default is None.	`None`
`checksum`	`datetime \| str \| None`	Optional checksum for all chunks. Can be a datetime (last modified time) or a string (ETag). Default is None.	`None`

Returns:

Type	Description
`list[tuple[int, ...]] \| None`	If all virtual references were successfully updated, returns None. If there were validation errors, returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py

async def set_virtual_refs_arr_async(
    self,
    array_path: str,
    chunk_grid_shape: tuple[int, ...],
    locations: list[str],
    offsets: "np.ndarray[Any, np.dtype[np.uint64]]",
    lengths: "np.ndarray[Any, np.dtype[np.uint64]]",
    *,
    validate_containers: bool = True,
    arr_offset: tuple[int, ...] | None = None,
    checksum: datetime | str | None = None,
) -> list[tuple[int, ...]] | None:
    """Store virtual references for an array from flat arrays (async).

    Async variant of ``set_virtual_refs_arr``. The vref construction
    still requires the GIL (to borrow strings from the Python list),
    but the store insertion releases it. Use ``asyncio.gather()`` to
    overlap vref building for one array with store insertion for another.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store.
        Example: "/groupA/groupB/outputs/my-array"
    chunk_grid_shape : tuple[int, ...]
        Shape of the chunk grid (number of chunks per dimension).
        The product must equal the length of the arrays.
        Arrays are assumed to be flattened in C (row-major) order.
    locations : list[str]
        URLs to external files containing chunk data. Empty strings
        represent missing chunks and are silently skipped.
        Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]
        Each is a URL; the object key is its path, used verbatim (``//`` and
        ``.``/``..`` are preserved). Characters reserved in a URL that are part
        of the key must be percent-encoded (``?`` -> ``%3F``, ``#`` -> ``%23``,
        ``%`` -> ``%25``).
    offsets : np.ndarray
        1-D uint64 array of byte offsets within each file.
    lengths : np.ndarray
        1-D uint64 array of byte lengths of each chunk.
    validate_containers : bool
        If True, validate that locations match registered virtual
        chunk containers. Default is True.
    arr_offset : tuple[int, ...] | None
        Optional offset to add to computed chunk indices. Useful for
        append operations where new chunks should be written at an
        offset from (0, 0, ...). Must have the same length as
        chunk_grid_shape. Default is None.
    checksum : datetime | str | None
        Optional checksum for all chunks. Can be a datetime
        (last modified time) or a string (ETag). Default is None.

    Returns
    -------
    list[tuple[int, ...]] | None
        If all virtual references were successfully updated, returns None.
        If there were validation errors, returns the chunk indices of
        all failed references.
    """
    return await self._store.set_virtual_refs_arr_async(
        array_path,
        list(chunk_grid_shape),
        locations,
        offsets,
        lengths,
        validate_containers=validate_containers,
        arr_offset=list(arr_offset) if arr_offset is not None else None,
        checksum=checksum,
    )
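
The `asyncio.gather()` pattern mentioned in the description can be sketched as follows. To keep the example runnable without a repository, `fake_set_virtual_refs_arr` is a hypothetical stand-in that sleeps instead of writing; in real code each task would be a call to `store.set_virtual_refs_arr_async(...)` with that array's locations, offsets, and lengths.

```python
import asyncio


async def fake_set_virtual_refs_arr(array_path: str, n_refs: int) -> None:
    """Stand-in for the async store call; returns None as on full success."""
    await asyncio.sleep(0.01)
    return None


async def write_all(arrays: dict[str, int]) -> dict[str, object]:
    # One task per array; gather returns results in the order tasks were given.
    results = await asyncio.gather(
        *(fake_set_virtual_refs_arr(path, n) for path, n in arrays.items())
    )
    return dict(zip(arrays, results))


outcome = asyncio.run(write_all({"/group/a": 4, "/group/b": 6}))
```

A non-`None` entry in the result would hold the chunk indices of failed references for that array, so checking each value after the gather is a reasonable habit.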

set_virtual_refs_async `async` #

set_virtual_refs_async(
    array_path, chunks, *, validate_containers=True
)

Store multiple virtual references for the same array asynchronously.

Parameters:

Name	Type	Description	Default
`array_path`	`str`	The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"	required
`chunks`	`list[VirtualChunkSpec]`	The list of virtual chunks to add.	required
`validate_containers`	`bool`	If set to true, ignore virtual references for locations that don't match any existing virtual chunk container	`True`

Returns:

Type	Description
`list[tuple[int, ...]] \| None`	If all virtual references were successfully updated, returns None. If there were validation errors, returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py

async def set_virtual_refs_async(
    self,
    array_path: str,
    chunks: list[VirtualChunkSpec],
    *,
    validate_containers: bool = True,
) -> list[tuple[int, ...]] | None:
    """Store multiple virtual references for the same array asynchronously.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
    chunks : list[VirtualChunkSpec]
        The list of virtual chunks to add.
    validate_containers : bool
        If set to true, ignore virtual references for locations that don't match any existing virtual chunk container.

    Returns
    -------
    list[tuple[int, ...]] | None
        If all virtual references were successfully updated, returns None.
        If there were validation errors, returns the chunk indices of all failed references.
    """
    return await self._store.set_virtual_refs_async(
        array_path, chunks, validate_containers
    )

sync_clear #

sync_clear()

Clear the store.

This removes all contents from the current session, including all groups and arrays, but does not modify the repository history.

Source code in icechunk-python/python/icechunk/store.py

def sync_clear(self) -> None:
    """Clear the store.

    This removes all contents from the current session, including all
    groups and arrays, but does not modify the repository history.
    """
    return self._store.sync_clear()

Exceptions#

icechunk.IcechunkError #

Bases: Exception

Base class for all Icechunk errors

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi

class IcechunkError(Exception):
    """Base class for all Icechunk errors"""

    @property
    def message(self) -> str: ...

icechunk.ConflictError #

Bases: Exception

An error that occurs when a conflict is detected

Methods:

Name	Description
`__new__`	Create a new ConflictError.

Attributes:

Name	Type	Description
`actual_parent`	`str`	The actual parent snapshot ID of the branch that the session attempted to commit to.
`expected_parent`	`str`	The expected parent snapshot ID.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi

class ConflictError(Exception):
    """An error that occurs when a conflict is detected"""

    def __new__(
        cls,
        expected_parent: str | None = None,
        actual_parent: str | None = None,
    ) -> ConflictError:
        """
        Create a new ConflictError.

        Parameters
        ----------
        expected_parent: str | None
            The expected parent snapshot ID.
        actual_parent: str | None
            The actual parent snapshot ID of the branch.
        """
        ...

    @property
    def expected_parent(self) -> str:
        """The expected parent snapshot ID.

        This is the snapshot ID that the session was based on when the
        commit operation was called.
        """
        ...

    @property
    def actual_parent(self) -> str:
        """
        The actual parent snapshot ID of the branch that the session attempted to commit to.

        When the session is based on a branch, this is the snapshot ID of the branch tip. If this
        error is raised, it means the branch was modified and committed by another session after
        the session was created.
        """
        ...
    ...

actual_parent `property` #

actual_parent

The actual parent snapshot ID of the branch that the session attempted to commit to.

When the session is based on a branch, this is the snapshot ID of the branch tip. If this error is raised, it means the branch was modified and committed by another session after the session was created.

expected_parent `property` #

expected_parent

The expected parent snapshot ID.

This is the snapshot ID that the session was based on when the commit operation was called.

__new__ #

__new__(expected_parent=None, actual_parent=None)

Create a new ConflictError.

Parameters:

Name	Type	Description	Default
`expected_parent`	`str \| None`	The expected parent snapshot ID.	`None`
`actual_parent`	`str \| None`	The actual parent snapshot ID of the branch.	`None`

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi

def __new__(
    cls,
    expected_parent: str | None = None,
    actual_parent: str | None = None,
) -> ConflictError:
    """
    Create a new ConflictError.

    Parameters
    ----------
    expected_parent: str | None
        The expected parent snapshot ID.
    actual_parent: str | None
        The actual parent snapshot ID of the branch.
    """
    ...
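The two attributes above are enough to report how far a branch moved and to drive a retry loop. The following is a minimal sketch, not the library's own recovery mechanism: `StandInConflictError`, `describe_conflict`, and `commit_with_retry` are hypothetical names introduced here, and the stand-in exception only mirrors the documented `expected_parent` and `actual_parent` attributes so that the example runs without a repository.

```python
class StandInConflictError(Exception):
    """Hypothetical stand-in exposing the two documented attributes."""

    def __init__(self, expected_parent=None, actual_parent=None):
        super().__init__("branch moved since the session was created")
        self.expected_parent = expected_parent
        self.actual_parent = actual_parent


def describe_conflict(err):
    """Build a one-line summary from the documented attributes only."""
    return (
        f"session was based on {err.expected_parent}, "
        f"but the branch tip is now {err.actual_parent}"
    )


def commit_with_retry(commit, refresh, conflict_type, attempts=3):
    """Call commit(); on a conflict call refresh(err) and try again.

    commit and refresh are caller-supplied callables (assumptions of this
    sketch). The last failed attempt re-raises the conflict unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return commit()
        except conflict_type as err:
            if attempt == attempts:
                raise
            refresh(err)


# Simulated scenario: the first commit conflicts, the second succeeds.
state = {"calls": 0}
notes = []


def fake_commit():
    state["calls"] += 1
    if state["calls"] == 1:
        raise StandInConflictError("SNAP_A", "SNAP_B")
    return "SNAP_C"


result = commit_with_retry(
    fake_commit,
    lambda err: notes.append(describe_conflict(err)),
    StandInConflictError,
)
print(result)
print(notes[0])
```

In real code, `conflict_type` would be `icechunk.ConflictError`, `commit` would wrap the session's commit call, and `refresh` would bring the session up to date with the branch tip before the next attempt.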

icechunk.RebaseFailedError #

Bases: IcechunkError

An error that occurs when a rebase operation fails

Methods:

Name	Description
`__new__`	Create a new RebaseFailedError.

Attributes:

Name	Type	Description
`conflicts`	`list[Conflict]`	The conflicts that occurred during the rebase operation
`snapshot`	`str`	The snapshot ID that the session was rebased to

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi

class RebaseFailedError(IcechunkError):
    """An error that occurs when a rebase operation fails"""

    def __new__(cls, snapshot: str, conflicts: list[Conflict]) -> RebaseFailedError:
        """
        Create a new RebaseFailedError.

        Parameters
        ----------
        snapshot: str
            The snapshot ID that the session was rebased to.
        conflicts: list[Conflict]
            The conflicts that occurred during the rebase operation.
        """
        ...

    @property
    def snapshot(self) -> str:
        """The snapshot ID that the session was rebased to"""
        ...

    @property
    def conflicts(self) -> list[Conflict]:
        """The conflicts that occurred during the rebase operation

        Returns:
            list[Conflict]: The conflicts that occurred during the rebase operation
        """
    ...

conflicts `property` #

conflicts

The conflicts that occurred during the rebase operation

Returns:

Type	Description
`list[Conflict]`	The conflicts that occurred during the rebase operation

snapshot `property` #

snapshot

The snapshot ID that the session was rebased to

__new__ #

__new__(snapshot, conflicts)

Create a new RebaseFailedError.

Parameters:

Name	Type	Description	Default
`snapshot`	`str`	The snapshot ID that the session was rebased to.	required
`conflicts`	`list[Conflict]`	The conflicts that occurred during the rebase operation.	required

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi

def __new__(cls, snapshot: str, conflicts: list[Conflict]) -> RebaseFailedError:
    """
    Create a new RebaseFailedError.

    Parameters
    ----------
    snapshot: str
        The snapshot ID that the session was rebased to.
    conflicts: list[Conflict]
        The conflicts that occurred during the rebase operation.
    """
    ...
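When a rebase fails, `snapshot` and `conflicts` can be turned into a short report for logs. The sketch below relies only on those two documented attributes; `StandInRebaseFailedError` and `rebase_report` are hypothetical names, and plain strings stand in for `Conflict` objects so that the example runs without a repository.

```python
class StandInRebaseFailedError(Exception):
    """Hypothetical stand-in exposing the two documented attributes."""

    def __init__(self, snapshot, conflicts):
        super().__init__("rebase failed")
        self.snapshot = snapshot
        self.conflicts = conflicts


def rebase_report(err):
    """Return report lines: a header, then one numbered line per conflict."""
    lines = [
        f"rebase to snapshot {err.snapshot} failed "
        f"with {len(err.conflicts)} conflict(s)"
    ]
    # repr() is used because this sketch assumes nothing about Conflict's fields.
    for number, conflict in enumerate(err.conflicts, start=1):
        lines.append(f"  {number}. {conflict!r}")
    return lines


try:
    raise StandInRebaseFailedError("SNAP_B", ["conflict A", "conflict B"])
except StandInRebaseFailedError as err:
    report = rebase_report(err)

print("\n".join(report))
```

With the real exception, the `except` clause would name `icechunk.RebaseFailedError` and each entry in `conflicts` would be a `Conflict` object rather than a string.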

Top-level utilities#

icechunk.print_debug_info #

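print_debug_info()

Print the platform, the Python and icechunk versions, and the versions of zarr, numcodecs, xarray, and virtualizarr when they are installed. Packages that are not installed are skipped.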

Source code in icechunk-python/python/icechunk/__init__.py

def print_debug_info() -> None:
    import platform
    from importlib import import_module

    print(f"platform:  {platform.platform()}")
    print(f"python:  {platform.python_version()}")
    print(f"icechunk:  {__version__}")
    for package in ["zarr", "numcodecs", "xarray", "virtualizarr"]:
        try:
            print(f"{package}:  {import_module(package).__version__}")
        except ModuleNotFoundError:
            continue
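The same lookup can return a dictionary instead of printing, which is convenient when attaching version details to a log record or bug report. This is a sketch of the technique used above, not part of icechunk: `collect_versions` is a hypothetical helper, and unlike the function above it falls back to `"unknown"` for modules that import but expose no `__version__`.

```python
import platform
from importlib import import_module


def collect_versions(packages):
    """Map each importable package name to its version string."""
    info = {
        "platform": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            module = import_module(name)
        except ModuleNotFoundError:
            # Same policy as print_debug_info: skip packages that are absent.
            continue
        info[name] = getattr(module, "__version__", "unknown")
    return info


# "sys" always imports but has no __version__; the second name should not exist.
versions = collect_versions(["sys", "package_that_is_not_installed"])
for key, value in versions.items():
    print(f"{key}:  {value}")
```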

icechunk.upgrade_icechunk_repository #

upgrade_icechunk_repository(
    repo,
    *,
    dry_run,
    delete_unused_v1_files=True,
    prefetch_concurrency=None,
)

Migrate a repository to the latest version of Icechunk.

This is an administrative operation, and must be executed in isolation from other readers and writers. Other processes running concurrently on the same repo may see undefined behavior.

At this time, this function supports only migration from Icechunk spec version 1 to Icechunk spec version 2. This means Icechunk versions 1.x to 2.x.

The operation is usually fast, but it can take several minutes if there is a very large version history (thousands of snapshots).

Returns a new Repository object. The original repo object should not be used after calling this function.

Parameters:

Name	Type	Description	Default
`repo`	`Repository`	The repository to upgrade.	required
`dry_run`	`bool`	If True, perform a dry run without actually upgrading. If False, perform the upgrade.	required
`delete_unused_v1_files`	`bool`	If True (the default), delete unused v1 files after upgrading.	`True`
`prefetch_concurrency`	`int or None`	Number of snapshots to prefetch concurrently during migration. Defaults to 64 if not specified. Lower this value for repos that cannot fit many snapshots in memory.	`None`

Returns:

Type	Description
`Repository`	A freshly opened repository with the updated spec version.

Source code in icechunk-python/python/icechunk/__init__.py

def upgrade_icechunk_repository(
    repo: Repository,
    *,
    dry_run: bool,
    delete_unused_v1_files: bool = True,
    prefetch_concurrency: int | None = None,
) -> Repository:
    """
    Migrate a repository to the latest version of Icechunk.

    This is an administrative operation, and must be executed in isolation from
    other readers and writers. Other processes running concurrently on the same
    repo may see undefined behavior.

    At this time, this function supports only migration from Icechunk spec version 1
    to Icechunk spec version 2. This means Icechunk versions 1.x to 2.x.

    The operation is usually fast, but it can take several minutes if there is a very
    large version history (thousands of snapshots).

    Returns a new Repository object. The original repo object should not be used
    after calling this function.

    Parameters
    ----------
    repo : Repository
        The repository to upgrade.
    dry_run : bool
        If True, perform a dry run without actually upgrading. If False, perform
        the upgrade.
    delete_unused_v1_files : bool, optional
        If True (the default), delete unused v1 files after upgrading.
    prefetch_concurrency : int or None, optional
        Number of snapshots to prefetch concurrently during migration.
        Defaults to 64 if not specified. Lower this value for repos that
        cannot fit many snapshots in memory.

    Returns
    -------
    Repository
        A freshly opened repository with the updated spec version.
    """
    new_repo = _upgrade_icechunk_repository(
        repo._repository,
        dry_run=dry_run,
        delete_unused_v1_files=delete_unused_v1_files,
        prefetch_concurrency=prefetch_concurrency,
    )
    if not dry_run:
        repo._repository = _InvalidatedRepository()  # type: ignore[assignment]
    return Repository(new_repo)
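Because `dry_run` is required and the original object is invalidated only when `dry_run` is False, a cautious workflow is to rehearse first and upgrade second. The sketch below assumes that a failing dry run surfaces as an exception, which the text above does not state. `upgrade_with_rehearsal` is a hypothetical helper that takes the upgrade function as a parameter, so a fake can stand in for it here.

```python
def upgrade_with_rehearsal(repo, upgrade, **options):
    """Run the upgrade as a dry run, then for real, and return the new repo.

    upgrade is expected to accept (repo, *, dry_run, **options), matching the
    signature documented above. The original repo is still valid after the
    dry run, so it can be passed to the second call.
    """
    upgrade(repo, dry_run=True, **options)  # assumed to raise on problems
    return upgrade(repo, dry_run=False, **options)


# A fake upgrade function that records how it was called.
calls = []


def fake_upgrade(repo, *, dry_run, **options):
    calls.append((dry_run, options))
    return f"{repo}-preview" if dry_run else f"{repo}-v2"


new_repo = upgrade_with_rehearsal("repo", fake_upgrade, prefetch_concurrency=8)
print(new_repo)
```

In real code, `upgrade` would be `icechunk.upgrade_icechunk_repository`. Only the returned object should be used afterwards, and both calls should run while no other readers or writers are active.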

icechunk.supported_spec_versions #

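supported_spec_versions()

Return the spec versions supported by this release of icechunk, newest first: `SpecVersion.v2`, then `SpecVersion.v1`.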

Source code in icechunk-python/python/icechunk/__init__.py

def supported_spec_versions() -> list[SpecVersion]:
    return [SpecVersion.v2, SpecVersion.v1]
