icechunk.storage#

Storage backends and configuration for S3, GCS, Azure, local filesystem, and more.

Classes:

Name Description
ChunkType

Enum for Zarr chunk types

S3Options

Options for accessing an S3-compatible storage backend

Storage

Storage configuration for an IcechunkStore

StorageConcurrencySettings

Configuration for how Icechunk uses its Storage instance

StorageRetriesSettings

Configuration for how Icechunk retries requests.

StorageSettings

Configuration for how Icechunk uses its Storage instance

StorageTimeoutSettings

Configuration for AWS SDK timeout settings.

Functions:

Name Description
azure_storage

Create a Storage instance that saves data in Azure Blob Storage object store.

azure_store

Build an ObjectStoreConfig instance for Azure stores.

gcs_storage

Create a Storage instance that saves data in Google Cloud Storage object store.

gcs_store

Build an ObjectStoreConfig instance for Google Cloud Storage object stores.

http_storage

Create a read-only Storage instance that reads data from an HTTP(s) server

http_store

Build an ObjectStoreConfig instance for HTTP object stores.

in_memory_storage

Create a Storage instance that saves data in memory.

local_filesystem_storage

Create a Storage instance that saves data in the local file system.

local_filesystem_store

Build an ObjectStoreConfig instance for local file stores.

r2_storage

Create a Storage instance that saves data in Cloudflare R2 object store.

redirect_storage

Create a read-only Storage instance that follows HTTP redirects to resolve the underlying storage backend.

s3_storage

Create a Storage instance that saves data in S3 or S3 compatible object stores.

s3_store

Build an ObjectStoreConfig instance for S3 or S3 compatible object stores.

tigris_storage

Create a Storage instance that saves data in Tigris object store.

ChunkType #

Bases: Enum

Enum for Zarr chunk types

Attributes:

Name Type Description
Uninitialized int

Chunk doesn't have a materialized type yet

Native int

Regular Zarr chunks

Virtual int

Chunk conforming to the VirtualiZarr spec

Inline int

Chunk is stored inline in the manifest

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ChunkType(Enum):
    """Enum for Zarr chunk types

    Attributes
    ----------
    Uninitialized: int
        Chunk doesn't have a materialized type yet
    Native: int
        Regular Zarr chunks
    Virtual: int
        Chunk conforming to the VirtualiZarr spec
    Inline: int
        Chunk is stored inline in the manifest
    """

    uninitialized = 0
    native = 1
    virtual = 2
    inline = 3

S3Options #

Options for accessing an S3-compatible storage backend

Methods:

Name Description
__new__

Create a new S3Options object

Attributes:

Name Type Description
allow_http bool

Whether HTTP requests are allowed for the storage backend.

anonymous bool

Whether to use anonymous credentials (unsigned requests).

endpoint_url str | None

Optional endpoint URL for the storage backend.

force_path_style bool

Whether to force path-style bucket addressing.

network_stream_timeout_seconds int | None

Timeout in seconds for idle network streams.

region str | None

Optional region to use for the storage backend.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class S3Options:
    """Options for accessing an S3-compatible storage backend"""
    def __new__(
        cls,
        region: str | None = None,
        endpoint_url: str | None = None,
        allow_http: bool = False,
        anonymous: bool = False,
        force_path_style: bool = False,
        network_stream_timeout_seconds: int | None = None,
        requester_pays: bool = False,
    ) -> S3Options:
        """
        Create a new `S3Options` object

        Parameters
        ----------
        region: str | None
            Optional, the region to use for the storage backend.
        endpoint_url: str | None
            Optional, the endpoint URL to use for the storage backend.
        allow_http: bool
            Whether to allow HTTP requests to the storage backend.
        anonymous: bool
            Whether to use anonymous credentials to access the storage backend. When `True`, S3 requests are not signed.
        force_path_style: bool
            Whether to force use of path-style addressing for buckets.
        network_stream_timeout_seconds: int | None
            Timeout requests if no bytes can be transmitted during this period of time.
            If set to 0, timeout is disabled. Default: 60.
        requester_pays: bool
            Enable requester pays for S3 buckets
        """

    @property
    def region(self) -> str | None:
        """
        Optional region to use for the storage backend.

        Returns
        -------
        str | None
            The region configured for the storage backend.
        """
        ...

    @region.setter
    def region(self, value: str | None) -> None:
        """
        Set the region to use for the storage backend.

        Parameters
        ----------
        value: str | None
            The region to use for the storage backend.
        """
        ...

    @property
    def endpoint_url(self) -> str | None:
        """
        Optional endpoint URL for the storage backend.

        Returns
        -------
        str | None
            The endpoint URL configured for the storage backend.
        """
        ...

    @endpoint_url.setter
    def endpoint_url(self, value: str | None) -> None:
        """
        Set the endpoint URL for the storage backend.

        Parameters
        ----------
        value: str | None
            The endpoint URL to use for the storage backend.
        """
        ...

    @property
    def allow_http(self) -> bool:
        """
        Whether HTTP requests are allowed for the storage backend.

        Returns
        -------
        bool
            ``True`` when HTTP requests to the storage backend are permitted.
        """
        ...

    @allow_http.setter
    def allow_http(self, value: bool) -> None:
        """
        Set whether HTTP requests are allowed for the storage backend.

        Parameters
        ----------
        value: bool
            ``True`` to allow HTTP requests to the storage backend, ``False`` otherwise.
        """
        ...

    @property
    def anonymous(self) -> bool:
        """
        Whether to use anonymous credentials (unsigned requests).

        Returns
        -------
        bool
            ``True`` when anonymous access is configured.
        """
        ...

    @anonymous.setter
    def anonymous(self, value: bool) -> None:
        """
        Set whether to use anonymous credentials.

        Parameters
        ----------
        value: bool
            ``True`` to perform unsigned requests, ``False`` to sign requests.
        """
        ...

    @property
    def force_path_style(self) -> bool:
        """
        Whether to force path-style bucket addressing.

        Returns
        -------
        bool
            ``True`` when path-style addressing is forced.
        """
        ...

    @force_path_style.setter
    def force_path_style(self, value: bool) -> None:
        """
        Set whether to force path-style bucket addressing.

        Parameters
        ----------
        value: bool
            ``True`` to always use path-style addressing, ``False`` to allow virtual-host style.
        """
        ...

    @property
    def network_stream_timeout_seconds(self) -> int | None:
        """
        Timeout in seconds for idle network streams.

        Returns
        -------
        int | None
            The timeout duration; ``0`` disables the timeout and ``None`` uses the default.
        """
        ...

    @network_stream_timeout_seconds.setter
    def network_stream_timeout_seconds(self, value: int | None) -> None:
        """
        Set the timeout for idle network streams.

        Parameters
        ----------
        value: int | None
            Timeout duration in seconds. Use ``0`` to disable or ``None`` for the default.
        """
        ...

allow_http property writable #

allow_http

Whether HTTP requests are allowed for the storage backend.

Returns:

Type Description
bool

True when HTTP requests to the storage backend are permitted.

anonymous property writable #

anonymous

Whether to use anonymous credentials (unsigned requests).

Returns:

Type Description
bool

True when anonymous access is configured.

endpoint_url property writable #

endpoint_url

Optional endpoint URL for the storage backend.

Returns:

Type Description
str | None

The endpoint URL configured for the storage backend.

force_path_style property writable #

force_path_style

Whether to force path-style bucket addressing.

Returns:

Type Description
bool

True when path-style addressing is forced.

network_stream_timeout_seconds property writable #

network_stream_timeout_seconds

Timeout in seconds for idle network streams.

Returns:

Type Description
int | None

The timeout duration; 0 disables the timeout and None uses the default.

region property writable #

region

Optional region to use for the storage backend.

Returns:

Type Description
str | None

The region configured for the storage backend.

__new__ #

__new__(
    region=None,
    endpoint_url=None,
    allow_http=False,
    anonymous=False,
    force_path_style=False,
    network_stream_timeout_seconds=None,
    requester_pays=False,
)

Create a new S3Options object

Parameters:

Name Type Description Default
region str | None

Optional, the region to use for the storage backend.

None
endpoint_url str | None

Optional, the endpoint URL to use for the storage backend.

None
allow_http bool

Whether to allow HTTP requests to the storage backend.

False
anonymous bool

Whether to use anonymous credentials to access the storage backend. When True, S3 requests are not signed.

False
force_path_style bool

Whether to force use of path-style addressing for buckets.

False
network_stream_timeout_seconds int | None

Timeout requests if no bytes can be transmitted during this period of time. If set to 0, timeout is disabled. Default: 60.

None
requester_pays bool

Enable requester pays for S3 buckets

False
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __new__(
    cls,
    region: str | None = None,
    endpoint_url: str | None = None,
    allow_http: bool = False,
    anonymous: bool = False,
    force_path_style: bool = False,
    network_stream_timeout_seconds: int | None = None,
    requester_pays: bool = False,
) -> S3Options:
    """
    Create a new `S3Options` object

    Parameters
    ----------
    region: str | None
        Optional, the region to use for the storage backend.
    endpoint_url: str | None
        Optional, the endpoint URL to use for the storage backend.
    allow_http: bool
        Whether to allow HTTP requests to the storage backend.
    anonymous: bool
        Whether to use anonymous credentials to access the storage backend. When `True`, S3 requests are not signed.
    force_path_style: bool
        Whether to force use of path-style addressing for buckets.
    network_stream_timeout_seconds: int | None
        Timeout requests if no bytes can be transmitted during this period of time.
        If set to 0, timeout is disabled. Default: 60.
    requester_pays: bool
        Enable requester pays for S3 buckets
    """
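
The effect of force_path_style and endpoint_url is easiest to see in the URL that ends up being requested. The following is a minimal sketch of the general S3 addressing convention, not Icechunk's request-building code: `object_url` is a hypothetical helper, the bucket and key names are placeholders, and the endpoint is assumed to include its scheme.

```python
def object_url(bucket, key, region="us-east-1", endpoint_url=None,
               force_path_style=False):
    """Return the URL an S3 client would request for one object (illustrative only)."""
    if endpoint_url is not None:
        # Assumption: the endpoint is given with a scheme, e.g. "http://localhost:9000".
        scheme, _, host = endpoint_url.partition("://")
        host = host.rstrip("/")
    else:
        # Default AWS regional endpoint.
        scheme, host = "https", f"s3.{region}.amazonaws.com"
    if force_path_style:
        # Path-style: the bucket is the first path segment.
        return f"{scheme}://{host}/{bucket}/{key}"
    # Virtual-hosted-style: the bucket is part of the host name.
    return f"{scheme}://{bucket}.{host}/{key}"


virtual_hosted = object_url("my-bucket", "repo/chunks/abc")
path_style = object_url(
    "my-bucket",
    "repo/chunks/abc",
    endpoint_url="http://localhost:9000",
    force_path_style=True,
)
```

Self-hosted S3-compatible servers often expect path-style addressing, and a plain http:// endpoint like the one above would presumably also need allow_http=True.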

Storage #

Storage configuration for an IcechunkStore

Currently supports memory, filesystem, S3, Azure Blob, and Google Cloud Storage backends. Use the following functions to create a Storage object with the desired backend.

Ex:

storage = icechunk.in_memory_storage()
storage = icechunk.local_filesystem_storage("/path/to/root")
storage = icechunk.s3_storage("bucket", "prefix", ...)
storage = icechunk.gcs_storage("bucket", "prefix", ...)
storage = icechunk.azure_storage("container", "prefix", ...)

Methods:

Name Description
list_objects

List objects in the storage backend, optionally filtered by a key prefix.

list_objects_metadata

List objects with full metadata, optionally filtered by a key prefix.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Storage:
    """Storage configuration for an IcechunkStore

    Currently supports memory, filesystem, S3, Azure Blob, and Google Cloud Storage backends.
    Use the following methods to create a Storage object with the desired backend.

    Ex:
    ```
    storage = icechunk.in_memory_storage()
    storage = icechunk.local_filesystem_storage("/path/to/root")
    storage = icechunk.s3_storage("bucket", "prefix", ...)
    storage = icechunk.gcs_storage("bucket", "prefix", ...)
    storage = icechunk.azure_storage("container", "prefix", ...)
    ```
    """

    @classmethod
    def new_s3(
        cls,
        config: S3Options,
        bucket: str,
        prefix: str | None,
        credentials: _AnyS3Credential | None = None,
    ) -> Storage: ...
    @classmethod
    def new_s3_object_store(
        cls,
        config: S3Options,
        bucket: str,
        prefix: str | None,
        credentials: _AnyS3Credential | None = None,
    ) -> Storage: ...
    @classmethod
    def new_tigris(
        cls,
        config: S3Options,
        bucket: str,
        prefix: str | None,
        use_weak_consistency: bool,
        credentials: _AnyS3Credential | None = None,
    ) -> Storage: ...
    @classmethod
    def new_in_memory(cls) -> Storage: ...
    @classmethod
    def new_local_filesystem(cls, path: str) -> Storage: ...
    @classmethod
    def new_gcs(
        cls,
        bucket: str,
        prefix: str | None,
        credentials: _AnyGcsCredential | None = None,
        *,
        config: dict[str, str] | None = None,
    ) -> Storage: ...
    @classmethod
    def new_r2(
        cls,
        config: S3Options,
        bucket: str | None = None,
        prefix: str | None = None,
        account_id: str | None = None,
        credentials: _AnyS3Credential | None = None,
    ) -> Storage: ...
    @classmethod
    def new_azure_blob(
        cls,
        account: str,
        container: str,
        prefix: str,
        credentials: _AnyAzureCredential | None = None,
        *,
        config: dict[str, str] | None = None,
    ) -> Storage: ...
    @classmethod
    def new_http(
        cls,
        base_url: str,
        config: dict[str, str] | None = None,
    ) -> Storage: ...
    @classmethod
    def new_redirect(
        cls,
        base_url: str,
    ) -> Storage: ...
    def __repr__(self) -> str: ...
    def __str__(self) -> str: ...
    def _repr_html_(self) -> str: ...
    def default_settings(self) -> StorageSettings: ...
    def list_objects(
        self, settings: StorageSettings | None = None, prefix: str | None = None
    ) -> list[tuple[str, int]]:
        """List objects in the storage backend, optionally filtered by a key prefix.

        Deprecated: use ``list_objects_metadata`` instead, which also returns
        the ``created_at`` timestamp.

        Parameters
        ----------
        settings : StorageSettings | None
            Optional storage settings to override the defaults (retries, concurrency, etc.).
        prefix : str | None
            If provided, only objects whose keys start with this prefix are returned.
            When ``None`` or empty, all objects under the repository root are listed.

        Returns
        -------
        list[tuple[str, int]]
            A list of ``(key, size_in_bytes)`` tuples for each object found.
        """
        ...
    def list_objects_metadata(
        self, settings: StorageSettings | None = None, prefix: str | None = None
    ) -> list[StorageObjectInfo]:
        """List objects with full metadata, optionally filtered by a key prefix.

        Parameters
        ----------
        settings : StorageSettings | None
            Optional storage settings to override the defaults.
        prefix : str | None
            If provided, only objects whose keys start with this prefix are returned.

        Returns
        -------
        list[StorageObjectInfo]
            A list of :class:`StorageObjectInfo` objects.
        """
        ...

list_objects #

list_objects(settings=None, prefix=None)

List objects in the storage backend, optionally filtered by a key prefix.

Deprecated: use list_objects_metadata instead, which also returns the created_at timestamp.

Parameters:

Name Type Description Default
settings StorageSettings | None

Optional storage settings to override the defaults (retries, concurrency, etc.).

None
prefix str | None

If provided, only objects whose keys start with this prefix are returned. When None or empty, all objects under the repository root are listed.

None

Returns:

Type Description
list[tuple[str, int]]

A list of (key, size_in_bytes) tuples for each object found.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def list_objects(
    self, settings: StorageSettings | None = None, prefix: str | None = None
) -> list[tuple[str, int]]:
    """List objects in the storage backend, optionally filtered by a key prefix.

    Deprecated: use ``list_objects_metadata`` instead, which also returns
    the ``created_at`` timestamp.

    Parameters
    ----------
    settings : StorageSettings | None
        Optional storage settings to override the defaults (retries, concurrency, etc.).
    prefix : str | None
        If provided, only objects whose keys start with this prefix are returned.
        When ``None`` or empty, all objects under the repository root are listed.

    Returns
    -------
    list[tuple[str, int]]
        A list of ``(key, size_in_bytes)`` tuples for each object found.
    """
    ...
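
The prefix semantics described above can be illustrated on an in-memory list shaped like the return value of list_objects. This is only a sketch: `filter_by_prefix` is a hypothetical stand-in for the filtering the backend performs, and the keys and sizes are made up for the example.

```python
def filter_by_prefix(objects, prefix=None):
    """Keep the (key, size_in_bytes) tuples whose key starts with `prefix`.

    A prefix of None or "" keeps everything, mirroring the documented behavior.
    """
    if not prefix:
        return list(objects)
    return [(key, size) for key, size in objects if key.startswith(prefix)]


# Hypothetical listing in the documented (key, size_in_bytes) shape.
objects = [("chunks/a", 1024), ("chunks/b", 2048), ("manifests/m1", 512)]

chunk_objects = filter_by_prefix(objects, "chunks/")
total_chunk_bytes = sum(size for _, size in chunk_objects)
```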

list_objects_metadata #

list_objects_metadata(settings=None, prefix=None)

List objects with full metadata, optionally filtered by a key prefix.

Parameters:

Name Type Description Default
settings StorageSettings | None

Optional storage settings to override the defaults.

None
prefix str | None

If provided, only objects whose keys start with this prefix are returned.

None

Returns:

Type Description
list[StorageObjectInfo]

A list of StorageObjectInfo objects.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def list_objects_metadata(
    self, settings: StorageSettings | None = None, prefix: str | None = None
) -> list[StorageObjectInfo]:
    """List objects with full metadata, optionally filtered by a key prefix.

    Parameters
    ----------
    settings : StorageSettings | None
        Optional storage settings to override the defaults.
    prefix : str | None
        If provided, only objects whose keys start with this prefix are returned.

    Returns
    -------
    list[StorageObjectInfo]
        A list of :class:`StorageObjectInfo` objects.
    """
    ...

StorageConcurrencySettings #

Configuration for how Icechunk uses its Storage instance

Methods:

Name Description
__new__

Create a new StorageConcurrencySettings object

Attributes:

Name Type Description
ideal_concurrent_request_size int | None

The ideal concurrent request size.

max_concurrent_requests_for_object int | None

The maximum number of concurrent requests for an object.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class StorageConcurrencySettings:
    """Configuration for how Icechunk uses its Storage instance"""

    def __new__(
        cls,
        max_concurrent_requests_for_object: int | None = None,
        ideal_concurrent_request_size: int | None = None,
    ) -> StorageConcurrencySettings:
        """
        Create a new `StorageConcurrencySettings` object

        Parameters
        ----------
        max_concurrent_requests_for_object: int | None
            The maximum number of concurrent requests for an object.
            Default: 18
        ideal_concurrent_request_size: int | None
            The ideal concurrent request size in bytes.
            Default: 12,582,912 (12 MiB)
        """
        ...
    @property
    def max_concurrent_requests_for_object(self) -> int | None:
        """
        The maximum number of concurrent requests for an object.

        Returns
        -------
        int | None
            The maximum number of concurrent requests for an object.
        """
        ...
    @max_concurrent_requests_for_object.setter
    def max_concurrent_requests_for_object(self, value: int | None) -> None:
        """
        Set the maximum number of concurrent requests for an object.

        Parameters
        ----------
        value: int | None
            The maximum number of concurrent requests for an object.
        """
        ...
    @property
    def ideal_concurrent_request_size(self) -> int | None:
        """
        The ideal concurrent request size.

        Returns
        -------
        int | None
            The ideal concurrent request size.
        """
        ...
    @ideal_concurrent_request_size.setter
    def ideal_concurrent_request_size(self, value: int | None) -> None:
        """
        Set the ideal concurrent request size.

        Parameters
        ----------
        value: int | None
            The ideal concurrent request size.
        """
        ...

ideal_concurrent_request_size property writable #

ideal_concurrent_request_size

The ideal concurrent request size.

Returns:

Type Description
int | None

The ideal concurrent request size.

max_concurrent_requests_for_object property writable #

max_concurrent_requests_for_object

The maximum number of concurrent requests for an object.

Returns:

Type Description
int | None

The maximum number of concurrent requests for an object.

__new__ #

__new__(
    max_concurrent_requests_for_object=None,
    ideal_concurrent_request_size=None,
)

Create a new StorageConcurrencySettings object

Parameters:

Name Type Description Default
max_concurrent_requests_for_object int | None

The maximum number of concurrent requests for an object. Default: 18

None
ideal_concurrent_request_size int | None

The ideal concurrent request size in bytes. Default: 12,582,912 (12 MiB)

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __new__(
    cls,
    max_concurrent_requests_for_object: int | None = None,
    ideal_concurrent_request_size: int | None = None,
) -> StorageConcurrencySettings:
    """
    Create a new `StorageConcurrencySettings` object

    Parameters
    ----------
    max_concurrent_requests_for_object: int | None
        The maximum number of concurrent requests for an object.
        Default: 18
    ideal_concurrent_request_size: int | None
        The ideal concurrent request size in bytes.
        Default: 12,582,912 (12 MiB)
    """
    ...
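
One plausible reading of how the two settings interact is that a large object read is split into roughly equal byte ranges of about the ideal size, with the number of ranges capped at the maximum. The sketch below works under that assumption; `split_into_requests` is a hypothetical helper, and the splitting strategy Icechunk actually uses may differ.

```python
import math


def split_into_requests(object_size, ideal_request_size=12_582_912, max_requests=18):
    """Split [0, object_size) into contiguous (start, end) byte ranges.

    Assumed strategy: aim for requests of about `ideal_request_size` bytes,
    but never issue more than `max_requests` requests for one object.
    """
    if object_size <= 0:
        return []
    wanted = math.ceil(object_size / ideal_request_size)
    count = max(1, min(wanted, max_requests))
    base, extra = divmod(object_size, count)
    ranges, start = [], 0
    for i in range(count):
        # Spread the remainder over the first `extra` ranges.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


small = split_into_requests(1_000)        # fits in a single request
medium = split_into_requests(30_000_000)  # a few requests near the ideal size
large = split_into_requests(10**10)       # capped at max_requests
```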

StorageRetriesSettings #

Configuration for how Icechunk retries requests.

Icechunk retries failed requests with an exponential backoff algorithm.
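
To give a feel for how the three settings combine, here is a minimal sketch of a capped exponential backoff schedule. It assumes the delay doubles after every failed attempt and that no jitter is applied. Neither detail is stated on this page, so treat the exact numbers as illustrative. `backoff_schedule` is a hypothetical helper, not part of the icechunk API.

```python
def backoff_schedule(max_tries=10, initial_backoff_ms=100, max_backoff_ms=180_000):
    """Return the waits (in ms) between attempts under capped doubling.

    `max_tries` counts the initial attempt, so there are at most
    `max_tries - 1` waits; `max_tries=1` means no retries at all.
    """
    waits = []
    delay = initial_backoff_ms
    for _ in range(max_tries - 1):
        waits.append(min(delay, max_backoff_ms))
        delay *= 2  # assumed growth factor
    return waits


default_waits = backoff_schedule()
no_retries = backoff_schedule(max_tries=1)
capped = backoff_schedule(max_tries=5, initial_backoff_ms=100, max_backoff_ms=300)
```

Under these assumptions the documented defaults never reach the 3 minute cap: the ninth and last wait is 25.6 seconds.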

Methods:

Name Description
__new__

Create a new StorageRetriesSettings object

Attributes:

Name Type Description
initial_backoff_ms int | None

The initial backoff duration in milliseconds.

max_backoff_ms int | None

The maximum backoff duration in milliseconds.

max_tries int | None

The maximum number of tries, including the initial one.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class StorageRetriesSettings:
    """Configuration for how Icechunk retries requests.

    Icechunk retries failed requests with an exponential backoff algorithm."""

    def __new__(
        cls,
        max_tries: int | None = None,
        initial_backoff_ms: int | None = None,
        max_backoff_ms: int | None = None,
    ) -> StorageRetriesSettings:
        """
        Create a new `StorageRetriesSettings` object

        Parameters
        ----------
        max_tries: int | None
            The maximum number of tries, including the initial one. Set to 1 to disable retries.
            Default: 10
        initial_backoff_ms: int | None
            The initial backoff duration in milliseconds.
            Default: 100
        max_backoff_ms: int | None
            The maximum backoff duration in milliseconds.
            Default: 180,000 (3 minutes)
        """
        ...
    @property
    def max_tries(self) -> int | None:
        """
        The maximum number of tries, including the initial one.

        Returns
        -------
        int | None
            The maximum number of tries.
        """
        ...
    @max_tries.setter
    def max_tries(self, value: int | None) -> None:
        """
        Set the maximum number of tries. Set to 1 to disable retries.

        Parameters
        ----------
        value: int | None
            The maximum number of tries
        """
        ...
    @property
    def initial_backoff_ms(self) -> int | None:
        """
        The initial backoff duration in milliseconds.

        Returns
        -------
        int | None
            The initial backoff duration in milliseconds.
        """
        ...
    @initial_backoff_ms.setter
    def initial_backoff_ms(self, value: int | None) -> None:
        """
        Set the initial backoff duration in milliseconds.

        Parameters
        ----------
        value: int | None
            The initial backoff duration in milliseconds.
        """
        ...
    @property
    def max_backoff_ms(self) -> int | None:
        """
        The maximum backoff duration in milliseconds.

        Returns
        -------
        int | None
            The maximum backoff duration in milliseconds.
        """
        ...
    @max_backoff_ms.setter
    def max_backoff_ms(self, value: int | None) -> None:
        """
        Set the maximum backoff duration in milliseconds.

        Parameters
        ----------
        value: int | None
            The maximum backoff duration in milliseconds.
        """
        ...

initial_backoff_ms property writable #

initial_backoff_ms

The initial backoff duration in milliseconds.

Returns:

Type Description
int | None

The initial backoff duration in milliseconds.

max_backoff_ms property writable #

max_backoff_ms

The maximum backoff duration in milliseconds.

Returns:

Type Description
int | None

The maximum backoff duration in milliseconds.

max_tries property writable #

max_tries

The maximum number of tries, including the initial one.

Returns:

Type Description
int | None

The maximum number of tries.

__new__ #

__new__(
    max_tries=None,
    initial_backoff_ms=None,
    max_backoff_ms=None,
)

Create a new StorageRetriesSettings object

Parameters:

Name Type Description Default
max_tries int | None

The maximum number of tries, including the initial one. Set to 1 to disable retries. Default: 10

None
initial_backoff_ms int | None

The initial backoff duration in milliseconds. Default: 100

None
max_backoff_ms int | None

The maximum backoff duration in milliseconds. Default: 180,000 (3 minutes)

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __new__(
    cls,
    max_tries: int | None = None,
    initial_backoff_ms: int | None = None,
    max_backoff_ms: int | None = None,
) -> StorageRetriesSettings:
    """
    Create a new `StorageRetriesSettings` object

    Parameters
    ----------
    max_tries: int | None
        The maximum number of tries, including the initial one. Set to 1 to disable retries.
        Default: 10
    initial_backoff_ms: int | None
        The initial backoff duration in milliseconds.
        Default: 100
    max_backoff_ms: int | None
        The maximum backoff duration in milliseconds.
        Default: 180,000 (3 minutes)
    """
    ...

StorageSettings #

Configuration for how Icechunk uses its Storage instance

Methods:

Name Description
__new__

Create a new StorageSettings object

Attributes:

Name Type Description
chunks_storage_class str | None

Chunk objects in the object store use this storage class, or storage_class if None

concurrency StorageConcurrencySettings | None

The configuration for how much concurrency the Icechunk store uses

metadata_storage_class str | None

Metadata objects in the object store use this storage class, or storage_class if None

minimum_size_for_multipart_upload int | None

Use object store's multipart upload for objects larger than this size in bytes

retries StorageRetriesSettings | None

The configuration for how Icechunk retries failed requests.

storage_class str | None

All objects in the object store use this storage class, or the object store default if None

timeouts StorageTimeoutSettings | None

The configuration for AWS SDK timeout settings.

unsafe_use_conditional_create bool | None

True if Icechunk will use conditional PUT operations for creation in the object store

unsafe_use_conditional_update bool | None

True if Icechunk will use conditional PUT operations for updates in the object store

unsafe_use_metadata bool | None

True if Icechunk will write object metadata in the object store
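
The fallback between the three storage class attributes can be sketched as a small resolution function. This only restates the documented rule (the specific classes fall back to storage_class, which itself falls back to the object store default). `effective_storage_classes` is a hypothetical helper, and a result of None stands for "use the object store default".

```python
def effective_storage_classes(storage_class=None,
                              metadata_storage_class=None,
                              chunks_storage_class=None):
    """Resolve which storage class each kind of object ends up with."""

    def pick(specific):
        # A specific setting wins; otherwise fall back to the general one.
        return specific if specific is not None else storage_class

    return {
        "metadata": pick(metadata_storage_class),
        "chunks": pick(chunks_storage_class),
    }


everything_ia = effective_storage_classes(storage_class="STANDARD_IA")
chunks_only_ia = effective_storage_classes(
    storage_class="STANDARD", chunks_storage_class="STANDARD_IA"
)
store_default = effective_storage_classes()
```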

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class StorageSettings:
    """Configuration for how Icechunk uses its Storage instance"""

    def __new__(
        cls,
        concurrency: StorageConcurrencySettings | None = None,
        retries: StorageRetriesSettings | None = None,
        unsafe_use_conditional_create: bool | None = None,
        unsafe_use_conditional_update: bool | None = None,
        unsafe_use_metadata: bool | None = None,
        storage_class: str | None = None,
        metadata_storage_class: str | None = None,
        chunks_storage_class: str | None = None,
        minimum_size_for_multipart_upload: int | None = None,
        timeouts: StorageTimeoutSettings | None = None,
    ) -> StorageSettings:
        """
        Create a new `StorageSettings` object

        Parameters
        ----------
        concurrency: StorageConcurrencySettings | None
            The configuration for how Icechunk uses its Storage instance.

        retries: StorageRetriesSettings | None
            The configuration for how Icechunk retries failed requests.

        unsafe_use_conditional_update: bool | None
            If set to False, Icechunk loses some of its consistency guarantees.
            This is only useful in object stores that don't support the feature.
            Use it at your own risk.
            Default: True

        unsafe_use_conditional_create: bool | None
            If set to False, Icechunk loses some of its consistency guarantees.
            This is only useful in object stores that don't support the feature.
            Use at your own risk.
            Default: True

        unsafe_use_metadata: bool | None
            If set to False, Icechunk doesn't write metadata fields in its files.
            This is only useful in object stores that don't support the feature.
            Use at your own risk.
            Default: True

        storage_class: str | None
            Store all objects using this object store storage class.
            If None the object store default will be used.
            Currently not supported in GCS.
            Example: STANDARD_IA

        metadata_storage_class: str | None
            Store metadata objects using this object store storage class.
            Currently not supported in GCS.
            Defaults to storage_class.

        chunks_storage_class: str | None
            Store chunk objects using this object store storage class.
            Currently not supported in GCS.
            Defaults to storage_class.

        minimum_size_for_multipart_upload: int | None
            Use object store's multipart upload for objects larger than this size in bytes.
            Default: 100 MB if None is passed.

        timeouts: StorageTimeoutSettings | None
            The configuration for AWS SDK timeout settings.
        """
        ...
    def __repr__(self, /) -> str: ...
    def __str__(self, /) -> str: ...
    def _repr_html_(self, /) -> str: ...
    @property
    def concurrency(self) -> StorageConcurrencySettings | None:
        """
        The configuration for how much concurrency Icechunk store uses

        Returns
        -------
        StorageConcurrencySettings | None
            The configuration for how Icechunk uses its Storage instance.
        """

    @concurrency.setter
    def concurrency(self, value: StorageConcurrencySettings | None) -> None: ...
    @property
    def retries(self) -> StorageRetriesSettings | None:
        """
        The configuration for how Icechunk retries failed requests.

        Returns
        -------
        StorageRetriesSettings | None
            The configuration for how Icechunk retries failed requests.
        """

    @retries.setter
    def retries(self, value: StorageRetriesSettings | None) -> None: ...
    @property
    def timeouts(self) -> StorageTimeoutSettings | None:
        """
        The configuration for AWS SDK timeout settings.

        Returns
        -------
        StorageTimeoutSettings | None
            The timeout configuration.
        """

    @timeouts.setter
    def timeouts(self, value: StorageTimeoutSettings | None) -> None: ...
    @property
    def unsafe_use_conditional_update(self) -> bool | None:
        """True if Icechunk will use conditional PUT operations for updates in the object store"""
        ...

    @unsafe_use_conditional_update.setter
    def unsafe_use_conditional_update(self, value: bool) -> None: ...
    @property
    def unsafe_use_conditional_create(self) -> bool | None:
        """True if Icechunk will use conditional PUT operations for creation in the object store"""
        ...

    @unsafe_use_conditional_create.setter
    def unsafe_use_conditional_create(self, value: bool) -> None: ...
    @property
    def unsafe_use_metadata(self) -> bool | None:
        """True if Icechunk will write object metadata in the object store"""
        ...

    @unsafe_use_metadata.setter
    def unsafe_use_metadata(self, value: bool) -> None: ...
    @property
    def storage_class(self) -> str | None:
        """All objects in object store will use this storage class or the default if None"""
        ...

    @storage_class.setter
    def storage_class(self, value: str) -> None: ...
    @property
    def metadata_storage_class(self) -> str | None:
        """Metadata objects in object store will use this storage class or self.storage_class if None"""
        ...

    @metadata_storage_class.setter
    def metadata_storage_class(self, value: str) -> None: ...
    @property
    def chunks_storage_class(self) -> str | None:
        """Chunk objects in object store will use this storage class or self.storage_class if None"""
        ...

    @chunks_storage_class.setter
    def chunks_storage_class(self, value: str) -> None: ...
    @property
    def minimum_size_for_multipart_upload(self) -> int | None:
        """Use object store's multipart upload for objects larger than this size in bytes"""
        ...

    @minimum_size_for_multipart_upload.setter
    def minimum_size_for_multipart_upload(self, value: int) -> None: ...

chunks_storage_class property writable #

chunks_storage_class

Chunk objects in object store will use this storage class or self.storage_class if None

concurrency property writable #

concurrency

The configuration for how much concurrency Icechunk store uses

Returns:

Type Description
StorageConcurrencySettings | None

The configuration for how Icechunk uses its Storage instance.

metadata_storage_class property writable #

metadata_storage_class

Metadata objects in object store will use this storage class or self.storage_class if None

minimum_size_for_multipart_upload property writable #

minimum_size_for_multipart_upload

Use object store's multipart upload for objects larger than this size in bytes
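As a rough illustration of the threshold rule, the sketch below chooses between a single upload and a multipart upload. The helper `upload_strategy` is hypothetical, and reading the documented 100 MB default as 100 * 1024 * 1024 bytes is an assumption; Icechunk makes this decision internally.

```python
from __future__ import annotations

# Assumption: "100 MB" is read here as 100 MiB; the exact byte count is not stated in the docs.
ASSUMED_DEFAULT_THRESHOLD = 100 * 1024 * 1024


def upload_strategy(
    object_size: int, minimum_size_for_multipart_upload: int | None = None
) -> str:
    """Hypothetical helper: return "multipart" or "single" for an object of this size."""
    threshold = (
        ASSUMED_DEFAULT_THRESHOLD
        if minimum_size_for_multipart_upload is None
        else minimum_size_for_multipart_upload
    )
    # The docs say "larger than", so an object exactly at the threshold is not multipart.
    return "multipart" if object_size > threshold else "single"


choice = upload_strategy(5_000_000, minimum_size_for_multipart_upload=1_000_000)
```

Lowering the threshold makes more objects take the multipart path; raising it keeps more objects on the single-request path.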

retries property writable #

retries

The configuration for how Icechunk retries failed requests.

Returns:

Type Description
StorageRetriesSettings | None

The configuration for how Icechunk retries failed requests.

storage_class property writable #

storage_class

All objects in object store will use this storage class or the default if None

timeouts property writable #

timeouts

The configuration for AWS SDK timeout settings.

Returns:

Type Description
StorageTimeoutSettings | None

The timeout configuration.

unsafe_use_conditional_create property writable #

unsafe_use_conditional_create

True if Icechunk will use conditional PUT operations for creation in the object store

unsafe_use_conditional_update property writable #

unsafe_use_conditional_update

True if Icechunk will use conditional PUT operations for updates in the object store
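To see why disabling conditional operations weakens consistency, the following toy model may help. It is only a sketch: `ToyObjectStore`, its `put` parameters, and the key name are hypothetical stand-ins, not Icechunk's implementation or any real object store API.

```python
from __future__ import annotations


class PreconditionFailed(Exception):
    """Raised when a conditional write loses a race."""


class ToyObjectStore:
    """In-memory stand-in for an object store (illustrative only)."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[int, bytes]] = {}  # key -> (version, data)

    def get(self, key: str) -> tuple[int, bytes]:
        return self._objects[key]

    def put(
        self,
        key: str,
        data: bytes,
        *,
        if_absent: bool = False,
        if_version: int | None = None,
    ) -> int:
        current = self._objects.get(key)
        # Conditional create: succeed only if the key does not exist yet.
        if if_absent and current is not None:
            raise PreconditionFailed(f"{key} already exists")
        # Conditional update: succeed only if the object is unchanged since it was read.
        if if_version is not None and (current is None or current[0] != if_version):
            raise PreconditionFailed(f"{key} changed since version {if_version}")
        version = 1 if current is None else current[0] + 1
        self._objects[key] = (version, data)
        return version


store = ToyObjectStore()
store.put("some/key", b"snapshot-A", if_absent=True)
version, _ = store.get("some/key")

# Writer 1 updates the object based on the version it read.
store.put("some/key", b"snapshot-B", if_version=version)

# Writer 2 read the same version earlier; its conditional update is rejected
# instead of silently overwriting writer 1.
try:
    store.put("some/key", b"snapshot-C", if_version=version)
    writer_2_rejected = False
except PreconditionFailed:
    writer_2_rejected = True
```

Without the `if_version` check, writer 2 would overwrite writer 1's data unnoticed. That is the kind of guarantee that is lost when the conditional settings are turned off.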

unsafe_use_metadata property writable #

unsafe_use_metadata

True if Icechunk will write object metadata in the object store

__new__ #

__new__(
    concurrency=None,
    retries=None,
    unsafe_use_conditional_create=None,
    unsafe_use_conditional_update=None,
    unsafe_use_metadata=None,
    storage_class=None,
    metadata_storage_class=None,
    chunks_storage_class=None,
    minimum_size_for_multipart_upload=None,
    timeouts=None,
)

Create a new StorageSettings object

Parameters:

Name Type Description Default
concurrency StorageConcurrencySettings | None

The configuration for how Icechunk uses its Storage instance.

None
retries StorageRetriesSettings | None

The configuration for how Icechunk retries failed requests.

None
unsafe_use_conditional_update bool | None

If set to False, Icechunk loses some of its consistency guarantees. This is only useful in object stores that don't support conditional updates. Use at your own risk. Default: True

None
unsafe_use_conditional_create bool | None

If set to False, Icechunk loses some of its consistency guarantees. This is only useful in object stores that don't support conditional creates. Use at your own risk. Default: True

None
unsafe_use_metadata bool | None

If set to False, Icechunk doesn't write metadata fields in its files. This is only useful in object stores that don't support object metadata. Use at your own risk. Default: True

None
storage_class str | None

Store all objects using this object store storage class. If None, the object store default will be used. Currently not supported in GCS. Example: STANDARD_IA

None
metadata_storage_class str | None

Store metadata objects using this object store storage class. Currently not supported in GCS. Defaults to storage_class.

None
chunks_storage_class str | None

Store chunk objects using this object store storage class. Currently not supported in GCS. Defaults to storage_class.

None
minimum_size_for_multipart_upload int | None

Use object store's multipart upload for objects larger than this size in bytes. Default: 100 MB if None is passed.

None
timeouts StorageTimeoutSettings | None

The configuration for AWS SDK timeout settings.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __new__(
    cls,
    concurrency: StorageConcurrencySettings | None = None,
    retries: StorageRetriesSettings | None = None,
    unsafe_use_conditional_create: bool | None = None,
    unsafe_use_conditional_update: bool | None = None,
    unsafe_use_metadata: bool | None = None,
    storage_class: str | None = None,
    metadata_storage_class: str | None = None,
    chunks_storage_class: str | None = None,
    minimum_size_for_multipart_upload: int | None = None,
    timeouts: StorageTimeoutSettings | None = None,
) -> StorageSettings:
    """
    Create a new `StorageSettings` object

    Parameters
    ----------
    concurrency: StorageConcurrencySettings | None
        The configuration for how Icechunk uses its Storage instance.

    retries: StorageRetriesSettings | None
        The configuration for how Icechunk retries failed requests.

    unsafe_use_conditional_update: bool | None
        If set to False, Icechunk loses some of its consistency guarantees.
        This is only useful in object stores that don't support conditional updates.
        Use at your own risk.
        Default: True

    unsafe_use_conditional_create: bool | None
        If set to False, Icechunk loses some of its consistency guarantees.
        This is only useful in object stores that don't support conditional creates.
        Use at your own risk.
        Default: True

    unsafe_use_metadata: bool | None
        If set to False, Icechunk doesn't write metadata fields in its files.
        This is only useful in object stores that don't support object metadata.
        Use at your own risk.
        Default: True

    storage_class: str | None
        Store all objects using this object store storage class.
        If None, the object store default will be used.
        Currently not supported in GCS.
        Example: STANDARD_IA

    metadata_storage_class: str | None
        Store metadata objects using this object store storage class.
        Currently not supported in GCS.
        Defaults to storage_class.

    chunks_storage_class: str | None
        Store chunk objects using this object store storage class.
        Currently not supported in GCS.
        Defaults to storage_class.

    minimum_size_for_multipart_upload: int | None
        Use object store's multipart upload for objects larger than this size in bytes.
        Default: 100 MB if None is passed.

    timeouts: StorageTimeoutSettings | None
        The configuration for AWS SDK timeout settings.
    """
    ...
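The fallback between the three storage class parameters can be sketched as below. `effective_storage_classes` and `make_settings` are hypothetical helpers written for this illustration; only the `StorageSettings` keyword arguments come from the signature above, and "STANDARD" is an assumed example of a second storage class name.

```python
from __future__ import annotations


def effective_storage_classes(
    storage_class: str | None = None,
    metadata_storage_class: str | None = None,
    chunks_storage_class: str | None = None,
) -> dict[str, str | None]:
    """Hypothetical helper mirroring the documented fallback rules."""
    return {
        # A value of None means "use the object store default".
        "metadata": metadata_storage_class
        if metadata_storage_class is not None
        else storage_class,
        "chunks": chunks_storage_class
        if chunks_storage_class is not None
        else storage_class,
    }


def make_settings():
    """Build a StorageSettings using only parameters listed above."""
    # Deferred import so the helper above can be tried without icechunk installed.
    from icechunk.storage import StorageSettings

    return StorageSettings(
        storage_class="STANDARD_IA",
        metadata_storage_class="STANDARD",  # assumed example value
        minimum_size_for_multipart_upload=50 * 1024 * 1024,
    )


resolved = effective_storage_classes("STANDARD_IA", metadata_storage_class="STANDARD")
```

With these inputs, chunk objects would fall back to `STANDARD_IA` while metadata objects use the explicitly requested class.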

StorageTimeoutSettings #

Configuration for AWS SDK timeout settings.

Controls connect, read, and operation timeouts for the underlying S3 client.

Methods:

Name Description
__new__

Create a new StorageTimeoutSettings object

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@final
class StorageTimeoutSettings:
    """Configuration for AWS SDK timeout settings.

    Controls connect, read, and operation timeouts for the underlying S3 client."""

    def __new__(
        cls,
        connect_timeout_ms: int | None = None,
        read_timeout_ms: int | None = None,
        operation_timeout_ms: int | None = None,
        operation_attempt_timeout_ms: int | None = None,
    ) -> StorageTimeoutSettings:
        """
        Create a new `StorageTimeoutSettings` object

        All timeouts default to None, meaning the underlying
        `AWS SDK default <https://docs.aws.amazon.com/sdk-for-rust/latest/dg/timeouts.html>`_
        is used.

        Parameters
        ----------
        connect_timeout_ms: int | None
            The timeout for establishing a connection in milliseconds.
        read_timeout_ms: int | None
            The timeout for reading a response in milliseconds.
        operation_timeout_ms: int | None
            The timeout for the entire operation (including retries) in milliseconds.
        operation_attempt_timeout_ms: int | None
            The timeout for a single attempt of an operation in milliseconds.
        """
        ...
    @property
    def connect_timeout_ms(self) -> int | None: ...
    @connect_timeout_ms.setter
    def connect_timeout_ms(self, value: int | None) -> None: ...
    @property
    def read_timeout_ms(self) -> int | None: ...
    @read_timeout_ms.setter
    def read_timeout_ms(self, value: int | None) -> None: ...
    @property
    def operation_timeout_ms(self) -> int | None: ...
    @operation_timeout_ms.setter
    def operation_timeout_ms(self, value: int | None) -> None: ...
    @property
    def operation_attempt_timeout_ms(self) -> int | None: ...
    @operation_attempt_timeout_ms.setter
    def operation_attempt_timeout_ms(self, value: int | None) -> None: ...

__new__ #

__new__(
    connect_timeout_ms=None,
    read_timeout_ms=None,
    operation_timeout_ms=None,
    operation_attempt_timeout_ms=None,
)

Create a new StorageTimeoutSettings object

All timeouts default to None, meaning the underlying AWS SDK default (https://docs.aws.amazon.com/sdk-for-rust/latest/dg/timeouts.html) is used.

Parameters:

Name Type Description Default
connect_timeout_ms int | None

The timeout for establishing a connection in milliseconds.

None
read_timeout_ms int | None

The timeout for reading a response in milliseconds.

None
operation_timeout_ms int | None

The timeout for the entire operation (including retries) in milliseconds.

None
operation_attempt_timeout_ms int | None

The timeout for a single attempt of an operation in milliseconds.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __new__(
    cls,
    connect_timeout_ms: int | None = None,
    read_timeout_ms: int | None = None,
    operation_timeout_ms: int | None = None,
    operation_attempt_timeout_ms: int | None = None,
) -> StorageTimeoutSettings:
    """
    Create a new `StorageTimeoutSettings` object

    All timeouts default to None, meaning the underlying
    `AWS SDK default <https://docs.aws.amazon.com/sdk-for-rust/latest/dg/timeouts.html>`_
    is used.

    Parameters
    ----------
    connect_timeout_ms: int | None
        The timeout for establishing a connection in milliseconds.
    read_timeout_ms: int | None
        The timeout for reading a response in milliseconds.
    operation_timeout_ms: int | None
        The timeout for the entire operation (including retries) in milliseconds.
    operation_attempt_timeout_ms: int | None
        The timeout for a single attempt of an operation in milliseconds.
    """
    ...
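Because every parameter is expressed in milliseconds, a small conversion helper can reduce unit mistakes. The sketch below is an assumption-laden example: `timeout_kwargs` and `make_timeouts` are hypothetical names, and the sanity check on the two operation timeouts is a suggestion derived from the parameter descriptions, not something the library is known to enforce.

```python
from __future__ import annotations


def timeout_kwargs(
    connect_s: float | None = None,
    read_s: float | None = None,
    operation_s: float | None = None,
    attempt_s: float | None = None,
) -> dict[str, int | None]:
    """Hypothetical helper: convert seconds to the millisecond keyword arguments."""

    def to_ms(seconds: float | None) -> int | None:
        return None if seconds is None else int(round(seconds * 1000))

    kwargs = {
        "connect_timeout_ms": to_ms(connect_s),
        "read_timeout_ms": to_ms(read_s),
        "operation_timeout_ms": to_ms(operation_s),
        "operation_attempt_timeout_ms": to_ms(attempt_s),
    }
    total = kwargs["operation_timeout_ms"]
    attempt = kwargs["operation_attempt_timeout_ms"]
    # The operation timeout spans all retries, so a larger per-attempt value could never apply.
    if total is not None and attempt is not None and attempt > total:
        raise ValueError("operation_attempt_timeout_ms should not exceed operation_timeout_ms")
    return kwargs


def make_timeouts():
    """Build a StorageTimeoutSettings from values given in seconds."""
    # Deferred import so the helper above can be tried without icechunk installed.
    from icechunk.storage import StorageTimeoutSettings

    return StorageTimeoutSettings(
        **timeout_kwargs(connect_s=5, read_s=30, operation_s=120, attempt_s=45)
    )


example = timeout_kwargs(connect_s=5, attempt_s=0.25)
```

Values left as None are passed through unchanged, so the AWS SDK default still applies to them.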

azure_storage #

azure_storage(
    *,
    account,
    container,
    prefix,
    access_key=None,
    sas_token=None,
    bearer_token=None,
    from_env=None,
    config=None,
)

Create a Storage instance that saves data in Azure Blob Storage object store.

Parameters:

Name Type Description Default
account str

The account to which the caller must have access privileges

required
container str

The container where the repository will store its data

required
prefix str

The prefix within the container that is the root directory of the repository

required
access_key str | None

Azure Blob Storage credential access key

None
sas_token str | None

Azure Blob Storage credential SAS token

None
bearer_token str | None

Azure Blob Storage credential bearer token

None
from_env bool | None

Fetch credentials from the operating system environment

None
config dict[str, str] | None

A dictionary of options for the Azure Blob Storage object store. See https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants for a list of possible configuration keys.

None
Source code in icechunk-python/python/icechunk/storage.py
def azure_storage(
    *,
    account: str,
    container: str,
    prefix: str,
    access_key: str | None = None,
    sas_token: str | None = None,
    bearer_token: str | None = None,
    from_env: bool | None = None,
    config: dict[str, str] | None = None,
) -> Storage:
    """Create a Storage instance that saves data in Azure Blob Storage object store.

    Parameters
    ----------
    account: str
        The account to which the caller must have access privileges
    container: str
        The container where the repository will store its data
    prefix: str
        The prefix within the container that is the root directory of the repository
    access_key: str | None
        Azure Blob Storage credential access key
    sas_token: str | None
        Azure Blob Storage credential SAS token
    bearer_token: str | None
        Azure Blob Storage credential bearer token
    from_env: bool | None
        Fetch credentials from the operating system environment
    config: dict[str, str] | None
        A dictionary of options for the Azure Blob Storage object store. See https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants for a list of possible configuration keys.
    """
    credentials = azure_credentials(
        access_key=access_key,
        sas_token=sas_token,
        bearer_token=bearer_token,
        from_env=from_env,
    )
    return Storage.new_azure_blob(
        account=account,
        container=container,
        prefix=prefix,
        credentials=credentials,
        config=config,
    )

azure_store #

azure_store(*, account, config=None)

Build an ObjectStoreConfig instance for Azure stores.

Parameters:

Name Type Description Default
account str

The account to which the caller must have access privileges

required
config dict[str, str] | None

A dictionary of options for the Azure Blob Storage object store. See https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants for a list of possible configuration keys.

None
Source code in icechunk-python/python/icechunk/storage.py
def azure_store(
    *,
    account: str,
    config: dict[str, str] | None = None,
) -> ObjectStoreConfig.Azure:
    """Build an ObjectStoreConfig instance for Azure stores.

    Parameters
    ----------
    account: str
        The account to which the caller must have access privileges
    config: dict[str, str] | None
        A dictionary of options for the Azure Blob Storage object store. See https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants for a list of possible configuration keys.
    """
    return ObjectStoreConfig.Azure({"account": account, **(config or {})})
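The dictionary expression in the return statement has one consequence worth knowing: keys from `config` are unpacked after `account`, so an `account` entry in `config` would replace the `account` argument. The sketch below reproduces only that expression; `merged_azure_options` and the `example_key` option are placeholders, not part of the library.

```python
from __future__ import annotations


def merged_azure_options(account: str, config: dict[str, str] | None = None) -> dict[str, str]:
    """Reproduce only the dictionary expression used by azure_store."""
    return {"account": account, **(config or {})}


# "example_key" is a placeholder, not a real Azure configuration key.
plain = merged_azure_options("myaccount", {"example_key": "value"})

# Later keys win in a dict display, so this entry overrides the argument.
overridden = merged_azure_options("myaccount", {"account": "otheraccount"})
```

Here `overridden` holds `otheraccount`, so it is safer to leave `account` out of `config`.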

gcs_storage #

gcs_storage(
    *,
    bucket,
    prefix,
    service_account_file=None,
    service_account_key=None,
    application_credentials=None,
    bearer_token=None,
    anonymous=None,
    from_env=None,
    config=None,
    get_credentials=None,
    scatter_initial_credentials=False,
)

Create a Storage instance that saves data in Google Cloud Storage object store.

Parameters:

Name Type Description Default
bucket str

The bucket where the repository will store its data

required
prefix str | None

The prefix within the bucket that is the root directory of the repository

required
service_account_file str | None

The path to the service account file

None
service_account_key str | None

The service account key

None
application_credentials str | None

The path to the application credentials file

None
bearer_token str | None

The bearer token to use for the object store

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
config dict[str, str] | None

A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.

None
get_credentials Callable[[], GcsBearerCredential] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
Source code in icechunk-python/python/icechunk/storage.py
def gcs_storage(
    *,
    bucket: str,
    prefix: str | None,
    service_account_file: str | None = None,
    service_account_key: str | None = None,
    application_credentials: str | None = None,
    bearer_token: str | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    config: dict[str, str] | None = None,
    get_credentials: Callable[[], GcsBearerCredential] | None = None,
    scatter_initial_credentials: bool = False,
) -> Storage:
    """Create a Storage instance that saves data in Google Cloud Storage object store.

    Parameters
    ----------
    bucket: str
        The bucket where the repository will store its data
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    service_account_file: str | None
        The path to the service account file
    service_account_key: str | None
        The service account key
    application_credentials: str | None
        The path to the application credentials file
    bearer_token: str | None
        The bearer token to use for the object store
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    config: dict[str, str] | None
        A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.
    get_credentials: Callable[[], GcsBearerCredential] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    """
    credentials = gcs_credentials(
        service_account_file=service_account_file,
        service_account_key=service_account_key,
        application_credentials=application_credentials,
        bearer_token=bearer_token,
        from_env=from_env,
        anonymous=anonymous,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    return Storage.new_gcs(
        bucket=bucket,
        prefix=prefix,
        credentials=credentials,
        config=config,
    )
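The interaction between `get_credentials` and `scatter_initial_credentials` can be pictured with a toy model. Everything in this sketch (`ToyCredential`, `ToyRefreshableCredentials`, `fetch`) is hypothetical and only mimics the behaviour described above; shallow copies stand in for the copies that pickling a repo or session would create.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass(frozen=True)
class ToyCredential:
    """Stand-in for a bearer credential with an expiry; not an Icechunk type."""

    bearer: str
    expires_after: datetime


class ToyRefreshableCredentials:
    """Illustrative model of get_credentials plus scatter_initial_credentials."""

    def __init__(
        self,
        get_credentials: Callable[[], ToyCredential],
        scatter_initial_credentials: bool = False,
    ) -> None:
        self._get_credentials = get_credentials
        # With scattering enabled, fetch once up front and keep the value with the
        # object, so copies made later start with usable credentials.
        self._initial = get_credentials() if scatter_initial_credentials else None

    def current(self, now: datetime) -> ToyCredential:
        if self._initial is not None and now < self._initial.expires_after:
            return self._initial
        # Once the initial credential has expired, it is dropped for good.
        self._initial = None
        return self._get_credentials()


calls: list[int] = []
start = datetime(2024, 1, 1, tzinfo=timezone.utc)


def fetch() -> ToyCredential:
    calls.append(1)
    return ToyCredential(f"token-{len(calls)}", start + timedelta(hours=1))


creds = ToyRefreshableCredentials(fetch, scatter_initial_credentials=True)
workers = [copy.copy(creds) for _ in range(3)]
tokens = [w.current(start + timedelta(minutes=5)).bearer for w in workers]
```

In this model the three copies reuse the single credential fetched at construction time. The trade-off is the one noted above: the stored credential travels with every copy.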

gcs_store #

gcs_store(opts=None)

Build an ObjectStoreConfig instance for Google Cloud Storage object stores.

Parameters:

Name Type Description Default
opts dict[str, str] | None

A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.

None
Source code in icechunk-python/python/icechunk/storage.py
def gcs_store(
    opts: dict[str, str] | None = None,
) -> ObjectStoreConfig.Gcs:
    """Build an ObjectStoreConfig instance for Google Cloud Storage object stores.

    Parameters
    ----------
    opts: dict[str, str] | None
        A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.
    """
    return ObjectStoreConfig.Gcs(opts)

http_storage #

http_storage(base_url, opts=None)

Create a read-only Storage instance that reads data from an HTTP(s) server

Parameters:

Name Type Description Default
base_url str

The URL path to the root of the repository

required
opts dict[str, str] | None

A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.

None
Source code in icechunk-python/python/icechunk/storage.py
def http_storage(base_url: str, opts: dict[str, str] | None = None) -> Storage:
    """Create a read-only Storage instance that reads data from an HTTP(s) server

    Parameters
    ----------
    base_url: str
        The URL path to the root of the repository
    opts: dict[str, str] | None
        A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.
    """
    return Storage.new_http(base_url, opts)
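Since the option keys are the snake case form of the `ClientConfigKey` variant names, a small conversion helper can be sketched as follows. `to_snake_case` and `open_read_only` are hypothetical, the URL is a placeholder, and the string values shown (`"true"`, a user agent string) are assumptions about accepted formats.

```python
import re


def to_snake_case(variant_name: str) -> str:
    """Convert a CamelCase variant name such as "ConnectTimeout" to snake case."""
    # Insert an underscore before every capital letter except the first, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", variant_name).lower()


# Assumed value formats; check the linked ClientConfigKey documentation.
opts = {
    to_snake_case("AllowHttp"): "true",
    to_snake_case("UserAgent"): "my-app/1.0",
}


def open_read_only():
    """Pass the options to http_storage using the signature shown above."""
    # Deferred import so the helper above can be tried without icechunk installed.
    from icechunk.storage import http_storage

    return http_storage("https://example.com/my-repo", opts)
```

The same key format applies to the `opts` argument of `http_store`.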

http_store #

http_store(opts=None)

Build an ObjectStoreConfig instance for HTTP object stores.

Parameters:

Name Type Description Default
opts dict[str, str] | None

A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.

None
Source code in icechunk-python/python/icechunk/storage.py
def http_store(
    opts: dict[str, str] | None = None,
) -> ObjectStoreConfig.Http:
    """Build an ObjectStoreConfig instance for HTTP object stores.

    Parameters
    ----------
    opts: dict[str, str] | None
        A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.
    """
    return ObjectStoreConfig.Http(opts)

in_memory_storage #

in_memory_storage()

Create a Storage instance that saves data in memory.

This Storage implementation is used for tests. Data will be lost after the process finishes, and can only be accessed through the Storage instance returned. Different instances don't share data.

Source code in icechunk-python/python/icechunk/storage.py
def in_memory_storage() -> Storage:
    """Create a Storage instance that saves data in memory.

    This Storage implementation is used for tests. Data will be lost after the process finishes, and can only be accessed through the Storage instance returned. Different instances don't share data."""
    return Storage.new_in_memory()

local_filesystem_storage #

local_filesystem_storage(path)

Create a Storage instance that saves data in the local file system.

This Storage instance is not recommended for production data.

Source code in icechunk-python/python/icechunk/storage.py
def local_filesystem_storage(path: str) -> Storage:
    """Create a Storage instance that saves data in the local file system.

    This Storage instance is not recommended for production data.
    """
    return Storage.new_local_filesystem(path)

local_filesystem_store #

local_filesystem_store(path)

Build an ObjectStoreConfig instance for local file stores.

Parameters:

Name Type Description Default
path str

The root directory for the store.

required
Source code in icechunk-python/python/icechunk/storage.py
def local_filesystem_store(
    path: str,
) -> ObjectStoreConfig.LocalFileSystem:
    """Build an ObjectStoreConfig instance for local file stores.

    Parameters
    ----------
    path: str
        The root directory for the store.
    """
    return ObjectStoreConfig.LocalFileSystem(path)

r2_storage #

r2_storage(
    *,
    bucket=None,
    prefix=None,
    account_id=None,
    endpoint_url=None,
    region=None,
    allow_http=False,
    access_key_id=None,
    secret_access_key=None,
    session_token=None,
    expires_after=None,
    anonymous=None,
    from_env=None,
    get_credentials=None,
    scatter_initial_credentials=False,
    network_stream_timeout_seconds=60,
)

Create a Storage instance that saves data in Cloudflare R2 object store.

Parameters:

Name Type Description Default
bucket str | None

The bucket name

None
prefix str | None

The prefix within the bucket that is the root directory of the repository

None
account_id str | None

Cloudflare account ID. When provided, a default endpoint URL is constructed as https://<ACCOUNT_ID>.r2.cloudflarestorage.com. If not provided, endpoint_url must be provided instead.

None
endpoint_url str | None

Endpoint where the object store serves data, example: https://<ACCOUNT_ID>.r2.cloudflarestorage.com

None
region str | None

The region to use in the object store, if None the default region 'auto' will be used

None
allow_http bool

Whether the object store can be accessed over HTTP instead of HTTPS

False
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
network_stream_timeout_seconds int

Timeout requests if no bytes can be transmitted during this period of time. If set to 0, timeout is disabled. Default: 60.

60
Source code in icechunk-python/python/icechunk/storage.py
def r2_storage(
    *,
    bucket: str | None = None,
    prefix: str | None = None,
    account_id: str | None = None,
    endpoint_url: str | None = None,
    region: str | None = None,
    allow_http: bool = False,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
    network_stream_timeout_seconds: int = 60,
) -> Storage:
    """Create a Storage instance that saves data in Cloudflare R2 object store.

    Parameters
    ----------
    bucket: str | None
        The bucket name
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    account_id: str | None
        Cloudflare account ID. When provided, a default endpoint URL is constructed as
        `https://<ACCOUNT_ID>.r2.cloudflarestorage.com`. If not provided, `endpoint_url`
        must be provided instead.
    endpoint_url: str | None
        Endpoint where the object store serves data, example: `https://<ACCOUNT_ID>.r2.cloudflarestorage.com`
    region: str | None
        The region to use in the object store, if `None` the default region 'auto' will be used
    allow_http: bool
        Whether the object store can be accessed over HTTP instead of HTTPS
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    network_stream_timeout_seconds: int
        Timeout requests if no bytes can be transmitted during this period of time.
        If set to 0, timeout is disabled. Default: 60.
    """
    credentials = s3_credentials(
        access_key_id=access_key_id,
        secret_access_key=secret_access_key,
        session_token=session_token,
        expires_after=expires_after,
        anonymous=anonymous,
        from_env=from_env,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        anonymous=anonymous or False,
    )
    return Storage.new_r2(
        config=options,
        bucket=bucket,
        prefix=prefix,
        account_id=account_id,
        credentials=credentials,
    )
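
The docstring above says that `account_id` is used to construct a default endpoint URL, and that `endpoint_url` must be given when it is absent. The sketch below models that rule in plain Python. It is not Icechunk's implementation: the helper name `r2_endpoint_url` is hypothetical, and the choice to let an explicit `endpoint_url` win when both values are present is an assumption.

```python
def r2_endpoint_url(account_id=None, endpoint_url=None):
    """Return the endpoint an R2 configuration would point at (illustrative only)."""
    if endpoint_url is not None:
        # Assumption: an explicit endpoint takes precedence over the derived default.
        return endpoint_url
    if account_id is not None:
        # URL pattern quoted from the account_id parameter description.
        return f"https://{account_id}.r2.cloudflarestorage.com"
    raise ValueError("provide either account_id or endpoint_url")


# Placeholder account ID and endpoint, used only for demonstration.
derived = r2_endpoint_url(account_id="0123456789abcdef")
explicit = r2_endpoint_url(endpoint_url="https://r2.example.com")
```

With the real function, the same choice is expressed by passing either `account_id` or `endpoint_url` to `r2_storage`.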

redirect_storage #

redirect_storage(base_url)

Create a read-only Storage instance that follows HTTP redirects to resolve the underlying storage backend.

The given URL is expected to return an HTTP redirect (3xx) with a Location header pointing to a supported storage scheme (s3://, gs://, r2://, tigris://, http+icechunk://, etc.). Icechunk will follow redirects until it reaches a recognized scheme and then use that as the actual storage backend.

This is useful when a service controls which bucket or path a repository lives in, so clients don't need to know the final storage location ahead of time.

Parameters:

Name Type Description Default
base_url str

The URL that will be followed to discover the actual storage location.

required
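The resolution process described above can be modeled with a small, standard-library-only sketch. It is a conceptual illustration rather than Icechunk's actual code: `resolve_storage_url`, `fetch_location`, and `max_hops` are hypothetical names, and the HTTP request is replaced by a dictionary lookup so the example runs offline.

```python
from urllib.parse import urlparse

# Schemes named in the description above; the real list may be longer ("etc.").
RECOGNIZED_SCHEMES = {"s3", "gs", "r2", "tigris", "http+icechunk"}


def resolve_storage_url(base_url, fetch_location, max_hops=10):
    """Follow Location targets until a recognized storage scheme is reached."""
    url = base_url
    hops = 0
    while urlparse(url).scheme not in RECOGNIZED_SCHEMES:
        if hops == max_hops:
            raise RuntimeError(f"no storage scheme found after {max_hops} redirects")
        # Hypothetical stand-in for an HTTP request that returns the Location header.
        url = fetch_location(url)
        hops += 1
    return url


# Fake redirect chain: two HTTP hops, then a storage URL.
redirects = {
    "https://data.example.com/repo": "https://cdn.example.com/repo",
    "https://cdn.example.com/repo": "s3://example-bucket/repos/demo",
}
resolved = resolve_storage_url("https://data.example.com/repo", redirects.__getitem__)
```

In real use none of this is written by hand: a single call such as `redirect_storage("https://data.example.com/repo")` returns the read-only `Storage`.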
Source code in icechunk-python/python/icechunk/storage.py
def redirect_storage(base_url: str) -> Storage:
    """Create a read-only Storage instance that follows HTTP redirects to resolve the underlying storage backend.

    The given URL is expected to return an HTTP redirect (3xx) with a ``Location`` header
    pointing to a supported storage scheme (``s3://``, ``gs://``, ``r2://``, ``tigris://``,
    ``http+icechunk://``, etc.). Icechunk will follow redirects until it reaches a recognized
    scheme and then use that as the actual storage backend.

    This is useful when a service controls which bucket or path a repository lives in, so
    clients don't need to know the final storage location ahead of time.

    Parameters
    ----------
    base_url: str
        The URL that will be followed to discover the actual storage location.
    """
    return Storage.new_redirect(base_url)

s3_storage #

s3_storage(
    *,
    bucket,
    prefix,
    region=None,
    endpoint_url=None,
    allow_http=False,
    access_key_id=None,
    secret_access_key=None,
    session_token=None,
    expires_after=None,
    anonymous=None,
    from_env=None,
    get_credentials=None,
    scatter_initial_credentials=False,
    force_path_style=False,
    network_stream_timeout_seconds=60,
    requester_pays=False,
)

Create a Storage instance that saves data in S3 or S3 compatible object stores.

Parameters:

Name Type Description Default
bucket str

The bucket where the repository will store its data

required
prefix str | None

The prefix within the bucket that is the root directory of the repository

required
region str | None

The region to use in the object store; if None, a default region is used

None
endpoint_url str | None

Optional endpoint where the object store serves data, example: http://localhost:4200

None
allow_http bool

If the object store can be accessed using http protocol instead of https

False
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
force_path_style bool

Whether to force using path-style addressing for buckets

False
network_stream_timeout_seconds int

Timeout requests if no bytes can be transmitted during this period of time. If set to 0, timeout is disabled. Default: 60.

60
requester_pays bool

Enable requester pays for S3 buckets

False
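As a rough illustration of `get_credentials` and `scatter_initial_credentials`, the sketch below wires a refreshable credential callback into `s3_storage`. The bucket, prefix, region, and key values are placeholders, and `credential_fields` is a hypothetical stand-in for whatever service issues temporary credentials. It assumes `S3StaticCredentials` is importable from the top-level `icechunk` package and accepts these four fields as keyword arguments. The Icechunk imports are deferred into the functions so that the module loads even where the package is not installed.

```python
from datetime import datetime, timedelta, timezone


def credential_fields():
    """Hypothetical stand-in for a call to a secrets service or STS."""
    return dict(
        access_key_id="example-key",
        secret_access_key="example-secret",
        session_token="example-token",
        # Credentials expire in one hour; a new set is fetched after that.
        expires_after=datetime.now(timezone.utc) + timedelta(hours=1),
    )


def get_credentials():
    # Deferred import: assumes icechunk exposes S3StaticCredentials at top level.
    from icechunk import S3StaticCredentials

    return S3StaticCredentials(**credential_fields())


def make_storage():
    from icechunk.storage import s3_storage

    return s3_storage(
        bucket="example-bucket",
        prefix="repos/demo",
        region="us-east-1",
        get_credentials=get_credentials,
        # Fetch once up front so pickled copies start with valid credentials.
        scatter_initial_credentials=True,
    )


fields = credential_fields()
```

Because `scatter_initial_credentials=True` stores the fetched credentials inside the object, treat any pickled repo or session built this way as sensitive.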
Source code in icechunk-python/python/icechunk/storage.py
def s3_storage(
    *,
    bucket: str,
    prefix: str | None,
    region: str | None = None,
    endpoint_url: str | None = None,
    allow_http: bool = False,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
    force_path_style: bool = False,
    network_stream_timeout_seconds: int = 60,
    requester_pays: bool = False,
) -> Storage:
    """Create a Storage instance that saves data in S3 or S3 compatible object stores.

    Parameters
    ----------
    bucket: str
        The bucket where the repository will store its data
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    region: str | None
        The region to use in the object store; if `None`, a default region is used
    endpoint_url: str | None
        Optional endpoint where the object store serves data, example: http://localhost:4200
    allow_http: bool
        If the object store can be accessed using http protocol instead of https
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    force_path_style: bool
        Whether to force using path-style addressing for buckets
    network_stream_timeout_seconds: int
        Timeout requests if no bytes can be transmitted during this period of time.
        If set to 0, timeout is disabled. Default: 60.
    requester_pays: bool
        Enable requester pays for S3 buckets
    """

    credentials = s3_credentials(
        access_key_id=access_key_id,
        secret_access_key=secret_access_key,
        session_token=session_token,
        expires_after=expires_after,
        anonymous=anonymous,
        from_env=from_env,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        force_path_style=force_path_style,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        requester_pays=requester_pays,
        anonymous=anonymous or False,
    )
    return Storage.new_s3(
        config=options,
        bucket=bucket,
        prefix=prefix,
        credentials=credentials,
    )

s3_store #

s3_store(
    region=None,
    endpoint_url=None,
    allow_http=False,
    anonymous=False,
    s3_compatible=False,
    force_path_style=False,
    network_stream_timeout_seconds=60,
    requester_pays=False,
)

Build an ObjectStoreConfig instance for S3 or S3 compatible object stores.
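
As the source listing below shows, the `s3_compatible` flag decides which `ObjectStoreConfig` variant is returned. The following sketch contrasts the two cases. The region and endpoint values are placeholders, and the Icechunk import is deferred into the function so the keyword arguments can be inspected without the package installed.

```python
# Keyword arguments for AWS S3 itself: s3_compatible keeps its default (False).
aws_kwargs = dict(region="us-east-1")

# Keyword arguments for a hypothetical S3-compatible service.
compatible_kwargs = dict(
    region="us-east-1",
    endpoint_url="https://objects.example.com",
    s3_compatible=True,  # selects ObjectStoreConfig.S3Compatible
)


def build_store_configs():
    # Deferred import so this sketch loads without icechunk installed.
    from icechunk.storage import s3_store

    # Returns (ObjectStoreConfig.S3, ObjectStoreConfig.S3Compatible).
    return s3_store(**aws_kwargs), s3_store(**compatible_kwargs)
```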

Source code in icechunk-python/python/icechunk/storage.py
def s3_store(
    region: str | None = None,
    endpoint_url: str | None = None,
    allow_http: bool = False,
    anonymous: bool = False,
    s3_compatible: bool = False,
    force_path_style: bool = False,
    network_stream_timeout_seconds: int = 60,
    requester_pays: bool = False,
) -> ObjectStoreConfig.S3Compatible | ObjectStoreConfig.S3:
    """Build an ObjectStoreConfig instance for S3 or S3 compatible object stores."""

    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        force_path_style=force_path_style,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        requester_pays=requester_pays,
        anonymous=anonymous,
    )
    return (
        ObjectStoreConfig.S3Compatible(options)
        if s3_compatible
        else ObjectStoreConfig.S3(options)
    )

tigris_storage #

tigris_storage(
    *,
    bucket,
    prefix,
    region=None,
    endpoint_url=None,
    use_weak_consistency=False,
    allow_http=False,
    access_key_id=None,
    secret_access_key=None,
    session_token=None,
    expires_after=None,
    anonymous=None,
    from_env=None,
    get_credentials=None,
    scatter_initial_credentials=False,
    network_stream_timeout_seconds=60,
)

Create a Storage instance that saves data in Tigris object store.

Parameters:

Name Type Description Default
bucket str

The bucket where the repository will store its data

required
prefix str | None

The prefix within the bucket that is the root directory of the repository

required
region str | None

The region to use in the object store; if None, a default region is used

None
endpoint_url str | None

Optional endpoint where the object store serves data, example: http://localhost:4200

None
use_weak_consistency bool

If set to True, returns a read-only Storage instance that can read from the closest Tigris region. Behavior is undefined if objects haven't propagated to that region yet. This option is for experts only.

False
allow_http bool

If the object store can be accessed using http protocol instead of https

False
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
network_stream_timeout_seconds int

Timeout requests if no bytes can be transmitted during this period of time. If set to 0, timeout is disabled. Default: 60.

60
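A minimal sketch of how `use_weak_consistency` might separate a writer from a read-only consumer follows. The bucket and prefix are placeholders, `from_env=True` assumes credentials are available in the environment, and the Icechunk import is deferred into the function so the argument sets can be examined on their own.

```python
# Arguments shared by both configurations (placeholder bucket and prefix).
common = dict(bucket="example-bucket", prefix="repos/demo", from_env=True)

# Default behavior: a Storage that can be written to.
writer_kwargs = dict(common)

# Read-only Storage that may be served from the closest Tigris region.
# Per the description above, behavior is undefined if objects have not
# propagated to that region yet.
reader_kwargs = dict(common, use_weak_consistency=True)


def make_tigris_storages():
    # Deferred import so this sketch loads without icechunk installed.
    from icechunk.storage import tigris_storage

    return tigris_storage(**writer_kwargs), tigris_storage(**reader_kwargs)
```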
Source code in icechunk-python/python/icechunk/storage.py
def tigris_storage(
    *,
    bucket: str,
    prefix: str | None,
    region: str | None = None,
    endpoint_url: str | None = None,
    use_weak_consistency: bool = False,
    allow_http: bool = False,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
    network_stream_timeout_seconds: int = 60,
) -> Storage:
    """Create a Storage instance that saves data in Tigris object store.

    Parameters
    ----------
    bucket: str
        The bucket where the repository will store its data
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    region: str | None
        The region to use in the object store; if `None`, a default region is used
    endpoint_url: str | None
        Optional endpoint where the object store serves data, example: http://localhost:4200
    use_weak_consistency: bool
        If set to True, returns a read-only Storage instance that can read from the closest
        Tigris region. Behavior is undefined if objects haven't propagated to that region yet.
        This option is for experts only.
    allow_http: bool
        If the object store can be accessed using http protocol instead of https
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    network_stream_timeout_seconds: int
        Timeout requests if no bytes can be transmitted during this period of time.
        If set to 0, timeout is disabled. Default: 60.
    """
    credentials = s3_credentials(
        access_key_id=access_key_id,
        secret_access_key=secret_access_key,
        session_token=session_token,
        expires_after=expires_after,
        anonymous=anonymous,
        from_env=from_env,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        anonymous=anonymous or False,
    )
    return Storage.new_tigris(
        config=options,
        bucket=bucket,
        prefix=prefix,
        use_weak_consistency=use_weak_consistency,
        credentials=credentials,
    )