Skip to content

Home / guides / flaky-networks

Flaky Networks#

Icechunk provides configurable retries and timeouts to handle unreliable networks.

Note

If you encounter errors or operations that you believe should be retried but aren't, please open an issue.

Retries: StorageRetriesSettings#

Controls retry behavior for all storage operations (reads, writes, deletes).

Parameter Default Description
max_tries 10 Maximum number of attempts (including the initial request)
initial_backoff_ms 100 Initial delay (ms) for the exponential backoff stage
max_backoff_ms 180,000 (3 min) Maximum delay (ms) between retries
import icechunk

config = icechunk.RepositoryConfig.default()
config.storage.retries = icechunk.StorageRetriesSettings(
    max_tries=20,
    initial_backoff_ms=200,
    max_backoff_ms=60_000,
)

Timeouts: StorageTimeoutSettings#

Controls connect, read, and operation timeouts for the underlying S3 client. All values are in milliseconds. By default, these are unset and the AWS SDK defaults apply.

Parameter Description
connect_timeout_ms Time allowed to establish a TCP connection
read_timeout_ms Time allowed to read a single response
operation_timeout_ms Total time allowed for an entire operation (including all retries)
operation_attempt_timeout_ms Time allowed for a single attempt of an operation
config.storage.timeouts = icechunk.StorageTimeoutSettings(
    connect_timeout_ms=5_000,
    read_timeout_ms=30_000,
    operation_timeout_ms=300_000,
    operation_attempt_timeout_ms=60_000,
)

Stalled stream detection#

A stalled network stream is an HTTP connection which does not transfer any data over a certain period. Stalled connections may occur in the following situations:

  • When the client is connecting to a remote object store behind a slow network connection.
  • When the client is behind a VPN or proxy server which is limiting the number or throughput of connections between the client and the remote object store.

The Amazon S3 SDK allows detection and retrying on stalled stream errors, which typically look like

  |-> I/O error
  |-> streaming error
  `-> minimum throughput was specified at 1 B/s, but throughput of 0 B/s was observed

This behavior is configurable when creating a new Storage option, via the network_stream_timeout_seconds parameter. The default is 60 seconds. To set a different value, you may specify as follows

storage = icechunk.s3_storage(
    **other_storage_kwargs,
    network_stream_timeout_seconds=50,
)
repo = icechunk.Repository.open(storage=storage)

Specifying a value of 0 disables this check entirely.

Applying the configuration#

Pass the configured RepositoryConfig when creating or opening a repository:

repo = icechunk.Repository.open(
    storage=storage,
    config=config,
)

See Configuration for full details on applying and persisting configuration.