Python API Reference

The icechunk package provides Repository — the main entry point — and a few commonly-used exceptions and utilities directly in the top-level namespace. Everything else is organized into submodules.

import icechunk as ic

# Top-level: Repository, storage factories, exceptions, utilities
repo = ic.Repository.create(ic.s3_storage(bucket="my-bucket", prefix="my-prefix", from_env=True))

# Submodules for everything else
config = ic.config.RepositoryConfig(...)
solver = ic.conflicts.BasicConflictSolver(...)

Submodules

| Module | Description |
| --- | --- |
| icechunk.config | Repository configuration, manifest settings, compression, caching |
| icechunk.conflicts | Conflict detection and resolution |
| icechunk.credentials | Credential types and factories for S3, GCS, Azure |
| icechunk.ops | Operation types: updates, garbage collection summaries |
| icechunk.session | Sessions for reading and writing data |
| icechunk.snapshots | Snapshot metadata, diffs, manifest file info |
| icechunk.storage | Storage backends and configuration |
| icechunk.virtual | Virtual chunk containers |
| icechunk.xarray | Xarray integration |
| icechunk.dask | Dask integration |

Top-level API

The following classes, exceptions, and utilities are available directly in the icechunk namespace and are not part of any submodule.

| Name | Kind | Description |
| --- | --- | --- |
| Repository | class | Main entry point for creating and opening repositories |
| IcechunkStore | class | Zarr-compatible store backed by an Icechunk session |
| IcechunkError | exception | Base exception for Icechunk errors |
| ConflictError | exception | Raised on conflicting concurrent writes |
| RebaseFailedError | exception | Raised when a rebase cannot be completed |
| print_debug_info | function | Print versions of icechunk and related packages |
| upgrade_icechunk_repository | function | Migrate a repository to the latest spec version |
| supported_spec_versions | function | List supported spec versions |

icechunk.Repository

An Icechunk repository.

Methods:

Name Description
ancestry

Get the ancestry of a snapshot.

ancestry_graph

Build a visual representation of the commit history.

ancestry_graph_async

Async version of ancestry_graph.

async_ancestry

Get the ancestry of a snapshot (async version).

chunk_storage_stats

Calculate the total storage used for chunks, in bytes.

chunk_storage_stats_async

Calculate the total storage used for chunks, in bytes (async version).

create

Create a new Icechunk repository.

create_async

Create a new Icechunk repository asynchronously.

create_branch

Create a new branch at the given snapshot.

create_branch_async

Create a new branch at the given snapshot (async version).

create_tag

Create a new tag at the given snapshot.

create_tag_async

Create a new tag at the given snapshot (async version).

default_commit_metadata

Get the currently configured default commit metadata for the repository.

delete_branch

Delete a branch.

delete_branch_async

Delete a branch (async version).

delete_tag

Delete a tag.

delete_tag_async

Delete a tag (async version).

diff

Compute an overview of the operations executed from version from to version to.

diff_async

Compute an overview of the operations executed from version from to version to (async version).

disabled_feature_flags

Get feature flags that are currently disabled.

disabled_feature_flags_async

Get feature flags that are currently disabled (async version).

enabled_feature_flags

Get feature flags that are currently enabled.

enabled_feature_flags_async

Get feature flags that are currently enabled (async version).

exists

Check if a repository exists at the given storage location.

exists_async

Check if a repository exists at the given storage location (async version).

expire_snapshots

Expire all snapshots older than a threshold.

expire_snapshots_async

Expire all snapshots older than a threshold (async version).

feature_flags

Get all feature flags and their current state.

feature_flags_async

Get all feature flags and their current state (async version).

fetch_config

Fetch the configuration for the repository saved in storage.

fetch_config_async

Fetch the configuration for the repository saved in storage (async version).

fetch_spec_version

Fetch the spec version of a repository without fully opening it.

fetch_spec_version_async

Fetch the spec version of a repository without fully opening it (async version).

garbage_collect

Delete any objects no longer accessible from any branches or tags.

garbage_collect_async

Delete any objects no longer accessible from any branches or tags (async version).

get_metadata

Get the currently configured repository metadata.

get_metadata_async

Get the currently configured repository metadata (async version).

get_status

Get the current repository status.

get_status_async

Get the current repository status (async version).

inspect_manifest

Return chunk storage statistics for a manifest.

inspect_manifest_async

Return chunk storage statistics for a manifest (async version).

inspect_repo_info

Return the top-level repository metadata.

inspect_repo_info_async

Return the top-level repository metadata (async version).

inspect_snapshot

Return the node tree stored in a snapshot.

inspect_snapshot_async

Return the node tree stored in a snapshot (async version).

inspect_transaction_log

Return the record of what changed in a single commit.

inspect_transaction_log_async

Return the record of what changed in a single commit (async version).

list_branches

List the branches in the repository.

list_branches_async

List the branches in the repository (async version).

list_manifest_files

Get the manifest files used by the given snapshot ID.

list_manifest_files_async

Get the manifest files used by the given snapshot ID (async version).

list_tags

List the tags in the repository.

list_tags_async

List the tags in the repository (async version).

lookup_branch

Get the tip snapshot ID of a branch.

lookup_branch_async

Get the tip snapshot ID of a branch (async version).

lookup_snapshot

Get the SnapshotInfo for a given snapshot ID.

lookup_snapshot_async

Get the SnapshotInfo for a given snapshot ID (async version).

lookup_tag

Get the snapshot ID of a tag.

lookup_tag_async

Get the snapshot ID of a tag (async version).

open

Open an existing Icechunk repository.

open_async

Open an existing Icechunk repository asynchronously.

open_or_create

Open an existing Icechunk repository or create a new one if it does not exist.

open_or_create_async

Open an existing Icechunk repository or create a new one if it does not exist (async version).

ops_log

Get a summary of changes to the repository.

ops_log_async

Get a summary of changes to the repository (async version).

readonly_session

Create a read-only session.

readonly_session_async

Create a read-only session (async version).

rearrange_session

Create a session to move/rename nodes in the Zarr hierarchy.

rearrange_session_async

Create a session to move/rename nodes in the Zarr hierarchy (async version).

reopen

Reopen the repository with new configuration or credentials.

reopen_async

Reopen the repository with new configuration or credentials (async version).

reset_branch

Reset a branch to a specific snapshot.

reset_branch_async

Reset a branch to a specific snapshot (async version).

rewrite_manifests

Rewrite manifests for all arrays.

rewrite_manifests_async

Rewrite manifests for all arrays (async version).

save_config

Save the repository configuration to storage. This configuration will be used in future calls to Repository.open.

save_config_async

Save the repository configuration to storage (async version).

set_default_commit_metadata

Set the default commit metadata for the repository.

set_feature_flag

Set a feature flag.

set_feature_flag_async

Set a feature flag (async version).

set_metadata

Set the repository metadata. The passed dict replaces the complete metadata.

set_metadata_async

Set the repository metadata. The passed dict replaces the complete metadata (async version).

set_status

Set the repository status.

set_status_async

Set the repository status (async version).

total_chunks_storage

Calculate the total storage used for chunks, in bytes.

total_chunks_storage_async

Calculate the total storage used for chunks, in bytes (async version).

transaction

Create a transaction on a branch.

update_metadata

Update the repository metadata.

update_metadata_async

Update the repository metadata (async version).

writable_session

Create a writable session on a branch.

writable_session_async

Create a writable session on a branch (async version).

Attributes:

Name Type Description
authorized_virtual_container_prefixes set[str]

Get all authorized virtual chunk container prefixes.

config RepositoryConfig

Get a copy of this repository's config.

metadata dict[str, Any]

Get the currently configured repository metadata.

status RepoStatus

Get the current repository status.

storage Storage

Get a copy of this repository's Storage instance.

Source code in icechunk-python/python/icechunk/repository.py
class Repository:
    """An Icechunk repository."""

    _repository: PyRepository

    def __init__(self, repository: PyRepository):
        self._repository = repository

    def __repr__(self) -> str:
        return repr(self._repository)

    def __str__(self) -> str:
        return str(self._repository)

    def _repr_html_(self) -> str:
        return self._repository._repr_html_()

    @classmethod
    def create(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        spec_version: SpecVersion | int | None = None,
        check_clean_root: bool = True,
    ) -> Self:
        """
        Create a new Icechunk repository.
        If one already exists at the given store location, an error will be raised.

        !!! warning
            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository configuration. If not provided, a default configuration will be used.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If the credential is `None`, it will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        spec_version : SpecVersion | int, optional
            Use this version of the spec for the new repository. If not passed, the latest version
            of the spec that was available before the library version release will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            PyRepository.create(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                spec_version=spec_version,
                check_clean_root=check_clean_root,
            )
        )

    @classmethod
    async def create_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        spec_version: SpecVersion | int | None = None,
        check_clean_root: bool = True,
    ) -> Self:
        """
        Create a new Icechunk repository asynchronously.
        If one already exists at the given store location, an error will be raised.

        !!! warning
            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository configuration. If not provided, a default configuration will be used.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If the credential is `None`, it will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        spec_version : SpecVersion | int, optional
            Use this version of the spec for the new repository. If not passed, the latest version
            of the spec that was available before the library version release will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            await PyRepository.create_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                spec_version=spec_version,
                check_clean_root=check_clean_root,
            )
        )

    @classmethod
    def open(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository.

        If no repository exists at the given storage location, an error will be raised.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If the credential is `None`, it will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            PyRepository.open(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    @classmethod
    async def open_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository asynchronously.

        If no repository exists at the given storage location, an error will be raised.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If the credential is `None`, it will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            await PyRepository.open_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    @classmethod
    def open_or_create(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        create_version: SpecVersion | int | None = None,
        check_clean_root: bool = True,
    ) -> Self:
        """
        Open an existing Icechunk repository or create a new one if it does not exist.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If the credential is `None`, it will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        create_version : SpecVersion | int, optional
            Use this version of the spec for the new repository, if it needs to be created.
            If not passed, the latest version of the spec that was available before the
            library version release will be used.


        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            PyRepository.open_or_create(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                create_version=create_version,
                check_clean_root=check_clean_root,
            )
        )

    @classmethod
    async def open_or_create_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        create_version: SpecVersion | int | None = None,
        check_clean_root: bool = True,
    ) -> Self:
        """
        Open an existing Icechunk repository or create a new one if it does not exist (async version).

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        create_version : SpecVersion | int, optional
            Use this version of the spec for the new repository, if it needs to be created.
            If not passed, the latest spec version that was available when this version
            of the library was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            await PyRepository.open_or_create_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                create_version=create_version,
                check_clean_root=check_clean_root,
            )
        )

    @staticmethod
    def exists(
        storage: Storage,
        storage_settings: StorageSettings | None = None,
    ) -> bool:
        """
        Check if a repository exists at the given storage location.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        storage_settings : StorageSettings | None
            Optional storage settings to use for the initial storage call.

        Returns
        -------
        bool
            True if the repository exists, False otherwise.
        """
        return PyRepository.exists(storage, storage_settings)

    @staticmethod
    async def exists_async(
        storage: Storage,
        storage_settings: StorageSettings | None = None,
    ) -> bool:
        """
        Check if a repository exists at the given storage location (async version).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        storage_settings : StorageSettings | None
            Optional storage settings to use for the initial storage call.

        Returns
        -------
        bool
            True if the repository exists, False otherwise.
        """
        return await PyRepository.exists_async(storage, storage_settings)

    @staticmethod
    def fetch_spec_version(
        storage: Storage,
        storage_settings: StorageSettings | None = None,
    ) -> SpecVersion | None:
        """
        Fetch the spec version of a repository without fully opening it.

        This is useful for checking the repository format version before opening,
        for example to know what version of the library is needed to open it.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        storage_settings : StorageSettings | None
            Optional storage settings to use for the initial storage call.

        Returns
        -------
        SpecVersion | None
            The spec version of the repository if it exists, None if no repository
            exists at the given location.
        """
        return PyRepository.fetch_spec_version(storage, storage_settings)

    @staticmethod
    async def fetch_spec_version_async(
        storage: Storage,
        storage_settings: StorageSettings | None = None,
    ) -> SpecVersion | None:
        """
        Fetch the spec version of a repository without fully opening it (async version).

        This is useful for checking the repository format version before opening,
        for example to know what version of the library is needed to open it.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        storage_settings : StorageSettings | None
            Optional storage settings to use for the initial storage call.

        Returns
        -------
        SpecVersion | None
            The spec version of the repository if it exists, None if no repository
            exists at the given location.
        """
        return await PyRepository.fetch_spec_version_async(storage, storage_settings)

    def __getstate__(self) -> object:
        return {
            "_repository": self._repository.as_bytes(),
        }

    def __setstate__(self, state: object) -> None:
        if not isinstance(state, dict):
            raise ValueError("Invalid repository state")
        self._repository = PyRepository.from_bytes(state["_repository"])

    @staticmethod
    def fetch_config(storage: Storage) -> RepositoryConfig | None:
        """
        Fetch the configuration for the repository saved in storage.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        RepositoryConfig | None
            The repository configuration if it exists, None otherwise.
        """
        return PyRepository.fetch_config(storage)

    @staticmethod
    async def fetch_config_async(storage: Storage) -> RepositoryConfig | None:
        """
        Fetch the configuration for the repository saved in storage (async version).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        RepositoryConfig | None
            The repository configuration if it exists, None otherwise.
        """
        return await PyRepository.fetch_config_async(storage)

    def save_config(self) -> None:
        """
        Save the repository configuration to storage.

        This configuration will be used in future calls to Repository.open.

        Returns
        -------
        None
        """
        return self._repository.save_config()

    async def save_config_async(self) -> None:
        """
        Save the repository configuration to storage (async version).

        Returns
        -------
        None
        """
        return await self._repository.save_config_async()

    @property
    def config(self) -> RepositoryConfig:
        """
        Get a copy of this repository's config.

        Returns
        -------
        RepositoryConfig
            The repository configuration.
        """
        return self._repository.config()

    @property
    def storage(self) -> Storage:
        """
        Get a copy of this repository's Storage instance.

        Returns
        -------
        Storage
            The repository storage instance.
        """
        return self._repository.storage()

    @property
    def authorized_virtual_container_prefixes(self) -> set[str]:
        """
        Get all authorized virtual chunk container prefixes.

        Returns
        -------
        url_prefixes: set[str]
            The set of authorized url prefixes for each virtual chunk container
        """
        return self._repository.authorized_virtual_container_prefixes

    def reopen(
        self,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Reopen the repository with new configuration or credentials.

        Parameters
        ----------
        config : RepositoryConfig, optional
            The new repository configuration. If not provided, uses the existing configuration.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            New virtual chunk access credentials.

        Returns
        -------
        Self
            A new Repository instance with the updated configuration.
        """
        return self.__class__(
            self._repository.reopen(
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    async def reopen_async(
        self,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Reopen the repository with new configuration or credentials (async version).

        Parameters
        ----------
        config : RepositoryConfig, optional
            The new repository configuration. If not provided, uses the existing configuration.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            New virtual chunk access credentials.

        Returns
        -------
        Self
            A new Repository instance with the updated configuration.
        """
        return self.__class__(
            await self._repository.reopen_async(
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    def set_default_commit_metadata(self, metadata: dict[str, Any]) -> None:
        """
        Set the default commit metadata for the repository. This is useful for attaching
        additional static, system-context metadata to all commits.

        When a commit is made, the default metadata is merged with the metadata provided
        in the commit. For duplicate keys, the value provided in the commit takes precedence.

        !!! warning
            This metadata is only applied to sessions that are created after this call. Any open
            writable sessions will not be affected and will not use the new default metadata.

        Parameters
        ----------
        metadata : dict[str, Any]
            The default commit metadata. Pass an empty dict to clear the default metadata.
        """
        return self._repository.set_default_commit_metadata(metadata)

    def default_commit_metadata(self) -> dict[str, Any]:
        """
        Get the currently configured default commit metadata for the repository.

        Returns
        -------
        dict[str, Any]
            The default commit metadata.
        """
        return self._repository.default_commit_metadata()

    def get_metadata(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata.

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return self._repository.get_metadata()

    @property
    def metadata(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata.

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return self._repository.get_metadata()

    async def get_metadata_async(self) -> dict[str, Any]:
        """
        Get the currently configured repository metadata (async version).

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return await self._repository.get_metadata_async()

    def set_metadata(self, metadata: dict[str, Any]) -> None:
        """
        Set the repository metadata. The passed dict replaces the existing metadata completely.

        To update only some metadata values, use Repository.update_metadata.

        Parameters
        ----------
        metadata : dict[str, Any]
            The value to use as repository metadata.
        """
        self._repository.set_metadata(metadata)

    async def set_metadata_async(self, metadata: dict[str, Any]) -> None:
        """
        Set the repository metadata (async version). The passed dict replaces the existing
        metadata completely.

        To update only some metadata values, use Repository.update_metadata_async.

        Parameters
        ----------
        metadata : dict[str, Any]
            The value to use as repository metadata.
        """
        await self._repository.set_metadata_async(metadata)

    def update_metadata(self, metadata: dict[str, Any]) -> dict[str, Any]:
        """
        Update the repository metadata.

        The passed dict will be merged with the current metadata, overriding existing keys.

        Parameters
        ----------
        metadata : dict[str, Any]
            The dict to merge into the repository metadata.
        """
        return self._repository.update_metadata(metadata)

    async def update_metadata_async(self, metadata: dict[str, Any]) -> dict[str, Any]:
        """
        Update the repository metadata (async version).

        The passed dict will be merged with the current metadata, overriding existing keys.

        Parameters
        ----------
        metadata : dict[str, Any]
            The dict to merge into the repository metadata.
        """
        return await self._repository.update_metadata_async(metadata)

    def get_status(self) -> RepoStatus:
        """
        Get the current repository status.

        Returns
        -------
        RepoStatus
            The current status of the repository.
        """
        return self._repository.get_status()

    @property
    def status(self) -> RepoStatus:
        """
        Get the current repository status.

        Returns
        -------
        RepoStatus
            The current status of the repository.
        """
        return self._repository.get_status()

    async def get_status_async(self) -> RepoStatus:
        """
        Get the current repository status (async version).

        Returns
        -------
        RepoStatus
            The current status of the repository.
        """
        return await self._repository.get_status_async()

    def set_status(self, status: RepoStatus) -> None:
        """
        Set the repository status.

        Parameters
        ----------
        status : RepoStatus
            The new status for the repository.
        """
        self._repository.set_status(status)

    async def set_status_async(self, status: RepoStatus) -> None:
        """
        Set the repository status (async version).

        Parameters
        ----------
        status : RepoStatus
            The new status for the repository.
        """
        await self._repository.set_status_async(status)

    def feature_flags(self) -> list[FeatureFlag]:
        """
        Get all feature flags and their current state.

        Returns
        -------
        list[FeatureFlag]
            All feature flags with their id, name, default, setting, and effective state.
        """
        return self._repository.feature_flags()

    async def feature_flags_async(self) -> list[FeatureFlag]:
        """
        Get all feature flags and their current state (async version).

        Returns
        -------
        list[FeatureFlag]
            All feature flags with their id, name, default, setting, and effective state.
        """
        return await self._repository.feature_flags_async()

    def enabled_feature_flags(self) -> list[FeatureFlag]:
        """
        Get feature flags that are currently enabled.

        Returns
        -------
        list[FeatureFlag]
            Feature flags whose effective state is enabled.
        """
        return self._repository.enabled_feature_flags()

    async def enabled_feature_flags_async(self) -> list[FeatureFlag]:
        """
        Get feature flags that are currently enabled (async version).

        Returns
        -------
        list[FeatureFlag]
            Feature flags whose effective state is enabled.
        """
        return await self._repository.enabled_feature_flags_async()

    def disabled_feature_flags(self) -> list[FeatureFlag]:
        """
        Get feature flags that are currently disabled.

        Returns
        -------
        list[FeatureFlag]
            Feature flags whose effective state is disabled.
        """
        return self._repository.disabled_feature_flags()

    async def disabled_feature_flags_async(self) -> list[FeatureFlag]:
        """
        Get feature flags that are currently disabled (async version).

        Returns
        -------
        list[FeatureFlag]
            Feature flags whose effective state is disabled.
        """
        return await self._repository.disabled_feature_flags_async()

    def set_feature_flag(self, name: str, setting: bool | None) -> None:
        """
        Set a feature flag.

        Parameters
        ----------
        name : str
            The name of the feature flag.
        setting : bool | None
            True to enable, False to disable, None to reset to default.
        """
        self._repository.set_feature_flag(name, setting)

    async def set_feature_flag_async(self, name: str, setting: bool | None) -> None:
        """
        Set a feature flag (async version).

        Parameters
        ----------
        name : str
            The name of the feature flag.
        setting : bool | None
            True to enable, False to disable, None to reset to default.
        """
        await self._repository.set_feature_flag_async(name, setting)

    def ancestry(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
    ) -> Iterator[SnapshotInfo]:
        """
        Get the ancestry of a snapshot.

        Parameters
        ----------
        branch : str, optional
            The branch to get the ancestry of.
        tag : str, optional
            The tag to get the ancestry of.
        snapshot_id : str, optional
            The snapshot ID to get the ancestry of.

        Returns
        -------
        Iterator[SnapshotInfo]
            An iterator over the ancestry of the snapshot, yielding the snapshots and their metadata.

        Notes
        -----
        Only one of the arguments can be specified.
        """

        # the returned object is both an Async and Sync iterator
        res = cast(
            Iterator[SnapshotInfo],
            self._repository.async_ancestry(
                branch=branch, tag=tag, snapshot_id=snapshot_id
            ),
        )
        return res

    def async_ancestry(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
    ) -> AsyncIterator[SnapshotInfo]:
        """
        Get the ancestry of a snapshot (async version).

        Parameters
        ----------
        branch : str, optional
            The branch to get the ancestry of.
        tag : str, optional
            The tag to get the ancestry of.
        snapshot_id : str, optional
            The snapshot ID to get the ancestry of.

        Returns
        -------
        AsyncIterator[SnapshotInfo]
            An async iterator over the ancestry of the snapshot, yielding the snapshots and their metadata.

        Notes
        -----
        Only one of the arguments can be specified.
        """
        return self._repository.async_ancestry(
            branch=branch, tag=tag, snapshot_id=snapshot_id
        )

    def ancestry_graph(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
        plain: bool = False,
    ) -> AncestryGraph:
        """
        Build a visual representation of the commit history.

        When called with no arguments, shows all branches as a tree.
        When called with one of branch/tag/snapshot_id, shows that ref's linear history.

        Parameters
        ----------
        branch : str, optional
            Show history for this branch.
        tag : str, optional
            Show history from this tag.
        snapshot_id : str, optional
            Show history from this snapshot.
        plain : bool, optional
            If True, render without colors (no ANSI codes in text, no fill colors
            in SVG). Useful for CI logs, piping to files, or LLM agents.

        Returns
        -------
        AncestryGraph
            A displayable object. Use print() for colored terminal output,
            or display in Jupyter for an SVG diagram.
        """
        return self._repository.ancestry_graph(
            branch=branch, tag=tag, snapshot_id=snapshot_id, plain=plain
        )

    async def ancestry_graph_async(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
        plain: bool = False,
    ) -> AncestryGraph:
        """
        Async version of :meth:`ancestry_graph`.
        """
        return await self._repository.ancestry_graph_async(
            branch=branch, tag=tag, snapshot_id=snapshot_id, plain=plain
        )

    def ops_log(self) -> Iterator[Update]:
        """
        Get a summary of changes to the repository.
        """

        # the returned object is both an Async and Sync iterator
        res = cast(
            Iterator[Update],
            self._repository.async_ops_log(),
        )
        return res

    def ops_log_async(self) -> AsyncIterator[Update]:
        """
        Get a summary of changes to the repository (async version).
        """

        # the returned object is both an Async and Sync iterator
        return self._repository.async_ops_log()

    def create_branch(self, branch: str, snapshot_id: str) -> None:
        """
        Create a new branch at the given snapshot.

        Parameters
        ----------
        branch : str
            The name of the branch to create.
        snapshot_id : str
            The snapshot ID to create the branch at.

        Returns
        -------
        None
        """
        self._repository.create_branch(branch, snapshot_id)

    async def create_branch_async(self, branch: str, snapshot_id: str) -> None:
        """
        Create a new branch at the given snapshot (async version).

        Parameters
        ----------
        branch : str
            The name of the branch to create.
        snapshot_id : str
            The snapshot ID to create the branch at.

        Returns
        -------
        None
        """
        await self._repository.create_branch_async(branch, snapshot_id)

    def list_branches(self) -> set[str]:
        """
        List the branches in the repository.

        Returns
        -------
        set[str]
            A set of branch names.
        """
        return self._repository.list_branches()

    async def list_branches_async(self) -> set[str]:
        """
        List the branches in the repository (async version).

        Returns
        -------
        set[str]
            A set of branch names.
        """
        return await self._repository.list_branches_async()

    def lookup_branch(self, branch: str) -> str:
        """
        Get the tip snapshot ID of a branch.

        Parameters
        ----------
        branch : str
            The branch to get the tip of.

        Returns
        -------
        str
            The snapshot ID of the tip of the branch.
        """
        return self._repository.lookup_branch(branch)

    async def lookup_branch_async(self, branch: str) -> str:
        """
        Get the tip snapshot ID of a branch (async version).

        Parameters
        ----------
        branch : str
            The branch to get the tip of.

        Returns
        -------
        str
            The snapshot ID of the tip of the branch.
        """
        return await self._repository.lookup_branch_async(branch)

    def lookup_snapshot(self, snapshot_id: str) -> SnapshotInfo:
        """
        Get the SnapshotInfo for a given snapshot ID.

        Parameters
        ----------
        snapshot_id : str
            The ID of the snapshot to look up.

        Returns
        -------
        SnapshotInfo
        """
        return self._repository.lookup_snapshot(snapshot_id)

    async def lookup_snapshot_async(self, snapshot_id: str) -> SnapshotInfo:
        """
        Get the SnapshotInfo for a given snapshot ID (async version).

        Parameters
        ----------
        snapshot_id : str
            The ID of the snapshot to look up.

        Returns
        -------
        SnapshotInfo
        """
        return await self._repository.lookup_snapshot_async(snapshot_id)

    def list_manifest_files(self, snapshot_id: str) -> list[ManifestFileInfo]:
        """
        Get the manifest files used by the given snapshot ID.

        Parameters
        ----------
        snapshot_id : str
            The ID of the snapshot to get information for.

        Returns
        -------
        list[ManifestFileInfo]
        """
        return self._repository.list_manifest_files(snapshot_id)

    async def list_manifest_files_async(self, snapshot_id: str) -> list[ManifestFileInfo]:
        """
        Get the manifest files used by the given snapshot ID (async version).

        Parameters
        ----------
        snapshot_id : str
            The ID of the snapshot to get information for.

        Returns
        -------
        list[ManifestFileInfo]
        """
        return await self._repository.list_manifest_files_async(snapshot_id)

    def reset_branch(
        self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
    ) -> None:
        """
        Reset a branch to a specific snapshot.

        This will permanently alter the history of the branch such that the tip of
        the branch is the specified snapshot.

        Parameters
        ----------
        branch : str
            The branch to reset.
        snapshot_id : str
            The snapshot ID to reset the branch to.
        from_snapshot_id : str | None
            If passed, the reset will only be executed if the branch currently
            points to from_snapshot_id.

        Returns
        -------
        None
        """
        self._repository.reset_branch(branch, snapshot_id, from_snapshot_id)

    async def reset_branch_async(
        self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
    ) -> None:
        """
        Reset a branch to a specific snapshot (async version).

        This will permanently alter the history of the branch such that the tip of
        the branch is the specified snapshot.

        Parameters
        ----------
        branch : str
            The branch to reset.
        snapshot_id : str
            The snapshot ID to reset the branch to.
        from_snapshot_id : str | None
            If passed, the reset will only be executed if the branch currently
            points to from_snapshot_id.

        Returns
        -------
        None
        """
        await self._repository.reset_branch_async(branch, snapshot_id, from_snapshot_id)

    def delete_branch(self, branch: str) -> None:
        """
        Delete a branch.

        Parameters
        ----------
        branch : str
            The branch to delete.

        Returns
        -------
        None
        """
        self._repository.delete_branch(branch)

    async def delete_branch_async(self, branch: str) -> None:
        """
        Delete a branch (async version).

        Parameters
        ----------
        branch : str
            The branch to delete.

        Returns
        -------
        None
        """
        await self._repository.delete_branch_async(branch)

    def delete_tag(self, tag: str) -> None:
        """
        Delete a tag.

        Parameters
        ----------
        tag : str
            The tag to delete.

        Returns
        -------
        None
        """
        self._repository.delete_tag(tag)

    async def delete_tag_async(self, tag: str) -> None:
        """
        Delete a tag (async version).

        Parameters
        ----------
        tag : str
            The tag to delete.

        Returns
        -------
        None
        """
        await self._repository.delete_tag_async(tag)

    def create_tag(self, tag: str, snapshot_id: str) -> None:
        """
        Create a new tag at the given snapshot.

        Parameters
        ----------
        tag : str
            The name of the tag to create.
        snapshot_id : str
            The snapshot ID to create the tag at.

        Returns
        -------
        None
        """
        self._repository.create_tag(tag, snapshot_id)

    async def create_tag_async(self, tag: str, snapshot_id: str) -> None:
        """
        Create a new tag at the given snapshot (async version).

        Parameters
        ----------
        tag : str
            The name of the tag to create.
        snapshot_id : str
            The snapshot ID to create the tag at.

        Returns
        -------
        None
        """
        await self._repository.create_tag_async(tag, snapshot_id)

    def list_tags(self) -> set[str]:
        """
        List the tags in the repository.

        Returns
        -------
        set[str]
            A set of tag names.
        """
        return self._repository.list_tags()

    async def list_tags_async(self) -> set[str]:
        """
        List the tags in the repository (async version).

        Returns
        -------
        set[str]
            A set of tag names.
        """
        return await self._repository.list_tags_async()

    def lookup_tag(self, tag: str) -> str:
        """
        Get the snapshot ID of a tag.

        Parameters
        ----------
        tag : str
            The tag to get the snapshot ID of.

        Returns
        -------
        str
            The snapshot ID of the tag.
        """
        return self._repository.lookup_tag(tag)

    async def lookup_tag_async(self, tag: str) -> str:
        """
        Get the snapshot ID of a tag (async version).

        Parameters
        ----------
        tag : str
            The tag to get the snapshot ID of.

        Returns
        -------
        str
            The snapshot ID of the tag.
        """
        return await self._repository.lookup_tag_async(tag)

    def diff(
        self,
        *,
        from_branch: str | None = None,
        from_tag: str | None = None,
        from_snapshot_id: str | None = None,
        to_branch: str | None = None,
        to_tag: str | None = None,
        to_snapshot_id: str | None = None,
    ) -> Diff:
        """
        Compute an overview of the operations executed from version `from` to version `to`.

        Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
        The styles used to identify the `from` and `to` versions can be different.

        The `from` version must be a member of the `ancestry` of `to`.

        Returns
        -------
        Diff
            The operations executed between the two versions
        """
        return self._repository.diff(
            from_branch=from_branch,
            from_tag=from_tag,
            from_snapshot_id=from_snapshot_id,
            to_branch=to_branch,
            to_tag=to_tag,
            to_snapshot_id=to_snapshot_id,
        )

    async def diff_async(
        self,
        *,
        from_branch: str | None = None,
        from_tag: str | None = None,
        from_snapshot_id: str | None = None,
        to_branch: str | None = None,
        to_tag: str | None = None,
        to_snapshot_id: str | None = None,
    ) -> Diff:
        """
        Compute an overview of the operations executed from version `from` to version `to` (async version).

        Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
        The styles used to identify the `from` and `to` versions can be different.

        The `from` version must be a member of the `ancestry` of `to`.

        Returns
        -------
        Diff
            The operations executed between the two versions
        """
        return await self._repository.diff_async(
            from_branch=from_branch,
            from_tag=from_tag,
            from_snapshot_id=from_snapshot_id,
            to_branch=to_branch,
            to_tag=to_tag,
            to_snapshot_id=to_snapshot_id,
        )

    def readonly_session(
        self,
        branch: str | None = None,
        *,
        tag: str | None = None,
        snapshot_id: str | None = None,
        as_of: datetime.datetime | None = None,
    ) -> Session:
        """
        Create a read-only session.

        This can be thought of as a read-only checkout of the repository at a given snapshot.
        When branch or tag are provided, the session will be based on the tip of the branch or
        the snapshot ID of the tag.

        Parameters
        ----------
        branch : str, optional
            If provided, the branch to create the session on.
        tag : str, optional
            If provided, the tag to create the session on.
        snapshot_id : str, optional
            If provided, the snapshot ID to create the session on.
        as_of : datetime.datetime, optional
            When combined with the branch argument, it will open the session at the last
            snapshot that is at or before this datetime.

        Returns
        -------
        Session
            The read-only session, pointing to the specified snapshot, tag, or branch.

        Notes
        -----
        Only one of `branch`, `tag`, or `snapshot_id` can be specified; `as_of` is used together with `branch`.
        """
        return Session(
            self._repository.readonly_session(
                branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
            )
        )

    async def readonly_session_async(
        self,
        branch: str | None = None,
        *,
        tag: str | None = None,
        snapshot_id: str | None = None,
        as_of: datetime.datetime | None = None,
    ) -> Session:
        """
        Create a read-only session (async version).

        This can be thought of as a read-only checkout of the repository at a given snapshot.
        When branch or tag are provided, the session will be based on the tip of the branch or
        the snapshot ID of the tag.

        Parameters
        ----------
        branch : str, optional
            If provided, the branch to create the session on.
        tag : str, optional
            If provided, the tag to create the session on.
        snapshot_id : str, optional
            If provided, the snapshot ID to create the session on.
        as_of : datetime.datetime, optional
            When combined with the branch argument, it will open the session at the last
            snapshot that is at or before this datetime.

        Returns
        -------
        Session
            The read-only session, pointing to the specified snapshot, tag, or branch.

        Notes
        -----
        Only one of `branch`, `tag`, or `snapshot_id` can be specified; `as_of` is used together with `branch`.
        """
        return Session(
            await self._repository.readonly_session_async(
                branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
            )
        )

    def writable_session(self, branch: str) -> Session:
        """
        Create a writable session on a branch.

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(self._repository.writable_session(branch))

    async def writable_session_async(self, branch: str) -> Session:
        """
        Create a writable session on a branch (async version).

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(await self._repository.writable_session_async(branch))

    def rearrange_session(self, branch: str) -> Session:
        """
        Create a session to move/rename nodes in the Zarr hierarchy.

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        This session only allows changes through `Session.move`. If you want to modify data, and
        not only move nodes, use `Repository.writable_session` instead.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(self._repository.rearrange_session(branch))

    async def rearrange_session_async(self, branch: str) -> Session:
        """
        Create a session to move/rename nodes in the Zarr hierarchy (async version).

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        This session only allows changes through `Session.move`. If you want to modify data, and
        not only move nodes, use `Repository.writable_session_async` instead.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(await self._repository.rearrange_session_async(branch))

    @contextmanager
    def transaction(
        self,
        branch: str,
        *,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
    ) -> Iterator[IcechunkStore]:
        """
        Create a transaction on a branch.

        This is a context manager that creates a writable session on the specified branch.
        When the context is exited, the session will be committed to the branch
        using the specified message.

        Parameters
        ----------
        branch : str
            The branch to create the transaction on.
        message : str
            The commit message to use when committing the session.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        rebase_with : ConflictSolver | None, optional
            If another session committed while the current session was writing, use Session.rebase with this solver.
        rebase_tries : int, optional
            If another session committed while the current session was writing, use Session.rebase up to this many times in a loop.

        Yields
        -------
        store : IcechunkStore
            A Zarr Store which can be used to interact with the data in the repository.
        """
        session = self.writable_session(branch)
        yield session.store
        session.commit(
            message=message,
            metadata=metadata,
            rebase_with=rebase_with,
            rebase_tries=rebase_tries,
        )

    def expire_snapshots(
        self,
        older_than: datetime.datetime,
        *,
        delete_expired_branches: bool = False,
        delete_expired_tags: bool = False,
    ) -> set[str]:
        """Expire all snapshots older than a threshold.

        This processes snapshots found by navigating all references in
        the repo, tags first, branches later, both in lexicographical order.

        Returns the IDs of all snapshots considered expired and skipped
        from history. Note that these snapshots are not necessarily
        available for garbage collection; they could still be pointed to by
        other refs.

        If `delete_expired_*` is set to True, branches or tags that, after the
        expiration process, point to expired snapshots directly, will be
        deleted.

        Danger
        ------
        This is an administrative operation, it should be run
        carefully. The repository can still operate concurrently while
        `expire_snapshots` runs, but other readers can get inconsistent
        views of the repository history.

        Parameters
        ----------
        older_than: datetime.datetime
            Expire snapshots older than this time.
        delete_expired_branches: bool, optional
            Whether to delete any branches that now have only expired snapshots.
        delete_expired_tags: bool, optional
            Whether to delete any tags associated with expired snapshots

        Returns
        -------
        set[str]
            The IDs of the expired snapshots.
        """
        return self._repository.expire_snapshots(
            older_than,
            delete_expired_branches=delete_expired_branches,
            delete_expired_tags=delete_expired_tags,
        )

    async def expire_snapshots_async(
        self,
        older_than: datetime.datetime,
        *,
        delete_expired_branches: bool = False,
        delete_expired_tags: bool = False,
    ) -> set[str]:
        """Expire all snapshots older than a threshold (async version).

        This processes snapshots found by navigating all references in
        the repo, tags first, branches later, both in lexicographical order.

        Returns the IDs of all snapshots considered expired and skipped
        from history. Note that these snapshots are not necessarily
        available for garbage collection; they could still be pointed to by
        other refs.

        If `delete_expired_*` is set to True, branches or tags that, after the
        expiration process, point to expired snapshots directly, will be
        deleted.

        Danger
        ------
        This is an administrative operation, it should be run
        carefully. The repository can still operate concurrently while
        `expire_snapshots` runs, but other readers can get inconsistent
        views of the repository history.

        Parameters
        ----------
        older_than: datetime.datetime
            Expire snapshots older than this time.
        delete_expired_branches: bool, optional
            Whether to delete any branches that now have only expired snapshots.
        delete_expired_tags: bool, optional
            Whether to delete any tags associated with expired snapshots

        Returns
        -------
        set[str]
            The IDs of the expired snapshots.
        """
        return await self._repository.expire_snapshots_async(
            older_than,
            delete_expired_branches=delete_expired_branches,
            delete_expired_tags=delete_expired_tags,
        )

    def rewrite_manifests(
        self,
        message: str,
        *,
        branch: str,
        metadata: dict[str, Any] | None = None,
        commit_method: CommitMethod = "new_commit",
    ) -> str:
        """
        Rewrite manifests for all arrays.

        This method will start a new writable session on the specified branch,
        rewrite manifests for all arrays, and then commit with the specified ``message``
        and ``metadata``.

        A JSON representation of the currently active splitting configuration will be
        stored in the commit's metadata under the key `"splitting_config"`.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        branch: str
            The branch to commit to.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        commit_method : CommitMethod, optional
            The commit method to use. Defaults to ``"new_commit"``.
            Use ``"amend"`` to replace the previous commit.
            Note that ``"amend"`` is only supported for spec version 2
            repositories.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        """
        return self._repository.rewrite_manifests(
            message, branch=branch, metadata=metadata, commit_method=commit_method
        )

    async def rewrite_manifests_async(
        self,
        message: str,
        *,
        branch: str,
        metadata: dict[str, Any] | None = None,
        commit_method: CommitMethod = "new_commit",
    ) -> str:
        """
        Rewrite manifests for all arrays (async version).

        This method will start a new writable session on the specified branch,
        rewrite manifests for all arrays, and then commit with the specified ``message``
        and ``metadata``.

        A JSON representation of the currently active splitting configuration will be
        stored in the commit's metadata under the key `"splitting_config"`.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        branch: str
            The branch to commit to.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        commit_method : CommitMethod, optional
            The commit method to use. Defaults to ``"new_commit"``.
            Use ``"amend"`` to replace the previous commit.
            Note that ``"amend"`` is only supported for spec version 2
            repositories.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        """
        return await self._repository.rewrite_manifests_async(
            message, branch=branch, metadata=metadata, commit_method=commit_method
        )

    def garbage_collect(
        self,
        delete_object_older_than: datetime.datetime,
        *,
        dry_run: bool = False,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> GCSummary:
        """Delete any objects no longer accessible from any branches or tags.

        Danger
        ------
        This is an administrative operation; it should be run
        carefully. The repository can still operate concurrently while
        `garbage_collect` runs, but other readers can get inconsistent
        views if they are trying to access the expired snapshots.

        Parameters
        ----------
        delete_object_older_than: datetime.datetime
            Delete objects older than this time.
        dry_run : bool
            Report results but don't delete any objects.
        max_snapshots_in_memory : int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.

        Returns
        -------
        GCSummary
            Summary of objects deleted.
        """

        return self._repository.garbage_collect(
            delete_object_older_than,
            dry_run=dry_run,
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    async def garbage_collect_async(
        self,
        delete_object_older_than: datetime.datetime,
        *,
        dry_run: bool = False,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> GCSummary:
        """Delete any objects no longer accessible from any branches or tags (async version).

        Danger
        ------
        This is an administrative operation; it should be run
        carefully. The repository can still operate concurrently while
        `garbage_collect_async` runs, but other readers can get inconsistent
        views if they are trying to access the expired snapshots.

        Parameters
        ----------
        delete_object_older_than: datetime.datetime
            Delete objects older than this time.
        dry_run : bool
            Report results but don't delete any objects.
        max_snapshots_in_memory : int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.

        Returns
        -------
        GCSummary
            Summary of objects deleted.
        """

        return await self._repository.garbage_collect_async(
            delete_object_older_than,
            dry_run=dry_run,
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    def chunk_storage_stats(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> ChunkStorageStats:
        """Calculate the total storage used for chunks, in bytes.

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result is a dataclass with attributes for storage consumed by different
        types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """
        return self._repository.chunk_storage_stats(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    async def chunk_storage_stats_async(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> ChunkStorageStats:
        """Calculate the total storage used for chunks, in bytes (async version).

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect_async`.

        The result is a dataclass with attributes for storage consumed by different
        types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """
        return await self._repository.chunk_storage_stats_async(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    def total_chunks_storage(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> int:
        """Calculate the total storage used for chunks, in bytes.

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result counts only native chunks; virtual and inline chunks are not included.

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """

        warnings.warn(
            "The ``total_chunks_storage`` method has been deprecated in favour of the ``chunk_storage_stats`` method. "
            "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
            "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
            "please replace your existing ``.total_chunks_storage(**kwargs)`` method call with ``.chunk_storage_stats(**same_kwargs).native_bytes``.",
            DeprecationWarning,
            stacklevel=2,
        )

        stats = self._repository.chunk_storage_stats(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )
        return stats.native_bytes

    async def total_chunks_storage_async(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> int:
        """Calculate the total storage used for chunks, in bytes (async version).

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect_async`.

        The result counts only native chunks; virtual and inline chunks are not included.

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """

        warnings.warn(
            "The ``total_chunks_storage_async`` method has been deprecated in favour of the ``chunk_storage_stats_async`` method. "
            "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
            "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
            "please replace your existing ``.total_chunks_storage_async(**kwargs)`` method call with ``.chunk_storage_stats_async(**same_kwargs).native_bytes``.",
            DeprecationWarning,
            stacklevel=2,
        )

        stats = await self._repository.chunk_storage_stats_async(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )
        return stats.native_bytes

    def inspect_snapshot(self, snapshot_id: str) -> dict[str, Any]:
        """
        Return the node tree stored in a snapshot.

        The result contains every node's path, node ID, type (array or group),
        and manifest references. Useful for verifying node identity across
        commits or inspecting what a snapshot contains.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        snapshot_id : str
            The snapshot to inspect.

        Returns
        -------
        dict[str, Any]
            Keys: ``id``, ``flushed_at``, ``commit_message``, ``metadata``,
            ``manifests``, ``nodes``.
        """
        result: dict[str, Any] = json.loads(
            self._repository.inspect_snapshot(snapshot_id, pretty=False)
        )
        return result

    async def inspect_snapshot_async(self, snapshot_id: str) -> dict[str, Any]:
        """
        Return the node tree stored in a snapshot (async version).

        The result contains every node's path, node ID, type (array or group),
        and manifest references. Useful for verifying node identity across
        commits or inspecting what a snapshot contains.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        snapshot_id : str
            The snapshot to inspect.

        Returns
        -------
        dict[str, Any]
            Keys: ``id``, ``flushed_at``, ``commit_message``, ``metadata``,
            ``manifests``, ``nodes``.
        """
        result: dict[str, Any] = json.loads(
            await self._repository.inspect_snapshot_async(snapshot_id, pretty=False)
        )
        return result

    def inspect_repo_info(self) -> dict[str, Any]:
        """
        Return the top-level repository metadata.

        Includes the branch-to-snapshot mapping, tags, snapshot ancestry,
        and the recent update log.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Returns
        -------
        dict[str, Any]
            Keys: ``spec_version``, ``branches``, ``tags``, ``deleted_tags``,
            ``snapshots``, ``metadata``, ``latest_updates``.
        """
        result: dict[str, Any] = json.loads(
            self._repository.inspect_repo_info(pretty=False)
        )
        return result

    async def inspect_repo_info_async(self) -> dict[str, Any]:
        """
        Return the top-level repository metadata (async version).

        Includes the branch-to-snapshot mapping, tags, snapshot ancestry,
        and the recent update log.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Returns
        -------
        dict[str, Any]
            Keys: ``spec_version``, ``branches``, ``tags``, ``deleted_tags``,
            ``snapshots``, ``metadata``, ``latest_updates``.
        """
        result: dict[str, Any] = json.loads(
            await self._repository.inspect_repo_info_async(pretty=False)
        )
        return result

    def inspect_manifest(self, manifest_id: str) -> dict[str, Any]:
        """
        Return chunk storage statistics for a manifest.

        Shows per-array chunk counts broken down by storage type
        (inline, native, virtual) and compression details.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        manifest_id : str
            The manifest to inspect. Manifest IDs can be found in the
            ``manifest_refs`` of array nodes returned by
            :meth:`inspect_snapshot`.

        Returns
        -------
        dict[str, Any]
            Keys: ``id``, ``size_bytes``, ``num_arrays``,
            ``total_chunk_refs``, ``total_inline``, ``total_native``,
            ``total_virtual``, ``arrays``, ``compression``.
        """
        result: dict[str, Any] = json.loads(
            self._repository.inspect_manifest(manifest_id, pretty=False)
        )
        return result

    async def inspect_manifest_async(self, manifest_id: str) -> dict[str, Any]:
        """
        Return chunk storage statistics for a manifest (async version).

        Shows per-array chunk counts broken down by storage type
        (inline, native, virtual) and compression details.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        manifest_id : str
            The manifest to inspect. Manifest IDs can be found in the
            ``manifest_refs`` of array nodes returned by
            :meth:`inspect_snapshot_async`.

        Returns
        -------
        dict[str, Any]
            Keys: ``id``, ``size_bytes``, ``num_arrays``,
            ``total_chunk_refs``, ``total_inline``, ``total_native``,
            ``total_virtual``, ``arrays``, ``compression``.
        """
        result: dict[str, Any] = json.loads(
            await self._repository.inspect_manifest_async(manifest_id, pretty=False)
        )
        return result

    def inspect_transaction_log(self, snapshot_id: str) -> dict[str, Any]:
        """
        Return the record of what changed in a single commit.

        Lists the node IDs of every created, deleted, and updated node,
        the chunk coordinates that were written, and any move operations.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        snapshot_id : str
            The snapshot whose transaction log to inspect.

        Returns
        -------
        dict[str, Any]
            Keys: ``new_groups``, ``new_arrays``, ``deleted_groups``,
            ``deleted_arrays``, ``updated_groups``, ``updated_arrays``,
            ``updated_chunks``, ``moved_nodes``.
        """
        result: dict[str, Any] = json.loads(
            self._repository.inspect_transaction_log(snapshot_id, pretty=False)
        )
        return result

    async def inspect_transaction_log_async(self, snapshot_id: str) -> dict[str, Any]:
        """
        Return the record of what changed in a single commit (async version).

        Lists the node IDs of every created, deleted, and updated node,
        the chunk coordinates that were written, and any move operations.

        This is a testing/debugging utility. The return type and structure
        may change in future versions.

        Parameters
        ----------
        snapshot_id : str
            The snapshot whose transaction log to inspect.

        Returns
        -------
        dict[str, Any]
            Keys: ``new_groups``, ``new_arrays``, ``deleted_groups``,
            ``deleted_arrays``, ``updated_groups``, ``updated_arrays``,
            ``updated_chunks``, ``moved_nodes``.
        """
        raw = await self._repository.inspect_transaction_log_async(
            snapshot_id, pretty=False
        )
        result: dict[str, Any] = json.loads(raw)
        return result

    @property
    def spec_version(self) -> SpecVersion:
        return self._repository.spec_version
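The dictionary returned by `inspect_transaction_log` can be reduced to a per-key count for a quick look at what a commit touched. The sketch below is illustrative only: `summarize_transaction_log` and `TRANSACTION_LOG_KEYS` are hypothetical names, the sample log is hand-written, and it assumes each value is a sized container (list or dict), which the docstring does not guarantee.

```python
from typing import Any

# Keys documented for the transaction-log dictionary.
TRANSACTION_LOG_KEYS = (
    "new_groups", "new_arrays", "deleted_groups", "deleted_arrays",
    "updated_groups", "updated_arrays", "updated_chunks", "moved_nodes",
)


def summarize_transaction_log(log: dict[str, Any]) -> dict[str, int]:
    """Count the entries under each documented key; missing keys count as 0."""
    summary: dict[str, int] = {}
    for key in TRANSACTION_LOG_KEYS:
        # Assumption: every value is a sized container (list or dict).
        value = log.get(key) or []
        summary[key] = len(value)
    return summary


# Hand-written sample; a real log would come from
# repo.inspect_transaction_log(snapshot_id).
sample = {
    "new_arrays": ["node-a", "node-b"],
    "updated_chunks": {"node-a": [[0, 0]]},
}
summary = summarize_transaction_log(sample)
```

Because the method is described as a debugging utility whose structure may change, a helper like this is best kept out of production code paths.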

authorized_virtual_container_prefixes property #

authorized_virtual_container_prefixes

Get all authorized virtual chunk container prefixes.

Returns:

Name Type Description
url_prefixes set[str]

The set of authorized url prefixes for each virtual chunk container

config property #

config

Get a copy of this repository's config.

Returns:

Type Description
RepositoryConfig

The repository configuration.

metadata property #

metadata

Get the current configured repository metadata.

Returns:

Type Description
dict[str, Any]

The repository level metadata.

status property #

status

Get the current repository status.

Returns:

Type Description
RepoStatus

The current status of the repository.

storage property #

storage

Get a copy of this repository's Storage instance.

Returns:

Type Description
Storage

The repository storage instance.

ancestry #

ancestry(*, branch=None, tag=None, snapshot_id=None)

Get the ancestry of a snapshot.

Parameters:

Name Type Description Default
branch str

The branch to get the ancestry of.

None
tag str

The tag to get the ancestry of.

None
snapshot_id str

The snapshot ID to get the ancestry of.

None

Returns:

Type Description
Iterator[SnapshotInfo]

The ancestry of the snapshot, listing out the snapshots and their metadata.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py
def ancestry(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
) -> Iterator[SnapshotInfo]:
    """
    Get the ancestry of a snapshot.

    Parameters
    ----------
    branch : str, optional
        The branch to get the ancestry of.
    tag : str, optional
        The tag to get the ancestry of.
    snapshot_id : str, optional
        The snapshot ID to get the ancestry of.

    Returns
    -------
    Iterator[SnapshotInfo]
        The ancestry of the snapshot, listing out the snapshots and their metadata.

    Notes
    -----
    Only one of the arguments can be specified.
    """

    # the returned object is both an Async and Sync iterator
    res = cast(
        Iterator[SnapshotInfo],
        self._repository.async_ancestry(
            branch=branch, tag=tag, snapshot_id=snapshot_id
        ),
    )
    return res

ancestry_graph #

ancestry_graph(
    *, branch=None, tag=None, snapshot_id=None, plain=False
)

Build a visual representation of the commit history.

When called with no arguments, shows all branches as a tree. When called with one of branch/tag/snapshot_id, shows that ref's linear history.

Parameters:

Name Type Description Default
branch str

Show history for this branch.

None
tag str

Show history from this tag.

None
snapshot_id str

Show history from this snapshot.

None
plain bool

If True, render without colors (no ANSI codes in text, no fill colors in SVG). Useful for CI logs, piping to files, or LLM agents.

False

Returns:

Type Description
AncestryGraph

A displayable object. Use print() for colored terminal output, or display in Jupyter for an SVG diagram.

Source code in icechunk-python/python/icechunk/repository.py
def ancestry_graph(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
    plain: bool = False,
) -> AncestryGraph:
    """
    Build a visual representation of the commit history.

    When called with no arguments, shows all branches as a tree.
    When called with one of branch/tag/snapshot_id, shows that ref's linear history.

    Parameters
    ----------
    branch : str, optional
        Show history for this branch.
    tag : str, optional
        Show history from this tag.
    snapshot_id : str, optional
        Show history from this snapshot.
    plain : bool, optional
        If True, render without colors (no ANSI codes in text, no fill colors
        in SVG). Useful for CI logs, piping to files, or LLM agents.

    Returns
    -------
    AncestryGraph
        A displayable object. Use print() for colored terminal output,
        or display in Jupyter for an SVG diagram.
    """
    return self._repository.ancestry_graph(
        branch=branch, tag=tag, snapshot_id=snapshot_id, plain=plain
    )
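For logs or files, the `plain=True` option described above avoids ANSI color codes. The sketch below relies only on the fact that `print()` uses the object's string conversion; `history_text` and `_FakeRepo` are hypothetical names, and the fake graph text is invented for illustration.

```python
def history_text(repo, branch: str) -> str:
    """Render the history of `branch` as uncolored text."""
    graph = repo.ancestry_graph(branch=branch, plain=True)
    # print(graph) would show the same text; str() lets us store it instead.
    return str(graph)


class _FakeGraph:
    """Hypothetical stand-in for AncestryGraph."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __str__(self) -> str:
        return self._text


class _FakeRepo:
    """Hypothetical stand-in for Repository."""

    def ancestry_graph(self, *, branch=None, tag=None, snapshot_id=None, plain=False):
        return _FakeGraph(f"* history of {branch} (plain={plain})")


text = history_text(_FakeRepo(), "main")
```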

ancestry_graph_async async #

ancestry_graph_async(
    *, branch=None, tag=None, snapshot_id=None, plain=False
)

Async version of ancestry_graph.

Source code in icechunk-python/python/icechunk/repository.py
async def ancestry_graph_async(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
    plain: bool = False,
) -> AncestryGraph:
    """
    Async version of :meth:`ancestry_graph`.
    """
    return await self._repository.ancestry_graph_async(
        branch=branch, tag=tag, snapshot_id=snapshot_id, plain=plain
    )

async_ancestry #

async_ancestry(*, branch=None, tag=None, snapshot_id=None)

Get the ancestry of a snapshot.

Parameters:

Name Type Description Default
branch str

The branch to get the ancestry of.

None
tag str

The tag to get the ancestry of.

None
snapshot_id str

The snapshot ID to get the ancestry of.

None

Returns:

Type Description
AsyncIterator[SnapshotInfo]

The ancestry of the snapshot, listing out the snapshots and their metadata.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py
def async_ancestry(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
) -> AsyncIterator[SnapshotInfo]:
    """
    Get the ancestry of a snapshot.

    Parameters
    ----------
    branch : str, optional
        The branch to get the ancestry of.
    tag : str, optional
        The tag to get the ancestry of.
    snapshot_id : str, optional
        The snapshot ID to get the ancestry of.

    Returns
    -------
    AsyncIterator[SnapshotInfo]
        The ancestry of the snapshot, listing out the snapshots and their metadata.

    Notes
    -----
    Only one of the arguments can be specified.
    """
    return self._repository.async_ancestry(
        branch=branch, tag=tag, snapshot_id=snapshot_id
    )
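The value returned by `async_ancestry` is annotated as an `AsyncIterator`, so it is consumed with `async for` rather than awaited directly. A small sketch, in which `count_snapshots` and `_FakeRepo` are hypothetical names and the fake repository yields four invented snapshots:

```python
import asyncio
from types import SimpleNamespace


async def count_snapshots(repo, branch: str) -> int:
    """Count the snapshots in the ancestry of `branch`."""
    count = 0
    # No `await` on the call itself: the method returns an async iterator.
    async for _snapshot in repo.async_ancestry(branch=branch):
        count += 1
    return count


class _FakeRepo:
    """Hypothetical stand-in for Repository."""

    def async_ancestry(self, *, branch=None, tag=None, snapshot_id=None):
        async def _gen():
            for i in range(4):
                yield SimpleNamespace(id=f"snap-{i}")

        return _gen()


total = asyncio.run(count_snapshots(_FakeRepo(), "main"))
```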

chunk_storage_stats #

chunk_storage_stats(
    *,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Calculate the total storage used for chunks, in bytes.

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

The result is a dataclass with attributes for storage consumed by different types of chunks (e.g. native_bytes, virtual_bytes, total_bytes).

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
def chunk_storage_stats(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> ChunkStorageStats:
    """Calculate the total storage used for chunks, in bytes.

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result is a dataclass with attributes for storage consumed by different
    types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

    Parameters
    ----------
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """
    return self._repository.chunk_storage_stats(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
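The attributes named in the description (`native_bytes`, `virtual_bytes`, `total_bytes`) can be turned into a short human-readable report. This is a sketch only: `format_storage_report` is a hypothetical helper, and the stats object here is a hand-built stand-in rather than a real `ChunkStorageStats`.

```python
from types import SimpleNamespace

MIB = 1024 * 1024


def format_storage_report(stats) -> str:
    """Format chunk storage figures in MiB."""
    return (
        f"native={stats.native_bytes / MIB:.1f} MiB, "
        f"virtual={stats.virtual_bytes / MIB:.1f} MiB, "
        f"total={stats.total_bytes / MIB:.1f} MiB"
    )


# Hand-built stand-in; a real value would come from repo.chunk_storage_stats().
fake_stats = SimpleNamespace(
    native_bytes=3 * MIB, virtual_bytes=1 * MIB, total_bytes=4 * MIB
)
report = format_storage_report(fake_stats)
```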

chunk_storage_stats_async async #

chunk_storage_stats_async(
    *,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Calculate the total storage used for chunks, in bytes (async version).

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

The result is a dataclass with attributes for storage consumed by different types of chunks (e.g. native_bytes, virtual_bytes, total_bytes).

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
async def chunk_storage_stats_async(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> ChunkStorageStats:
    """Calculate the total storage used for chunks, in bytes (async version).

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result is a dataclass with attributes for storage consumed by different
    types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

    Parameters
    ----------
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """
    return await self._repository.chunk_storage_stats_async(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )

create classmethod #

create(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
    spec_version=None,
    check_clean_root=True,
)

Create a new Icechunk repository. If one already exists at the given store location, an error will be raised.

Warning

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository configuration. If not provided, a default configuration will be used.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If credential is None, they will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None
spec_version SpecVersion | int

Use this version of the spec for the new repository. If not passed, the latest spec version available when this library version was released is used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
def create(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    spec_version: SpecVersion | int | None = None,
    check_clean_root: bool = True,
) -> Self:
    """
    Create a new Icechunk repository.
    If one already exists at the given store location, an error will be raised.

    !!! warning
        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository configuration. If not provided, a default configuration will be used.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If credential is `None`, they will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    spec_version : SpecVersion | int, optional
        Use this version of the spec for the new repository. If not passed, the latest
        spec version available when this library version was released is used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        PyRepository.create(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            spec_version=spec_version,
            check_clean_root=check_clean_root,
        )
    )
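Because `create` raises when a repository already exists, a common pattern is to check with `exists` first. The sketch below assumes an `open` classmethod that takes the same storage argument, which is not shown in this section; `create_or_open` and `_FakeRepository` are hypothetical names. As the warning above implies, this check-then-create sequence is not safe when several processes run it concurrently.

```python
def create_or_open(repo_cls, storage):
    """Open the repository at `storage` if present, otherwise create it."""
    if repo_cls.exists(storage):
        # Assumption: `open` accepts the storage just like `create` does.
        return repo_cls.open(storage)
    return repo_cls.create(storage)


class _FakeRepository:
    """Hypothetical in-memory stand-in for Repository."""

    _locations: set = set()

    def __init__(self, how: str) -> None:
        self.how = how

    @staticmethod
    def exists(storage) -> bool:
        return storage in _FakeRepository._locations

    @classmethod
    def create(cls, storage):
        cls._locations.add(storage)
        return cls("created")

    @classmethod
    def open(cls, storage):
        return cls("opened")


first = create_or_open(_FakeRepository, "demo-location")
second = create_or_open(_FakeRepository, "demo-location")
```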

create_async async classmethod #

create_async(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
    spec_version=None,
    check_clean_root=True,
)

Create a new Icechunk repository asynchronously. If one already exists at the given store location, an error will be raised.

Warning

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository configuration. If not provided, a default configuration will be used.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If credential is None, they will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None
spec_version SpecVersion | int

Use this version of the spec for the new repository. If not passed, the latest spec version available when this library version was released is used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
async def create_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    spec_version: SpecVersion | int | None = None,
    check_clean_root: bool = True,
) -> Self:
    """
    Create a new Icechunk repository asynchronously.
    If one already exists at the given store location, an error will be raised.

    !!! warning
        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository configuration. If not provided, a default configuration will be used.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If credential is `None`, they will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    spec_version : SpecVersion | int, optional
        Use this version of the spec for the new repository. If not passed, the latest version
        spec available when this library version was released is used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        await PyRepository.create_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            spec_version=spec_version,
            check_clean_root=check_clean_root,
        )
    )

create_branch #

create_branch(branch, snapshot_id)

Create a new branch at the given snapshot.

Parameters:

Name Type Description Default
branch str

The name of the branch to create.

required
snapshot_id str

The snapshot ID to create the branch at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def create_branch(self, branch: str, snapshot_id: str) -> None:
    """
    Create a new branch at the given snapshot.

    Parameters
    ----------
    branch : str
        The name of the branch to create.
    snapshot_id : str
        The snapshot ID to create the branch at.

    Returns
    -------
    None
    """
    self._repository.create_branch(branch, snapshot_id)

create_branch_async async #

create_branch_async(branch, snapshot_id)

Create a new branch at the given snapshot (async version).

Parameters:

Name Type Description Default
branch str

The name of the branch to create.

required
snapshot_id str

The snapshot ID to create the branch at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def create_branch_async(self, branch: str, snapshot_id: str) -> None:
    """
    Create a new branch at the given snapshot (async version).

    Parameters
    ----------
    branch : str
        The name of the branch to create.
    snapshot_id : str
        The snapshot ID to create the branch at.

    Returns
    -------
    None
    """
    await self._repository.create_branch_async(branch, snapshot_id)

create_tag #

create_tag(tag, snapshot_id)

Create a new tag at the given snapshot.

Parameters:

Name Type Description Default
tag str

The name of the tag to create.

required
snapshot_id str

The snapshot ID to create the tag at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def create_tag(self, tag: str, snapshot_id: str) -> None:
    """
    Create a new tag at the given snapshot.

    Parameters
    ----------
    tag : str
        The name of the tag to create.
    snapshot_id : str
        The snapshot ID to create the tag at.

    Returns
    -------
    None
    """
    self._repository.create_tag(tag, snapshot_id)
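`create_branch` and `create_tag` take the same `(name, snapshot_id)` pair, so one snapshot can be marked with both. A sketch under stated assumptions: `mark_release`, the `release-`/`v` naming scheme, and `_FakeRepo` are all hypothetical and chosen only for illustration.

```python
def mark_release(repo, snapshot_id: str, version: str) -> tuple[str, str]:
    """Create a maintenance branch and a tag that both point at `snapshot_id`."""
    branch = f"release-{version}"  # hypothetical naming scheme
    tag = f"v{version}"            # hypothetical naming scheme
    repo.create_branch(branch, snapshot_id)
    repo.create_tag(tag, snapshot_id)
    return branch, tag


class _FakeRepo:
    """Hypothetical stand-in that records refs in dictionaries."""

    def __init__(self) -> None:
        self.branches: dict[str, str] = {}
        self.tags: dict[str, str] = {}

    def create_branch(self, branch: str, snapshot_id: str) -> None:
        self.branches[branch] = snapshot_id

    def create_tag(self, tag: str, snapshot_id: str) -> None:
        self.tags[tag] = snapshot_id


fake = _FakeRepo()
refs = mark_release(fake, "SNAPSHOT123", "1.2")
```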

create_tag_async async #

create_tag_async(tag, snapshot_id)

Create a new tag at the given snapshot (async version).

Parameters:

Name Type Description Default
tag str

The name of the tag to create.

required
snapshot_id str

The snapshot ID to create the tag at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def create_tag_async(self, tag: str, snapshot_id: str) -> None:
    """
    Create a new tag at the given snapshot (async version).

    Parameters
    ----------
    tag : str
        The name of the tag to create.
    snapshot_id : str
        The snapshot ID to create the tag at.

    Returns
    -------
    None
    """
    await self._repository.create_tag_async(tag, snapshot_id)

default_commit_metadata #

default_commit_metadata()

Get the current configured default commit metadata for the repository.

Returns:

Type Description
dict[str, Any]

The default commit metadata.

Source code in icechunk-python/python/icechunk/repository.py
def default_commit_metadata(self) -> dict[str, Any]:
    """
    Get the current configured default commit metadata for the repository.

    Returns
    -------
    dict[str, Any]
        The default commit metadata.
    """
    return self._repository.default_commit_metadata()

delete_branch #

delete_branch(branch)

Delete a branch.

Parameters:

Name Type Description Default
branch str

The branch to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def delete_branch(self, branch: str) -> None:
    """
    Delete a branch.

    Parameters
    ----------
    branch : str
        The branch to delete.

    Returns
    -------
    None
    """
    self._repository.delete_branch(branch)

delete_branch_async async #

delete_branch_async(branch)

Delete a branch (async version).

Parameters:

Name Type Description Default
branch str

The branch to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def delete_branch_async(self, branch: str) -> None:
    """
    Delete a branch (async version).

    Parameters
    ----------
    branch : str
        The branch to delete.

    Returns
    -------
    None
    """
    await self._repository.delete_branch_async(branch)

delete_tag #

delete_tag(tag)

Delete a tag.

Parameters:

Name Type Description Default
tag str

The tag to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def delete_tag(self, tag: str) -> None:
    """
    Delete a tag.

    Parameters
    ----------
    tag : str
        The tag to delete.

    Returns
    -------
    None
    """
    self._repository.delete_tag(tag)

delete_tag_async async #

delete_tag_async(tag)

Delete a tag (async version).

Parameters:

Name Type Description Default
tag str

The tag to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def delete_tag_async(self, tag: str) -> None:
    """
    Delete a tag (async version).

    Parameters
    ----------
    tag : str
        The tag to delete.

    Returns
    -------
    None
    """
    await self._repository.delete_tag_async(tag)

diff #

diff(
    *,
    from_branch=None,
    from_tag=None,
    from_snapshot_id=None,
    to_branch=None,
    to_tag=None,
    to_snapshot_id=None,
)

Compute an overview of the operations executed from version from to version to.

Both versions, from and to, must be identified. Identification can be done using a branch, tag or snapshot id. The styles used to identify the from and to versions can be different.

The from version must be a member of the ancestry of to.

Returns:

Type Description
Diff

The operations executed between the two versions

Source code in icechunk-python/python/icechunk/repository.py
def diff(
    self,
    *,
    from_branch: str | None = None,
    from_tag: str | None = None,
    from_snapshot_id: str | None = None,
    to_branch: str | None = None,
    to_tag: str | None = None,
    to_snapshot_id: str | None = None,
) -> Diff:
    """
    Compute an overview of the operations executed from version `from` to version `to`.

    Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
    The styles used to identify the `from` and `to` versions can be different.

    The `from` version must be a member of the `ancestry` of `to`.

    Returns
    -------
    Diff
        The operations executed between the two versions
    """
    return self._repository.diff(
        from_branch=from_branch,
        from_tag=from_tag,
        from_snapshot_id=from_snapshot_id,
        to_branch=to_branch,
        to_tag=to_tag,
        to_snapshot_id=to_snapshot_id,
    )
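Each side of `diff` is identified by a branch, tag, or snapshot ID, and the two sides may use different kinds. One way to keep call sites tidy is to build the keyword arguments from `(kind, value)` pairs. This is a sketch: `diff_kwargs` is a hypothetical helper, not part of the library.

```python
_REF_KINDS = ("branch", "tag", "snapshot_id")


def diff_kwargs(from_ref: tuple[str, str], to_ref: tuple[str, str]) -> dict[str, str]:
    """Build keyword arguments for diff() from (kind, value) pairs."""
    kwargs: dict[str, str] = {}
    for side, (kind, value) in (("from", from_ref), ("to", to_ref)):
        if kind not in _REF_KINDS:
            raise ValueError(f"unknown ref kind {kind!r}; expected one of {_REF_KINDS}")
        # e.g. ("tag", "v1") on the "from" side becomes from_tag="v1".
        kwargs[f"{side}_{kind}"] = value
    return kwargs


kwargs = diff_kwargs(("tag", "v1"), ("branch", "main"))
# Intended use: repo.diff(**kwargs)
```

Building exactly one key per side makes it impossible to pass, say, both `from_branch` and `from_tag` by accident.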

diff_async async #

diff_async(
    *,
    from_branch=None,
    from_tag=None,
    from_snapshot_id=None,
    to_branch=None,
    to_tag=None,
    to_snapshot_id=None,
)

Compute an overview of the operations executed from version from to version to (async version).

Both versions, from and to, must be identified. Identification can be done using a branch, tag or snapshot id. The styles used to identify the from and to versions can be different.

The from version must be a member of the ancestry of to.

Returns:

Type Description
Diff

The operations executed between the two versions

Source code in icechunk-python/python/icechunk/repository.py
async def diff_async(
    self,
    *,
    from_branch: str | None = None,
    from_tag: str | None = None,
    from_snapshot_id: str | None = None,
    to_branch: str | None = None,
    to_tag: str | None = None,
    to_snapshot_id: str | None = None,
) -> Diff:
    """
    Compute an overview of the operations executed from version `from` to version `to` (async version).

    Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
    The styles used to identify the `from` and `to` versions can be different.

    The `from` version must be a member of the `ancestry` of `to`.

    Returns
    -------
    Diff
        The operations executed between the two versions
    """
    return await self._repository.diff_async(
        from_branch=from_branch,
        from_tag=from_tag,
        from_snapshot_id=from_snapshot_id,
        to_branch=to_branch,
        to_tag=to_tag,
        to_snapshot_id=to_snapshot_id,
    )

disabled_feature_flags #

disabled_feature_flags()

Get feature flags that are currently disabled.

Returns:

Type Description
list[FeatureFlag]

Feature flags whose effective state is disabled.

Source code in icechunk-python/python/icechunk/repository.py
def disabled_feature_flags(self) -> list[FeatureFlag]:
    """
    Get feature flags that are currently disabled.

    Returns
    -------
    list[FeatureFlag]
        Feature flags whose effective state is disabled.
    """
    return self._repository.disabled_feature_flags()

disabled_feature_flags_async async #

disabled_feature_flags_async()

Get feature flags that are currently disabled (async version).

Returns:

Type Description
list[FeatureFlag]

Feature flags whose effective state is disabled.

Source code in icechunk-python/python/icechunk/repository.py
async def disabled_feature_flags_async(self) -> list[FeatureFlag]:
    """
    Get feature flags that are currently disabled (async version).

    Returns
    -------
    list[FeatureFlag]
        Feature flags whose effective state is disabled.
    """
    return await self._repository.disabled_feature_flags_async()

enabled_feature_flags #

enabled_feature_flags()

Get feature flags that are currently enabled.

Returns:

Type Description
list[FeatureFlag]

Feature flags whose effective state is enabled.

Source code in icechunk-python/python/icechunk/repository.py
def enabled_feature_flags(self) -> list[FeatureFlag]:
    """
    Get feature flags that are currently enabled.

    Returns
    -------
    list[FeatureFlag]
        Feature flags whose effective state is enabled.
    """
    return self._repository.enabled_feature_flags()

enabled_feature_flags_async async #

enabled_feature_flags_async()

Get feature flags that are currently enabled (async version).

Returns:

Type Description
list[FeatureFlag]

Feature flags whose effective state is enabled.

Source code in icechunk-python/python/icechunk/repository.py
async def enabled_feature_flags_async(self) -> list[FeatureFlag]:
    """
    Get feature flags that are currently enabled (async version).

    Returns
    -------
    list[FeatureFlag]
        Feature flags whose effective state is enabled.
    """
    return await self._repository.enabled_feature_flags_async()

exists staticmethod #

exists(storage, storage_settings=None)

Check if a repository exists at the given storage location.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
storage_settings StorageSettings | None

Optional storage settings to use for the initial storage call.

None

Returns:

Type Description
bool

True if the repository exists, False otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
def exists(
    storage: Storage,
    storage_settings: StorageSettings | None = None,
) -> bool:
    """
    Check if a repository exists at the given storage location.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    storage_settings : StorageSettings | None
        Optional storage settings to use for the initial storage call.

    Returns
    -------
    bool
        True if the repository exists, False otherwise.
    """
    return PyRepository.exists(storage, storage_settings)

exists_async async staticmethod #

exists_async(storage, storage_settings=None)

Check if a repository exists at the given storage location (async version).

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
storage_settings StorageSettings | None

Optional storage settings to use for the initial storage call.

None

Returns:

Type Description
bool

True if the repository exists, False otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
async def exists_async(
    storage: Storage,
    storage_settings: StorageSettings | None = None,
) -> bool:
    """
    Check if a repository exists at the given storage location (async version).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    storage_settings : StorageSettings | None
        Optional storage settings to use for the initial storage call.

    Returns
    -------
    bool
        True if the repository exists, False otherwise.
    """
    return await PyRepository.exists_async(storage, storage_settings)
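Because `exists_async` is a coroutine, several storage locations can be probed concurrently. The following is a minimal sketch rather than part of the library: `check_locations` is a hypothetical helper, and `repo_cls` is assumed to be `icechunk.Repository`, passed in as a parameter so the snippet does not depend on how the storage objects were built.

```python
import asyncio
from typing import Any


async def check_locations(repo_cls: Any, storages: dict[str, Any]) -> dict[str, bool]:
    """Return {label: exists} for each labelled storage configuration.

    `repo_cls` is assumed to be `icechunk.Repository`; `check_locations`
    itself is a hypothetical helper, not an Icechunk API.
    """
    labels = list(storages)
    # Start every existence check at once and wait for all of them.
    results = await asyncio.gather(
        *(repo_cls.exists_async(storages[label]) for label in labels)
    )
    return dict(zip(labels, results))
```

With real storage configurations this could be driven by `asyncio.run(check_locations(icechunk.Repository, {...}))`, where the dictionary maps names of your choosing to `Storage` objects.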

expire_snapshots #

expire_snapshots(
    older_than,
    *,
    delete_expired_branches=False,
    delete_expired_tags=False,
)

Expire all snapshots older than a threshold.

This processes snapshots found by navigating all references in the repo, tags first, branches later, both in lexicographical order.

Returns the IDs of all snapshots considered expired and skipped from history. Note that these snapshots are not necessarily available for garbage collection; they could still be pointed to by other refs.

If delete_expired_* is set to True, branches or tags that, after the expiration process, point to expired snapshots directly, will be deleted.

Danger

This is an administrative operation, it should be run carefully. The repository can still operate concurrently while expire_snapshots runs, but other readers can get inconsistent views of the repository history.

Parameters:

Name Type Description Default
older_than datetime

Expire snapshots older than this time.

required
delete_expired_branches bool

Whether to delete any branches that now have only expired snapshots.

False
delete_expired_tags bool

Whether to delete any tags associated with expired snapshots

False

Returns:

Type Description
set[str]

The IDs of the expired snapshots.

Source code in icechunk-python/python/icechunk/repository.py
def expire_snapshots(
    self,
    older_than: datetime.datetime,
    *,
    delete_expired_branches: bool = False,
    delete_expired_tags: bool = False,
) -> set[str]:
    """Expire all snapshots older than a threshold.

    This processes snapshots found by navigating all references in
    the repo, tags first, branches later, both in lexicographical order.

    Returns the IDs of all snapshots considered expired and skipped
    from history. Note that these snapshots are not necessarily
    available for garbage collection; they could still be pointed to
    by other refs.

    If `delete_expired_*` is set to True, branches or tags that, after the
    expiration process, point to expired snapshots directly, will be
    deleted.

    Danger
    ------
    This is an administrative operation, it should be run
    carefully. The repository can still operate concurrently while
    `expire_snapshots` runs, but other readers can get inconsistent
    views of the repository history.

    Parameters
    ----------
    older_than: datetime.datetime
        Expire snapshots older than this time.
    delete_expired_branches: bool, optional
        Whether to delete any branches that now have only expired snapshots.
    delete_expired_tags: bool, optional
        Whether to delete any tags associated with expired snapshots

    Returns
    -------
    set[str]
        The IDs of the expired snapshots.
    """
    return self._repository.expire_snapshots(
        older_than,
        delete_expired_branches=delete_expired_branches,
        delete_expired_tags=delete_expired_tags,
    )
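A retention policy is usually expressed in days rather than as an absolute timestamp. The sketch below shows one way to derive `older_than` from such a policy. `expire_older_than` is a hypothetical helper, `repo` is assumed to be an open `icechunk.Repository`, and the use of a timezone-aware UTC timestamp is an assumption of this example rather than something this page states.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def expire_older_than(
    repo: Any, days: int, *, now: Optional[datetime] = None
) -> set[str]:
    """Expire snapshots older than `days` days and return their IDs.

    `repo` is assumed to be an open `icechunk.Repository`. Passing a
    timezone-aware UTC datetime is an assumption of this sketch.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    reference = now if now is not None else datetime.now(timezone.utc)
    cutoff = reference - timedelta(days=days)
    # Keep branches and tags in place; only snapshot history is trimmed.
    return repo.expire_snapshots(
        cutoff,
        delete_expired_branches=False,
        delete_expired_tags=False,
    )
```

The `now` parameter exists only to make the cutoff reproducible in tests. The returned IDs are the expired snapshots; as noted above, they are not necessarily eligible for garbage collection yet.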

expire_snapshots_async async #

expire_snapshots_async(
    older_than,
    *,
    delete_expired_branches=False,
    delete_expired_tags=False,
)

Expire all snapshots older than a threshold (async version).

This processes snapshots found by navigating all references in the repo, tags first, branches later, both in lexicographical order.

Returns the IDs of all snapshots considered expired and skipped from history. Note that these snapshots are not necessarily available for garbage collection; they could still be pointed to by other refs.

If delete_expired_* is set to True, branches or tags that, after the expiration process, point to expired snapshots directly, will be deleted.

Danger

This is an administrative operation, it should be run carefully. The repository can still operate concurrently while expire_snapshots runs, but other readers can get inconsistent views of the repository history.

Parameters:

Name Type Description Default
older_than datetime

Expire snapshots older than this time.

required
delete_expired_branches bool

Whether to delete any branches that now have only expired snapshots.

False
delete_expired_tags bool

Whether to delete any tags associated with expired snapshots

False

Returns:

Type Description
set[str]

The IDs of the expired snapshots.

Source code in icechunk-python/python/icechunk/repository.py
async def expire_snapshots_async(
    self,
    older_than: datetime.datetime,
    *,
    delete_expired_branches: bool = False,
    delete_expired_tags: bool = False,
) -> set[str]:
    """Expire all snapshots older than a threshold (async version).

    This processes snapshots found by navigating all references in
    the repo, tags first, branches later, both in lexicographical order.

    Returns the IDs of all snapshots considered expired and skipped
    from history. Note that these snapshots are not necessarily
    available for garbage collection; they could still be pointed to
    by other refs.

    If `delete_expired_*` is set to True, branches or tags that, after the
    expiration process, point to expired snapshots directly, will be
    deleted.

    Danger
    ------
    This is an administrative operation, it should be run
    carefully. The repository can still operate concurrently while
    `expire_snapshots` runs, but other readers can get inconsistent
    views of the repository history.

    Parameters
    ----------
    older_than: datetime.datetime
        Expire snapshots older than this time.
    delete_expired_branches: bool, optional
        Whether to delete any branches that now have only expired snapshots.
    delete_expired_tags: bool, optional
        Whether to delete any tags associated with expired snapshots

    Returns
    -------
    set[str]
        The IDs of the expired snapshots.
    """
    return await self._repository.expire_snapshots_async(
        older_than,
        delete_expired_branches=delete_expired_branches,
        delete_expired_tags=delete_expired_tags,
    )

feature_flags #

feature_flags()

Get all feature flags and their current state.

Returns:

Type Description
list[FeatureFlag]

All feature flags with their id, name, default, setting, and effective state.

Source code in icechunk-python/python/icechunk/repository.py
def feature_flags(self) -> list[FeatureFlag]:
    """
    Get all feature flags and their current state.

    Returns
    -------
    list[FeatureFlag]
        All feature flags with their id, name, default, setting, and effective state.
    """
    return self._repository.feature_flags()

feature_flags_async async #

feature_flags_async()

Get all feature flags and their current state (async version).

Returns:

Type Description
list[FeatureFlag]

All feature flags with their id, name, default, setting, and effective state.

Source code in icechunk-python/python/icechunk/repository.py
async def feature_flags_async(self) -> list[FeatureFlag]:
    """
    Get all feature flags and their current state (async version).

    Returns
    -------
    list[FeatureFlag]
        All feature flags with their id, name, default, setting, and effective state.
    """
    return await self._repository.feature_flags_async()

fetch_config staticmethod #

fetch_config(storage)

Fetch the configuration for the repository saved in storage.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
RepositoryConfig | None

The repository configuration if it exists, None otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
def fetch_config(storage: Storage) -> RepositoryConfig | None:
    """
    Fetch the configuration for the repository saved in storage.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    RepositoryConfig | None
        The repository configuration if it exists, None otherwise.
    """
    return PyRepository.fetch_config(storage)

fetch_config_async async staticmethod #

fetch_config_async(storage)

Fetch the configuration for the repository saved in storage (async version).

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
RepositoryConfig | None

The repository configuration if it exists, None otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
async def fetch_config_async(storage: Storage) -> RepositoryConfig | None:
    """
    Fetch the configuration for the repository saved in storage (async version).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    RepositoryConfig | None
        The repository configuration if it exists, None otherwise.
    """
    return await PyRepository.fetch_config_async(storage)

fetch_spec_version staticmethod #

fetch_spec_version(storage, storage_settings=None)

Fetch the spec version of a repository without fully opening it.

This is useful for checking the repository format version before opening, for example to know what version of the library is needed to open it.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
storage_settings StorageSettings | None

Optional storage settings to use for the initial storage call.

None

Returns:

Type Description
SpecVersion | None

The spec version of the repository if it exists, None if no repository exists at the given location.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
def fetch_spec_version(
    storage: Storage,
    storage_settings: StorageSettings | None = None,
) -> SpecVersion | None:
    """
    Fetch the spec version of a repository without fully opening it.

    This is useful for checking the repository format version before opening,
    for example to know what version of the library is needed to open it.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    storage_settings : StorageSettings | None
        Optional storage settings to use for the initial storage call.

    Returns
    -------
    SpecVersion | None
        The spec version of the repository if it exists, None if no repository
        exists at the given location.
    """
    return PyRepository.fetch_spec_version(storage, storage_settings)
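Since `fetch_spec_version` returns `None` when nothing exists at the location, callers typically need to distinguish three cases before opening. The helper below is a hypothetical sketch of that decision. It assumes that the values in `supported` (for instance, whatever `icechunk.supported_spec_versions()` returns) can be compared with the fetched version using `in`, which this page does not confirm.

```python
from typing import Any, Iterable, Optional


def classify_spec_version(version: Optional[Any], supported: Iterable[Any]) -> str:
    """Classify the result of `Repository.fetch_spec_version`.

    Returns "missing" when no repository exists (version is None),
    "supported" when the version is found in `supported`, and
    "unsupported" otherwise. Membership comparison between the two
    values is an assumption of this sketch.
    """
    if version is None:
        return "missing"
    return "supported" if version in list(supported) else "unsupported"
```

An "unsupported" result suggests that a different library version is needed to open the repository.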

fetch_spec_version_async async staticmethod #

fetch_spec_version_async(storage, storage_settings=None)

Fetch the spec version of a repository without fully opening it (async version).

This is useful for checking the repository format version before opening, for example to know what version of the library is needed to open it.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
storage_settings StorageSettings | None

Optional storage settings to use for the initial storage call.

None

Returns:

Type Description
SpecVersion | None

The spec version of the repository if it exists, None if no repository exists at the given location.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
async def fetch_spec_version_async(
    storage: Storage,
    storage_settings: StorageSettings | None = None,
) -> SpecVersion | None:
    """
    Fetch the spec version of a repository without fully opening it (async version).

    This is useful for checking the repository format version before opening,
    for example to know what version of the library is needed to open it.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    storage_settings : StorageSettings | None
        Optional storage settings to use for the initial storage call.

    Returns
    -------
    SpecVersion | None
        The spec version of the repository if it exists, None if no repository
        exists at the given location.
    """
    return await PyRepository.fetch_spec_version_async(storage, storage_settings)

garbage_collect #

garbage_collect(
    delete_object_older_than,
    *,
    dry_run=False,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Delete any objects no longer accessible from any branches or tags.

Danger

This is an administrative operation and should be run carefully. The repository can still operate concurrently while garbage_collect runs, but other readers can get inconsistent views if they are trying to access the expired snapshots.

Parameters:

Name Type Description Default
delete_object_older_than datetime

Delete objects older than this time.

required
dry_run bool

Report results but don't delete any objects

False
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500

Returns:

Type Description
GCSummary

Summary of objects deleted.

Source code in icechunk-python/python/icechunk/repository.py
def garbage_collect(
    self,
    delete_object_older_than: datetime.datetime,
    *,
    dry_run: bool = False,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> GCSummary:
    """Delete any objects no longer accessible from any branches or tags.

    Danger
    ------
    This is an administrative operation, it should be run
    carefully. The repository can still operate concurrently while
    `garbage_collect` runs, but other readers can get inconsistent
    views if they are trying to access the expired snapshots.

    Parameters
    ----------
    delete_object_older_than: datetime.datetime
        Delete objects older than this time.
    dry_run : bool
        Report results but don't delete any objects.
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.

    Returns
    -------
    GCSummary
        Summary of objects deleted.
    """

    return self._repository.garbage_collect(
        delete_object_older_than,
        dry_run=dry_run,
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
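Given the warning above, a cautious pattern is to run with `dry_run=True` first and delete only after the reported summary has been reviewed. The following sketch illustrates that flow. `collect_with_preview` and the `approve` callback are hypothetical names, and `repo` is assumed to be an open `icechunk.Repository`.

```python
from datetime import datetime
from typing import Any, Callable


def collect_with_preview(
    repo: Any,
    cutoff: datetime,
    approve: Callable[[Any], bool],
) -> Any:
    """Run a dry-run collection, then delete only if `approve` agrees.

    `repo` is assumed to be an open `icechunk.Repository`; `approve` is a
    hypothetical callback that receives the dry-run summary.
    Returns the summary of whichever run happened last.
    """
    # First pass: report what would be deleted without deleting anything.
    preview = repo.garbage_collect(cutoff, dry_run=True)
    if not approve(preview):
        return preview
    # Second pass: actually delete the unreachable objects.
    return repo.garbage_collect(cutoff, dry_run=False)
```

Other writers may commit between the two passes, so the second summary can differ from the preview.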

garbage_collect_async async #

garbage_collect_async(
    delete_object_older_than,
    *,
    dry_run=False,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Delete any objects no longer accessible from any branches or tags (async version).

Danger

This is an administrative operation and should be run carefully. The repository can still operate concurrently while garbage_collect runs, but other readers can get inconsistent views if they are trying to access the expired snapshots.

Parameters:

Name Type Description Default
delete_object_older_than datetime

Delete objects older than this time.

required
dry_run bool

Report results but don't delete any objects

False
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500

Returns:

Type Description
GCSummary

Summary of objects deleted.

Source code in icechunk-python/python/icechunk/repository.py
async def garbage_collect_async(
    self,
    delete_object_older_than: datetime.datetime,
    *,
    dry_run: bool = False,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> GCSummary:
    """Delete any objects no longer accessible from any branches or tags (async version).

    Danger
    ------
    This is an administrative operation, it should be run
    carefully. The repository can still operate concurrently while
    `garbage_collect` runs, but other readers can get inconsistent
    views if they are trying to access the expired snapshots.

    Parameters
    ----------
    delete_object_older_than: datetime.datetime
        Delete objects older than this time.
    dry_run : bool
        Report results but don't delete any objects.
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.

    Returns
    -------
    GCSummary
        Summary of objects deleted.
    """

    return await self._repository.garbage_collect_async(
        delete_object_older_than,
        dry_run=dry_run,
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
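Expiration and garbage collection are often run back to back: expiring snapshots makes objects unreachable, and collection then removes them. The sketch below chains the two async methods. The helper names are hypothetical, `repo` is assumed to be an open `icechunk.Repository`, and reusing one cutoff for both calls is a choice made for this example, not a requirement stated on this page.

```python
import asyncio
from datetime import datetime
from typing import Any


async def expire_then_collect(
    repo: Any, cutoff: datetime, *, dry_run: bool = True
) -> tuple[set[str], Any]:
    """Expire old snapshots, then garbage-collect with the same cutoff.

    `repo` is assumed to be an open `icechunk.Repository`. The calls are
    awaited one after the other because collection depends on the
    outcome of expiration.
    """
    expired = await repo.expire_snapshots_async(cutoff)
    summary = await repo.garbage_collect_async(cutoff, dry_run=dry_run)
    return expired, summary


def run_maintenance(repo: Any, cutoff: datetime, *, dry_run: bool = True) -> tuple[set[str], Any]:
    """Synchronous entry point for scripts that have no running event loop."""
    return asyncio.run(expire_then_collect(repo, cutoff, dry_run=dry_run))
```

The default of `dry_run=True` only applies to the collection step; expiration itself is not a dry run in this sketch.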

get_metadata #

get_metadata()

Get the currently configured repository metadata.

Returns:

Type Description
dict[str, Any]

The repository level metadata.

Source code in icechunk-python/python/icechunk/repository.py
def get_metadata(self) -> dict[str, Any]:
    """
    Get the currently configured repository metadata.

    Returns
    -------
    dict[str, Any]
        The repository level metadata.
    """
    return self._repository.get_metadata()

get_metadata_async async #

get_metadata_async()

Get the currently configured repository metadata (async version).

Returns:

Type Description
dict[str, Any]

The repository level metadata.

Source code in icechunk-python/python/icechunk/repository.py
async def get_metadata_async(self) -> dict[str, Any]:
    """
    Get the currently configured repository metadata (async version).

    Returns
    -------
    dict[str, Any]
        The repository level metadata.
    """
    return await self._repository.get_metadata_async()

get_status #

get_status()

Get the current repository status.

Returns:

Type Description
RepoStatus

The current status of the repository.

Source code in icechunk-python/python/icechunk/repository.py
def get_status(self) -> RepoStatus:
    """
    Get the current repository status.

    Returns
    -------
    RepoStatus
        The current status of the repository.
    """
    return self._repository.get_status()

get_status_async async #

get_status_async()

Get the current repository status (async version).

Returns:

Type Description
RepoStatus

The current status of the repository.

Source code in icechunk-python/python/icechunk/repository.py
async def get_status_async(self) -> RepoStatus:
    """
    Get the current repository status (async version).

    Returns
    -------
    RepoStatus
        The current status of the repository.
    """
    return await self._repository.get_status_async()

inspect_manifest #

inspect_manifest(manifest_id)

Return chunk storage statistics for a manifest.

Shows per-array chunk counts broken down by storage type (inline, native, virtual) and compression details.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name Type Description Default
manifest_id str

The manifest to inspect. Manifest IDs can be found in the manifest_refs of array nodes returned by inspect_snapshot.

required

Returns:

Type Description
dict[str, Any]

Keys: id, size_bytes, num_arrays, total_chunk_refs, total_inline, total_native, total_virtual, arrays, compression.

Source code in icechunk-python/python/icechunk/repository.py
def inspect_manifest(self, manifest_id: str) -> dict[str, Any]:
    """
    Return chunk storage statistics for a manifest.

    Shows per-array chunk counts broken down by storage type
    (inline, native, virtual) and compression details.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    manifest_id : str
        The manifest to inspect. Manifest IDs can be found in the
        ``manifest_refs`` of array nodes returned by
        :meth:`inspect_snapshot`.

    Returns
    -------
    dict[str, Any]
        Keys: ``id``, ``size_bytes``, ``num_arrays``,
        ``total_chunk_refs``, ``total_inline``, ``total_native``,
        ``total_virtual``, ``arrays``, ``compression``.
    """
    result: dict[str, Any] = json.loads(
        self._repository.inspect_manifest(manifest_id, pretty=False)
    )
    return result
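The totals in the returned dict can be turned into a quick breakdown of where a manifest's chunks live. This is a hypothetical helper, not an Icechunk API. It relies only on the keys documented above and assumes the `total_*` values are integer counts; since `inspect_manifest` is a debugging utility, the structure may change.

```python
from typing import Any


def storage_breakdown(info: dict[str, Any]) -> dict[str, float]:
    """Fraction of chunk references per storage type.

    `info` is the dict returned by `inspect_manifest`. Treating the
    `total_*` values as integer counts is an assumption of this sketch.
    """
    total = info["total_chunk_refs"]
    if total == 0:
        # An empty manifest has no meaningful fractions.
        return {"inline": 0.0, "native": 0.0, "virtual": 0.0}
    return {
        kind: info[f"total_{kind}"] / total
        for kind in ("inline", "native", "virtual")
    }
```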

inspect_manifest_async async #

inspect_manifest_async(manifest_id)

Return chunk storage statistics for a manifest.

Shows per-array chunk counts broken down by storage type (inline, native, virtual) and compression details.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name Type Description Default
manifest_id str

The manifest to inspect. Manifest IDs can be found in the manifest_refs of array nodes returned by inspect_snapshot_async.

required

Returns:

Type Description
dict[str, Any]

Keys: id, size_bytes, num_arrays, total_chunk_refs, total_inline, total_native, total_virtual, arrays, compression.

Source code in icechunk-python/python/icechunk/repository.py
async def inspect_manifest_async(self, manifest_id: str) -> dict[str, Any]:
    """
    Return chunk storage statistics for a manifest.

    Shows per-array chunk counts broken down by storage type
    (inline, native, virtual) and compression details.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    manifest_id : str
        The manifest to inspect. Manifest IDs can be found in the
        ``manifest_refs`` of array nodes returned by
        :meth:`inspect_snapshot_async`.

    Returns
    -------
    dict[str, Any]
        Keys: ``id``, ``size_bytes``, ``num_arrays``,
        ``total_chunk_refs``, ``total_inline``, ``total_native``,
        ``total_virtual``, ``arrays``, ``compression``.
    """
    result: dict[str, Any] = json.loads(
        await self._repository.inspect_manifest_async(manifest_id, pretty=False)
    )
    return result

inspect_repo_info #

inspect_repo_info()

Return the top-level repository metadata.

Includes the branch-to-snapshot mapping, tags, snapshot ancestry, and the recent update log.

This is a testing/debugging utility. The return type and structure may change in future versions.

Returns:

Type Description
dict[str, Any]

Keys: spec_version, branches, tags, deleted_tags, snapshots, metadata, latest_updates.

Source code in icechunk-python/python/icechunk/repository.py
def inspect_repo_info(self) -> dict[str, Any]:
    """
    Return the top-level repository metadata.

    Includes the branch-to-snapshot mapping, tags, snapshot ancestry,
    and the recent update log.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Returns
    -------
    dict[str, Any]
        Keys: ``spec_version``, ``branches``, ``tags``, ``deleted_tags``,
        ``snapshots``, ``metadata``, ``latest_updates``.
    """
    result: dict[str, Any] = json.loads(
        self._repository.inspect_repo_info(pretty=False)
    )
    return result
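Because the structure of this dict may change between versions, code that depends on it can check for the documented keys before reading them. The following is a small defensive sketch; `DOCUMENTED_KEYS` and `missing_repo_info_keys` are hypothetical names, and the key list is copied from the description above.

```python
from typing import Any

# Keys listed in the `inspect_repo_info` documentation.
DOCUMENTED_KEYS = frozenset(
    {
        "spec_version",
        "branches",
        "tags",
        "deleted_tags",
        "snapshots",
        "metadata",
        "latest_updates",
    }
)


def missing_repo_info_keys(info: dict[str, Any]) -> set[str]:
    """Return the documented keys that are absent from `info`."""
    # Iterating a dict yields its keys, so `difference` compares key sets.
    return set(DOCUMENTED_KEYS.difference(info))
```

An empty result means every documented key is present; it says nothing about the shape of the values.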

inspect_repo_info_async async #

inspect_repo_info_async()

Return the top-level repository metadata.

Includes the branch-to-snapshot mapping, tags, snapshot ancestry, and the recent update log.

This is a testing/debugging utility. The return type and structure may change in future versions.

Returns:

Type Description
dict[str, Any]

Keys: spec_version, branches, tags, deleted_tags, snapshots, metadata, latest_updates.

Source code in icechunk-python/python/icechunk/repository.py
async def inspect_repo_info_async(self) -> dict[str, Any]:
    """
    Return the top-level repository metadata.

    Includes the branch-to-snapshot mapping, tags, snapshot ancestry,
    and the recent update log.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Returns
    -------
    dict[str, Any]
        Keys: ``spec_version``, ``branches``, ``tags``, ``deleted_tags``,
        ``snapshots``, ``metadata``, ``latest_updates``.
    """
    result: dict[str, Any] = json.loads(
        await self._repository.inspect_repo_info_async(pretty=False)
    )
    return result

inspect_snapshot #

inspect_snapshot(snapshot_id)

Return the node tree stored in a snapshot.

The result contains every node's path, node ID, type (array or group), and manifest references. Useful for verifying node identity across commits or inspecting what a snapshot contains.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name Type Description Default
snapshot_id str

The snapshot to inspect.

required

Returns:

Type Description
dict[str, Any]

Keys: id, flushed_at, commit_message, metadata, manifests, nodes.

Source code in icechunk-python/python/icechunk/repository.py
def inspect_snapshot(self, snapshot_id: str) -> dict[str, Any]:
    """
    Return the node tree stored in a snapshot.

    The result contains every node's path, node ID, type (array or group),
    and manifest references. Useful for verifying node identity across
    commits or inspecting what a snapshot contains.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    snapshot_id : str
        The snapshot to inspect.

    Returns
    -------
    dict[str, Any]
        Keys: ``id``, ``flushed_at``, ``commit_message``, ``metadata``,
        ``manifests``, ``nodes``.
    """
    result: dict[str, Any] = json.loads(
        self._repository.inspect_snapshot(snapshot_id, pretty=False)
    )
    return result
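The node-identity check mentioned above can be sketched by comparing two `inspect_snapshot` results. This helper is hypothetical. In particular, it assumes that `nodes` is a list of dicts and that each node exposes its path and node ID under the keys `"path"` and `"id"`; this page does not specify those field names, so they are parameters that can be overridden.

```python
from typing import Any


def changed_node_ids(
    before: dict[str, Any],
    after: dict[str, Any],
    *,
    path_key: str = "path",  # assumed field name, not documented here
    id_key: str = "id",  # assumed field name, not documented here
) -> dict[str, tuple[Any, Any]]:
    """Paths present in both snapshots whose node ID differs.

    `before` and `after` are dicts returned by `inspect_snapshot`.
    Returns {path: (old_id, new_id)} for each path whose ID changed.
    """
    old = {node[path_key]: node[id_key] for node in before["nodes"]}
    new = {node[path_key]: node[id_key] for node in after["nodes"]}
    return {
        path: (old[path], new[path])
        for path in sorted(old.keys() & new.keys())
        if old[path] != new[path]
    }
```

A path that keeps its name but changes ID was deleted and recreated between the two commits, rather than updated in place.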

inspect_snapshot_async async #

inspect_snapshot_async(snapshot_id)

Return the node tree stored in a snapshot.

The result contains every node's path, node ID, type (array or group), and manifest references. Useful for verifying node identity across commits or inspecting what a snapshot contains.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name Type Description Default
snapshot_id str

The snapshot to inspect.

required

Returns:

Type Description
dict[str, Any]

Keys: id, flushed_at, commit_message, metadata, manifests, nodes.

Source code in icechunk-python/python/icechunk/repository.py
async def inspect_snapshot_async(self, snapshot_id: str) -> dict[str, Any]:
    """
    Return the node tree stored in a snapshot.

    The result contains every node's path, node ID, type (array or group),
    and manifest references. Useful for verifying node identity across
    commits or inspecting what a snapshot contains.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    snapshot_id : str
        The snapshot to inspect.

    Returns
    -------
    dict[str, Any]
        Keys: ``id``, ``flushed_at``, ``commit_message``, ``metadata``,
        ``manifests``, ``nodes``.
    """
    result: dict[str, Any] = json.loads(
        await self._repository.inspect_snapshot_async(snapshot_id, pretty=False)
    )
    return result

inspect_transaction_log #

inspect_transaction_log(snapshot_id)

Return the record of what changed in a single commit.

Lists the node IDs of every created, deleted, and updated node, the chunk coordinates that were written, and any move operations.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name Type Description Default
snapshot_id str

The snapshot whose transaction log to inspect.

required

Returns:

Type Description
dict[str, Any]

Keys: new_groups, new_arrays, deleted_groups, deleted_arrays, updated_groups, updated_arrays, updated_chunks, moved_nodes.

Source code in icechunk-python/python/icechunk/repository.py
def inspect_transaction_log(self, snapshot_id: str) -> dict[str, Any]:
    """
    Return the record of what changed in a single commit.

    Lists the node IDs of every created, deleted, and updated node,
    the chunk coordinates that were written, and any move operations.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    snapshot_id : str
        The snapshot whose transaction log to inspect.

    Returns
    -------
    dict[str, Any]
        Keys: ``new_groups``, ``new_arrays``, ``deleted_groups``,
        ``deleted_arrays``, ``updated_groups``, ``updated_arrays``,
        ``updated_chunks``, ``moved_nodes``.
    """
    result: dict[str, Any] = json.loads(
        self._repository.inspect_transaction_log(snapshot_id, pretty=False)
    )
    return result
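For a quick overview of a commit, the transaction log can be reduced to one count per category. The helper below is a hypothetical sketch that uses only the keys documented above. It assumes each value is a sized collection (a list or a dict), which this page does not state, and it treats a missing or null entry as empty.

```python
from typing import Any

# Keys listed in the `inspect_transaction_log` documentation.
TRANSACTION_LOG_KEYS = (
    "new_groups",
    "new_arrays",
    "deleted_groups",
    "deleted_arrays",
    "updated_groups",
    "updated_arrays",
    "updated_chunks",
    "moved_nodes",
)


def change_counts(log: dict[str, Any]) -> dict[str, int]:
    """Number of entries recorded under each documented key."""
    # `or ()` maps a missing or null value to an empty collection.
    return {key: len(log.get(key) or ()) for key in TRANSACTION_LOG_KEYS}
```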

inspect_transaction_log_async async #

inspect_transaction_log_async(snapshot_id)

Return the record of what changed in a single commit.

Lists the node IDs of every created, deleted, and updated node, the chunk coordinates that were written, and any move operations.

This is a testing/debugging utility. The return type and structure may change in future versions.

Parameters:

Name Type Description Default
snapshot_id str

The snapshot whose transaction log to inspect.

required

Returns:

Type Description
dict[str, Any]

Keys: new_groups, new_arrays, deleted_groups, deleted_arrays, updated_groups, updated_arrays, updated_chunks, moved_nodes.

Source code in icechunk-python/python/icechunk/repository.py
async def inspect_transaction_log_async(self, snapshot_id: str) -> dict[str, Any]:
    """
    Return the record of what changed in a single commit.

    Lists the node IDs of every created, deleted, and updated node,
    the chunk coordinates that were written, and any move operations.

    This is a testing/debugging utility. The return type and structure
    may change in future versions.

    Parameters
    ----------
    snapshot_id : str
        The snapshot whose transaction log to inspect.

    Returns
    -------
    dict[str, Any]
        Keys: ``new_groups``, ``new_arrays``, ``deleted_groups``,
        ``deleted_arrays``, ``updated_groups``, ``updated_arrays``,
        ``updated_chunks``, ``moved_nodes``.
    """
    raw = await self._repository.inspect_transaction_log_async(
        snapshot_id, pretty=False
    )
    result: dict[str, Any] = json.loads(raw)
    return result

list_branches #

list_branches()

List the branches in the repository.

Returns:

Type Description
set[str]

A set of branch names.

Source code in icechunk-python/python/icechunk/repository.py
def list_branches(self) -> set[str]:
    """
    List the branches in the repository.

    Returns
    -------
    set[str]
        A set of branch names.
    """
    return self._repository.list_branches()

list_branches_async async #

list_branches_async()

List the branches in the repository (async version).

Returns:

Type Description
set[str]

A set of branch names.

Source code in icechunk-python/python/icechunk/repository.py
async def list_branches_async(self) -> set[str]:
    """
    List the branches in the repository (async version).

    Returns
    -------
    set[str]
        A set of branch names.
    """
    return await self._repository.list_branches_async()

list_manifest_files #

list_manifest_files(snapshot_id)

Get the manifest files used by the given snapshot ID

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to get information for

required

Returns:

Type Description
list[ManifestFileInfo]

Source code in icechunk-python/python/icechunk/repository.py
def list_manifest_files(self, snapshot_id: str) -> list[ManifestFileInfo]:
    """
    Get the manifest files used by the given snapshot ID

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to get information for

    Returns
    -------
    list[ManifestFileInfo]
    """
    return self._repository.list_manifest_files(snapshot_id)

list_manifest_files_async async #

list_manifest_files_async(snapshot_id)

Get the manifest files used by the given snapshot ID (async version)

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to get information for

required

Returns:

Type Description
list[ManifestFileInfo]

Source code in icechunk-python/python/icechunk/repository.py
async def list_manifest_files_async(self, snapshot_id: str) -> list[ManifestFileInfo]:
    """
    Get the manifest files used by the given snapshot ID (async version)

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to get information for

    Returns
    -------
    list[ManifestFileInfo]
    """
    return await self._repository.list_manifest_files_async(snapshot_id)
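
One possible use of the manifest listing is a quick size report for a snapshot. The sketch below is not part of the library: the helper name `manifest_totals` is hypothetical, and it assumes each `ManifestFileInfo` exposes an integer `size_bytes` attribute, which this page does not document.

```python
def manifest_totals(repo, snapshot_id: str) -> tuple[int, int]:
    """Return (number of manifest files, total size in bytes) for a snapshot.

    `repo` is any object with the `list_manifest_files` method described above.
    """
    files = repo.list_manifest_files(snapshot_id)
    # ASSUMPTION: ManifestFileInfo has an integer `size_bytes` attribute.
    return len(files), sum(info.size_bytes for info in files)
```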

list_tags #

list_tags()

List the tags in the repository.

Returns:

Type Description
set[str]

A set of tag names.

Source code in icechunk-python/python/icechunk/repository.py
def list_tags(self) -> set[str]:
    """
    List the tags in the repository.

    Returns
    -------
    set[str]
        A set of tag names.
    """
    return self._repository.list_tags()

list_tags_async async #

list_tags_async()

List the tags in the repository (async version).

Returns:

Type Description
set[str]

A set of tag names.

Source code in icechunk-python/python/icechunk/repository.py
async def list_tags_async(self) -> set[str]:
    """
    List the tags in the repository (async version).

    Returns
    -------
    set[str]
        A set of tag names.
    """
    return await self._repository.list_tags_async()
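
When both listings are needed from async code, the two calls can be awaited together. This is a minimal sketch, assuming the objects returned by `list_branches_async` and `list_tags_async` are ordinary awaitables that `asyncio.gather` accepts. The helper name `list_refs` is hypothetical.

```python
import asyncio


async def list_refs(repo) -> dict[str, set[str]]:
    """Fetch branch and tag names concurrently.

    `repo` is any object with the two async listing methods described above.
    """
    branches, tags = await asyncio.gather(
        repo.list_branches_async(),
        repo.list_tags_async(),
    )
    return {"branches": branches, "tags": tags}
```

From synchronous code the helper would be driven with `asyncio.run(list_refs(repo))`.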

lookup_branch #

lookup_branch(branch)

Get the tip snapshot ID of a branch.

Parameters:

Name Type Description Default
branch str

The branch to get the tip of.

required

Returns:

Type Description
str

The snapshot ID of the tip of the branch.

Source code in icechunk-python/python/icechunk/repository.py
def lookup_branch(self, branch: str) -> str:
    """
    Get the tip snapshot ID of a branch.

    Parameters
    ----------
    branch : str
        The branch to get the tip of.

    Returns
    -------
    str
        The snapshot ID of the tip of the branch.
    """
    return self._repository.lookup_branch(branch)

lookup_branch_async async #

lookup_branch_async(branch)

Get the tip snapshot ID of a branch (async version).

Parameters:

Name Type Description Default
branch str

The branch to get the tip of.

required

Returns:

Type Description
str

The snapshot ID of the tip of the branch.

Source code in icechunk-python/python/icechunk/repository.py
async def lookup_branch_async(self, branch: str) -> str:
    """
    Get the tip snapshot ID of a branch (async version).

    Parameters
    ----------
    branch : str
        The branch to get the tip of.

    Returns
    -------
    str
        The snapshot ID of the tip of the branch.
    """
    return await self._repository.lookup_branch_async(branch)
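
`list_branches` and `lookup_branch` combine naturally into a map from branch name to tip snapshot. The sketch below uses only those two documented methods; the helper name `branch_tips` is hypothetical.

```python
def branch_tips(repo) -> dict[str, str]:
    """Map every branch name to the snapshot ID at its tip.

    `repo` is any object with the `list_branches` and `lookup_branch`
    methods described above.
    """
    # Sorting only makes the iteration order deterministic.
    return {name: repo.lookup_branch(name) for name in sorted(repo.list_branches())}
```

The two calls are not atomic: a branch that moves or disappears between the listing and the lookup will be reflected inconsistently, so treat the result as a point-in-time approximation.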

lookup_snapshot #

lookup_snapshot(snapshot_id)

Get the SnapshotInfo given a snapshot ID

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to look up

required

Returns:

Type Description
SnapshotInfo

Source code in icechunk-python/python/icechunk/repository.py
def lookup_snapshot(self, snapshot_id: str) -> SnapshotInfo:
    """
    Get the SnapshotInfo given a snapshot ID

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to look up

    Returns
    -------
    SnapshotInfo
    """
    return self._repository.lookup_snapshot(snapshot_id)

lookup_snapshot_async async #

lookup_snapshot_async(snapshot_id)

Get the SnapshotInfo given a snapshot ID (async version)

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to look up

required

Returns:

Type Description
SnapshotInfo

Source code in icechunk-python/python/icechunk/repository.py
async def lookup_snapshot_async(self, snapshot_id: str) -> SnapshotInfo:
    """
    Get the SnapshotInfo given a snapshot ID (async version)

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to look up

    Returns
    -------
    SnapshotInfo
    """
    return await self._repository.lookup_snapshot_async(snapshot_id)

lookup_tag #

lookup_tag(tag)

Get the snapshot ID of a tag.

Parameters:

Name Type Description Default
tag str

The tag to get the snapshot ID of.

required

Returns:

Type Description
str

The snapshot ID of the tag.

Source code in icechunk-python/python/icechunk/repository.py
def lookup_tag(self, tag: str) -> str:
    """
    Get the snapshot ID of a tag.

    Parameters
    ----------
    tag : str
        The tag to get the snapshot ID of.

    Returns
    -------
    str
        The snapshot ID of the tag.
    """
    return self._repository.lookup_tag(tag)

lookup_tag_async async #

lookup_tag_async(tag)

Get the snapshot ID of a tag (async version).

Parameters:

Name Type Description Default
tag str

The tag to get the snapshot ID of.

required

Returns:

Type Description
str

The snapshot ID of the tag.

Source code in icechunk-python/python/icechunk/repository.py
async def lookup_tag_async(self, tag: str) -> str:
    """
    Get the snapshot ID of a tag (async version).

    Parameters
    ----------
    tag : str
        The tag to get the snapshot ID of.

    Returns
    -------
    str
        The snapshot ID of the tag.
    """
    return await self._repository.lookup_tag_async(tag)
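
A caller that receives a bare name and does not know whether it is a tag or a branch could resolve it with the listing and lookup methods above. This is a sketch, not library behavior: the helper name `resolve_ref` and the choice to check tags before branches are assumptions made for illustration.

```python
def resolve_ref(repo, name: str) -> str:
    """Return the snapshot ID that a tag or branch name points to.

    `repo` is any object with the list_* and lookup_* methods described above.
    """
    # Design choice of this sketch: a tag wins if both a tag and a
    # branch share the same name.
    if name in repo.list_tags():
        return repo.lookup_tag(name)
    if name in repo.list_branches():
        return repo.lookup_branch(name)
    raise KeyError(f"no tag or branch named {name!r}")
```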

open classmethod #

open(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
)

Open an existing Icechunk repository.

If no repository exists at the given storage location, an error will be raised.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from each container's url_prefix to the credentials used to access chunks in that container. If a credential is None, it will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk blocks access to virtual chunks in any container that is not authorized through this argument.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
def open(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Open an existing Icechunk repository.

    If no repository exists at the given storage location, an error will be raised.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, it will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        PyRepository.open(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )
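
The `authorize_virtual_chunk_access` mapping described above can be built from a plain list of container prefixes when every credential should come from the environment. The sketch below is illustrative only: both helper names are hypothetical, and `repo_cls` stands in for `icechunk.Repository` so the example does not depend on a real storage backend.

```python
from collections.abc import Iterable


def authorize_from_environment(url_prefixes: Iterable[str]) -> dict[str, None]:
    """Map each container url_prefix to None.

    Per the parameter description, None means the credential is fetched
    from the environment (or anonymous access is used if allowed).
    """
    return {prefix: None for prefix in url_prefixes}


def open_with_virtual_access(repo_cls, storage, url_prefixes: Iterable[str]):
    """Open a repository and authorize the given virtual chunk containers.

    `repo_cls` is expected to be a class with the `open` classmethod
    documented above, for example icechunk.Repository.
    """
    return repo_cls.open(
        storage,
        authorize_virtual_chunk_access=authorize_from_environment(url_prefixes),
    )
```

Containers left out of the mapping stay unauthorized, which matches the security behavior described for this argument.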

open_async async classmethod #

open_async(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
)

Open an existing Icechunk repository asynchronously.

If no repository exists at the given storage location, an error will be raised.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from each container's url_prefix to the credentials used to access chunks in that container. If a credential is None, it will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk blocks access to virtual chunks in any container that is not authorized through this argument.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
async def open_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Open an existing Icechunk repository asynchronously.

    If no repository exists at the given storage location, an error will be raised.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, it will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        await PyRepository.open_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

open_or_create classmethod #

open_or_create(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
    create_version=None,
    check_clean_root=True,
)

Open an existing Icechunk repository or create a new one if it does not exist.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from each container's url_prefix to the credentials used to access chunks in that container. If a credential is None, it will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk blocks access to virtual chunks in any container that is not authorized through this argument.

None
create_version SpecVersion | int

Use this version of the spec for the new repository, if it needs to be created. If not passed, the latest spec version available when this version of the library was released will be used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
def open_or_create(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    create_version: SpecVersion | int | None = None,
    check_clean_root: bool = True,
) -> Self:
    """
    Open an existing Icechunk repository or create a new one if it does not exist.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, it will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    create_version : SpecVersion | int, optional
        Use this version of the spec for the new repository, if it needs to be created.
        If not passed, the latest spec version available when this version of the
        library was released will be used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        PyRepository.open_or_create(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            create_version=create_version,
            check_clean_root=check_clean_root,
        )
    )
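
The warning above recommends creating the repository once and then opening it from the other processes. One way to arrange that is sketched below, assuming a single designated coordinator process. The helper name `get_repository` and the `is_coordinator` flag are hypothetical, and `repo_cls` stands in for `icechunk.Repository`.

```python
def get_repository(repo_cls, storage, *, is_coordinator: bool):
    """Create-or-open on one process, plain open on all the others.

    `repo_cls` is expected to provide the `open` and `open_or_create`
    classmethods documented above.
    """
    if is_coordinator:
        # Only this process is ever allowed to create the repository.
        return repo_cls.open_or_create(storage)
    # Workers never create; they fail if the repository does not exist yet.
    return repo_cls.open(storage)
```

The sketch deliberately leaves out synchronization: workers must not call it until the coordinator has finished, for example by waiting on a barrier provided by the job scheduler.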

open_or_create_async async classmethod #

open_or_create_async(
    storage,
    config=None,
    authorize_virtual_chunk_access=None,
    create_version=None,
    check_clean_root=True,
)

Open an existing Icechunk repository or create a new one if it does not exist (async version).

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from each container's url_prefix to the credentials used to access chunks in that container. If a credential is None, it will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk blocks access to virtual chunks in any container that is not authorized through this argument.

None
create_version SpecVersion | int

Use this version of the spec for the new repository, if it needs to be created. If not passed, the latest spec version available when this version of the library was released will be used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
async def open_or_create_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    create_version: SpecVersion | int | None = None,
    check_clean_root: bool = True,
) -> Self:
    """
    Open an existing Icechunk repository or create a new one if it does not exist (async version).

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](../understanding/parallel.md#uncooperative-distributed-writes).

        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, it will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    create_version : SpecVersion | int, optional
        Use this version of the spec for the new repository, if it needs to be created.
        If not passed, the latest spec version available when this version of the
        library was released will be used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        await PyRepository.open_or_create_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            create_version=create_version,
            check_clean_root=check_clean_root,
        )
    )

ops_log #

ops_log()

Get a summary of changes to the repository

Source code in icechunk-python/python/icechunk/repository.py
def ops_log(self) -> Iterator[Update]:
    """
    Get a summary of changes to the repository
    """

    # the returned object is both an Async and Sync iterator
    res = cast(
        Iterator[Update],
        self._repository.async_ops_log(),
    )
    return res

ops_log_async #

ops_log_async()

Get a summary of changes to the repository

Source code in icechunk-python/python/icechunk/repository.py
def ops_log_async(self) -> AsyncIterator[Update]:
    """
    Get a summary of changes to the repository
    """

    # the returned object is both an Async and Sync iterator
    return self._repository.async_ops_log()
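
Because `ops_log` returns an iterator, a caller can stop early instead of walking the whole log. The sketch below takes the first few entries with `itertools.islice`. The helper name `first_updates` is hypothetical, and no assumption is made here about the order in which updates are yielded.

```python
from itertools import islice


def first_updates(repo, limit: int) -> list:
    """Return at most `limit` entries from the repository's ops log.

    `repo` is any object with the `ops_log` method described above.
    """
    # islice stops pulling from the iterator once `limit` items are taken.
    return list(islice(repo.ops_log(), limit))
```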

readonly_session #

readonly_session(
    branch=None, *, tag=None, snapshot_id=None, as_of=None
)

Create a read-only session.

This can be thought of as a read-only checkout of the repository at a given snapshot. When branch or tag are provided, the session will be based on the tip of the branch or the snapshot ID of the tag.

Parameters:

Name Type Description Default
branch str

If provided, the branch to create the session on.

None
tag str

If provided, the tag to create the session on.

None
snapshot_id str

If provided, the snapshot ID to create the session on.

None
as_of datetime | None

When combined with the branch argument, it will open the session at the last snapshot that is at or before this datetime

None

Returns:

Type Description
Session

The read-only session, pointing to the specified snapshot, tag, or branch.

Notes

Only one of branch, tag, or snapshot_id can be specified. The as_of argument is intended to be combined with branch.

Source code in icechunk-python/python/icechunk/repository.py
def readonly_session(
    self,
    branch: str | None = None,
    *,
    tag: str | None = None,
    snapshot_id: str | None = None,
    as_of: datetime.datetime | None = None,
) -> Session:
    """
    Create a read-only session.

    This can be thought of as a read-only checkout of the repository at a given snapshot.
    When branch or tag are provided, the session will be based on the tip of the branch or
    the snapshot ID of the tag.

    Parameters
    ----------
    branch : str, optional
        If provided, the branch to create the session on.
    tag : str, optional
        If provided, the tag to create the session on.
    snapshot_id : str, optional
        If provided, the snapshot ID to create the session on.
    as_of: datetime.datetime, optional
        When combined with the branch argument, it will open the session at the last
        snapshot that is at or before this datetime

    Returns
    -------
    Session
        The read-only session, pointing to the specified snapshot, tag, or branch.

    Notes
    -----
    Only one of `branch`, `tag`, or `snapshot_id` can be specified. The `as_of`
    argument is intended to be combined with `branch`.
    """
    return Session(
        self._repository.readonly_session(
            branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
        )
    )
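
The `as_of` argument allows a branch to be read as it stood at an earlier time. A minimal sketch follows, with a hypothetical helper name `session_as_of`. Rejecting naive datetimes is a design choice of this sketch to avoid ambiguity about the time zone, not documented library behavior.

```python
import datetime


def session_as_of(repo, branch: str, when: datetime.datetime):
    """Open a read-only session on `branch` at the last snapshot at or before `when`.

    `repo` is any object with the `readonly_session` method described above.
    """
    # utcoffset() is None for naive datetimes, which this sketch refuses.
    if when.utcoffset() is None:
        raise ValueError("expected a timezone-aware datetime")
    return repo.readonly_session(branch, as_of=when)
```

A typical call would pass something like `datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=1)` to read yesterday's state of the branch.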

readonly_session_async async #

readonly_session_async(
    branch=None, *, tag=None, snapshot_id=None, as_of=None
)

Create a read-only session (async version).

This can be thought of as a read-only checkout of the repository at a given snapshot. When branch or tag are provided, the session will be based on the tip of the branch or the snapshot ID of the tag.

Parameters:

Name Type Description Default
branch str

If provided, the branch to create the session on.

None
tag str

If provided, the tag to create the session on.

None
snapshot_id str

If provided, the snapshot ID to create the session on.

None
as_of datetime | None

When combined with the branch argument, it will open the session at the last snapshot that is at or before this datetime

None

Returns:

Type Description
Session

The read-only session, pointing to the specified snapshot, tag, or branch.

Notes

Only one of branch, tag, or snapshot_id can be specified. The as_of argument is intended to be combined with branch.

Source code in icechunk-python/python/icechunk/repository.py
async def readonly_session_async(
    self,
    branch: str | None = None,
    *,
    tag: str | None = None,
    snapshot_id: str | None = None,
    as_of: datetime.datetime | None = None,
) -> Session:
    """
    Create a read-only session (async version).

    This can be thought of as a read-only checkout of the repository at a given snapshot.
    When branch or tag are provided, the session will be based on the tip of the branch or
    the snapshot ID of the tag.

    Parameters
    ----------
    branch : str, optional
        If provided, the branch to create the session on.
    tag : str, optional
        If provided, the tag to create the session on.
    snapshot_id : str, optional
        If provided, the snapshot ID to create the session on.
    as_of: datetime.datetime, optional
        When combined with the branch argument, it will open the session at the last
        snapshot that is at or before this datetime

    Returns
    -------
    Session
        The read-only session, pointing to the specified snapshot, tag, or branch.

    Notes
    -----
    Only one of `branch`, `tag`, or `snapshot_id` can be specified. The `as_of`
    argument is intended to be combined with `branch`.
    """
    return Session(
        await self._repository.readonly_session_async(
            branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
        )
    )

rearrange_session #

rearrange_session(branch)

Create a session to move/rename nodes in the Zarr hierarchy.

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

This session only allows changes made through Session.move. To modify data as well as move nodes, use Repository.writable_session instead.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
def rearrange_session(self, branch: str) -> Session:
    """
    Create a session to move/rename nodes in the Zarr hierarchy.

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    This session only allows changes made through `Session.move`. To modify data as well as
    move nodes, use `Repository.writable_session` instead.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(self._repository.rearrange_session(branch))
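
The checkout, move, commit flow described above might look like the sketch below. The helper name `rename_node` is hypothetical, and two things are assumed rather than documented on this page: that `Session.move` takes the source and destination paths positionally, and that `Session.commit` accepts a message and returns the new snapshot ID.

```python
def rename_node(repo, branch: str, old_path: str, new_path: str, message: str):
    """Move one node on `branch` and commit the change.

    `repo` is any object with the `rearrange_session` method described above.
    """
    session = repo.rearrange_session(branch)
    # ASSUMPTION: Session.move(from_path, to_path) with positional arguments.
    session.move(old_path, new_path)
    # ASSUMPTION: Session.commit(message) returns the new snapshot ID.
    return session.commit(message)
```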

rearrange_session_async async #

rearrange_session_async(branch)

Create a session to move/rename nodes in the Zarr hierarchy (async version).

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

This session only allows changes made through Session.move. To modify data as well as move nodes, use Repository.writable_session instead.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
async def rearrange_session_async(self, branch: str) -> Session:
    """
    Create a session to move/rename nodes in the Zarr hierarchy (async version).

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    This session only allows changes made through `Session.move`. To modify data as well as
    move nodes, use `Repository.writable_session` instead.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(await self._repository.rearrange_session_async(branch))

reopen #

reopen(config=None, authorize_virtual_chunk_access=None)

Reopen the repository with new configuration or credentials.

Parameters:

Name Type Description Default
config RepositoryConfig

The new repository configuration. If not provided, uses the existing configuration.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

New virtual chunk access credentials.

None

Returns:

Type Description
Self

A new Repository instance with the updated configuration.

Source code in icechunk-python/python/icechunk/repository.py
def reopen(
    self,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Reopen the repository with new configuration or credentials.

    Parameters
    ----------
    config : RepositoryConfig, optional
        The new repository configuration. If not provided, uses the existing configuration.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        New virtual chunk access credentials.

    Returns
    -------
    Self
        A new Repository instance with the updated configuration.
    """
    return self.__class__(
        self._repository.reopen(
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

reopen_async async #

reopen_async(
    config=None, authorize_virtual_chunk_access=None
)

Reopen the repository with new configuration or credentials (async version).

Parameters:

Name Type Description Default
config RepositoryConfig

The new repository configuration. If not provided, uses the existing configuration.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

New virtual chunk access credentials.

None

Returns:

Type Description
Self

A new Repository instance with the updated configuration.

Source code in icechunk-python/python/icechunk/repository.py
async def reopen_async(
    self,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Reopen the repository with new configuration or credentials (async version).

    Parameters
    ----------
    config : RepositoryConfig, optional
        The new repository configuration. If not provided, uses the existing configuration.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        New virtual chunk access credentials.

    Returns
    -------
    Self
        A new Repository instance with the updated configuration.
    """
    return self.__class__(
        await self._repository.reopen_async(
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

reset_branch #

reset_branch(branch, snapshot_id, *, from_snapshot_id=None)

Reset a branch to a specific snapshot.

This will permanently alter the history of the branch such that the tip of the branch is the specified snapshot.

Parameters:

Name Type Description Default
branch str

The branch to reset.

required
snapshot_id str

The snapshot ID to reset the branch to.

required
from_snapshot_id str | None

If passed, the reset will only be executed if the branch currently points to from_snapshot_id.

None

Returns:

Type Description
None

Source code in icechunk-python/python/icechunk/repository.py
def reset_branch(
    self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
) -> None:
    """
    Reset a branch to a specific snapshot.

    This will permanently alter the history of the branch such that the tip of
    the branch is the specified snapshot.

    Parameters
    ----------
    branch : str
        The branch to reset.
    snapshot_id : str
        The snapshot ID to reset the branch to.
    from_snapshot_id : str | None
        If passed, the reset will only be executed if the branch currently
        points to from_snapshot_id.

    Returns
    -------
    None
    """
    self._repository.reset_branch(branch, snapshot_id, from_snapshot_id)
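
The `from_snapshot_id` guard acts like a compare-and-swap on the branch tip. The following is a minimal sketch, not part of the library: the helper name `rollback_branch` is hypothetical, `repo` is assumed to be an open `Repository`, and `Repository.lookup_branch` is assumed to return the snapshot ID the branch currently points to.

```python
def rollback_branch(repo, branch: str, target_snapshot_id: str) -> str:
    """Reset `branch` to `target_snapshot_id` only if nobody moved it meanwhile.

    `repo` is expected to behave like an icechunk Repository (it needs
    `lookup_branch` and `reset_branch`). Returns the tip that was observed
    before the reset.
    """
    # Remember the tip we saw before deciding to reset (assumed API: lookup_branch).
    observed_tip = repo.lookup_branch(branch)
    # The reset is only executed if the branch still points at observed_tip.
    repo.reset_branch(branch, target_snapshot_id, from_snapshot_id=observed_tip)
    return observed_tip
```

Passing the observed tip guards against silently discarding a commit that another writer added between the lookup and the reset.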

reset_branch_async async #

reset_branch_async(
    branch, snapshot_id, *, from_snapshot_id=None
)

Reset a branch to a specific snapshot (async version).

This will permanently alter the history of the branch such that the tip of the branch is the specified snapshot.

Parameters:

Name Type Description Default
branch str

The branch to reset.

required
snapshot_id str

The snapshot ID to reset the branch to.

required
from_snapshot_id str | None

If passed, the reset will only be executed if the branch currently points to from_snapshot_id.

None

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def reset_branch_async(
    self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
) -> None:
    """
    Reset a branch to a specific snapshot (async version).

    This will permanently alter the history of the branch such that the tip of
    the branch is the specified snapshot.

    Parameters
    ----------
    branch : str
        The branch to reset.
    snapshot_id : str
        The snapshot ID to reset the branch to.
    from_snapshot_id : str | None
        If passed, the reset will only be executed if the branch currently
        points to from_snapshot_id.

    Returns
    -------
    None
    """
    await self._repository.reset_branch_async(branch, snapshot_id, from_snapshot_id)
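
The async variant can be driven from synchronous code with `asyncio.run`. This is a sketch under assumptions: `reset_branches` is a hypothetical helper, `repo` is assumed to be an open `Repository`, and the resets are awaited one after another rather than concurrently.

```python
import asyncio


async def reset_branches(repo, targets: dict[str, str]) -> None:
    """Reset each branch in `targets` (branch name -> snapshot ID) sequentially."""
    for branch, snapshot_id in targets.items():
        # Each call permanently moves the branch tip to snapshot_id.
        await repo.reset_branch_async(branch, snapshot_id)


# Usage from synchronous code (the snapshot IDs would come from your repository):
# asyncio.run(reset_branches(repo, {"main": main_snapshot_id, "dev": dev_snapshot_id}))
```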

rewrite_manifests #

rewrite_manifests(
    message,
    *,
    branch,
    metadata=None,
    commit_method="new_commit",
)

Rewrite manifests for all arrays.

This method will start a new writable session on the specified branch, rewrite manifests for all arrays, and then commit with the specified message and metadata.

A JSON representation of the currently active splitting configuration will be stored in the commit's metadata under the key "splitting_config".

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
branch str

The branch to commit to.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
commit_method CommitMethod

The commit method to use. Defaults to "new_commit". Use "amend" to replace the previous commit. Note that "amend" is only supported for spec version 2 repositories.

'new_commit'

Returns:

Type Description
str

The snapshot ID of the new commit.

Source code in icechunk-python/python/icechunk/repository.py
def rewrite_manifests(
    self,
    message: str,
    *,
    branch: str,
    metadata: dict[str, Any] | None = None,
    commit_method: CommitMethod = "new_commit",
) -> str:
    """
    Rewrite manifests for all arrays.

    This method will start a new writable session on the specified branch,
    rewrite manifests for all arrays, and then commit with the specified ``message``
    and ``metadata``.

    A JSON representation of the currently active splitting configuration will be
    stored in the commit's metadata under the key `"splitting_config"`.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    branch : str
        The branch to commit to.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    commit_method : CommitMethod, optional
        The commit method to use. Defaults to ``"new_commit"``.
        Use ``"amend"`` to replace the previous commit.
        Note that ``"amend"`` is only supported for spec version 2
        repositories.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    """
    return self._repository.rewrite_manifests(
        message, branch=branch, metadata=metadata, commit_method=commit_method
    )
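
A call might look like the sketch below. The helper name `rewrite_manifests_on` and the `"reason"` metadata key are illustrative assumptions; `repo` is assumed to be an open `Repository`.

```python
from typing import Any


def rewrite_manifests_on(repo, branch: str, reason: str) -> str:
    """Rewrite all manifests on `branch` in a new commit and return its snapshot ID."""
    extra: dict[str, Any] = {"reason": reason}  # illustrative metadata key
    return repo.rewrite_manifests(
        f"Rewrite manifests: {reason}",
        branch=branch,
        metadata=extra,
        commit_method="new_commit",  # "amend" would replace the previous commit instead
    )
```

Note that `branch`, `metadata`, and `commit_method` are keyword-only, so they cannot be passed positionally.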

rewrite_manifests_async async #

rewrite_manifests_async(
    message,
    *,
    branch,
    metadata=None,
    commit_method="new_commit",
)

Rewrite manifests for all arrays (async version).

This method will start a new writable session on the specified branch, rewrite manifests for all arrays, and then commit with the specified message and metadata.

A JSON representation of the currently active splitting configuration will be stored in the commit's metadata under the key "splitting_config".

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
branch str

The branch to commit to.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
commit_method CommitMethod

The commit method to use. Defaults to "new_commit". Use "amend" to replace the previous commit. Note that "amend" is only supported for spec version 2 repositories.

'new_commit'

Returns:

Type Description
str

The snapshot ID of the new commit.

Source code in icechunk-python/python/icechunk/repository.py
async def rewrite_manifests_async(
    self,
    message: str,
    *,
    branch: str,
    metadata: dict[str, Any] | None = None,
    commit_method: CommitMethod = "new_commit",
) -> str:
    """
    Rewrite manifests for all arrays (async version).

    This method will start a new writable session on the specified branch,
    rewrite manifests for all arrays, and then commit with the specified ``message``
    and ``metadata``.

    A JSON representation of the currently active splitting configuration will be
    stored in the commit's metadata under the key `"splitting_config"`.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    branch : str
        The branch to commit to.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    commit_method : CommitMethod, optional
        The commit method to use. Defaults to ``"new_commit"``.
        Use ``"amend"`` to replace the previous commit.
        Note that ``"amend"`` is only supported for spec version 2
        repositories.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    """
    return await self._repository.rewrite_manifests_async(
        message, branch=branch, metadata=metadata, commit_method=commit_method
    )

save_config #

save_config()

Save the repository configuration to storage. This configuration will be used in future calls to Repository.open.

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def save_config(self) -> None:
    """
    Save the repository configuration to storage. This configuration will be used in future calls to Repository.open.

    Returns
    -------
    None
    """
    return self._repository.save_config()
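
One common pattern is to change the configuration and then persist it. The sketch below assumes a synchronous `Repository.reopen` counterpart to `reopen_async` that accepts the same `config` argument; `persist_config` is a hypothetical helper name.

```python
def persist_config(repo, new_config):
    """Reopen `repo` with `new_config` and store that configuration for future opens."""
    # reopen returns a new Repository instance carrying the updated configuration.
    reopened = repo.reopen(config=new_config)
    # Persist it so later Repository.open calls use this configuration.
    reopened.save_config()
    return reopened
```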

save_config_async async #

save_config_async()

Save the repository configuration to storage (async version).

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def save_config_async(self) -> None:
    """
    Save the repository configuration to storage (async version).

    Returns
    -------
    None
    """
    return await self._repository.save_config_async()

set_default_commit_metadata #

set_default_commit_metadata(metadata)

Set the default commit metadata for the repository. This is useful for attaching additional static, system-context metadata to all commits.

When a commit is made, the default metadata will be merged with the metadata provided in the commit, with any duplicate keys being overwritten by the commit's metadata.

Warning

This metadata is only applied to sessions that are created after this call. Any open writable sessions will not be affected and will not use the new default metadata.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The default commit metadata. Pass an empty dict to clear the default metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
def set_default_commit_metadata(self, metadata: dict[str, Any]) -> None:
    """
    Set the default commit metadata for the repository. This is useful for attaching
    additional static, system-context metadata to all commits.

    When a commit is made, the default metadata will be merged with the metadata provided
    in the commit, with any duplicate keys being overwritten by the commit's metadata.

    !!! warning
        This metadata is only applied to sessions that are created after this call. Any open
        writable sessions will not be affected and will not use the new default metadata.

    Parameters
    ----------
    metadata : dict[str, Any]
        The default commit metadata. Pass an empty dict to clear the default metadata.
    """
    return self._repository.set_default_commit_metadata(metadata)
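
The merge rule described above can be modelled with plain dictionaries. This is only an illustration of the documented precedence (commit metadata wins on duplicate keys), not the library's implementation; `effective_commit_metadata` is a hypothetical name.

```python
from typing import Any, Optional


def effective_commit_metadata(
    default: dict[str, Any], commit: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Model of the merge: commit-level keys win over default keys."""
    return {**default, **(commit or {})}


default_md = {"host": "worker-1", "pipeline": "ingest"}
merged = effective_commit_metadata(default_md, {"pipeline": "backfill", "run": 7})
print(merged)  # {'host': 'worker-1', 'pipeline': 'backfill', 'run': 7}
```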

set_feature_flag #

set_feature_flag(name, setting)

Set a feature flag.

Parameters:

Name Type Description Default
name str

The name of the feature flag.

required
setting bool | None

True to enable, False to disable, None to reset to default.

required
Source code in icechunk-python/python/icechunk/repository.py
def set_feature_flag(self, name: str, setting: bool | None) -> None:
    """
    Set a feature flag.

    Parameters
    ----------
    name : str
        The name of the feature flag.
    setting : bool | None
        True to enable, False to disable, None to reset to default.
    """
    self._repository.set_feature_flag(name, setting)

set_feature_flag_async async #

set_feature_flag_async(name, setting)

Set a feature flag (async version).

Parameters:

Name Type Description Default
name str

The name of the feature flag.

required
setting bool | None

True to enable, False to disable, None to reset to default.

required
Source code in icechunk-python/python/icechunk/repository.py
async def set_feature_flag_async(self, name: str, setting: bool | None) -> None:
    """
    Set a feature flag (async version).

    Parameters
    ----------
    name : str
        The name of the feature flag.
    setting : bool | None
        True to enable, False to disable, None to reset to default.
    """
    await self._repository.set_feature_flag_async(name, setting)

set_metadata #

set_metadata(metadata)

Set the repository metadata. The passed dict will replace the complete metadata.

If you prefer to only update some metadata values, use Repository.update_metadata.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The value to use as repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
def set_metadata(self, metadata: dict[str, Any]) -> None:
    """
    Set the repository metadata. The passed dict will replace the complete metadata.

    If you prefer to only update some metadata values, use Repository.update_metadata.

    Parameters
    ----------
    metadata : dict[str, Any]
        The value to use as repository metadata.
    """
    self._repository.set_metadata(metadata)
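
The difference between replacing and merging can be shown with a toy stand-in. `InMemoryMetadata` below is hypothetical and only models the documented semantics of `set_metadata` and `update_metadata`; it does not touch a real repository.

```python
from typing import Any


class InMemoryMetadata:
    """Toy stand-in that models the documented semantics; not the real Repository."""

    def __init__(self, initial: dict[str, Any]) -> None:
        self.metadata = dict(initial)

    def set_metadata(self, metadata: dict[str, Any]) -> None:
        # Replace: keys absent from `metadata` are dropped.
        self.metadata = dict(metadata)

    def update_metadata(self, metadata: dict[str, Any]) -> None:
        # Merge: existing keys are kept unless overridden.
        self.metadata = {**self.metadata, **metadata}


replaced = InMemoryMetadata({"owner": "team-a", "stage": "dev"})
replaced.set_metadata({"stage": "prod"})

merged = InMemoryMetadata({"owner": "team-a", "stage": "dev"})
merged.update_metadata({"stage": "prod"})

print(replaced.metadata)  # {'stage': 'prod'}
print(merged.metadata)    # {'owner': 'team-a', 'stage': 'prod'}
```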

set_metadata_async async #

set_metadata_async(metadata)

Set the repository metadata (async version). The passed dict will replace the complete metadata.

If you prefer to only update some metadata values, use Repository.update_metadata_async.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The value to use as repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
async def set_metadata_async(self, metadata: dict[str, Any]) -> None:
    """
    Set the repository metadata (async version). The passed dict will replace the complete metadata.

    If you prefer to only update some metadata values, use Repository.update_metadata_async.

    Parameters
    ----------
    metadata : dict[str, Any]
        The value to use as repository metadata.
    """
    await self._repository.set_metadata_async(metadata)

set_status #

set_status(status)

Set the repository status.

Parameters:

Name Type Description Default
status RepoStatus

The new status for the repository.

required
Source code in icechunk-python/python/icechunk/repository.py
def set_status(self, status: RepoStatus) -> None:
    """
    Set the repository status.

    Parameters
    ----------
    status : RepoStatus
        The new status for the repository.
    """
    self._repository.set_status(status)

set_status_async async #

set_status_async(status)

Set the repository status (async version).

Parameters:

Name Type Description Default
status RepoStatus

The new status for the repository.

required
Source code in icechunk-python/python/icechunk/repository.py
async def set_status_async(self, status: RepoStatus) -> None:
    """
    Set the repository status (async version).

    Parameters
    ----------
    status : RepoStatus
        The new status for the repository.
    """
    await self._repository.set_status_async(status)

total_chunks_storage #

total_chunks_storage(
    *,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Calculate the total storage used for chunks, in bytes.

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

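The result includes only native chunks; virtual and inline chunks are not counted.

This method is deprecated in favour of chunk_storage_stats, which also reports the storage occupied by inlined and virtual chunks. The native_bytes field of its result gives the same value as this method.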

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
def total_chunks_storage(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> int:
    """Calculate the total storage used for chunks, in bytes.

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result includes only native chunks; virtual and inline chunks are not counted.

    Parameters
    ----------
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """

    warnings.warn(
        "The ``total_chunks_storage`` method has been deprecated in favour of the ``chunk_storage_stats`` method. "
        "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
        "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
        "please replace your existing ``.total_chunks_storage(**kwargs)`` method call with ``.chunk_storage_stats(**same_kwargs).native_bytes``.",
        DeprecationWarning,
        stacklevel=2,
    )

    stats = self._repository.chunk_storage_stats(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
    return stats.native_bytes
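
Following the deprecation message in the source above, existing callers can switch to `chunk_storage_stats` and read `native_bytes`. A minimal sketch, where `native_chunk_bytes` is a hypothetical helper name and `repo` is assumed to be an open `Repository`:

```python
def native_chunk_bytes(repo, **limits) -> int:
    """Return the bytes used by native chunks via the non-deprecated API.

    `limits` accepts the same keyword arguments as total_chunks_storage, e.g.
    max_snapshots_in_memory or max_concurrent_manifest_fetches.
    """
    stats = repo.chunk_storage_stats(**limits)
    return stats.native_bytes
```

This mirrors what `total_chunks_storage` does internally, without emitting the `DeprecationWarning`.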

total_chunks_storage_async async #

total_chunks_storage_async(
    *,
    max_snapshots_in_memory=50,
    max_compressed_manifest_mem_bytes=512 * 1024 * 1024,
    max_concurrent_manifest_fetches=500,
)

Calculate the total storage used for chunks, in bytes (async version).

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

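The result includes only native chunks; virtual and inline chunks are not counted.

This method is deprecated in favour of chunk_storage_stats_async, which also reports the storage occupied by inlined and virtual chunks. The native_bytes field of its awaited result gives the same value as this method.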

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
async def total_chunks_storage_async(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> int:
    """Calculate the total storage used for chunks, in bytes (async version).

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result includes only native chunks; virtual and inline chunks are not counted.

    Parameters
    ----------
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """

    warnings.warn(
        "The ``total_chunks_storage_async`` method has been deprecated in favour of the ``chunk_storage_stats_async`` method. "
        "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
        "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
        "please replace your existing ``.total_chunks_storage_async(**kwargs)`` method call with ``.chunk_storage_stats_async(**same_kwargs).native_bytes``.",
        DeprecationWarning,
        stacklevel=2,
    )

    stats = await self._repository.chunk_storage_stats_async(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
    return stats.native_bytes

transaction #

transaction(
    branch,
    *,
    message,
    metadata=None,
    rebase_with=None,
    rebase_tries=1000,
)

Create a transaction on a branch.

This is a context manager that creates a writable session on the specified branch. When the context is exited, the session will be committed to the branch using the specified message.

Parameters:

Name Type Description Default
branch str

The branch to create the transaction on.

required
message str

The commit message to use when committing the session.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
rebase_with ConflictSolver | None

If another session committed while the current session was writing, use Session.rebase with this solver.

None
rebase_tries int

If another session committed while the current session was writing, use Session.rebase up to this many times in a loop.

1000

Yields:

Name Type Description
store IcechunkStore

A Zarr Store which can be used to interact with the data in the repository.

Source code in icechunk-python/python/icechunk/repository.py
@contextmanager
def transaction(
    self,
    branch: str,
    *,
    message: str,
    metadata: dict[str, Any] | None = None,
    rebase_with: ConflictSolver | None = None,
    rebase_tries: int = 1_000,
) -> Iterator[IcechunkStore]:
    """
    Create a transaction on a branch.

    This is a context manager that creates a writable session on the specified branch.
    When the context is exited, the session will be committed to the branch
    using the specified message.

    Parameters
    ----------
    branch : str
        The branch to create the transaction on.
    message : str
        The commit message to use when committing the session.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    rebase_with : ConflictSolver | None, optional
        If another session committed while the current session was writing, use Session.rebase with this solver.
    rebase_tries : int, optional
        If another session committed while the current session was writing, use Session.rebase up to this many times in a loop.

    Yields
    -------
    store : IcechunkStore
        A Zarr Store which can be used to interact with the data in the repository.
    """
    session = self.writable_session(branch)
    yield session.store
    session.commit(
        message=message,
        metadata=metadata,
        rebase_with=rebase_with,
        rebase_tries=rebase_tries,
    )
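
Because the source above commits after the `yield` without a `try`/`finally`, an exception raised inside the `with` body propagates and the commit is skipped. The sketch below reproduces that control flow with stand-ins: `FakeSession` and this module-level `transaction` function are hypothetical models, not the library's classes.

```python
from contextlib import contextmanager


class FakeSession:
    """Stand-in for a writable session; it only records commit messages."""

    def __init__(self):
        self.store = {}
        self.commits = []

    def commit(self, message):
        self.commits.append(message)


@contextmanager
def transaction(session, message):
    # Same shape as the method above: yield the store, then commit.
    yield session.store
    session.commit(message)


ok = FakeSession()
with transaction(ok, "add a") as store:
    store["a"] = 1

failed = FakeSession()
try:
    with transaction(failed, "add b") as store:
        store["b"] = 2
        raise RuntimeError("write failed")
except RuntimeError:
    pass  # the exception left the with block before commit was reached

print(ok.commits)      # ['add a']
print(failed.commits)  # []
```

With a real repository the call would take the form `with repo.transaction("main", message="...") as store:`, where `store` is the `IcechunkStore` of the writable session.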

update_metadata #

update_metadata(metadata)

Update the repository metadata.

The passed dict will be merged with the current metadata, overriding existing keys.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The dict to merge into the repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
def update_metadata(self, metadata: dict[str, Any]) -> dict[str, Any]:
    """
    Update the repository metadata.

    The passed dict will be merged with the current metadata, overriding existing keys.

    Parameters
    ----------
    metadata : dict[str, Any]
        The dict to merge into the repository metadata.
    """
    return self._repository.update_metadata(metadata)

update_metadata_async async #

update_metadata_async(metadata)

Update the repository metadata (async version).

The passed dict will be merged with the current metadata, overriding existing keys.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The dict to merge into the repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
async def update_metadata_async(self, metadata: dict[str, Any]) -> dict[str, Any]:
    """
    Update the repository metadata (async version).

    The passed dict will be merged with the current metadata, overriding existing keys.

    Parameters
    ----------
    metadata : dict[str, Any]
        The dict to merge into the repository metadata.
    """
    return await self._repository.update_metadata_async(metadata)

writable_session #

writable_session(branch)

Create a writable session on a branch.

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
def writable_session(self, branch: str) -> Session:
    """
    Create a writable session on a branch.

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(self._repository.writable_session(branch))
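
A typical write cycle is: open a writable session, modify data through its store, then commit. The helper below is a sketch under assumptions: `write_and_commit` is a hypothetical name, `write` is any callable that modifies the store (for example by creating Zarr arrays in it), and `Session.commit` is assumed to return the new snapshot ID.

```python
from typing import Any, Callable


def write_and_commit(
    repo, branch: str, message: str, write: Callable[[Any], None]
) -> str:
    """Apply `write` to a fresh writable session's store and commit the result."""
    session = repo.writable_session(branch)  # checkout at the tip of the branch
    write(session.store)  # make changes through the Zarr-compatible store
    # After a successful commit the session becomes read-only on the new snapshot.
    return session.commit(message=message)
```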

writable_session_async async #

writable_session_async(branch)

Create a writable session on a branch (async version).

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
async def writable_session_async(self, branch: str) -> Session:
    """
    Create a writable session on a branch (async version).

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(await self._repository.writable_session_async(branch))

icechunk.IcechunkStore#

icechunk.store.IcechunkStore #

Bases: Store, SyncMixin

Methods:

Name Description
__init__

Create a new IcechunkStore.

clear

Clear the store.

delete

Remove a key from the store

delete_dir

Delete a prefix

exists

Check if a key exists in the store.

get

Retrieve the value associated with a given key.

get_partial_values

Retrieve possibly partial values from given key_ranges.

is_empty

Check if the directory is empty.

list

Retrieve all keys in the store.

list_dir

Retrieve all keys and prefixes with a given prefix and which do not contain the character

list_prefix

Retrieve all keys in the store that begin with a given prefix. Keys are returned relative

set

Store a (key, value) pair.

set_if_not_exists

Store a key to value if the key is not already present.

set_partial_values

Store values at a given key, starting at byte range_start.

set_virtual_ref

Store a virtual reference to a chunk.

set_virtual_ref_async

Store a virtual reference to a chunk asynchronously.

set_virtual_refs

Store multiple virtual references for the same array.

set_virtual_refs_arr

Store virtual references for an array from flat arrays of locations, offsets, and lengths.

set_virtual_refs_arr_async

Store virtual references for an array from flat arrays (async).

set_virtual_refs_async

Store multiple virtual references for the same array asynchronously.

sync_clear

Clear the store.

Attributes:

Name Type Description
supports_listing bool

Does the store support listing?

supports_partial_writes Literal[False]

Does the store support partial writes?

supports_writes bool

Does the store support writes?

Source code in icechunk-python/python/icechunk/store.py
class IcechunkStore(Store, SyncMixin):
    _store: PyStore

    def __init__(
        self,
        store: PyStore,
        read_only: bool | None = None,
        *args: Any,
        **kwargs: Any,
    ):
        """Create a new IcechunkStore.

        This should not be called directly; instead, use the `create`, `open_existing` or `open_or_create` class methods.
        """
        read_only = read_only if read_only is not None else store.read_only
        super().__init__(read_only=read_only)
        if store is None:
            raise ValueError(
                "An IcechunkStore should not be created with the default constructor, instead use either the create or open_existing class methods."
            )
        self._store = store
        self._is_open = True

    def __eq__(self, value: object) -> bool:
        if not isinstance(value, IcechunkStore):
            return False
        return self._store == value._store

    def __repr__(self) -> str:
        return repr(self._store)

    def __str__(self) -> str:
        return str(self._store)

    def _repr_html_(self) -> str:
        return self._store._repr_html_()

    def __getstate__(self) -> object:
        # for read_only sessions we allow pickling, this allows distributed reads without forking
        session = self._store.session
        if not session.read_only and not session.is_fork:
            raise ValueError(
                "You must opt-in to pickle writable sessions in a distributed context "
                "using Session.fork(). "
                "See https://icechunk.io/en/stable/icechunk-python/parallel/#cooperative-distributed-writes. "
                "If you are using xarray's `Dataset.to_zarr` method to write dask arrays, "
                "please use `icechunk.xarray.to_icechunk` instead. "
            )
        d = self.__dict__.copy()
        # we serialize the Rust store as bytes
        d["_store"] = self._store.as_bytes()
        return d

    def __setstate__(self, state: Any) -> None:
        # we have to deserialize the bytes of the Rust store
        store_repr = state["_store"]
        state["_store"] = PyStore.from_bytes(store_repr)
        self.__dict__ = state

    def with_read_only(self, read_only: bool = False) -> Store:
        new_store = IcechunkStore(store=self._store, read_only=read_only)
        new_store._is_open = False
        return new_store

    @property
    def session(self) -> "Session":
        from icechunk.session import ForkSession, Session

        if self._store.session.is_fork:
            return ForkSession(self._store.session)
        else:
            return Session(self._store.session)

    async def clear(self) -> None:
        """Clear the store.

        This will remove all contents from the current session,
        including all groups and all arrays. But it will not modify the repository history.
        """
        return await self._store.clear()

    def sync_clear(self) -> None:
        """Clear the store.

        This will remove all contents from the current session,
        including all groups and all arrays. But it will not modify the repository history.
        """
        return self._store.sync_clear()

    async def is_empty(self, prefix: str) -> bool:
        """
        Check if the directory is empty.

        Parameters
        ----------
        prefix : str
            Prefix of keys to check.

        Returns
        -------
        bool
            True if the store is empty, False otherwise.
        """
        return await self._store.is_empty(prefix)

    async def get(
        self,
        key: str,
        prototype: BufferPrototype,
        byte_range: ByteRequest | None = None,
    ) -> Buffer | None:
        """Retrieve the value associated with a given key.

        Parameters
        ----------
        key : str
        byte_range : ByteRequest, optional

            ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

            - RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
            - OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
            - SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.

        Returns
        -------
        Buffer | None
            The requested bytes, or None if the key does not exist.
        """

        try:
            result = await self._store.get(key, _byte_request_to_tuple(byte_range))
        except KeyError as _e:
            # Zarr python expects None to be returned if the key does not exist
            # but an IcechunkStore returns an error if the key does not exist
            return None

        return prototype.buffer.from_bytes(result)

    async def get_partial_values(
        self,
        prototype: BufferPrototype,
        key_ranges: Iterable[tuple[str, ByteRequest | None]],
    ) -> list[Buffer | None]:
        """Retrieve possibly partial values from given key_ranges.

        Parameters
        ----------
        prototype : BufferPrototype
            Buffer prototype used to construct the returned buffers.
        key_ranges : Iterable[tuple[str, ByteRequest | None]]
            Ordered (key, byte range) pairs. A key may occur multiple times with different ranges.

        Returns
        -------
        list[Buffer | None]
            Values in the order of ``key_ranges``; entries may be None for missing keys.
        """
        # NOTE: pyo3 has no implicit conversion from a Python Iterable to a Rust iterable, so we
        # convert it to a list here first. Possible opportunity for optimization.
        ranges = [(k[0], _byte_request_to_tuple(k[1])) for k in key_ranges]
        result = await self._store.get_partial_values(list(ranges))
        return [prototype.buffer.from_bytes(r) for r in result]

    async def exists(self, key: str) -> bool:
        """Check if a key exists in the store.

        Parameters
        ----------
        key : str

        Returns
        -------
        bool
        """
        return await self._store.exists(key)

    @property
    def supports_writes(self) -> bool:
        """Does the store support writes?"""
        return self._store.supports_writes

    async def set(self, key: str, value: Buffer) -> None:
        """Store a (key, value) pair.

        Parameters
        ----------
        key : str
        value : Buffer
        """
        if not isinstance(value, Buffer):
            raise TypeError(
                f"IcechunkStore.set(): `value` must be a Buffer instance. Got an instance of {type(value)} instead."
            )
        return await self._store.set(key, value.to_bytes())

    async def set_if_not_exists(self, key: str, value: Buffer) -> None:
        """
        Store a key to ``value`` if the key is not already present.

        Parameters
        -----------
        key : str
        value : Buffer
        """
        return await self._store.set_if_not_exists(key, value.to_bytes())

    def set_virtual_ref(
        self,
        key: str,
        location: str,
        *,
        offset: int,
        length: int,
        checksum: str | datetime | None = None,
        validate_container: bool = True,
    ) -> None:
        """Store a virtual reference to a chunk.

        Parameters
        ----------
        key : str
            The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'
        location : str
            The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
        offset : int
            The offset in bytes, from the start of the file at ``location``, at which the chunk starts
        length : int
            The length of the chunk in bytes, measured from the given offset
        checksum : str | datetime | None
            The etag or last_modified_at field of the object
        validate_container : bool
            If set to true, fail for locations that don't match any existing virtual chunk container
        """
        return self._store.set_virtual_ref(
            key, location, offset, length, checksum, validate_container
        )

    async def set_virtual_ref_async(
        self,
        key: str,
        location: str,
        *,
        offset: int,
        length: int,
        checksum: str | datetime | None = None,
        validate_container: bool = True,
    ) -> None:
        """Store a virtual reference to a chunk asynchronously.

        Parameters
        ----------
        key : str
            The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'
        location : str
            The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
        offset : int
            The offset in bytes, from the start of the file at ``location``, at which the chunk starts
        length : int
            The length of the chunk in bytes, measured from the given offset
        checksum : str | datetime | None
            The etag or last_modified_at field of the object
        validate_container : bool
            If set to true, fail for locations that don't match any existing virtual chunk container
        """
        return await self._store.set_virtual_ref_async(
            key, location, offset, length, checksum, validate_container
        )

    def set_virtual_refs(
        self,
        array_path: str,
        chunks: list[VirtualChunkSpec],
        *,
        validate_containers: bool = True,
    ) -> list[tuple[int, ...]] | None:
        """Store multiple virtual references for the same array.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
        chunks : list[VirtualChunkSpec]
            The list of virtual chunks to add
        validate_containers : bool
            If set to true, ignore virtual references for locations that don't match any existing virtual chunk container

        Returns
        -------
        list[tuple[int, ...]] | None
            If all virtual references were successfully updated, returns None.
            If there were validation errors, returns the chunk indices of all failed references.
        """
        return self._store.set_virtual_refs(array_path, chunks, validate_containers)

    async def set_virtual_refs_async(
        self,
        array_path: str,
        chunks: list[VirtualChunkSpec],
        *,
        validate_containers: bool = True,
    ) -> list[tuple[int, ...]] | None:
        """Store multiple virtual references for the same array asynchronously.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
        chunks : list[VirtualChunkSpec]
            The list of virtual chunks to add
        validate_containers : bool
            If set to true, ignore virtual references for locations that don't match any existing virtual chunk container

        Returns
        -------
        list[tuple[int, ...]] | None
            If all virtual references were successfully updated, returns None.
            If there were validation errors, returns the chunk indices of all failed references.
        """
        return await self._store.set_virtual_refs_async(
            array_path, chunks, validate_containers
        )

    def set_virtual_refs_arr(
        self,
        array_path: str,
        chunk_grid_shape: tuple[int, ...],
        locations: list[str],
        offsets: "np.ndarray[Any, np.dtype[np.uint64]]",
        lengths: "np.ndarray[Any, np.dtype[np.uint64]]",
        *,
        validate_containers: bool = True,
        arr_offset: tuple[int, ...] | None = None,
        checksum: datetime | str | None = None,
    ) -> list[tuple[int, ...]] | None:
        """Store virtual references for an array from flat arrays of locations, offsets, and lengths.

        More efficient than ``set_virtual_refs`` as it avoids creating
        per-chunk ``VirtualChunkSpec`` Python objects. The locations list
        is iterated in Rust (borrowing strings directly from CPython),
        and the offset/length numpy arrays are accessed via zero-copy.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store.
            Example: "/groupA/groupB/outputs/my-array"
        chunk_grid_shape : tuple[int, ...]
            Shape of the chunk grid (number of chunks per dimension).
            The product must equal the length of the arrays.
            Arrays are assumed to be flattened in C (row-major) order.
        locations : list[str]
            URLs to external files containing chunk data. Empty strings
            represent missing chunks and are silently skipped.
            Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]
        offsets : np.ndarray
            1-D uint64 array of byte offsets within each file.
        lengths : np.ndarray
            1-D uint64 array of byte lengths of each chunk.
        validate_containers : bool
            If True, validate that locations match registered virtual
            chunk containers. Default is True.
        arr_offset : tuple[int, ...] | None
            Optional offset to add to computed chunk indices. Useful for
            append operations where new chunks should be written at an
            offset from (0, 0, ...). Must have the same length as
            chunk_grid_shape. Default is None.
        checksum : datetime | str | None
            Optional checksum for all chunks. Can be a datetime
            (last modified time) or a string (ETag). Default is None.

        Returns
        -------
        list[tuple[int, ...]] | None
            If all virtual references were successfully updated, returns None.
            If there were validation errors, returns the chunk indices of
            all failed references.
        """
        return self._store.set_virtual_refs_arr(
            array_path,
            list(chunk_grid_shape),
            locations,
            offsets,
            lengths,
            validate_containers=validate_containers,
            arr_offset=list(arr_offset) if arr_offset is not None else None,
            checksum=checksum,
        )

    async def set_virtual_refs_arr_async(
        self,
        array_path: str,
        chunk_grid_shape: tuple[int, ...],
        locations: list[str],
        offsets: "np.ndarray[Any, np.dtype[np.uint64]]",
        lengths: "np.ndarray[Any, np.dtype[np.uint64]]",
        *,
        validate_containers: bool = True,
        arr_offset: tuple[int, ...] | None = None,
        checksum: datetime | str | None = None,
    ) -> list[tuple[int, ...]] | None:
        """Store virtual references for an array from flat arrays (async).

        Async variant of ``set_virtual_refs_arr``. The vref construction
        still requires the GIL (to borrow strings from the Python list),
        but the store insertion releases it. Use ``asyncio.gather()`` to
        overlap vref building for one array with store insertion for another.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store.
            Example: "/groupA/groupB/outputs/my-array"
        chunk_grid_shape : tuple[int, ...]
            Shape of the chunk grid (number of chunks per dimension).
            The product must equal the length of the arrays.
            Arrays are assumed to be flattened in C (row-major) order.
        locations : list[str]
            URLs to external files containing chunk data. Empty strings
            represent missing chunks and are silently skipped.
            Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]
        offsets : np.ndarray
            1-D uint64 array of byte offsets within each file.
        lengths : np.ndarray
            1-D uint64 array of byte lengths of each chunk.
        validate_containers : bool
            If True, validate that locations match registered virtual
            chunk containers. Default is True.
        arr_offset : tuple[int, ...] | None
            Optional offset to add to computed chunk indices. Useful for
            append operations where new chunks should be written at an
            offset from (0, 0, ...). Must have the same length as
            chunk_grid_shape. Default is None.
        checksum : datetime | str | None
            Optional checksum for all chunks. Can be a datetime
            (last modified time) or a string (ETag). Default is None.

        Returns
        -------
        list[tuple[int, ...]] | None
            If all virtual references were successfully updated, returns None.
            If there were validation errors, returns the chunk indices of
            all failed references.
        """
        return await self._store.set_virtual_refs_arr_async(
            array_path,
            list(chunk_grid_shape),
            locations,
            offsets,
            lengths,
            validate_containers=validate_containers,
            arr_offset=list(arr_offset) if arr_offset is not None else None,
            checksum=checksum,
        )

    async def delete(self, key: str) -> None:
        """Remove a key from the store

        Parameters
        ----------
        key : str
        """
        return await self._store.delete(key)

    async def delete_dir(self, prefix: str) -> None:
        """Delete a prefix

        Parameters
        ----------
        prefix : str
        """
        return await self._store.delete_dir(prefix)

    @property
    def supports_partial_writes(self) -> Literal[False]:
        """Does the store support partial writes?

        Partial writes are no longer used by Zarr, so this is always false.
        """
        return self._store.supports_partial_writes  # type: ignore[return-value]

    async def set_partial_values(
        self, key_start_values: Iterable[tuple[str, int, BytesLike]]
    ) -> None:
        """Store values at a given key, starting at byte range_start.

        Parameters
        ----------
        key_start_values : Iterable[tuple[str, int, BytesLike]]
            (key, range_start, value) triples. A key may occur multiple times with different
            range_starts. The ranges (taking the length of each value into account) must not
            overlap for the same key.
        """
        # NOTE: pyo3 does not implicitly convert a Python Iterable to a Rust iterable, so we
        # convert it to a list here first. Possible opportunity for optimization.
        # NOTE: currently we only implement the case where the values are bytes
        return await self._store.set_partial_values(list(key_start_values))  # type: ignore[arg-type]

    @property
    def supports_listing(self) -> bool:
        """Does the store support listing?"""
        return self._store.supports_listing

    @property
    def supports_consolidated_metadata(self) -> bool:
        return self._store.supports_consolidated_metadata

    @property
    def supports_deletes(self) -> bool:
        return self._store.supports_deletes

    def list(self) -> AsyncIterator[str]:
        """Retrieve all keys in the store.

        Returns
        -------
        AsyncIterator[str]
        """
        # This method should be async, like overridden methods in child classes.
        # However, that's not straightforward:
        # https://stackoverflow.com/questions/68905848

        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list()

    def list_prefix(self, prefix: str) -> AsyncIterator[str]:
        """Retrieve all keys in the store that begin with a given prefix. Keys are returned relative
        to the root of the store.

        Parameters
        ----------
        prefix : str

        Returns
        -------
        AsyncIterator[str]
        """
        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list_prefix(prefix)

    def list_dir(self, prefix: str) -> AsyncIterator[str]:
        """
        Retrieve all keys and prefixes with a given prefix and which do not contain the character
        “/” after the given prefix.

        Parameters
        ----------
        prefix : str

        Returns
        -------
        AsyncIterator[str]
        """
        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list_dir(prefix)

    async def getsize(self, key: str) -> int:
        return await self._store.getsize(key)

    async def getsize_prefix(self, prefix: str) -> int:
        return await self._store.getsize_prefix(prefix)

supports_listing property #

supports_listing

Does the store support listing?

supports_partial_writes property #

supports_partial_writes

Does the store support partial writes?

Partial writes are no longer used by Zarr, so this is always false.

supports_writes property #

supports_writes

Does the store support writes?

__init__ #

__init__(store, read_only=None, *args, **kwargs)

Create a new IcechunkStore.

This should not be called directly, instead use the create, open_existing or open_or_create class methods.

Source code in icechunk-python/python/icechunk/store.py
def __init__(
    self,
    store: PyStore,
    read_only: bool | None = None,
    *args: Any,
    **kwargs: Any,
):
    """Create a new IcechunkStore.

    This should not be called directly, instead use the `create`, `open_existing` or `open_or_create` class methods.
    """
    read_only = read_only if read_only is not None else store.read_only
    super().__init__(read_only=read_only)
    if store is None:
        raise ValueError(
            "An IcechunkStore should not be created with the default constructor, instead use either the create or open_existing class methods."
        )
    self._store = store
    self._is_open = True

clear async #

clear()

Clear the store.

This will remove all contents from the current session, including all groups and all arrays. But it will not modify the repository history.

Source code in icechunk-python/python/icechunk/store.py
async def clear(self) -> None:
    """Clear the store.

    This will remove all contents from the current session,
    including all groups and all arrays. But it will not modify the repository history.
    """
    return await self._store.clear()

delete async #

delete(key)

Remove a key from the store

Parameters:

Name Type Description Default
key str
required
Source code in icechunk-python/python/icechunk/store.py
async def delete(self, key: str) -> None:
    """Remove a key from the store

    Parameters
    ----------
    key : str
    """
    return await self._store.delete(key)

delete_dir async #

delete_dir(prefix)

Delete a prefix

Parameters:

Name Type Description Default
prefix str
required
Source code in icechunk-python/python/icechunk/store.py
async def delete_dir(self, prefix: str) -> None:
    """Delete a prefix

    Parameters
    ----------
    prefix : str
    """
    return await self._store.delete_dir(prefix)

exists async #

exists(key)

Check if a key exists in the store.

Parameters:

Name Type Description Default
key str
required

Returns:

Type Description
bool
Source code in icechunk-python/python/icechunk/store.py
async def exists(self, key: str) -> bool:
    """Check if a key exists in the store.

    Parameters
    ----------
    key : str

    Returns
    -------
    bool
    """
    return await self._store.exists(key)

get async #

get(key, prototype, byte_range=None)

Retrieve the value associated with a given key.

Parameters:

Name Type Description Default
key str
required
byte_range ByteRequest

ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

  • RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
  • OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
  • SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.
None

Returns:

Type Description
Buffer | None

The requested bytes, or None if the key does not exist.

Source code in icechunk-python/python/icechunk/store.py
async def get(
    self,
    key: str,
    prototype: BufferPrototype,
    byte_range: ByteRequest | None = None,
) -> Buffer | None:
    """Retrieve the value associated with a given key.

    Parameters
    ----------
    key : str
    byte_range : ByteRequest, optional

        ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

        - RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
        - OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
        - SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.

    Returns
    -------
    Buffer | None
        The requested bytes, or None if the key does not exist.
    """

    try:
        result = await self._store.get(key, _byte_request_to_tuple(byte_range))
    except KeyError as _e:
        # Zarr python expects None to be returned if the key does not exist
        # but an IcechunkStore returns an error if the key does not exist
        return None

    return prototype.buffer.from_bytes(result)
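
The byte-range rules documented for get can be sketched in plain Python. The three request classes below are hypothetical stand-ins defined only for this example (they are not the zarr ByteRequest types), and resolve is an illustrative helper, not part of Icechunk. The sketch assumes that a range starting at or beyond the end of the object counts as an error.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the three request kinds; NOT the zarr classes.
@dataclass(frozen=True)
class RangeRequest:
    start: int
    end: int  # exclusive


@dataclass(frozen=True)
class OffsetRequest:
    offset: int


@dataclass(frozen=True)
class SuffixRequest:
    suffix: int  # number of trailing bytes, not an offset


def resolve(data: bytes, request=None) -> bytes:
    """Return the bytes of `data` selected by `request` (None selects everything)."""
    if request is None:
        return data
    if isinstance(request, RangeRequest):
        # Zero-length ranges and ranges starting past the end are errors.
        if request.end <= request.start or request.start >= len(data):
            raise ValueError(f"invalid range {request} for {len(data)} bytes")
        # A range that ends past the object is truncated to the remainder.
        return data[request.start : min(request.end, len(data))]
    if isinstance(request, OffsetRequest):
        # Comparable to the HTTP header `bytes={offset}-`.
        return data[request.offset :]
    if isinstance(request, SuffixRequest):
        # Comparable to the HTTP header `bytes=-{suffix}`.
        return data[len(data) - min(request.suffix, len(data)) :]
    raise TypeError(f"unsupported request type: {type(request).__name__}")


data = b"0123456789"
middle = resolve(data, RangeRequest(2, 5))      # bytes 2, 3 and 4
clipped = resolve(data, RangeRequest(8, 50))    # truncated at the end of the object
tail = resolve(data, SuffixRequest(3))          # last three bytes
```

The same helper makes the difference between OffsetRequest(7) and SuffixRequest(3) easy to see: on a 10-byte object both select the final three bytes, but one counts from the start and the other from the end.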

get_partial_values async #

get_partial_values(prototype, key_ranges)

Retrieve possibly partial values from given key_ranges.

Parameters:

Name Type Description Default
prototype BufferPrototype

Buffer prototype used to construct the returned buffers.

required
key_ranges Iterable[tuple[str, ByteRequest | None]]

Ordered (key, byte range) pairs. A key may occur multiple times with different ranges.

required

Returns:

Type Description
list[Buffer | None]

Values in the order of key_ranges; entries may be None for missing keys.

Source code in icechunk-python/python/icechunk/store.py
async def get_partial_values(
    self,
    prototype: BufferPrototype,
    key_ranges: Iterable[tuple[str, ByteRequest | None]],
) -> list[Buffer | None]:
    """Retrieve possibly partial values from given key_ranges.

    Parameters
    ----------
    prototype : BufferPrototype
        Buffer prototype used to construct the returned buffers.
    key_ranges : Iterable[tuple[str, ByteRequest | None]]
        Ordered (key, byte range) pairs. A key may occur multiple times with different ranges.

    Returns
    -------
    list[Buffer | None]
        Values in the order of ``key_ranges``; entries may be None for missing keys.
    """
    # NOTE: pyo3 has no implicit conversion from a Python Iterable to a Rust iterable, so we
    # convert it to a list here first. Possible opportunity for optimization.
    ranges = [(k[0], _byte_request_to_tuple(k[1])) for k in key_ranges]
    result = await self._store.get_partial_values(list(ranges))
    return [prototype.buffer.from_bytes(r) for r in result]

is_empty async #

is_empty(prefix)

Check if the directory is empty.

Parameters:

Name Type Description Default
prefix str

Prefix of keys to check.

required

Returns:

Type Description
bool

True if the store is empty, False otherwise.

Source code in icechunk-python/python/icechunk/store.py
async def is_empty(self, prefix: str) -> bool:
    """
    Check if the directory is empty.

    Parameters
    ----------
    prefix : str
        Prefix of keys to check.

    Returns
    -------
    bool
        True if the store is empty, False otherwise.
    """
    return await self._store.is_empty(prefix)

list #

list()

Retrieve all keys in the store.

Returns:

Type Description
AsyncIterator[str]
Source code in icechunk-python/python/icechunk/store.py
def list(self) -> AsyncIterator[str]:
    """Retrieve all keys in the store.

    Returns
    -------
    AsyncIterator[str]
    """
    # This method should be async, like overridden methods in child classes.
    # However, that's not straightforward:
    # https://stackoverflow.com/questions/68905848

    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list()

list_dir #

list_dir(prefix)

Retrieve all keys and prefixes with a given prefix and which do not contain the character “/” after the given prefix.

Parameters:

Name Type Description Default
prefix str
required

Returns:

Type Description
AsyncIterator[str]
Source code in icechunk-python/python/icechunk/store.py
def list_dir(self, prefix: str) -> AsyncIterator[str]:
    """
    Retrieve all keys and prefixes with a given prefix and which do not contain the character
    “/” after the given prefix.

    Parameters
    ----------
    prefix : str

    Returns
    -------
    AsyncIterator[str]
    """
    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list_dir(prefix)

list_prefix #

list_prefix(prefix)

Retrieve all keys in the store that begin with a given prefix. Keys are returned relative to the root of the store.

Parameters:

Name Type Description Default
prefix str
required

Returns:

Type Description
AsyncIterator[str]
Source code in icechunk-python/python/icechunk/store.py
def list_prefix(self, prefix: str) -> AsyncIterator[str]:
    """Retrieve all keys in the store that begin with a given prefix. Keys are returned relative
    to the root of the store.

    Parameters
    ----------
    prefix : str

    Returns
    -------
    AsyncIterator[str]
    """
    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list_prefix(prefix)
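
The difference between list_prefix and list_dir is easiest to see on a small flat key set. The sketch below is a synchronous, in-memory illustration only: the two functions are hypothetical helpers that reuse the method names, the keys are made up, and it assumes list_dir reports each child name relative to the prefix. The real methods return async iterators backed by the store.

```python
def list_prefix(keys, prefix):
    """Every key that begins with `prefix`, relative to the store root."""
    return sorted(k for k in keys if k.startswith(prefix))


def list_dir(keys, prefix):
    """Immediate children (keys and sub-prefixes) directly under `prefix`."""
    base = prefix.rstrip("/")
    base = base + "/" if base else ""
    children = set()
    for key in keys:
        if key.startswith(base):
            # Keep only the first path segment after the prefix, so deeper
            # keys collapse into a single child entry.
            child = key[len(base):].split("/", 1)[0]
            if child:
                children.add(child)
    return sorted(children)


# Made-up keys laid out like a small Zarr hierarchy.
keys = [
    "zarr.json",
    "temp/zarr.json",
    "temp/c/0/0",
    "temp/c/0/1",
    "wind/zarr.json",
]

under_temp = list_prefix(keys, "temp/")  # all keys below temp/, full paths
temp_children = list_dir(keys, "temp")   # one level only
root_children = list_dir(keys, "")       # top-level entries
```

Here list_prefix walks the whole subtree, while list_dir stops at the first "/" after the prefix, which is what makes it suitable for discovering groups and arrays one level at a time.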

set async #

set(key, value)

Store a (key, value) pair.

Parameters:

Name Type Description Default
key str
required
value Buffer
required
Source code in icechunk-python/python/icechunk/store.py
async def set(self, key: str, value: Buffer) -> None:
    """Store a (key, value) pair.

    Parameters
    ----------
    key : str
    value : Buffer
    """
    if not isinstance(value, Buffer):
        raise TypeError(
            f"IcechunkStore.set(): `value` must be a Buffer instance. Got an instance of {type(value)} instead."
        )
    return await self._store.set(key, value.to_bytes())

set_if_not_exists async #

set_if_not_exists(key, value)

Store a key to value if the key is not already present.

Parameters:

Name Type Description Default
key str
required
value Buffer
required
Source code in icechunk-python/python/icechunk/store.py
async def set_if_not_exists(self, key: str, value: Buffer) -> None:
    """
    Store a key to ``value`` if the key is not already present.

    Parameters
    -----------
    key : str
    value : Buffer
    """
    return await self._store.set_if_not_exists(key, value.to_bytes())

set_partial_values async #

set_partial_values(key_start_values)

Store values at a given key, starting at byte range_start.

Parameters:

Name Type Description Default
key_start_values Iterable[tuple[str, int, BytesLike]]

(key, range_start, value) triples. A key may occur multiple times with different range_starts. The ranges (taking the length of each value into account) must not overlap for the same key.

required
Source code in icechunk-python/python/icechunk/store.py
async def set_partial_values(
    self, key_start_values: Iterable[tuple[str, int, BytesLike]]
) -> None:
    """Store values at a given key, starting at byte range_start.

    Parameters
    ----------
    key_start_values : Iterable[tuple[str, int, BytesLike]]
        (key, range_start, value) triples. A key may occur multiple times with different
        range_starts. The ranges (taking the length of each value into account) must not
        overlap for the same key.
    """
    # NOTE: pyo3 does not implicitly convert a Python Iterable to a Rust iterable, so we
    # convert it to a list here first. Possible opportunity for optimization.
    # NOTE: currently we only implement the case where the values are bytes
    return await self._store.set_partial_values(list(key_start_values))  # type: ignore[arg-type]
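
The non-overlap requirement on key_start_values can be checked before calling the method. The following is a minimal sketch of such a check; find_overlaps is a hypothetical helper written for this example and is not part of Icechunk or Zarr.

```python
from collections import defaultdict


def find_overlaps(key_start_values):
    """Return the sorted keys whose (range_start, value) writes overlap."""
    spans = defaultdict(list)
    for key, start, value in key_start_values:
        # Each write covers the half-open byte range [start, start + len(value)).
        spans[key].append((start, start + len(value)))

    overlapping = set()
    for key, ranges in spans.items():
        ranges.sort()
        # After sorting by start, any overlap shows up between neighbours.
        for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:]):
            if next_start < prev_end:
                overlapping.add(key)
    return sorted(overlapping)


writes = [
    ("a", 0, b"abcd"),  # bytes 0..3
    ("a", 4, b"ef"),    # bytes 4..5, adjacent but not overlapping
    ("b", 0, b"xyz"),   # bytes 0..2
    ("b", 2, b"q"),     # byte 2, overlaps the previous write to "b"
]
bad_keys = find_overlaps(writes)
```

Adjacent ranges such as the two writes to "a" are accepted because the ranges are half-open; only a write that starts before the previous one ends is reported.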

set_virtual_ref #

set_virtual_ref(
    key,
    location,
    *,
    offset,
    length,
    checksum=None,
    validate_container=True,
)

Store a virtual reference to a chunk.

Parameters:

Name Type Description Default
key str

The chunk to store the reference under. This is the fully qualified Zarr key, e.g. 'array/c/0/0/0'

required
location str

The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'

required
offset int

The offset in bytes, measured from the start of the file at location, at which the chunk starts

required
length int

The length of the chunk in bytes, measured from the given offset

required
checksum str | datetime | None

The etag or last_modified_at field of the object

None
validate_container bool

If set to true, fail for locations that don't match any existing virtual chunk container

True
Source code in icechunk-python/python/icechunk/store.py
def set_virtual_ref(
    self,
    key: str,
    location: str,
    *,
    offset: int,
    length: int,
    checksum: str | datetime | None = None,
    validate_container: bool = True,
) -> None:
    """Store a virtual reference to a chunk.

    Parameters
    ----------
    key : str
        The chunk to store the reference under. This is the fully qualified Zarr key, e.g. 'array/c/0/0/0'
    location : str
        The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
    offset : int
        The offset in bytes, measured from the start of the file at location, at which the chunk starts
    length : int
        The length of the chunk in bytes, measured from the given offset
    checksum : str | datetime | None
        The etag or last_modified_at field of the object
    validate_container: bool
        If set to true, fail for locations that don't match any existing virtual chunk container
    """
    return self._store.set_virtual_ref(
        key, location, offset, length, checksum, validate_container
    )
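
The key argument follows the pattern shown in the example 'array/c/0/0/0'. A minimal sketch of building such a key and passing it to set_virtual_ref follows. Both helpers are hypothetical (not part of icechunk), and chunk_key assumes the Zarr v3 default chunk key encoding with a "c" prefix and "/" separator. The store argument is any object exposing set_virtual_ref with the signature above.

```python
def chunk_key(array_path: str, chunk_index: tuple[int, ...]) -> str:
    """Build a fully qualified Zarr chunk key such as 'array/c/0/0/0'.

    Assumes the Zarr v3 default chunk key encoding ("c" prefix, "/" separator).
    Hypothetical helper; not part of icechunk.
    """
    path = array_path.strip("/")
    parts = [path, "c", *(str(i) for i in chunk_index)]
    # Drop the empty path component when the array lives at the store root.
    return "/".join(part for part in parts if part)


def add_virtual_chunk(store, array_path, chunk_index, location, offset, length, etag=None):
    """Register one virtual chunk on `store` (hypothetical wrapper)."""
    store.set_virtual_ref(
        chunk_key(array_path, chunk_index),
        location,
        offset=offset,  # keyword-only in the signature above
        length=length,  # keyword-only in the signature above
        checksum=etag,
    )


print(chunk_key("array", (0, 0, 0)))
```

Passing the ETag as checksum records which version of the external object the reference was created against.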

set_virtual_ref_async async #

set_virtual_ref_async(
    key,
    location,
    *,
    offset,
    length,
    checksum=None,
    validate_container=True,
)

Store a virtual reference to a chunk asynchronously.

Parameters:

Name Type Description Default
key str

The chunk to store the reference under. This is the fully qualified Zarr key, e.g. 'array/c/0/0/0'

required
location str

The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'

required
offset int

The offset in bytes, measured from the start of the file at location, at which the chunk starts

required
length int

The length of the chunk in bytes, measured from the given offset

required
checksum str | datetime | None

The etag or last_modified_at field of the object

None
validate_container bool

If set to true, fail for locations that don't match any existing virtual chunk container

True
Source code in icechunk-python/python/icechunk/store.py
async def set_virtual_ref_async(
    self,
    key: str,
    location: str,
    *,
    offset: int,
    length: int,
    checksum: str | datetime | None = None,
    validate_container: bool = True,
) -> None:
    """Store a virtual reference to a chunk asynchronously.

    Parameters
    ----------
    key : str
        The chunk to store the reference under. This is the fully qualified Zarr key, e.g. 'array/c/0/0/0'
    location : str
        The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
    offset : int
        The offset in bytes, measured from the start of the file at location, at which the chunk starts
    length : int
        The length of the chunk in bytes, measured from the given offset
    checksum : str | datetime | None
        The etag or last_modified_at field of the object
    validate_container: bool
        If set to true, fail for locations that don't match any existing virtual chunk container
    """
    return await self._store.set_virtual_ref_async(
        key, location, offset, length, checksum, validate_container
    )

set_virtual_refs #

set_virtual_refs(
    array_path, chunks, *, validate_containers=True
)

Store multiple virtual references for the same array.

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"

required
chunks list[VirtualChunkSpec]

The list of virtual chunks to add

required
validate_containers bool

If set to true, ignore virtual references for locations that don't match any existing virtual chunk container

True

Returns:

Type Description
list[tuple[int, ...]] | None

If all virtual references were successfully updated, it returns None. If there were validation errors, it returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py
def set_virtual_refs(
    self,
    array_path: str,
    chunks: list[VirtualChunkSpec],
    *,
    validate_containers: bool = True,
) -> list[tuple[int, ...]] | None:
    """Store multiple virtual references for the same array.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
    chunks : list[VirtualChunkSpec]
        The list of virtual chunks to add
    validate_containers: bool
        If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


    Returns
    -------
    list[tuple[int, ...]] | None

        If all virtual references were successfully updated, it returns None.
        If there were validation errors, it returns the chunk indices of all failed references.
    """
    return self._store.set_virtual_refs(array_path, chunks, validate_containers)
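
Because validation failures are reported through the return value rather than an exception, callers need to inspect it. A minimal sketch of one way to do that follows. The helper set_refs_or_raise is hypothetical (not part of icechunk), store is any object exposing set_virtual_refs with the signature above, and chunks is assumed to be a list of VirtualChunkSpec objects built elsewhere.

```python
def set_refs_or_raise(store, array_path: str, chunks: list) -> None:
    """Call set_virtual_refs and turn reported failures into an exception.

    Hypothetical helper; not part of icechunk.
    """
    failed = store.set_virtual_refs(array_path, chunks, validate_containers=True)
    if failed:
        # `failed` holds the chunk indices whose references did not validate.
        shown = ", ".join(str(index) for index in failed[:5])
        raise ValueError(
            f"{len(failed)} virtual reference(s) for {array_path!r} failed validation; "
            f"first indices: {shown}"
        )
```

Raising is only one policy. Logging the indices and continuing may suit bulk ingestion better.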

set_virtual_refs_arr #

set_virtual_refs_arr(
    array_path,
    chunk_grid_shape,
    locations,
    offsets,
    lengths,
    *,
    validate_containers=True,
    arr_offset=None,
    checksum=None,
)

Store virtual references for an array from flat arrays of locations, offsets, and lengths.

More efficient than set_virtual_refs as it avoids creating per-chunk VirtualChunkSpec Python objects. The locations list is iterated in Rust (borrowing strings directly from CPython), and the offset/length numpy arrays are accessed via zero-copy.

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"

required
chunk_grid_shape tuple[int, ...]

Shape of the chunk grid (number of chunks per dimension). The product must equal the length of the arrays. Arrays are assumed to be flattened in C (row-major) order.

required
locations list[str]

URLs to external files containing chunk data. Empty strings represent missing chunks and are silently skipped. Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]

required
offsets ndarray

1-D uint64 array of byte offsets within each file.

required
lengths ndarray

1-D uint64 array of byte lengths of each chunk.

required
validate_containers bool

If True, validate that locations match registered virtual chunk containers. Default is True.

True
arr_offset tuple[int, ...] | None

Optional offset to add to computed chunk indices. Useful for append operations where new chunks should be written at an offset from (0, 0, ...). Must have the same length as chunk_grid_shape. Default is None.

None
checksum datetime | str | None

Optional checksum for all chunks. Can be a datetime (last modified time) or a string (ETag). Default is None.

None

Returns:

Type Description
list[tuple[int, ...]] | None

If all virtual references were successfully updated, returns None. If there were validation errors, returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py
def set_virtual_refs_arr(
    self,
    array_path: str,
    chunk_grid_shape: tuple[int, ...],
    locations: list[str],
    offsets: "np.ndarray[Any, np.dtype[np.uint64]]",
    lengths: "np.ndarray[Any, np.dtype[np.uint64]]",
    *,
    validate_containers: bool = True,
    arr_offset: tuple[int, ...] | None = None,
    checksum: datetime | str | None = None,
) -> list[tuple[int, ...]] | None:
    """Store virtual references for an array from flat arrays of locations, offsets, and lengths.

    More efficient than ``set_virtual_refs`` as it avoids creating
    per-chunk ``VirtualChunkSpec`` Python objects. The locations list
    is iterated in Rust (borrowing strings directly from CPython),
    and the offset/length numpy arrays are accessed via zero-copy.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store.
        Example: "/groupA/groupB/outputs/my-array"
    chunk_grid_shape : tuple[int, ...]
        Shape of the chunk grid (number of chunks per dimension).
        The product must equal the length of the arrays.
        Arrays are assumed to be flattened in C (row-major) order.
    locations : list[str]
        URLs to external files containing chunk data. Empty strings
        represent missing chunks and are silently skipped.
        Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]
    offsets : np.ndarray
        1-D uint64 array of byte offsets within each file.
    lengths : np.ndarray
        1-D uint64 array of byte lengths of each chunk.
    validate_containers : bool
        If True, validate that locations match registered virtual
        chunk containers. Default is True.
    arr_offset : tuple[int, ...] | None
        Optional offset to add to computed chunk indices. Useful for
        append operations where new chunks should be written at an
        offset from (0, 0, ...). Must have the same length as
        chunk_grid_shape. Default is None.
    checksum : datetime | str | None
        Optional checksum for all chunks. Can be a datetime
        (last modified time) or a string (ETag). Default is None.

    Returns
    -------
    list[tuple[int, ...]] | None
        If all virtual references were successfully updated, returns None.
        If there were validation errors, returns the chunk indices of
        all failed references.
    """
    return self._store.set_virtual_refs_arr(
        array_path,
        list(chunk_grid_shape),
        locations,
        offsets,
        lengths,
        validate_containers=validate_containers,
        arr_offset=list(arr_offset) if arr_offset is not None else None,
        checksum=checksum,
    )

set_virtual_refs_arr_async async #

set_virtual_refs_arr_async(
    array_path,
    chunk_grid_shape,
    locations,
    offsets,
    lengths,
    *,
    validate_containers=True,
    arr_offset=None,
    checksum=None,
)

Store virtual references for an array from flat arrays (async).

Async variant of set_virtual_refs_arr. The vref construction still requires the GIL (to borrow strings from the Python list), but the store insertion releases it. Use asyncio.gather() to overlap vref building for one array with store insertion for another.

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"

required
chunk_grid_shape tuple[int, ...]

Shape of the chunk grid (number of chunks per dimension). The product must equal the length of the arrays. Arrays are assumed to be flattened in C (row-major) order.

required
locations list[str]

URLs to external files containing chunk data. Empty strings represent missing chunks and are silently skipped. Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]

required
offsets ndarray

1-D uint64 array of byte offsets within each file.

required
lengths ndarray

1-D uint64 array of byte lengths of each chunk.

required
validate_containers bool

If True, validate that locations match registered virtual chunk containers. Default is True.

True
arr_offset tuple[int, ...] | None

Optional offset to add to computed chunk indices. Useful for append operations where new chunks should be written at an offset from (0, 0, ...). Must have the same length as chunk_grid_shape. Default is None.

None
checksum datetime | str | None

Optional checksum for all chunks. Can be a datetime (last modified time) or a string (ETag). Default is None.

None

Returns:

Type Description
list[tuple[int, ...]] | None

If all virtual references were successfully updated, returns None. If there were validation errors, returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py
async def set_virtual_refs_arr_async(
    self,
    array_path: str,
    chunk_grid_shape: tuple[int, ...],
    locations: list[str],
    offsets: "np.ndarray[Any, np.dtype[np.uint64]]",
    lengths: "np.ndarray[Any, np.dtype[np.uint64]]",
    *,
    validate_containers: bool = True,
    arr_offset: tuple[int, ...] | None = None,
    checksum: datetime | str | None = None,
) -> list[tuple[int, ...]] | None:
    """Store virtual references for an array from flat arrays (async).

    Async variant of ``set_virtual_refs_arr``. The vref construction
    still requires the GIL (to borrow strings from the Python list),
    but the store insertion releases it. Use ``asyncio.gather()`` to
    overlap vref building for one array with store insertion for another.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store.
        Example: "/groupA/groupB/outputs/my-array"
    chunk_grid_shape : tuple[int, ...]
        Shape of the chunk grid (number of chunks per dimension).
        The product must equal the length of the arrays.
        Arrays are assumed to be flattened in C (row-major) order.
    locations : list[str]
        URLs to external files containing chunk data. Empty strings
        represent missing chunks and are silently skipped.
        Example: ["s3://bucket/file1.nc", "s3://bucket/file2.nc"]
    offsets : np.ndarray
        1-D uint64 array of byte offsets within each file.
    lengths : np.ndarray
        1-D uint64 array of byte lengths of each chunk.
    validate_containers : bool
        If True, validate that locations match registered virtual
        chunk containers. Default is True.
    arr_offset : tuple[int, ...] | None
        Optional offset to add to computed chunk indices. Useful for
        append operations where new chunks should be written at an
        offset from (0, 0, ...). Must have the same length as
        chunk_grid_shape. Default is None.
    checksum : datetime | str | None
        Optional checksum for all chunks. Can be a datetime
        (last modified time) or a string (ETag). Default is None.

    Returns
    -------
    list[tuple[int, ...]] | None
        If all virtual references were successfully updated, returns None.
        If there were validation errors, returns the chunk indices of
        all failed references.
    """
    return await self._store.set_virtual_refs_arr_async(
        array_path,
        list(chunk_grid_shape),
        locations,
        offsets,
        lengths,
        validate_containers=validate_containers,
        arr_offset=list(arr_offset) if arr_offset is not None else None,
        checksum=checksum,
    )
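
The asyncio.gather() suggestion above can be sketched as follows. The helper write_arrays is hypothetical (not part of icechunk), store is any object exposing set_virtual_refs_arr_async as documented, and the shape of the arrays mapping is an assumption made for this illustration.

```python
import asyncio


async def write_arrays(store, arrays: dict) -> dict:
    """Write virtual references for several arrays concurrently.

    `arrays` maps array_path -> (chunk_grid_shape, locations, offsets, lengths).
    Returns array_path -> failed chunk indices, only for arrays with failures.
    Hypothetical helper; not part of icechunk.
    """
    paths = list(arrays)
    results = await asyncio.gather(
        *(
            store.set_virtual_refs_arr_async(path, *arrays[path])
            for path in paths
        )
    )
    # gather preserves argument order, so results line up with paths.
    return {path: failed for path, failed in zip(paths, results) if failed}
```

An empty dictionary means every array was written without validation errors.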

set_virtual_refs_async async #

set_virtual_refs_async(
    array_path, chunks, *, validate_containers=True
)

Store multiple virtual references for the same array asynchronously.

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"

required
chunks list[VirtualChunkSpec]

The list of virtual chunks to add

required
validate_containers bool

If set to true, ignore virtual references for locations that don't match any existing virtual chunk container

True

Returns:

Type Description
list[tuple[int, ...]] | None

If all virtual references were successfully updated, it returns None. If there were validation errors, it returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py
async def set_virtual_refs_async(
    self,
    array_path: str,
    chunks: list[VirtualChunkSpec],
    *,
    validate_containers: bool = True,
) -> list[tuple[int, ...]] | None:
    """Store multiple virtual references for the same array asynchronously.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
    chunks : list[VirtualChunkSpec]
        The list of virtual chunks to add
    validate_containers: bool
        If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


    Returns
    -------
    list[tuple[int, ...]] | None

        If all virtual references were successfully updated, it returns None.
        If there were validation errors, it returns the chunk indices of all failed references.
    """
    return await self._store.set_virtual_refs_async(
        array_path, chunks, validate_containers
    )

sync_clear #

sync_clear()

Clear the store.

This removes all contents from the current session, including all groups and all arrays, but it does not modify the repository history.

Source code in icechunk-python/python/icechunk/store.py
def sync_clear(self) -> None:
    """Clear the store.

    This removes all contents from the current session, including all groups
    and all arrays, but it does not modify the repository history.
    """
    return self._store.sync_clear()

Exceptions#

icechunk.IcechunkError #

Bases: Exception

Base class for all Icechunk errors

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class IcechunkError(Exception):
    """Base class for all Icechunk errors"""

    @property
    def message(self) -> str: ...

icechunk.ConflictError #

Bases: Exception

An error that occurs when a conflict is detected

Methods:

Name Description
__new__

Create a new ConflictError.

Attributes:

Name Type Description
actual_parent str

The actual parent snapshot ID of the branch that the session attempted to commit to.

expected_parent str

The expected parent snapshot ID.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ConflictError(Exception):
    """An error that occurs when a conflict is detected"""

    def __new__(
        cls,
        expected_parent: str | None = None,
        actual_parent: str | None = None,
    ) -> ConflictError:
        """
        Create a new ConflictError.

        Parameters
        ----------
        expected_parent: str | None
            The expected parent snapshot ID.
        actual_parent: str | None
            The actual parent snapshot ID of the branch.
        """
        ...

    @property
    def expected_parent(self) -> str:
        """The expected parent snapshot ID.

        This is the snapshot ID that the session was based on when the
        commit operation was called.
        """
        ...
    @property
    def actual_parent(self) -> str:
        """
        The actual parent snapshot ID of the branch that the session attempted to commit to.

        When the session is based on a branch, this is the snapshot ID of the branch tip. If this
        error is raised, it means the branch was modified and committed by another session after
        the session was created.
        """
        ...
    ...

actual_parent property #

actual_parent

The actual parent snapshot ID of the branch that the session attempted to commit to.

When the session is based on a branch, this is the snapshot ID of the branch tip. If this error is raised, it means the branch was modified and committed by another session after the session was created.

expected_parent property #

expected_parent

The expected parent snapshot ID.

This is the snapshot ID that the session was based on when the commit operation was called.

__new__ #

__new__(expected_parent=None, actual_parent=None)

Create a new ConflictError.

Parameters:

Name Type Description Default
expected_parent str | None

The expected parent snapshot ID.

None
actual_parent str | None

The actual parent snapshot ID of the branch.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __new__(
    cls,
    expected_parent: str | None = None,
    actual_parent: str | None = None,
) -> ConflictError:
    """
    Create a new ConflictError.

    Parameters
    ----------
    expected_parent: str | None
        The expected parent snapshot ID.
    actual_parent: str | None
        The actual parent snapshot ID of the branch.
    """
    ...
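
A ConflictError signals that the branch tip moved after the session was created, so a common response is to rebase and retry the commit. The sketch below is a hypothetical helper, not part of icechunk. It assumes the session object exposes commit(message) and rebase(solver), which are not described in this section, and it takes the exception class as a parameter (icechunk.ConflictError in real use) so the logic can be exercised without a repository.

```python
def commit_with_rebase(session, message, solver, conflict_error, attempts=3):
    """Commit, rebasing and retrying when the branch tip has moved.

    Assumptions: `session` exposes commit(message) and rebase(solver), and
    `conflict_error` is the exception class to catch.
    Hypothetical helper; not part of icechunk.
    """
    for attempt in range(1, attempts + 1):
        try:
            return session.commit(message)
        except conflict_error as err:
            if attempt == attempts:
                # Out of attempts: let the caller see the original error.
                raise
            print(
                f"attempt {attempt}: expected parent {err.expected_parent}, "
                f"actual parent {err.actual_parent}; rebasing"
            )
            session.rebase(solver)
```

If the rebase itself cannot be completed, the exception raised by rebase propagates to the caller unchanged.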

icechunk.RebaseFailedError #

Bases: IcechunkError

An error that occurs when a rebase operation fails

Methods:

Name Description
__new__

Create a new RebaseFailedError.

Attributes:

Name Type Description
conflicts list[Conflict]

The conflicts that occurred during the rebase operation

snapshot str

The snapshot ID that the session was rebased to

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class RebaseFailedError(IcechunkError):
    """An error that occurs when a rebase operation fails"""

    def __new__(cls, snapshot: str, conflicts: list[Conflict]) -> RebaseFailedError:
        """
        Create a new RebaseFailedError.

        Parameters
        ----------
        snapshot: str
            The snapshot ID that the session was rebased to.
        conflicts: list[Conflict]
            The conflicts that occurred during the rebase operation.
        """
        ...

    @property
    def snapshot(self) -> str:
        """The snapshot ID that the session was rebased to"""
        ...

    @property
    def conflicts(self) -> list[Conflict]:
        """The conflicts that occurred during the rebase operation

        Returns:
            list[Conflict]: The conflicts that occurred during the rebase operation
        """
    ...

conflicts property #

conflicts

The conflicts that occurred during the rebase operation

Returns: list[Conflict]: The conflicts that occurred during the rebase operation

snapshot property #

snapshot

The snapshot ID that the session was rebased to

__new__ #

__new__(snapshot, conflicts)

Create a new RebaseFailedError.

Parameters:

Name Type Description Default
snapshot str

The snapshot ID that the session was rebased to.

required
conflicts list[Conflict]

The conflicts that occurred during the rebase operation.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __new__(cls, snapshot: str, conflicts: list[Conflict]) -> RebaseFailedError:
    """
    Create a new RebaseFailedError.

    Parameters
    ----------
    snapshot: str
        The snapshot ID that the session was rebased to.
    conflicts: list[Conflict]
        The conflicts that occurred during the rebase operation.
    """
    ...
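
The snapshot and conflicts attributes are enough to produce a readable report when a rebase fails. The following sketch is a hypothetical helper, not part of icechunk. It renders each conflict with repr() because the fields of Conflict are not described in this section.

```python
def summarize_rebase_failure(err, limit: int = 10) -> str:
    """Format a RebaseFailedError using its snapshot and conflicts attributes.

    Hypothetical helper; not part of icechunk.
    """
    conflicts = list(err.conflicts)
    lines = [
        f"rebase to snapshot {err.snapshot} failed with {len(conflicts)} conflict(s):"
    ]
    # Show at most `limit` conflicts so very large failures stay readable.
    lines += [f"  - {conflict!r}" for conflict in conflicts[:limit]]
    remaining = len(conflicts) - limit
    if remaining > 0:
        lines.append(f"  ... and {remaining} more")
    return "\n".join(lines)
```

A typical place to call it is inside an except block that catches icechunk.RebaseFailedError.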

Top-level utilities#

icechunk.print_debug_info #

print_debug_info()

Print the platform, Python, and icechunk versions, followed by the versions of zarr, numcodecs, xarray, and virtualizarr for each of those packages that is installed.

Source code in icechunk-python/python/icechunk/__init__.py
def print_debug_info() -> None:
    import platform
    from importlib import import_module

    print(f"platform:  {platform.platform()}")
    print(f"python:  {platform.python_version()}")
    print(f"icechunk:  {__version__}")
    for package in ["zarr", "numcodecs", "xarray", "virtualizarr"]:
        try:
            print(f"{package}:  {import_module(package).__version__}")
        except ModuleNotFoundError:
            continue

icechunk.upgrade_icechunk_repository #

upgrade_icechunk_repository(
    repo,
    *,
    dry_run,
    delete_unused_v1_files=True,
    prefetch_concurrency=None,
)

Migrate a repository to the latest version of Icechunk.

This is an administrative operation, and must be executed in isolation from other readers and writers. Other processes running concurrently on the same repo may see undefined behavior.

At this time, this function supports only migration from Icechunk spec version 1 to Icechunk spec version 2. This means Icechunk versions 1.x to 2.x.

The operation is usually fast, but it can take several minutes if there is a very large version history (thousands of snapshots).

Returns a new Repository object. The original repo object should not be used after calling this function.

Parameters:

Name Type Description Default
repo Repository

The repository to upgrade.

required
dry_run bool

If True, perform a dry run without actually upgrading. If False, perform the upgrade.

required
delete_unused_v1_files bool

If True (the default), delete unused v1 files after upgrading.

True
prefetch_concurrency int or None

Number of snapshots to prefetch concurrently during migration. Defaults to 64 if not specified. Lower this value for repos that cannot fit many snapshots in memory.

None

Returns:

Type Description
Repository

A freshly opened repository with the updated spec version.

Source code in icechunk-python/python/icechunk/__init__.py
def upgrade_icechunk_repository(
    repo: Repository,
    *,
    dry_run: bool,
    delete_unused_v1_files: bool = True,
    prefetch_concurrency: int | None = None,
) -> Repository:
    """
    Migrate a repository to the latest version of Icechunk.

    This is an administrative operation, and must be executed in isolation from
    other readers and writers. Other processes running concurrently on the same
    repo may see undefined behavior.

    At this time, this function supports only migration from Icechunk spec version 1
    to Icechunk spec version 2. This means Icechunk versions 1.x to 2.x.

    The operation is usually fast, but it can take several minutes if there is a very
    large version history (thousands of snapshots).

    Returns a new Repository object. The original repo object should not be used
    after calling this function.

    Parameters
    ----------
    repo : Repository
        The repository to upgrade.
    dry_run : bool
        If True, perform a dry run without actually upgrading. If False, perform
        the upgrade.
    delete_unused_v1_files : bool, optional
        If True (the default), delete unused v1 files after upgrading.
    prefetch_concurrency : int or None, optional
        Number of snapshots to prefetch concurrently during migration.
        Defaults to 64 if not specified. Lower this value for repos that
        cannot fit many snapshots in memory.

    Returns
    -------
    Repository
        A freshly opened repository with the updated spec version.
    """
    new_repo = _upgrade_icechunk_repository(
        repo._repository,
        dry_run=dry_run,
        delete_unused_v1_files=delete_unused_v1_files,
        prefetch_concurrency=prefetch_concurrency,
    )
    if not dry_run:
        repo._repository = _InvalidatedRepository()  # type: ignore[assignment]
    return Repository(new_repo)
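
Since dry_run is a required keyword argument, one cautious workflow is to run the migration as a dry run first and perform the real upgrade only if that succeeds. The sketch below is a hypothetical wrapper, not part of icechunk. The upgrade parameter exists only so the flow can be exercised without a real repository, and it defaults to icechunk.upgrade_icechunk_repository.

```python
def upgrade_after_dry_run(repo, upgrade=None, **options):
    """Run a dry-run migration first, then the real one if it succeeds.

    `options` are forwarded unchanged, e.g. delete_unused_v1_files or
    prefetch_concurrency. Hypothetical helper; not part of icechunk.
    """
    if upgrade is None:
        # Imported lazily so the sketch can be tested with a stand-in function.
        from icechunk import upgrade_icechunk_repository as upgrade

    # An exception raised by the dry run stops here, before the real upgrade is attempted.
    upgrade(repo, dry_run=True, **options)
    # The original `repo` must not be used after this call; keep the return value.
    return upgrade(repo, dry_run=False, **options)
```

As with the function it wraps, this should run in isolation from other readers and writers of the repository.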

icechunk.supported_spec_versions #

supported_spec_versions()

Return the Icechunk spec versions supported by this library, newest first.

Source code in icechunk-python/python/icechunk/__init__.py
def supported_spec_versions() -> list[SpecVersion]:
    return [SpecVersion.v2, SpecVersion.v1]