xet-core

mirror of https://github.com/huggingface/xet-core.git synced 2026-06-04 13:30:29 +08:00

Author	SHA1	Message	Date
Assaf Vayner	693945c84c	rm misleading token refresh log (#787)

This log was very chatty whenever a valid token already existed, and it claimed that a token refresh had occurred even when none had. I downgraded and corrected this log, and added an info log that fires only when a token refresh actually succeeds (errors are logged by the caller).

> [!NOTE]
> Low Risk
>
> Low risk: only adjusts logging around CAS auth token refresh without changing token refresh logic or request behavior.
>
> Overview
>
> Removes the unconditional "token refresh successful" info log from `AuthMiddleware::get_token`, so normal requests that reuse a still-valid token no longer claim a refresh happened.
>
> Adds an `info!` log in `TokenProvider::get_valid_token` that is emitted only after a successful refresh (including the new expiry), while keeping failure logging in the middleware.
>
> Reviewed by Cursor Bugbot for commit `2a9215c163`.

Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-04-09 15:27:46 -07:00
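The fix in #787 separates two cases: reusing a still-valid token and actually refreshing one. The sketch below shows that distinction using only std. Several parts are assumptions. `TokenProvider` here is a hypothetical, simplified type, not the xet-core one. The refresh closure stands in for the HTTP call to the refresh route. The `log` vector stands in for `info!` output.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified token cache; not the real xet-core `TokenProvider`.
pub struct TokenProvider<F>
where
    F: FnMut() -> Result<(String, Duration), String>,
{
    /// Current token and the instant at which it expires.
    token: Option<(String, Instant)>,
    /// Stands in for the HTTP call to the token refresh route.
    refresh: F,
    /// Stands in for `info!` output.
    pub log: Vec<String>,
}

impl<F> TokenProvider<F>
where
    F: FnMut() -> Result<(String, Duration), String>,
{
    pub fn new(refresh: F) -> Self {
        Self { token: None, refresh, log: Vec::new() }
    }

    pub fn get_valid_token(&mut self) -> Result<String, String> {
        if let Some((token, expiry)) = &self.token {
            if Instant::now() < *expiry {
                // Still valid: reuse it and do not claim a refresh happened.
                return Ok(token.clone());
            }
        }
        // Missing or expired: refresh. On failure, `?` hands the error back to
        // the caller, which is responsible for logging it.
        let (token, ttl) = (self.refresh)()?;
        self.token = Some((token.clone(), Instant::now() + ttl));
        self.log.push(format!("token refresh successful, expires in {ttl:?}"));
        Ok(token)
    }
}
```

Take a 60-second token and call `get_valid_token` on it twice: only one refresh entry is recorded. A failing refresh records nothing and returns the error for the caller to log.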
Hoyt Koepke	9bdaf1e305	Fix for progress reporting bug (#788)

Currently, there's a bug in the internal progress reporting. It applies `+=` to a value treated as a delta since the last report, but the value is actually the cumulative amount sent so far. This causes hugely inflated values for the amount of data sent on large files.

> [!NOTE]
> Medium Risk
>
> Changes how `bytes_sent_so_far` is accumulated during partial progress updates, which can affect when/if concurrency adjustments trigger. Logic is localized but impacts adaptive concurrency behavior under high-volume transfers.
>
> Overview
>
> Fixes a progress-reporting bug in `AdaptiveConcurrencyController` where repeated cumulative byte counts from partial updates could be added multiple times, vastly inflating `bytes_sent_so_far`.
>
> Each `ConnectionPermit` now tracks `max_bytes_reported`, and the controller only adds the incremental delta (`n_bytes - previous_max`), preventing overcounting while preserving existing partial/final reporting behavior.
>
> Reviewed by Cursor Bugbot for commit `fdb2cd6c6c`.	2026-04-07 15:54:49 -07:00
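The accounting fix in #788 can be illustrated with a minimal sketch. It assumes that each permit reports a cumulative byte count for its own transfer. `ConnectionPermit` and `Controller` below are simplified, hypothetical stand-ins for the real types. Only the `max_bytes_reported` delta logic mirrors the description above.

```rust
/// Hypothetical, simplified permit: remembers the largest cumulative byte
/// count reported so far for one transfer.
#[derive(Default)]
pub struct ConnectionPermit {
    max_bytes_reported: u64,
}

/// Hypothetical, simplified controller holding the global byte counter.
#[derive(Default)]
pub struct Controller {
    pub bytes_sent_so_far: u64,
}

impl Controller {
    /// `n_bytes` is cumulative for this transfer (not a delta), so only the
    /// increase over the permit's previous maximum is added.
    pub fn report_progress(&mut self, permit: &mut ConnectionPermit, n_bytes: u64) {
        let previous_max = permit.max_bytes_reported;
        if n_bytes > previous_max {
            self.bytes_sent_so_far += n_bytes - previous_max;
            permit.max_bytes_reported = n_bytes;
        }
    }
}
```

Suppose one permit reports 100, 250 and 250. With the fix, 250 is added in total. A naive `+=` of each report would have added 600.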
Di Xiao	7989267d1e	🔒 Pin the unpinned GitHub Actions to commit SHAs (#781)

During the last pin sweep, a couple of actions were left unpinned. This pins them.

| Workflow | Action | Before | After | SHA |
|---|---|---|---|---|
| `cache-rust-build/action.yml` | `actions/cache` | `v5` | `v5` | `668228422ae6…` |
| `windows-codesign/action.yml` | `azure/trusted-signing-action` | `v0` | `v0` | `1d365fec1286…` |
| `pre-release-testing.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |	2026-04-06 14:27:49 -07:00
Assaf Vayner	08377eab3c	Upgrade crates version to 1.5.1 (#782)

## Summary

- Bump workspace version from 1.5.0 to 1.5.1
- Update all internal dependency version references to match

> [!NOTE]
> Low Risk
>
> Low-risk, version-only bump across workspace manifests and lockfiles, with no code/behavior changes in the diff.
>
> Overview
>
> Bumps the workspace package version from `1.5.0` to `1.5.1` and aligns internal crate dependency version pins (`xet-runtime`, `xet-core-structures`, `xet-client`, `xet-data`, `hf-xet`) to match.
>
> Updates lockfiles (`Cargo.lock` plus `hf_xet` and wasm lockfiles) so published/embedded artifacts resolve to the `1.5.1` crate set (including bringing wasm lockfiles up to `1.5.1`).
>
> Reviewed by Cursor Bugbot for commit `e8563700a0`.	2026-04-06 14:03:02 -07:00
Assaf Vayner	062138f845	Improve hf-xet crate documentation (#780)

## Summary

- Expand the crate-level docs (`lib.rs`) with a proper introduction explaining what Xet storage is, plus an end-to-end upload+download example on the landing page
- Add runtime detection and `XetConfig` guidance to `XetSessionBuilder` docs
- Describe all three upload methods (path, bytes, stream) in the `xet_session` module docs
- Add dedicated streaming upload and streaming download code examples
- Add a module doc comment to `legacy/` pointing users to `xet_session`

## Test plan

- [x] `cargo check -p hf-xet` passes
- [x] `cargo doc -p hf-xet --no-deps` builds with no new warnings (50 pre-existing warnings unchanged)
- [ ] Review rendered docs on docs.rs after merge

> [!NOTE]
> Low Risk
>
> Documentation-only changes (module/crate docs and examples) with no functional code or behavior modifications, so runtime risk is low aside from potential doc/example inaccuracies.
>
> Overview
>
> Improves public-facing documentation across `lib.rs`, `xet_session`, and `legacy` by adding a clearer introduction to Xet/CAS, end-to-end upload+download quickstart examples, and guidance on using `XetSessionBuilder` (runtime detection and `XetConfig`).
>
> Expands `xet_session` docs to describe the three upload modes (path/bytes/stream) and adds dedicated streaming upload and streaming download examples, while clarifying that `legacy` is maintained for backwards compatibility and steering new users to `xet_session`.
>
> Reviewed by Cursor Bugbot for commit `6e926857a9`.	2026-04-06 13:55:29 -07:00
Assaf Vayner	cd55fb123c	no error log on 416 on reconstruction (#779)

Fixes 778. When the reconstruction API returns 416, skip the error log and emit an info log instead.

> [!NOTE]
> Low Risk
>
> Low risk: the behavior change is narrowly scoped to `get_reconstruction` requests, treating HTTP 416 as an expected terminal condition and altering log severity without impacting other status handling.
>
> Overview
>
> Improves reconstruction fetch behavior by marking HTTP `416 Range Not Satisfiable` as an expected end-of-data condition. `RemoteClient::get_reconstruction_impl` now opts into this via `RetryWrapper::with_expected_416()`, and `RetryWrapper` special-cases 416 to log at info level while returning a non-retryable failure.
>
> Updates the WASM `Cargo.lock` to bump `xet-*` crate versions from `1.4.0` to `1.5.0`.
>
> Reviewed by Cursor Bugbot for commit `2f1a3b6ef0`.	2026-04-06 13:42:05 -07:00
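Here is a minimal sketch of the status handling described in #779. It assumes a plain `u16` status code and an opt-in flag that plays the role of `RetryWrapper::with_expected_416()`. The enum, the function names and the set of retryable statuses (429 and 5xx) are illustrative assumptions, not the crate's actual retry policy.

```rust
/// Hypothetical outcome of classifying an HTTP status inside a retry wrapper.
#[derive(Debug, PartialEq)]
pub enum StatusOutcome {
    Success,
    /// Expected end of data: non-retryable, but not worth an error log.
    ExpectedEnd,
    Retry,
    Fatal,
}

/// Classifies a status; `expect_416` models the per-request opt-in.
pub fn classify_status(status: u16, expect_416: bool) -> StatusOutcome {
    match status {
        200..=299 => StatusOutcome::Success,
        416 if expect_416 => StatusOutcome::ExpectedEnd,
        // Assumed retryable set, for illustration only.
        429 | 500..=599 => StatusOutcome::Retry,
        _ => StatusOutcome::Fatal,
    }
}

/// Log level a caller would use for each outcome.
pub fn log_level(outcome: &StatusOutcome) -> &'static str {
    match outcome {
        StatusOutcome::Success => "debug",
        StatusOutcome::ExpectedEnd => "info",
        StatusOutcome::Retry => "warn",
        StatusOutcome::Fatal => "error",
    }
}
```

Because the flag is opt-in per request, a 416 on any other endpoint still falls through to the error path.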
Di Xiao	1f7400cc4b	Drop xet-core-structure from xet-runtime dev dep (#776)	2026-04-03 13:56:26 -07:00
Di Xiao	950807ba43	Upgrade crates version to 1.5.0 (#775)

Update the workspace version and intra-workspace dependency versions to `1.5.0`.	2026-04-03 13:39:50 -07:00
Hoyt Koepke	0d9f78aaf4	Add README.md files and Cargo.toml updates needed for publishing hf-xet (#773)

This PR adds crates.io-facing metadata (homepage, readme, keywords, categories) for the publishable crates, along with crate README files and concise crate-level docs, so that the crates.io and docs.rs pages have better context.	2026-04-03 12:34:47 -07:00
Hoyt Koepke	014ff2d75b	Fix for FD leak (#774)

Currently, the tests can fail intermittently because of a subtle FD leak in how the session and the runtimes interact. As a result, tests that use sessions quickly run out of file handles. There were two different issues:

1. `XetSessionInner` tracked active upload commits and file download groups in strong-reference maps, and those child objects held a clone of the session. That created a second cycle (session -> child -> session), which prevented cleanup of commit/download resources and the runtime handles. This strong tracking is dropped. (Note that all abort/SIGINT cancellation behavior is handled automatically through `TaskRuntime`, so the session classes don't need any explicit code for it.)
2. The static thread-local reference to the tokio runtime prevented the runtime from being cleaned up when it was created explicitly and not aborted. In addition, `JoinHandle` objects hold a reference back to the runtime, so if they are not aborted or joined, they also prevent the runtime from shutting down.

The FD tracking code was left in, but it is gated behind the `fd-track` feature.

> [!NOTE]
> Medium Risk
>
> Changes runtime/session lifetime management (TLS runtime refs, shutdown behavior, and session child ownership), which can affect task cancellation and runtime teardown across the library.
>
> Overview
>
> Fixes intermittent file-descriptor leaks by breaking ownership cycles between `XetSession` and child upload/download objects, and by ensuring `XetRuntime` can actually drop/shut down when the last external reference is released.
>
> Adds an opt-in `fd-track` feature with lightweight FD counting/scoped tracing, plus new leak-focused tests, and tightens local CAS DB/shard manager caching to avoid duplicate `redb` opens (canonicalized paths, weak cached handles, and cleanup on drop).
>
> Written by Cursor Bugbot for commit `041426e73e`.	2026-04-02 18:28:26 -07:00
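The ownership cycle described in #774 can be reproduced in a few lines of std Rust. This is a hypothetical model: the `Session`/`UploadCommit` types and the drop counter are illustrative only. If the session also stores its children strongly, the `Arc` count never reaches zero. `Drop` then never runs, so none of the resources it would release, such as runtime handles, are freed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Counts how many hypothetical sessions have actually been dropped.
pub static SESSIONS_DROPPED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical session; `children` models the old strong-reference map.
pub struct Session {
    pub children: Mutex<Vec<Arc<UploadCommit>>>,
}

impl Drop for Session {
    fn drop(&mut self) {
        SESSIONS_DROPPED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical child object that keeps its parent session alive.
pub struct UploadCommit {
    pub session: Arc<Session>,
}

/// Creates a session plus one child. With `track_children = true` the session
/// also stores the child strongly, forming session -> child -> session.
pub fn make_session(track_children: bool) -> Arc<Session> {
    let session = Arc::new(Session { children: Mutex::new(Vec::new()) });
    let child = Arc::new(UploadCommit { session: Arc::clone(&session) });
    if track_children {
        // Cycle: the session and child now keep each other alive forever.
        session.children.lock().unwrap().push(child);
    }
    // Without tracking, `child` is dropped here and releases its session clone.
    session
}
```

The fix is to drop the strong child map, or to hold children as `Weak`. Either way, releasing the last external `Arc` then frees the session.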
Pauline Bailly-Masson	2659c69892	🔒 Pin GitHub Actions to commit SHAs (#772)

## 🔒 Pin GitHub Actions to commit SHAs

This PR pins all GitHub Actions to their exact commit SHAs instead of mutable tags or branch names.

Why? Pinning to a SHA prevents supply-chain attacks in which a tag (e.g. `v4`) could be moved to point to malicious code.

### Changes

| Workflow | Action | Before | After | SHA |
|---|---|---|---|---|
| `hf-xet-tests.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `hf-xet-tests.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `hf-xet-tests.yml` | `actions/setup-python` | `v6` | `v6` | `a309ff8b426b…` |
| `hf-xet-tests.yml` | `PyO3/maturin-action` | `v1` | `v1` | `04ac600d27cd…` |
| `release.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `release.yml` | `actions/setup-python` | `v6` | `v6` | `a309ff8b426b…` |
| `release.yml` | `PyO3/maturin-action` | `v1` | `v1` | `04ac600d27cd…` |
| `release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `release.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `release.yml` | `actions/setup-python` | `v6` | `v6` | `a309ff8b426b…` |
| `release.yml` | `PyO3/maturin-action` | `v1` | `v1` | `04ac600d27cd…` |
| `release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `release.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `release.yml` | `actions/setup-python` | `v6` | `v6` | `a309ff8b426b…` |
| `release.yml` | `PyO3/maturin-action` | `v1` | `v1` | `04ac600d27cd…` |
| `release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `release.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `release.yml` | `actions/setup-python` | `v6` | `v6` | `a309ff8b426b…` |
| `release.yml` | `PyO3/maturin-action` | `v1` | `v1` | `04ac600d27cd…` |
| `release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `release.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `release.yml` | `PyO3/maturin-action` | `v1` | `v1` | `04ac600d27cd…` |
| `release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `release.yml` | `actions/download-artifact` | `v7` | `v7` | `37930b1c2aba…` |
| `release.yml` | `actions/attest-build-provenance` | `v3` | `v3` | `977bb373ede9…` |
| `release.yml` | `PyO3/maturin-action` | `v1` | `v1` | `04ac600d27cd…` |
| `release.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `release.yml` | `actions/download-artifact` | `v7` | `v7` | `37930b1c2aba…` |
| `ci.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `ci.yml` | `dtolnay/rust-toolchain` | `stable` | `nightly` | `3c5f7ea28cd6…` |
| `ci.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `ci.yml` | `bnjbvr/cargo-machete` | `main` | `main` | `b81ce1560c5f…` |
| `ci.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `ci.yml` | `dtolnay/rust-toolchain` | `1.89.0` | `1.94.1` | `3c5f7ea28cd6…` |
| `ci.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `ci.yml` | `dtolnay/rust-toolchain` | `1.89.0` | `1.94.1` | `3c5f7ea28cd6…` |
| `ci.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `ci.yml` | `dtolnay/rust-toolchain` | `1.89.0` | `1.94.1` | `3c5f7ea28cd6…` |
| `ci.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `ci.yml` | `dtolnay/rust-toolchain` | `1.89.0` | `1.94.1` | `3c5f7ea28cd6…` |
| `ci.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `ci.yml` | `dtolnay/rust-toolchain` | `nightly` | `nightly` | `3c5f7ea28cd6…` |
| `git-xet-release.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `git-xet-release.yml` | `dtolnay/rust-toolchain` | `1.89.0` | `1.94.1` | `3c5f7ea28cd6…` |
| `git-xet-release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `git-xet-release.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `git-xet-release.yml` | `dtolnay/rust-toolchain` | `1.89.0` | `1.94.1` | `3c5f7ea28cd6…` |
| `git-xet-release.yml` | `lando/code-sign-action` | `v3` | `v3` | `a5703d3b5486…` |
| `git-xet-release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `git-xet-release.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `git-xet-release.yml` | `dtolnay/rust-toolchain` | `1.89.0` | `1.94.1` | `3c5f7ea28cd6…` |
| `git-xet-release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `git-xet-release.yml` | `actions/upload-artifact` | `v6` | `v6` | `b7c566a772e6…` |
| `git-xet-release.yml` | `actions/checkout` | `v6` | `v6.0.2` | `de0fac2e4500…` |
| `git-xet-release.yml` | `actions/download-artifact` | `v7` | `v7` | `37930b1c2aba…` |

> 🤖 Generated by `/github-actions-audit`: [security/pin-actions-to-sha]

Closes huggingface/tracking-issues#291

Co-authored-by: di <di@huggingface.co>	2026-04-02 11:23:49 -07:00
Di Xiao	1f0918c33e	Refactor XetSession commit / group CAS endpoint and auth configuration (#771)

There's no publicly documented Xet CAS endpoint. To interact with Xet CAS, all public clients must obtain the CAS endpoint from the same route they use to obtain a CAS token. Currently, users need to:

1. construct a CAS token URL for a specific operation ("read" or "write", target repo type, target repo, target revision),
2. send a request to this URL to get a CAS token and CAS endpoint,
3. use the CAS endpoint to build a `XetSession`,
4. use the `XetSession` instance, the CAS token, and the CAS token URL to build an upload or download group.

This is a rather complicated setup. This PR removes this blocker by eagerly "refreshing" the CAS token when no CAS endpoint is provided, so users can:

1. build a `XetSession`,
2. construct a CAS token URL for a specific operation ("read" or "write", target repo type, target repo, target revision),
3. use the `XetSession` instance and the CAS token URL to build an upload or download group.

In practice, there will be two common patterns:

Pattern A: the endpoint is known ahead of time. No eager refresh happens, and `token_info` is used as-is.

```
let session = XetSessionBuilder::new().build()?;
let commit = session
    .new_upload_commit()?
    .with_endpoint(cas_url)
    .with_token_info(token, expiry)
    .with_token_refresh_url(refresh_url, auth_headers /* Auth headers */)
    .build_blocking()?;
```

Pattern B: the endpoint is unknown. The build call fetches it, and `token_info` is seeded from the response.

```
let session = XetSessionBuilder::new().build()?;
let commit = session
    .new_upload_commit()?
    .with_token_refresh_url(token_refresh_url, auth_headers /* Auth headers */)
    .build_blocking()?;
```

Other changes:

1. `with_endpoint()` and `with_custom_headers()` configuration moves from the `XetSession` level down to the operation level, because multiple operations with different CAS endpoints can co-exist in the same session instance.
2. The builders for the different operations (`XetUploadCommit`, `XetFileDownloadGroup`, `XetDownloadStreamGroup`) are refactored to share common code under `struct AuthGroupBuilder<G>`.	2026-04-02 11:07:07 -07:00
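The decision between Patterns A and B comes down to one question: is an endpoint set or not? A sketch of that logic follows. `AuthGroupBuilder`, `Auth` and `RefreshResponse` here are simplified, hypothetical stand-ins: they are synchronous and make no HTTP calls, and field names such as `cas_url` and `access_token` are assumed. The real `AuthGroupBuilder<G>` is generic over the operation type and performs the refresh over the network.

```rust
/// Hypothetical resolved auth configuration for one operation.
#[derive(Debug, PartialEq)]
pub struct Auth {
    pub endpoint: String,
    pub token_info: Option<(String, u64)>,
}

/// Hypothetical payload returned by the token refresh route.
pub struct RefreshResponse {
    pub cas_url: String,
    pub access_token: String,
    pub exp: u64,
}

#[derive(Default)]
pub struct AuthGroupBuilder {
    endpoint: Option<String>,
    token_info: Option<(String, u64)>,
}

impl AuthGroupBuilder {
    pub fn with_endpoint(mut self, url: &str) -> Self {
        self.endpoint = Some(url.to_string());
        self
    }

    pub fn with_token_info(mut self, token: &str, expiry: u64) -> Self {
        self.token_info = Some((token.to_string(), expiry));
        self
    }

    /// `refresh` stands in for calling the token refresh URL.
    pub fn build<R>(self, refresh: R) -> Result<Auth, String>
    where
        R: FnOnce() -> Result<RefreshResponse, String>,
    {
        match self.endpoint {
            // Pattern A: endpoint known, so no eager refresh; token_info as-is.
            Some(endpoint) => Ok(Auth { endpoint, token_info: self.token_info }),
            // Pattern B: endpoint unknown, so refresh now and seed both values.
            None => {
                let r = refresh()?;
                Ok(Auth { endpoint: r.cas_url, token_info: Some((r.access_token, r.exp)) })
            }
        }
    }
}
```

In Pattern A the refresh closure is never invoked at build time. In Pattern B, a single call supplies both the endpoint and the initial token.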
Assaf Vayner	20198a9081	Remove prometheus dependency and metrics (#769)

## Summary

- Remove the `prometheus` crate dependency from the workspace and `xet_data`
- Delete `prometheus_metrics.rs`, which defined 3 `IntCounter` metrics (CAS bytes produced, bytes cleaned, bytes smudged)
- Remove metric increment calls from `file_upload_session.rs` and `file_download_session.rs`
- Fix Windows CI flake: redb "Database already open" error in `test_single_large`

These metrics were collected but never exposed via any HTTP endpoint or text encoder, which made them effectively dead code.

## Test plan

- [x] `cargo +nightly fmt`: clean
- [x] `cargo clippy --all-targets`: no new warnings
- [x] `cargo test -p xet-data`: 17/17 pass
- [x] `cargo test -p xet-data --features simulation --test test_clean_smudge`: 14/14 pass (including `test_single_large`)
- [x] WASM builds (`hf_xet_wasm`, `hf_xet_thin_wasm`): both succeed

> [!NOTE]
> Low Risk
>
> Low risk: this removes unused Prometheus metrics plumbing and related dependencies without changing the core upload/download logic. The main risk is loss of any downstream reliance on these counters at build time (e.g., feature flags or imports).
>
> Overview
>
> Removes the `prometheus` dependency from the workspace and `xet_data`, and updates lockfiles accordingly (including WASM-related lockfiles).
>
> Deletes `xet_data`’s `prometheus_metrics` module and strips the associated counter increments from `FileUploadSession` and `FileDownloadSession`, leaving the data processing behavior unchanged aside from no longer recording these metrics.
>
> Written by Cursor Bugbot for commit `c6c866b7ca`.	2026-04-01 14:56:58 -07:00
Adrien	3d377bdffb	feat: optional chunk cache in download path for cross-file dedup (#731)

## Context

These changes support the hf-mount project, which needs cross-file chunk deduplication during downloads.

## Summary

- Adds an optional `ChunkCache` to the download path (`FileDownloadSession`, `FileReconstructor`, `XorbBlock`). When provided, xorb blocks are looked up in the cache before HTTP requests and stored after download.
- Cache hits skip permit acquisition, so they don't consume network concurrency slots. This enables cross-file deduplication for mount-style workloads.
- Breaking change to the `FileDownloadSession::new()` and `from_client()` signatures (new `chunk_cache: Option<Arc<dyn ChunkCache>>` parameter). All existing callers pass `None`.

> [!NOTE]
> Medium Risk
>
> Touches the core download/reconstruction path and changes session constructor signatures; cache-hit/miss behavior affects concurrency permits and progress reporting. Risk is mitigated by being opt-in (`None` for existing callers), but incorrect cache keys or offsets could corrupt reconstructed output or skew progress.
>
> Overview
>
> Adds an optional `ChunkCache` to the download pipeline to enable cross-file xorb/chunk dedup during reconstruction.
>
> `FileDownloadSession` now accepts/stores `chunk_cache` and wires it into `FileReconstructor`, which passes it down into `FileTerm`/`XorbBlock` retrieval. `XorbBlock::retrieve_data` now checks the cache before acquiring CAS download permits (so cache hits avoid consuming network concurrency), and writes downloaded blocks back to the cache asynchronously on a best-effort basis (logging failures).
>
> This also introduces a small refactor (`build_chunk_offsets`) and updates all call sites/tests/examples to the new `FileDownloadSession::new(..., chunk_cache)` / `from_client(..., chunk_cache)` signatures (currently passing `None`).
>
> Written by Cursor Bugbot for commit `f4fdea5175`.	2026-04-01 17:29:28 +02:00
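Below is a synchronous sketch of the lookup order described in #731. The downloader checks the cache first. Only on a miss does it acquire a permit, fetch the block and write it back. `Downloader`, the `u64` block key and the permit counter are hypothetical simplifications. The real `XorbBlock::retrieve_data` is async, takes an `Arc<dyn ChunkCache>`, and writes back in the background.

```rust
use std::collections::HashMap;

/// Hypothetical downloader: an optional in-memory cache keyed by block id,
/// plus a counter standing in for acquired network permits.
#[derive(Default)]
pub struct Downloader {
    pub cache: Option<HashMap<u64, Vec<u8>>>,
    pub permits_acquired: usize,
}

impl Downloader {
    /// Serves from the cache when possible; only misses consume a permit.
    pub fn retrieve(&mut self, block_id: u64, fetch: impl FnOnce(u64) -> Vec<u8>) -> Vec<u8> {
        if let Some(cache) = &self.cache {
            if let Some(hit) = cache.get(&block_id) {
                // Cache hit: no permit acquired, no network request.
                return hit.clone();
            }
        }
        // Cache miss (or no cache): acquire a permit, then download.
        self.permits_acquired += 1;
        let data = fetch(block_id);
        if let Some(cache) = self.cache.as_mut() {
            // Write back so later files that share this block can reuse it.
            cache.insert(block_id, data.clone());
        }
        data
    }
}
```

With `cache: None`, the behavior matches the pre-change path: every call acquires a permit.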
Hoyt Koepke	3051478cdd	Allow shard expiration to be set on global dedup queries for GC simulation (#762)

Currently, simulation global dedup shard queries return full shard bytes with no configurable shard footer expiration. The simulation control knobs are also split across partially implemented paths.

This PR adds global dedup shard expiration control to simulation clients and servers. It also extends `/simulation/set_config` so that one path covers shard expiration, max range splitting, disabling V2 reconstruction, API delay, and URL expiration. Setting the global dedup expiration to a sub-epoch value then enables rapid simulation of the GC paths.

> [!NOTE]
> Medium Risk
>
> Touches simulation client/server APIs and shard serialization behavior (including new trait methods and HTTP knobs), so downstream implementors and tests may break if not updated. Changes are scoped to simulation/GC tooling paths but affect how global-dedup shard bytes are produced and validated.
>
> Overview
>
> Adds a new simulation control to set global-dedup shard expiration: `DirectAccessClient::set_global_dedup_shard_expiration` now makes `query_for_global_dedup_shard` optionally return minimal shard bytes (file section stripped) with `shard_key_expiry = now + expiration` (sub-second durations round up).
>
> Extends `MDBMinimalShard` serialization with `serialize_xorb_subset_with_expiry` to write an optional `shard_key_expiry` footer, and updates `LocalClient`/`MemoryClient` to use it when expiration is enabled.
>
> Unifies and expands runtime simulation knobs under `/simulation/set_config` (global dedup expiration, max ranges per fetch, disable V2 reconstruction, API delay, URL expiration) and updates `SimulationControlClient` to apply them via a retried async POST. Also moves integrity/reachability checks to `DeletionControlableClient`, adds `verify_all_reachable`, and wires up a new `/simulation/verify_all_reachable` route that returns 501 when no deletion client is configured.
>
> Separately, introduces simulation-only xorb cut thresholds (`XORB_CUT_THRESHOLD_*`) driven by new `xet_runtime` xorb config overrides, and updates upload/dedup code paths to use these thresholds.
>
> Written by Cursor Bugbot for commit `42bd9c3f4f`.	2026-03-31 18:35:19 -07:00
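The rounding rule for `shard_key_expiry = now + expiration` says that sub-second durations round up. A small helper can sketch it. The sketch assumes the expiry is stored as whole seconds since the Unix epoch. The function name and signature are hypothetical.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: whole-second expiry timestamp. Any sub-second part of
/// `expiration` rounds up, so a 200 ms expiration still yields `now + 1`.
pub fn shard_key_expiry(now: SystemTime, expiration: Duration) -> u64 {
    let now_secs = now
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs();
    let mut secs = expiration.as_secs();
    if expiration.subsec_nanos() > 0 {
        secs += 1; // round the fractional second up
    }
    now_secs + secs
}
```

Rounding up guarantees that a tiny test expiration never collapses to "already expired" at creation time. A zero duration still maps to `now`.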
Assaf Vayner	7d97aa3066	Replace heed (LMDB) with redb in local CAS simulation (#766)

This is an optional change. heed pulls in a bunch of dependencies, and it uses LMDB, which may require extra compilation/linking steps in tests. We only use it for a small subset of operations in testing, so I thought we might try an even thinner Rust-native dependency instead; that's what redb is.

## Summary

- Replace `heed` (C LMDB bindings) with `redb` (pure Rust embedded KV store) in `LocalClient`
- Removes the C dependency, an `unsafe` block, the Windows retry workaround, and a custom `Drop` impl
- Introduces a `RedbHash` newtype wrapper for `MerkleHash` to satisfy orphan rules on redb's `Key`/`Value` traits
- Net reduction of ~130 lines; all 147 existing tests pass

## Test plan

- [x] `cargo check -p xet-client --features simulation`: clean
- [x] `cargo test -p xet-client --features simulation`: 147 passed, 0 failed
- [x] `cargo clippy -p xet-client --features simulation`: clean
- [x] `cargo +nightly fmt`: clean

> [!NOTE]
> Medium Risk
>
> Swaps the embedded KV store used for shard dedup/deletion metadata in the local CAS simulation, which can affect test behavior and on-disk state/locking semantics (especially with concurrent clients). Scope is contained to simulation/test code and dependency graph changes.
>
> Overview
>
> Switches `LocalClient`’s disk-backed global-dedup and file deletion status storage from `heed`/LMDB to `redb`, including new `RedbHash` serialization, `TableDefinition`s, and updated read/write transaction flows.
>
> Adds a small global database-handle cache to avoid `redb` exclusive-lock conflicts across multiple `LocalClient` instances, and removes the prior LMDB-specific open/retry logic and custom `Drop` close path. Workspace dependencies/lockfiles are updated to drop `heed`/LMDB-related crates and add `redb`, and `.gitignore` now ignores `.worktrees/`.
>
> Written by Cursor Bugbot for commit `02d39864d9`.	2026-03-31 15:23:18 -07:00
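A process-wide handle cache has two jobs here: never open the same database file twice, and never keep a file open by itself. One way to do both is to key `Weak` handles by canonicalized path, as sketched below. `DbHandle` and `open_db` are hypothetical. A real version would open the `redb` database at that point, while this sketch only constructs a placeholder.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, OnceLock, Weak};

/// Hypothetical stand-in for an opened database handle.
pub struct DbHandle {
    pub path: PathBuf,
}

/// Global map from canonical path to a weak handle.
fn cache() -> &'static Mutex<HashMap<PathBuf, Weak<DbHandle>>> {
    static CACHE: OnceLock<Mutex<HashMap<PathBuf, Weak<DbHandle>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the live handle for `path` if one exists; otherwise "opens" a new one.
/// Weak entries mean the cache itself never keeps a database open.
pub fn open_db(path: &Path) -> std::io::Result<Arc<DbHandle>> {
    // Canonicalize so different spellings of the same path share one entry.
    let key = path.canonicalize()?;
    let mut map = cache().lock().unwrap();
    if let Some(existing) = map.get(&key).and_then(Weak::upgrade) {
        return Ok(existing);
    }
    let handle = Arc::new(DbHandle { path: key.clone() });
    map.insert(key, Arc::downgrade(&handle));
    Ok(handle)
}
```

Because the map stores `Weak` handles, the file is closed once the last `Arc` holder drops it, and a later `open_db` starts fresh.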
Assaf Vayner	86935b4117	Move test-only deps to dev-dependencies in git_xet (#767)

## Summary

- Move `russh`, `rand_core`, and `tempfile` from regular dependencies to dev-dependencies in `git_xet`, since they are only used in test code
- `russh` and `rand_core` are also declared as optional regular deps activated by the `git-xet-for-integration-test` feature flag, since the integration test SSH server is compiled into the library under that feature
- Gate the `test_utils/ssh_server` module and related exports behind `#[cfg(any(test, feature = "git-xet-for-integration-test"))]`
- Gate the `tests/test_ssh.rs` integration test file behind `#![cfg(feature = "git-xet-for-integration-test")]`

## Test plan

- [x] `cargo check -p git_xet` passes (no features)
- [x] `cargo test -p git_xet --no-run` passes (no features)
- [x] `cargo test -p git_xet --features git-xet-for-integration-test --no-run` passes

> [!NOTE]
> Low Risk
>
> Low risk: primarily Cargo dependency/feature and `cfg` gating changes, with no production logic changes; risk is limited to build/test configuration and feature-flagged integration test coverage.
>
> Overview
>
> Reduces default build dependencies for `git_xet`. Moves `russh`, `rand_core`, and `tempfile` into `dev-dependencies`, and keeps `russh`/`rand_core` available as optional deps enabled only by the `git-xet-for-integration-test` feature.
>
> Gates SSH test helpers and integration tests behind a feature flag. Exposes `GitLFSAuthenticateResponse*` and the local SSH test server only under `#[cfg(test)]` or `feature = "git-xet-for-integration-test"`, and makes `tests/test_ssh.rs` compile only when that feature is enabled.
>
> Separately, cleans up workspace manifests/lockfiles by moving some crates (`half`, `regex`, `futures-util`) to dev-deps where they’re only needed for tests/benches, and adds `.worktrees/` to `.gitignore`.
>
> Written by Cursor Bugbot for commit `cdc30a5a8f`.	2026-03-31 13:31:20 -07:00
Di Xiao	126f30b981	Prevent multiple XetSession (XetRuntime) instances from attaching to the same external tokio runtime (#757)

When `XetRuntime` wraps an external tokio handle (External mode), it registers the handle ID in `EXTERNAL_RUNTIME_REGISTRY`. This lets `XetRuntime::current()` look up the correct instance from any task, and thus obtain the associated `XetConfig`.

Previously, a second `from_external_with_config` call with the same handle would silently overwrite the registry entry and break the first runtime's `current()` lookups. Tasks spawned from the first `XetRuntime` could then no longer reach their own `XetRuntime` or its config, which is not the intended behavior.

This PR makes the second call fail with an explicit error instead. It checks whether `EXTERNAL_RUNTIME_REGISTRY` already contains an entry keyed by the ID of the tokio runtime it is trying to attach to. If so, it returns a `RuntimeError::InvalidRuntime` error.	2026-03-30 11:36:02 -07:00
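The check added in #757 amounts to refusing to overwrite an existing registry entry. Below is a minimal std-only sketch. It assumes a `u64` runtime ID and uses a string in place of the per-runtime state. `RegistryError`, `attach_external` and `lookup` are hypothetical names. The real code keys on the tokio runtime ID and returns `RuntimeError::InvalidRuntime`.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum RegistryError {
    AlreadyAttached(u64),
}

/// Stand-in for `EXTERNAL_RUNTIME_REGISTRY`: runtime id -> config name.
fn registry() -> &'static Mutex<HashMap<u64, String>> {
    static REG: OnceLock<Mutex<HashMap<u64, String>>> = OnceLock::new();
    REG.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Registers `config` for `runtime_id`, refusing to overwrite an existing entry.
pub fn attach_external(runtime_id: u64, config: &str) -> Result<(), RegistryError> {
    let mut map = registry().lock().unwrap();
    if map.contains_key(&runtime_id) {
        // Overwriting would break lookups for the first attached instance.
        return Err(RegistryError::AlreadyAttached(runtime_id));
    }
    map.insert(runtime_id, config.to_string());
    Ok(())
}

/// What a `current()`-style lookup would see for `runtime_id`.
pub fn lookup(runtime_id: u64) -> Option<String> {
    registry().lock().unwrap().get(&runtime_id).cloned()
}
```

The check and the insert happen under one lock, so two concurrent attach attempts cannot both succeed.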
Hoyt Koepke	29acd7a981	Fix for download streams swallowing errors into generic "Channel closed" message. (#765)

Previously, when an error occurred, the channel stream could close before the error reached the user-facing iterators. When that happened, it was random whether the user saw the "channel closed" error or the original error. This PR ensures that the actual error that caused the shutdown is surfaced to the user.

> [!NOTE]
> Medium Risk
>
> Adjusts error handling across async/blocking stream consumption and the sequential writer thread, affecting concurrency and shutdown paths. Risk is moderate due to potential behavior changes in edge cases when channels close during failures.
>
> Overview
>
> Prevents download streaming APIs from masking reconstruction failures as generic "channel closed" errors.
>
> When a per-chunk `oneshot` receiver is dropped/closed, `DownloadStream::{next,blocking_next}` and the sequential writer thread (`SyncWriterThread::next_write`) now first call `run_state.check_error()` to surface the actual underlying error before falling back to an internal writer error.
>
> Wires `RunState` into `SyncWriterThread` so the background writer path can perform the same error propagation check.
>
> Written by Cursor Bugbot for commit `e33b30f076`.	2026-03-30 10:07:44 -07:00
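The ordering fix in #765 is to check the shared run state before reporting a closed channel. It can be sketched with `std::sync::mpsc` in place of the per-chunk `oneshot` channels. `RunState` and `next_chunk` here are hypothetical, minimal versions that store only the first error, as a `String`.

```rust
use std::sync::{mpsc, Mutex};

/// Hypothetical shared run state: remembers the first error any worker hit.
#[derive(Default)]
pub struct RunState {
    first_error: Mutex<Option<String>>,
}

impl RunState {
    pub fn set_error(&self, e: &str) {
        let mut slot = self.first_error.lock().unwrap();
        if slot.is_none() {
            *slot = Some(e.to_string());
        }
    }

    pub fn check_error(&self) -> Result<(), String> {
        match &*self.first_error.lock().unwrap() {
            Some(e) => Err(e.clone()),
            None => Ok(()),
        }
    }
}

/// Receives the next chunk. If the channel is closed, prefer the recorded
/// root-cause error over a generic "channel closed" message.
pub fn next_chunk(rx: &mpsc::Receiver<Vec<u8>>, state: &RunState) -> Result<Vec<u8>, String> {
    match rx.recv() {
        Ok(chunk) => Ok(chunk),
        Err(_) => {
            state.check_error()?;
            Err("channel closed".to_string())
        }
    }
}
```

If a worker records an error and then drops its sender, `next_chunk` returns that error. The generic message appears only when no root cause was recorded.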
Hoyt Koepke	a81fc5800a	Rerun hub tests on failures (#761)

Occasionally, the tests that run directly against the Hub fail because of intermittent issues. With this PR, CI reruns these tests up to two more times after a failure and passes them if a later attempt succeeds.	2026-03-30 09:59:11 -07:00
Adrien	f781498b68	fix: truncate local file on full-file download to prevent corruption (#764)

## Summary

- Fixes a data corruption bug where downloading a file smaller than an existing local file left stale trailing bytes intact
- The file was always opened with `truncate(false)`, which concurrent partial-range writes need; full-file downloads now use `truncate(true)`
- Adds regression test `test_full_file_truncates_larger_existing_file`

Ref: https://github.com/huggingface/huggingface_hub/issues/3995

> [!NOTE]
> Medium Risk
>
> Changes on-disk write semantics for reconstructed downloads by optionally truncating the destination file, which affects data integrity and could impact concurrent/partial-write callers if misused.
>
> Overview
>
> Fixes a corruption case where full-file downloads could leave stale trailing bytes when writing over an existing larger file, by adding a `truncate_file` flag to `FileReconstructor::reconstruct_to_file` and wiring it to `OpenOptions::truncate()`.
>
> Updates the full-file download flow (`FileDownloadSession::download_file_with_id`) to pass `truncate_file=true`, while keeping benchmarks/tests and range/concurrent write paths passing `false` to preserve existing behavior for partial/concurrent writes.
>
> Written by Cursor Bugbot for commit `ed33dab9a1`.

Co-authored-by: di <di@huggingface.co>	2026-03-30 17:57:23 +02:00
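The effect of the `truncate_file` flag can be shown directly with `std::fs::OpenOptions`. The helper below writes bytes at an offset. It is a hypothetical simplification, not the actual `FileReconstructor::reconstruct_to_file`.

```rust
use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical writer: writes `data` at `offset`. Full-file downloads pass
/// `truncate_file = true`; concurrent range writes must pass `false` so they
/// do not wipe bytes written by other ranges.
pub fn write_at(path: &Path, offset: u64, data: &[u8], truncate_file: bool) -> std::io::Result<()> {
    let mut f = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(truncate_file)
        .open(path)?;
    f.seek(SeekFrom::Start(offset))?;
    f.write_all(data)?;
    Ok(())
}
```

Writing three bytes over a longer existing file with `truncate_file = false` keeps the old trailing bytes. That is exactly the stale-byte corruption this fix addresses for full-file downloads.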
Di Xiao	15011cb230	XetSession uses direct token refresh route instead of a callback (#751 ) This PR makes two significant, breaking API redesigns: 1. Auth tokens move from the session level (shared by all operations) to the per-operation level (per `UploadCommit`, `FileDownloadGroup`, and `DownloadStreamGroup`). This lets uploads and downloads from the same session carry different access-level tokens, a sensible design for HF's write-vs-read token split. 2. Instead of letting users provide a callback to refresh tokens, the new API lets users provide a token refresh URL and access credentials in an HTTP header map. ### Why 1. CAS JWTs are short-lived, but `XetSession` is intended to be held for a long time -- so it makes more sense to configure CAS auth at the operation level (`UploadCommit`, `FileDownloadGroup`, or `DownloadStreamGroup`), where it is discarded once the operation is done. 2. For different access levels (write vs. read) and different operation targets (repo and commit), the CAS JWT will differ, and so will the token refresh URL. `UploadCommit`, `FileDownloadGroup`, and `DownloadStreamGroup` each also function as a single auth group. 3. Providing a URL is considered easier than writing a callback, and is safer when crossing the Python-Rust boundary under the GIL. Examples: ``` // Upload token (write access) let mut upload_headers = HeaderMap::new(); upload_headers.insert("Authorization", "Bearer hub-write-token".parse().unwrap()); let commit = session .new_upload_commit()? .with_token_info("CAS_WRITE_JWT", 900) .with_token_refresh_url("https://huggingface.co/api/repos/token/write", upload_headers) .build_blocking()?; ``` ``` // File download token (read access) let mut dl_headers = HeaderMap::new(); dl_headers.insert("Authorization", "Bearer hub-read-token".parse().unwrap()); let group = session .new_file_download_group()?
.with_token_info("CAS_READ_JWT", 900) .with_token_refresh_url("https://huggingface.co/api/repos/token/read", dl_headers) .build_blocking()?; ``` Secondary changes include: - `DirectRefreshRouteTokenRefresher` consolidated into `xet_client::cas_client::auth`. - HTTP client module moved from `cas_client` to `xet_client::common` for shared use between `xet_client::cas_client` and `xet_client::hub_client`. - New `DownloadStreamGroup` type (streaming downloads moved off `XetSession`). - Fix Session ID type regression: this was fixed once in https://github.com/huggingface/xet-core/pull/738 but regressed again; it seems AI agents don't learn. - HTTP client cache key now incorporates custom headers.	2026-03-30 08:39:25 -07:00
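One way to picture the per-operation token lifecycle is a cached CAS JWT that is refreshed only when it is close to expiring. This is a hypothetical sketch, and several details are assumptions: the names `CachedToken` and `valid_token`, the `margin_secs` safety window, and expiry expressed as Unix seconds. The `refresh` closure stands in for the HTTP request to the configured refresh URL and its header map.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Current time in Unix seconds (0 if the clock is before the epoch).
pub fn now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or(Duration::ZERO)
        .as_secs()
}

/// Hypothetical per-operation token cache: a CAS JWT plus its expiry.
pub struct CachedToken {
    pub token: String,
    pub expires_at: u64, // assumed Unix seconds
}

impl CachedToken {
    pub fn new(token: &str, expires_at: u64) -> Self {
        CachedToken { token: token.to_string(), expires_at }
    }

    /// Return a token valid for at least `margin_secs`, refreshing first if needed.
    /// `refresh` stands in for calling the refresh URL; it returns (token, expiry).
    pub fn valid_token<F>(&mut self, margin_secs: u64, refresh: F) -> Result<&str, String>
    where
        F: FnOnce() -> Result<(String, u64), String>,
    {
        if now_secs() + margin_secs >= self.expires_at {
            let (token, expires_at) = refresh()?;
            self.token = token;
            self.expires_at = expires_at;
        }
        Ok(&self.token)
    }
}
```

A still-valid token is returned without any network call. Only an expiring token triggers `refresh`.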
Hoyt Koepke	051dee52ea	WASM wrappers for MerkleHashSubtree (#758 ) This PR exposes the new MerkleHashSubtree class for managing groups of hashes to WASM.	2026-03-30 08:09:14 -07:00
Assaf Vayner	9c0cb6e4c8	Reduce workspace dependencies (batches 1-3) (#746 ) ## Summary - Remove unused dependencies: warp (zero imports), paste (zero invocations), tower-service (zero imports), and heed misplacement in xet_core_structures - Move mockall to dev-dependencies in xet_client by gating `#[automock]` with `#[cfg_attr(test, automock)]` - Feature-gate simulation module behind `simulation` cargo feature in xet_client, making axum, heed, humantime, futures-util, human-bandwidth, and tower-http optional - Replace duration-str with humantime (~2 deps vs ~78 transitive deps) across xet_runtime, xet_client simulation, and simulation crate ## Impact \| Metric \| Before \| After \| Change \| \|---\|---\|---\|---\| \| hf-xet production deps \| 371 \| 321 \| -50 \| \| Workspace total \| 575 \| 569 \| -6 \| ## Test plan - [x] `cargo check --workspace` passes - [x] `cargo check -p hf-xet` passes (without simulation feature — key validation) - [x] `cargo test --workspace` — all tests pass (4 pre-existing auth test failures in git_xet unrelated to this PR) - [x] `cargo tree -p hf-xet -e normal --prefix none \| sort -u \| wc -l` confirms 321 deps 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Medium risk because it changes dependency graph and Cargo feature gating (notably `xet-client` simulation modules and CI test features), which can affect build/test behavior across targets despite minimal runtime logic changes. > > Overview > Reduces workspace dependency surface by removing `duration-str` (replaced with `humantime`) and trimming other transitive-heavy crates; updates lockfiles accordingly across the workspace, `hf_xet`, and WASM builds. 
> > Introduces/propagates a `simulation` Cargo feature: `xet-client`’s simulation server-related deps become optional and are only compiled/exported when `feature = "simulation"` is enabled; `git_xet` adds a `simulation` feature that forwards to dependent crates, and CI now runs tests with `strict simulation git-xet-for-integration-test`. > > Minor repo hygiene updates include ignoring `.claude/` in `.gitignore` and wiring the `simulation` crate to depend on `xet-client` with `features = ["simulation"]` (plus swapping its duration parsing helper to `humantime`). > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `6abc194398`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 09:54:36 -07:00
Hoyt Koepke	69962587b5	Composable Hash Functionality (#745 ) Currently, computing aggregate chunk hashes across independently processed ranges requires recomputing over the full concatenated chunk list. This PR introduces ChunkHashRange, a composable representation that can hash contiguous partial ranges and merge them while preserving equivalence with the existing xorb_hash / file_hash behavior. This allows an intermediate representation of hash ranges that can be merged in arbitrary order to get the final hash. It uses O(log(n)) storage, and all operations run in linear time. Serialization and deserialization are fully supported. The main use case is partial file edits. Previously, to edit the middle of a large file, the client had to know all the chunk hashes for the full file, even if only a few in the middle changed. For a large file this can still be 100s of MB, since the chunk metadata is roughly 1/1000 of the data size. With this change, the unmodified parts of a file can be transmitted in O(log(n)) storage while still allowing the entire file hash to be built; a sequence of 10M chunks now takes roughly the storage of ~500 chunks. Along the way, we also added an optimization to the merge step that avoids an allocation, yielding a 2x speedup. --------- Co-authored-by: Hoyt Koepke <hoytak@xethub.com>	2026-03-27 08:38:59 -07:00
Hoyt Koepke	c90f0a7bd9	Session API Polish; unify task handling/cancellation behavior. (#747 ) Previously, upload and download paths each had their own ad-hoc state tracking, cancellation, and runtime bridging logic. TaskRuntime consolidates this into a single type that owns a CancellationToken tree, tracks Running/Finished/Cancelled state with recursive propagation to children, and provides bridge_async/bridge_sync wrappers that automatically wire up tokio::select! cancellation. Session → commit/group → per-file handles form a parent-child token tree, so aborting a session cancels all descendant work. The upload path gets new UploadFileHandle and UploadStreamHandle wrapper types (replacing the old UploadTaskHandle), with inner/wrapper pattern for cheap cloning. UploadCommit::commit() now returns a CommitReport containing aggregate dedup metrics, progress, and per-file FileMetadata. The download path mirrors this structure: FileDownloadGroup uses TaskRuntime for state gating and owns bespoke DownloadTaskHandle instances with per-task status and result access. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > High Risk > High risk due to a breaking redesign of the public `xet_session` API (new handle/report types and renamed methods) plus new cancellation/state machinery that changes how uploads/downloads are coordinated and terminated. > > Overview > Redesigns `xet_pkg::xet_session` around a new hierarchical `TaskRuntime` (using `tokio-util` cancellation tokens) to unify state, bridging, and cancellation across session → commit/group → per-file handles. > > Replaces the old task-handle/result model (`tasks.rs`, `UploadResult`/`DownloadResult`, `TaskStatus`, group/session state enums) with explicit handle/report types: `XetFileUpload`, `XetStreamUpload`, `XetFileDownload`, `XetCommitReport`, and `XetDownloadGroupReport`, and standardizes task state via `XetTaskState`. 
> > Adjusts APIs and error semantics: `commit()` now returns an aggregate report (dedup metrics + progress + per-file metadata) and no longer consumes `self`; progress methods become infallible (`progress()`); cancellations/errors are consolidated (`AlreadyCompleted`, `UserCancelled`, `KeyboardInterrupt`, `TaskError`/`PreviousTaskError`) with updated Python exception mapping. `xet_data` now returns per-file `DeduplicationMetrics` from upload tasks and adds a zero-copy `SingleFileCleaner::add_data_from_bytes`. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `153a3ebbbe`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-27 07:54:37 -07:00
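The session → commit/group → per-file cancellation tree can be sketched with std atomics. The PR uses `tokio-util` cancellation tokens. This std-only `CancelNode` is a hypothetical stand-in that checks ancestors on demand rather than waking waiting tasks.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical cancellation node: cancelled if it or any ancestor is cancelled.
#[derive(Clone, Default)]
pub struct CancelNode {
    cancelled: Arc<AtomicBool>,
    parent: Option<Arc<CancelNode>>,
}

impl CancelNode {
    /// Root node, e.g. for a session.
    pub fn new() -> Self {
        Self::default()
    }

    /// Child node, e.g. session -> commit/group -> per-file handle.
    /// The child shares its parent's flag through the cloned `Arc`s.
    pub fn child(&self) -> Self {
        CancelNode {
            cancelled: Arc::new(AtomicBool::new(false)),
            parent: Some(Arc::new(self.clone())),
        }
    }

    /// Cancel this node (and, implicitly, all of its descendants).
    pub fn cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }

    /// Walk up the chain so that cancelling a session is seen by every descendant.
    pub fn is_cancelled(&self) -> bool {
        let mut node = Some(self);
        while let Some(n) = node {
            if n.cancelled.load(Ordering::SeqCst) {
                return true;
            }
            node = n.parent.as_deref();
        }
        false
    }
}
```

Cancellation propagates down the tree but never up. Aborting one commit leaves its session and sibling groups untouched.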
Di Xiao	b3edd92a10	Split out cargo bench compile check (#753 ) Acknowledged that running "cargo bench --no-run" on every test platform is slow. This PR - extracts benchmark compilation verification from the Linux and macOS build_and_test jobs into a dedicated `check-bench-compiles` job so it runs in parallel with the cargo test jobs; - also skips compiling "git_xet" in release mode which itself doesn't contain benchmarks and takes the longest to compile due to optimized linking; - also removes unused clippy component installs from Windows and macOS toolchain setup. See below that the `check-bench-compiles` job finishes faster than `build_and_test-linux` and `build_and_test-win`, so it's not introducing extra wait time.	2026-03-25 22:25:20 -07:00
Di Xiao	101837f691	Remove cargo bench from Windows CI (#752 ) As discussed, removing this step from Windows CI because it's just too slow on Windows.	2026-03-23 17:23:42 -07:00
Hoyt Koepke	ffee6a978c	Move session runtime decisions to XetRuntime (#742 ) Currently, session runtime routing is split between XetSession and XetRuntime. This PR centralizes runtime routing in XetRuntime, moving all wrapper structs there. Now, bridge_async / bridge_sync work universally to bridge calls from both async and sync contexts. This PR also changes the default behavior so that the default new() method auto-detects whether the process can run inside an existing tokio runtime with the required features enabled or must create a new one. In addition, with_tokio_handle() now errors out if the provided tokio handle doesn't have the required features.	2026-03-20 20:23:15 -07:00
Adrien	7b33764330	feat: make start_clean size parameter optional for streaming uploads (#732 ) ## Context These changes support the hf-mount project, where FUSE streaming uploads don't know the file size in advance. ## Summary - Changes the `size` parameter of `FileUploadSession::start_clean()` from `u64` to `Option<u64>`. Passing `None` signals that the final file size is unknown (FUSE streaming uploads), which prevents `debug_assert` panics when `completed_bytes` exceeds the initially declared `total_bytes=0`. - Propagates `Option<u64>` to the public API: `UploadCommit::upload_file()` and `upload_file_blocking()` now take `file_size: Option<u64>`. - All existing callers are updated to wrap the size argument in `Some(...)`. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Public upload APIs now accept an optional file size, which is a breaking signature change and could affect downstream callers and progress tracking behavior when size is `None`. Implementation changes are small but touch core upload session and commit interfaces. > > Overview > Enables streaming uploads where the final file size is not known up front by changing `FileUploadSession::start_clean` to take `Option<u64>` and treating `None` as an unknown size for progress/completion tracking. > > Propagates this optional-size API through `UploadCommit::upload_file` / `upload_file_blocking` and updates all internal callers, examples, and tests to pass `Some(size)` when the size is known, along with doc updates reflecting the new `None` semantics. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `8b41e11e24`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-20 23:33:18 +01:00
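Here is a minimal sketch of progress tracking with an optional total. It assumes a hypothetical `UploadProgress` type, not the xet_data tracker. With `Some(size)` the usual bound check applies. With `None`, only the completed byte count is meaningful, so the debug assertion the PR mentions cannot fire.

```rust
/// Hypothetical progress record for an upload whose final size may be unknown.
#[derive(Debug, Default)]
pub struct UploadProgress {
    pub total_bytes: Option<u64>,
    pub completed_bytes: u64,
}

impl UploadProgress {
    pub fn new(size: Option<u64>) -> Self {
        UploadProgress { total_bytes: size, completed_bytes: 0 }
    }

    /// Record processed bytes. The bound check only applies when the size is known.
    pub fn add(&mut self, n: u64) {
        self.completed_bytes += n;
        if let Some(total) = self.total_bytes {
            debug_assert!(self.completed_bytes <= total, "more bytes than declared size");
        }
    }

    /// Fraction complete, or `None` when the size is unknown (e.g. FUSE streaming).
    pub fn fraction(&self) -> Option<f64> {
        match self.total_bytes {
            Some(0) => Some(1.0),
            Some(t) => Some(self.completed_bytes as f64 / t as f64),
            None => None,
        }
    }
}
```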
Hoyt Koepke	332a456e1d	Add ordered and unordered download streaming to session interface (#729 ) This PR adds ordered and unordered download streams on XetSession, including optional byte-range support and per-stream progress reporting. Blocking and async variants are supported. On the reconstruction side, this introduces UnorderedWriter and UnorderedDownloadStream in xet_data, and extends the FileDownloadSession stream APIs to take optional source ranges. Ordered and unordered streams now share the same session-facing access pattern for async and blocking callers. This PR also renames DownloadGroup to FileDownloadGroup. Stream data uses the per-session memory pool but doesn't count towards the maximum number of concurrent downloads in progress. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Touches core file reconstruction/writer plumbing (including `DataWriter` ownership and new unordered writer/stream paths) and changes public session APIs, so regressions could impact download correctness, cancellation, or progress reporting. > > Overview > Adds first-class ordered and unordered streaming download APIs to `xet_pkg::xet_session`, including async and blocking variants, optional source-relative byte ranges, and per-stream progress via new `XetDownloadStream` / `XetUnorderedDownloadStream` wrappers. > > On the data layer, introduces an unordered reconstruction path (`UnorderedWriter` + `UnorderedDownloadStream`) and refactors streaming to spawn reconstruction tasks immediately but gate execution behind `start()`; stream abort callbacks are now registered per-stream and automatically unregistered on drop to avoid callback accumulation. > > Updates the reconstruction writer contract by making `DataWriter::finish` consume the writer (and shifting `DataWriter` to `&mut self` usage), adjusts `SequentialWriter` accordingly, and adds Criterion-based reconstruction benchmarks plus extensive unordered-stream tests. 
Also renames session `DownloadGroup` to `FileDownloadGroup` (and constructors) and updates call sites/examples. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `e02890aa4b`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-20 14:40:18 -07:00
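A reorder buffer illustrates how the two stream kinds differ. An unordered stream can hand out `(offset, bytes)` pieces as soon as they finish. An ordered stream must hold each piece until the gap before it is filled. `ReorderBuffer` is a hypothetical sketch of that idea, not `SequentialWriter` or `UnorderedWriter`.

```rust
use std::collections::BTreeMap;

/// Hypothetical reorder buffer: accepts (offset, bytes) in any order and
/// releases only the contiguous prefix, as an ordered stream must.
#[derive(Default)]
pub struct ReorderBuffer {
    next_offset: u64,
    pending: BTreeMap<u64, Vec<u8>>,
}

impl ReorderBuffer {
    /// Insert a finished chunk and return all bytes now contiguous from `next_offset`.
    pub fn push(&mut self, offset: u64, data: Vec<u8>) -> Vec<u8> {
        self.pending.insert(offset, data);
        let mut out = Vec::new();
        while let Some(chunk) = self.pending.remove(&self.next_offset) {
            self.next_offset += chunk.len() as u64;
            out.extend_from_slice(&chunk);
        }
        out
    }
}
```

The buffering is what an unordered consumer avoids. An unordered consumer that writes at offsets (for example, into a file) never has to hold out-of-order pieces in memory.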
Hoyt Koepke	602d7679f6	Add `cargo smoke-test` for rapid full-workspace testing. (#741 ) Currently, the full test validation is rather heavy, but running only local tests often misses issues that are caught by the tests that probe the full stack. This PR adds a smoke-test path that runs a meaningful subset of the tests across the workspace, covering most errors. It runs in about 1/8 of the time of `cargo test`, so it is useful for speeding up iteration with AI models. In addition, a few intermittent failures were fixed. There should be no runtime functionality change. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Low risk since changes are limited to Cargo configuration and test gating; no production code paths are modified. Main risk is accidentally skipping too much coverage or misconfiguring feature flags in CI/local workflows. > > Overview > Adds a new `cargo smoke-test` workflow by introducing a `smoke-test` Cargo profile and a `cargo` alias that runs `test` with per-crate `smoke-test` features enabled. > > Defines `smoke-test` features across multiple crates and uses `#[cfg_attr(feature = "smoke-test", ignore)]` / `#[cfg(... not(feature = "smoke-test"))]` to skip long-running, concurrency-heavy, or full-stack integration tests during smoke runs. > > Tightens test robustness by making `SafeFileCreator` permission assertions umask-tolerant (require owner read/write rather than an exact `0o644`). > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `5d53009652`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Hoyt Koepke <hoytak@xethub.com>	2026-03-20 13:32:38 -07:00
Hoyt Koepke	7aea9fac07	Fix simulation deletion controls and soft-delete behavior for GC simulation (#736 ) Currently, GC simulation coverage depends on deletion-control operations that were only partially wired through the disk-backed HTTP simulation path, and deletion behavior in LocalClient needed to preserve shard-hash stability across GC epochs. This PR adds shard dedup-entry cleanup to the deletion-control surface and updates the file deletion behavior so shard files are not rewritten. Note: this introduces a breaking change for LocalClient: existing LocalClient repositories won't persist across this commit. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Changes affect disk-backed simulation CAS behavior and persistence (new LMDB tables + deletion semantics) and add a new deletion-control API used by GC; regressions could break existing local test data or GC integration paths. > > Overview > Enables correct GC integration testing against the disk-backed simulation server by switching `LocalClient::delete_file_entry` to a soft-delete backed by a new LMDB `file_status_table`, and updating listing/reconstruction/direct-file-access paths to hide and reject deleted files without rewriting shard files. > > Extends the deletion-control surface with `remove_shard_dedup_entries` (plus a new `DELETE /simulation/shards/{hash}/dedup_entries` route and client support) and fixes `LocalTestServerBuilder` to actually wire a `deletion_client` for disk-backed servers so `/simulation/` deletion routes stop returning `501`. > > Reworks `verify_integrity` to validate XORB references across shards (global-dedup aware) and skip soft-deleted files, adds targeted unit/integration tests for the new behaviors, and tightens log cleanup to avoid protecting stale logs on PID reuse by comparing process start time to the log’s embedded timestamp. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `fdca297600`. 
This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-20 10:02:21 -07:00
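The soft-delete idea is to mark a file as deleted in a separate status table rather than rewrite its stored entry. It can be sketched in memory. `FileIndex` is hypothetical. The PR uses an LMDB `file_status_table` alongside unchanged shard files.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for a file index plus a separate status table.
#[derive(Default)]
pub struct FileIndex {
    files: HashMap<String, Vec<u8>>, // file hash -> reconstruction info (never rewritten)
    deleted: HashSet<String>,        // soft-delete markers
}

impl FileIndex {
    pub fn insert(&mut self, hash: &str, info: Vec<u8>) {
        self.files.insert(hash.to_string(), info);
    }

    /// Soft delete: add a marker. Returns false if unknown or already deleted.
    pub fn delete(&mut self, hash: &str) -> bool {
        self.files.contains_key(hash) && self.deleted.insert(hash.to_string())
    }

    /// Reads reject soft-deleted files even though their entry still exists.
    pub fn get(&self, hash: &str) -> Option<&[u8]> {
        if self.deleted.contains(hash) {
            None
        } else {
            self.files.get(hash).map(|v| v.as_slice())
        }
    }

    /// Listing hides soft-deleted files; output is sorted for determinism.
    pub fn list(&self) -> Vec<&str> {
        let mut v: Vec<&str> = self
            .files
            .keys()
            .filter(|h| !self.deleted.contains(*h))
            .map(|h| h.as_str())
            .collect();
        v.sort();
        v
    }
}
```

Because the stored entries never change, anything derived from them (here, the stand-in for shard hashes) stays stable across deletions.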
Hoyt Koepke	b137daf88e	Implement optional range specification for downloads. (#735 ) Currently, download metadata assumes file_size is always known, which forces callers to provide a size even when only a hash is available. This PR changes XetFileInfo.file_size to Option<u64> -- with serialization compatibility -- and propagates that through so hash-only downloads are a supported path while known-size flows continue to work as before. On the download path, this updates the reconstructor setup and range handling so progress can start without a final total and then finalize when EOF is discovered. For known-size full-file downloads, it now validates the reconstructed byte count and returns DataError::SizeMismatch when the expected and actual sizes differ. In addition, open-ended ranges (e.g. `start..` and `..end`) are now supported through all APIs. This also adds coverage for range-based writer/stream downloads and unknown-size round trips in session-level tests. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Medium risk because it changes a widely used API type (`XetFileInfo.file_size`) and adjusts download/reconstruction behavior, which can affect progress reporting and error handling across Rust and Python bindings. > > Overview > Enables hash-only downloads by changing `XetFileInfo.file_size` from `u64` to `Option<u64>` (serde backward-compatible) and adding `XetFileInfo::new_hash_only`, then propagating the optional size through `xet_pkg` and `hf_xet` (Python `PyXetDownloadInfo.file_size` and `PyPointerFile.filesize`). > > Extends download APIs to accept open-ended ranges via `RangeBounds<u64>` (e.g. `start..`, `..end`, `..`) and updates reconstructor/progress behavior to handle unknown totals, while adding `DataError::SizeMismatch` and validating reconstructed byte counts for full downloads and bounded ranges. 
> > Adds substantial new unit/integration test coverage for range variants, unknown-size round trips, and size-mismatch errors, plus minor CLI output adjustments to print unknown sizes. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `4d25896c51`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-20 00:14:52 -07:00
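The sketch below resolves any `RangeBounds<u64>` (`start..`, `..end`, `..`, `a..=b`) against an optional file size. It assumes the result is a half-open `[start, end)` whose end may still be unknown when only a hash is available. `resolve_range` is a hypothetical helper. It deliberately does not clamp to the file size, leaving out-of-bounds checks (such as the PR's `DataError::SizeMismatch`) to later stages.

```rust
use std::ops::{Bound, RangeBounds};

/// Resolve a range into (start, Option<end>) as a half-open interval.
/// `end == None` means "read to EOF", which may not be known yet.
pub fn resolve_range<R: RangeBounds<u64>>(
    range: R,
    file_size: Option<u64>,
) -> Result<(u64, Option<u64>), String> {
    let start = match range.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s.checked_add(1).ok_or("start overflow")?,
        Bound::Unbounded => 0,
    };
    let end = match range.end_bound() {
        // Inclusive end `..=e` becomes exclusive `e + 1`.
        Bound::Included(&e) => Some(e.checked_add(1).ok_or("end overflow")?),
        Bound::Excluded(&e) => Some(e),
        // `start..` and `..`: up to EOF, known only if the size is known.
        Bound::Unbounded => file_size,
    };
    if let Some(e) = end {
        if start > e {
            return Err(format!("invalid range: start {start} > end {e}"));
        }
    }
    Ok((start, end))
}
```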
Hoyt Koepke	749c28b086	Error unification and cleanup (#737 ) This PR performs some housecleaning and removes some technical debt around using different error types, unifying them with the python interface. - Our client code tended to do a lot with anyhow errors as an artifact of first using them before switching to thiserror. This PR cleans these up in favor of using ClientError or other named error types directly. - It also removes all the aliases to the old error type names present in the packages before the refactoring, now settling on ClientError, FormatError, DataError, and RuntimeError, with XetError being the error type exposed publicly. - Also, currently, xet_session exposes SessionError as an alias of XetError, which adds an extra public type name without adding behavior. This PR removes that alias and standardizes the public API/docs onto XetError directly. - It also tightens Python-facing error behavior and moves the python handling to the XetError class directly, hidden behind a python feature flag. Using these types, hf_xet now registers XetObjectNotFoundError and XetAuthenticationError exception classes for the not-found and authentication cases. These inherit from the current exception classes, so all behavior is preserved. - In addition, the From for PyErr mapping routes timeout/network/auth/not-found categories to more appropriate Python exception types than simply RuntimeError. This is primarily an API-surface cleanup plus error-classification alignment. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > API-breaking error-surface changes (removal of legacy alias modules and signature changes like `CredentialHelper::fill_credential`) may require downstream code updates, especially where errors are matched/converted. Runtime behavior should be mostly unchanged, but error mapping/propagation paths (including Python exceptions) are widely touched across crates. 
> > Overview > This PR unifies error types across the workspace by removing legacy re-export/alias modules (e.g. `CasClientError`, `CasTypesError`, `DataProcessingError`, `SessionError`) and updating call sites to use canonical errors like `xet_client::ClientError`, `xet_core_structures::CoreError`, and `xet_data::DataError` directly. > > It updates CAS client code to standardize on `crate::error::Result`/`ClientError`, including deleting `cas_client/error.rs`, adjusting error conversions in retry/http middleware paths, and updating simulation/local-server code to map `ClientError` to HTTP responses. > > Python bindings (`hf_xet`) now convert failures via `XetError` (with `xet_pkg` built with `python` support), register custom exceptions on module init, and refine argument-validation errors to `PyValueError` while routing network/timeout/auth/not-found to more appropriate Python exception classes. > > Misc cleanup: `git_xet` now depends on `xet-data`, simulation binaries switch to `anyhow::Result`/`bail!`, and lockfiles are updated for new/updated dependencies (notably `pyo3`/`inventory`). > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `f3d056a909`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-19 16:34:28 -07:00
Di Xiao	44566cc288	Early check RuntimeMode in blocking apis (#739 ) Since a previous PR merged the async and blocking APIs under one struct, the blocking APIs became accessible from an `UploadCommit` / `DownloadGroup` created by the async APIs. This PR adds the same `External` runtime-mode check to these blocking APIs as is done for the `new_upload_commit_blocking` and `new_download_group_blocking` functions, so that they return an `Err` gracefully where possible instead of panicking. However, this doesn't guard against users "deliberately" creating an `UploadCommit` / `DownloadGroup` instance in `Owned` runtime mode, sending it to an async context, and calling the blocking APIs; in that case it will still panic. Added unit tests and updated docs for the above changes.	2026-03-19 13:31:55 -07:00
Di Xiao	fb83178d28	Fix session id regression (#738 ) The session id was changed from `ulid` to `UniqueID` (a self-incrementing in-memory u64) in a previous PR, but this is not correct. The session id is used in CAS server logs, traces, and CDN logs to identify a related group of activity (for debugging and similar purposes), so it needs to be globally unique (hence `ulid`) rather than only locally unique.	2026-03-19 13:31:40 -07:00
Di Xiao	e25ee85c14	Fix a compilation failure (#740 ) Fixes a compilation failure in a new function introduced by https://github.com/huggingface/xet-core/pull/726, caught by the newly introduced CI step `cargo bench --no-run`.	2026-03-19 12:03:33 -07:00
Di Xiao	9f68537319	Resolve all Dependabot alerts (#733 ) This PR should resolve all Dependabot alerts by upgrading deps and switching out some deprecated crates for suggested alternatives, e.g. `tempdir -> tempfile`. Supersedes PR #721. Fixes issue #722.	2026-03-19 09:33:56 -07:00
Di Xiao	4d24627180	Fix bench code compilation after repo restructuring (#728 ) The last repo restructuring didn't update several benchmark sources that are not compiled by default as part of "cargo build". This PR fixes those compilation errors and warnings, and adds "cargo bench --no-run" to CI, which checks compilation without actually running the benchmarks.	2026-03-19 09:28:57 -07:00
Hoyt Koepke	6fb97241f3	Integration test suite on top of xet session interface. (#727 ) This PR adds a full integration test suite on top of the xet session interface that mimics the integration tests in xet_data/tests/. This one additionally tests alternate asynchronous runtimes to ensure that the bridge to the internal tokio runtime works correctly as well.	2026-03-18 18:08:06 -07:00
Hoyt Koepke	506fc28291	Simplify progress tracking + Unify Task ID tracking + Legacy Interface (#726 ) Currently, progress tracking is split between callback-driven and snapshot-driven paths, making session and task wiring across xet_data, xet_pkg, hf_xet, and git_xet harder to keep consistent. This PR moves upload/download progress to a polling snapshot model backed by atomics. It also switches task identifiers to a UniqueID shared with progress tracking throughout the session APIs. This PR also updates rate estimation to use a lighter-weight exponentially weighted moving average (EWMA) model, so it can be done at a low level. To preserve compatibility for existing callback consumers, callback-oriented upload/download progress tracking APIs are moved under xet_pkg::legacy and bridged from polling snapshots via callback-based updaters. hf_xet and git_xet are updated to use that legacy bridge layer, so current integrations keep working until everything is fully switched over to the XetSession method.	2026-03-18 18:07:43 -07:00
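Here is a minimal EWMA rate estimator fed by polled cumulative byte counts, as an illustration of the model named above. `RateEstimator` and the smoothing factor `alpha` are assumptions, not the types or parameters used in xet_pkg.

```rust
/// Hypothetical EWMA transfer-rate estimator fed by polled progress snapshots.
pub struct RateEstimator {
    alpha: f64,        // weight of the newest sample, in (0, 1]
    rate: Option<f64>, // smoothed bytes per second
    last_bytes: u64,   // cumulative byte count at the previous snapshot
}

impl RateEstimator {
    pub fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        RateEstimator { alpha, rate: None, last_bytes: 0 }
    }

    /// Feed a cumulative byte count observed `dt_secs` after the previous snapshot.
    /// Only the delta since the last snapshot contributes to the new sample.
    pub fn update(&mut self, total_bytes: u64, dt_secs: f64) -> Option<f64> {
        if dt_secs > 0.0 {
            let delta = total_bytes.saturating_sub(self.last_bytes) as f64;
            let sample = delta / dt_secs;
            self.rate = Some(match self.rate {
                None => sample,
                Some(prev) => self.alpha * sample + (1.0 - self.alpha) * prev,
            });
        }
        self.last_bytes = total_bytes;
        self.rate
    }
}
```

Each update costs O(1) time and memory. That is what makes an EWMA cheap enough to maintain at a low level, compared with windowed averages.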
Rajat Arya	c0f7980616	feat: smoke tests using hf CLI with bucket and large-file coverage (#710 ) ## Summary - Rewrites smoke tests to drive everything through the `hf` CLI rather than the huggingface_hub Python API, covering the actual user-facing surface area of hf-xet - Moves smoke tests and diagnostic scripts into a `scripts/` directory for cleaner repo layout - Adds storage bucket test suite exercising the full bucket lifecycle - Adds 50 MB and 100 MB files to repo upload/download tests ## Test matrix (14 tests, all passing) Repository tests (`hf upload` / `hf download`) - Upload single file, upload folder - Download individual files + SHA-256 verify - Download entire repo + SHA-256 verify - Overwrite file and verify new content served - Delete file and confirm absent Bucket tests (`hf buckets`) - `cp` upload / download + verify - `sync` upload / download + verify - Recursive list confirms expected paths - Overwrite via `cp` + verify - `sync --delete` removes extraneous remote files - `rm` + confirm absent from listing ## Test plan - [x] Run `HF_TOKEN=... ./scripts/smoke_tests/run.sh` and confirm all 14 tests pass - [x] Run `./scripts/smoke_tests/run.sh --skip-buckets` for repo-only path - [x] Run with `--hf-xet-version <version>` to confirm PyPI cache bypass works 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 19:07:05 -07:00
Hoyt Koepke	69c714c01d	Update config groups to handle more of the data management values. (#702 ) This PR moves some config values that were part of the data configuration into XetConfig, specifically the compression_policy, staging_subdir, session_dir_name, and global_dedup_query_enabled. This also consolidates the remaining values into a single struct with endpoint and authentication information.	2026-03-16 16:06:46 -07:00
Hoyt Koepke	ed182125fa	Add optional Sha256 hash propagation through XetFileInfo object. (#718 ) Currently, the SHA-256 hash of uploaded file content is computed internally during the upload pipeline but not surfaced to callers. Downstream consumers (e.g. OpenDAL's Hugging Face backend) need the SHA-256 to commit files to the Hub API. This PR adds an optional sha256 field to XetFileInfo, the session-layer FileMetadata, and the Python-exposed PyXetUploadInfo. The field is populated from the already-computed hash when Sha256Policy::Compute or Sha256Policy::Provided is used, and left None for downloads and when Sha256Policy::Skip is used. Serde attributes (default, skip_serializing_if) ensure backward-compatible serialisation: existing serialised data without the field deserialises cleanly. Needed for the functionality in https://github.com/huggingface/xet-core/pull/642.	2026-03-16 16:05:49 -07:00
Hoyt Koepke	40ba7b8911	Fix race condition in task status transitions on abort. (#720 ) Currently, the Queued → Running status transition in spawned upload tasks is unconditional — it overwrites whatever the current status is, including Cancelled set by a concurrent abort() call. This creates a race window: if abort() sets Cancelled between the semaphore acquisition and the status write, the task overwrites it with Running, then the completion guard (if matches!(*s, TaskStatus::Running)) passes and sets Completed. The result is a task that was aborted but reports Completed. This PR makes the Queued → Running transition conditional, matching the already-guarded Running → Completed/Failed transition. If the status is no longer Queued when the task starts, it bails early with SessionError::Aborted. This closes the race window — all three status transitions are now properly guarded against concurrent abort(). This was observed as a flaky test failure on Windows CI (test_abort_while_state_lock_held_skips_state_update_but_drains_tasks).	2026-03-16 14:17:57 -07:00
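The guarded-transition pattern can be sketched with a mutex-protected status. Each transition names the state it expects, so an `abort()` that lands between semaphore acquisition and the status write is not overwritten. `TaskState` and `transition` are hypothetical names, and the `TaskStatus` variants mirror those mentioned above. The PR bails with `SessionError::Aborted`. This sketch models that case by returning the status it actually observed.

```rust
use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TaskStatus {
    Queued,
    Running,
    Completed,
    Cancelled,
}

/// Hypothetical status cell where every transition is guarded by its expected prior state.
pub struct TaskState {
    status: Mutex<TaskStatus>,
}

impl TaskState {
    pub fn new() -> Self {
        TaskState { status: Mutex::new(TaskStatus::Queued) }
    }

    /// Move `from` -> `to` only if the status is still `from`; otherwise report what was found.
    pub fn transition(&self, from: TaskStatus, to: TaskStatus) -> Result<(), TaskStatus> {
        let mut s = self.status.lock().unwrap();
        if *s == from {
            *s = to;
            Ok(())
        } else {
            Err(*s)
        }
    }

    /// `abort()` may only cancel a task that has not finished yet.
    pub fn abort(&self) {
        let mut s = self.status.lock().unwrap();
        if matches!(*s, TaskStatus::Queued | TaskStatus::Running) {
            *s = TaskStatus::Cancelled;
        }
    }

    pub fn get(&self) -> TaskStatus {
        *self.status.lock().unwrap()
    }
}
```

Checking and writing under the same lock is what closes the window. An unconditional `*s = Running` would silently undo a concurrent cancel.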
Hoyt Koepke	9caf7fcc44	V2 reconstruction with client-side optional single range splitting (#703 ) This PR introduces V2 multirange URL fetching for xorbs, but optionally splits the multirange requests into multiple single-range requests that can be executed in parallel. This allows the reconstruction process to generate full multirange presigned URLs, while the client effectively performs the retrieval stage as a set of parallel single-range queries. The config variable `client.enable_multirange_fetching` controls this behavior; by default it is set to false due to the currently observed slowness of fetching multirange URLs. --------- Co-authored-by: Adrien <adrien@huggingface.co>	2026-03-16 14:10:50 -07:00
Hoyt Koepke	79df99ad01	Unify sync and async download/upload groups in session interface. (#719 ) Currently, `UploadCommitSync` and `DownloadGroupSync` are thin wrappers around `UploadCommit` and `DownloadGroup` that delegate every method through `external_run_async_task`. This means two types, two sets of doc comments, and two test suites covering the same underlying behavior. This PR removes the separate sync types and adds `_blocking` suffixed methods directly on `UploadCommit` and `DownloadGroup`. The session factory methods `new_upload_commit_blocking()` and `new_download_group_blocking()` now return the same types as their async counterparts, and the entire `xet_session::sync` module is deleted (~680 lines removed). This also fixes a minor bug: `UploadCommitSync::upload_from_path` did not call `std::path::absolute()` on the file path before dispatching, unlike the async version. The new `upload_from_path_blocking` includes the `std::path::absolute()` call, matching the async version's behavior.	2026-03-16 13:33:46 -07:00
Hoyt Koepke	71f8570a0e	Optimize config struct for direct access in python (#706 ) This PR adds a feature flag, "python", to the xet_runtime package such that, when the feature is enabled, the XetConfig struct is built with Python getters and setters. This integrates the handling of the config struct directly into the XetConfig struct and the macros used to register the config values, making the handling of values in the Python bindings seamless.	2026-03-16 12:23:43 -07:00
Di Xiao	b7cd43c8cb	Add Sha256Policy to XetSession APIs (#701 ) Stacking on top of https://github.com/huggingface/xet-core/pull/694, this updates both the async and sync APIs: update `XetSession::UploadCommit` and `XetSession::UploadCommitSync` pub APIs to explicitly take a `sha256: Sha256Policy` to control whether to compute and embed a sha256 for that file in the upload info. Resolves XET-898 --------- Co-authored-by: Hoyt Koepke <hoytak@huggingface.co>	2026-03-16 11:03:23 -07:00
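
The guarded status transitions described in #720 can be sketched as follows. This is a minimal, std-only Rust sketch, not the xet-core implementation: the names `TaskStatus`, `Aborted`, `try_start`, `finish`, and `abort` are hypothetical stand-ins (per the commit message, the real code bails with `SessionError::Aborted` inside spawned upload tasks), and it assumes every transition is made under one shared lock.

```rust
use std::sync::Mutex;

/// Hypothetical task states, modeled on the names in the #720 message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Queued,
    Running,
    Completed,
    Failed,
    Cancelled,
}

/// Hypothetical error for a task that was aborted before it started.
#[derive(Debug, PartialEq, Eq)]
pub struct Aborted;

/// Guarded Queued -> Running: succeeds only if the status is still Queued.
pub fn try_start(status: &Mutex<TaskStatus>) -> Result<(), Aborted> {
    let mut s = status.lock().unwrap();
    if *s == TaskStatus::Queued {
        *s = TaskStatus::Running;
        Ok(())
    } else {
        // e.g. abort() already wrote Cancelled; bail early instead of overwriting it.
        Err(Aborted)
    }
}

/// Guarded Running -> Completed/Failed: a Cancelled status is left untouched.
pub fn finish(status: &Mutex<TaskStatus>, ok: bool) {
    let mut s = status.lock().unwrap();
    if matches!(*s, TaskStatus::Running) {
        *s = if ok { TaskStatus::Completed } else { TaskStatus::Failed };
    }
}

/// Assumption for this sketch: abort() only cancels non-terminal tasks.
pub fn abort(status: &Mutex<TaskStatus>) {
    let mut s = status.lock().unwrap();
    if matches!(*s, TaskStatus::Queued | TaskStatus::Running) {
        *s = TaskStatus::Cancelled;
    }
}
```

Because the check and the write happen under the same lock, an abort() that lands between semaphore acquisition and the start of the task leaves the status at Cancelled, and the later Running → Completed guard never fires, so an aborted task can no longer report Completed.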
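
The optional single-range splitting in #703 can be illustrated with a small sketch. It assumes, hypothetically, that each xorb fetch is expressed through a standard HTTP `Range` header on a presigned URL; `ByteRange`, `range_headers`, and the `multirange` parameter (standing in for `client.enable_multirange_fetching`) are illustrative names, not xet-core APIs.

```rust
/// Hypothetical inclusive byte range, following HTTP Range semantics.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteRange {
    pub start: u64,
    pub end: u64, // inclusive
}

/// Builds the `Range` header value(s) for one presigned URL.
/// multirange == true: a single request carrying every range.
/// multirange == false: one single-range request per range, which a
/// client could then issue in parallel.
pub fn range_headers(ranges: &[ByteRange], multirange: bool) -> Vec<String> {
    let specs: Vec<String> = ranges
        .iter()
        .map(|r| format!("{}-{}", r.start, r.end))
        .collect();
    if specs.is_empty() {
        return Vec::new();
    }
    if multirange {
        vec![format!("bytes={}", specs.join(","))]
    } else {
        specs.into_iter().map(|s| format!("bytes={}", s)).collect()
    }
}
```

With the flag off (the default per #703), two ranges such as 0-99 and 200-299 become two independent `bytes=0-99` and `bytes=200-299` requests rather than one `bytes=0-99,200-299` request.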

548 Commits