xet-core

mirror of https://github.com/huggingface/xet-core.git synced 2026-06-04 13:30:29 +08:00

Author	SHA1	Message	Date
Hoyt Koepke	6f0cf38065	Stable chunk boundary detection (#815 ) This PR adds a function, next_stable_chunk_boundary, that takes a list of chunk boundary positions and a starting cut point and returns the next chunk boundary after the cut point such that, for all possible alterations of the data up to the cut point, the chunk boundaries when chunking the entire file will always be the same starting at the stable chunk boundary. The implication of this is that to alter a specific range of a file `[a, b)`, we would do the following: 1. Locate the previous chunk boundary before a; call this `c_start`. 2. Take the full set of chunk boundary locations, call next_stable_chunk_boundary with b as the cut point. this will return the next stable chunk boundary. Call this `c_end`. 3. Make the replacement to `[a, b)`; prepend the original `data[c_start, a)` and append `data[b, c_end)`; chunk this segment. 4. Use the merkle hash subtrees for `[0, c_start)`, the new [c_start, c_end), and the original `[c_end, end)` to calculate the new file hash. This will be the same as chunking the entire new file. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Adds new public chunk-boundary selection logic used to make resumed/partial workflows deterministic; mistakes could cause misalignment or incorrect resume behavior in deduplication/chunking paths. Large new randomized/stress tests reduce risk but the algorithm’s correctness assumptions are subtle. > > Overview > Introduces a new public helper, `next_stable_chunk_boundary`, that computes a restart-safe/stable resume boundary from existing chunk-boundary metadata (no byte access) by scanning for two consecutive chunks that fall within a conservative size window derived from chunking constants. 
> > Updates `find_partitions` documentation to reflect the hash warmup/hidden-trigger verification approach and to reference the new helper, re-exports the function from `xet_data::deduplication`, and adds extensive edge-case and randomized mutation/stress tests to validate boundary stability under arbitrary prefix changes. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `98411603e3`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-05-18 12:16:49 -07:00
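The four-step range-edit workflow in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the xet-core implementation: `previous_boundary` and `splice_segment` are hypothetical helper names, `boundaries` is assumed to be a sorted list of chunk end offsets, and `c_end` is assumed to have already been obtained from `next_stable_chunk_boundary` (whose exact signature is not given in the message).

```rust
/// Step 1 (hypothetical helper): the last chunk boundary at or before `a`.
/// The start of the file (offset 0) is treated as an implicit boundary.
/// Accepting a boundary exactly at `a` is an assumption of this sketch.
fn previous_boundary(boundaries: &[usize], a: usize) -> usize {
    // `boundaries` must be sorted ascending for partition_point to be valid.
    let idx = boundaries.partition_point(|&p| p <= a);
    if idx == 0 { 0 } else { boundaries[idx - 1] }
}

/// Step 3 (hypothetical helper): build the segment that gets re-chunked,
/// i.e. original[c_start..a] + replacement + original[b..c_end].
/// Preconditions: c_start <= a <= b <= c_end <= original.len().
fn splice_segment(
    original: &[u8],
    a: usize,
    b: usize,
    c_start: usize,
    c_end: usize,
    replacement: &[u8],
) -> Vec<u8> {
    let mut segment =
        Vec::with_capacity((a - c_start) + replacement.len() + (c_end - b));
    segment.extend_from_slice(&original[c_start..a]);
    segment.extend_from_slice(replacement);
    segment.extend_from_slice(&original[b..c_end]);
    segment
}
```

Only the spliced segment is re-chunked. Per the commit message, the hashes for `[0, c_start)` and `[c_end, end)` are reused unchanged when computing the new file hash.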
Rajat Arya	8b8db52be2	Raise HF_XET_CLIENT_READ_TIMEOUT to 300s + clean up query_dedup 404 log (#808 ) Two connected cleanups from the [2026-04-21 Julien upload-stuck investigation](https://www.notion.so/huggingface2/Julien-upload-stuck-upload_xorb-120s-timeouts-2026-04-21-3491384ebcac81a19d0af5394745cfff). Closes #807. Docs PR: huggingface/hub-docs#2419. ## Change 1 — raise `HF_XET_CLIENT_READ_TIMEOUT` default 120s → 300s Files: `xet_runtime/src/config/groups/client.rs`, `xet_client/src/cas_client/remote_client.rs` (stale comment). The 120s client read timeout was firing before legitimate `upload_xorb` requests could complete on high-latency / transatlantic / bursty links. Fleet-wide this produced a chronic 30–50% xorb POST failure rate (1,092–4,196 `error uploading xorb` events per hour sustained over 24h, peaking at 49.1% in the investigation window). 267 successful uploads in the same 24h had latency > 120s (max 37 min), so 120s wasn't protecting anything legitimate — it was only cutting off slow-but-healthy streams. 300s preserves stall-detection semantics (still an order of magnitude under the 3600s ALB idle). The env override `HF_XET_CLIENT_READ_TIMEOUT` is unchanged. ## Change 2 — log `query_dedup` 404 as cache miss, not "Fatal Error" Files: `xet_client/src/cas_client/retry_wrapper.rs`, `xet_client/src/cas_client/remote_client.rs`. A 404 from `cas::query_dedup` is an expected cache miss — the caller converts it to `Ok(None)` and proceeds to upload. Today the retry wrapper logs it as `Fatal Error: \"cas::query_dedup\" api call failed ... 404 Not Found`, producing 20+ alarming-looking lines per upload session with no actual failure behind them (Hoyt flagged this in the incident Slack thread). Fix: add `RetryWrapper::with_expected_404()` — mirroring the existing `with_expected_416()` pattern — and opt `query_dedup` into it. 
The 404 still short-circuits retries and surfaces as a fatal error to the caller (preserving the existing `Ok(None)` conversion), but the log line now reads `Not Found (cache miss): \"cas::query_dedup\" api call failed ... 404 Not Found`. ## Test plan - [x] `cargo +nightly fmt --all --check` clean - [x] `cargo test -p xet-client --lib cas_client::retry_wrapper` — 5 passed (incl. new `test_404_expected_is_fatal_and_not_retried`) - [ ] Manually verify `HF_XET_CLIENT_READ_TIMEOUT=120` still overrides via env - [ ] Confirm a session run produces no `Fatal Error:` lines for the `query_dedup` 404s - [ ] Watch the xorb POST error-rate panel on the [CAS Grafana dashboard](https://grafana.huggingface.tech/d/dejp4w2hael1cb/cas) after release; expect the 120s-clustered p50 to disappear 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Adjusts client networking defaults (read timeout) and alters retry-wrapper handling/logging for HTTP 404s, which can change behavior and observability for slow uploads and cache-miss paths. > > Overview > Raises the default `HF_XET_CLIENT_READ_TIMEOUT` from 120s to 300s to better tolerate slow-but-progressing transfers. > > Adds `RetryWrapper::with_expected_404()` and opts `cas::query_dedup` into it so 404 responses are still non-retried/fatal to the caller but are logged as an expected cache miss (with a new unit test covering the no-retry behavior). > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `3e88f9cf8f`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 11:49:51 -07:00
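The expected-404 behavior described above can be illustrated with a small status classifier. This is a simplified sketch, not the real `RetryWrapper`: the `RetryPolicy` and `Outcome` types are hypothetical, and the set of retried status codes (429 and 5xx) is an assumption. Only the two log prefixes come from the commit message.

```rust
/// Hypothetical outcome of classifying one HTTP response.
#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    /// Transient failure: the request should be attempted again.
    Retry,
    /// Not retried; surfaced to the caller with the given log prefix.
    Fatal { log_prefix: &'static str },
}

/// Hypothetical stand-in for a retry wrapper configured per API call.
struct RetryPolicy {
    /// Set for calls such as `query_dedup`, where 404 means "cache miss".
    expected_404: bool,
}

impl RetryPolicy {
    fn classify(&self, status: u16) -> Outcome {
        match status {
            200..=299 => Outcome::Success,
            // Still fatal (no retry), but logged as an expected miss.
            404 if self.expected_404 => Outcome::Fatal {
                log_prefix: "Not Found (cache miss)",
            },
            // Assumption: throttling and server errors are retried.
            429 | 500..=599 => Outcome::Retry,
            _ => Outcome::Fatal { log_prefix: "Fatal Error" },
        }
    }
}
```

With `expected_404` set, a 404 changes only the log wording. It is still returned to the caller without retries, so the existing `Ok(None)` conversion keeps working.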
Assaf Vayner	c3c726bed5	Add napi smoke-test example for hf-xet (#835) Human context: while integrating the new hf-hub usage into tokenizers, we found that tokenizers also generates a napi binary, so we should validate that hf-hub and hf-xet are napi-compatible (hf-hub is fairly trivial once hf-xet is known to be compatible). ## Summary - Adds `examples/xet_pkg_napi/`, a minimal napi-rs binding that links `hf-xet` (the `xet` crate at `xet_pkg/`) into a Node.js native addon. - Exposes `initLogging(version)` and `smokeTest()`. The smoke test builds a `XetSession` synchronously and constructs upload-commit + file-download-group builders to exercise lazy runtime startup. - Crate is excluded from the xet-core workspace and carries its own `[workspace]` table so it stays standalone under git worktrees (where cargo would otherwise resolve through the canonical repo path). - Build artifacts (`.node`, `index.js`, `index.d.ts`, `node_modules/`) are gitignored; `Cargo.lock` and `package-lock.json` are committed for reproducibility. The point of the smoke test is not a full JS API; it is to verify that hf-xet compiles, links, and starts inside libuv (no pyo3, no host-owned tokio). If you can run `npm run smoke` and see `xet session built; runtime initialized`, the integration is ready for a fuller binding (async upload/download via `#[napi]` async fns, progress callbacks via `ThreadsafeFunction`). 
## Test plan - [x] `npm install` (in `examples/xet_pkg_napi/`) - [x] `npm run build:debug` — compiles `hf-xet`, `xet-runtime`, `xet-client`, `xet-data`, `xet-core-structures` and the napi shim against napi 2.16 - [x] `npm run smoke` — outputs: ``` loaded addon, exports: [ 'initLogging', 'smokeTest' ] smokeTest: xet session built; runtime initialized ``` - [x] Verify on Linux (only tested on darwin-arm64 locally) - [x] Decide whether to wire into CI, or keep as a manual example <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Adds a standalone example project and build scripts without changing production crates; primary risk is repo bloat/noise from the committed lockfiles and an extra exclusion in the workspace. > > Overview > Adds a new standalone `examples/xet_pkg_napi` project to smoke-test that `hf-xet` can compile/link as a `napi-rs` Node native addon and perform a real file download via the blocking download APIs. > > Updates the root `Cargo.toml` to exclude this example from the workspace, and includes the example’s build/run scaffolding (`package.json`, `smoke.mjs`, `build.rs`) plus committed lockfiles and a `.gitignore` for generated artifacts. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `cb628956f7`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-05-14 13:48:47 -07:00
Assaf Vayner	654080d080	[codex] Deduplicate shard file infos (#834 ) ## Summary - Deduplicate MDBMinimalShard file infos by file hash during sync and async streaming parse. - Keep only the first file info seen for a duplicate file hash; async callbacks fire only for retained entries. - Add a focused streaming-shard test covering parse, async callbacks, and reserialization. ## Why Duplicate file infos can survive the minimal streaming shard parse/re-serialize path because it stores file entries as a Vec. This narrows canonicalization to that streaming path while leaving in-memory shard and set-operation behavior unchanged. ## Impact - MDBMinimalShard::num_files() now reports unique file hashes for parsed shards. - Later duplicate file infos are ignored even if they contain richer optional verification or metadata extension data. - Raw full-section readers, MDBInMemoryShard behavior, and shard set operations remain unchanged. ## Validation - cargo test -p xet-core-structures metadata_shard - cargo test -p xet-client test_global_dedup - git diff --check - rustfmt --edition 2024 --check xet_core_structures/src/metadata_shard/set_operations.rs xet_core_structures/src/metadata_shard/shard_in_memory.rs xet_core_structures/src/metadata_shard/streaming_shard.rs <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Changes shard streaming parse semantics by dropping duplicate `file_hash` entries, which can affect downstream counts/serialization and may hide later entries’ richer metadata/verification. > > Overview > `MDBMinimalShard` now deduplicates file-info records by `file_hash` during both sync (`from_reader`) and async (`from_reader_async_with_custom_callbacks`) streaming parses, keeping only the first occurrence. > > Adds a focused test that constructs a shard stream with duplicate file infos and asserts first-wins behavior, validates async parsing/callback behavior, and confirms re-serialization only emits the retained entry. 
> > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `1320ce36ce`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-05-14 08:49:52 -07:00
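The first-wins deduplication described in the commit above can be sketched as follows. The `FileInfo` struct and `dedup_first_wins` function are hypothetical stand-ins for illustration and do not reflect the actual `MDBMinimalShard` types.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified file-info record keyed by a 32-byte file hash.
#[derive(Debug, Clone, PartialEq)]
struct FileInfo {
    file_hash: [u8; 32],
    label: &'static str,
}

/// Keep only the first entry seen for each file hash, preserving order.
/// Later duplicates are dropped even if they carry richer metadata.
fn dedup_first_wins(entries: Vec<FileInfo>) -> Vec<FileInfo> {
    let mut seen: HashSet<[u8; 32]> = HashSet::new();
    entries
        .into_iter()
        // `insert` returns true only the first time a hash is seen.
        .filter(|entry| seen.insert(entry.file_hash))
        .collect()
}
```

Because the filter runs in stream order, any per-entry callback placed after it would fire only for retained entries, matching the behavior described for the async parse.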
Arpit Jain	6f5060d732	ci: declare empty permissions on hf-xet prerelease testing workflow (#843 ) The `hf-xet prerelease testing` workflow currently doesn't declare a `permissions:` block, so the workflow `GITHUB_TOKEN` falls back to the repository default. Every step in `trigger_rc_testing` authenticates via `TOKEN_HUGGINGFACE_HUB_AUTO_BY_XET` (a PAT scoped for the hf-hub auto-update flow): - the `actions/checkout` step pulls `huggingface/${{ matrix.target-repo }}` with `token: ${{ secrets.TOKEN_HUGGINGFACE_HUB_AUTO_BY_XET }}` - `git push` reuses the credentials persisted by checkout So the workflow's own `GITHUB_TOKEN` is unused. `permissions: {}` (workflow scope) pins that. Pattern matches the workflow-level permissions blocks already used in this repo. With it set: - the workflow token can't be widened by a future change to the repo default - the SLSA / OpenSSF Scorecard `Token-Permissions` check passes for this file - a hypothetical compromise of any third-party action reachable from this workflow (cf. `tj-actions/changed-files` CVE-2025-30066) has nothing to do with the workflow token Signed-off-by: Arpit Jain <arpitjain099@gmail.com>	2026-05-14 05:02:17 -07:00
dependabot[bot]	feb8ddb6fd	Bump openssl from 0.10.76 to 0.10.79 (#836 ) Bumps [openssl](https://github.com/rust-openssl/rust-openssl) from 0.10.76 to 0.10.79. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Di Xiao <seanses@users.noreply.github.com>	2026-05-08 11:43:17 -07:00
tison	5b4f60e9e8	chore: use ctor 1.0 (#830)	2026-05-08 07:02:21 -07:00
Adrien	ad09a1a70f	fix(ci): export __heap_base in WASM build to fix wasm-bindgen threading (#832 ) ## Summary - Add `-C link-arg=--export=__heap_base` to WASM RUSTFLAGS in `build_wasm.sh` ## Context `nightly-2026-05-06` (`365c0e1d7`) stopped exporting `__heap_base` by default when `--import-memory` is used, which breaks `wasm-bindgen`'s thread ID injection: ``` error: failed to prepare module for threading Caused by: failed to find `__heap_base` for injecting thread id ``` Rather than pinning the nightly, explicitly exporting the symbol keeps us on latest nightly and is forward-compatible (the flag is a no-op when the symbol is already exported). <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Low risk: a build-script-only change that adjusts WASM linker exports; main risk is unintended side effects in the generated WASM module layout or compatibility, but it is limited to the wasm build pipeline. > > Overview > Updates `wasm/hf_xet_wasm/build_wasm.sh` to explicitly export the `__heap_base` symbol via an additional linker arg in `TARGET_RUSTFLAGS` when building the `wasm32-unknown-unknown` artifact. > > This is intended to keep `wasm-bindgen` threading support working under newer nightly toolchains when `--import-memory` is enabled. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `638ca13c80`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-05-07 11:05:51 +02:00
Di Xiao	8dfe75d68a	Remove third party codesign & notarize action (#829) Fix issue https://github.com/huggingface/xet-core/issues/822. ## Problem `lando/code-sign-action` (used to codesign and notarize the macOS `git-xet` binary) does not pin its transitive dependencies `cognitedata/code-sign-action@v3` and `lando/notarize-action@v2` to commit SHAs. This repo enforces SHA-pinning for all third-party actions, so the workflow had been failing for a while. ## Solution My PR to pin those transitive dependencies received no response, so this PR extracts the macOS codesign + notarize logic into a local composite action `.github/actions/macos-codesign-notarize`, mirroring the existing `.github/actions/windows-codesign` pattern, with zero external `uses:` dependencies.	2026-05-06 06:05:28 -07:00
dependabot[bot]	6ed5a00c8a	Bump rustls-webpki from 0.103.10 to 0.103.13 (#823 ) Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.103.10 to 0.103.13. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Di Xiao <seanses@users.noreply.github.com> 1.5.0	2026-05-04 11:34:23 -07:00
Di Xiao	1629992d05	Add context manager for upload stream (#825) Addresses the review [comment](https://github.com/huggingface/xet-core/pull/792#issuecomment-4305213150) on #792: adds a context manager to `XetStreamUpload` so callers no longer need an explicit `finish()` call.	2026-05-01 19:55:24 -07:00
Di Xiao	9e804c2dea	Remove unnecessary `UniqueId` -> `UniqueID` type alias (#824) In response to https://github.com/huggingface/xet-core/pull/792#discussion_r3119356452, this removes the `UniqueID` type alias that re-exported `xet_runtime::utils::UniqueId` under a differently cased name from `xet_data::progress_tracking`. This type alias is unnecessary and caused confusion for reviewers (both human beings and agents).	2026-05-01 04:10:14 -07:00
Di Xiao	23ec2940bb	Expose XetSession APIs to Python (#792) Replaces the old `upload_files` / `download_files` / `hash_files` Python functions with a new object-oriented API that exposes `XetSession` and its child objects directly as PyO3 classes. This gives Python callers full control over session lifecycle, connection pooling, and progress reporting. The previous module-level functions are kept under `hf_xet/src/legacy/` and remain importable as `from hf_xet import upload_files` etc., but now emit `DeprecationWarning`.	2026-05-01 03:05:51 -07:00
Assaf Vayner	d40f96bbea	Fix spelling typos in comments and docs (#826) ## Summary - Run codespell across tracked files in the repo and fix unambiguous spelling typos - All edits are in comments, doc strings, an issue template, and one log message; no logic changes - 22 typos fixed across 19 files (e.g. retreived→retrieved, elegible→eligible, occurances→occurrences, gauranteed→guaranteed, endianess→endianness, archetectures→architectures, etc.) ## Cases left for follow-up (not in this PR) A few hits were ambiguous and need human judgment: - `xet_core_structures/src/metadata_shard/shard_file_manager.rs:1400`: comment "but delet" appears truncated - `xet_core_structures/src/metadata_shard/shard_format.rs:1577`: "invalid somes" likely meant "invalid ones" - `xet_data/src/deduplication/chunking.rs:564`: comment trails off ("on other po") False positives left untouched: `serde::ser::` module paths, "process-global statics" (Rust `static` items), "implementor(s)" (valid alternate of "implementer"), "re-used", "unparseable". ## Test plan - [x] `cargo check --workspace --lib --all-features` passes - [ ] CI green on the draft PR <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Low risk: changes are limited to spelling fixes in comments/docs, an issue template string, and a single log message, with no functional code modifications. > > Overview > Fixes a set of unambiguous spelling typos across the repo (primarily Rust comments/docstrings plus `.github/ISSUE_TEMPLATE/bug-report.yml` and `api_changes/README.md`). > > Also corrects one user-facing log line in `hf_xet` ("cofigured" -> "configured"); otherwise behavior is unchanged. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `e615df87a8`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-30 13:15:18 -07:00
Assaf Vayner	18ebe48a5a	Bump hf_xet rand to 0.10 (#811 ) ## Summary - Bump the `rand` dependency in the `hf_xet` crate from `0.9` to `0.10`, matching the workspace version already pinned at `Cargo.toml:81`. - In rand 0.10 the `random()` method moved from the `Rng` trait to the new `RngExt` trait, so `hf_xet/src/lib.rs` now imports `RngExt`. ## Test plan - [x] `cargo check` in `hf_xet` - [x] `cargo clippy --all-targets` in `hf_xet` (no new warnings) - [x] `cargo test --no-run` in `hf_xet` - [x] `cargo check --workspace` in repo root <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Low risk dependency/compatibility updates; the only functional code change is adapting random generation imports and updating benchmark utilities to match async cache APIs and new `DiskCache::initialize` signature. > > Overview > Updates dependencies to newer versions: `hf_xet` moves `rand` from `0.9` to `0.10` (and workspace `rand_chacha` to `0.10`), plus lockfile bumps like `rustls-webpki`. > > Adjusts call sites for the `rand 0.10` API change by importing `RngExt` (and using `rand::rng()` in simulation code), and updates `simulation/chunk_cache_bench` to depend on `xet-runtime` and pass `XetConfig` into `DiskCache::initialize`, with benchmark code now `await`/`block_on` async `get`/`put` calls. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `65d9229773`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-04-29 10:37:44 -07:00
Adrien	145b819fc1	feat: expose CAS client factory and chunk cache re-exports (#730) ## Context These changes support the hf-mount project, which needs direct access to CAS client types. ## Summary - Changed `create_remote_client` visibility from `pub(crate)` to `pub` - Added re-exports for `CasClient`, `ChunkCache`, `CacheConfig`, and `get_cache` in `xet_data::processing`	2026-04-26 17:30:08 +02:00
Di Xiao	8df04e8183	Potential fix for a couple of crates release issues (#806) "bump-crates-version" and "crates-release" were in the same group, and run https://github.com/huggingface/xet-core/actions/runs/24692834954 was stuck in the "Queued" state indefinitely, likely due to a failed "bump-crates-version" run.	2026-04-24 12:03:54 -07:00
Hoyt Koepke	b43c0aec0e	Move XetRuntime model away from thread-local statics (#801) This PR moves the XetRuntime model away from using thread-local statics and decouples the XetConfig and XetCommon structs from a single runtime. It introduces a struct XetContext that gives the runtime context for operations:
```
struct XetContext {
    pub runtime: Arc<XetRuntime>, // The current tokio runtime wrapper, minus the config and common objects.
    pub common: Arc<XetCommon>,   // The common cache objects, semaphores, rate trackers, etc.
    pub config: Arc<XetConfig>,   // The config.
}
```
Now, instead of using functions like `xet_runtime()` and `xet_config()` that examine thread-local storage, we explicitly pass a XetContext instance through from session creation, and it is stored in each major processing struct. This decouples the runtime, config, and common caches, and in particular allows: - Running multiple config settings and/or endpoints within the same pre-existing tokio runtime. - Running multiple runtimes that share the same XetCommon object.	2026-04-21 09:17:19 -07:00
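To illustrate the explicit-context pattern from the commit above, here is a minimal sketch of two contexts sharing one runtime and one set of common caches while using different configs. Everything except the three field names is an assumption: the unit structs are placeholders for the real types, and `with_config`, `Uploader`, and the `endpoint` field are hypothetical.

```rust
use std::sync::Arc;

// Placeholder types standing in for the real xet-core structs.
struct XetRuntime;
struct XetCommon;
struct XetConfig {
    endpoint: String, // hypothetical field, for illustration only
}

#[derive(Clone)]
struct XetContext {
    runtime: Arc<XetRuntime>,
    common: Arc<XetCommon>,
    config: Arc<XetConfig>,
}

impl XetContext {
    /// Hypothetical helper: reuse the runtime and common caches,
    /// but swap in a different config.
    fn with_config(&self, config: XetConfig) -> XetContext {
        XetContext {
            runtime: Arc::clone(&self.runtime),
            common: Arc::clone(&self.common),
            config: Arc::new(config),
        }
    }
}

/// A processing struct stores the context it was created with,
/// instead of reading thread-local state.
struct Uploader {
    ctx: XetContext,
}

impl Uploader {
    fn endpoint(&self) -> &str {
        &self.ctx.config.endpoint
    }
}
```

Because each struct carries its own `XetContext`, two uploaders in the same process can target different endpoints without touching global or thread-local state.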
Hoyt Koepke	7e91d1c361	Upgrade testing capability for GC simulations (#786 ) Currently, delete_xorb lives on DirectAccessClient and performs unconditional deletion — there is no way for GC workflows to do compare-and-delete style operations that guard against deleting an object that has been rewritten since it was listed. This PR moves delete_xorb to DeletionControlableClient and adds conditional deletion APIs for both xorbs and shards, keyed on an opaque 32-byte ObjectTag. New DeletionControlableClient methods: list_xorbs_and_tags / list_shards_with_tags — return object hashes paired with content-derived tags. delete_xorb_if_tag_matches / delete_shard_if_tag_matches — delete only when the current tag matches the provided one, returning whether the delete occurred. Tags are derived differently per backend: LocalClient hashes filesystem metadata (modified/created timestamps, size, permissions) to detect overwrites; MemoryClient hashes object content. MemoryClient now also fully implements DeletionControlableClient, so in-memory simulation servers wire through deletion routes instead of returning 501. verify_integrity on LocalClient gains a third pass that validates every shard referenced by the global dedup table still exists on disk, catching stale references that previously went undetected. Corresponding HTTP routes and SimulationControlClient methods are added for the new operations. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Medium risk: changes simulation deletion and integrity logic (new conditional delete semantics, new on-disk index table, and stricter global-dedup validation) that could affect GC workflows and tests if edge cases are missed. > > Overview > Enables compare-and-delete semantics for simulation GC by introducing `ObjectTag`-based listing and conditional deletion for both xorbs and shards (`list__with_tags`, `delete__if_tag_matches`) and moving XORB deletion off `DirectAccessClient` onto `DeletionControlableClient`. 
> > Extends the simulation HTTP control surface and `SimulationControlClient` with new tag list and tag-delete endpoints, and wires deletion controls for the in-memory backend (so routes no longer return 501). > > Hardens `LocalClient` integrity and deletion behavior by replacing the old file-status tracking with a `FILE_TO_SHARD_TABLE` index (file -> owning shard + byte ranges) for direct-seek validation/reads, always rewriting xorbs to ensure fresh tags, and adding a new `verify_integrity` pass that fails if `GLOBAL_DEDUP_TABLE` references shard files missing on disk. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `42ba1ec80d`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-20 17:11:09 -07:00
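The compare-and-delete semantics described above can be sketched with an in-memory store. This is an illustration only: `MemoryStore` and its methods are hypothetical, and the tag here is a 64-bit value from the standard library hasher rather than the opaque 32-byte `ObjectTag` the commit describes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Simplified tag type (assumption; the real tag is 32 opaque bytes).
type Tag = u64;

/// Content-derived tag, in the spirit of the in-memory backend.
fn content_tag(data: &[u8]) -> Tag {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory object store keyed by object hash string.
#[derive(Default)]
struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl MemoryStore {
    fn put(&mut self, key: &str, data: &[u8]) {
        self.objects.insert(key.to_string(), data.to_vec());
    }

    /// List every object together with its current tag.
    fn list_with_tags(&self) -> Vec<(String, Tag)> {
        self.objects
            .iter()
            .map(|(key, data)| (key.clone(), content_tag(data)))
            .collect()
    }

    /// Delete only if the object's current tag equals `tag`.
    /// Returns whether the delete occurred.
    fn delete_if_tag_matches(&mut self, key: &str, tag: Tag) -> bool {
        let matches = self
            .objects
            .get(key)
            .map_or(false, |data| content_tag(data) == tag);
        if matches {
            self.objects.remove(key);
        }
        matches
    }
}
```

A GC pass that lists objects, decides what to delete, and then deletes by tag will skip any object that was rewritten between the list and the delete, since its tag no longer matches.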
Assaf Vayner	b5f7280a3b	set version 1.5.2 (#805 ) <!-- CURSOR_SUMMARY --> > [!NOTE] > Low Risk > Low risk: this is a coordinated version bump across workspace manifests and lockfiles with no functional code changes. > > Overview > Bumps the workspace/package version to `1.5.2` and updates internal crate dependency pins (`xet-runtime`, `xet-core-structures`, `xet-client`, `xet-data`, `hf-xet`) from `1.5.1` to `1.5.2`. > > Regenerates lockfiles (`Cargo.lock` plus lockfiles under `hf_xet/` and `wasm/`) to reflect the new `1.5.2` crate versions. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `b4ec15471d`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-20 15:06:14 -07:00
Assaf Vayner	5868f64ab9	fixing some issues identified in cargo audit (#802 ) CI for hf-hub is running cargo audit and found many issues through hf-xet transitive deps. this PR attempts to solve some of them (not necessarily all of them). Main changes: - dropped derivative and reqwest-retry - replaced bincode with postcard, only used in testing - upgrade xet-core rand usage - added audit CI step and ignoring some issues that we can't easily fix. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Medium risk because it removes `reqwest-retry`/`derivative` and replaces part of the retry classification logic with an in-house equivalent, which could subtly change HTTP retry behavior; the remaining changes are dependency/version bumps and test-only serialization swaps. > > Overview > Adds a new CI `cargo audit` job and introduces `.cargo/audit.toml` to ignore a small set of dev-only RustSec advisories with documented rationale. > > Reduces audit surface by dropping `derivative` (manual `Debug` impl for `AuthConfig`) and removing `reqwest-retry`, replacing its status-code classification with a local `Retryable` enum + `default_on_request_success` helper in `RetryWrapper`. > > Updates workspace deps (notably `rand` to `0.10` and `rand_distr` to `0.6`) and adjusts call sites to the newer `rand` APIs (`RngExt` imports, minor test/bench tweaks). Test-only binary serialization switches from `bincode` to `postcard` (and updates affected tests), with corresponding lockfile updates across crates. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `26377f4a1c`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-20 14:49:48 -07:00
Di Xiao	1fd20f9eaa	Fix git-xet Windows installer build (#794) Accept the EULA after sponsoring wixtoolset: https://docs.firegiant.com/wix/osmf/#direct-acceptance	2026-04-10 13:23:20 -07:00
Di Xiao	efc8359323	Crates release workflow (#785 ) ## Summary Adds two workflows to automate the crates.io release process, and refactors the CI WASM job into a reusable composite action. Release process (two separate manual steps): 1. `bump-crates-version.yml` (triggered via `workflow_dispatch` with a `version` input): updates version fields in `Cargo.toml` files, runs `cargo build` + `cargo test` to validate, builds the `hf-xet` Python wheel and WASM targets to update related `Cargo.lock` files, then opens a PR (e.g. `crates-release/1.6.0`). The workflow terminates after PR creation. 2. `crates-release.yml` (triggered manually via `workflow_dispatch` after the version-bump PR is merged): checks out `main`, authenticates to crates.io via OIDC Trusted Publishing, and publishes crates in dependency order with index-propagation delays: `xet-runtime` → `xet-core-structures` → `xet-client` → `xet-data` → `hf-xet`. Requires manual approval via the `crates-release` GitHub environment. Design notes: - Split into two workflows to avoid holding a runner while waiting for the PR to be reviewed and merged - Version bump is committed to a PR so the repo always reflects the published version - Uses OIDC Trusted Publishing (`rust-lang/crates-io-auth-action`) — no long-lived secrets required. 
See https://crates.io/docs/trusted-publishing CI refactor: - Extracts the nightly Rust/WASM toolchain setup and `hf_xet_wasm` builds into a reusable composite action (`.github/actions/build-wasm`) - The composite action saves and restores the caller's default toolchain around the nightly build, so callers are not affected - Adds post-build porcelain checks in CI to fail if either WASM `Cargo.lock` has uncommitted changes after building ## One-time manual setup required Before this workflow can run successfully, complete the following: ### GitHub - [x] Create a GitHub Environment named `crates-release`: repo Settings → Environments → New environment - [x] Add required reviewers to the `crates-release` environment; this is the manual approval gate before the `publish` job runs ### crates.io: Trusted Publishing Each crate must have been published manually at least once before Trusted Publishing can be configured. For each crate, go to its Settings page on crates.io → Trusted Publishing → Add, and fill in: \| Field \| Value \| \|---\|---\| \| Owner \| `huggingface` \| \| Repository \| `xet-core` \| \| Workflow name \| `crates-release.yml` \| \| Environment \| `crates-release` \| - [x] Configure Trusted Publishing for `xet-runtime` - [x] Configure Trusted Publishing for `xet-core-structures` - [x] Configure Trusted Publishing for `xet-client` - [x] Configure Trusted Publishing for `xet-data` - [x] Configure Trusted Publishing for `hf-xet`	2026-04-10 03:14:14 -07:00
Assaf Vayner	693945c84c	rm misleading token refresh log (#787 ) Very chatty log for when there is a valid token, but says token refresh occurred which may not actually be true. downleveled and corrected this log and added an actual info on token refresh success (error gets logged by the caller) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Low risk: only adjusts logging around CAS auth token refresh without changing token refresh logic or request behavior. > > Overview > Removes the unconditional "token refresh successful" info log from `AuthMiddleware::get_token` so normal requests that reuse a still-valid token no longer claim a refresh happened. > > Adds an `info!` log in `TokenProvider::get_valid_token` that emits only after a successful refresh (including the new expiry), while keeping failure logging in the middleware. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `2a9215c163`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-04-09 15:27:46 -07:00
Hoyt Koepke	9bdaf1e305	Fix for progress reporting bug (#788 ) Currently, there's a bug in the internal progress reporting where += is applied to a value that is intended to be a delta from the last report but is actually the amount sent so far. This causes hugely inflated values for the amount of data sent on large files. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Changes how `bytes_sent_so_far` is accumulated during partial progress updates, which can affect when/if concurrency adjustments trigger. Logic is localized but impacts adaptive concurrency behavior under high-volume transfers. > > Overview > Fixes a progress-reporting bug in `AdaptiveConcurrencyController` where repeated cumulative byte counts from partial updates could be added multiple times, vastly inflating `bytes_sent_so_far`. > > Each `ConnectionPermit` now tracks `max_bytes_reported` and the controller only adds the incremental delta (`n_bytes - previous_max`), preventing overcounting while preserving existing partial/final reporting behavior. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `fdb2cd6c6c`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-07 15:54:49 -07:00
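The fix described in the commit above (adding only the increment over the previous maximum, rather than the cumulative count) can be sketched as follows. `PermitProgress` and `ProgressTotal` are hypothetical stand-ins for `ConnectionPermit` and the controller; only the `max_bytes_reported` and `bytes_sent_so_far` names come from the message.

```rust
/// Hypothetical per-connection state: the largest cumulative
/// byte count this connection has reported so far.
#[derive(Default)]
struct PermitProgress {
    max_bytes_reported: u64,
}

/// Hypothetical controller-side running total.
#[derive(Default)]
struct ProgressTotal {
    bytes_sent_so_far: u64,
}

impl ProgressTotal {
    /// `n_bytes` is the cumulative amount sent on this connection,
    /// not a delta. Only the increment over the previous maximum
    /// is added to the total. Returns the increment that was added.
    fn report_cumulative(&mut self, permit: &mut PermitProgress, n_bytes: u64) -> u64 {
        let delta = n_bytes.saturating_sub(permit.max_bytes_reported);
        permit.max_bytes_reported = permit.max_bytes_reported.max(n_bytes);
        self.bytes_sent_so_far += delta;
        delta
    }
}
```

With the buggy `+=` on cumulative values, reports of 100 and then 250 bytes would have produced a total of 350. With the delta approach the total is 250, and a repeated report of 250 adds nothing.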
Di Xiao	7989267d1e	🔒Pin the unpinned GitHub Actions to commit SHAs (#781 ) During the last pin sweep, a couple of actions were left unpinned. This pins them. \| Workflow \| Action \| Before \| After \| SHA \| \|---\|---\|---\|---\|---\| \| `cache-rust-build/action.yml` \| `actions/cache` \| `v5` \| `v5` \| `668228422ae6…` \| \| `windows-codesign/action.yml` \| `azure/trusted-signing-action` \| `v0` \| `v0` \| `1d365fec1286…` \| \| `pre-release-testing.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \|	2026-04-06 14:27:49 -07:00
Assaf Vayner	08377eab3c	Upgrade crates version to 1.5.1 (#782 ) ## Summary - Bump workspace version from 1.5.0 to 1.5.1 - Update all internal dependency version references to match <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Low risk version-only bump across workspace manifests and lockfiles with no code/behavior changes in the diff. > > Overview > Bumps the workspace package version from `1.5.0` to `1.5.1` and aligns internal crate dependency version pins (`xet-runtime`, `xet-core-structures`, `xet-client`, `xet-data`, `hf-xet`) to match. > > Updates lockfiles (`Cargo.lock` plus `hf_xet` and wasm lockfiles) so published/embedded artifacts resolve to the `1.5.1` crate set (including bringing wasm lockfiles up to `1.5.1`). > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `e8563700a0`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-06 14:03:02 -07:00
Assaf Vayner	062138f845	Improve hf-xet crate documentation (#780 ) ## Summary - Expand the crate-level docs (`lib.rs`) with a proper introduction explaining what Xet storage is and an end-to-end upload+download example on the landing page - Add runtime detection and `XetConfig` guidance to `XetSessionBuilder` docs - Describe all three upload methods (path, bytes, stream) in the `xet_session` module docs - Add dedicated streaming upload and streaming download code examples - Add module doc comment to `legacy/` pointing users to `xet_session` ## Test plan - [x] `cargo check -p hf-xet` passes - [x] `cargo doc -p hf-xet --no-deps` builds with no new warnings (50 pre-existing warnings unchanged) - [ ] Review rendered docs on docs.rs after merge <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Documentation-only changes (module/crate docs and examples) with no functional code or behavior modifications, so runtime risk is low aside from potential doc/example inaccuracies. > > Overview > Improves public-facing documentation across `lib.rs`, `xet_session`, and `legacy` by adding a clearer introduction to Xet/CAS, end-to-end upload+download quickstart examples, and guidance on using `XetSessionBuilder` (runtime detection and `XetConfig`). > > Expands `xet_session` docs to describe the three upload modes (path/bytes/stream) and adds dedicated streaming upload and streaming download examples, while clarifying that `legacy` is maintained for backwards compatibility and steering new users to `xet_session`. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `6e926857a9`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-06 13:55:29 -07:00
Assaf Vayner	cd55fb123c	no error log on 416 on reconstruction (#779 ) Fixes 778 Skip the error log in favor of info log on 416 when reconstruction API is called <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Low risk: behavior change is narrowly scoped to `get_reconstruction` requests, treating HTTP 416 as an expected terminal condition and altering log severity without impacting other status handling. > > Overview > Improves reconstruction fetch behavior by marking HTTP `416 Range Not Satisfiable` as an expected end-of-data condition: `RemoteClient::get_reconstruction_impl` now opts into this via `RetryWrapper::with_expected_416()`, and `RetryWrapper` special-cases 416 to log at info level while returning a non-retryable failure. > > Updates the WASM `Cargo.lock` to bump `xet-*` crate versions from `1.4.0` to `1.5.0`. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `2f1a3b6ef0`. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-06 13:42:05 -07:00
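As a rough sketch of the behavior described in #779, an opt-in flag can turn HTTP 416 into a non-retryable failure that is logged at info level. The enum names and the retry policy for other status codes (429 and 5xx retried here) are assumptions for illustration, not the actual `RetryWrapper` logic.

```rust
#[derive(Debug, PartialEq)]
enum LogLevel {
    Info,
    Error,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    Retry,
    /// Terminal failure, together with the level it should be logged at.
    Fail(LogLevel),
}

/// Classifies a response status. `expect_416` mirrors the idea of opting in
/// per request, as the reconstruction call does in the commit above.
fn classify(status: u16, expect_416: bool) -> Outcome {
    match status {
        200..=299 => Outcome::Success,
        // Expected end-of-data: do not retry, and do not log as an error.
        416 if expect_416 => Outcome::Fail(LogLevel::Info),
        // Assumed transient statuses; the real policy may differ.
        429 | 500..=599 => Outcome::Retry,
        _ => Outcome::Fail(LogLevel::Error),
    }
}
```

Callers that have not opted in still see 416 as an ordinary error, which keeps the change scoped to the reconstruction request.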
Di Xiao	1f7400cc4b	Drop xet-core-structure from xet-runtime dev dep (#776 )	2026-04-03 13:56:26 -07:00
Di Xiao	950807ba43	Upgrade crates version to 1.5.0 (#775 ) Update workspace version to `1.5.0` and intra-workspace dependency versions to `1.5.0`	2026-04-03 13:39:50 -07:00
Hoyt Koepke	0d9f78aaf4	Add README.md files and Cargo.toml updates needed for publishing hf-xet (#773 ) This PR adds crates.io-facing metadata (homepage, readme, keywords, categories) for the publishable crates, along with crate README files and concise crate-level docs so crates.io and docs.rs pages have better context.	2026-04-03 12:34:47 -07:00
Hoyt Koepke	014ff2d75b	Fix for FD leak (#774 ) Currently, the tests can fail intermittently due to a subtle fd leak in how the session and the runtimes interact. This causes tests using the sessions to quickly run out of file handles. There were two different issues: 1. XetSessionInner tracked active upload commits and file download groups in strong-reference maps, and those child objects held a clone of the session. That created a second cycle (session -> child -> session) that prevented cleanup of commit/download resources and the runtime handles. This is dropped. (Note that all abort/sigint-cancellation behavior is handled automatically through TaskRuntime; the session classes don't need any explicit code for it outside of that). 2. The static thread-local reference to the tokio runtime prevented the tokio runtime from getting cleaned up when it was created explicitly and not aborted. In addition, JoinHandle objects hold a reference back to the runtime, so if these are not aborted or joined, then they also prevent the runtime from shutting down. The FD tracking code was left in but feature gated behind feature `fd-track`. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Changes runtime/session lifetime management (TLS runtime refs, shutdown behavior, and session child ownership), which can affect task cancellation and runtime teardown across the library. > > Overview > Fixes intermittent file-descriptor leaks by breaking ownership cycles between `XetSession` and child upload/download objects and by ensuring `XetRuntime` can actually drop/shutdown when the last external reference is released. > > Adds an opt-in `fd-track` feature with lightweight FD counting/scoped tracing, plus new leak-focused tests, and tightens local CAS DB/shard manager caching to avoid duplicate `redb` opens (canonicalized paths, weak cached handles, and cleanup on drop). > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `041426e73e`. 
This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-02 18:28:26 -07:00
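The ownership cycle described in #774 (session -> child -> session) can be illustrated in general terms. The sketch below shows one common way to avoid such a cycle, by having the parent hold only weak references to its children; the commit itself says the strong-reference maps were dropped, so this is an illustration of the underlying problem and not the repository's code. All type names are hypothetical.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical session that tracks children without owning them.
struct Session {
    // Weak entries: registering a child does not keep it alive, so no
    // session -> child -> session cycle can form.
    children: Mutex<Vec<Weak<Child>>>,
}

/// Hypothetical child operation that keeps its session alive while it runs.
struct Child {
    _session: Arc<Session>,
}

fn new_child(session: &Arc<Session>) -> Arc<Child> {
    let child = Arc::new(Child {
        _session: Arc::clone(session),
    });
    session
        .children
        .lock()
        .unwrap()
        .push(Arc::downgrade(&child));
    child
}
```

Once the caller drops both its session handle and the child, nothing holds a strong reference any more, so the session and whatever resources it owns (file handles, runtime handles) can be released.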
Pauline Bailly-Masson	2659c69892	🔒 Pin GitHub Actions to commit SHAs (#772 ) ## 🔒 Pin GitHub Actions to commit SHAs This PR pins all GitHub Actions to their exact commit SHA instead of mutable tags or branch names. Why? Pinning to a SHA prevents supply chain attacks where a tag (e.g. `v4`) could be moved to point to malicious code. ### Changes \| Workflow \| Action \| Before \| After \| SHA \| \|---\|---\|---\|---\|---\| \| `hf-xet-tests.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `hf-xet-tests.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `hf-xet-tests.yml` \| `actions/setup-python` \| `v6` \| `v6` \| `a309ff8b426b…` \| \| `hf-xet-tests.yml` \| `PyO3/maturin-action` \| `v1` \| `v1` \| `04ac600d27cd…` \| \| `release.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `release.yml` \| `actions/setup-python` \| `v6` \| `v6` \| `a309ff8b426b…` \| \| `release.yml` \| `PyO3/maturin-action` \| `v1` \| `v1` \| `04ac600d27cd…` \| \| `release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `release.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `release.yml` \| `actions/setup-python` \| `v6` \| `v6` \| `a309ff8b426b…` \| \| `release.yml` \| `PyO3/maturin-action` \| `v1` \| `v1` \| `04ac600d27cd…` \| \| `release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `release.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `release.yml` \| `actions/setup-python` \| `v6` \| `v6` \| `a309ff8b426b…` \| \| `release.yml` \| `PyO3/maturin-action` \| `v1` \| `v1` \| `04ac600d27cd…` \| \| `release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| 
`release.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `release.yml` \| `actions/setup-python` \| `v6` \| `v6` \| `a309ff8b426b…` \| \| `release.yml` \| `PyO3/maturin-action` \| `v1` \| `v1` \| `04ac600d27cd…` \| \| `release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `release.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `release.yml` \| `PyO3/maturin-action` \| `v1` \| `v1` \| `04ac600d27cd…` \| \| `release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `release.yml` \| `actions/download-artifact` \| `v7` \| `v7` \| `37930b1c2aba…` \| \| `release.yml` \| `actions/attest-build-provenance` \| `v3` \| `v3` \| `977bb373ede9…` \| \| `release.yml` \| `PyO3/maturin-action` \| `v1` \| `v1` \| `04ac600d27cd…` \| \| `release.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `release.yml` \| `actions/download-artifact` \| `v7` \| `v7` \| `37930b1c2aba…` \| \| `ci.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `ci.yml` \| `dtolnay/rust-toolchain` \| `stable` \| `nightly` \| `3c5f7ea28cd6…` \| \| `ci.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `ci.yml` \| `bnjbvr/cargo-machete` \| `main` \| `main` \| `b81ce1560c5f…` \| \| `ci.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `ci.yml` \| `dtolnay/rust-toolchain` \| `1.89.0` \| `1.94.1` \| `3c5f7ea28cd6…` \| \| `ci.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `ci.yml` \| `dtolnay/rust-toolchain` \| `1.89.0` \| `1.94.1` \| `3c5f7ea28cd6…` \| \| `ci.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `ci.yml` \| `dtolnay/rust-toolchain` \| `1.89.0` \| `1.94.1` \| `3c5f7ea28cd6…` \| \| `ci.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `ci.yml` \| `dtolnay/rust-toolchain` 
\| `1.89.0` \| `1.94.1` \| `3c5f7ea28cd6…` \| \| `ci.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `ci.yml` \| `dtolnay/rust-toolchain` \| `nightly` \| `nightly` \| `3c5f7ea28cd6…` \| \| `git-xet-release.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `git-xet-release.yml` \| `dtolnay/rust-toolchain` \| `1.89.0` \| `1.94.1` \| `3c5f7ea28cd6…` \| \| `git-xet-release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `git-xet-release.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `git-xet-release.yml` \| `dtolnay/rust-toolchain` \| `1.89.0` \| `1.94.1` \| `3c5f7ea28cd6…` \| \| `git-xet-release.yml` \| `lando/code-sign-action` \| `v3` \| `v3` \| `a5703d3b5486…` \| \| `git-xet-release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `git-xet-release.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `git-xet-release.yml` \| `dtolnay/rust-toolchain` \| `1.89.0` \| `1.94.1` \| `3c5f7ea28cd6…` \| \| `git-xet-release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `git-xet-release.yml` \| `actions/upload-artifact` \| `v6` \| `v6` \| `b7c566a772e6…` \| \| `git-xet-release.yml` \| `actions/checkout` \| `v6` \| `v6.0.2` \| `de0fac2e4500…` \| \| `git-xet-release.yml` \| `actions/download-artifact` \| `v7` \| `v7` \| `37930b1c2aba…` \| > 🤖 Generated by `/github-actions-audit` — [security/pin-actions-to-sha] Closes huggingface/tracking-issues#291 Co-authored-by: di <di@huggingface.co>	2026-04-02 11:23:49 -07:00
Di Xiao	1f0918c33e	Refactor XetSession commit / group CAS endpoint and auth configuration (#771 ) There's no publicly documented Xet CAS endpoint. To interact with Xet CAS, all public clients need to obtain a CAS endpoint from the same route that issues the CAS token. Currently users need to 1. first construct a CAS token URL with respect to a certain operation ("read" or "write", targeted repo type, targeted repo, targeted revision), 2. send a request to this URL to get a CAS token and CAS endpoint, 3. use the CAS endpoint to build a `XetSession`, 4. use the `XetSession` instance, the CAS token, and the CAS token URL to build an upload or download group. This is a rather complicated setup. This PR addresses this blocker by eagerly "refresh"-ing the CAS token if no CAS endpoint is provided, so users can 1. build a `XetSession`, 2. construct a CAS token URL with respect to a certain operation ("read" or "write", targeted repo type, targeted repo, targeted revision), 3. use the `XetSession` instance and the CAS token URL to build an upload or download group. So effectively, there will be two common patterns: Pattern A: endpoint known ahead of time; no eager refresh, token_info is used as-is ``` let session = XetSessionBuilder::new().build()?; let commit = session .new_upload_commit()? .with_endpoint(cas_url) .with_token_info(token, expiry) .with_token_refresh_url(refresh_url, /* Auth headers */) .build_blocking()?; ``` Pattern B: endpoint unknown; the build call fetches it, and token_info is seeded from the response ``` let session = XetSessionBuilder::new().build()?; let commit = session .new_upload_commit()? .with_token_refresh_url(token_refresh_url, /* Auth headers */) .build_blocking()?; ``` Other changes: 1. `with_endpoint()` and `with_custom_headers()` configuration is moved from the `XetSession` level down to the operation level, because multiple operations with different CAS endpoints can co-exist in the same session instance. 2. The builders for the different operations (`XetUploadCommit`, `XetFileDownloadGroup`, `XetDownloadStreamGroup`) are refactored to share common code under `struct AuthGroupBuilder<G>`.	2026-04-02 11:07:07 -07:00
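The two patterns from #771 boil down to one decision at build time, sketched below. The types and the `fetch` closure are hypothetical stand-ins for the token refresh request; only the rule itself (use `token_info` as-is when the endpoint is known, otherwise fetch eagerly and seed the token from the response) comes from the commit message.

```rust
/// Hypothetical shape of what the token refresh route returns.
struct TokenResponse {
    endpoint: String,
    token: String,
    expiry: u64,
}

/// Hypothetical resolved configuration for one upload or download group.
#[derive(Debug, PartialEq)]
struct ResolvedAuth {
    endpoint: String,
    token: Option<(String, u64)>,
}

fn resolve_auth(
    endpoint: Option<String>,
    token_info: Option<(String, u64)>,
    fetch: impl FnOnce() -> TokenResponse,
) -> ResolvedAuth {
    match endpoint {
        // Pattern A: endpoint known ahead of time, no eager refresh.
        Some(endpoint) => ResolvedAuth {
            endpoint,
            token: token_info,
        },
        // Pattern B: endpoint unknown, so fetch it and seed the token.
        None => {
            let response = fetch();
            ResolvedAuth {
                endpoint: response.endpoint,
                token: Some((response.token, response.expiry)),
            }
        }
    }
}
```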
Assaf Vayner	20198a9081	Remove prometheus dependency and metrics (#769 ) ## Summary - Remove the `prometheus` crate dependency from the workspace and `xet_data` - Delete `prometheus_metrics.rs` which defined 3 IntCounter metrics (CAS bytes produced, bytes cleaned, bytes smudged) - Remove metric increment calls from `file_upload_session.rs` and `file_download_session.rs` - Fix Windows CI flake: redb "Database already open" error in `test_single_large` These metrics were collected but never exposed via any HTTP endpoint or text encoder, making them effectively dead code. ## Test plan - [x] `cargo +nightly fmt` — clean - [x] `cargo clippy --all-targets` — no new warnings - [x] `cargo test -p xet-data` — 17/17 pass - [x] `cargo test -p xet-data --features simulation --test test_clean_smudge` — 14/14 pass (including `test_single_large`) - [x] WASM builds (`hf_xet_wasm`, `hf_xet_thin_wasm`) — both succeed <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Low risk: this removes unused Prometheus metrics plumbing and related dependencies without changing the core upload/download logic. Main risk is loss of any downstream reliance on these counters at build time (e.g., feature flags or imports). > > Overview > Removes the `prometheus` dependency from the workspace and `xet_data`, and updates lockfiles accordingly (including WASM-related lockfiles). > > Deletes `xet_data`’s `prometheus_metrics` module and strips the associated counter increments from `FileUploadSession` and `FileDownloadSession`, leaving the data processing behavior unchanged aside from no longer recording these metrics. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `c6c866b7ca`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-01 14:56:58 -07:00
Adrien	3d377bdffb	feat: optional chunk cache in download path for cross-file dedup (#731 ) ## Context These changes support the hf-mount project, which needs cross-file chunk deduplication during downloads. ## Summary - Adds an optional `ChunkCache` to the download path (`FileDownloadSession`, `FileReconstructor`, `XorbBlock`). When provided, xorb blocks are looked up in cache before HTTP requests and stored after download. - Cache hits skip permit acquisition, so they don't consume network concurrency slots. This enables cross-file deduplication for mount-style workloads. - Breaking change to `FileDownloadSession::new()` and `from_client()` signatures (new `chunk_cache: Option<Arc<dyn ChunkCache>>` parameter). All existing callers pass `None`. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Touches the core download/reconstruction path and changes session constructor signatures; cache-hit/miss behavior affects concurrency permits and progress reporting. Risk is mitigated by being opt-in (`None` for existing callers) but incorrect cache keys or offsets could corrupt reconstructed output or skew progress. > > Overview > Adds an optional `ChunkCache` to the download pipeline to enable cross-file xorb/chunk dedup during reconstruction. > > `FileDownloadSession` now accepts/stores `chunk_cache` and wires it into `FileReconstructor`, which passes it down into `FileTerm`/`XorbBlock` retrieval. `XorbBlock::retrieve_data` now checks the cache before acquiring CAS download permits (so cache hits avoid consuming network concurrency), and writes downloaded blocks back to the cache asynchronously on a best-effort basis (logging failures). > > This also introduces a small refactor (`build_chunk_offsets`) and updates all call sites/tests/examples to the new `FileDownloadSession::new(..., chunk_cache)` / `from_client(..., chunk_cache)` signatures (currently passing `None`). > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `f4fdea5175`. 
This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-04-01 17:29:28 +02:00
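The lookup order described in #731 (check the cache first, take a network permit only on a miss, write the block back afterwards) can be sketched synchronously. The trait and types below are hypothetical and are not the repository's `ChunkCache` interface, whose methods are not shown in the commit message; the "download" is faked so the example stays self-contained.

```rust
use std::collections::HashMap;

/// Hypothetical cache interface, assumed for illustration.
trait BlockCache {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&mut self, key: &str, data: &[u8]);
}

#[derive(Default)]
struct MemoryCache(HashMap<String, Vec<u8>>);

impl BlockCache for MemoryCache {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
    fn put(&mut self, key: &str, data: &[u8]) {
        self.0.insert(key.to_string(), data.to_vec());
    }
}

#[derive(Default)]
struct Downloader {
    /// Counts how often a (simulated) network permit was taken.
    permits_acquired: usize,
}

impl Downloader {
    fn retrieve(&mut self, key: &str, cache: Option<&mut dyn BlockCache>) -> Vec<u8> {
        match cache {
            Some(cache) => {
                // A hit returns before any permit is acquired.
                if let Some(hit) = cache.get(key) {
                    return hit;
                }
                let data = self.download(key);
                cache.put(key, &data);
                data
            }
            None => self.download(key),
        }
    }

    fn download(&mut self, key: &str) -> Vec<u8> {
        self.permits_acquired += 1;
        key.as_bytes().to_vec() // placeholder for the fetched block
    }
}
```

Passing `None` reproduces the pre-existing behavior, which matches the commit's note that all existing callers pass `None`.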
Hoyt Koepke	3051478cdd	Allow shard expiration to be set on global dedup queries for GC simulation (#762 ) Currently, simulation global dedup shard queries return full shard bytes with no configurable shard footer expiration, and simulation control knobs are split between partially implemented paths. This PR adds global dedup shard expiration control to simulation clients and servers, and extends /simulation/set_config to cover shard expiration, max range splitting, V2 reconstruction disabling, API delay, and URL expiration in one path. This enables rapid simulation of the GC paths by setting the global dedup expiration to a sub-epoch value. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Touches simulation client/server APIs and shard serialization behavior (including new trait methods and HTTP knobs), so downstream implementors and tests may break if not updated. Changes are scoped to simulation/GC tooling paths but affect how global-dedup shard bytes are produced and validated. > > Overview > Adds a new simulation control to set global-dedup shard expiration: `DirectAccessClient::set_global_dedup_shard_expiration` now makes `query_for_global_dedup_shard` optionally return minimal shard bytes (file section stripped) with `shard_key_expiry = now + expiration` (sub-second durations round up). > > Extends `MDBMinimalShard` serialization with `serialize_xorb_subset_with_expiry` to write an optional `shard_key_expiry` footer, and updates `LocalClient`/`MemoryClient` to use it when expiration is enabled. > > Unifies and expands runtime simulation knobs under `/simulation/set_config` (global dedup expiration, max ranges per fetch, disable V2 reconstruction, API delay, URL expiration) and updates `SimulationControlClient` to apply them via a retried async POST. Also moves integrity/reachability checks to `DeletionControlableClient`, adds `verify_all_reachable`, and wires new `/simulation/verify_all_reachable` with 501 behavior when no deletion client is configured. 
> > Separately, introduces simulation-only xorb cut thresholds (`XORB_CUT_THRESHOLD_*`) driven by new `xet_runtime` xorb config overrides, and updates upload/dedup code paths to use these thresholds. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `42bd9c3f4f`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-31 18:35:19 -07:00
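The expiry rule mentioned in #762, `shard_key_expiry = now + expiration` with sub-second durations rounding up, might look like the sketch below. The function names are hypothetical, and rounding every fractional second up (not just durations below one second) is an assumption about the intended semantics.

```rust
use std::time::Duration;

/// Rounds a duration up to whole seconds, so 250 ms becomes 1 s instead of 0 s.
fn ceil_secs(d: Duration) -> u64 {
    d.as_secs() + u64::from(d.subsec_nanos() > 0)
}

/// Computes an expiry timestamp in epoch seconds from "now" plus a lifetime.
fn shard_key_expiry(now_epoch_secs: u64, expiration: Duration) -> u64 {
    now_epoch_secs.saturating_add(ceil_secs(expiration))
}
```

Rounding up matters for GC simulation: truncating a sub-second expiration to zero would make the shard expire at the moment it is issued.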
Assaf Vayner	7d97aa3066	Replace heed (LMDB) with redb in local CAS simulation (#766 ) This is an optional change. `heed` pulls in a number of dependencies and relies on LMDB, which may require more compilation/linking steps in tests. Since we use it for such a small subset of operations in testing, I thought we might try an even thinner Rust-native dependency instead, which is what `redb` is. ## Summary - Replace `heed` (C LMDB bindings) with `redb` (pure Rust embedded KV store) in `LocalClient` - Removes C dependency, `unsafe` block, Windows retry workaround, and custom `Drop` impl - Introduces `RedbHash` newtype wrapper for `MerkleHash` to satisfy orphan rules on redb's `Key`/`Value` traits - Net reduction of ~130 lines; all 147 existing tests pass ## Test plan - [x] `cargo check -p xet-client --features simulation` — clean - [x] `cargo test -p xet-client --features simulation` — 147 passed, 0 failed - [x] `cargo clippy -p xet-client --features simulation` — clean - [x] `cargo +nightly fmt` — clean <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Swaps the embedded KV store used for shard dedup/deletion metadata in the local CAS simulation, which can affect test behavior and on-disk state/locking semantics (especially with concurrent clients). Scope is contained to simulation/test code and dependency graph changes. > > Overview > Switches `LocalClient`’s disk-backed global-dedup and file deletion status storage from `heed`/LMDB to `redb`, including new `RedbHash` serialization, `TableDefinition`s, and updated read/write transaction flows. > > Adds a small global database-handle cache to avoid `redb` exclusive-lock conflicts across multiple `LocalClient` instances, and removes the prior LMDB-specific open/retry logic and custom `Drop` close path. Workspace dependencies/lockfiles are updated to drop `heed`/LMDB-related crates and add `redb`, and `.gitignore` now ignores `.worktrees/`. 
> > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `02d39864d9`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-31 15:23:18 -07:00
Assaf Vayner	86935b4117	Move test-only deps to dev-dependencies in git_xet (#767 ) ## Summary - Move `russh`, `rand_core`, and `tempfile` from regular dependencies to dev-dependencies in `git_xet`, since they are only used in test code - `russh` and `rand_core` are also declared as optional regular deps activated by the `git-xet-for-integration-test` feature flag, since the integration test SSH server is compiled into the library under that feature - Gate `test_utils/ssh_server` module and related exports behind `#[cfg(any(test, feature = "git-xet-for-integration-test"))]` - Gate `tests/test_ssh.rs` integration test file behind `#![cfg(feature = "git-xet-for-integration-test")]` ## Test plan - [x] `cargo check -p git_xet` passes (no features) - [x] `cargo test -p git_xet --no-run` passes (no features) - [x] `cargo test -p git_xet --features git-xet-for-integration-test --no-run` passes <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Low Risk > Low risk: primarily Cargo dependency/feature and `cfg` gating changes, with no production logic changes; risk is limited to build/test configuration and feature-flagged integration test coverage. > > Overview > Reduces default build dependencies for `git_xet`. Moves `russh`, `rand_core`, and `tempfile` into `dev-dependencies`, and keeps `russh`/`rand_core` available as optional deps enabled only by the `git-xet-for-integration-test` feature. > > Gates SSH test helpers and integration tests behind a feature flag. Exposes `GitLFSAuthenticateResponse*` and the local SSH test server only under `#[cfg(test)]` or `feature = "git-xet-for-integration-test"`, and makes `tests/test_ssh.rs` compile only when that feature is enabled. > > Separately, cleans up workspace manifests/lockfiles by moving some crates (`half`, `regex`, `futures-util`) to dev-deps where they’re only needed for tests/benches, and adds `.worktrees/` to `.gitignore`. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `cdc30a5a8f`. 
This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-31 13:31:20 -07:00
Di Xiao	126f30b981	Prevent multiple XetSession (XetRuntime) attach to the same external tokio runtime (#757 ) When `XetRuntime` wraps an external tokio handle (External mode), it registers the handle ID in `EXTERNAL_RUNTIME_REGISTRY` so that `XetRuntime::current()` can look up the correct instance from any task and thus obtain the associated `XetConfig`. Previously, a second `from_external_with_config` call with the same handle would silently overwrite the registry entry, breaking the first runtime's `current()` lookups. As a result, tasks spawned from the first `XetRuntime` could no longer access their own `XetRuntime` and its config, which is not the expected behavior. This PR makes the second call fail with an explicit error instead: it checks whether `EXTERNAL_RUNTIME_REGISTRY` already contains an entry keyed by the ID of the tokio runtime being attached to, and if so returns `RuntimeError::InvalidRuntime`.	2026-03-30 11:36:02 -07:00
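The registry check from #757 amounts to "insert if absent, otherwise fail". A minimal sketch, assuming a plain `HashMap` keyed by a numeric handle ID; the `Registry` type, the `u64` key, and the `String` payload are hypothetical, and only the `RuntimeError::InvalidRuntime` variant name comes from the commit message.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum RuntimeError {
    InvalidRuntime(String),
}

/// Hypothetical registry mapping a tokio handle ID to its wrapping runtime.
#[derive(Default)]
struct Registry {
    by_handle: HashMap<u64, String>,
}

impl Registry {
    fn register(&mut self, handle_id: u64, runtime_name: &str) -> Result<(), RuntimeError> {
        match self.by_handle.entry(handle_id) {
            // Refuse to overwrite: the first runtime's lookups must keep working.
            Entry::Occupied(_) => Err(RuntimeError::InvalidRuntime(format!(
                "tokio runtime {handle_id} is already attached"
            ))),
            Entry::Vacant(slot) => {
                slot.insert(runtime_name.to_string());
                Ok(())
            }
        }
    }
}
```

Using the entry API keeps the check and the insert in a single lookup, so there is no window in which a second registration could slip in between them when the map is behind one lock.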
Hoyt Koepke	29acd7a981	Fix for download streams swallowing errors into generic "Channel closed" message. (#765 ) Previously, when an error happens, the channel stream can close before the error gets propagated to the user-facing iterators; when this happens, it's random whether the channel closed error or the original error gets surfaced. This PR ensures that the actual error causing the shutdown gets surfaced to the user. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Adjusts error handling across async/blocked stream consumption and the sequential writer thread, affecting concurrency and shutdown paths. Risk is moderate due to potential behavior changes in edge cases when channels close during failures. > > Overview > Prevents download streaming APIs from masking reconstruction failures as generic "channel closed" errors. > > When a per-chunk `oneshot` receiver is dropped/closed, `DownloadStream::{next,blocking_next}` and the sequential writer thread (`SyncWriterThread::next_write`) now first call `run_state.check_error()` to surface the actual underlying error before falling back to an internal writer error. > > Wires `RunState` into `SyncWriterThread` so the background writer path can perform the same error propagation check. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `e33b30f076`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-30 10:07:44 -07:00
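The error-surfacing order from #765 can be sketched with a standard-library channel. The `RunState` below is a hypothetical simplification with a single `check_error` method, as named in the summary above; the real code uses per-chunk `oneshot` receivers, which are replaced here by `std::sync::mpsc` to keep the example dependency-free.

```rust
use std::sync::mpsc;
use std::sync::Mutex;

/// Hypothetical shared state that records the first error of a run.
#[derive(Default)]
struct RunState {
    error: Mutex<Option<String>>,
}

impl RunState {
    fn set_error(&self, msg: &str) {
        let mut slot = self.error.lock().unwrap();
        if slot.is_none() {
            *slot = Some(msg.to_string());
        }
    }

    fn check_error(&self) -> Result<(), String> {
        match self.error.lock().unwrap().clone() {
            Some(e) => Err(e),
            None => Ok(()),
        }
    }
}

fn next_chunk(rx: &mpsc::Receiver<Vec<u8>>, state: &RunState) -> Result<Vec<u8>, String> {
    match rx.recv() {
        Ok(chunk) => Ok(chunk),
        Err(_) => {
            // The channel closed: surface the recorded cause first, and fall
            // back to the generic message only when no error was recorded.
            state.check_error()?;
            Err("channel closed".to_string())
        }
    }
}
```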
Hoyt Koepke	a81fc5800a	Rerun hub tests on failures (#761 ) Occasionally it seems the tests directly against hub can fail due to intermittent issues. This PR causes these tests to be run up to two additional times in the CI on failure, passing if they succeed on subsequent attempts.	2026-03-30 09:59:11 -07:00
Adrien	f781498b68	fix: truncate local file on full-file download to prevent corruption (#764 ) ## Summary - Fixes a data corruption bug where downloading a file smaller than an existing local file left stale trailing bytes intact - The file was opened with `truncate(false)` unconditionally (needed for concurrent partial-range writes), but full-file downloads now use `truncate(true)` - Adds regression test `test_full_file_truncates_larger_existing_file` Ref: https://github.com/huggingface/huggingface_hub/issues/3995 <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Changes on-disk write semantics for reconstructed downloads by optionally truncating the destination file, which affects data integrity and could impact concurrent/partial-write callers if misused. > > Overview > Fixes a corruption case where full-file downloads could leave stale trailing bytes when writing over an existing larger file by adding a `truncate_file` flag to `FileReconstructor::reconstruct_to_file` and wiring it to `OpenOptions::truncate()`. > > Updates full-file download flow (`FileDownloadSession::download_file_with_id`) to pass `truncate_file=true`, while keeping benchmarks/tests and range/concurrent write paths passing `false` to preserve existing behavior for partial/concurrent writes. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `ed33dab9a1`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: di <di@huggingface.co>	2026-03-30 17:57:23 +02:00
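The truncation behavior fixed in #764 is easy to reproduce with `std::fs::OpenOptions`. The helper below is a hypothetical sketch and not `FileReconstructor::reconstruct_to_file`; only the idea of a `truncate_file` flag wired to `OpenOptions::truncate()` comes from the commit message.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Writes `data` at the start of `path`. With `truncate_file = false`, any
/// bytes of an existing longer file beyond `data.len()` are left in place.
fn write_output(path: &Path, data: &[u8], truncate_file: bool) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(truncate_file)
        .open(path)?;
    file.write_all(data)
}
```

Writing 4 bytes over an existing 10-byte file without truncation leaves a 10-byte file whose last 6 bytes are stale, which is the corruption case; partial-range writers still need `truncate(false)` so they do not erase each other's output.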
Di Xiao	15011cb230	XetSession uses direct token refresh route instead of a callback (#751 ) This PR makes two significant, breaking API redesign: 1. Auth tokens move from session-level (shared by all operations) to per-operation level (per `UploadCommit`, `FileDownloadGroup`, and `DownloadStreamGroup`). This enables uploads and downloads from the same session to carry different access-level tokens — a sensible design for HF's write-vs-read token split. 2. Instead of letting users provide a callback to refresh tokens, this new API now let users provide a token refresh URL and access credential in an HTTP header map. ### Why 1. CAS JWT have short life, but `XetSession` is intended to be held long time -- thus it makes more sense to configure CAS auth on the operation level (`UploadCommit` or `FileDownloadGroup` or `DownloadStreamGroup`) and it will be discarded once the operation is done. 2. For different access level (write vs. read) and different operation target (repo and commit), CAS JWT token will be different and the token refresh URL will be different. `UploadCommit` and `FileDownloadGroup` and `DownloadStreamGroup` they each also function as a single auth group. 3. Providing an URL is considered easier than writing a callback, and is more safe when crossing the GIL Python - Rust boundary. Examples: ``` // Upload token (write access) let mut upload_headers = HeaderMap::new(); upload_headers.insert("Authorization", "Bearer hub-write-token".parse().unwrap()); let commit = session .new_upload_commit()? .with_token_info("CAS_WRITE_JWT", 900) .with_token_refresh_url("https://huggingface.co/api/repos/token/write", upload_headers) .build_blocking()?; ``` ``` // File download token (read access) let mut dl_headers = HeaderMap::new(); dl_headers.insert("Authorization", "Bearer hub-read-token".parse().unwrap()); let group = session .new_file_download_group()? 
.with_token_info("CAS_READ_JWT", 900) .with_token_refresh_url("https://huggingface.co/api/repos/token/read", dl_headers) .build_blocking()?; ``` Secondary changes include: - `DirectRefreshRouteTokenRefresher` consolidated into `xet_client::cas_client::auth`. - HTTP client module moved from `cas_client` to `xet_client::common` for shared use between `xet_client::cas_client` and `xet_client::hub_client`. - New `DownloadStreamGroup` type (streaming downloads moved off `XetSession`). - Fix Session ID type regression: this was fixed once in https://github.com/huggingface/xet-core/pull/738 but regressed again; it seems AI agents don't learn. - HTTP client cache key now incorporates custom headers.	2026-03-30 08:39:25 -07:00
Hoyt Koepke	051dee52ea	WASM wrappers for MerkleHashSubtree (#758 ) This PR exposes the new MerkleHashSubtree class for managing groups of hashes to WASM.	2026-03-30 08:09:14 -07:00
Assaf Vayner	9c0cb6e4c8	Reduce workspace dependencies (batches 1-3) (#746 ) ## Summary - Remove unused dependencies: warp (zero imports), paste (zero invocations), tower-service (zero imports), and heed misplacement in xet_core_structures - Move mockall to dev-dependencies in xet_client by gating `#[automock]` with `#[cfg_attr(test, automock)]` - Feature-gate simulation module behind `simulation` cargo feature in xet_client, making axum, heed, humantime, futures-util, human-bandwidth, and tower-http optional - Replace duration-str with humantime (~2 deps vs ~78 transitive deps) across xet_runtime, xet_client simulation, and simulation crate ## Impact \| Metric \| Before \| After \| Change \| \|---\|---\|---\|---\| \| hf-xet production deps \| 371 \| 321 \| -50 \| \| Workspace total \| 575 \| 569 \| -6 \| ## Test plan - [x] `cargo check --workspace` passes - [x] `cargo check -p hf-xet` passes (without simulation feature — key validation) - [x] `cargo test --workspace` — all tests pass (4 pre-existing auth test failures in git_xet unrelated to this PR) - [x] `cargo tree -p hf-xet -e normal --prefix none \| sort -u \| wc -l` confirms 321 deps 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Medium risk because it changes dependency graph and Cargo feature gating (notably `xet-client` simulation modules and CI test features), which can affect build/test behavior across targets despite minimal runtime logic changes. > > Overview > Reduces workspace dependency surface by removing `duration-str` (replaced with `humantime`) and trimming other transitive-heavy crates; updates lockfiles accordingly across the workspace, `hf_xet`, and WASM builds. 
> > Introduces/propagates a `simulation` Cargo feature: `xet-client`’s simulation server-related deps become optional and are only compiled/exported when `feature = "simulation"` is enabled; `git_xet` adds a `simulation` feature that forwards to dependent crates, and CI now runs tests with `strict simulation git-xet-for-integration-test`. > > Minor repo hygiene updates include ignoring `.claude/` in `.gitignore` and wiring the `simulation` crate to depend on `xet-client` with `features = ["simulation"]` (plus swapping its duration parsing helper to `humantime`). > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `6abc194398`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 09:54:36 -07:00
Hoyt Koepke	69962587b5	Composable Hash Functionality (#745 ) Currently, computing aggregate chunk hashes across independently processed ranges requires recomputing over the full concatenated chunk list. This PR introduces ChunkHashRange, a composable representation that can hash contiguous partial ranges and merge them while preserving equivalence with the existing xorb_hash / file_hash behavior. This provides an intermediate representation of the hash ranges that can be merged in arbitrary order to get the final hash. It also uses O(log(n)) storage, and all operations run in linear time. Serialization and deserialization are fully supported. The main use case for this is partial file edits. Previously, to edit the middle of a large file, the client had to know all the hashes for the full file, even if only a few in the middle were changed. With a large file, this can still be hundreds of MB; the chunk metadata size is roughly 1/1000 of the data size. With this change, we can now transmit the unmodified parts of a file in O(log(n)) storage but still be able to build the entire file hash; a sequence of 10M chunks now takes the equivalent storage of ~500 chunks or so. Along the way, we also added an optimization for the merge step to avoid an allocation, yielding a 2x speedup. --------- Co-authored-by: Hoyt Koepke <hoytak@xethub.com>	2026-03-27 08:38:59 -07:00
Hoyt Koepke	c90f0a7bd9	Session API Polish; unify task handling/cancellation behavior. (#747 ) Previously, upload and download paths each had their own ad-hoc state tracking, cancellation, and runtime bridging logic. TaskRuntime consolidates this into a single type that owns a CancellationToken tree, tracks Running/Finished/Cancelled state with recursive propagation to children, and provides bridge_async/bridge_sync wrappers that automatically wire up tokio::select! cancellation. Session → commit/group → per-file handles form a parent-child token tree, so aborting a session cancels all descendant work. The upload path gets new UploadFileHandle and UploadStreamHandle wrapper types (replacing the old UploadTaskHandle), with inner/wrapper pattern for cheap cloning. UploadCommit::commit() now returns a CommitReport containing aggregate dedup metrics, progress, and per-file FileMetadata. The download path mirrors this structure: FileDownloadGroup uses TaskRuntime for state gating and owns bespoke DownloadTaskHandle instances with per-task status and result access. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > High Risk > High risk due to a breaking redesign of the public `xet_session` API (new handle/report types and renamed methods) plus new cancellation/state machinery that changes how uploads/downloads are coordinated and terminated. > > Overview > Redesigns `xet_pkg::xet_session` around a new hierarchical `TaskRuntime` (using `tokio-util` cancellation tokens) to unify state, bridging, and cancellation across session → commit/group → per-file handles. > > Replaces the old task-handle/result model (`tasks.rs`, `UploadResult`/`DownloadResult`, `TaskStatus`, group/session state enums) with explicit handle/report types: `XetFileUpload`, `XetStreamUpload`, `XetFileDownload`, `XetCommitReport`, and `XetDownloadGroupReport`, and standardizes task state via `XetTaskState`. 
> > Adjusts APIs and error semantics: `commit()` now returns an aggregate report (dedup metrics + progress + per-file metadata) and no longer consumes `self`; progress methods become infallible (`progress()`); cancellations/errors are consolidated (`AlreadyCompleted`, `UserCancelled`, `KeyboardInterrupt`, `TaskError`/`PreviousTaskError`) with updated Python exception mapping. `xet_data` now returns per-file `DeduplicationMetrics` from upload tasks and adds a zero-copy `SingleFileCleaner::add_data_from_bytes`. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `153a3ebbbe`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-03-27 07:54:37 -07:00
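The parent-child cancellation described in #747 (session → commit/group → per-file handle) can be sketched without any dependencies. This is an assumption-laden illustration, not the `TaskRuntime` implementation: the commit says the real code uses `tokio-util` cancellation tokens, whereas `CancelNode` below is a hypothetical std-only stand-in that only models the propagation rule (cancelling a node cancels all of its descendants, never its ancestors or siblings).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a cancellation token; cheap to clone.
#[derive(Clone)]
struct CancelNode {
    inner: Arc<Inner>,
}

struct Inner {
    cancelled: AtomicBool,
    children: Mutex<Vec<CancelNode>>,
}

impl CancelNode {
    fn new() -> Self {
        CancelNode {
            inner: Arc::new(Inner {
                cancelled: AtomicBool::new(false),
                children: Mutex::new(Vec::new()),
            }),
        }
    }

    /// Create a child node. A child of an already-cancelled parent is born
    /// cancelled. The flag is checked under the children lock so that a
    /// concurrent `cancel()` cannot miss the new child.
    fn child(&self) -> CancelNode {
        let child = CancelNode::new();
        let mut children = self.inner.children.lock().unwrap();
        if self.inner.cancelled.load(Ordering::SeqCst) {
            child.inner.cancelled.store(true, Ordering::SeqCst);
        } else {
            children.push(child.clone());
        }
        child
    }

    /// Cancel this node and, recursively, every descendant.
    fn cancel(&self) {
        let children = self.inner.children.lock().unwrap();
        self.inner.cancelled.store(true, Ordering::SeqCst);
        for c in children.iter() {
            c.cancel();
        }
    }

    fn is_cancelled(&self) -> bool {
        self.inner.cancelled.load(Ordering::SeqCst)
    }
}
```

With `let session = CancelNode::new(); let commit = session.child(); let file = commit.child();`, calling `commit.cancel()` marks `commit` and `file` as cancelled while `session` stays live; calling `session.cancel()` then cancels everything beneath it. An async runtime would additionally need a way to await cancellation, which this sketch deliberately leaves out.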
Di Xiao	b3edd92a10	Split out cargo bench compile check (#753 ) Running `cargo bench --no-run` on every test platform is slow. This PR: - extracts benchmark compilation verification from the Linux and macOS build_and_test jobs into a dedicated `check-bench-compiles` job so it runs in parallel with the cargo test jobs; - skips compiling `git_xet` in release mode, since it contains no benchmarks and takes the longest to compile due to optimized linking; - removes unused clippy component installs from the Windows and macOS toolchain setup. The `check-bench-compiles` job finishes faster than `build_and_test-linux` and `build_and_test-win`, so it introduces no extra wait time.	2026-03-25 22:25:20 -07:00

571 Commits