CI for hf-hub runs `cargo audit`, which reported many issues in
transitive dependencies pulled in through hf-xet. This PR resolves some
of them (not necessarily all).
Main changes:
- Dropped `derivative` and `reqwest-retry`.
- Replaced `bincode` with `postcard`; it is only used in tests.
- Upgraded `rand` usage in xet-core.
- Added a `cargo audit` CI step that ignores some advisories we can't easily fix.
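Dropping `derivative` means that any customized `Debug` output has to be written by hand. A minimal sketch of such a manual impl follows. The struct name `AuthConfig` comes from this PR, but its fields and the choice to redact the token are assumptions made for illustration, not the actual definition.

```rust
use std::fmt;

// Hypothetical shape of the config; the real fields may differ.
struct AuthConfig {
    token: String,
    token_expiration: u64,
}

// Hand-written replacement for a customized derive: print every field
// except the secret, which is replaced by a fixed placeholder.
impl fmt::Debug for AuthConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("AuthConfig")
            .field("token", &"<redacted>")
            .field("token_expiration", &self.token_expiration)
            .finish()
    }
}
```

Formatting a value with `{:?}` then shows the placeholder instead of the token contents.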
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Medium risk because it removes `reqwest-retry`/`derivative` and
replaces part of the retry classification logic with an in-house
equivalent, which could subtly change HTTP retry behavior; the remaining
changes are dependency/version bumps and test-only serialization swaps.
>
> **Overview**
> Adds a new CI `cargo audit` job and introduces `.cargo/audit.toml` to
ignore a small set of **dev-only** RustSec advisories with documented
rationale.
>
> Reduces audit surface by dropping `derivative` (manual `Debug` impl
for `AuthConfig`) and removing `reqwest-retry`, replacing its
status-code classification with a local `Retryable` enum +
`default_on_request_success` helper in `RetryWrapper`.
>
> Updates workspace deps (notably `rand` to `0.10` and `rand_distr` to
`0.6`) and adjusts call sites to the newer `rand` APIs (`RngExt`
imports, minor test/bench tweaks). Test-only binary serialization
switches from `bincode` to `postcard` (and updates affected tests), with
corresponding lockfile updates across crates.
<!-- /CURSOR_SUMMARY -->
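The in-house replacement for the `reqwest-retry` status classification can be pictured as below. This is a sketch under assumptions: the name `Retryable` is taken from the summary above, but `classify_status` is a hypothetical helper that takes a bare `u16` status code, and the set of retryable codes (408, 429, 5xx) is assumed rather than copied from `RetryWrapper`.

```rust
/// Outcome of classifying a completed HTTP request for retry purposes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Retryable {
    /// The failure may go away on its own; retrying is reasonable.
    Transient,
    /// Retrying will not help.
    Fatal,
}

/// Hypothetical classifier: `None` means the request succeeded and
/// no retry decision is needed.
fn classify_status(status: u16) -> Option<Retryable> {
    match status {
        200..=299 => None,
        // Request timeout and rate limiting are assumed to be retryable.
        408 | 429 => Some(Retryable::Transient),
        // Server-side errors are assumed to be retryable.
        500..=599 => Some(Retryable::Transient),
        // Everything else (other 4xx, unexpected 1xx/3xx) is treated as fatal.
        _ => Some(Retryable::Fatal),
    }
}
```

Keeping the classification in a small pure function like this makes it easy to unit test that retry behavior did not change when the dependency was removed.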
## Summary
Adds two workflows to automate the crates.io release process, and
refactors the CI WASM job into a reusable composite action.
**Release process** (two separate manual steps):
1. **`bump-crates-version.yml`** (triggered via `workflow_dispatch` with
a `version` input): updates version fields in `Cargo.toml` files, runs
`cargo build` + `cargo test` to validate, builds the `hf-xet` Python
wheel and WASM targets to update related `Cargo.lock` files, then opens
a PR (e.g. `crates-release/1.6.0`). The workflow terminates after PR
creation.
2. **`crates-release.yml`** (triggered manually via `workflow_dispatch`
after the version-bump PR is merged): checks out `main`, authenticates
to crates.io via OIDC Trusted Publishing, and publishes crates in
dependency order with index-propagation delays: `xet-runtime` →
`xet-core-structures` → `xet-client` → `xet-data` → `hf-xet`. Requires
manual approval via the `crates-release` GitHub environment.
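Publishing in dependency order amounts to a topological sort of the workspace crates. The sketch below illustrates the idea only; the workflow itself may simply hard-code the order. The function `publish_order` is hypothetical, and the caller supplies the dependency edges.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns an order in which every crate appears after all of its
/// dependencies, or `None` if there is a cycle or an unknown dependency.
fn publish_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Unpublished dependencies still blocking each crate.
    let mut remaining: BTreeMap<&str, BTreeSet<&str>> = deps
        .iter()
        .map(|(name, d)| (*name, d.iter().copied().collect()))
        .collect();
    let mut order = Vec::new();

    while !remaining.is_empty() {
        // Crates whose dependencies have all been published already.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, blocked_on)| blocked_on.is_empty())
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            return None;
        }
        for name in ready {
            remaining.remove(name);
            for blocked_on in remaining.values_mut() {
                blocked_on.remove(name);
            }
            order.push(name.to_string());
        }
    }
    Some(order)
}
```

If each crate is assumed to depend on the one before it in the chain above (a simplification of the real graph), the function returns the same order the workflow uses.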
**Design notes:**
- Split into two workflows to avoid holding a runner while waiting for
the PR to be reviewed and merged
- Version bump is committed to a PR so the repo always reflects the
published version
- Uses OIDC Trusted Publishing (`rust-lang/crates-io-auth-action`) — no
long-lived secrets required. See
https://crates.io/docs/trusted-publishing
**CI refactor:**
- Extracts the nightly Rust/WASM toolchain setup and `hf_xet*_wasm`
builds into a reusable composite action (`.github/actions/build-wasm`)
- The composite action saves and restores the caller's default toolchain
around the nightly build, so callers are not affected
- Adds post-build porcelain checks in CI to fail if either WASM
`Cargo.lock` has uncommitted changes after building
## One-time manual setup required
Before this workflow can run successfully, complete the following:
### GitHub
- [x] Create a GitHub Environment named **`crates-release`**: repo
Settings → Environments → New environment
- [x] Add **required reviewers** to the `crates-release` environment —
this is the manual approval gate before the `publish` job runs
### crates.io — Trusted Publishing
Each crate must have been published manually at least once before
Trusted Publishing can be configured. For each crate, go to its Settings
page on crates.io → **Trusted Publishing** → **Add**, and fill in:
| Field | Value |
|---|---|
| Owner | `huggingface` |
| Repository | `xet-core` |
| Workflow name | `crates-release.yml` |
| Environment | `crates-release` |
- [x] Configure Trusted Publishing for **`xet-runtime`**
- [x] Configure Trusted Publishing for **`xet-core-structures`**
- [x] Configure Trusted Publishing for **`xet-client`**
- [x] Configure Trusted Publishing for **`xet-data`**
- [x] Configure Trusted Publishing for **`hf-xet`**
## Summary
- **Remove unused dependencies**: warp (zero imports), paste (zero
invocations), tower-service (zero imports), and heed misplacement in
xet_core_structures
- **Move mockall to dev-dependencies** in xet_client by gating
`#[automock]` with `#[cfg_attr(test, automock)]`
- **Feature-gate simulation module** behind `simulation` cargo feature
in xet_client, making axum, heed, humantime, futures-util,
human-bandwidth, and tower-http optional
- **Replace duration-str with humantime** (~2 deps vs ~78 transitive
deps) across xet_runtime, xet_client simulation, and simulation crate
## Impact
| Metric | Before | After | Change |
|---|---|---|---|
| hf-xet production deps | 371 | 321 | **-50** |
| Workspace total | 575 | 569 | -6 |
## Test plan
- [x] `cargo check --workspace` passes
- [x] `cargo check -p hf-xet` passes (without simulation feature — key
validation)
- [x] `cargo test --workspace` — all tests pass (4 pre-existing auth
test failures in git_xet unrelated to this PR)
- [x] `cargo tree -p hf-xet -e normal --prefix none | sort -u | wc -l`
confirms 321 deps
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Medium risk because it changes dependency graph and Cargo feature
gating (notably `xet-client` simulation modules and CI test features),
which can affect build/test behavior across targets despite minimal
runtime logic changes.
>
> **Overview**
> Reduces workspace dependency surface by removing `duration-str`
(replaced with `humantime`) and trimming other transitive-heavy crates;
updates lockfiles accordingly across the workspace, `hf_xet`, and WASM
builds.
>
> Introduces/propagates a `simulation` Cargo feature: `xet-client`’s
simulation server-related deps become optional and are only
compiled/exported when `feature = "simulation"` is enabled; `git_xet`
adds a `simulation` feature that forwards to dependent crates, and CI
now runs tests with `strict simulation git-xet-for-integration-test`.
>
> Minor repo hygiene updates include ignoring `.claude/` in `.gitignore`
and wiring the `simulation` crate to depend on `xet-client` with
`features = ["simulation"]` (plus swapping its duration parsing helper
to `humantime`).
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Running `cargo bench --no-run` on every test platform is slow. This PR:
- extracts benchmark compilation verification from the Linux and macOS
`build_and_test` jobs into a dedicated `check-bench-compiles` job so that
it runs in parallel with the `cargo test` jobs;
- skips compiling `git_xet` in release mode, since it contains no
benchmarks and takes the longest to compile due to optimized linking;
- removes unused clippy component installs from the Windows and macOS
toolchain setup.
The `check-bench-compiles` job finishes faster than
`build_and_test-linux` and `build_and_test-win`, so it does not introduce
extra wait time.
The last repo restructuring did not update several benchmarks, which are
not compiled by default as part of `cargo build`. This PR fixes those
compilation errors and warnings, and adds `cargo bench --no-run` to CI,
which checks that the benchmarks compile without actually running them.
This PR is a massive rearrangement of the code base into 5 packages
intended for release on crates.io. The directories and corresponding
packages are:
1. xet_runtime/ — compiles into the xet-runtime package. Contains the
runtime, config, and logging management.
2. xet_core_structures/ — compiles into the xet-core-structures package.
Contains core data structures for hashing, shards, and xorbs as well as
internal data structures that depend on these.
3. xet_client/ — compiles into the xet-client package, contains client
code for remotely connecting to the Hugging Face servers.
4. xet_data/ — compiles into the xet-data package, contains the data
processing pipeline: chunking/deduplication, file reconstruction,
clean/smudge operations, and progress tracking.
5. xet_pkg/ — compiles into the hf-xet package, provides the top-level
session-based API for file upload and download with user-facing error
categorization. This is the primary package downstream dependencies
would use. This also contains a single summary error type, XetError,
that translates cleanly into python error types.
In addition, the other tools are:
- git_xet/ — the git_xet CLI binary crate (location preserved).
- hf_xet/ -- the hf_xet python package (location preserved).
- simulation/ — the simulation crate for upload scenario benchmarking.
- wasm/ -- the wasm objects.
The full description — and information for an AI agent to use to update
downstream dependencies — is at
api_changes/update_260309_package_restructure.md.
Summary of moves:
- xet_runtime: became xet_runtime::core inside xet_runtime/.
- utils: became xet_runtime::utils inside xet_runtime/.
- xet_config: became xet_runtime::config inside xet_runtime/.
- xet_logging: became xet_runtime::logging inside xet_runtime/.
- error_printer: became xet_runtime::error_printer inside xet_runtime/.
- file_utils: became xet_runtime::file_utils inside xet_runtime/.
- merklehash: became xet_core_structures::merklehash inside
xet_core_structures/.
- mdb_shard: became xet_core_structures::metadata_shard inside
xet_core_structures/.
- xorb_object: became xet_core_structures::xorb_object inside
xet_core_structures/.
- cas_client: became xet_client::cas_client inside xet_client/.
- hub_client: became xet_client::hub_client inside xet_client/.
- cas_types: became xet_client::cas_types inside xet_client/.
- chunk_cache: became xet_client::chunk_cache inside xet_client/.
- data: became xet_data::processing inside xet_data/.
- deduplication: became xet_data::deduplication inside xet_data/.
- file_reconstruction: became xet_data::file_reconstruction inside
xet_data/.
- progress_tracking: became xet_data::progress_tracking inside
xet_data/.
- xet_session: became xet::xet_session inside xet_pkg/.
- Wasm packages (hf_xet_wasm, hf_xet_thin_wasm): moved from top-level
into wasm/; internal imports updated, public APIs unchanged.
Adds support for setting an optional `request_header` map on the hf_xet
upload and download API calls. This map is augmented with the hf_xet
user agent string and is passed along with the requests to xetcas.
This PR also adds unit tests for the map-merging behavior to
`hf_xet/lib.rs`, and makes them runnable with `cargo test` and in a
GitHub Actions CI step.
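The merging behavior can be sketched roughly as follows. This is an illustration under assumptions, not the code in `hf_xet/lib.rs`: the function name `merge_request_headers`, the `User-Agent` key handling, and the choice to append to a caller-supplied user agent with `"; "` are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical merge: start from the caller's optional header map and make
/// sure the library's user agent string ends up in the `User-Agent` entry.
fn merge_request_headers(
    request_headers: Option<HashMap<String, String>>,
    user_agent: &str,
) -> HashMap<String, String> {
    let mut headers = request_headers.unwrap_or_default();
    headers
        .entry("User-Agent".to_string())
        // Assumed policy: keep the caller's value and append ours.
        .and_modify(|existing| {
            existing.push_str("; ");
            existing.push_str(user_agent);
        })
        .or_insert_with(|| user_agent.to_string());
    headers
}
```

Note that this sketch treats header names as case-sensitive map keys; a real implementation would likely need to normalize them, since HTTP header names are case-insensitive.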
## Summary
This PR upgrades GitHub Actions to their latest versions for Node.js 24
compatibility and security updates.
## Changes
| Action | Old Version(s) | New Version | Files |
|--------|---------------|-------------|-------|
| actions/attest-build-provenance | v1 | v3 | release.yml |
## Why these changes?
- Keeps actions up to date with latest stable releases
- Updated actions include security fixes and new features
## Testing
These changes only update action versions and don't modify workflow
logic.
---------
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
- Remove dependencies from Cargo.toml files that are not used.
- Move dependencies directly referencing crates.io from crate level
Cargo.toml to the workspace Cargo.toml.
- Fix using RemoteClient in WASM: AdaptiveConcurrencyController uses
`tokio::time::Instant` which wraps `std::time::Instant` and is not
available in WASM.
- Add [cargo-machete](https://github.com/bnjbvr/cargo-machete) to CI to
check unused dependencies.
No functionality change.
This PR builds on
https://github.com/huggingface/xet-core/pull/565 and adds an
integration test that checks access to "ssh" and "sh" on Windows through
the "git" (-> "git-lfs") -> "git-xet" call chain.
Among the ssh variants, access to programs like "plink", "putty",
"tortoiseplink" or "simple" should be provided through the env var
`$GIT_SSH_COMMAND` or `$GIT_SSH`, or through the git config entry
`core.sshCommand`. Direct access to the most commonly used utility, "ssh",
and indirect access to "ssh" via "sh -c" on Windows are provided by the
"git" (-> "git-lfs") -> "git-xet" call chain; see
git_xet/tests/test_ssh.rs for details.
fix XET-681
XET protocol specification initial draft
- documentation of core procedures required for file uploads and
downloads
- format specifications for shards and xorbs
- Upgrade the Rust edition and rustc version to bring in some nice
features, e.g. let chains instead of nested `if` blocks.
- Fix clippy and formatting issues introduced by the upgrade.
- Fix a bug identified by the new rustc:
6cb0a7fb4e/xet_runtime/src/runtime.rs (L195)
```
#[cfg(not(target_family = "wasm"))]
{
    // A new multithreaded runtime with a capped number of threads
    TokioRuntimeBuilder::new_multi_thread().worker_threads(get_num_tokio_worker_threads())
}
```
Here the closing curly bracket drops the temporary builder while a `&mut
Self` borrowed from the dropped value is returned. (This may be due to a
difference between compiler versions in how they treat the scope of the
`{...}` block in `#[cfg(...)] {...}`.)
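One way to avoid returning a reference into a dropped temporary is to bind the builder to a local variable before calling setters that return `&mut Self`. The sketch below uses a hypothetical stand-in `Builder` type rather than tokio's runtime builder so that it is self-contained; it is not necessarily the fix applied in this PR.

```rust
// Hypothetical stand-in for a builder whose setters return `&mut Self`.
#[derive(Debug, Default)]
struct Builder {
    worker_threads: Option<usize>,
}

impl Builder {
    fn new_multi_thread() -> Self {
        Builder::default()
    }

    fn worker_threads(&mut self, n: usize) -> &mut Self {
        self.worker_threads = Some(n);
        self
    }

    // Stand-in for `build()`: reports the configured thread count.
    fn build(&mut self) -> usize {
        self.worker_threads.unwrap_or(1)
    }
}

fn configured_threads(cap: usize) -> usize {
    // The builder is owned by a named local, so it outlives the setter call
    // and no reference to a temporary escapes an inner block.
    let mut builder = Builder::new_multi_thread();
    builder.worker_threads(cap);
    builder.build()
}
```

The key point is that `builder` lives until the end of `configured_threads`, so the `&mut Self` returned by the setter never outlives its owner.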
This PR builds a Git integration called `git-xet` that enables users to
upload files using the Xet protocol as part of a standard git push.
This integration builds on the Git LFS custom transfer adapter protocol,
the same mechanism we now use to handle Git LFS uploads for files larger
than 5 GB through multipart PUT.
To enable uploads to Xet, users run `git-xet install`, which writes the
following configuration to the Git config file at a selected scope
[`--system`, `--global` (default), or `--local`]:
```
[lfs "customtransfer.xet"]
path = git-xet
args = transfer
concurrent = true
```
This setup registers a new transfer adapter named xet, allowing Git to
delegate LFS file transfers to the git-xet binary when applicable.
On the server side, support is rolled out in two stages:
Stage 1 (Upload): The Git LFS batch API for the "upload" operation is
updated.
- If a repo is Xet-enabled but the user has not run `git-xet install`,
moon-landing rejects the request when the user runs `git push` and
returns instructions to install git-xet.
- If a repo is Xet-enabled and the user has git-xet configured correctly,
moon-landing accepts the request and replies with the CAS server URL and
an access token, which git-xet uses to upload files to Xet.
- If a repo is NOT Xet-enabled, the upload goes through the LFS path.
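The three cases above form a small decision table, sketched here in Rust. The type and function names are hypothetical, and the server is not implemented this way as far as this description states; the sketch only restates the rules.

```rust
/// What the batch API does with an "upload" request (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
enum UploadDecision {
    /// Reject and tell the user to run `git-xet install`.
    RejectWithInstallHint,
    /// Accept; the real response would carry the CAS URL and access token.
    UseXet,
    /// Fall back to the regular LFS upload path.
    UseLfs,
}

fn decide_upload(repo_xet_enabled: bool, client_offers_xet: bool) -> UploadDecision {
    match (repo_xet_enabled, client_offers_xet) {
        (true, false) => UploadDecision::RejectWithInstallHint,
        (true, true) => UploadDecision::UseXet,
        // A repo without Xet always uses LFS, whatever the client offers.
        (false, _) => UploadDecision::UseLfs,
    }
}
```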
Updates the paths used by the clients to the latest CAS paths defined in
the spec.
All paths now use plural nouns, and shard upload no longer uses the hash;
accordingly, the prefix and hash parameters are removed from the client
trait's `upload_shard` function.
This PR adds an explicit lint command on the hf_xet directory. This is
necessary because it is excluded from the workspace. Other excluded
directories aren't touched very often and are less important for now.
Re-adds a thin wasm crate for JS client development.
The build is checked in the CI job `build_and_test-wasm`.
For now, the crate only includes a wrapper over a chunker and a function
to compute a xorb hash.
This implements uploading through Xet protocol in WASM environment, and
makes necessary changes to make dependent crates WASM compatible.
1. Uploading through Xet protocol is done in hf_xet_wasm crate;
2. Separate Cas Client trait definitions into upload and download
functionality groups and disable download for WASM;
3. Disable Cas Client request retry in the WASM environment, which isn't
critical for a POC (until we have a retry strategy that doesn't depend
on time);
4. Disable async CasObject deserialization;
5. Enable in-memory global dedup;
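The trait split in item 2 can be pictured as two traits, with the download side compiled out on WASM targets. All names below (`UploadClient`, `DownloadClient`, `InMemoryClient`, and the method signatures) are hypothetical and much simpler than the real client traits; this is a sketch of the gating pattern only.

```rust
use std::collections::HashMap;

/// Upload-side operations, available on every target.
trait UploadClient {
    /// Stores the data under `hash` and returns the number of bytes stored.
    fn upload_xorb(&mut self, hash: &str, data: Vec<u8>) -> usize;
}

/// Download-side operations, not compiled for WASM targets.
#[cfg(not(target_family = "wasm"))]
trait DownloadClient {
    fn get_xorb(&self, hash: &str) -> Option<&[u8]>;
}

/// Toy in-memory implementation used to exercise both traits.
#[derive(Default)]
struct InMemoryClient {
    xorbs: HashMap<String, Vec<u8>>,
}

impl UploadClient for InMemoryClient {
    fn upload_xorb(&mut self, hash: &str, data: Vec<u8>) -> usize {
        let len = data.len();
        self.xorbs.insert(hash.to_string(), data);
        len
    }
}

#[cfg(not(target_family = "wasm"))]
impl DownloadClient for InMemoryClient {
    fn get_xorb(&self, hash: &str) -> Option<&[u8]> {
        self.xorbs.get(hash).map(|v| v.as_slice())
    }
}
```

With this layout, code that only uploads depends on `UploadClient` and builds for WASM, while anything that needs `DownloadClient` is limited to native targets.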
---------
Co-authored-by: Assaf Vayner <assaf@huggingface.co>
hf_xet/Cargo.lock keeps going out of date, likely because people do not
always build hf_xet before pushing to the repo. This PR enforces that
hf_xet/Cargo.lock and the root Cargo.lock are up to date; a CI job fails
if they are not.
* Adds a job to run tests on Windows in addition to Linux.
* Fixes linting for Windows builds.
* Fixes the LocalClient, which tries to set files as read-only; during
tests this breaks on Windows because file deletion behaves differently
than on Linux.
* Identified an issue with the disk cache deleting items if there are
simultaneous `put`s to the cache for the same key and range. This is fine
on Unix, but on Windows it causes errors (again, due to differences in
file deletion behavior). This change mitigates the issue, allowing
huggingface_hub tests to pass on Windows, but it raises another issue:
we need to vet our filesystem deletes (e.g. cache eviction) for
correctness on Windows.