CI for hf-hub runs `cargo audit`, which flagged many issues in hf-xet's
transitive dependencies. This PR attempts to solve some of them (not
necessarily all of them).
Main changes:
- dropped `derivative` and `reqwest-retry`
- replaced `bincode` with `postcard` (only used in testing)
- upgraded xet-core's `rand` usage
- added a `cargo audit` CI step that ignores some advisories we can't easily fix
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Medium risk because it removes `reqwest-retry`/`derivative` and
replaces part of the retry classification logic with an in-house
equivalent, which could subtly change HTTP retry behavior; the remaining
changes are dependency/version bumps and test-only serialization swaps.
>
> **Overview**
> Adds a new CI `cargo audit` job and introduces `.cargo/audit.toml` to
ignore a small set of **dev-only** RustSec advisories with documented
rationale.
>
> Reduces audit surface by dropping `derivative` (manual `Debug` impl
for `AuthConfig`) and removing `reqwest-retry`, replacing its
status-code classification with a local `Retryable` enum +
`default_on_request_success` helper in `RetryWrapper`.
>
> Updates workspace deps (notably `rand` to `0.10` and `rand_distr` to
`0.6`) and adjusts call sites to the newer `rand` APIs (`RngExt`
imports, minor test/bench tweaks). Test-only binary serialization
switches from `bincode` to `postcard` (and updates affected tests), with
corresponding lockfile updates across crates.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
26377f4a1c. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
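For reference, the status-code classification that `reqwest-retry` performed, and that the local `Retryable` enum plus `default_on_request_success` helper now take over, can be sketched without any HTTP crate. This is a minimal sketch that assumes the local helper keeps `reqwest-retry`'s default policy as we understand it (5xx, 408, and 429 are transient; other 4xx are fatal; 2xx needs no decision). It takes a raw `u16` instead of a response, and the function name `classify_status` is hypothetical.

```rust
/// Whether a failed request is worth retrying (same two variants as
/// `reqwest_retry::Retryable`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Retryable {
    /// Temporary failure: a retry may succeed.
    Transient,
    /// Permanent failure: retrying will not help.
    Fatal,
}

/// Hypothetical helper: classify the status of a response that was received.
/// Returns `None` for successes, where no retry decision is needed.
fn classify_status(status: u16) -> Option<Retryable> {
    const REQUEST_TIMEOUT: u16 = 408;
    const TOO_MANY_REQUESTS: u16 = 429;
    match status {
        // 5xx: server-side problems are usually temporary.
        500..=599 => Some(Retryable::Transient),
        // Client-range codes that still warrant a retry.
        REQUEST_TIMEOUT | TOO_MANY_REQUESTS => Some(Retryable::Transient),
        // Any other 4xx means the request itself is wrong.
        400..=499 => Some(Retryable::Fatal),
        // 2xx: nothing to retry.
        200..=299 => None,
        // Remaining 1xx/3xx codes are treated as fatal.
        _ => Some(Retryable::Fatal),
    }
}

fn main() {
    for code in [200, 404, 408, 429, 503] {
        println!("{code}: {:?}", classify_status(code));
    }
}
```

Keeping the mapping a pure function of the status code makes it testable independently of the HTTP client and middleware.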
## Summary
- Bump workspace version from 1.5.0 to 1.5.1
- Update all internal dependency version references to match
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Low Risk**
> Low risk version-only bump across workspace manifests and lockfiles
with no code/behavior changes in the diff.
>
> **Overview**
> Bumps the workspace package version from `1.5.0` to `1.5.1` and aligns
internal crate dependency version pins (`xet-runtime`,
`xet-core-structures`, `xet-client`, `xet-data`, `hf-xet`) to match.
>
> Updates lockfiles (`Cargo.lock` plus `hf_xet` and wasm lockfiles) so
published/embedded artifacts resolve to the `1.5.1` crate set (including
bringing wasm lockfiles up to `1.5.1`).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
e8563700a0. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## Summary
- Remove the `prometheus` crate dependency from the workspace and
`xet_data`
- Delete `prometheus_metrics.rs` which defined 3 IntCounter metrics (CAS
bytes produced, bytes cleaned, bytes smudged)
- Remove metric increment calls from `file_upload_session.rs` and
`file_download_session.rs`
- Fix Windows CI flake: redb "Database already open" error in
`test_single_large`

These metrics were collected but never exposed via any HTTP endpoint or
text encoder, making them effectively dead code.
## Test plan
- [x] `cargo +nightly fmt` — clean
- [x] `cargo clippy --all-targets` — no new warnings
- [x] `cargo test -p xet-data` — 17/17 pass
- [x] `cargo test -p xet-data --features simulation --test
test_clean_smudge` — 14/14 pass (including `test_single_large`)
- [x] WASM builds (`hf_xet_wasm`, `hf_xet_thin_wasm`) — both succeed
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Low Risk**
> Low risk: this removes unused Prometheus metrics plumbing and related
dependencies without changing the core upload/download logic. Main risk
is loss of any downstream reliance on these counters at build time
(e.g., feature flags or imports).
>
> **Overview**
> Removes the `prometheus` dependency from the workspace and `xet_data`,
and updates lockfiles accordingly (including WASM-related lockfiles).
>
> Deletes `xet_data`’s `prometheus_metrics` module and strips the
associated counter increments from `FileUploadSession` and
`FileDownloadSession`, leaving the data processing behavior unchanged
aside from no longer recording these metrics.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
c6c866b7ca. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
Currently, simulation global dedup shard queries return full shard bytes
with no configurable shard footer expiration, and simulation control
knobs are split between partially implemented paths. This PR adds global
dedup shard expiration control to simulation clients and servers, and
extends `/simulation/set_config` to cover shard expiration, max range
splitting, V2 reconstruction disabling, API delay, and URL expiration in
one path. This enables rapid simulation of the GC paths by setting the
global dedup expiration to a sub-epoch value.
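One way to picture a single `/simulation/set_config` path is a request in which every knob is optional and only the fields present are applied to the running simulation state. The sketch below assumes those partial-update semantics; `SetConfigRequest`, `SimulationKnobs`, and their fields are hypothetical names, not the actual request schema (which would normally be deserialized from JSON rather than built by hand).

```rust
use std::time::Duration;

/// Hypothetical `/simulation/set_config` body: every knob is optional.
#[derive(Debug, Default)]
struct SetConfigRequest {
    global_dedup_shard_expiration: Option<Duration>,
    max_ranges_per_fetch: Option<usize>,
    disable_v2_reconstruction: Option<bool>,
    api_delay: Option<Duration>,
    url_expiration: Option<Duration>,
}

/// Hypothetical server-side state controlled by those knobs.
#[derive(Debug, Default, Clone, PartialEq)]
struct SimulationKnobs {
    global_dedup_shard_expiration: Option<Duration>, // None: return full shard bytes
    max_ranges_per_fetch: Option<usize>,             // None: no range splitting
    disable_v2_reconstruction: bool,
    api_delay: Duration,
    url_expiration: Option<Duration>,
}

impl SimulationKnobs {
    /// Apply only the fields present in the request; absent fields keep their value.
    fn apply(&mut self, req: &SetConfigRequest) {
        if let Some(exp) = req.global_dedup_shard_expiration {
            self.global_dedup_shard_expiration = Some(exp);
        }
        if let Some(n) = req.max_ranges_per_fetch {
            self.max_ranges_per_fetch = Some(n);
        }
        if let Some(disable) = req.disable_v2_reconstruction {
            self.disable_v2_reconstruction = disable;
        }
        if let Some(delay) = req.api_delay {
            self.api_delay = delay;
        }
        if let Some(exp) = req.url_expiration {
            self.url_expiration = Some(exp);
        }
    }
}

fn main() {
    let mut knobs = SimulationKnobs::default();
    // A very short shard expiration, as used to exercise GC paths quickly.
    knobs.apply(&SetConfigRequest {
        global_dedup_shard_expiration: Some(Duration::from_secs(1)),
        ..Default::default()
    });
    println!("{knobs:?}");
}
```

This shape lets a caller flip one knob without resending the others; one limitation is that an `Option` knob cannot be cleared back to `None`, so a real API may need an explicit reset.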
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Touches simulation client/server APIs and shard serialization behavior
(including new trait methods and HTTP knobs), so downstream implementors
and tests may break if not updated. Changes are scoped to simulation/GC
tooling paths but affect how global-dedup shard bytes are produced and
validated.
>
> **Overview**
> Adds a new simulation control to set **global-dedup shard
expiration**: `DirectAccessClient::set_global_dedup_shard_expiration`
now makes `query_for_global_dedup_shard` optionally return *minimal*
shard bytes (file section stripped) with `shard_key_expiry = now +
expiration` (sub-second durations round up).
>
> Extends `MDBMinimalShard` serialization with
`serialize_xorb_subset_with_expiry` to write an optional
`shard_key_expiry` footer, and updates `LocalClient`/`MemoryClient` to
use it when expiration is enabled.
>
> Unifies and expands runtime simulation knobs under
`/simulation/set_config` (global dedup expiration, max ranges per fetch,
disable V2 reconstruction, API delay, URL expiration) and updates
`SimulationControlClient` to apply them via a retried async POST. Also
moves integrity/reachability checks to `DeletionControlableClient`, adds
`verify_all_reachable`, and wires new `/simulation/verify_all_reachable`
with 501 behavior when no deletion client is configured.
>
> Separately, introduces **simulation-only xorb cut thresholds**
(`XORB_CUT_THRESHOLD_*`) driven by new `xet_runtime` xorb config
overrides, and updates upload/dedup code paths to use these thresholds.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
42bd9c3f4f. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
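The expiry arithmetic summarized above (`shard_key_expiry = now + expiration`, with sub-second durations rounded up) can be sketched with the standard library alone. This assumes the expiry is stored as whole Unix seconds and reads "round up" as a ceiling on any fractional second; `ceil_to_secs` and `shard_key_expiry` are hypothetical helpers, not the functions in `LocalClient`/`MemoryClient`.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Round a duration up to whole seconds: 0.25 s -> 1 s, 1.5 s -> 2 s, 5 s -> 5 s.
fn ceil_to_secs(d: Duration) -> u64 {
    if d.subsec_nanos() > 0 {
        d.as_secs().saturating_add(1)
    } else {
        d.as_secs()
    }
}

/// Hypothetical helper: expiry timestamp (Unix seconds) for a minimal shard
/// produced at `now_unix_secs` with the configured `expiration`.
fn shard_key_expiry(now_unix_secs: u64, expiration: Duration) -> u64 {
    // Rounding up keeps a short, nonzero expiration from collapsing to `now`.
    now_unix_secs.saturating_add(ceil_to_secs(expiration))
}

fn main() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs();
    let expiry = shard_key_expiry(now, Duration::from_millis(250));
    println!("now = {now}, shard_key_expiry = {expiry}");
}
```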
This is an optional change. `heed` pulls in a lot of dependencies and
uses LMDB, which may require extra compilation/linking steps in tests.
Since we use it for only a small subset of operations in testing, I
thought we could try an even thinner, Rust-native dependency instead:
that's what `redb` is.
## Summary
- Replace `heed` (C LMDB bindings) with `redb` (pure Rust embedded KV
store) in `LocalClient`
- Removes C dependency, `unsafe` block, Windows retry workaround, and
custom `Drop` impl
- Introduces `RedbHash` newtype wrapper for `MerkleHash` to satisfy
orphan rules on redb's `Key`/`Value` traits
- Net reduction of ~130 lines; all 147 existing tests pass
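The orphan rule forbids implementing a foreign trait (redb's `Key`/`Value`) for a foreign type (`MerkleHash`), so the usual fix is a local newtype. The sketch below shows only the newtype and the fixed-width byte encoding such trait impls would delegate to. `MerkleHash` here is a hypothetical 32-byte stand-in for the real type, and the actual `redb` trait impls are omitted because they depend on the redb crate.

```rust
/// Byte width of the stand-in hash.
const HASH_LEN: usize = 32;

/// Hypothetical stand-in for the foreign `MerkleHash` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct MerkleHash([u8; HASH_LEN]);

/// Local newtype: because `RedbHash` is defined in this crate, foreign
/// traits (such as a KV store's key/value traits) may be implemented for it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct RedbHash(pub MerkleHash);

impl RedbHash {
    /// Encode as raw bytes; byte order is preserved, so the derived ordering
    /// matches lexicographic comparison of the encoded keys.
    pub fn to_bytes(&self) -> [u8; HASH_LEN] {
        (self.0).0
    }

    /// Decode bytes written by `to_bytes`, rejecting any other length.
    pub fn from_bytes(data: &[u8]) -> Option<Self> {
        if data.len() != HASH_LEN {
            return None;
        }
        let mut arr = [0u8; HASH_LEN];
        arr.copy_from_slice(data);
        Some(RedbHash(MerkleHash(arr)))
    }
}

impl From<MerkleHash> for RedbHash {
    fn from(h: MerkleHash) -> Self {
        RedbHash(h)
    }
}

impl From<RedbHash> for MerkleHash {
    fn from(h: RedbHash) -> Self {
        h.0
    }
}

fn main() {
    let key = RedbHash::from(MerkleHash([7u8; HASH_LEN]));
    let bytes = key.to_bytes();
    println!("round-trip ok: {}", RedbHash::from_bytes(&bytes) == Some(key));
}
```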
## Test plan
- [x] `cargo check -p xet-client --features simulation` — clean
- [x] `cargo test -p xet-client --features simulation` — 147 passed, 0
failed
- [x] `cargo clippy -p xet-client --features simulation` — clean
- [x] `cargo +nightly fmt` — clean
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Swaps the embedded KV store used for shard dedup/deletion metadata in
the local CAS simulation, which can affect test behavior and on-disk
state/locking semantics (especially with concurrent clients). Scope is
contained to simulation/test code and dependency graph changes.
>
> **Overview**
> Switches `LocalClient`’s disk-backed global-dedup and file deletion
status storage from `heed`/LMDB to `redb`, including new `RedbHash`
serialization, `TableDefinition`s, and updated read/write transaction
flows.
>
> Adds a small global database-handle cache to avoid `redb`
exclusive-lock conflicts across multiple `LocalClient` instances, and
removes the prior LMDB-specific open/retry logic and custom `Drop` close
path. Workspace dependencies/lockfiles are updated to drop
`heed`/LMDB-related crates and add `redb`, and `.gitignore` now ignores
`.worktrees/`.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
02d39864d9. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
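The handle cache is needed because redb holds an exclusive lock on its file, so a second open of the same path fails with "Database already open". A minimal sketch of a process-wide cache keyed by path follows; `Database` and `open_database` are hypothetical stand-ins for the redb handle and its open call, and `get_or_open` is not a name from the PR.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for an exclusively locked database handle.
#[derive(Debug)]
pub struct Database {
    pub path: PathBuf,
}

/// Stand-in for the real open call, which would take the file lock.
fn open_database(path: &Path) -> std::io::Result<Database> {
    Ok(Database { path: path.to_path_buf() })
}

/// Process-wide cache: one shared handle per database path.
fn db_cache() -> &'static Mutex<HashMap<PathBuf, Arc<Database>>> {
    static CACHE: OnceLock<Mutex<HashMap<PathBuf, Arc<Database>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Return the cached handle for `path`, opening it only on first use.
/// The lock is held across the open so two callers cannot both open it.
pub fn get_or_open(path: &Path) -> std::io::Result<Arc<Database>> {
    let mut cache = db_cache().lock().unwrap_or_else(|e| e.into_inner());
    if let Some(db) = cache.get(path) {
        return Ok(Arc::clone(db));
    }
    let db = Arc::new(open_database(path)?);
    cache.insert(path.to_path_buf(), Arc::clone(&db));
    Ok(db)
}

fn main() -> std::io::Result<()> {
    let a = get_or_open(Path::new("/tmp/sim/global_dedup.redb"))?;
    let b = get_or_open(Path::new("/tmp/sim/global_dedup.redb"))?;
    println!("shared handle: {}", Arc::ptr_eq(&a, &b));
    Ok(())
}
```

This version never evicts entries, so handles live for the rest of the process, and it does not canonicalize paths; a real cache would likely do the latter so that two spellings of the same directory share a handle.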
This PR makes two significant, breaking API redesigns:
1. Auth tokens move from session-level (shared by all operations) to
per-operation level (per `UploadCommit`, `FileDownloadGroup`, and
`DownloadStreamGroup`). This enables uploads and downloads from the same
session to carry different access-level tokens — a sensible design for
HF's write-vs-read token split.
2. Instead of letting users provide a callback to refresh tokens, the
new API lets users provide a token refresh URL plus the access credential
in an HTTP header map.
### Why
1. CAS JWTs are short-lived, but `XetSession` is intended to be held for a
long time. It therefore makes more sense to configure CAS auth at the
operation level (`UploadCommit`, `FileDownloadGroup`, or `DownloadStreamGroup`),
which is discarded once the operation is done.
2. CAS JWTs and their refresh URLs differ by access level (write vs.
read) and by operation target (repo and commit). `UploadCommit`,
`FileDownloadGroup`, and `DownloadStreamGroup` each also function as a
single auth group.
3. Providing a URL is easier than writing a callback, and it is safer
across the Python/Rust boundary, since a Python callback invoked from
Rust has to acquire the GIL.

Examples:
```rust
// Upload token (write access)
let mut upload_headers = HeaderMap::new();
upload_headers.insert("Authorization", "Bearer hub-write-token".parse().unwrap());
let commit = session
.new_upload_commit()?
.with_token_info("CAS_WRITE_JWT", 900)
.with_token_refresh_url("https://huggingface.co/api/repos/token/write", upload_headers)
.build_blocking()?;
```
```rust
// File download token (read access)
let mut dl_headers = HeaderMap::new();
dl_headers.insert("Authorization", "Bearer hub-read-token".parse().unwrap());
let group = session
.new_file_download_group()?
.with_token_info("CAS_READ_JWT", 900)
.with_token_refresh_url("https://huggingface.co/api/repos/token/read", dl_headers)
.build_blocking()?;
```
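Conceptually, each auth group owns its own CAS token together with the refresh URL and headers, and refreshes the token shortly before it expires. The sketch below models that bookkeeping with hypothetical types (`TokenInfo`, `AuthGroup`); it assumes expiries are Unix seconds and uses a closure in place of the HTTP request to the refresh URL, so it illustrates the idea rather than the crate's implementation.

```rust
use std::collections::BTreeMap;

/// A CAS access token and its expiry, assumed here to be Unix seconds.
#[derive(Debug, Clone, PartialEq)]
struct TokenInfo {
    token: String,
    expiry_unix_secs: u64,
}

/// Hypothetical per-operation auth state (one per upload commit or download group).
struct AuthGroup {
    token: TokenInfo,
    refresh_url: String,
    refresh_headers: BTreeMap<String, String>,
    /// Refresh this many seconds before expiry so requests never carry a stale token.
    refresh_margin_secs: u64,
}

impl AuthGroup {
    fn needs_refresh(&self, now_unix_secs: u64) -> bool {
        now_unix_secs.saturating_add(self.refresh_margin_secs) >= self.token.expiry_unix_secs
    }

    /// Return a usable token, calling `fetch` only when the current one is near expiry.
    /// `fetch` stands in for an HTTP request to `refresh_url` carrying `refresh_headers`.
    fn current_token<F>(&mut self, now_unix_secs: u64, fetch: F) -> Result<&str, String>
    where
        F: FnOnce(&str, &BTreeMap<String, String>) -> Result<TokenInfo, String>,
    {
        if self.needs_refresh(now_unix_secs) {
            self.token = fetch(self.refresh_url.as_str(), &self.refresh_headers)?;
        }
        Ok(self.token.token.as_str())
    }
}

fn main() {
    let mut headers = BTreeMap::new();
    headers.insert("Authorization".to_string(), "Bearer hub-write-token".to_string());
    let mut upload_auth = AuthGroup {
        token: TokenInfo { token: "CAS_WRITE_JWT".to_string(), expiry_unix_secs: 1_000 },
        refresh_url: "https://huggingface.co/api/repos/token/write".to_string(),
        refresh_headers: headers,
        refresh_margin_secs: 30,
    };
    // At t = 980 the token is inside the 30 s margin, so it is refreshed.
    let token = upload_auth
        .current_token(980, |url, hdrs| {
            println!("refreshing via {url} with {} header(s)", hdrs.len());
            Ok(TokenInfo { token: "CAS_WRITE_JWT_2".to_string(), expiry_unix_secs: 2_000 })
        })
        .expect("refresh succeeds");
    println!("using token {token}");
}
```

Because the state lives in each group, a read group and a write group never overwrite each other's tokens, which is the point of moving auth off `XetSession`.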
Secondary changes include:
- `DirectRefreshRouteTokenRefresher` consolidated into
`xet_client::cas_client::auth`.
- HTTP client module moved from `cas_client` to `xet_client::common` for
shared use between `xet_client::cas_client` and
`xet_client::hub_client`.
- New `DownloadStreamGroup` type (streaming downloads moved off
`XetSession`).
- Fix Session ID type regression: this was fixed once in
https://github.com/huggingface/xet-core/pull/738 but regressed again;
it seems AI agents don't learn.
- HTTP client cache key now incorporates custom headers
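Folding custom headers into the HTTP client cache key means callers with different headers no longer reuse a client built for someone else's headers. A minimal sketch of such a key follows, assuming header names are normalized to lowercase (HTTP header names are case-insensitive) and sorted so insertion order does not matter; `ClientCacheKey` and its fields are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical cache key for a shared HTTP client.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ClientCacheKey {
    /// Base settings the client is built from (e.g. an endpoint).
    base: String,
    /// Custom headers, normalized: lowercase names, sorted by name.
    headers: Vec<(String, String)>,
}

impl ClientCacheKey {
    fn new(base: &str, headers: &[(&str, &str)]) -> Self {
        // BTreeMap sorts by name; a repeated name keeps its last value in this sketch.
        let normalized: BTreeMap<String, String> = headers
            .iter()
            .map(|(k, v)| (k.to_ascii_lowercase(), v.to_string()))
            .collect();
        ClientCacheKey { base: base.to_string(), headers: normalized.into_iter().collect() }
    }
}

fn main() {
    let base = "https://cas.example.com";
    let a = ClientCacheKey::new(base, &[("X-Request-Source", "cli"), ("Authorization", "Bearer t1")]);
    let b = ClientCacheKey::new(base, &[("authorization", "Bearer t1"), ("x-request-source", "cli")]);
    let c = ClientCacheKey::new(base, &[("Authorization", "Bearer t2")]);
    let mut cache: HashMap<ClientCacheKey, &str> = HashMap::new();
    cache.insert(a, "client-1");
    println!("b reuses client-1: {}", cache.contains_key(&b));
    println!("c reuses client-1: {}", cache.contains_key(&c));
}
```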
## Summary
- **Remove unused dependencies**: warp (zero imports), paste (zero
invocations), tower-service (zero imports), and a misplaced heed
dependency in xet_core_structures
- **Move mockall to dev-dependencies** in xet_client by gating
`#[automock]` with `#[cfg_attr(test, automock)]`
- **Feature-gate simulation module** behind `simulation` cargo feature
in xet_client, making axum, heed, humantime, futures-util,
human-bandwidth, and tower-http optional
- **Replace duration-str with humantime** (~2 deps vs ~78 transitive
deps) across xet_runtime, xet_client simulation, and simulation crate
## Impact
| Metric | Before | After | Change |
|---|---|---|---|
| hf-xet production deps | 371 | 321 | **-50** |
| Workspace total | 575 | 569 | -6 |
## Test plan
- [x] `cargo check --workspace` passes
- [x] `cargo check -p hf-xet` passes (without simulation feature — key
validation)
- [x] `cargo test --workspace` — all tests pass (4 pre-existing auth
test failures in git_xet unrelated to this PR)
- [x] `cargo tree -p hf-xet -e normal --prefix none | sort -u | wc -l`
confirms 321 deps
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Medium risk because it changes dependency graph and Cargo feature
gating (notably `xet-client` simulation modules and CI test features),
which can affect build/test behavior across targets despite minimal
runtime logic changes.
>
> **Overview**
> Reduces workspace dependency surface by removing `duration-str`
(replaced with `humantime`) and trimming other transitive-heavy crates;
updates lockfiles accordingly across the workspace, `hf_xet`, and WASM
builds.
>
> Introduces/propagates a `simulation` Cargo feature: `xet-client`’s
simulation server-related deps become optional and are only
compiled/exported when `feature = "simulation"` is enabled; `git_xet`
adds a `simulation` feature that forwards to dependent crates, and CI
now runs tests with `strict simulation git-xet-for-integration-test`.
>
> Minor repo hygiene updates include ignoring `.claude/` in `.gitignore`
and wiring the `simulation` crate to depend on `xet-client` with
`features = ["simulation"]` (plus swapping its duration parsing helper
to `humantime`).
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
6abc194398. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This PR does some housecleaning and removes technical debt around our
use of different error types, unifying them with the Python interface.
- Our client code relied heavily on `anyhow` errors, an artifact of
using them before we switched to `thiserror`. This PR replaces them with
ClientError or other named error types used directly.
- It also removes all the aliases for the old error type names that
existed in the packages before the refactoring, settling on ClientError,
FormatError, DataError, and RuntimeError, with XetError as the publicly
exposed error type.
- Also, currently, xet_session exposes SessionError as an alias of
XetError, which adds an extra public type name without adding behavior.
This PR removes that alias and standardizes the public API/docs onto
XetError directly.
- It also tightens Python-facing error behavior and moves the Python
handling onto the XetError type directly, behind a `python` feature
flag. Using these types, hf_xet now registers XetObjectNotFoundError and
XetAuthenticationError exception classes for the not-found and
authentication cases. These inherit from the existing exception classes,
so existing behavior is preserved.
- In addition, the `From<XetError> for PyErr` mapping routes
timeout/network/auth/not-found categories to more appropriate Python
exception types than a plain RuntimeError.

This is primarily an API-surface cleanup plus error-classification
alignment.
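Replacing `anyhow` with named error types mostly means each fallible function returns a concrete enum, with `From` impls so that `?` still converts lower-level errors and callers can match on variants. The sketch below shows the pattern in plain standard-library Rust; `ClientError` here is a hypothetical two-variant stand-in rather than the crate's real type, for which `thiserror` generates the equivalent impls.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical stand-in for a named client error type.
#[derive(Debug)]
enum ClientError {
    /// The input was structurally invalid.
    InvalidRange(String),
    /// A lower-level parse error, kept as the error source.
    Parse(ParseIntError),
}

impl fmt::Display for ClientError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ClientError::InvalidRange(s) => write!(f, "invalid byte range: {s:?}"),
            ClientError::Parse(e) => write!(f, "failed to parse range bound: {e}"),
        }
    }
}

impl Error for ClientError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ClientError::Parse(e) => Some(e),
            ClientError::InvalidRange(_) => None,
        }
    }
}

/// Lets `?` convert integer-parse failures into `ClientError`.
impl From<ParseIntError> for ClientError {
    fn from(e: ParseIntError) -> Self {
        ClientError::Parse(e)
    }
}

/// Parse "start-end" (both bounds inclusive) into a range.
fn parse_range(s: &str) -> Result<(u64, u64), ClientError> {
    let (a, b) = s.split_once('-').ok_or_else(|| ClientError::InvalidRange(s.to_string()))?;
    let start: u64 = a.parse()?;
    let end: u64 = b.parse()?;
    if start > end {
        return Err(ClientError::InvalidRange(s.to_string()));
    }
    Ok((start, end))
}

fn main() {
    match parse_range("10-x") {
        Ok(range) => println!("parsed {range:?}"),
        Err(err) => {
            // Unlike an opaque anyhow::Error, the variant is matchable and the chain is kept.
            println!("error: {err}");
            if let Some(src) = err.source() {
                println!("caused by: {src}");
            }
        }
    }
}
```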
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> API-breaking error-surface changes (removal of legacy alias modules
and signature changes like `CredentialHelper::fill_credential`) may
require downstream code updates, especially where errors are
matched/converted. Runtime behavior should be mostly unchanged, but
error mapping/propagation paths (including Python exceptions) are widely
touched across crates.
>
> **Overview**
> This PR **unifies error types across the workspace** by removing
legacy re-export/alias modules (e.g. `CasClientError`, `CasTypesError`,
`DataProcessingError`, `SessionError`) and updating call sites to use
canonical errors like `xet_client::ClientError`,
`xet_core_structures::CoreError`, and `xet_data::DataError` directly.
>
> It updates CAS client code to **standardize on
`crate::error::Result`/`ClientError`**, including deleting
`cas_client/error.rs`, adjusting error conversions in retry/http
middleware paths, and updating simulation/local-server code to map
`ClientError` to HTTP responses.
>
> Python bindings (`hf_xet`) now **convert failures via `XetError`**
(with `xet_pkg` built with `python` support), register custom exceptions
on module init, and refine argument-validation errors to `PyValueError`
while routing network/timeout/auth/not-found to more appropriate Python
exception classes.
>
> Misc cleanup: `git_xet` now depends on `xet-data`, simulation binaries
switch to `anyhow::Result`/`bail!`, and lockfiles are updated for
new/updated dependencies (notably `pyo3`/`inventory`).
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
f3d056a909. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
The session ID was switched from `ulid` to `UniqueID` (a self-incrementing
in-memory u64) in a previous PR, but that is incorrect.
The session ID is used in CAS server logs, traces, and CDN logs to
identify a related group of activity (for debugging and similar
purposes), so it needs to be globally unique (hence `ulid`) rather than
merely locally unique.
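The difference shows in the ULID layout: per the ULID spec, an identifier is 128 bits made of a 48-bit millisecond timestamp followed by 80 random bits, printed as 26 Crockford base32 characters, whereas an in-memory counter restarts at the same values in every process. The sketch below only illustrates that layout; `encode_ulid` is a hypothetical helper, and the randomness from the standard library's `RandomState` is a demo shortcut, not a substitute for the `ulid` crate's generator.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

/// Crockford base32 alphabet used by ULID (no I, L, O, U).
const CROCKFORD: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Pack a 48-bit ms timestamp and 80 random bits into 128 bits and encode them
/// as 26 characters (10 for the timestamp, 16 for the randomness).
fn encode_ulid(timestamp_ms: u64, random80: u128) -> String {
    let ts = (timestamp_ms as u128) & ((1u128 << 48) - 1);
    let rand = random80 & ((1u128 << 80) - 1);
    let value = (ts << 80) | rand;
    // 26 chars * 5 bits = 130 bits, so the first character carries only 3 bits.
    (0u32..26)
        .map(|i| CROCKFORD[((value >> (125 - 5 * i)) & 0x1F) as usize] as char)
        .collect()
}

/// Demo-only randomness: each RandomState is randomly keyed.
fn weak_random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

fn main() {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_millis() as u64;
    let random80 = ((weak_random_u64() as u128) << 16) ^ (weak_random_u64() as u128);
    println!("{}", encode_ulid(now_ms, random80));
}
```

Two processes started at the same moment still get different IDs from the random part, which a per-process counter cannot guarantee.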
This PR should resolve all Dependabot alerts by upgrading deps and
switching out some deprecated crates for suggested alternatives, e.g.
`tempdir` -> `tempfile`. Supersedes PR #721. Fixes issue #722.
Currently, progress tracking is split between callback-driven and
snapshot-driven paths, which makes session and task wiring across
xet_data, xet_pkg, hf_xet, and git_xet hard to keep consistent. This PR
moves upload/download progress to a polling snapshot model backed by
atomics. It also switches task identifiers throughout the session APIs
to a UniqueID shared with progress tracking.
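The polling model replaces callbacks with shared atomic counters that workers increment and that any observer can read as a point-in-time snapshot. A minimal sketch with hypothetical names (`ProgressCounters`, `ProgressSnapshot`), assuming relaxed ordering suffices because each counter is independent and observers only need eventually consistent numbers:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ProgressSnapshot {
    total_bytes: u64,
    completed_bytes: u64,
}

/// Hypothetical shared progress state: cheap to update from any thread.
#[derive(Default)]
struct ProgressCounters {
    total_bytes: AtomicU64,
    completed_bytes: AtomicU64,
}

impl ProgressCounters {
    fn add_total(&self, n: u64) {
        self.total_bytes.fetch_add(n, Ordering::Relaxed);
    }

    fn add_completed(&self, n: u64) {
        self.completed_bytes.fetch_add(n, Ordering::Relaxed);
    }

    /// Each field is loaded separately, so the pair is not captured atomically;
    /// that is acceptable for progress display.
    fn snapshot(&self) -> ProgressSnapshot {
        ProgressSnapshot {
            total_bytes: self.total_bytes.load(Ordering::Relaxed),
            completed_bytes: self.completed_bytes.load(Ordering::Relaxed),
        }
    }
}

fn main() {
    let progress = Arc::new(ProgressCounters::default());
    progress.add_total(4 * 1024);
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let p = Arc::clone(&progress);
            thread::spawn(move || {
                for _ in 0..1024 {
                    p.add_completed(1);
                }
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
    println!("{:?}", progress.snapshot());
}
```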
This PR also switches rate estimation to a lighter-weight exponentially
weighted moving average (EWMA) model, so that it can be done at a low
level.
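An EWMA needs only the previous estimate and the latest sample, which is what makes it cheap enough to maintain at a low level. The sketch below is one common time-aware variant in which a new sample's weight is `1 - exp(-dt / tau)`; `RateEstimator` and the time constant `tau_secs` are assumptions, not the PR's estimator or its parameters (a fixed smoothing factor is an equally common choice).

```rust
/// Hypothetical time-aware EWMA of a transfer rate in bytes per second.
struct RateEstimator {
    /// Time constant in seconds: larger values smooth more.
    tau_secs: f64,
    rate: Option<f64>,
}

impl RateEstimator {
    fn new(tau_secs: f64) -> Self {
        RateEstimator { tau_secs, rate: None }
    }

    /// Record `bytes` transferred over `dt_secs` and return the updated rate.
    fn update(&mut self, bytes: u64, dt_secs: f64) -> f64 {
        if dt_secs <= 0.0 {
            // Ignore zero-length intervals rather than dividing by zero.
            return self.rate.unwrap_or(0.0);
        }
        let sample = bytes as f64 / dt_secs;
        let new_rate = match self.rate {
            // The first sample seeds the estimate directly.
            None => sample,
            Some(prev) => {
                // Longer gaps give the new sample more weight.
                let alpha = 1.0 - (-dt_secs / self.tau_secs).exp();
                prev + alpha * (sample - prev)
            }
        };
        self.rate = Some(new_rate);
        new_rate
    }
}

fn main() {
    let mut est = RateEstimator::new(5.0);
    for bytes in [1_000_000, 1_200_000, 800_000, 0] {
        println!("{:.0} B/s", est.update(bytes, 1.0));
    }
}
```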
To preserve compatibility for existing callback consumers, the
callback-oriented upload/download progress tracking APIs are moved under
xet_pkg::legacy and bridged from polling snapshots via callback-based
updaters. hf_xet and git_xet are updated to use that legacy bridge
layer, so current integrations keep working until everything is fully
switched over to the XetSession API.
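A callback bridge on top of polling can be as small as remembering the last reported value and forwarding the difference. The sketch below assumes the legacy callbacks receive a byte increment plus the current total; `LegacyProgressBridge` and its snapshot type are hypothetical and do not reflect the actual `xet_pkg::legacy` signatures.

```rust
/// Minimal polled snapshot (hypothetical shape).
#[derive(Debug, Clone, Copy)]
struct ProgressSnapshot {
    completed_bytes: u64,
    total_bytes: u64,
}

/// Adapts polled snapshots to an increment-style callback.
struct LegacyProgressBridge<F: FnMut(u64, u64)> {
    last_completed: u64,
    on_progress: F,
}

impl<F: FnMut(u64, u64)> LegacyProgressBridge<F> {
    fn new(on_progress: F) -> Self {
        LegacyProgressBridge { last_completed: 0, on_progress }
    }

    /// Call on every poll; invokes the callback only when progress advanced.
    /// Returns the increment that was reported (0 if none).
    fn poll(&mut self, snap: ProgressSnapshot) -> u64 {
        let delta = snap.completed_bytes.saturating_sub(self.last_completed);
        if delta > 0 {
            self.last_completed = snap.completed_bytes;
            (self.on_progress)(delta, snap.total_bytes);
        }
        delta
    }
}

fn main() {
    let mut bridge =
        LegacyProgressBridge::new(|delta, total| println!("+{delta} bytes (total {total})"));
    for completed in [0, 512, 512, 2048] {
        bridge.poll(ProgressSnapshot { completed_bytes: completed, total_bytes: 2048 });
    }
}
```

Because the bridge only reads snapshots, the core tracking stays callback-free and the legacy layer can be deleted once callers move to polling.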
This PR adds a `python` feature flag to the xet_runtime package; when it
is enabled, the XetConfig struct is compiled with Python getters and
setters. This moves the Python handling directly into the XetConfig
struct and the macros used to register the config values, making the
config values seamlessly accessible from the Python bindings.
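Generating accessors from the same macro that registers each config value keeps the binding surface in sync with the Rust struct automatically. The sketch below shows the idea with a plain `macro_rules!` macro that emits the struct, its defaults, and string-keyed `get`/`set` methods. It is a hypothetical stand-in for the real registration macros; with the `python` feature, the real code presumably emits PyO3 getters and setters behind `#[cfg(feature = "python")]` instead of these string-based methods.

```rust
/// Hypothetical macro: declares a config struct, its defaults, and
/// string-keyed accessors, so every registered field is reachable by name.
macro_rules! define_config {
    ($name:ident { $($field:ident : $ty:ty = $default:expr),* $(,)? }) => {
        #[derive(Debug, Clone, PartialEq)]
        pub struct $name {
            $(pub $field: $ty,)*
        }

        impl Default for $name {
            fn default() -> Self {
                $name { $($field: $default,)* }
            }
        }

        impl $name {
            /// Registered field names, in declaration order.
            pub const FIELDS: &'static [&'static str] = &[$(stringify!($field)),*];

            /// Read a value as a string (the shape a binding-layer getter could use).
            pub fn get(&self, key: &str) -> Option<String> {
                match key {
                    $(stringify!($field) => Some(self.$field.to_string()),)*
                    _ => None,
                }
            }

            /// Parse and store a value (the shape a binding-layer setter could use).
            pub fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
                match key {
                    $(stringify!($field) => {
                        self.$field = value
                            .parse::<$ty>()
                            .map_err(|e| format!("invalid value for {}: {}", key, e))?;
                        Ok(())
                    })*
                    _ => Err(format!("unknown config key: {}", key)),
                }
            }
        }
    };
}

define_config!(ConfigSketch {
    max_retries: u64 = 5,
    endpoint: String = String::from("http://localhost:8080"),
    enable_cache: bool = true,
});

fn main() {
    let mut cfg = ConfigSketch::default();
    cfg.set("max_retries", "8").expect("valid value");
    println!("max_retries = {:?}", cfg.get("max_retries"));
    println!("fields: {:?}", ConfigSketch::FIELDS);
}
```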
This PR is a massive rearrangement of the code base into 5 packages
intended for publication on crates.io. The directories and corresponding
packages are:
1. xet_runtime/ — compiles into the xet-runtime package. Contains the
runtime, config, and logging management.
2. xet_core_structures/ — compiles into the xet-core-structures package.
Contains core data structures for hashing, shards, and xorbs as well as
internal data structures that depend on these.
3. xet_client/ — compiles into the xet-client package, contains client
code for remotely connecting to the Hugging Face servers.
4. xet_data/ — compiles into the xet-data package, contains the data
processing pipeline: chunking/deduplication, file reconstruction,
clean/smudge operations, and progress tracking.
5. xet_pkg/ — compiles into the hf-xet package, provides the top-level
session-based API for file upload and download with user-facing error
categorization. This is the primary package downstream dependencies
would use. This also contains a single summary error type, XetError,
that translates cleanly into python error types.
In addition, the other tools are:
- git_xet/ — the git_xet CLI binary crate (location preserved).
- hf_xet/ -- the hf_xet python package (location preserved).
- simulation/ — the simulation crate for upload scenario benchmarking.
- wasm/ -- the wasm objects.
The full description — and information for an AI agent to use to update
downstream dependencies — is at
`api_changes/update_260309_package_restructure.md`.
Summary of moves:
- xet_runtime: became xet_runtime::core inside xet_runtime/.
- utils: became xet_runtime::utils inside xet_runtime/.
- xet_config: became xet_runtime::config inside xet_runtime/.
- xet_logging: became xet_runtime::logging inside xet_runtime/.
- error_printer: became xet_runtime::error_printer inside xet_runtime/.
- file_utils: became xet_runtime::file_utils inside xet_runtime/.
- merklehash: became xet_core_structures::merklehash inside
xet_core_structures/.
- mdb_shard: became xet_core_structures::metadata_shard inside
xet_core_structures/.
- xorb_object: became xet_core_structures::xorb_object inside
xet_core_structures/.
- cas_client: became xet_client::cas_client inside xet_client/.
- hub_client: became xet_client::hub_client inside xet_client/.
- cas_types: became xet_client::cas_types inside xet_client/.
- chunk_cache: became xet_client::chunk_cache inside xet_client/.
- data: became xet_data::processing inside xet_data/.
- deduplication: became xet_data::deduplication inside xet_data/.
- file_reconstruction: became xet_data::file_reconstruction inside
xet_data/.
- progress_tracking: became xet_data::progress_tracking inside
xet_data/.
- xet_session: became xet::xet_session inside xet_pkg/.
- Wasm packages (hf_xet_wasm, hf_xet_thin_wasm): moved from top-level
into wasm/; internal imports updated, public APIs unchanged.
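When updating downstream code by hand, the module moves listed above reduce to rewriting the first segment of each old import path. The sketch below encodes exactly those moves (the Wasm package move is omitted because it is not a Rust module path); `migrate_path` is a hypothetical helper, not part of the repository, and the guide in `api_changes/` remains the authoritative reference.

```rust
/// Module moves from the restructure, as (old top-level crate, new path).
const MOVES: &[(&str, &str)] = &[
    ("xet_runtime", "xet_runtime::core"),
    ("utils", "xet_runtime::utils"),
    ("xet_config", "xet_runtime::config"),
    ("xet_logging", "xet_runtime::logging"),
    ("error_printer", "xet_runtime::error_printer"),
    ("file_utils", "xet_runtime::file_utils"),
    ("merklehash", "xet_core_structures::merklehash"),
    ("mdb_shard", "xet_core_structures::metadata_shard"),
    ("xorb_object", "xet_core_structures::xorb_object"),
    ("cas_client", "xet_client::cas_client"),
    ("hub_client", "xet_client::hub_client"),
    ("cas_types", "xet_client::cas_types"),
    ("chunk_cache", "xet_client::chunk_cache"),
    ("data", "xet_data::processing"),
    ("deduplication", "xet_data::deduplication"),
    ("file_reconstruction", "xet_data::file_reconstruction"),
    ("progress_tracking", "xet_data::progress_tracking"),
    ("xet_session", "xet::xet_session"),
];

/// Rewrite an old import path by replacing its first segment. Returns `None`
/// when the path does not start with one of the moved crates.
fn migrate_path(old_path: &str) -> Option<String> {
    let (head, tail) = match old_path.split_once("::") {
        Some((head, tail)) => (head, Some(tail)),
        None => (old_path, None),
    };
    let (_, new_prefix) = MOVES.iter().find(|(old, _)| *old == head)?;
    Some(match tail {
        Some(tail) => format!("{}::{}", new_prefix, tail),
        None => new_prefix.to_string(),
    })
}

fn main() {
    // Illustrative paths; `Foo` and `Bar` are placeholders.
    for old in ["mdb_shard::Foo", "data::Bar", "serde::Serialize"] {
        match migrate_path(old) {
            Some(new) => println!("{old} -> {new}"),
            None => println!("{old} (unchanged)"),
        }
    }
}
```

Run it only on pre-restructure paths: `xet_runtime` is both an old and a new crate name, so applying it twice would turn `xet_runtime::core::Foo` into `xet_runtime::core::core::Foo`.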