CI for hf-hub runs cargo audit and found many issues coming from
hf-xet's transitive dependencies. This PR attempts to resolve some of
them (not necessarily all).
Main changes:
- dropped derivative and reqwest-retry
- replaced bincode with postcard, only used in testing
- upgraded xet-core rand usage
- added an audit CI step that ignores some issues we can't easily fix.
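As a rough illustration of what dropping derivative involves, a derive that skipped a field in Debug output can be replaced with a hand-written impl. The sketch below is an assumption: the struct is a hypothetical stand-in for AuthConfig, its field names are invented, and hiding the token is only a guess at what the derive was used for.

```rust
use std::fmt;

// Hypothetical stand-in for the real `AuthConfig`; field names are assumptions.
pub struct AuthConfig {
    pub token: String,
    pub token_expiration: u64,
}

// Hand-written `Debug` that replaces a derive-macro attribute:
// the token value is never written to the formatter.
impl fmt::Debug for AuthConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("AuthConfig")
            .field("token", &"<redacted>")
            .field("token_expiration", &self.token_expiration)
            .finish()
    }
}
```

The manual impl costs a few lines but removes a proc-macro dependency from the audit surface.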
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Medium risk because it removes `reqwest-retry`/`derivative` and
replaces part of the retry classification logic with an in-house
equivalent, which could subtly change HTTP retry behavior; the remaining
changes are dependency/version bumps and test-only serialization swaps.
>
> **Overview**
> Adds a new CI `cargo audit` job and introduces `.cargo/audit.toml` to
ignore a small set of **dev-only** RustSec advisories with documented
rationale.
>
> Reduces audit surface by dropping `derivative` (manual `Debug` impl
for `AuthConfig`) and removing `reqwest-retry`, replacing its
status-code classification with a local `Retryable` enum +
`default_on_request_success` helper in `RetryWrapper`.
>
> Updates workspace deps (notably `rand` to `0.10` and `rand_distr` to
`0.6`) and adjusts call sites to the newer `rand` APIs (`RngExt`
imports, minor test/bench tweaks). Test-only binary serialization
switches from `bincode` to `postcard` (and updates affected tests), with
corresponding lockfile updates across crates.
<!-- /CURSOR_SUMMARY -->
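The summary above mentions a local Retryable enum replacing the status-code classification from reqwest-retry. A minimal sketch of such a classifier is shown here, assuming the common policy of treating 5xx, 408 and 429 as transient. The function name `classify_status` and the use of a bare `u16` are illustrative assumptions, not the PR's actual code.

```rust
/// Whether a failed request is worth retrying.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Retryable {
    /// Temporary condition; retrying may succeed.
    Transient,
    /// Retrying will not help.
    Fatal,
}

/// Hypothetical helper: classify an HTTP status code.
/// Returns `None` when the response is a success and no retry decision is needed.
pub fn classify_status(status: u16) -> Option<Retryable> {
    match status {
        200..=299 => None,
        // Request Timeout and Too Many Requests are assumed to be transient.
        408 | 429 => Some(Retryable::Transient),
        // Server errors are assumed to be transient.
        500..=599 => Some(Retryable::Transient),
        // Everything else (other 4xx, unexpected 1xx/3xx) is treated as fatal.
        _ => Some(Retryable::Fatal),
    }
}
```

Keeping the classifier as a pure function of the status code makes any behavioural drift from the previous dependency easy to cover with unit tests.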
hf_xet_wasm: xet-core for WebAssembly
This crate makes it possible to use the xet upload protocol from the browser through a wasm-based binary that replicates the functionality of the hf_xet Python library.
The functionality includes, but is not limited to, chunking, global deduplication, xorb formation, xorb upload, shard formation, and shard upload.
Download functionality is not currently supported.
hf_xet_wasm has: chunking, global deduplication, xorb formation, xorb upload, shard formation, shard upload
hf_xet_wasm is missing: complete download support (xorbs, shards, chunk caching)
Critical Differences and Changes
In order to compile xet-core to wasm there are numerous changes:
- A version of the data crate that does not assume the presence of any tokio threads
- there is not yet such a thing as "multiple threads" in WebAssembly (at the time of writing)
- Additionally, only a specific set of tokio features is supported in WASM; we only use these features: ["sync", "rt", "macros", "time", "io-util"]
- To support multithreading we use web workers (wasm_thread dependency)
- Any component that uses async_trait must change its async_trait proc_macro usage so that it does not dictate Send'ness
  - any use of #[async_trait::async_trait] becomes:
      #[cfg_attr(not(target_family = "wasm"), async_trait::async_trait)]
      #[cfg_attr(target_family = "wasm", async_trait::async_trait(?Send))]
      pub trait Blah {}
  - this is required because the output of the async_trait macro cannot be Send when compiled to WASM
  - (pattern adopted from reqwest_middleware)
- Moves any operations that utilise or rely on the file system into memory, primarily shard formation and storage
- We choose not to rely on the file system interface provided to browser-based applications
- Removes the custom DNS resolver for HTTP requests
- HTTP requests in the browser are limited to fetch calls made by reqwest.
- a custom DNS resolver is not allowed, only HTTP
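To make the "file system into memory" change above more concrete, the following is a minimal sketch of an in-memory store keyed by shard identifier. The type `InMemoryShardStore` and its methods are hypothetical and are not the crate's actual API; it only shows the general idea of holding shard bytes in a map instead of writing them to disk.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory replacement for on-disk shard storage.
#[derive(Default)]
pub struct InMemoryShardStore {
    // Shard bytes keyed by an identifier (for example a hash in hex form).
    shards: HashMap<String, Vec<u8>>,
}

impl InMemoryShardStore {
    /// Store (or overwrite) the bytes of a shard.
    pub fn put(&mut self, key: &str, bytes: Vec<u8>) {
        self.shards.insert(key.to_string(), bytes);
    }

    /// Look up a shard; returns `None` if it was never stored.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.shards.get(key).map(|v| v.as_slice())
    }
}
```

A store like this lives only as long as the page or worker that owns it, which is the trade-off for not depending on any browser file system interface.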
Build Instructions
- Install nightly toolchain and dependencies:
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly
cargo install --version 0.2.100 wasm-bindgen-cli
- Build with
./build_wasm.sh (bash)
Run Instructions
The runnable example is composed of a set of files in the examples directory.
First fill in the four [FILL_ME] fields in examples/index.html with the desired testing target.
Then serve the examples directory using a local HTTP server, for example sfz (https://crates.io/crates/sfz).
- Install sfz:
cargo install sfz
- Serve the examples directory:
sfz --coi -r examples
- Observe in the browser: go to http://127.0.0.1:5000, press F12 and check the output under the "Console" tab.
Authentication in hf_xet_wasm
Like hf_xet, it is the caller's responsibility to set up authentication with the CAS server by getting a token from the Hugging Face Hub. The caller is also required to provide a method to get a fresh/refreshed token from the Hub in the event of token expiration.
In hf_xet_wasm, both the token and the refresh method must be supplied to the XetSession through a user-defined set of interfaces:
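class TokenInfo {
  // Field names are illustrative; only token() and exp() are part of the interface.
  constructor(private readonly tokenValue: string, private readonly expiry: bigint) {}
  // the current CAS token
  token(): string {
    return this.tokenValue;
  }
  // the token expiration
  exp(): bigint {
    return this.expiry;
  }
}
class TokenRefresher {
  async refreshToken(): Promise<TokenInfo> {
    // caller-defined: fetch a fresh token from the Hub and wrap it in a TokenInfo
    throw new Error("not implemented");
  }
}
const xetSession = new XetSession(<cas-endpoint>, tokenInfo, tokenRefresher);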
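How the session uses these two objects is not spelled out here. As a hedged sketch in Rust (the crate's own language), refresh-on-expiry logic could look roughly like the following. The trait, the `current_token` function, and the interpretation of `exp` as a timestamp comparable to `now` are all assumptions for illustration, not the crate's actual implementation.

```rust
/// Hypothetical Rust-side view of a token and its expiration.
pub struct TokenInfo {
    pub token: String,
    /// Expiration time; assumed to be in the same unit as `now` below.
    pub exp: u64,
}

/// Hypothetical refresher interface supplied by the caller.
pub trait TokenRefresher {
    fn refresh_token(&mut self) -> TokenInfo;
}

/// Return a usable token, asking the refresher for a new one once the
/// cached token has reached its expiration time.
pub fn current_token<R: TokenRefresher>(
    cached: &mut TokenInfo,
    refresher: &mut R,
    now: u64,
) -> String {
    if now >= cached.exp {
        *cached = refresher.refresh_token();
    }
    cached.token.clone()
}

/// Toy refresher used only to exercise the sketch.
pub struct FixedRefresher {
    pub next_exp: u64,
}

impl TokenRefresher for FixedRefresher {
    fn refresh_token(&mut self) -> TokenInfo {
        TokenInfo { token: "refreshed".to_string(), exp: self.next_exp }
    }
}
```

Passing `now` in explicitly keeps the sketch free of clock dependencies, which also sidesteps the question of which time source is available in a wasm environment.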