
hf_xet_wasm: xet-core for WebAssembly

This crate makes it possible to use the xet upload protocol from the browser through a wasm-based binary that replicates the functionality of the hf_xet Python library. Supported functionality includes, but is not limited to, chunking, global deduplication, xorb formation, xorb upload, shard formation, and shard upload.

Download functionality is not currently supported.

hf_xet_wasm has: chunking, global deduplication, xorb formation, xorb upload, shard formation, shard upload

hf_xet_wasm is missing: complete download support (xorbs, shards, chunk caching)

Critical Differences and Changes

In order to compile xet-core to wasm there are numerous changes:

  • A version of the data crate that does not assume the presence of any tokio threads
    • there is not yet such a thing as "multiple threads" in WebAssembly (at the time of writing)
    • Additionally, only a specific feature set of tokio is supported in WASM, so we only use these features: ["sync", "rt", "macros", "time", "io-util"]
  • To support multithreading we use web workers (wasm_thread dependency)
  • Any components that use async_trait are required to change the async_trait proc_macro usage to not dictate Send'ness
    • any use of #[async_trait::async_trait] becomes:
    • #[cfg_attr(not(target_family = "wasm"), async_trait::async_trait)]
      #[cfg_attr(target_family = "wasm", async_trait::async_trait(?Send))]
      pub trait Blah {}
      
    • this is required because the output of the async_trait macro cannot satisfy Send when compiled to WASM
    • (pattern adopted from reqwest_middleware)
  • Moves any operations that use or rely on the file system into memory, primarily shard formation and storage
    • We choose not to rely on the file system interface provided to browser-based applications
  • Removes the custom DNS resolver for HTTP requests
    • HTTP requests in the browser are limited to fetch calls made by reqwest.
    • custom DNS is not allowed, only HTTP
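
The move from the file system to memory can be illustrated with a minimal sketch, assuming the serialization code is written against `std::io::Write` and `std::io::Read` rather than `std::fs::File`. The helpers `write_shard`, `read_shard`, and `round_trip_in_memory`, as well as the length-prefixed layout, are hypothetical and are not the crate's actual shard format. The point is only that a `Cursor<Vec<u8>>` can stand in for a file on a target without a file system.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom, Write};

/// Hypothetical helper: writes a length-prefixed payload to any writer.
/// This is NOT the real shard format, only an illustration.
fn write_shard<W: Write>(writer: &mut W, payload: &[u8]) -> std::io::Result<()> {
    writer.write_all(&(payload.len() as u64).to_le_bytes())?;
    writer.write_all(payload)?;
    Ok(())
}

/// Hypothetical helper: reads back a payload written by `write_shard`.
fn read_shard<R: Read>(reader: &mut R) -> std::io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 8];
    reader.read_exact(&mut len_bytes)?;
    let mut payload = vec![0u8; u64::from_le_bytes(len_bytes) as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}

/// Round-trips a payload through an in-memory buffer instead of a file.
fn round_trip_in_memory(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut buffer = Cursor::new(Vec::new());
    write_shard(&mut buffer, payload)?;
    // Rewind to the start, as one would reopen a file for reading.
    buffer.seek(SeekFrom::Start(0))?;
    read_shard(&mut buffer)
}
```

Because both helpers are generic over the standard I/O traits, the same code could accept a `File` on native targets and an in-memory buffer on WASM, which is one possible way to keep the two builds close to each other.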

Build Instructions

  • Install nightly toolchain and dependencies:
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly
cargo install --version 0.2.100 wasm-bindgen-cli
  • Build with ./build_wasm.sh (bash)

Run Instructions

The runnable example is composed of a set of files in the examples directory.

First fill in the four [FILL_ME] fields in examples/index.html with a desired testing target.

Then serve the examples directory using a local HTTP server, for example sfz (https://crates.io/crates/sfz).

  • Install sfz:
cargo install sfz
  • Serve the examples directory:
sfz --coi -r examples
  • Observe in the browser: go to http://127.0.0.1:5000, hit F12 and check the output under the "Console" tab.

Authentication in hf_xet_wasm

Like hf_xet, it is the caller's responsibility to set up authentication with the CAS server by getting a token from the Hugging Face Hub. The caller is also required to provide a method to get a fresh/refreshed token from the hub in the event of token expiration.

In hf_xet_wasm, the token and the refresh method must be supplied to the XetSession through a set of user-defined interfaces.

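    class TokenInfo {
        // Field names are illustrative; only token() and exp() are required.
        constructor(private tokenValue: string, private expValue: bigint) {}

        // Returns the CAS access token.
        token(): string {
            return this.tokenValue;
        }
        // Returns the token expiration time.
        exp(): bigint {
            return this.expValue;
        }
    }

    class TokenRefresher {
        // Must fetch a fresh token from the hub; the body is left to the caller.
        async refreshToken(): Promise<TokenInfo> {
            throw new Error("not implemented: fetch a fresh token from the hub");
        }
    }

    const xetSession = new XetSession(<cas-endpoint>, tokenInfo, tokenRefresher);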
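
To show why both a token and a refresher are needed, here is a minimal sketch of how a session could decide when to ask for a fresh token. It is an assumption for illustration, not the crate's actual logic: the `TokenInfo` struct, `needs_refresh`, `current_token`, the safety margin, and the interpretation of `exp` as seconds since the Unix epoch are all hypothetical.

```rust
/// Hypothetical token record mirroring the `TokenInfo` interface above.
#[derive(Clone, Debug, PartialEq)]
struct TokenInfo {
    token: String,
    /// Expiration time, assumed here to be seconds since the Unix epoch.
    exp: u64,
}

/// Returns true when the token expires within `margin_secs` of `now_secs`.
fn needs_refresh(info: &TokenInfo, now_secs: u64, margin_secs: u64) -> bool {
    now_secs.saturating_add(margin_secs) >= info.exp
}

/// Returns the current token, calling `refresh` first if it is about to expire.
fn current_token<F>(info: &mut TokenInfo, now_secs: u64, margin_secs: u64, refresh: F) -> String
where
    F: FnOnce() -> TokenInfo,
{
    if needs_refresh(info, now_secs, margin_secs) {
        // Replace the stored token with whatever the caller's refresher returns.
        *info = refresh();
    }
    info.token.clone()
}
```

With a token expiring at `exp = 1000` and a margin of 30 seconds, a call at `now_secs = 900` would reuse the stored token, while a call at `now_secs = 980` would invoke the refresher first.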