Human context: while integrating the new hf-hub usage into tokenizers, we
found that tokenizers also generates a napi binary, so we should validate
that hf-hub and hf-xet are napi-compatible (hf-hub is fairly trivial once
hf-xet is known to be compatible).
## Summary
- Adds `examples/xet_pkg_napi/` — a minimal napi-rs binding that links
`hf-xet` (the `xet` crate at `xet_pkg/`) into a Node.js native addon.
- Exposes `initLogging(version)` and `smokeTest()`. The smoke test
builds a `XetSession` synchronously and constructs upload-commit +
file-download-group builders to exercise lazy runtime startup.
- Crate is excluded from the xet-core workspace and carries its own
`[workspace]` table so it stays standalone under git worktrees (where
cargo would otherwise resolve through the canonical repo path).
- Build artifacts (`*.node`, `index.js`, `index.d.ts`, `node_modules/`)
are gitignored; `Cargo.lock` and `package-lock.json` are committed for
reproducibility.
The point of the smoke test is **not** a full JS API — it's to verify
hf-xet compiles, links, and starts inside libuv (no pyo3, no host-owned
tokio). If you can run `npm run smoke` and see `xet session built;
runtime initialized`, the integration is ready for a fuller binding
(async upload/download via `#[napi]` async fns, progress callbacks via
`ThreadsafeFunction`).
## Test plan
- [x] `npm install` (in `examples/xet_pkg_napi/`)
- [x] `npm run build:debug` — compiles `hf-xet`, `xet-runtime`,
`xet-client`, `xet-data`, `xet-core-structures` and the napi shim
against napi 2.16
- [x] `npm run smoke` — outputs:
```
loaded addon, exports: [ 'initLogging', 'smokeTest' ]
smokeTest: xet session built; runtime initialized
```
- [x] Verify on Linux (only tested on darwin-arm64 locally)
- [x] Decide whether to wire into CI, or keep as a manual example
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Low Risk**
> Adds a standalone example project and build scripts without changing
production crates; primary risk is repo bloat/noise from the committed
lockfiles and an extra exclusion in the workspace.
>
> **Overview**
> Adds a new standalone `examples/xet_pkg_napi` project to smoke-test
that `hf-xet` can compile/link as a `napi-rs` Node native addon and
perform a real file download via the blocking download APIs.
>
> Updates the root `Cargo.toml` to **exclude** this example from the
workspace, and includes the example’s build/run scaffolding
(`package.json`, `smoke.mjs`, `build.rs`) plus committed lockfiles and a
`.gitignore` for generated artifacts.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
cb628956f7. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

# xet_pkg_napi: napi smoke test for hf-xet

A minimal napi-rs native addon that links against the
`xet_pkg` (`hf-xet`) crate. Verifies that hf-xet compiles, links, starts up,
and can actually pull a file from CAS, all from inside a Node.js native
module.

## What it exports

The Rust crate at `src/lib.rs` exposes three functions to Node:

- `initLogging(version: string)`: installs `xet`'s tracing subscriber.
- `smokeTest(): string`: builds a `XetSession` synchronously and constructs upload-commit + file-download-group builders. No I/O.
- `downloadFile(opts): { destPath, bytesDownloaded }`: actually downloads a Xet-stored file from the Hugging Face Hub. Synchronous: blocks the libuv main thread until the download finishes, so the JS event loop is paused for the duration. Acceptable for a smoke test; a real binding should wrap this in `napi::Task` / `tokio::task::spawn_blocking`.

This crate is excluded from the workspace (see the root `Cargo.toml`)
and carries its own `[workspace]` table because it has its own
`crate-type = ["cdylib"]` and ships under the napi-rs/cli build flow rather
than `cargo build`.

## Build & run

Requires Node ≥ 18, a Rust toolchain, and outbound network access to
`huggingface.co` and `cas-bridge.xethub.hf.co`.

    cd examples/xet_pkg_napi
    npm install
    npm run build:debug   # or `npm run build` for release
    npm run smoke

`napi build` writes two artifacts next to `package.json`:

- `xet-pkg-napi.<platform>-<arch>.node`: the compiled cdylib
- `index.js` / `index.d.ts`: a CJS shim that picks the right `.node` for the current platform

`smoke.mjs`:

1. Issues a `HEAD` against the HF Hub `resolve` URL with a non-default User-Agent (CloudFront strips `X-Xet-Hash` on cache hits served to default UAs).
2. Reads `X-Xet-Hash`, `X-Linked-Size`, and `X-Linked-Etag` from the response.
3. Calls `downloadFile()` with the parsed metadata.
4. Verifies the on-disk size matches `X-Linked-Size`.

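
The metadata handling in these steps can be sketched in plain JavaScript. This is an illustration under stated assumptions, not the contents of `smoke.mjs`: the helper names `buildResolveUrl`, `parseXetHeaders`, and `sizeMatches` are hypothetical, the `datasets/` and `spaces/` path prefixes are assumed from the usual Hub URL layout, and treating `X-Linked-Etag` as a possibly quoted SHA-256 is also an assumption. The `HEAD` request itself is left out, so the sketch runs without network access.

```javascript
// Hypothetical helpers; names and shapes are illustrative only.

// Build the Hub `resolve` URL. Models live at the root; datasets and spaces
// are assumed to use the `datasets/` and `spaces/` path prefixes.
function buildResolveUrl({ endpoint, repoType, repo, branch, filename }) {
  const prefix = { model: "", dataset: "datasets/", space: "spaces/" }[repoType];
  if (prefix === undefined) {
    throw new Error(`unknown repo type: ${repoType}`);
  }
  return `${endpoint}/${prefix}${repo}/resolve/${branch}/${filename}`;
}

// Extract Xet metadata from response headers.
// `headers` is a WHATWG Headers object (a global in Node 18+).
function parseXetHeaders(headers) {
  const hash = headers.get("x-xet-hash");
  const rawSize = headers.get("x-linked-size");
  if (!hash || rawSize === null || !/^\d+$/.test(rawSize)) {
    throw new Error("response is missing X-Xet-Hash or X-Linked-Size");
  }
  const etag = headers.get("x-linked-etag");
  return {
    hash,
    size: Number(rawSize),
    // Assumption: the etag carries the SHA-256, possibly wrapped in quotes.
    sha256: etag ? etag.replace(/^"|"$/g, "") : null,
  };
}

// Step 4: compare the on-disk size with the size the Hub advertised.
function sizeMatches(onDiskSize, linkedSize) {
  return onDiskSize === linkedSize;
}

// Usage with the default target and header values from the expected output.
const resolveUrl = buildResolveUrl({
  endpoint: "https://huggingface.co",
  repoType: "model",
  repo: "hf-internal-testing/tiny-random-bert",
  branch: "main",
  filename: "pytorch_model.bin",
});
const meta = parseXetHeaders(
  new Headers({
    "X-Xet-Hash": "75402e74462600f62ca4a08b91c9218f36075860d5f6d7eb07f4c29ed7fa4ad6",
    "X-Linked-Size": "540217",
    "X-Linked-Etag": '"9922e8996d0c7e24c7f4e7a5d9c5b7303549f4ee94de0f1138b103014b51be13"',
  })
);
console.log(resolveUrl, meta.size, sizeMatches(540217, meta.size));
```

Rejecting a missing `X-Linked-Size` explicitly matters here because `Number(null)` is `0`, which would otherwise pass as a valid size.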

## Configuration

All env vars are optional. Defaults target a tiny (~540 KB) public Xet file so the smoke test runs quickly without an HF token.

| Var | Default |
|---|---|
| `HF_ENDPOINT` | `https://huggingface.co` |
| `HF_REPO_TYPE` | `model` (`model` \| `dataset` \| `space`) |
| `HF_REPO` | `hf-internal-testing/tiny-random-bert` |
| `HF_BRANCH` | `main` |
| `HF_FILENAME` | `pytorch_model.bin` |
| `HF_TOKEN` | unset (required for private repos) |
| `HF_DEST_DIR` | `./downloads` |

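
As a rough sketch of how these variables could be resolved, the hypothetical helper below reads them from an environment object and falls back to the defaults listed above. The function name `loadConfig` and the shape of the returned object are assumptions for illustration; the actual script may structure this differently.

```javascript
// Hypothetical helper: resolve the smoke-test settings from an environment
// object, falling back to the documented defaults.
function loadConfig(env = process.env) {
  const repoType = env.HF_REPO_TYPE || "model";
  if (!["model", "dataset", "space"].includes(repoType)) {
    throw new Error(`HF_REPO_TYPE must be model, dataset, or space; got ${repoType}`);
  }
  return {
    endpoint: env.HF_ENDPOINT || "https://huggingface.co",
    repoType,
    repo: env.HF_REPO || "hf-internal-testing/tiny-random-bert",
    branch: env.HF_BRANCH || "main",
    filename: env.HF_FILENAME || "pytorch_model.bin",
    token: env.HF_TOKEN || undefined, // unset means anonymous access
    destDir: env.HF_DEST_DIR || "./downloads",
  };
}

// With an empty environment every field takes its default.
const defaults = loadConfig({});
// Overriding only some variables leaves the rest at their defaults.
const custom = loadConfig({ HF_REPO_TYPE: "dataset", HF_REPO: "some-org/some-dataset" });
console.log(defaults.repo, custom.repoType, custom.branch);
```

Passing the environment as a parameter keeps the helper easy to exercise without mutating `process.env`.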

## Expected output

    loaded addon, exports: [ 'initLogging', 'smokeTest', 'downloadFile' ]
    Fetching xet metadata for model:hf-internal-testing/tiny-random-bert/pytorch_model.bin@main
    https://huggingface.co/hf-internal-testing/tiny-random-bert/resolve/main/pytorch_model.bin
    xet-hash: 75402e74462600f62ca4a08b91c9218f36075860d5f6d7eb07f4c29ed7fa4ad6
    size: 540,217 bytes
    sha256: 9922e8996d0c7e24c7f4e7a5d9c5b7303549f4ee94de0f1138b103014b51be13
    smokeTest: xet session built; runtime initialized
    Downloading -> downloads/pytorch_model.bin
    Result:
    bytes downloaded: 540,217
    on-disk size: 540,217
    elapsed: 1.23s
    OK — file downloaded and size matches.


## Notes / caveats

- **Synchronous download.** A real binding should expose this as a `#[napi]` async fn or wrap it in `napi::Task` so the JS event loop isn't blocked while xet pulls bytes from CAS.
- **No double runtime.** `xet-runtime` owns its own tokio runtime; it doesn't piggyback on libuv. The blocking calls used here use `block_on` against xet's runtime, so napi's main thread is the only thread that gets parked.
- **Metadata source.** The xet hash + file size come from the HF Hub's `X-Xet-Hash` / `X-Linked-Size` headers. A non-default `User-Agent` is required because CloudFront caches strip those headers on cache hits served to default UAs.
- **napi feature level.** Built against `napi8`. Bumping to `napi9`+ would unlock newer N-API surfaces if needed.
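
To make the synchronous-download caveat concrete, the sketch below shows what a blocking call does to the Node event loop. It does not use the addon: `blockingStandIn` is a hypothetical stand-in that parks the main thread with `Atomics.wait`, playing the role a synchronous `downloadFile()` call would play. A timer scheduled beforehand cannot fire until the blocking call has returned and the current script turn has ended.

```javascript
// Stand-in for a synchronous native call such as downloadFile(): it parks
// the calling thread for `ms` milliseconds. Hypothetical, for illustration.
function blockingStandIn(ms) {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  // The cell holds 0 and nothing ever notifies it, so this waits out the timeout.
  Atomics.wait(cell, 0, 0, ms);
}

// Schedule a 0 ms timer, then block. Returns a snapshot of what had run by
// the time the blocking call returned.
function demonstrateBlocking(ms) {
  const order = [];
  setTimeout(() => order.push("timer fired"), 0);
  blockingStandIn(ms);
  order.push("sync call returned");
  return [...order];
}

const observed = demonstrateBlocking(50);
// The timer was due after 0 ms, yet it has not run: the event loop was
// parked for the whole 50 ms and only resumes after this script yields.
console.log(observed);
```

An async binding avoids this by doing the work off the main thread and resolving a promise when it completes, so timers and I/O callbacks keep running in the meantime.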