Add napi smoke-test example for hf-xet (#835)

Human context: while integrating new hf-hub usage in tokenizers, we saw
that tokenizers also generates a napi binary, so we should validate that
hf-hub and hf-xet are napi-compatible (hf-hub is pretty trivial given that
hf-xet is compatible).

## Summary

- Adds `examples/xet_pkg_napi/` — a minimal napi-rs binding that links
`hf-xet` (the `xet` crate at `xet_pkg/`) into a Node.js native addon.
- Exposes `initLogging(version)`, `smokeTest()`, and `downloadFile(opts)`.
The smoke test builds a `XetSession` synchronously and constructs
upload-commit + file-download-group builders to exercise lazy runtime
startup; `downloadFile` performs a real, blocking file download.
- Crate is excluded from the xet-core workspace and carries its own
`[workspace]` table so it stays standalone under git worktrees (where
cargo would otherwise resolve through the canonical repo path).
- Build artifacts (`*.node`, `index.js`, `index.d.ts`, `node_modules/`)
are gitignored; `Cargo.lock` and `package-lock.json` are committed for
reproducibility.

The point of the smoke test is **not** a full JS API — it's to verify
hf-xet compiles, links, and starts inside libuv (no pyo3, no host-owned
tokio). If you can run `npm run smoke` and see `xet session built;
runtime initialized`, the integration is ready for a fuller binding
(async upload/download via `#[napi]` async fns, progress callbacks via
`ThreadsafeFunction`).

## Test plan

- [x] `npm install` (in `examples/xet_pkg_napi/`)
- [x] `npm run build:debug` — compiles `hf-xet`, `xet-runtime`,
`xet-client`, `xet-data`, `xet-core-structures` and the napi shim
against napi 2.16
- [x] `npm run smoke` — outputs:
  ```
  loaded addon, exports: [ 'initLogging', 'smokeTest' ]
  smokeTest: xet session built; runtime initialized
  ```
- [x] Verify on Linux (only tested on darwin-arm64 locally)
- [x] Decide whether to wire into CI, or keep as a manual example

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Adds a standalone example project and build scripts without changing
production crates; primary risk is repo bloat/noise from the committed
lockfiles and an extra exclusion in the workspace.
> 
> **Overview**
> Adds a new standalone `examples/xet_pkg_napi` project to smoke-test
that `hf-xet` can compile/link as a `napi-rs` Node native addon and
perform a real file download via the blocking download APIs.
> 
> Updates the root `Cargo.toml` to **exclude** this example from the
workspace, and includes the example’s build/run scaffolding
(`package.json`, `smoke.mjs`, `build.rs`) plus committed lockfiles and a
`.gitignore` for generated artifacts.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
cb628956f7. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
This commit is contained in:
Assaf Vayner
2026-05-14 13:48:47 -07:00
committed by GitHub
parent 654080d080
commit c3c726bed5
10 changed files with 3873 additions and 0 deletions


@@ -17,6 +17,7 @@ exclude = [
    "hf_xet",
    "wasm/hf_xet_wasm",
    "wasm/hf_xet_thin_wasm",
    "examples/xet_pkg_napi",
]

[workspace.package]

5
examples/xet_pkg_napi/.gitignore vendored Normal file

@@ -0,0 +1,5 @@
node_modules/
*.node
index.js
index.d.ts
downloads/

3448
examples/xet_pkg_napi/Cargo.lock generated Normal file

File diff suppressed because it is too large


@@ -0,0 +1,38 @@
# Standalone: not part of the xet-core workspace. The workspace excludes this
# directory, but napi-rs's build flow can resolve cargo against the *original*
# (non-worktree) repo path during development, which would otherwise drag this
# crate back into the workspace it's deliberately outside of.

[workspace]

[package]
name = "xet_pkg_napi"
version = "0.0.1"
edition = "2024"
license = "Apache-2.0"
description = "Smoke-test napi binding for hf-xet (xet_pkg). Verifies that the Rust client builds and links inside a Node.js native addon."
publish = false

[lib]
crate-type = ["cdylib"]

[dependencies]
# hf-xet is published as `hf-xet` but the lib name is `xet`. Pull it via path.
hf-xet = { path = "../../xet_pkg" }

# napi-rs 2.x: the standard for Rust → Node.js native addons.
napi = { version = "2", default-features = false, features = ["napi8"] }
napi-derive = "2"

[build-dependencies]
napi-build = "2"

# cargo-machete can't match `hf-xet` (package name) to `xet::` (lib name in source).
[package.metadata.cargo-machete]
ignored = ["hf-xet"]

[profile.release]
lto = true
opt-level = 3
debug = 1
strip = "symbols"


@@ -0,0 +1,103 @@
# xet_pkg_napi: napi smoke test for hf-xet

A minimal [napi-rs](https://napi.rs) native addon that links against the
`xet_pkg` (`hf-xet`) crate. It verifies that `hf-xet` compiles, links, starts up,
and can actually pull a file from CAS, all from inside a Node.js native
module.

## What it exports

The Rust crate at `src/lib.rs` exposes three functions to Node:

- `initLogging(version: string)`: installs `xet`'s tracing subscriber.
- `smokeTest(): string`: builds a `XetSession` synchronously and constructs
  upload-commit + file-download-group builders. No I/O.
- `downloadFile(opts): { destPath, bytesDownloaded }`: downloads a
  Xet-stored file from the HuggingFace Hub. **Synchronous**: it blocks the libuv
  main thread until the download finishes, so the JS event loop is paused for
  the duration. Acceptable for a smoke test; a real binding should wrap this
  in `napi::Task` / `tokio::task::spawn_blocking`.

This crate is **excluded from the workspace** (see the root `Cargo.toml`)
and carries its own `[workspace]` table because it has its own
`crate-type = ["cdylib"]` and ships under the `@napi-rs/cli` build flow rather
than `cargo build`.

## Build & run

Requires Node ≥ 18, a Rust toolchain, and outbound network access to
`huggingface.co` and `cas-bridge.xethub.hf.co`.

```sh
cd examples/xet_pkg_napi
npm install
npm run build:debug   # or `npm run build` for release
npm run smoke
```

`napi build` writes two artifacts next to `package.json`:

- `xet-pkg-napi.<platform>-<arch>.node`: the compiled cdylib
- `index.js` / `index.d.ts`: a CJS shim that picks the right `.node` for the
  current platform, plus its TypeScript declarations

`smoke.mjs`:

1. Issues a `HEAD` against the HF Hub `resolve` URL with a non-default
   User-Agent (CloudFront strips `X-Xet-Hash` on cache hits served to
   default UAs).
2. Reads `X-Xet-Hash`, `X-Linked-Size`, and `X-Linked-Etag` from the response.
3. Calls `downloadFile()` with the parsed metadata.
4. Verifies the on-disk size matches `X-Linked-Size`.
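
The two URLs the driver needs (the `resolve` URL for the `HEAD` request and the `xet-read-token` URL passed to `downloadFile()`) can be sketched as a small pure function. This mirrors the string templates in `smoke.mjs`; the helper name `buildHubUrls` is hypothetical and not part of the example.

```javascript
// Hypothetical helper (not in the example): mirrors the URL templates in smoke.mjs.
function buildHubUrls({ endpoint, repoType, repoId, branch, filename }) {
  // Model repos live at the root; datasets and spaces get a plural prefix.
  const repoPathSegment = repoType === "model" ? "" : `${repoType}s/`;
  // The API path always uses the plural repo type ("models", "datasets", "spaces").
  const apiTypeSegment = `${repoType}s`;
  return {
    resolveUrl: `${endpoint}/${repoPathSegment}${repoId}/resolve/${branch}/${filename}`,
    tokenRefreshUrl: `${endpoint}/api/${apiTypeSegment}/${repoId}/xet-read-token/${branch}`,
  };
}

const urls = buildHubUrls({
  endpoint: "https://huggingface.co",
  repoType: "model",
  repoId: "hf-internal-testing/tiny-random-bert",
  branch: "main",
  filename: "pytorch_model.bin",
});
console.log(urls.resolveUrl);
console.log(urls.tokenRefreshUrl);
```

With the default configuration the first URL is the one shown in the expected output; for a dataset repo the same function would yield a `datasets/`-prefixed `resolve` URL and an `/api/datasets/...` token URL.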

### Configuration

All env vars are optional. Defaults target a tiny (~540 KB) public Xet file
so the smoke test runs quickly without an HF token.

| Var            | Default                                            |
| -------------- | -------------------------------------------------- |
| `HF_ENDPOINT`  | `https://huggingface.co`                           |
| `HF_REPO_TYPE` | `model` (`model` \| `dataset` \| `space`)          |
| `HF_REPO`      | `hf-internal-testing/tiny-random-bert`             |
| `HF_BRANCH`    | `main`                                             |
| `HF_FILENAME`  | `pytorch_model.bin`                                |
| `HF_TOKEN`     | _unset_ (required for private repos)               |
| `HF_DEST_DIR`  | `./downloads`                                      |
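
As a rough illustration of how these defaults resolve, the table can be expressed as one function over an environment object. The function name `loadSmokeConfig` is hypothetical; `smoke.mjs` reads `process.env` inline with the same `??` fallbacks.

```javascript
// Hypothetical helper: applies the defaults from the table above.
// `env` is any object shaped like process.env.
function loadSmokeConfig(env) {
  return {
    endpoint: env.HF_ENDPOINT ?? "https://huggingface.co",
    repoType: env.HF_REPO_TYPE ?? "model",
    repoId: env.HF_REPO ?? "hf-internal-testing/tiny-random-bert",
    branch: env.HF_BRANCH ?? "main",
    filename: env.HF_FILENAME ?? "pytorch_model.bin",
    token: env.HF_TOKEN ?? null, // unset means anonymous access
    destDir: env.HF_DEST_DIR ?? "./downloads",
  };
}

// Only the overridden variable changes; everything else keeps its default.
const config = loadSmokeConfig({ HF_BRANCH: "refs/pr/1" });
console.log(config.branch, config.repoId, config.token);
```

Note that `??` only falls back when a variable is unset: a variable exported as an empty string is kept as-is rather than replaced by the default.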

## Expected output

```
loaded addon, exports: [ 'initLogging', 'smokeTest', 'downloadFile' ]

Fetching xet metadata for model:hf-internal-testing/tiny-random-bert/pytorch_model.bin@main
  https://huggingface.co/hf-internal-testing/tiny-random-bert/resolve/main/pytorch_model.bin
  xet-hash: 75402e74462600f62ca4a08b91c9218f36075860d5f6d7eb07f4c29ed7fa4ad6
  size: 540,217 bytes
  sha256: 9922e8996d0c7e24c7f4e7a5d9c5b7303549f4ee94de0f1138b103014b51be13
smokeTest: xet session built; runtime initialized

Downloading -> downloads/pytorch_model.bin

Result:
  bytes downloaded: 540,217
  on-disk size: 540,217
  elapsed: 1.23s

OK — file downloaded and size matches.
```

## Notes / caveats

- **Synchronous download.** A real binding should expose this as a
  `#[napi]` async fn or wrap it in `napi::Task` so the JS event loop isn't blocked
  while xet pulls bytes from CAS.
- **No double runtime.** `xet-runtime` owns its own tokio runtime; it doesn't
  piggyback on libuv. The blocking calls used here use `block_on` against
  xet's runtime, so napi's main thread is the only thread that gets parked.
- **Metadata source.** The xet hash + file size come from the HF Hub's
  `X-Xet-Hash` / `X-Linked-Size` headers. A non-default `User-Agent` is
  required because CloudFront caches strip those headers on cache hits served
  to default UAs.
- **napi feature level.** Built against `napi8`. Bumping to `napi9`+ would
  unlock newer N-API surfaces if needed.
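
To make the metadata-source caveat concrete, here is a minimal sketch of turning the Hub's response headers into the fields `downloadFile()` expects. It follows the parsing in `smoke.mjs` (including treating the unquoted `X-Linked-Etag` as the SHA-256); the function name `parseXetMetadata` and the safe-integer check are additions for illustration, not part of the example. It relies on the global `Headers` class available in Node ≥ 18.

```javascript
// Hypothetical helper: extracts xet metadata from a fetch() response's headers.
function parseXetMetadata(headers) {
  const xetHash = headers.get("x-xet-hash");
  const linkedSize = headers.get("x-linked-size");
  const linkedEtag = headers.get("x-linked-etag");
  if (!xetHash || !linkedSize) {
    throw new Error("missing X-Xet-Hash / X-Linked-Size; is this a Xet-stored file?");
  }
  const fileSize = Number(linkedSize);
  // Extra guard (not in smoke.mjs): the Rust side takes an i64, and JS numbers
  // are only exact up to 2^53 - 1.
  if (!Number.isSafeInteger(fileSize) || fileSize < 0) {
    throw new Error(`invalid X-Linked-Size: ${linkedSize}`);
  }
  // The ETag arrives quoted; strip the quotes as smoke.mjs does.
  const sha256 = linkedEtag ? linkedEtag.replace(/"/g, "") : null;
  return { xetHash, fileSize, sha256 };
}

const meta = parseXetMetadata(
  new Headers({
    "X-Xet-Hash": "75402e74462600f62ca4a08b91c9218f36075860d5f6d7eb07f4c29ed7fa4ad6",
    "X-Linked-Size": "540217",
    "X-Linked-Etag": '"9922e8996d0c7e24c7f4e7a5d9c5b7303549f4ee94de0f1138b103014b51be13"',
  }),
);
console.log(meta);
```

Header lookups through `Headers.get()` are case-insensitive, which is why the lowercase names work regardless of how the server spells them.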


@@ -0,0 +1,3 @@
fn main() {
    napi_build::setup();
}

35
examples/xet_pkg_napi/package-lock.json generated Normal file

@@ -0,0 +1,35 @@
{
  "name": "xet-pkg-napi",
  "version": "0.0.1",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "xet-pkg-napi",
      "version": "0.0.1",
      "devDependencies": {
        "@napi-rs/cli": "^2.18.4"
      },
      "engines": {
        "node": ">=18"
      }
    },
    "node_modules/@napi-rs/cli": {
      "version": "2.18.4",
      "resolved": "https://registry.npmjs.org/@napi-rs/cli/-/cli-2.18.4.tgz",
      "integrity": "sha512-SgJeA4df9DE2iAEpr3M2H0OKl/yjtg1BnRI5/JyowS71tUWhrfSu2LT0V3vlHET+g1hBVlrO60PmEXwUEKp8Mg==",
      "dev": true,
      "license": "MIT",
      "bin": {
        "napi": "scripts/index.js"
      },
      "engines": {
        "node": ">= 10"
      },
      "funding": {
        "type": "github",
        "url": "https://github.com/sponsors/Brooooooklyn"
      }
    }
  }
}


@@ -0,0 +1,22 @@
{
  "name": "xet-pkg-napi",
  "version": "0.0.1",
  "description": "napi-rs smoke test for hf-xet",
  "private": true,
  "main": "index.js",
  "type": "commonjs",
  "napi": {
    "name": "xet-pkg-napi"
  },
  "scripts": {
    "build": "napi build --platform --release",
    "build:debug": "napi build --platform",
    "smoke": "node smoke.mjs"
  },
  "devDependencies": {
    "@napi-rs/cli": "^2.18.4"
  },
  "engines": {
    "node": ">=18"
  }
}


@@ -0,0 +1,112 @@
// Smoke driver: load the napi addon, fetch a public Xet file's metadata from
// the HuggingFace Hub, then download the file via the binding.
//
// Run after `npm run build` (or `npm run build:debug`):
//
//   node smoke.mjs
//
// Optional env vars (defaults pick a tiny ~540KB public Xet file):
//   HF_ENDPOINT   default: https://huggingface.co
//   HF_REPO_TYPE  default: model (model | dataset | space)
//   HF_REPO       default: hf-internal-testing/tiny-random-bert
//   HF_BRANCH     default: main
//   HF_FILENAME   default: pytorch_model.bin
//   HF_TOKEN      optional; required for private repos
//   HF_DEST_DIR   default: ./downloads

import { createRequire } from "node:module";
import { mkdirSync, statSync, rmSync } from "node:fs";
import { join, basename } from "node:path";

const require = createRequire(import.meta.url);
const addon = require("./index.js");

console.log("loaded addon, exports:", Object.keys(addon));

const endpoint = process.env.HF_ENDPOINT ?? "https://huggingface.co";
const repoType = process.env.HF_REPO_TYPE ?? "model";
const repoId = process.env.HF_REPO ?? "hf-internal-testing/tiny-random-bert";
const branch = process.env.HF_BRANCH ?? "main";
const filename = process.env.HF_FILENAME ?? "pytorch_model.bin";
const token = process.env.HF_TOKEN ?? null;
const destDir = process.env.HF_DEST_DIR ?? "./downloads";

const repoPathSegment = repoType === "model" ? "" : `${repoType}s/`;
const apiTypeSegment = `${repoType}s`;
const resolveUrl =
  `${endpoint}/${repoPathSegment}${repoId}/resolve/${branch}/${filename}`;
const tokenRefreshUrl =
  `${endpoint}/api/${apiTypeSegment}/${repoId}/xet-read-token/${branch}`;

console.log(`\nFetching xet metadata for ${repoType}:${repoId}/${filename}@${branch}`);
console.log(`  ${resolveUrl}`);

// HEAD against /resolve/: CloudFront strips X-Xet-Hash on cache hits served
// to default UAs, so send a non-default User-Agent and cache-bust the URL.
const headResp = await fetch(`${resolveUrl}?_=${Date.now()}`, {
  method: "HEAD",
  redirect: "manual",
  headers: {
    "User-Agent": "xet-pkg-napi-smoke/0.1",
    ...(token ? { Authorization: `Bearer ${token}` } : {}),
  },
});

if (headResp.status !== 302 && headResp.status !== 200) {
  throw new Error(
    `HEAD ${resolveUrl} returned ${headResp.status} ${headResp.statusText}` +
      (token ? "" : " — set HF_TOKEN if the repo is private"),
  );
}

const xetHash = headResp.headers.get("x-xet-hash");
const linkedSize = headResp.headers.get("x-linked-size");
const linkedEtag = headResp.headers.get("x-linked-etag");

if (!xetHash || !linkedSize) {
  throw new Error(
    "Hub did not return X-Xet-Hash / X-Linked-Size headers — is this a Xet-stored file?",
  );
}

const sha256 = linkedEtag ? linkedEtag.replace(/"/g, "") : null;
const fileSize = Number(linkedSize);

console.log(`  xet-hash: ${xetHash}`);
console.log(`  size: ${fileSize.toLocaleString()} bytes`);
console.log(`  sha256: ${sha256 ?? "(unknown)"}`);

mkdirSync(destDir, { recursive: true });
const destPath = join(destDir, basename(filename));
rmSync(destPath, { force: true });

addon.initLogging("xet_pkg_napi/0.0.1 smoke");
const smokeResult = addon.smokeTest();
console.log(`smokeTest: ${smokeResult}`);

console.log(`\nDownloading -> ${destPath}`);
const t0 = Date.now();
const result = addon.downloadFile({
  tokenRefreshUrl,
  ...(token ? { authToken: token } : {}),
  xetHash,
  fileSize,
  ...(sha256 ? { sha256 } : {}),
  destPath,
});
const elapsedSec = (Date.now() - t0) / 1000;
const stat = statSync(destPath);

console.log(`\nResult:`);
console.log(`  bytes downloaded: ${result.bytesDownloaded.toLocaleString()}`);
console.log(`  on-disk size: ${stat.size.toLocaleString()}`);
console.log(`  elapsed: ${elapsedSec.toFixed(2)}s`);

if (stat.size !== fileSize) {
  throw new Error(
    `size mismatch: expected ${fileSize}, got ${stat.size} on disk`,
  );
}

console.log("\nOK — file downloaded and size matches.");


@@ -0,0 +1,106 @@
//! napi smoke-test binding for `hf-xet` (the `xet` crate at `xet_pkg/`).
//!
//! Exposes:
//! - `initLogging(version)`: install xet's tracing subscriber
//! - `smokeTest()`: build a `XetSession` and runtime helpers without doing any I/O
//! - `downloadFile(opts)`: download a Xet-stored file from the HuggingFace Hub to a local path
//!
//! `downloadFile` is intentionally synchronous: it blocks the caller until the
//! download completes, internally using `xet`'s `*_blocking` APIs which run on
//! xet-runtime's own tokio runtime. Calling it from JS will block the libuv
//! main thread for the duration. That is fine for a smoke test, but a real
//! binding should wrap this in `napi::Task` / `tokio::task::spawn_blocking` so
//! the JS event loop stays responsive.

use napi::{Error as NapiError, Status};
use napi_derive::napi;
use xet::xet_session::{HeaderMap, HeaderValue, XetFileInfo, XetSessionBuilder, header};

fn to_napi_err<E: std::fmt::Display>(e: E) -> NapiError {
    NapiError::new(Status::GenericFailure, e.to_string())
}

#[napi(js_name = "initLogging")]
pub fn init_logging(version: String) {
    xet::init_logging(version);
}

#[napi(js_name = "smokeTest")]
pub fn smoke_test() -> Result<String, NapiError> {
    let session = XetSessionBuilder::new().build().map_err(to_napi_err)?;
    let _upload = session.new_upload_commit().map_err(to_napi_err)?;
    let _download = session.new_file_download_group().map_err(to_napi_err)?;
    Ok("xet session built; runtime initialized".to_string())
}

/// Options for `downloadFile`.
///
/// `xetHash` and `fileSize` come from the HuggingFace Hub's
/// `X-Xet-Hash` and `X-Linked-Size` response headers. To see them, issue a
/// HEAD against the `/{repo}/resolve/{ref}/{filename}` URL with a non-default
/// `User-Agent`; CloudFront strips them on cache hits served to default UAs.
#[napi(object, js_name = "DownloadFileOptions")]
pub struct DownloadFileOptions {
    /// The HuggingFace Hub's xet-read-token endpoint, e.g.
    /// `https://huggingface.co/api/models/{repo}/xet-read-token/{ref}`.
    pub token_refresh_url: String,
    /// Optional bearer token for the refresh endpoint. Required for private
    /// repos; for public repos this can be `null`.
    pub auth_token: Option<String>,
    /// The xet content hash (hex string) of the file to download.
    pub xet_hash: String,
    /// The file's size in bytes. JS `number` is precise up to 2^53; HF files
    /// are well under that.
    pub file_size: i64,
    /// Optional SHA-256 (hex) used by xet to verify the download.
    pub sha256: Option<String>,
    /// Local filesystem destination for the downloaded file.
    pub dest_path: String,
}

/// Result of a successful `downloadFile` call.
#[napi(object, js_name = "DownloadFileResult")]
pub struct DownloadFileResult {
    pub dest_path: String,
    pub bytes_downloaded: i64,
}

#[napi(js_name = "downloadFile")]
pub fn download_file(opts: DownloadFileOptions) -> Result<DownloadFileResult, NapiError> {
    let mut headers = HeaderMap::new();
    if let Some(token) = opts.auth_token.as_deref() {
        let value = HeaderValue::from_str(&format!("Bearer {token}"))
            .map_err(|e| NapiError::new(Status::InvalidArg, format!("invalid auth token: {e}")))?;
        headers.insert(header::AUTHORIZATION, value);
    }

    let file_size: u64 = opts
        .file_size
        .try_into()
        .map_err(|_| NapiError::new(Status::InvalidArg, "fileSize must be non-negative"))?;

    let file_info = match opts.sha256 {
        Some(sha) => XetFileInfo::new_with_sha256(opts.xet_hash, file_size, sha),
        None => XetFileInfo::new(opts.xet_hash, file_size),
    };

    let session = XetSessionBuilder::new().build().map_err(to_napi_err)?;
    let group = session
        .new_file_download_group()
        .map_err(to_napi_err)?
        .with_token_refresh_url(opts.token_refresh_url, headers)
        .build_blocking()
        .map_err(to_napi_err)?;

    let dest_path = std::path::PathBuf::from(&opts.dest_path);
    group
        .download_file_to_path_blocking(file_info, dest_path.clone())
        .map_err(to_napi_err)?;

    let report = group.finish_blocking().map_err(to_napi_err)?;
    let bytes_downloaded: i64 = report.progress.total_bytes_completed.try_into().unwrap_or(i64::MAX);

    Ok(DownloadFileResult {
        dest_path: opts.dest_path,
        bytes_downloaded,
    })
}