feat: smoke tests using hf CLI with bucket and large-file coverage (#710)

## Summary

- Rewrites smoke tests to drive everything through the `hf` CLI rather than the huggingface_hub Python API, covering the actual user-facing surface area of hf-xet
- Moves smoke tests and diagnostic scripts into a `scripts/` directory for cleaner repo layout
- Adds storage bucket test suite exercising the full bucket lifecycle
- Adds 50 MB and 100 MB files to repo upload/download tests

## Test matrix (14 tests, all passing)

**Repository tests** (`hf upload` / `hf download`)

- Upload single file, upload folder
- Download individual files + SHA-256 verify
- Download entire repo + SHA-256 verify
- Overwrite file and verify new content served
- Delete file and confirm absent

**Bucket tests** (`hf buckets`)

- `cp` upload / download + verify
- `sync` upload / download + verify
- Recursive list confirms expected paths
- Overwrite via `cp` + verify
- `sync --delete` removes extraneous remote files
- `rm` + confirm absent from listing

## Test plan

- [x] Run `HF_TOKEN=... ./scripts/smoke_tests/run.sh` and confirm all 14 tests pass
- [x] Run `./scripts/smoke_tests/run.sh --skip-buckets` for repo-only path
- [x] Run with `--hf-xet-version <version>` to confirm PyPI cache bypass works

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-04 13:30:29 +08:00 · 2026-03-17 19:07:05 -07:00
parent 69c714c01d
commit c0f7980616
9 changed files with 740 additions and 164 deletions
--- a/README.md
+++ b/README.md
@@ -43,180 +43,34 @@ Please join us in making xet-core better. We value everyone's contributions. Cod

 ## Issues, Diagnostics & Debugging

-If you encounter an issue when using `hf-xet` please help us fix the issue by collecting diagnostic information and attaching that when creating a [new Issue](https://github.com/huggingface/xet-core/issues/new/choose). Download the [hf-xet-diag-linux.sh](hf-xet-diag-linux.sh), [hf-xet-diag-macos.sh](hf-xet-diag-macos.sh), or [hf-xet-diag-windows.sh](hf-xet-diag-windows.sh) script based on your operating system and then re-run the python command that resulted in the issue. The diagnostic scripts will download and install debug symbols, setup up logging, and take periodic stack traces throughout process execution in a diagnostics directory that is easy to analyze, package, and upload.
+If you encounter an issue with `hf-xet`, please collect diagnostic information
+and attach it when creating a [new Issue](https://github.com/huggingface/xet-core/issues/new/choose).

-### Diagnostics - Linux (`hf-xet-diag-linux.sh`)
+The [`scripts/diag/`](scripts/diag/) directory contains platform-specific scripts
+that download debug symbols, configure logging, and capture periodic stack traces
+and core dumps:

-* Uses `gdb` + `gcore` to periodically snapshot stacks and produce core dumps.
-* Supports optional ptrace preload helper for debugging.
-* Downloads and installs the appropriate `hf_xet-*.dbg` symbol file automatically.
-
-**Requirements:**
+| OS | Script |
+|----|--------|
+| Linux | [`scripts/diag/hf-xet-diag-linux.sh`](scripts/diag/hf-xet-diag-linux.sh) |
+| macOS | [`scripts/diag/hf-xet-diag-macos.sh`](scripts/diag/hf-xet-diag-macos.sh) |
+| Windows (Git-Bash) | [`scripts/diag/hf-xet-diag-windows.sh`](scripts/diag/hf-xet-diag-windows.sh) |

 ```bash
-sudo apt-get install gdb build-essential
+# prefix your failing command with the script for your OS, e.g.:
+./scripts/diag/hf-xet-diag-macos.sh -- python my-script.py
 ```

-**Example usage:**
+See [**scripts/diag/README.md**](scripts/diag/README.md) for full usage, output layout, dump analysis instructions, and how to install debug symbols manually.
+
+Quick debugging environment variables:

 ```bash
-./hf-xet-diag-linux.sh -- python hf-download.py "Qwen/Qwen2.5-VL-3B-Instruct"
+RUST_BACKTRACE=full          # full Rust backtraces on panic
+RUST_LOG=info                # enable hf-xet logging
+HF_XET_LOG_FILE=/tmp/xet.log # write logs to a file (defaults to stdout)
 ```

-### Windows (Git-Bash) (`hf-xet-diag-windows.sh`)
-
-* Runs in **Git-Bash**, keeping usage consistent with Linux.
-* Uses **Sysinternals ProcDump** for periodic mini dumps (`-mp`).
-* Auto-downloads `procdump.exe` if not found.
-* Downloads and installs the matching `hf_xet.pdb` debug symbol into the package directory.
-
-**Requirements:**
-
-* Git-Bash (from [Git for Windows](https://gitforwindows.org/))
-* Python installed
-* Internet access (first run downloads ProcDump and debug symbols)
-
-**Example usage:**
-
-```bash
-./hf-xet-diag-windows.sh -- python hf-download.py "Qwen/Qwen2.5-VL-3B-Instruct"
-```
-
-### Diagnostics - MacOS (`hf-xet-diag-macos.sh`)
-
-* Uses `sample` + `lldb` to periodically snapshot stacks and produce core dumps.
-* Downloads and installs the appropriate `hf_xet-*.dbg` symbol file automatically.
-
-**Requirements:**
-
-```bash
-sudo xcode-select --install
-```
-
-**Example usage:**
-
-```bash
-./hf-xet-diag-macos.sh -- python hf-download.py "Qwen/Qwen2.5-VL-3B-Instruct"
-```
-
---
-
-### Output Layout
-
-The diagnostic scripts produce a diagnostics directory named:
-
-```
-diag_<command>_<timestamp>/
-  ├── console.log   # Combined stdout/stderr of the process
-  ├── env.log       # System/environment info
-  ├── pid           # Child PID file
-  ├── stacks/       # Periodic stack traces / dumps
-  └── dumps/        # (Linux only) full gcore dumps
-```
-
-This unified layout makes it easier to compare diagnostics across platforms.
-
---
-
-### Analyzing Dumps
-
-Use the [hf-xet-diag-analyze-latest.sh](hf-xet-diag-analyze-latest.sh) script to automatically find and open the most recent dump in the appropriate debugger for your platform.
-
-**Usage:**
-
-```bash
-./hf-xet-diag-analyze-latest.sh
-```
-
-* Auto-detects your OS (Linux, macOS, or Windows)
-* Finds the most recent `diag_*` directory
-* Opens the latest dump in the platform-appropriate debugger:
-  * **Linux:** `gdb` with core dumps from `dumps/`
-  * **macOS:** `lldb` with `.core` files from `dumps/`
-  * **Windows (Git-Bash):** `windbg` with `.dmp` files from `stacks/`
-
-You can also specify a diagnostics directory:
-
-```bash
-./hf-xet-diag-analyze-latest.sh diag_python_hfxet_test_20250127120000
-```
-
-**Manual Analysis**
-
-If you prefer to analyze dumps manually:
-
-**Linux**
-* Stack traces: `stacks/*.txt` (plain text, captured periodically)
-* Core dumps: `dumps/core_*`
-* Analysis:
-  ```bash
-  gdb python dumps/core_<timestamp>.<pid>
-  (gdb) bt                    # backtrace of current thread
-  (gdb) thread apply all bt   # backtrace of all threads
-  (gdb) info threads          # list all threads
-  ```
-* Ensure debug symbols (`hf_xet-*.so.dbg`) are in the `hf_xet` package directory
-
-**macOS**
-* Stack traces: `stacks/*.txt` (from `sample` command)
-* Core dumps: `dumps/dump_<pid>_<timestamp>.core`
-* Analysis:
-  ```bash
-  lldb -c dumps/dump_<pid>_<timestamp>.core python3
-  (lldb) bt                    # backtrace of current thread
-  (lldb) thread backtrace all  # backtrace of all threads
-  (lldb) thread list           # list all threads
-  ```
-* Ensure debug symbols (`hf_xet-*.dylib.dSYM`) are in the `hf_xet` package directory
-
-**Windows**
-* Dumps: `stacks/dump_<timestamp>.dmp`
-* Install [WinDbg via Windows SDK](https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/)
-* Analysis:
-  ```cmd
-  windbg -z stacks\dump_<timestamp>.dmp
-  ```
-* Common WinDbg commands:
-  ```
-  !analyze -v     # automatic analysis
-  ~* kb           # backtrace of all threads
-  ~               # list all threads
-  lm              # list loaded modules (verify hf_xet.pdb loaded)
-  ```
-* Ensure debug symbols (`hf_xet.pdb`) are in the `hf_xet` package directory
-
---
-
-⚠️ **Tip:** Share the full `diag_<command>_<timestamp>/` directory when reporting issues — it contains logs, environment info, and dumps needed to reproduce and diagnose problems.
-
-
-### Debugging
-
-To limit the size our our built binaries, we are releasing python wheels with binaries that are stripped of debugging symbols. If you encounter a panic while running hf-xet, you can use the debug symbols to help identify the part of the library that failed. 
-
-Here are the recommended steps:
-
-1. Download and unzip our [debug symbols package](https://github.com/huggingface/xet-core/releases/download/latest/dbg-symbols.zip).
-2. Determine the location of the hf-xet package using `pip show hf-xet`. The `Location` field will show the location of all the site packages. The `hf_xet` package will be within that directory.
-3. Determine the symbols to copy based on the system you are running:
-   * Windows: use `hf_xet.pdb`
-   * Mac: use `libhf_xet-macosx-x86_64.dylib.dSYM` for Intel based Macs and `libhf_xet-macosx-aarch64.dylib.dSYM` for Apple Silicon.
-   * Linux: the choice will depend on the architecture and wheel distribution used. To get this information, `cat` the `WHEEL` file name within the `hf_xet.dist-info` directory in your site packages. The wheel file will have the linux build and architecture in the file name. Eg: `cat /home/ubuntu/.venv/lib/python3.12/site-packages/hf_xet-*.dist-info/WHEEL`. You will use the file named `hf_xet-<manylinux | musllinux>-<x86_64 | arm64>.abi3.so.dbg` choosing the distribution and platform that matches your wheel. Eg: `hf_xet-manylinux-x86_64.abi3.so.dbg`.
-4. Copy the symbols to the site package path from step 2 above + `hf_xet`. Eg: `cp -r hf_xet-1.1.2-manylinux-x86_64.abi3.so.dbg /home/ubuntu/.venv/lib/python3.12/site-packages/hf_xet`
-5. Run your python binary with `RUST_BACKTRACE=full` and recreate your failure.
-
-#### Debugging Environment Variables
-
-To enable logging and see more debugging / diagnostics information, set the following:
-
-```
-RUST_BACKTRACE=full
-RUST_LOG=info
-HF_XET_LOG_FILE=/tmp/xet.log
-```
-
-Note: HF_XET_LOG_FILE expects a full writable path. If one isn't found it will use stdout console for logging.
-
 ## Local Development

 ### Repo Organization - Rust Crates
--- a/scripts/diag/README.md
+++ b/scripts/diag/README.md
@@ -0,0 +1,193 @@
+# hf-xet Diagnostic Scripts
+
+Scripts for collecting diagnostics when `hf-xet` hangs, crashes, or behaves
+unexpectedly. They download debug symbols, configure logging, and periodically
+capture stack traces / core dumps into a self-contained directory that is easy
+to zip and attach to a [GitHub issue](https://github.com/huggingface/xet-core/issues/new/choose).
+
+## Quick start
+
+Pick the script for your OS and prefix your failing command with it:
+
+| OS | Script |
+|----|--------|
+| Linux | `scripts/diag/hf-xet-diag-linux.sh` |
+| macOS | `scripts/diag/hf-xet-diag-macos.sh` |
+| Windows (Git-Bash) | `scripts/diag/hf-xet-diag-windows.sh` |
+
+```bash
+# Linux
+./scripts/diag/hf-xet-diag-linux.sh -- python my-script.py
+
+# macOS
+./scripts/diag/hf-xet-diag-macos.sh -- python my-script.py
+
+# Windows (Git-Bash)
+./scripts/diag/hf-xet-diag-windows.sh -- python my-script.py
+```
+
+## Per-platform details
+
+### Linux (`hf-xet-diag-linux.sh`)
+
+* Uses `gdb` + `gcore` to periodically snapshot stacks and produce core dumps.
+* Supports optional ptrace preload helper for debugging.
+* Downloads and installs the appropriate `hf_xet-*.dbg` symbol file automatically.
+
+**Requirements:**
+
+```bash
+sudo apt-get install gdb build-essential
+```
+
+**Example:**
+
+```bash
+./scripts/diag/hf-xet-diag-linux.sh -- python hf-download.py "Qwen/Qwen2.5-VL-3B-Instruct"
+```
+
+### macOS (`hf-xet-diag-macos.sh`)
+
+* Uses `sample` + `lldb` to periodically snapshot stacks and produce core dumps.
+* Downloads and installs the appropriate `hf_xet-*.dbg` symbol file automatically.
+
+**Requirements:**
+
+```bash
+sudo xcode-select --install
+```
+
+**Example:**
+
+```bash
+./scripts/diag/hf-xet-diag-macos.sh -- python hf-download.py "Qwen/Qwen2.5-VL-3B-Instruct"
+```
+
+### Windows / Git-Bash (`hf-xet-diag-windows.sh`)
+
+* Runs in **Git-Bash**, keeping usage consistent with Linux/macOS.
+* Uses **Sysinternals ProcDump** for periodic mini dumps (`-mp`).
+* Auto-downloads `procdump.exe` if not found.
+* Downloads and installs the matching `hf_xet.pdb` debug symbol into the package directory.
+
+**Requirements:**
+
+* Git-Bash (from [Git for Windows](https://gitforwindows.org/))
+* Python installed
+* Internet access (first run downloads ProcDump and debug symbols)
+
+**Example:**
+
+```bash
+./scripts/diag/hf-xet-diag-windows.sh -- python hf-download.py "Qwen/Qwen2.5-VL-3B-Instruct"
+```
+
+## Output layout
+
+Each run produces a directory named `diag_<command>_<timestamp>/`:
+
+```
+diag_<command>_<timestamp>/
+  ├── console.log   # Combined stdout/stderr of the process
+  ├── env.log       # System/environment info
+  ├── pid           # Child PID file
+  ├── stacks/       # Periodic stack traces / mini dumps
+  └── dumps/        # (Linux/macOS) full core dumps
+```
+
+> **Tip:** Zip and attach the entire `diag_<command>_<timestamp>/` directory
+> when filing an issue — it contains everything needed to reproduce and diagnose
+> the problem.
+
+## Analyzing dumps
+
+Use `hf-xet-diag-analyze-latest.sh` to automatically open the most recent dump
+in the appropriate debugger:
+
+```bash
+# Auto-detect latest diag_* directory
+./scripts/diag/hf-xet-diag-analyze-latest.sh
+
+# Or specify a directory explicitly
+./scripts/diag/hf-xet-diag-analyze-latest.sh diag_python_hfxet_test_20250127120000
+```
+
+The script:
+* Auto-detects your OS (Linux, macOS, or Windows)
+* Finds the most recent `diag_*` directory
+* Opens the latest dump in the platform-appropriate debugger:
+  * **Linux:** `gdb` with core dumps from `dumps/`
+  * **macOS:** `lldb` with `.core` files from `dumps/`
+  * **Windows (Git-Bash):** `windbg` with `.dmp` files from `stacks/`
+
+### Manual analysis
+
+**Linux**
+
+```bash
+gdb python dumps/core_<timestamp>.<pid>
+(gdb) bt                    # backtrace of current thread
+(gdb) thread apply all bt   # backtrace of all threads
+(gdb) info threads          # list all threads
+```
+
+Debug symbols: `hf_xet-*.so.dbg` must be in the `hf_xet` package directory.
+
+**macOS**
+
+```bash
+lldb -c dumps/dump_<pid>_<timestamp>.core python3
+(lldb) bt                    # backtrace of current thread
+(lldb) thread backtrace all  # backtrace of all threads
+(lldb) thread list           # list all threads
+```
+
+Debug symbols: `hf_xet-*.dylib.dSYM` must be in the `hf_xet` package directory.
+
+**Windows**
+
+```cmd
+windbg -z stacks\dump_<timestamp>.dmp
+```
+
+Useful WinDbg commands:
+
+```
+!analyze -v     # automatic analysis
+~* kb           # backtrace of all threads
+~               # list all threads
+lm              # list loaded modules (verify hf_xet.pdb loaded)
+```
+
+Debug symbols: `hf_xet.pdb` must be in the `hf_xet` package directory.
+
+## Installing debug symbols manually
+
+The diagnostic scripts install symbols automatically, but you can also do it
+manually:
+
+1. Download and unzip the [debug symbols package](https://github.com/huggingface/xet-core/releases/download/latest/dbg-symbols.zip).
+2. Find where `hf_xet` is installed with `pip show hf-xet`: the package directory is `hf_xet` inside the path shown in the `Location` field.
+3. Choose the right symbol file for your platform:
+   * **Windows:** `hf_xet.pdb`
+   * **macOS (Apple Silicon):** `libhf_xet-macosx-aarch64.dylib.dSYM`
+   * **macOS (Intel):** `libhf_xet-macosx-x86_64.dylib.dSYM`
+   * **Linux:** match your wheel distribution and arch — check with:
+     ```bash
+     cat /path/to/site-packages/hf_xet-*.dist-info/WHEEL
+     ```
+     then use `hf_xet-<manylinux|musllinux>-<x86_64|arm64>.abi3.so.dbg`.
+4. Copy the symbol file into the `hf_xet` package directory:
+   ```bash
+   cp -r hf_xet-1.1.2-manylinux-x86_64.abi3.so.dbg \
+       /path/to/site-packages/hf_xet/
+   ```
+5. Re-run with `RUST_BACKTRACE=full` to get a full backtrace.
+
+## Useful environment variables
+
+```bash
+RUST_BACKTRACE=full          # full Rust backtraces on panic
+RUST_LOG=info                # enable hf-xet logging
+HF_XET_LOG_FILE=/tmp/xet.log # write logs to a file (defaults to stdout)
+```
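
Steps 2 and 3 of "Installing debug symbols manually" in `scripts/diag/README.md` above can be scripted on Linux. The sketch below is a minimal, hypothetical helper: the names `linux_symbol_name` and `installed_symbol_target` are not part of the repo. It assumes the symbol files follow the `hf_xet-<manylinux|musllinux>-<x86_64|arm64>.abi3.so.dbg` pattern given in that README. It also assumes that a wheel's `aarch64` platform tag corresponds to `arm64` in that pattern. Neither assumption has been checked against the release archive.

```python
from importlib.metadata import distribution
from pathlib import Path

# Assumption: symbol files are named hf_xet-<libc>-<arch>.abi3.so.dbg, and the
# wheel platform tag "aarch64" maps to "arm64" in that name (per the README pattern).
ARCH_ALIASES = {"x86_64": "x86_64", "aarch64": "arm64"}


def linux_symbol_name(wheel_metadata: str) -> str:
    """Pick the debug-symbol file name from the text of a WHEEL metadata file."""
    for line in wheel_metadata.splitlines():
        if not line.startswith("Tag:"):
            continue
        # "Tag: cp37-abi3-manylinux_2_17_x86_64" -> "manylinux_2_17_x86_64"
        platform = line.split(":", 1)[1].strip().split("-")[-1]
        for libc in ("manylinux", "musllinux"):
            if not platform.startswith(libc):
                continue
            for wheel_arch, symbol_arch in ARCH_ALIASES.items():
                if platform.endswith(wheel_arch):
                    return f"hf_xet-{libc}-{symbol_arch}.abi3.so.dbg"
    raise ValueError("no manylinux/musllinux x86_64/aarch64 tag found in WHEEL")


def installed_symbol_target() -> tuple[Path, str]:
    """Return (hf_xet package directory, symbol file name) for this environment."""
    dist = distribution("hf_xet")
    wheel_text = dist.read_text("WHEEL") or ""        # same file as `cat .../WHEEL`
    package_dir = Path(dist.locate_file("hf_xet"))    # <Location>/hf_xet
    return package_dir, linux_symbol_name(wheel_text)
```

`installed_symbol_target()` returns the directory to copy into and the file name to look for in the unzipped symbols package. The step-4 example in that README uses a versioned name (`hf_xet-1.1.2-manylinux-x86_64.abi3.so.dbg`), so the file in the archive may carry a version as well.
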
--- a/scripts/diag/hf-xet-diag-analyze-latest.sh
+++ b/scripts/diag/hf-xet-diag-analyze-latest.sh
--- a/scripts/diag/hf-xet-diag-linux.sh
+++ b/scripts/diag/hf-xet-diag-linux.sh
--- a/scripts/diag/hf-xet-diag-macos.sh
+++ b/scripts/diag/hf-xet-diag-macos.sh
--- a/scripts/diag/hf-xet-diag-windows.sh
+++ b/scripts/diag/hf-xet-diag-windows.sh
--- a/scripts/smoke_tests/README.md
+++ b/scripts/smoke_tests/README.md
@@ -0,0 +1,63 @@
+# hf-xet Smoke Tests
+
+End-to-end tests that exercise the full hf-xet upload/download path against the
+real Hugging Face Hub. They use the `hf` CLI for all Hub operations and verify
+content integrity with SHA-256 checksums.
+
+## Prerequisites
+
+- [`uv`](https://docs.astral.sh/uv/)
+- [`hf` CLI](https://huggingface.co/docs/huggingface_hub/en/guides/cli) — `uv tool install huggingface_hub`
+- `HF_TOKEN` environment variable holding a token with write access
+
+## Usage
+
+```bash
+# Test latest hf_xet from PyPI
+./scripts/smoke_tests/run.sh
+
+# Test a specific version (bypasses uv cache, fetches directly from PyPI)
+./scripts/smoke_tests/run.sh --hf-xet-version 1.4.0
+
+# Test a local wheel
+HF_XET_WHEEL=./dist/hf_xet-1.4.0.whl ./scripts/smoke_tests/run.sh
+
+# Skip storage bucket tests
+./scripts/smoke_tests/run.sh --skip-buckets
+
+# Keep the test repo/bucket after the run (useful for debugging)
+./scripts/smoke_tests/run.sh --keep-repo
+```
+
+## What's tested
+
+### Repository tests (`hf upload` / `hf download`)
+
+Uploads test files of varying sizes (1 KB → 100 MB) to a temporary private
+model repo, then downloads and verifies every file's SHA-256 hash.
+
+| Test | Description |
+|------|-------------|
+| Upload single file | `hf upload` of a single file |
+| Upload folder | `hf upload` of an entire directory tree |
+| Download individual files | Per-file `hf download` + hash check |
+| Download all files | Full-repo `hf download` + hash check |
+| Overwrite and re-download | Confirms updated content is served after overwrite |
+| Delete file | `hf repos delete-files` + confirms file is absent |
+
+### Bucket tests (`hf buckets`)
+
+Creates a temporary private bucket and exercises the full bucket lifecycle.
+
+| Test | Description |
+|------|-------------|
+| cp upload | `hf buckets cp` single file upload |
+| sync upload | `hf buckets sync` directory upload |
+| list | Recursive listing confirms all expected paths |
+| cp download | `hf buckets cp` download + hash check |
+| sync download | `hf buckets sync` directory download + hash check |
+| Overwrite | `hf buckets cp` overwrite + re-download confirms new content |
+| sync --delete | Extraneous remote files removed when absent locally |
+| rm | `hf buckets rm` + confirms file absent from listing |
+
+All temporary repos and buckets are deleted after the run unless `--keep-repo` is set.
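
The `sync --delete` row in the bucket table above checks a one-way sync rule that can be stated as a small model. After a local-to-remote sync, new or changed local files are uploaded. Only with `--delete` are remote paths that have no local counterpart removed. This is a hypothetical sketch for reasoning about the test. It compares SHA-256 digests and is not how `hf buckets sync` actually decides what to transfer. `file_hashes` and `plan_sync` are illustrative names.

```python
import hashlib
from pathlib import Path


def file_hashes(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style path relative to root to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def plan_sync(local: dict[str, str], remote: dict[str, str],
              delete: bool) -> tuple[list[str], list[str]]:
    """Return (paths to upload, paths to delete) for a local -> remote sync."""
    # Upload anything missing remotely or whose content differs.
    uploads = sorted(path for path, digest in local.items() if remote.get(path) != digest)
    # With delete=True, remote-only paths are extraneous and get removed.
    deletions = sorted(set(remote) - set(local)) if delete else []
    return uploads, deletions
```

Without `delete`, a stale remote file survives the sync, which is why the test needs the flag to see it disappear from the listing.
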
--- a/scripts/smoke_tests/run.sh
+++ b/scripts/smoke_tests/run.sh
@@ -0,0 +1,56 @@
+#!/bin/bash
+set -euo pipefail
+
+# Smoke test runner for hf-xet upload/download via the hf CLI.
+#
+# Prerequisites:
+#   - uv        (https://docs.astral.sh/uv/)
+#   - hf CLI    (pip install huggingface_hub, or uv tool install huggingface_hub)
+#   - HF_TOKEN  env var with write access
+#
+# Usage:
+#   ./scripts/smoke_tests/run.sh                              # latest hf_xet from PyPI
+#   ./scripts/smoke_tests/run.sh --hf-xet-version 1.4.0      # specific version (bypasses uv cache)
+#   ./scripts/smoke_tests/run.sh --skip-buckets               # skip storage bucket tests
+#   ./scripts/smoke_tests/run.sh --keep-repo                  # leave test repo/bucket after run
+#   HF_XET_WHEEL=./dist/hf_xet-1.4.0.whl ./scripts/smoke_tests/run.sh  # local wheel
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+
+if [ -z "${HF_TOKEN:-}" ]; then
+    echo "ERROR: HF_TOKEN environment variable is required" >&2
+    echo "  export HF_TOKEN=hf_..." >&2
+    exit 1
+fi
+
+if ! command -v uv &> /dev/null; then
+    echo "ERROR: uv is required. Install: curl -LsSf https://astral.sh/uv/install.sh | sh" >&2
+    exit 1
+fi
+
+if ! command -v hf &> /dev/null; then
+    echo "ERROR: hf CLI is required. Install: uv tool install huggingface_hub" >&2
+    exit 1
+fi
+
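+# Parse --hf-xet-version from args (accepts "--hf-xet-version X" and "--hf-xet-version=X")
+HF_XET_VERSION=""
+for arg in "$@"; do
+    if [[ "${prev_arg:-}" == "--hf-xet-version" ]]; then HF_XET_VERSION="$arg"
+    elif [[ "$arg" == --hf-xet-version=* ]]; then HF_XET_VERSION="${arg#--hf-xet-version=}"
+    fi
+    prev_arg="$arg"
+done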
+
+echo "Running hf-xet smoke tests..."
+echo ""
+
+if [ -n "${HF_XET_WHEEL:-}" ]; then
+    echo "Using local wheel: ${HF_XET_WHEEL}"
+    uv run --with "${HF_XET_WHEEL}" "${SCRIPT_DIR}/test_upload_download.py" "$@"
+elif [ -n "${HF_XET_VERSION}" ]; then
+    echo "Using hf_xet version: ${HF_XET_VERSION} (fetching from PyPI)"
+    uv run --with "hf_xet==${HF_XET_VERSION}" --refresh-package hf_xet "${SCRIPT_DIR}/test_upload_download.py" "$@"
+else
+    uv run "${SCRIPT_DIR}/test_upload_download.py" "$@"
+fi
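
The source-selection order in `run.sh` above can be restated in Python for reference. A local `HF_XET_WHEEL` wins over `--hf-xet-version`, which in turn wins over the default. `uv_command` is a hypothetical helper that only builds the argument list the script passes to `uv run`; it does not run anything.

```python
from pathlib import Path
from typing import Optional


def uv_command(script: Path, args: list[str],
               wheel: Optional[str] = None,
               version: Optional[str] = None) -> list[str]:
    """Build the `uv run` invocation that run.sh would use."""
    if wheel:
        # HF_XET_WHEEL set: install hf_xet from the local wheel file.
        return ["uv", "run", "--with", wheel, str(script), *args]
    if version:
        # --hf-xet-version given: pin it and refresh uv's cached hf_xet metadata.
        return ["uv", "run", "--with", f"hf_xet=={version}",
                "--refresh-package", "hf_xet", str(script), *args]
    # Neither: use the unpinned hf_xet from the script's inline metadata.
    return ["uv", "run", str(script), *args]
```

When both are set, the wheel is installed, and because the original arguments are passed through, `test_upload_download.py` prints its version-mismatch warning if the wheel's version differs from `--hf-xet-version`.
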
--- a/scripts/smoke_tests/test_upload_download.py
+++ b/scripts/smoke_tests/test_upload_download.py
@@ -0,0 +1,410 @@
+"""
+Smoke test for hf-xet using the `hf` CLI for upload/download through both
+HuggingFace model repositories and storage buckets.
+
+Creates temporary resources, exercises upload/download paths, verifies content
+integrity, then cleans up. Requires HF_TOKEN with write access.
+
+Usage:
+    uv run scripts/smoke_tests/test_upload_download.py
+    uv run scripts/smoke_tests/test_upload_download.py --hf-xet-version 1.4.0
+    uv run scripts/smoke_tests/test_upload_download.py --keep-repo
+    uv run scripts/smoke_tests/test_upload_download.py --skip-buckets
+"""
+
+# /// script
+# requires-python = ">=3.10"
+# dependencies = [
+#     "huggingface_hub>=1.0.0",
+#     "hf_xet",
+# ]
+# ///
+
+import argparse
+import hashlib
+import os
+import secrets
+import shutil
+import subprocess
+import sys
+import tempfile
+import time
+from pathlib import Path
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def run(cmd: list[str], check: bool = True) -> str:
+    """Run a CLI command, return stdout. Raises RuntimeError on failure."""
+    result = subprocess.run(cmd, capture_output=True, text=True)
+    if check and result.returncode != 0:
+        raise RuntimeError(
+            f"Command failed: {' '.join(cmd)}\n"
+            f"stdout: {result.stdout.strip()}\n"
+            f"stderr: {result.stderr.strip()}"
+        )
+    return result.stdout.strip()
+
+
+def sha256_bytes(data: bytes) -> str:
+    return hashlib.sha256(data).hexdigest()
+
+
+def sha256_file(path: str | Path) -> str:
+    h = hashlib.sha256()
+    with open(path, "rb") as f:
+        for chunk in iter(lambda: f.read(65536), b""):
+            h.update(chunk)
+    return h.hexdigest()
+
+
+def generate_file(path: str | Path, size_bytes: int) -> str:
+    """Write random bytes to path; return sha256 hex."""
+    data = secrets.token_bytes(size_bytes)
+    Path(path).parent.mkdir(parents=True, exist_ok=True)
+    with open(path, "wb") as f:
+        f.write(data)
+    return sha256_bytes(data)
+
+
+# ---------------------------------------------------------------------------
+# Test runner
+# ---------------------------------------------------------------------------
+
+class Results:
+    def __init__(self):
+        self.passed = 0
+        self.failed = 0
+        self.errors: list[tuple[str, str]] = []
+
+    def run(self, name: str, fn):
+        print(f"\n{'='*60}")
+        print(f"TEST: {name}")
+        print(f"{'='*60}")
+        try:
+            fn()
+            self.passed += 1
+            print(f"PASSED: {name}")
+        except Exception as e:
+            self.failed += 1
+            self.errors.append((name, str(e)))
+            print(f"FAILED: {name}: {e}", file=sys.stderr)
+
+    def summary(self):
+        print(f"\n{'='*60}")
+        print(f"RESULTS: {self.passed} passed, {self.failed} failed")
+        print(f"{'='*60}")
+        if self.errors:
+            for name, err in self.errors:
+                print(f"  FAIL: {name}: {err}")
+            sys.exit(1)
+        else:
+            print("All smoke tests passed!")
+
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+def main():
+    parser = argparse.ArgumentParser(description="Smoke test hf-xet via hf CLI")
+    parser.add_argument("--hf-xet-version", help="Expected hf_xet version (display/warn only)")
+    parser.add_argument("--keep-repo", action="store_true", help="Skip cleanup of test repo/bucket")
+    parser.add_argument("--repo-prefix", default="smoke-test-xet", help="Prefix for temp resource names")
+    parser.add_argument("--skip-buckets", action="store_true", help="Skip storage bucket tests")
+    args = parser.parse_args()
+
+    # --- preflight checks ---
+    token = os.environ.get("HF_TOKEN")
+    if not token:
+        print("ERROR: HF_TOKEN environment variable is required", file=sys.stderr)
+        sys.exit(1)
+
+    if not shutil.which("hf"):
+        print("ERROR: hf CLI not found. Install: pip install huggingface_hub", file=sys.stderr)
+        sys.exit(1)
+
+    # --- print environment ---
+    import huggingface_hub
+    print(f"huggingface_hub version: {huggingface_hub.__version__}")
+    try:
+        from importlib.metadata import version as pkg_version
+        installed_xet = pkg_version("hf_xet")
+        print(f"hf_xet version: {installed_xet}")
+        if args.hf_xet_version and installed_xet != args.hf_xet_version:
+            print(f"WARNING: hf_xet version mismatch: got {installed_xet}, expected {args.hf_xet_version}")
+    except Exception:
+        print("hf_xet version: unknown")
+    print(f"hf CLI: {run(['hf', 'version'])}")
+
+    # --- resolve username ---
+    from huggingface_hub import HfApi
+    user = HfApi(token=token).whoami()["name"]
+
+    suffix = secrets.token_hex(4)
+    repo_id = f"{user}/{args.repo_prefix}-{suffix}"
+    bucket_id = f"{user}/{args.repo_prefix}-bucket-{suffix}"
+
+    print(f"\nTest repo:   {repo_id}")
+    if not args.skip_buckets:
+        print(f"Test bucket: {bucket_id}")
+
+    results = Results()
+
+    # ===================================================================== #
+    # Repository tests  (hf upload / hf download)
+    # ===================================================================== #
+
+    repo_test_files = {
+        "small.bin":         1024,               # 1 KB   — below chunk size
+        "medium.bin":        256 * 1024,         # 256 KB — a few chunks
+        "large.bin":         5 * 1024 * 1024,    # 5 MB   — multiple chunks
+        "xlarge.bin":        50 * 1024 * 1024,   # 50 MB  — large multi-xorb
+        "xxlarge.bin":       100 * 1024 * 1024,  # 100 MB — stress test
+        "subdir/nested.bin": 128 * 1024,         # 128 KB — subdirectory
+    }
+
+    repo_created = False
+    try:
+        print(f"\nCreating repo {repo_id}...")
+        run(["hf", "repos", "create", repo_id, "--private"])
+        repo_created = True
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            upload_dir = Path(tmpdir) / "upload"
+            download_dir = Path(tmpdir) / "download"
+            upload_dir.mkdir()
+            download_dir.mkdir()
+
+            expected = {}
+            for rel, size in repo_test_files.items():
+                expected[rel] = generate_file(upload_dir / rel, size)
+                print(f"  Generated {rel} ({size:,} bytes)")
+
+            # -- 1. upload single file --
+            def test_repo_upload_single():
+                t = time.time()
+                run(["hf", "upload", repo_id,
+                     str(upload_dir / "small.bin"), "small.bin", "--quiet"])
+                print(f"  Uploaded small.bin in {time.time()-t:.1f}s")
+            results.run("Repo: upload single file", test_repo_upload_single)
+
+            # -- 2. upload folder --
+            def test_repo_upload_folder():
+                t = time.time()
+                run(["hf", "upload", repo_id, str(upload_dir), ".", "--quiet"])
+                print(f"  Uploaded folder in {time.time()-t:.1f}s")
+            results.run("Repo: upload folder", test_repo_upload_folder)
+
+            # -- 3. download individual files and verify --
+            def test_repo_download_single():
+                out = str(download_dir / "single")
+                for rel in repo_test_files:
+                    t = time.time()
+                    run(["hf", "download", repo_id, rel, "--local-dir", out, "--quiet"])
+                    actual = sha256_file(Path(out) / rel)
+                    assert actual == expected[rel], (
+                        f"Hash mismatch for {rel}: "
+                        f"expected {expected[rel][:16]}..., got {actual[:16]}..."
+                    )
+                    print(f"  Downloaded+verified {rel} in {time.time()-t:.1f}s")
+            results.run("Repo: download and verify individual files", test_repo_download_single)
+
+            # -- 4. download entire repo and verify --
+            def test_repo_download_all():
+                out = str(download_dir / "all")
+                t = time.time()
+                run(["hf", "download", repo_id, "--local-dir", out, "--quiet"])
+                print(f"  Downloaded all files in {time.time()-t:.1f}s")
+                for rel in repo_test_files:
+                    p = Path(out) / rel
+                    assert p.exists(), f"Missing file: {rel}"
+                    actual = sha256_file(p)
+                    assert actual == expected[rel], (
+                        f"Hash mismatch for {rel}: "
+                        f"expected {expected[rel][:16]}..., got {actual[:16]}..."
+                    )
+                    print(f"  Verified {rel}")
+            results.run("Repo: download all files and verify", test_repo_download_all)
+
+            # -- 5. overwrite a file and verify new content --
+            def test_repo_overwrite():
+                new_hash = generate_file(upload_dir / "small.bin", 2048)
+                run(["hf", "upload", repo_id,
+                     str(upload_dir / "small.bin"), "small.bin", "--quiet"])
+                out = str(download_dir / "overwrite")
+                run(["hf", "download", repo_id, "small.bin",
+                     "--local-dir", out, "--force-download", "--quiet"])
+                actual = sha256_file(Path(out) / "small.bin")
+                assert actual == new_hash, (
+                    f"Overwrite mismatch: expected {new_hash[:16]}..., got {actual[:16]}..."
+                )
+                print("  Overwrite verified: new content downloaded correctly")
+            results.run("Repo: upload overwrite and verify", test_repo_overwrite)
+
+            # -- 6. delete files from repo --
+            def test_repo_delete_files():
+                run(["hf", "repos", "delete-files", repo_id, "small.bin"])
+                # Re-download all; small.bin should be absent
+                out = str(download_dir / "post-delete")
+                run(["hf", "download", repo_id, "--local-dir", out, "--quiet"])
+                assert not (Path(out) / "small.bin").exists(), \
+                    "small.bin still present after deletion"
+                print("  small.bin confirmed absent after delete")
+            results.run("Repo: delete file from repo", test_repo_delete_files)
+
+    finally:
+        if repo_created and not args.keep_repo:
+            print(f"\nCleaning up repo {repo_id}...")
+            try:
+                run(["hf", "repos", "delete", repo_id])
+                print("  Deleted.")
+            except Exception as e:
+                print(f"  Warning: failed to delete repo: {e}", file=sys.stderr)
+
+    # ===================================================================== #
+    # Storage bucket tests  (hf buckets)
+    # ===================================================================== #
+
+    if args.skip_buckets:
+        results.summary()
+        return
+
+    # Check that the `hf buckets` subcommand exists in this hf CLI version
+    bucket_check = run(["hf", "buckets", "--help"], check=False)
+    if "buckets" not in bucket_check.lower() and "error" in bucket_check.lower():
+        print("\nWARNING: hf buckets not available in this hf CLI version — skipping bucket tests")
+        results.summary()
+        return
+
+    bucket_created = False
+    try:
+        print(f"\nCreating bucket {bucket_id}...")
+        run(["hf", "buckets", "create", bucket_id, "--private"])
+        bucket_created = True
+        handle = f"hf://buckets/{bucket_id}"
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            upload_dir = Path(tmpdir) / "upload"
+            download_dir = Path(tmpdir) / "download"
+            (upload_dir / "subdir").mkdir(parents=True)
+            download_dir.mkdir()
+
+            # Files used in bucket tests
+            single_hash = generate_file(upload_dir / "single.bin", 512 * 1024)
+            subdir1_hash = generate_file(upload_dir / "subdir/file1.bin", 256 * 1024)
+            subdir2_hash = generate_file(upload_dir / "subdir/file2.bin", 256 * 1024)
+            print("  Generated single.bin, subdir/file1.bin, subdir/file2.bin")
+
+            # -- 1. cp: upload single file --
+            def test_bucket_cp_upload():
+                t = time.time()
+                run(["hf", "buckets", "cp",
+                     str(upload_dir / "single.bin"), f"{handle}/single.bin"])
+                print(f"  Uploaded single.bin via cp in {time.time()-t:.1f}s")
+            results.run("Bucket: cp upload single file", test_bucket_cp_upload)
+
+            # -- 2. sync: upload directory --
+            def test_bucket_sync_upload():
+                t = time.time()
+                run(["hf", "buckets", "sync",
+                     str(upload_dir / "subdir"), f"{handle}/subdir"])
+                print(f"  Synced subdir/ up in {time.time()-t:.1f}s")
+            results.run("Bucket: sync upload directory", test_bucket_sync_upload)
+
+            # -- 3. list files (recursive quiet) --
+            def test_bucket_list():
+                out = run(["hf", "buckets", "list", bucket_id, "-R", "--quiet"])
+                listed = set(out.splitlines())
+                for path in ("single.bin", "subdir/file1.bin", "subdir/file2.bin"):
+                    assert path in listed, f"Expected {path!r} in listing, got: {listed}"
+                print(f"  Listed {len(listed)} file(s): {sorted(listed)}")
+            results.run("Bucket: list files (recursive)", test_bucket_list)
+
+            # -- 4. cp: download single file and verify --
+            def test_bucket_cp_download():
+                out_path = download_dir / "single.bin"
+                t = time.time()
+                run(["hf", "buckets", "cp", f"{handle}/single.bin", str(out_path)])
+                actual = sha256_file(out_path)
+                assert actual == single_hash, (
+                    f"Hash mismatch: expected {single_hash[:16]}..., got {actual[:16]}..."
+                )
+                print(f"  Downloaded+verified single.bin in {time.time()-t:.1f}s")
+            results.run("Bucket: cp download and verify", test_bucket_cp_download)
+
+            # -- 5. sync: download directory and verify --
+            def test_bucket_sync_download():
+                out_dir = download_dir / "subdir"
+                t = time.time()
+                run(["hf", "buckets", "sync", f"{handle}/subdir", str(out_dir)])
+                print(f"  Synced subdir/ down in {time.time()-t:.1f}s")
+                for fname, expected_hash in (
+                    ("file1.bin", subdir1_hash),
+                    ("file2.bin", subdir2_hash),
+                ):
+                    p = out_dir / fname
+                    assert p.exists(), f"Missing: {p}"
+                    actual = sha256_file(p)
+                    assert actual == expected_hash, (
+                        f"Hash mismatch for {fname}: "
+                        f"expected {expected_hash[:16]}..., got {actual[:16]}..."
+                    )
+                    print(f"  Verified subdir/{fname}")
+            results.run("Bucket: sync download and verify", test_bucket_sync_download)
+
+            # -- 6. overwrite via cp and verify new content --
+            def test_bucket_overwrite():
+                new_hash = generate_file(upload_dir / "single.bin", 1024 * 1024)
+                run(["hf", "buckets", "cp",
+                     str(upload_dir / "single.bin"), f"{handle}/single.bin"])
+                out_path = download_dir / "single_overwrite.bin"
+                run(["hf", "buckets", "cp", f"{handle}/single.bin", str(out_path)])
+                actual = sha256_file(out_path)
+                assert actual == new_hash, (
+                    f"Overwrite mismatch: expected {new_hash[:16]}..., got {actual[:16]}..."
+                )
+                print("  Overwrite verified: new content downloaded correctly")
+            results.run("Bucket: cp overwrite and verify", test_bucket_overwrite)
+
+            # -- 7. sync --delete: remove files absent from source --
+            def test_bucket_sync_delete():
+                # Remove file2.bin locally so subdir/ holds only file1.bin;
+                # sync --delete should then delete subdir/file2.bin from the bucket.
+                (upload_dir / "subdir" / "file2.bin").unlink()
+                run(["hf", "buckets", "sync",
+                     str(upload_dir / "subdir"), f"{handle}/subdir", "--delete"])
+                out = run(["hf", "buckets", "list", bucket_id, "-R", "--quiet"])
+                listed = set(out.splitlines())
+                assert "subdir/file2.bin" not in listed, \
+                    f"subdir/file2.bin still present after sync --delete: {listed}"
+                assert "subdir/file1.bin" in listed, \
+                    f"subdir/file1.bin missing after sync --delete: {listed}"
+                print(f"  sync --delete verified: remaining files: {sorted(listed)}")
+            results.run("Bucket: sync --delete removes extraneous files", test_bucket_sync_delete)
+
+            # -- 8. rm: delete a file and confirm it's gone --
+            def test_bucket_rm():
+                run(["hf", "buckets", "rm", f"{bucket_id}/single.bin", "--yes"])
+                out = run(["hf", "buckets", "list", bucket_id, "-R", "--quiet"])
+                listed = set(out.splitlines())
+                assert "single.bin" not in listed, \
+                    f"single.bin still present after rm: {listed}"
+                print(f"  rm verified: remaining files: {sorted(listed)}")
+            results.run("Bucket: rm file", test_bucket_rm)
+
+    finally:
+        # --keep-repo also preserves the test bucket
+        if bucket_created and not args.keep_repo:
+            print(f"\nCleaning up bucket {bucket_id}...")
+            try:
+                run(["hf", "buckets", "delete", bucket_id, "--yes"])
+                print("  Deleted.")
+            except Exception as e:
+                print(f"  Warning: failed to delete bucket: {e}", file=sys.stderr)
+
+    results.summary()
+
+
+if __name__ == "__main__":
+    main()
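The bucket-availability check in the script decides by scanning the text of `hf buckets --help`. Because "unknown command" errors usually echo the command name back, a test for the absence of `"buckets"` may never trigger. A sketch that relies on the exit status is shown below. It assumes the `hf` CLI exits non-zero for an unknown subcommand, as argparse- and Click-based CLIs typically do. `subcommand_available` is a hypothetical helper that is not part of this PR, and it calls `subprocess.run` directly instead of the script's own `run` wrapper.

```python
import subprocess
import sys


def subcommand_available(cmd: list[str]) -> bool:
    """Return True if `cmd` (e.g. ["hf", "buckets", "--help"]) exits with status 0.

    Hypothetical helper: uses the exit status instead of scanning help text,
    assuming the CLI returns non-zero for an unknown subcommand.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # The executable itself is not on PATH.
        return False
    return proc.returncode == 0
```

In the script, this would replace the text match with `if not subcommand_available(["hf", "buckets", "--help"]): ...` before the WARNING and early return.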
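`test_bucket_sync_delete` checks only that `subdir/file2.bin` is gone and `subdir/file1.bin` remains. A stricter variant could compare the whole recursive listing against a model of what `sync --delete` should leave behind. The sketch below is a hypothetical helper (`expected_after_sync_delete`) built on two assumptions. First, `hf buckets list <bucket> -R --quiet` prints paths relative to the bucket root, one per line, which is what the script's assertions imply. Second, `sync <dir> <handle>/<prefix> --delete` only adds or removes paths under that prefix.

```python
def expected_after_sync_delete(remote: set[str], local: set[str], prefix: str) -> set[str]:
    """Model the remote listing after `sync <local_dir> <handle>/<prefix> --delete`.

    Hypothetical helper. Assumed semantics: every local file ends up under
    `prefix`, remote files under `prefix` missing locally are deleted, and
    paths outside `prefix` are left untouched.
    """
    # Paths that live outside the synced prefix are unaffected.
    outside = {p for p in remote if not p.startswith(prefix + "/")}
    # After the sync, the prefix mirrors the local directory exactly.
    synced = {f"{prefix}/{name}" for name in local}
    return outside | synced


if __name__ == "__main__":
    before = {"single.bin", "subdir/file1.bin", "subdir/file2.bin"}
    print(sorted(expected_after_sync_delete(before, {"file1.bin"}, "subdir")))
    # prints ['single.bin', 'subdir/file1.bin']
```

Matching on `prefix + "/"` rather than the bare prefix keeps a sibling such as `subdir2/` from being treated as part of `subdir/`.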