Commit Graph

501 Commits

Author SHA1 Message Date
Hoyt Koepke
79df99ad01 Unify sync and async download/upload groups in session interface. (#719)
Currently, `UploadCommitSync` and `DownloadGroupSync` are thin wrappers
around `UploadCommit` and `DownloadGroup` that delegate every method
through `external_run_async_task`. This means two types, two sets of doc
comments, and two test suites covering the same underlying behavior.

This PR removes the separate sync types and adds `_blocking` suffixed
methods directly on `UploadCommit` and `DownloadGroup`. The session
factory methods `new_upload_commit_blocking()` and
`new_download_group_blocking()` now return the same types as their async
counterparts, and the entire `xet_session::sync` module is deleted (~680
lines removed).

This also fixes a minor bug: `UploadCommitSync::upload_from_path` did
not call `std::path::absolute()` on the file path before dispatching,
unlike the async version. The new `upload_from_path_blocking` includes
the `std::path::absolute()` call, matching the async version's behavior.
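
As a rough illustration of why the `std::path::absolute()` call matters, the sketch below resolves a relative path before it is handed to background work. `normalize_for_dispatch` is a hypothetical helper name, not a function from this repository.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: make a caller-supplied path absolute before it is
/// handed to a task that may run later or on another thread.
fn normalize_for_dispatch(path: &Path) -> std::io::Result<PathBuf> {
    // `std::path::absolute` resolves against the current working directory
    // without touching the filesystem, so the file need not exist yet.
    std::path::absolute(path)
}

fn main() -> std::io::Result<()> {
    let resolved = normalize_for_dispatch(Path::new("file.bin"))?;
    assert!(resolved.is_absolute());
    assert!(resolved.ends_with("file.bin"));
    println!("{}", resolved.display());
    Ok(())
}
```

Resolving the path at the call site means a later change of working directory does not alter which file the queued task reads.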
2026-03-16 13:33:46 -07:00
Hoyt Koepke
71f8570a0e Optimize config struct for direct access in python (#706)
This PR adds a feature flag, `python`, to the xet_runtime package. When
it is enabled, the XetConfig struct is built with Python getters and
setters. Config handling is thereby integrated directly into the
XetConfig struct and the macros used to register config values, which
makes handling values in the Python bindings seamless.
2026-03-16 12:23:43 -07:00
Di Xiao
b7cd43c8cb Add Sha256Policy to XetSession APIs (#701)
Stacking on top of https://github.com/huggingface/xet-core/pull/694,
this PR updates both the async and sync APIs: the
`XetSession::UploadCommit` and `XetSession::UploadCommitSync` pub APIs
now explicitly take a `sha256: Sha256Policy` argument that controls
whether a sha256 is computed and embedded for that file in the upload
info.

Resolves XET-898

---------

Co-authored-by: Hoyt Koepke <hoytak@huggingface.co>
2026-03-16 11:03:23 -07:00
Adrien
820f2657c5 fix: bound file reconstruction range using file_size to prevent 416 errors (#716) 2026-03-16 07:07:03 +01:00
Brian Ronan
6232b42591 Xorb download URL debug logs (#714)
It's a bit annoying to try to ensure our CDN routing is correct. This PR
logs the URL domain of the first fetch term download to the debug logs.

Please don't hesitate to recommend alternative approaches.
2026-03-13 16:05:30 -07:00
Di Xiao
e701aeddac Support XetSession in async context (#694)
`XetSession` always created its own tokio runtime via
`XetRuntime::new_with_config`, and calling `external_run_async_task`
panics when already inside a tokio context. This blocked embedding the
session in async Rust frameworks.

Core strategy:

- `RuntimeMode` enum with two variants:
  - `Owned`: the session created its own thread pool via
    `XetSessionBuilder::build`, or via `XetSessionBuilder::build_async`
    when outside a tokio context. Both `_blocking` and async methods are
    supported. Async methods use an internal `bridge_to_owned` bridge
    that routes futures onto the owned thread pool, so they work from
    any executor (tokio, smol, async-std).
  - `External`: the session wraps a caller-supplied tokio handle via
    `XetSessionBuilder::with_tokio_handle`, or via
    `XetSessionBuilder::build_async` when inside a qualified tokio
    context. Only async methods may be called; `_blocking` methods
    return `SessionError::WrongRuntimeMode`. No second thread pool is
    created.
- `XetRuntime::bridge_to_owned` — a new bridge that routes a future onto
the owned tokio thread pool from any executor (smol, async-std,
futures::executor, non-qualified tokio runtime) by delivering the result
via a `tokio::sync::oneshot` channel that can be polled by any async
executor.
- Async public API — `UploadCommit` and `DownloadGroup` methods
(`upload_from_path`, `upload_bytes`, `upload_file`, `commit`, `finish`)
are now async fn. Factory methods `XetSession::new_upload_commit` and
`new_download_group` are async.
Example:
```
let session = XetSessionBuilder::new().build_async().await?;

// Upload
let commit = session.new_upload_commit().await?;
let handle = commit.upload_from_path("file.bin".into()).await?;
let results = commit.commit().await?;

// Download
let group = session.new_download_group().await?;
let info = XetFileInfo {
    hash: ...,
    file_size: ...,
};
let dl_handle = group.download_file_to_path(info, "out/file.bin".into())?;
let finish_results = group.finish().await?;
```

- Sync wrappers — New `UploadCommitSync` / `DownloadGroupSync` in
`xet_session/sync/` expose a fully blocking API for sync Rust and Python
(PyO3) callers. Returned by `new_upload_commit_blocking()` and
`new_download_group_blocking()`.
Example:
```
let session = XetSessionBuilder::new().build()?;

// Upload
let commit = session.new_upload_commit_blocking()?;
let handle = commit.upload_from_path("file.bin".into())?;
let results = commit.commit()?;
let m = results.values().next().unwrap().as_ref().as_ref().unwrap();

// Download
let group = session.new_download_group_blocking()?;
let info = XetFileInfo {
    hash: ...,
    file_size: ...,
};
let dl_handle = group.download_file_to_path(info, "out/file.bin".into())?;
let finish_results = group.finish()?;
```



Additional fixes: `download_file_to_path` and `upload_from_path` now
canonicalize paths with `std::path::absolute` before enqueuing; task
status is only overwritten when still `Running`, preventing a race with
concurrent abort().

Fix XET-891

---------

Co-authored-by: Hoyt Koepke <hoytak@huggingface.co>
2026-03-13 14:57:20 -07:00
Hoyt Koepke
3390bdc716 Adjust RTT prediction determining concurrency by transmission size. (#708)
Currently, the condition for increasing connection concurrency is gated
on the model predicting that a 64MB transmission will complete within 90
seconds. However, when the transmissions are primarily composed of small
packets, this can drastically overestimate the round trip, artificially
suppressing the connection concurrency.

This PR fixes this issue by also modeling the average predicted packet
size, using the 95% quantile of that (bounded by two config variables)
to predict the round trip time when considering a concurrency increase.
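
The gating change can be sketched roughly as follows. This is a simplified illustration, not the PR's model: it assumes a linear round-trip model and takes the 95% quantile of observed transmission sizes directly. `RttModel`, `reference_size`, `may_increase_concurrency`, and all numeric values are hypothetical.

```rust
/// Hypothetical linear model: round trip = fixed latency + bytes / bandwidth.
struct RttModel {
    base_latency_secs: f64,
    bytes_per_sec: f64,
}

impl RttModel {
    fn predict_secs(&self, bytes: u64) -> f64 {
        self.base_latency_secs + bytes as f64 / self.bytes_per_sec
    }
}

/// Nearest-rank 95% quantile of the observed sizes, clamped to
/// [min_bytes, max_bytes] (standing in for the two config variables).
fn reference_size(observed: &[u64], min_bytes: u64, max_bytes: u64) -> u64 {
    let mut sorted = observed.to_vec();
    sorted.sort_unstable();
    let quantile = if sorted.is_empty() {
        // With no observations, fall back to the conservative upper bound.
        max_bytes
    } else {
        let rank = (sorted.len() as f64 * 0.95).ceil() as usize;
        sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
    };
    quantile.clamp(min_bytes, max_bytes)
}

/// Allow a concurrency increase only if a transmission of the reference
/// size is predicted to finish within `limit_secs`.
fn may_increase_concurrency(
    model: &RttModel,
    observed: &[u64],
    min_bytes: u64,
    max_bytes: u64,
    limit_secs: f64,
) -> bool {
    model.predict_secs(reference_size(observed, min_bytes, max_bytes)) <= limit_secs
}

fn main() {
    const MB: u64 = 1024 * 1024;
    let model = RttModel { base_latency_secs: 0.2, bytes_per_sec: 0.5 * MB as f64 };
    let small_transfers = vec![MB; 100];

    // Gating on a fixed 64MB transmission predicts well over 90 seconds.
    assert!(model.predict_secs(64 * MB) > 90.0);
    // Gating on the size actually being sent allows the increase.
    assert!(may_increase_concurrency(&model, &small_transfers, MB / 4, 64 * MB, 90.0));
}
```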
2026-03-13 10:47:45 -07:00
Adrien
bcce76be63 chore: version bump to 1.4.2 (#712)
## Summary

- Bump hf_xet version from 1.4.1 to 1.4.2 in Cargo.toml and Cargo.lock
- Follows up on 1.4.1 release where the version bump PR was merged after
the release artifacts were built
v1.4.2
2026-03-13 07:46:46 +01:00
Rajat Arya
2589bf05bc version bump to 1.4.1 (#707) 2026-03-12 17:35:33 -07:00
Adrien
0fb930c8d0 feat: expose skip_sha256 parameter in Python upload API (#705)
## Summary

Add `skip_sha256` and `sha256s` parameters to `upload_bytes()` Python
binding for per-file SHA-256 policies:
- `skip_sha256: bool = False` - Skip SHA-256 computation entirely (sets
`Sha256Policy::Skip`)
- `sha256s: Optional[List[str]] = None` - Provide pre-computed SHA-256
hashes (companion to existing parameter on `upload_files()`)
- These parameters are mutually exclusive

## Changes

**Python binding changes:**
- Add `skip_sha256` + `sha256s` params to `upload_bytes()` /
`upload_files()`
- All policy conversion happens at Python boundary

**Internal refactoring:**
- Add `Clone`/`Copy` derives + `from_skip()`/`from_hex()` helpers to
`Sha256Policy`
- Update `upload_bytes_async`, `upload_async`, `clean_file` to use
`Vec<Sha256Policy>`
- Update all internal callers across `git_xet`, `xet_pkg`, migration
tool, tests

## Motivation

`huggingface_hub` already knows whether SHA-256 is required. This change
enables skipping expensive computation when unnecessary, or passing
pre-computed hashes for bulk operations.

Companion to #678.

---------

Co-authored-by: Wauplin <lucainp@gmail.com>
v1.4.1
2026-03-12 18:17:12 +01:00
Di Xiao
cacd713218 Rework the interface for session task to get result from registered upload (#690)
This PR updates the interface for retrieving per-task results after
UploadCommit::commit() or DownloadGroup::finish(). The problem with the
previous interface is that commit() and finish() return a vector of
FileMetadata or DownloadResult, making it difficult for users to
associate each result with a specific task.

The new interface uses `task_id` as a strong binding bridge:

## Upload per-task result access patterns
After commit() completes, there are two equivalent ways to retrieve a
per-task FileMetadata result:

1. Lookup in the global result map:
```
let commit = session.new_upload_commit()?;
let handle = commit.upload_from_path(src)?;
let results = commit.commit()?;
let result = results.get(&handle.task_id);
```

2. Direct access from the handle:
```
let commit = session.new_upload_commit()?;
let handle = commit.upload_from_path(src)?;
commit.commit()?;
// handle.result() is populated by commit() via the shared Arc.
let result = handle.result();
```

## Download per-task result access patterns
The pattern is similar to the above.

## Why not put results in a vector in the same order as tasks are registered to the commit instance?
After a commit instance is created, it can be cloned (since it is itself
an Arc wrapping an internal struct) and sent to different threads. When
multiple threads are registering tasks, there is no static registration
order that a program can observe upfront.
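
A minimal sketch of the `task_id` binding idea, using only the standard library. `Commit`, `UploadHandle`, and `FileMetadata` here are simplified, hypothetical stand-ins for the real session types, and the "upload" just records a size.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

type TaskId = u64;

/// Hypothetical stand-in for the per-file upload result.
#[derive(Debug, Clone, PartialEq)]
struct FileMetadata {
    size: u64,
}

/// Returned at registration time; shares a result slot with the commit.
struct UploadHandle {
    task_id: TaskId,
    slot: Arc<OnceLock<FileMetadata>>,
}

impl UploadHandle {
    /// `None` until `commit()` has filled the shared slot.
    fn result(&self) -> Option<&FileMetadata> {
        self.slot.get()
    }
}

#[derive(Default)]
struct Commit {
    next_id: Mutex<TaskId>,
    tasks: Mutex<Vec<(TaskId, u64, Arc<OnceLock<FileMetadata>>)>>,
}

impl Commit {
    /// Register a task; safe to call from several threads at once.
    fn upload(&self, size: u64) -> UploadHandle {
        let mut next = self.next_id.lock().unwrap();
        let task_id = *next;
        *next += 1;
        let slot = Arc::new(OnceLock::new());
        self.tasks.lock().unwrap().push((task_id, size, slot.clone()));
        UploadHandle { task_id, slot }
    }

    /// Complete all tasks, filling both the result map and each handle's slot.
    fn commit(&self) -> HashMap<TaskId, FileMetadata> {
        let mut results = HashMap::new();
        for (task_id, size, slot) in self.tasks.lock().unwrap().drain(..) {
            let meta = FileMetadata { size };
            let _ = slot.set(meta.clone());
            results.insert(task_id, meta);
        }
        results
    }
}

fn main() {
    let commit = Commit::default();
    // Registration order across threads is not deterministic.
    let handles: Vec<UploadHandle> = std::thread::scope(|s| {
        let workers: Vec<_> = (1..=4u64)
            .map(|i| {
                let c = &commit;
                s.spawn(move || c.upload(i * 100))
            })
            .collect();
        workers.into_iter().map(|w| w.join().unwrap()).collect()
    });
    let results = commit.commit();
    for handle in &handles {
        // Both access patterns yield the same per-task result.
        assert_eq!(results.get(&handle.task_id), handle.result());
    }
}
```

Because each handle carries its own id and slot, the caller never depends on the order in which threads happened to register their tasks.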
2026-03-11 16:21:27 -07:00
Hoyt Koepke
6061debc75 Record API changes in api_changes/updates_<date>_<description>.md (#689)
This PR creates a folder, api_changes, in which AI agents can record
updates to the API surface that could affect downstream PRs and
dependencies. This can be scanned by AI agents to reliably perform
merges or to propagate changes. See api_changes/README.md for a
description of how this should work.
2026-03-11 12:31:48 -07:00
Hoyt Koepke
45d38a13a9 Code reorganization towards release of xet cargo package (#693)
This PR is a massive rearrangement of the code base into 5 packages
intended for release on cargo. The directories and corresponding
packages are:

1. xet_runtime/ — compiles into the xet-runtime package. Contains the
runtime, config, and logging management.
2. xet_core_structures/ — compiles into the xet-core-structures package.
Contains core data structures for hashing, shards, and xorbs as well as
internal data structures that depend on these.
3. xet_client/ — compiles into the xet-client package, contains client
code for remotely connecting to the Hugging Face servers.
4. xet_data/ — compiles into the xet-data package, contains the data
processing pipeline: chunking/deduplication, file reconstruction,
clean/smudge operations, and progress tracking.
5. xet_pkg/ — compiles into the hf-xet package, provides the top-level
session-based API for file upload and download with user-facing error
categorization. This is the primary package downstream dependencies
would use. This also contains a single summary error type, XetError,
that translates cleanly into python error types.

In addition, the other tools are: 

- git_xet/ — the git_xet CLI binary crate (location preserved). 
- hf_xet/ -- the hf_xet python package (location preserved).
- simulation/ — the simulation crate for upload scenario benchmarking.
- wasm/ -- the wasm objects. 

The full description — and information for an AI agent to use to update
downstream dependencies — is at
api_changes/update_260309_package_restructure.md.

Summary of moves:

- xet_runtime: became xet_runtime::core inside xet_runtime/.
- utils: became xet_runtime::utils inside xet_runtime/.
- xet_config: became xet_runtime::config inside xet_runtime/.
- xet_logging: became xet_runtime::logging inside xet_runtime/.
- error_printer: became xet_runtime::error_printer inside xet_runtime/.
- file_utils: became xet_runtime::file_utils inside xet_runtime/.
- merklehash: became xet_core_structures::merklehash inside
xet_core_structures/.
- mdb_shard: became xet_core_structures::metadata_shard inside
xet_core_structures/.
- xorb_object: became xet_core_structures::xorb_object inside
xet_core_structures/.
- cas_client: became xet_client::cas_client inside xet_client/.
- hub_client: became xet_client::hub_client inside xet_client/.
- cas_types: became xet_client::cas_types inside xet_client/.
- chunk_cache: became xet_client::chunk_cache inside xet_client/.
- data: became xet_data::processing inside xet_data/.
- deduplication: became xet_data::deduplication inside xet_data/.
- file_reconstruction: became xet_data::file_reconstruction inside
xet_data/.
- progress_tracking: became xet_data::progress_tracking inside
xet_data/.
- xet_session: became xet::xet_session inside xet_pkg/.

- Wasm packages (hf_xet_wasm, hf_xet_thin_wasm): moved from top-level
into wasm/; internal imports updated, public APIs unchanged.
2026-03-11 12:02:38 -07:00
Rajat Arya
02da1d233b version bump to 1.4.0 (#699) v1.4.0 2026-03-11 10:48:23 -07:00
Rajat Arya
83a28271ea fix: no timeout for shard uploads (XET-885) (#685)
Fixes
[XET-885](https://linear.app/xet/issue/XET-885/investigate-unsloth-upload-failure-shard-upload-timeout-on-cas)

## Summary

Shard uploads to CAS can take a long time due to server-side processing
(DynamoDB writes scale with file entry count). The default
`read_timeout(120s)` on the reqwest client kills these uploads.

**Key insight:** reqwest's per-request `RequestBuilder::timeout()` does
NOT override the client-level `read_timeout()` — they are independent
mechanisms polled as separate futures. So the original approach of using
per-request timeouts was ineffective.

**Fix:** Create a dedicated `shard_upload_http_client` on `RemoteClient`
with **no `read_timeout`**, built once at construction time and reused
for all shard uploads. All other settings (connect timeout, pool config,
auth middleware) are identical to the standard client.

## Changes

### `cas_client/src/http_client.rs`
- Added `reqwest_client_no_read_timeout()` — creates a reqwest client
with no `read_timeout`
- Added `build_auth_http_client_no_read_timeout()` — public API wrapping
it with middleware
- 4 unit tests for the new builder

### `cas_client/src/remote_client.rs`
- Added `shard_upload_http_client` field to `RemoteClient` (cfg'd out on
wasm)
- `upload_shard()` uses the pre-built no-timeout client instead of
building one per request

### `cas_client/tests/test_shard_upload_timeout.rs`
- Updated: slow server test now asserts **success** (shard uploads
should wait as long as needed)

### `xet_config/src/groups/client.rs`
- Removed `shard_read_timeout` config field (no longer needed)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-11 09:05:40 -07:00
Adrien
9ba5fb3e5b fix: prevent download stall on large file reconstruction (#698)
## Summary

Fixes download stalls/deadlocks on large file reconstruction (reported
on 48.5 GB GGUF files). The root cause is a circular dependency: the
main reconstruction loop holds a buffer semaphore permit while blocking
on CAS connection permit acquisition, and xorb write locks held during
HTTP downloads cause CAS permit starvation.

### Changes

1. **Single-flight xorb downloads via `OnceCell`** (`xorb_block.rs`):
replaces `RwLock<Option<...>>` with `tokio::sync::OnceCell`. Only one
task per xorb block acquires a CAS permit and downloads the data;
concurrent callers wait on the same result without acquiring permits or
duplicating work. This eliminates duplicate downloads, prevents
double-counted transfer progress, and avoids a failing duplicate from
killing the reconstruction.

2. **Decouple CAS permit from buffer permit** (`file_term.rs`): the main
loop no longer blocks on CAS permits while holding a buffer permit. The
spawned download task delegates to `retrieve_data` which handles permit
acquisition internally via the OnceCell single-flight. This breaks the
circular dependency that causes stalls.

3. **Improve error propagation** (`sequential_writer.rs`): when the
background writer channel closes, check `RunState` for the original
error before returning a generic "channel closed" message.

### Root cause

The reconstruction pipeline has three resource pools: buffer permits
(bounded semaphore), CAS download permits (64 concurrent), and per-xorb
write locks.

Before this fix, the main loop would:
1. Acquire a **buffer permit** (blocking if buffer full)
2. Call `get_data_task()` which acquires a **CAS permit** (blocking if
pool exhausted)
3. Inside `retrieve_data()`, hold a **write lock** during the entire
HTTP download

This creates two deadlock vectors:
- **Buffer vs CAS**: buffer fills up with terms waiting for CAS permits,
but CAS permits are held by tasks blocked behind xorb write locks, and
the writer can't drain the buffer because it's waiting for those tasks
- **CAS vs write lock**: multiple tasks sharing the same xorb each hold
a CAS permit while blocked on the write lock, starving other xorbs of
permits
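
The single-flight behavior can be illustrated with the standard library's `std::sync::OnceLock`, used here as a synchronous analogue of `tokio::sync::OnceCell`. `XorbBlock` and the in-memory "download" are hypothetical stand-ins, not the code in `xorb_block.rs`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Hypothetical stand-in for a xorb block whose data is fetched at most once.
struct XorbBlock {
    data: OnceLock<Vec<u8>>,
    downloads: AtomicUsize,
}

impl XorbBlock {
    fn new() -> Self {
        Self { data: OnceLock::new(), downloads: AtomicUsize::new(0) }
    }

    /// The first caller runs the initializer; concurrent callers wait for
    /// that same result instead of starting their own download.
    fn retrieve_data(&self) -> &[u8] {
        self.data.get_or_init(|| {
            self.downloads.fetch_add(1, Ordering::SeqCst);
            // Stand-in for permit acquisition plus the HTTP download.
            vec![0u8; 1024]
        })
    }
}

/// Spawn `readers` threads that all request the same block and report how
/// many downloads actually ran.
fn download_count_after_concurrent_reads(readers: usize) -> usize {
    let block = XorbBlock::new();
    thread::scope(|s| {
        for _ in 0..readers {
            s.spawn(|| assert_eq!(block.retrieve_data().len(), 1024));
        }
    });
    block.downloads.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(download_count_after_concurrent_reads(8), 1);
}
```

Only the task that runs the initializer would need a download permit, which is why waiting callers cannot starve the permit pool.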

## Reproduction

Reliably reproducible with small buffer:
```
HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_SIZE=64mb \
HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_LIMIT=64mb \
python3 -c "from huggingface_hub import hf_hub_download; hf_hub_download('unsloth/Qwen3-Coder-Next-GGUF', 'Qwen3-Coder-Next-Q4_K_M.gguf', local_dir='/tmp/test')"
```

- **Before fix**: stalls at ~3.4 GB, no progression (deadlock)
- **After fix**: continuous progression, completes successfully

With default buffer (2 GB), the stall is intermittent depending on
network speed (consistently reproduced on slower connections).
2026-03-11 08:37:42 -07:00
Adrien
a48f1f80e4 feat: add skip_sha256 option to SingleFileCleaner (#679)
## Summary

- Add `ShaGenerator::Skip` variant that skips SHA-256 computation
entirely
- `ShaGenerator::finalize()` now returns `Option<Sha256>` (None when
skipped)
- `SingleFileCleaner::new()` and `FileUploadSession::start_clean()`
accept a `skip_sha256` boolean
- When skipped, no `FileMetadataExt` is included in the shard

## Context

Bucket uploads don't need SHA-256 in the shard metadata — the
`sha_index` GSI is only used for LFS pointer resolution, which doesn't
apply to buckets. Skipping SHA-256 for bucket uploads removes the main
CPU bottleneck in the upload pipeline on non-SHA-NI instances.

## Alternative: dummy SHA-256

Instead of skipping entirely, the client could send a zeroed/dummy
`FileMetadataExt`. The server would still store it but queries would
never match. This avoids the server-side schema change (xetcas PR) but
pollutes the GSI with dummy entries.

Companion PRs:
- xetcas: huggingface-internal/xetcas#498 (make `FileIdItem.sha256`
optional server-side)
2026-03-10 17:36:09 +01:00
Hoyt Koepke
6a5535bc46 Rework simulation pipeline for adaptive concurrency and connection resiliency. (#648)
This PR replaces the previous collection of scripts for setting up
docker containers with a much more lightweight set of rust scripts and a
simple, reusable proxy that can limit bandwidth and simulate congestion.
The previous scripts are rewritten to use more reusable components.

New tools: 
- cas_client/src/simulation/network_simulation: A lightweight,
in-process network congestion simulation proxy that lives between the
LocalServer instance and the RemoteClient instance, allowing simulation
tests to run on a network with realistic congestion conditions and a
gated bandwidth. This can be controlled dynamically through a
LocalTestServer instance.
- simulation/: A new package for collecting simulation scripts and
analyzing the results.

To run the new simulation scripts for the adaptive concurrency on
upload, compile in release mode and run one of the scripts in
`simulation/src/adaptive_concurrency/scripts/`. Docker is no longer
needed to run any of the simulations.

The old `cas_client/tests/adaptive_concurrency/` paths were removed.
2026-03-09 10:49:36 -07:00
Hoyt Koepke
ebd780d26d Simulation interface for LocalTestServer: supports deletion, direct access, data dumps, etc. (#681)
This PR adds interface functions to the LocalServer class that will
allow it to become a full simulation environment for testing all the
garbage collection stages.
2026-03-05 12:00:52 -08:00
Hoyt Koepke
70807bf012 Fix for incorrect error propagation on truncated download stream. (#683)
Currently, the async stream logic silently swallows an UnexpectedEOF,
treating it the same as an EOF. This is a bug; this PR fixes it to
propagate UnexpectedEOF while handling correct EOF as the end of the
stream.
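
The distinction can be sketched with synchronous `std::io` as an analogue of the async stream logic. `read_all_expected` is a hypothetical helper: it treats a zero-length read as a clean end only when the expected number of bytes has arrived.

```rust
use std::io::{self, ErrorKind, Read};

/// Read a stream that is expected to deliver `expected_len` bytes.
fn read_all_expected<R: Read>(mut src: R, expected_len: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::with_capacity(expected_len);
    let mut buf = [0u8; 4];
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            // A zero-length read marks the end of the stream. It is only a
            // correct EOF if nothing is missing; otherwise propagate an error
            // instead of returning truncated data as if it were complete.
            if out.len() < expected_len {
                return Err(io::Error::new(
                    ErrorKind::UnexpectedEof,
                    format!("stream ended after {} of {} bytes", out.len(), expected_len),
                ));
            }
            return Ok(out);
        }
        out.extend_from_slice(&buf[..n]);
    }
}

fn main() {
    // Complete stream: the end of input is a normal EOF.
    let complete = read_all_expected(&b"abcdefgh"[..], 8).unwrap();
    assert_eq!(complete, b"abcdefgh".to_vec());

    // Truncated stream: the early end is reported, not swallowed.
    let err = read_all_expected(&b"abc"[..], 8).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::UnexpectedEof);
}
```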
2026-03-04 17:00:08 -08:00
Hoyt Koepke
e6e0413d90 Naming clarification: A Xorb is a data object, CAS is the remote server. (#680)
This PR makes the use of the `cas` and `xorb` terms consistent.
Previously, "cas" (for content addressed store) could simultaneously
refer to either the remote server or the data bytes stored as a
collection of chunks. After the renames in this PR, we consistently use
`xorb` to refer to the data object and cas to refer to the remote
server.

This renames quite a few places; to aid in rebasing current work or
updating downstream dependencies, this PR includes a file
`API_UPDATES.md` that can be fed into an AI agent to quickly and
accurately perform the renaming on any downstream dependencies.
2026-03-04 16:05:49 -08:00
Di Xiao
c4a56f889c XetSession API (#657)
This PR introduces a new `xet_session` crate that provides a
session-based hierarchical API: Users create a XetSession to manage
runtime and configuration, then batch uploads into UploadCommit objects
and downloads into DownloadGroup objects — each of which runs transfers
in the background by the inner XetRuntime.

All pub functions are exposed as sync functions - making them easy to
use in other languages, e.g. Python, C, etc.
2026-03-03 20:27:39 -08:00
Adrien
40b45fb0fb feat: accept pre-computed SHA-256 in upload_files() (#678)
## Summary

- Add optional `sha256s` keyword parameter to the Python-exposed
`upload_files()` function
- Forward it to `data_client::upload_async()` which already supports it

## Context

### Double computation today

`huggingface_hub` computes SHA-256 on every file during
`CommitOperationAdd.__post_init__()` for LFS batch negotiation, then
`hf_xet` recomputes it internally because `upload_files()` doesn't
accept pre-computed hashes.

### Performance impact

This change eliminates the redundant computation entirely.

### Backward compatibility

- `sha256s` is a keyword-only parameter with default `None` — no change
for existing callers
- `data_client::upload_async()` already accepts `sha256s:
Option<Vec<String>>` since day one
- When provided, `SingleFileCleaner` uses `ShaGenerator::ProvidedValue`
and skips internal recomputation

Companion PR: huggingface/huggingface_hub#3876
2026-03-03 21:20:09 +01:00
Adrien
e66dcef40b Fix command injection in release workflow (CVE) (#677)
## Summary

- Fix command injection vulnerability in `.github/workflows/release.yml`
(HackerOne #3581567, severity High 8.8)
- `${{ github.event.inputs.tag }}` was interpolated directly in `run:`
blocks, allowing arbitrary RCE via crafted tag input (e.g. `v0.1.0; id;
cat /etc/passwd;#`)
- Moved all 6 occurrences to `env:` variables so the value is passed as
a shell environment variable instead of being interpolated into the
script

## Jobs fixed

- `linux` — "Update version in toml" step
- `musllinux` — "Update version in toml" step
- `windows` — "Update version in toml" step
- `macos` — "Update version in toml" step
- `sdist` — "Update version in toml" step
- `github-release` — "Create GitHub Release" step (`gh release create`)
2026-03-02 20:10:27 +01:00
Hoyt Koepke
9b3278a510 Streaming data writer (#656)
This PR adds an integrated API for streaming downloads, exposing a
DownloadStream object that is integrated with the file reconstructor. It
also uses the same memory management buffer limiting process to work
with the stream object.

It also introduces cancellation support to the FileReconstructor to
ensure that tasks waiting on a long running download or semaphore wait
don't cause things to hang when an error is reported or the user drops
the stream.
2026-02-27 15:08:25 -08:00
Di Xiao
c4111eb6da Feature to monitor client process system usage (#617)
Introduces a client benchmark utility to track system resource usage
(CPU, memory, disk I/O, and network I/O) of a process, so we don't need
to write scripts to capture usage stats according to different OS
standards. This becomes extremely helpful when I benchmark on Python
notebook instances, e.g. Google Colab, where system monitor is not
easily accessible or when running a separate monitor script is not easy.

# Usage #
Users enable monitoring by setting `HF_XET_SYSTEM_MONITOR_ENABLED` to
true and set the usage sample interval with
`HF_XET_SYSTEM_MONITOR_SAMPLE_INTERVAL`. By default, metrics are written
to the tracing stream at `INFO` level. They can also be redirected to a
separate file by setting the sample log path with
`HF_XET_SYSTEM_MONITOR_LOG_PATH`.

# Output #
The stats are output in JSON format, which can be queried using tools
like `jq`, e.g.
1. Trace of peak memory usage: `jq '.memory.peak_used_bytes' [HF_XET_SYSTEM_MONITOR_LOG_PATH]`
2. Trace of disk write speed: `jq '.disk.average_write_speed' [HF_XET_SYSTEM_MONITOR_LOG_PATH]`
3. Trace of network receive speed: `jq '.network.average_rx_speed' [HF_XET_SYSTEM_MONITOR_LOG_PATH]`
2026-02-27 13:36:31 -08:00
Hoyt Koepke
543914dce1 Scale download buffer memory limit by number of active downloads (#666)
Currently, the download buffer memory limit is fixed, regardless of the
number of downloads currently in flight. However, as the number
of downloads increases, a fixed size total could lead to waiting on
individual segments that download out-of-order or don't have enough
turnaround time to saturate the output. While writing to disk or the
download itself often becomes the bottleneck before these effects,
planned features such as streaming files and caching could be affected
by this limit. The default formula for the download buffer size now is
(2GB + 512MB * number of concurrent downloads) up to a maximum of 8GB
(these are adjustable).

This PR alleviates this by allocating an additional 512MB buffer
allocation per file, prioritized to the specific download, releasing
that capacity when the file finishes downloading. This is done using the
AdjustableSemaphore class, first introduced for the concurrent scaling,
which allows the number of total permits in a semaphore to be
incremented or decremented; on decrement, permits are discarded upon
return until the total permits is at the target number.
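
The default sizing rule stated above can be written out as a small function. The function and parameter names are hypothetical, and treating GB and MB as binary units is an assumption.

```rust
const MB: u64 = 1024 * 1024; // assumption: binary units
const GB: u64 = 1024 * MB;

/// Buffer limit = base + per_file * active downloads, capped at `max`.
fn download_buffer_limit(active_downloads: u64, base: u64, per_file: u64, max: u64) -> u64 {
    base.saturating_add(per_file.saturating_mul(active_downloads)).min(max)
}

fn main() {
    // Defaults described above: 2GB base, 512MB per download, 8GB cap.
    let limit = |n| download_buffer_limit(n, 2 * GB, 512 * MB, 8 * GB);

    assert_eq!(limit(0), 2 * GB);
    assert_eq!(limit(4), 4 * GB);
    assert_eq!(limit(12), 8 * GB); // 2GB + 6GB reaches the cap exactly
    assert_eq!(limit(100), 8 * GB); // capped
}
```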
2026-02-27 11:35:55 -08:00
Rajat Arya
e31bbb5ddb hf-xet 1.3.2 version bump (#671) v1.3.2 2026-02-27 08:38:24 -08:00
Hoyt Koepke
3a4a2b8294 Fixes for intermittent test failures on windows. (#669)
This PR addresses two rare test failures on Windows, both due to
Windows' non-synchronous file system behavior, and a null-termination
bug:
- A race condition opening the local test database causing an error.
- Unwanted cleanup conditions in testing the log preservation can
trigger if the test execution is stretched out long enough.
- A null-termination bug in set_file_metadata that causes it to fail
silently if the memory layout is such that the string passed in isn't
null-terminated. This causes occasional failures in setting the
metadata time on Linux.
2026-02-27 07:53:05 -08:00
Hugo Larcher
73e531a41c fix: wrap TrackingProgressUpdater in AggregatingProgressUpdater (#668)
## Summary

Wrap download progress updaters in `AggregatingProgressUpdater` to
eliminate GIL contention when Python callers provide per-file progress
callbacks.

The upload path has had this aggregation since v1.1.3 (PR #340), but the
download path was missed. Without aggregation, each XORB chunk triggers
a `spawn_blocking` + `Python::with_gil()` callback. With many concurrent
file downloads, this causes severe GIL contention — measured as a **4x
throughput reduction** (3000 MB/s → 750 MB/s on a 25 Gbps link).

The fix wraps the caller-provided `TrackingProgressUpdater` in an
`AggregatingProgressUpdater` (200ms flush interval) inside
`download_file_with_updater()`, matching the pattern already used by
`FileUploadSession`. This reduces Python callback frequency from
thousands/sec to ~5/sec per file while preserving progress bar feedback.
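
The aggregation pattern can be sketched as below. This is not the repository's `AggregatingProgressUpdater`: `Aggregator` is a hypothetical, single-threaded stand-in that takes the current time as an argument so the behavior is deterministic, and the callback stands in for the expensive Python call.

```rust
use std::time::{Duration, Instant};

/// Accumulates byte counts and forwards them to `callback` at most once
/// per `interval`.
struct Aggregator<F: FnMut(u64)> {
    pending: u64,
    last_flush: Instant,
    interval: Duration,
    callback: F,
}

impl<F: FnMut(u64)> Aggregator<F> {
    fn new(now: Instant, interval: Duration, callback: F) -> Self {
        Self { pending: 0, last_flush: now, interval, callback }
    }

    /// Cheap per-chunk path: only invokes the callback when the interval
    /// has elapsed since the last flush.
    fn report(&mut self, now: Instant, bytes: u64) {
        self.pending += bytes;
        if now.duration_since(self.last_flush) >= self.interval {
            self.flush(now);
        }
    }

    /// Deliver whatever has accumulated (also called once at the end).
    fn flush(&mut self, now: Instant) {
        if self.pending > 0 {
            (self.callback)(self.pending);
            self.pending = 0;
        }
        self.last_flush = now;
    }
}

/// Simulate 1000 chunk reports of 1024 bytes, one per millisecond, with a
/// 200ms flush interval. Returns the byte count of each callback call.
fn simulate_one_second() -> Vec<u64> {
    let start = Instant::now();
    let mut calls = Vec::new();
    {
        let mut agg = Aggregator::new(start, Duration::from_millis(200), |b| calls.push(b));
        for i in 0..1000u64 {
            agg.report(start + Duration::from_millis(i), 1024);
        }
        agg.flush(start + Duration::from_secs(1));
    }
    calls
}

fn main() {
    let calls = simulate_one_second();
    assert_eq!(calls.len(), 5); // 1000 reports collapse into 5 callbacks
    assert_eq!(calls.iter().sum::<u64>(), 1000 * 1024); // no bytes lost
}
```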

## Root cause

When `huggingface_hub` calls `hf_xet.download_files()`, it passes a
per-file Python callback for progress bar updates. On the Rust side,
each callback invocation goes through:

```
report_bytes_written() / report_transfer_progress()
  → tokio::spawn(register_updates())
    → spawn_blocking(Python::with_gil(callback))
```

With the detailed download progress tracking added in PR #645 (hf-xet
v1.3.0), both `report_bytes_written` and `report_transfer_progress` fire
per chunk, roughly doubling callback frequency. With 8+ concurrent file
downloads, each spawning dozens of concurrent XORB streams, the GIL
becomes a severe bottleneck.

## History

The problem has existed since xet download support was introduced, but
worsened over time:

| Version | Date | Impact |
|---------|------|--------|
| `huggingface_hub v0.30.0` / `hf-xet 0.1.x` | Mar 2025 | Moderate: synchronous `with_gil()` per chunk, but hf_xet was an optional extra |
| `huggingface_hub v0.31.0` / `hf-xet >=1.1.0` | May 2025 | Moderate: hf-xet became a hard dependency on x86_64/arm64 |
| `hf-xet v1.1.3` | Jun 2025 | Upload path fixed with `AggregatingProgressUpdater` (PR #340); download path left unprotected |
| `hf-xet v1.3.0` | Feb 2026 | **Severe**: PR #645 added detailed per-chunk progress tracking to downloads, doubling callback frequency without aggregation |

PR #340 explicitly noted: *"each [update] has to acquire a global GIL
lock. This negatively affects the upload speed on fast connections"* —
the same problem, but only the upload side was addressed.

## Benchmarks

Downloading 3 safetensors files (16.1 GB total) from
`Qwen/Qwen3.5-35B-A3B` on a 25 Gbps machine:

| Test | Before | After |
|------|--------|-------|
| `download_files()` with `progress_updater=None` (baseline) | 3119 MB/s | 3119 MB/s |
| `download_files()` with per-file Python callbacks | **746 MB/s** | **1789 MB/s** |
| `snapshot_download()` (full Python CLI path with tqdm) | ~750 MB/s | **2395 MB/s** |

Progress callback overhead drops from **4x slowdown to <1%**.
2026-02-26 21:23:13 -08:00
dependabot[bot]
0d0f4883ad Bump time from 0.3.44 to 0.3.47 in /hf_xet_wasm (#654)
Bumps [time](https://github.com/time-rs/time) from 0.3.44 to 0.3.47.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/time-rs/time/releases">time's
releases</a>.</em></p>
<blockquote>
<h2>v0.3.47</h2>
<p>See the <a
href="https://github.com/time-rs/time/blob/main/CHANGELOG.md">changelog</a>
for details.</p>
<h2>v0.3.46</h2>
<p>See the <a
href="https://github.com/time-rs/time/blob/main/CHANGELOG.md">changelog</a>
for details.</p>
<h2>v0.3.45</h2>
<p>See the <a
href="https://github.com/time-rs/time/blob/main/CHANGELOG.md">changelog</a>
for details.</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/time-rs/time/blob/main/CHANGELOG.md">time's
changelog</a>.</em></p>
<blockquote>
<h2>0.3.47 [2026-02-05]</h2>
<h3>Security</h3>
<ul>
<li>
<p>The possibility of a stack exhaustion denial of service attack when
parsing RFC 2822 has been
eliminated. Previously, it was possible to craft input that would cause
unbounded recursion. Now,
the depth of the recursion is tracked, causing an error to be returned
if it exceeds a reasonable
limit.</p>
<p>This attack vector requires parsing user-provided input, with any
type, using the RFC 2822 format.</p>
</li>
</ul>
<h3>Compatibility</h3>
<ul>
<li>Attempting to format a value with a well-known format (i.e. RFC
3339, RFC 2822, or ISO 8601) will
error at compile time if the type being formatted does not provide
sufficient information. This
would previously fail at runtime. Similarly, attempting to format a
value with ISO 8601 that is
only configured for parsing (i.e. <code>Iso8601::PARSING</code>) will
error at compile time.</li>
</ul>
<h3>Added</h3>
<ul>
<li>Builder methods for format description modifiers, eliminating the
need for verbose initialization
when done manually.</li>
<li><code>date!(2026-W01-2)</code> is now supported. Previously, a space
was required between <code>W</code> and <code>01</code>.</li>
<li><code>[end]</code> now has a <code>trailing_input</code> modifier
which can either be <code>prohibit</code> (the default) or
<code>discard</code>. When it is <code>discard</code>, all remaining
input is ignored. Note that if there are components
after <code>[end]</code>, they will still attempt to be parsed, likely
resulting in an error.</li>
</ul>
<h3>Changed</h3>
<ul>
<li>More performance gains when parsing.</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>If manually formatting a value, the number of bytes written was one
short for some components.
This has been fixed such that the number of bytes written is always
correct.</li>
<li>The possibility of integer overflow when parsing an owned format
description has been effectively
eliminated. This would previously wrap when overflow checks were
disabled. Instead of storing the
depth as <code>u8</code>, it is stored as <code>u32</code>. This would
require multiple gigabytes of nested input to
overflow, at which point we've got other problems and trivial
mitigations are available by
downstream users.</li>
</ul>
<h2>0.3.46 [2026-01-23]</h2>
<h3>Added</h3>
<ul>
<li>All possible panics are now documented for the relevant
methods.</li>
<li>The need to use <code>#[serde(default)]</code> when using custom
<code>serde</code> formats is documented. This applies
only when deserializing an <code>Option&lt;T&gt;</code>.</li>
<li><code>Duration::nanoseconds_i128</code> has been made public,
mirroring
<code>std::time::Duration::from_nanos_u128</code>.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d5144cd287"><code>d5144cd</code></a>
v0.3.47 release</li>
<li><a
href="f6206b050f"><code>f6206b0</code></a>
Guard against integer overflow in release mode</li>
<li><a
href="1c63dc7985"><code>1c63dc7</code></a>
Avoid denial of service when parsing Rfc2822</li>
<li><a
href="5940df6e72"><code>5940df6</code></a>
Add builder methods to avoid verbose construction</li>
<li><a
href="00881a4da1"><code>00881a4</code></a>
Manually format macros everywhere</li>
<li><a
href="bb723b6d82"><code>bb723b6</code></a>
Add <code>trailing_input</code> modifier to <code>end</code></li>
<li><a
href="31c4f8e0b5"><code>31c4f8e</code></a>
Permit <code>W12</code> in <code>date!</code> macro</li>
<li><a
href="490a17bf30"><code>490a17b</code></a>
Mark error paths in well-known formats as cold</li>
<li><a
href="6cb1896a60"><code>6cb1896</code></a>
Optimize <code>Rfc2822</code> parsing</li>
<li><a
href="6d264d59c2"><code>6d264d5</code></a>
Remove erroneous <code>#[inline(never)]</code> attributes</li>
<li>Additional commits viewable in <a
href="https://github.com/time-rs/time/compare/v0.3.44...v0.3.47">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-24 20:21:32 -08:00
Rajat Arya
438045a19e Version bump for hf-xet 1.3.1 release (#665) v1.3.1 2026-02-24 15:36:20 -08:00
Rajat Arya
8808f9e64e Add Windows ARM64 build support (#662)
## Summary

Closes #588

- Add `win11-arm` runner with `aarch64-pc-windows-msvc` target to the
hf-xet Python wheel release pipeline
- Add `win11-arm` runner with `aarch64` target to the git-xet CLI
release pipeline, parameterizing the WiX installer `-arch` flag

## Test plan

- [x] Trigger a workflow_dispatch run of the Release workflow and verify
`windows` matrix includes both `x64` and `aarch64` entries
- [x] Verify ARM64 wheels and .pdb debug symbols are built and uploaded
- [ ] Trigger a workflow_dispatch run of the git-xet Release workflow
and verify ARM64 binary and MSI installer are produced

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-24 15:23:42 -08:00
Di Xiao
99105937f3 Upgrade hf-xet to 1.3.0 (#664) v1.3.0 2026-02-23 15:26:30 -08:00
Brian Ronan
17e900a70e Feat: optional request_headers on hf_xet API calls (#661)
Adds support for setting an optional `request_headers` map on the
hf_xet upload and download API calls. This map is augmented with the
hf_xet user agent string and is passed along with the requests to
xetcas.

This PR also adds unit tests for the map merging behavior to
`hf_xet/lib.rs` and adds support for running them with `cargo test`
and in a GitHub Actions CI step.
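
The merging behavior described above can be illustrated with a minimal sketch. The function name, the placeholder user agent value, and the rule of appending to a caller-supplied `User-Agent` are all assumptions made for illustration; they are not taken from `hf_xet/lib.rs`.

```rust
use std::collections::HashMap;

// Hypothetical user agent value; the real string is built inside hf_xet.
const USER_AGENT: &str = "hf_xet/0.0.0-example";

/// Merge optional caller-supplied headers with the library user agent.
/// Header names are lowercased so lookups are case-insensitive. If the
/// caller already set a User-Agent, the library string is appended to it
/// (an assumption of this sketch, not confirmed by the source).
fn merge_request_headers(
    request_headers: Option<HashMap<String, String>>,
) -> HashMap<String, String> {
    let mut merged: HashMap<String, String> = HashMap::new();
    for (name, value) in request_headers.unwrap_or_default() {
        merged.insert(name.to_ascii_lowercase(), value);
    }
    let agent = match merged.get("user-agent") {
        Some(existing) => format!("{existing}; {USER_AGENT}"),
        None => USER_AGENT.to_string(),
    };
    merged.insert("user-agent".to_string(), agent);
    merged
}
```

Passing `None` yields a map containing only the user agent entry, so callers that do not need custom headers see no change in behavior.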
2026-02-23 14:43:58 -08:00
Hoyt Koepke
b3c5d05fb7 Make specifying the file size at the beginning of an upload optional. (#651)
Currently, the progress and dependency tracking in the upload path
requires that the total size of a file be specified at the start. This
PR changes this so that in cases where the upload is streamed and the
total size is not known, it's updated as soon as new data is processed.
Both routes now work and correctly track the file sizes.
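
As a rough illustration of the two routes, the sketch below tracks a total that is either declared up front or grows as streamed data is processed. The `UploadProgress` type and its method names are hypothetical and do not mirror the actual tracker in xet-core.

```rust
/// Hypothetical progress record: the total size may be declared at the
/// start, or left unknown and discovered as data is streamed in.
#[derive(Debug, Default)]
struct UploadProgress {
    declared_total: Option<u64>,
    bytes_discovered: u64,
    bytes_completed: u64,
}

impl UploadProgress {
    fn new(declared_total: Option<u64>) -> Self {
        Self { declared_total, ..Self::default() }
    }

    /// Called when new input data has been read and processed.
    fn on_data_read(&mut self, n: u64) {
        self.bytes_discovered += n;
    }

    /// Called when data has finished uploading.
    fn on_data_uploaded(&mut self, n: u64) {
        self.bytes_completed += n;
    }

    /// The declared total if one was given, otherwise the bytes seen so far.
    fn total_bytes(&self) -> u64 {
        self.declared_total.unwrap_or(self.bytes_discovered)
    }
}
```

With a streamed upload, `total_bytes()` rises with each `on_data_read` call; with a declared size it stays fixed from the start.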
2026-02-23 10:31:09 -08:00
Hoyt Koepke
2176e5d3ed FileDownloadGroup (#652)
This PR adds a FileDownloadSession struct that parallels the
FileUploadSession struct, replacing the FileDownloader. It's an
intermediate step in preparation for a session-based API that integrates
well with interfaces other than the Python interface in hf_xet.
2026-02-19 17:43:35 -08:00
Hoyt Koepke
21bc6cfdc3 Removed incorrectly included AGENTS.md. (#660)
The AGENTS.md file was incorrectly checked into the repository (part of
a Claude process to prepare and check a diff for a PR). This PR removes
it.
2026-02-19 11:32:12 -08:00
Hoyt Koepke
5d6371a296 Progress reporting for downloads. (#645)
This PR adds detailed progress reporting to the download path. 
- Transfer progress is reported as soon as the download streams start;
actual bytes written are reported as the reconstructed file is written
out.
- Currently, each call to download_file creates a separate progress
tracker, but this sets up for download groups with grouped download
progress tracking.
 
To support this, the UploadProgressStream was split into three classes:
a common StreamProgressReporter and download- and upload-specific
versions. This also allows us to simplify the API to RetryWrapper.

More tracking was added to the file reconstruction paths to properly
report progress.
2026-02-19 11:06:42 -08:00
Rajat Arya
a7661a7e63 Removing pyproject.toml from repo root (#659)
This file is not used to build the hf-xet package, and it confuses the
`pip wheel` command.

Fixes #658
2026-02-17 15:13:14 -08:00
Hoyt Koepke
9d9fc72d40 XetCommon struct in the runtime to hold global counters, semaphores. (#650)
This PR simplifies the current process of working with
runtime-associated resources such as a cached Client instance or global
resource semaphores. Instead of using macros, all of these are moved
into a XetCommon struct that holds them explicitly. The runtime holds an
instance of this, and it's initialized with a config struct.

In addition, to make the logic around the memory limiting semaphore in
file_reconstructor clearer, we added a ResourceLimiter struct that wraps
the tokio semaphore but scales the total permits and permit requests
appropriately if the total resource quantity is larger than u32::MAX,
which can easily happen.
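
The scaling idea can be sketched without tokio. `PermitScale` and its methods are hypothetical names for illustration, and the rounding choices are assumptions rather than the behavior of the actual ResourceLimiter.

```rust
/// Hypothetical sketch of permit scaling: one permit stands for `unit`
/// resource units so that the permit count always fits in a u32.
struct PermitScale {
    unit: u64,
    total_permits: u32,
}

impl PermitScale {
    fn new(total_resource: u64) -> Self {
        let max = u32::MAX as u64;
        // Smallest unit for which total_resource / unit fits in a u32.
        let unit = total_resource.div_ceil(max).max(1);
        let total_permits = (total_resource / unit) as u32;
        Self { unit, total_permits }
    }

    /// Permits for a request: rounded up so a request never under-reserves,
    /// at least one permit, and capped at the total so a single oversized
    /// request can still be granted (assumes a non-zero total).
    fn permits_for(&self, amount: u64) -> u32 {
        let needed = amount.div_ceil(self.unit).max(1);
        needed.min(self.total_permits as u64) as u32
    }
}
```

For totals at or below `u32::MAX` the unit is 1, so permits map one-to-one onto resource units and the scaling is invisible.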

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 16:47:07 -08:00
Di Xiao
87d2ac9bcd Update git-xet install URLs (#655) 2026-02-12 16:40:40 -08:00
Di Xiao
23f68bb798 Upgrade git-xet to 0.2.1 (#653) git-xet-v0.2.1 2026-02-12 15:45:34 -08:00
Di Xiao
7d7582c3dd TemplatedPathBuf utility (#643)
Implements a utility for configuring path-like parameters.

This incorporates the existing function
`normalized_path_from_user_string`, which expands `~` to the home
directory and converts to absolute paths, and evaluates a path template
by substituting **case-insensitive** placeholders with the corresponding
values:
- `{pid}` for the process ID,
- `{timestamp}` for an ISO 8601 local timestamp with offset.

For example,
```
let template = TemplatedPathBuf::new("~/logs/app_{PID}_{TIMESTAMP}.txt");
let path = template.as_path();
// Returns an absolute path like "/home/user/logs/app_12345_2024-01-15T10-30-45-0500.txt"
```
or to be used directly in config groups:
```
crate::config_group!({
    ref log_path: Option<TemplatedPathBuf> = None;
});
```
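
The case-insensitive substitution step can be sketched with the standard library alone. Both function names below are hypothetical, the timestamp is passed in as a ready-made string, and `~` expansion and absolute-path conversion are left out; this is not the TemplatedPathBuf implementation.

```rust
/// Replace every ASCII case-insensitive occurrence of `placeholder`.
/// ASCII lowercasing keeps byte offsets identical, so indices found in the
/// lowercased copy are valid in the original string.
fn replace_ignore_ascii_case(template: &str, placeholder: &str, value: &str) -> String {
    if placeholder.is_empty() {
        return template.to_string();
    }
    let lower = template.to_ascii_lowercase();
    let needle = placeholder.to_ascii_lowercase();
    let mut out = String::with_capacity(template.len());
    let mut pos = 0;
    while let Some(found) = lower[pos..].find(&needle) {
        let start = pos + found;
        out.push_str(&template[pos..start]);
        out.push_str(value);
        pos = start + needle.len();
    }
    out.push_str(&template[pos..]);
    out
}

/// Substitute `{pid}` and `{timestamp}` regardless of letter case.
fn expand_template(template: &str, pid: u32, timestamp: &str) -> String {
    let with_pid = replace_ignore_ascii_case(template, "{pid}", &pid.to_string());
    replace_ignore_ascii_case(&with_pid, "{timestamp}", timestamp)
}
```

In real use the process ID would come from `std::process::id()` and the timestamp from a date-time library; they are parameters here so the sketch stays deterministic.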
2026-02-11 14:51:16 -08:00
Hoyt Koepke
cca8f39699 Clippy / fmt / test cleanup. (#649)
- Skip install/uninstall tests when git-lfs unavailable; formatting
fixes in wasm crates
- Add `git_lfs_available()` helper to skip install/uninstall tests in
environments where git-lfs is not installed
- Apply latest nightly rustfmt formatting fixes and clippy fixes.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-11 14:19:42 -08:00
Hoyt Koepke
e443ee9260 Upgrade package dependencies (#644)
This PR updates all the package dependencies that would not cause
significant API breakages to the current version. The package versions
in hf_xet_wasm and hf_xet are also updated to match the versions in the
base package. There should be no functional change.
2026-02-11 12:19:29 -08:00
Di Xiao
59219bfcfc Fix writev exceeding limits (#641)
On macOS and Linux, `writev(int fildes, const struct iovec *iov, int
iovcnt)` may return EINVAL if
- the sum of the iov_len values in the iov array overflows a 32-bit
integer (macOS) or an ssize_t value (Linux);
- iovcnt is less than or equal to 0, or greater than UIO_MAXIOV (POSIX
standard IOV_MAX, value 1024).

On Linux specifically, the glibc wrapper functions do some extra work if
they detect that the underlying kernel system call failed because this
limit was exceeded: the wrapper allocates a temporary buffer large
enough for all of the items specified by iov, copies the data from iov
into this buffer, and passes the buffer in a call to write().

To avoid these potential syscall failures and the performance
degradation, we limit the total number of bytes and the number of slices
passed to each `writev()` call. This PR also adds unit tests for these
two limits.
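
One way to picture the two limits is a planning step that groups buffers into batches before any vectored write is issued. The function name and the constant values below are assumptions for illustration; the limits chosen in xet-core may differ.

```rust
use std::ops::Range;

// Assumed limits for this sketch only.
const MAX_SLICES_PER_CALL: usize = 1024; // IOV_MAX on Linux
const MAX_BYTES_PER_CALL: usize = i32::MAX as usize; // byte sum stays within 32 bits

/// Split a list of buffer lengths into index ranges so that each range
/// holds at most `max_slices` buffers and at most `max_bytes` bytes.
/// A single buffer larger than `max_bytes` gets a range of its own; the
/// caller would have to split such a buffer before writing it.
fn plan_batches(lens: &[usize], max_slices: usize, max_bytes: usize) -> Vec<Range<usize>> {
    let mut batches = Vec::new();
    let mut start = 0;
    let mut bytes = 0usize;
    for (i, &len) in lens.iter().enumerate() {
        let count = i - start;
        let over_bytes = bytes.saturating_add(len) > max_bytes;
        // Close the current batch before adding a buffer that would break a limit.
        if count > 0 && (count >= max_slices || over_bytes) {
            batches.push(start..i);
            start = i;
            bytes = 0;
        }
        bytes = bytes.saturating_add(len);
    }
    if start < lens.len() {
        batches.push(start..lens.len());
    }
    batches
}
```

Each returned range would then be passed to one vectored write call, for example via `std::io::Write::write_vectored`, with the usual handling of partial writes.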
2026-02-10 19:17:32 -08:00
Di Xiao
3a17d667dc Change default download buffer size (#637)
Reduces the default download buffer size to a more friendly number.

### Benchmark ###
Download benchmark with different default memory configs on three
scenarios:
• Comparable disk write speed and network ingress speed (i4i.xlarge, up
to 10 Gbps ingress, stable 550 MB/s write SSD)
• Faster disk write (i4i.xlarge, ingress limited to 1 Gbps using "tc"
command, stable 550 MB/s write SSD)
• Faster network ingress (m5d.xlarge, up to 10 Gbps ingress, stable 150
MB/s write SSD)

Benchmark results are at
https://docs.google.com/spreadsheets/d/1ozpk0kU7uM8SGODXxXtXQauc3l5CEoWG/edit?usp=sharing&ouid=108235600614994105911&rtpof=true&sd=true,
implying no substantial improvement with a buffer size over 2 GB at the
default download parallelism of 8 set by huggingface-hub. Setting the
total download buffer size to 2 GB gives each parallel download task on
average `2 GB / 8 = 256 MB` of pending bytes to write, or
`256 MB / 64 MB = 4` or even more pending terms if the network speed is
comparable to that of the disk, which keeps the disk writer always busy.
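
The per-task arithmetic can be checked with a few lines. The helper names are hypothetical, and the 64 MB figure is read here as the size of one pending term, which is an interpretation of the text above.

```rust
const MB: u64 = 1024 * 1024;
const GB: u64 = 1024 * MB;

/// Average buffer share for each parallel download task.
fn per_task_budget(total_buffer: u64, parallelism: u64) -> u64 {
    total_buffer / parallelism.max(1)
}

/// Number of whole terms of `term_size` bytes that fit in a task's share.
fn pending_terms(budget: u64, term_size: u64) -> u64 {
    budget / term_size.max(1)
}
```

With a 2 GB total and a parallelism of 8, `per_task_budget` gives 256 MB, and `pending_terms` with 64 MB terms gives 4.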
2026-02-10 18:49:39 -08:00
Di Xiao
a336df4d02 Bake in openssl for git-xet macOS built in Github Action (#626)
Fix issue https://github.com/huggingface/xet-core/issues/621. Fix
XET-819.

The script
https://github.com/huggingface/xet-core/blob/main/git_xet/install.sh
installs the git-xet built in GitHub Actions, and when git-xet is built
for macOS it is linked to `homebrew/openssl@3` because the `git2` crate
depends on openssl. For users who don't have homebrew and
`homebrew/openssl` installed (which is why they would prefer the
installation script), running this git-xet on their system immediately
crashes.

This PR
- adds a feature "git2-vendored-openssl" that enables
"git2/vendored-openssl", which bakes openssl statically into git-xet,
- updates the GitHub Actions CI to build git-xet with this feature for
the macOS version.

This increases the git-xet binary size from ~9MB to ~13MB and drops
the `homebrew/openssl` linkage (comparing output of `otool -L git-xet`,
left is from `git-xet` before this change):
<img width="1456" height="390" alt="Screenshot 2026-01-28 at 3 33 39 PM"
src="https://github.com/user-attachments/assets/5c779d78-a042-45d8-99e5-95394db6e774"
/>

The official homebrew bottles for git-xet are not affected and still
use `homebrew/openssl` because they build from source (the above
feature is not enabled).
2026-02-04 11:03:07 -08:00
Hoyt Koepke
0ddc268757 CTRL-C interruption for spawn_blocking threads. (#632)
This PR enables easy checking of CTRL-C cancellation in spawn_blocking
threads, such as the background writer in the file reconstruction path
for downloads. It also adds that capability in two places that would
hold up CTRL-C interruption, namely the background loading of shard
files and the serial writer in the new adaptive concurrency file
reconstruction path.
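
The general pattern of checking for cancellation inside a blocking loop can be sketched as follows. The `WriteOutcome` type, the function name, and the use of a shared `AtomicBool` are assumptions for illustration; the PR's actual mechanism is not shown in this message.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
enum WriteOutcome {
    /// All chunks were written; holds the number of chunks.
    Completed(usize),
    /// Cancellation was observed; holds the number of chunks written so far.
    Cancelled(usize),
}

/// Hypothetical serial writer: checks a shared cancellation flag before each
/// chunk so a long-running blocking loop can stop promptly.
fn write_chunks(chunks: &[Vec<u8>], cancelled: &AtomicBool, sink: &mut Vec<u8>) -> WriteOutcome {
    for (i, chunk) in chunks.iter().enumerate() {
        if cancelled.load(Ordering::Relaxed) {
            return WriteOutcome::Cancelled(i);
        }
        sink.extend_from_slice(chunk);
    }
    WriteOutcome::Completed(chunks.len())
}
```

In a real program the flag would be set from a signal handler and the loop would run inside something like `tokio::task::spawn_blocking`; the check interval determines how quickly CTRL-C takes effect.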
2026-02-03 17:26:06 -08:00