
394 Commits

Author SHA1 Message Date
Hoyt Koepke
3096b3f9c3 Test suite for directory logging functionality (#536) 2025-10-24 10:06:26 -07:00
Hoyt Koepke
54aef8095f Improved logging for cas_client crate (#537) 2025-10-24 10:05:57 -07:00
Rajat Arya
f2d1587b82 Disable DiskCache in hf_xet, continue to use it in git_xet (#535)
1. hf_xet : disables DiskCache by default.
2. git_xet : continues to use DiskCache by default, set to 10GB as
before.

Tested manually locally.
2025-10-23 16:28:15 -07:00
Hoyt Koepke
f0777fcf2c Fix clippy issues with new rust version. (#533)
This PR fixes the clippy issues detected with rust 1.90 and
1.92+nightly.
2025-10-23 11:43:45 -07:00
omahs
84c9f4d412 Fix typos (#508)
Fix typos
2025-10-23 07:38:04 -07:00
Rajat Arya
7315f6d6da Add fallback if unable to get git commit (#531)
Specifies the fallback option on the `git_version!` macro, as
written in the docs for git-version-macro crate.
2025-10-22 08:46:42 -07:00
Rajat Arya
500ee6165d Adding python-version 3.13t and 3.14t to builds (#524)
Add Python 3.13t and 3.14t to Release builds. They don't follow the standard ABI ("abi3"), so they need to be built separately.
Update the build and debug symbol generation and publishing workflow to allow different debug symbols for builds targeting different Python versions.
Update the diagnosis scripts so they find the correct debug symbols.

---------

Co-authored-by: di <di@huggingface.co>
2025-10-20 17:14:54 -07:00
Hoyt Koepke
9fde4b72c0 Fix breaking build changes due to git safety checks. (#530)
Fix build break due to git safety checks. 

The recent logging PR added the automatic logging of the git repository
commit hash that was used to build the wheel. However, this requires
querying for this hash at build time, which requires running git. It
turns out that in the CI, the user checking out the repository and the
user building the wheel are different, which makes git upset. This PR
adds code to tell git everything will be ok.

---------

Co-authored-by: di <di@huggingface.co>
2025-10-20 11:56:15 -07:00
Hoyt Koepke
69f23d630e Logging to directory + log file management; default to log directory for hf_xet (#502)
This PR switches the default logging to log events to a file in
'~/.cache/huggingface/xet/logs' (or 'xet/logs' under the specified cache
directory if not `~/.cache/huggingface/`).

In this directory, log files older than 2 weeks are cleaned up on
process start, and if the total size of files in the directory is larger
than 1gb, then log files are deleted by age to get the directory size
under 1gb. Log files are named with a timestamp and PID; by default,
logs newer than 1 day or logs with an active associated PID are never
deleted. All of these are user configurable constants.
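
The cleanup rules above can be sketched as a pure function over file metadata. This is a minimal illustration, not the PR's implementation: `LogFile`, `Policy`, and `files_to_delete` are hypothetical names, and the thresholds are passed in rather than read from the configurable constants.

```rust
use std::time::Duration;

/// Metadata for one log file (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct LogFile {
    pub name: String,
    pub age: Duration,
    pub size: u64,
    pub pid_active: bool,
}

/// Thresholds from the description above (hypothetical field names).
pub struct Policy {
    pub max_age: Duration,     // files older than this are always deleted (2 weeks)
    pub protect_age: Duration, // files newer than this are never deleted (1 day)
    pub max_dir_size: u64,     // target total directory size in bytes (1 GB)
}

/// Returns the names of the files to delete, oldest first.
pub fn files_to_delete(files: &[LogFile], policy: &Policy) -> Vec<String> {
    // Recent files and files owned by a live process are never candidates.
    let mut candidates: Vec<&LogFile> = files
        .iter()
        .filter(|f| !f.pid_active && f.age >= policy.protect_age)
        .collect();
    // Oldest first, so expired files are handled before size-based deletion.
    candidates.sort_by(|a, b| b.age.cmp(&a.age));

    let mut total: u64 = files.iter().map(|f| f.size).sum();
    let mut doomed = Vec::new();
    for f in candidates {
        let expired = f.age > policy.max_age;
        let over_budget = total > policy.max_dir_size;
        if expired || over_budget {
            total -= f.size;
            doomed.push(f.name.clone());
        }
    }
    doomed
}
```

Protected files still count toward the directory total, so the directory can stay above the size limit when everything that remains is recent or in use.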
2025-10-20 14:35:43 +02:00
Di Xiao
2eec20baf1 Create README.md for Git-Xet (#529)
Added a README file for Git-Xet, detailing installation and
functionality.

---------

Co-authored-by: Jared Sulzdorf <j.sulzdorf@gmail.com>
2025-10-16 15:08:13 -07:00
Sam Horradarn
89e549089a fix: explicitly specify main branch for hub client in migration utility (#522)
The current hub client does not pass the revision as an argument, which
causes the moon-landing call to append the `create_pr=1` query param to the
token API request and return a 403 error.
2025-10-03 12:16:19 -07:00
Assaf Vayner
9f69239322 openapi spec and Makefile for it (#518)
fix XET-741

Creates an openapi specification for all CAS API's following the first
version of the protocol specification.

Makefile to generate different language clients for CAS APIs.
2025-10-03 10:24:44 -07:00
Di Xiao
a31df60741 Upgrade macos-13 to macos-15-intel due to closing down (#521)
Upgrade the Intel macos runner due to [GitHub Actions: macOS 13 runner
image is closing
down](https://github.blog/changelog/2025-09-19-github-actions-macos-13-runner-image-is-closing-down/).
2025-10-02 14:55:18 -07:00
Qubitium-ModelCloud
975e867b96 Fix python 314 compat (#520)
Fix python 3.14 build compat

1. `pyo3` dependency updated to `0.26`: this is required, or else it can't be
compiled for Python 3.14
2. version update to `1.1.11-dev0`
```
# removed
Fix compat with Rustc 1.86.0
3. change rust conditions that were throwing `unstable` `errors` in `rustc 1.86.0 (05f9846f8 2025-03-31)` (fairly new version, not the latest)
```
@hoytak  @seanses  @assafvayner
2025-10-02 14:25:10 -07:00
Di Xiao
6fbde98e5e git-xet Windows installer and code signing (#519)
This PR
- extends the git-xet release workflow to code-sign the Windows executable
"git-xet.exe" using the Microsoft Trusted Signing Service;
- builds a Windows installer for git-xet that places "git-xet.exe" in the
system, modifies the system PATH environment variable, and runs the command
"git-xet install" to configure git-xet. On uninstallation from "Control
Panel\Programs\Programs and Features", it first runs "git-xet uninstall
--all" so that it is deregistered from git-lfs custom transfer;
- signs the built Windows installer msi file.
git-xet-v0.1.0
2025-10-02 12:40:09 -07:00
Assaf Vayner
28dd760892 rm all docs files (#517)
Removes docs in xet-core in favor of docs in hub-docs for the xet
protocol: https://github.com/huggingface/hub-docs/pull/1963
2025-09-30 18:12:03 -07:00
Assaf Vayner
5565b37e03 integrate docs debugging (#516)
change docs upload to xet-protocol rather than xet-upload
2025-09-29 15:38:13 -07:00
Assaf Vayner
f0895142cb move spec to docs (#515)
publish to hub docs out of xet-core for xet-spec. Need to merge this
first before iterating to get the github workflows working right.
2025-09-29 12:37:21 -07:00
Hoyt Koepke
4176674a7e Added lazy evaluation functionality to error printer. (#510)
Currently, the full message given to a function in error_printer must
always be created. This causes extra work when there are no errors if the
message should contain additional data.

This PR introduces `_fn` versions of the existing functions that call a
given function on demand to obtain the message. Thus `.warn_error_fn(||
format!("Error processing {context}"))` would only allocate and create
the error string when an error needs to be logged.
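
A minimal sketch of the eager and lazy variants, assuming an extension trait on `Result`. The trait name and the `eprintln!` output are stand-ins chosen for illustration; the actual error_printer crate may be structured differently.

```rust
/// Hypothetical extension trait showing an eager and a lazy variant.
pub trait ErrorPrinter {
    fn warn_error(self, message: &str) -> Self;
    fn warn_error_fn<F: FnOnce() -> String>(self, message_fn: F) -> Self;
}

impl<T, E: std::fmt::Debug> ErrorPrinter for Result<T, E> {
    fn warn_error(self, message: &str) -> Self {
        // The caller has already built `message`, even if this is `Ok`.
        if let Err(e) = &self {
            eprintln!("WARN: {message}: {e:?}");
        }
        self
    }

    fn warn_error_fn<F: FnOnce() -> String>(self, message_fn: F) -> Self {
        if let Err(e) = &self {
            // The closure runs only on the error path, so the success path
            // pays for neither the formatting nor the allocation.
            eprintln!("WARN: {}: {e:?}", message_fn());
        }
        self
    }
}
```

Both methods return the `Result` unchanged, so they can be chained in front of `?` or further combinators.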
2025-09-29 11:23:39 -07:00
Assaf Vayner
0958579c40 spec draft (#422)
fix XET-681

XET protocol specification initial draft

- documentation of core procedures required for file uploads and
downloads
- format specifications for shards and xorbs
2025-09-29 10:25:25 -07:00
SuperKenVery
94fa9449bb Enable socks5 proxy support (#474)
Tested on user's machine with the socks5 proxy specified in `all_proxy` env var.

Co-authored-by: Hoyt Koepke <hoytak@huggingface.co>
2025-09-26 14:25:23 -07:00
Assaf Vayner
c55fabb6bf hashing and chunking example tools (#496)
Adds some basic example tools (compiled with `cargo build --examples`
on the `data` crate) to compute hashes and chunk boundaries.
2025-09-26 12:49:55 -07:00
Di Xiao
8ee0a5c958 Cache rust build in actions (#513)
In response to [A Joint Statement on Sustainable
Stewardship](https://openssf.org/blog/2025/09/23/open-infrastructure-is-not-free-a-joint-statement-on-sustainable-stewardship/)
and [Rust Foundation Signs Joint Statement on Open Source Infrastructure
Stewardship](https://rustfoundation.org/media/rust-foundation-signs-joint-statement-on-open-source-infrastructure-stewardship/),
implements caching of dependencies and build artifacts, and reduces some CI
runtime. Cache entry keys are formed by `os_type`-`arch_type`-`hash of
Cargo.lock`; the cache configuration is adapted from
https://docs.github.com/en/actions/tutorials/build-and-test-code/rust#caching-dependencies.
2025-09-26 11:16:04 -07:00
Di Xiao
a8b4764761 Convert status code to error for get_cas_jwt (#509)
Let the error status code be caught early instead of later during json
deserialization.
2025-09-24 12:26:34 -07:00
Di Xiao
f3245326b0 git-xet install script for Linux & macOS (#503)
Try this out:
1. Run the following in your terminal to install git-xet (requires
`curl` and `unzip`):
```
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/d128ff7f5bf029fba5f2f561f3c597c0f22b5147/git_xet/install.sh | sh
```
2. (After
https://github.com/huggingface-internal/moon-landing/pull/15121 gets
deployed to prod, or to your local moon-landing) Follow your regular
workflow: `git lfs track ...` & `git add ...` & `git commit ...` &
`git push`

Resolves XET-747
2025-09-24 08:49:06 -07:00
Di Xiao
76fe533f81 Fix git-xet release bug (#505)
The binaries have the same name "git-xet", uploading them directly leads
to a release failure. We thus zip the binaries with each zip file name
being the name argument in the upload-artifact actions, i.e.
"git-xet-[linux|macos|windows]-${{ matrix.platform.target }}".
2025-09-23 16:16:02 -07:00
Di Xiao
75952ae618 Better support "xet-write-token" API authorization model and LFS Batch Api change (#498)
1. This PR updates the hub client xet access token request to use custom
rev in addition to the default "main". This better supports the
"xet-write-token" API authorization model:
Clients can get a xet write token, if
- the "rev" is a regular branch, with a HF write token;
- the "rev" is a pr branch with a corresponding open PR, with a HF
write or read token;
- it intends to create a pr and repo is enabled for discussion, with a
HF write or read token.

2. Fixed a bug when getting the current branch name in a repo, which
didn't parse branch names with "/" correctly: change
`refs_heads_branch.rsplit('/').next()` to
`refs_heads_branch.strip_prefix("refs/heads/")`.

3. Also updated xet transfer agent to use the refresh route in the LFS
Batch Api
[response](e3be2b3c8f/server/app/gitHostingRoutes.ts (L1713)).

4. Use the session id in the LFS Batch Api
[response](e3be2b3c8f/server/app/gitHostingRoutes.ts (L1657))
for token refresh and CAS requests.
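
The branch-name fix in item 2 can be illustrated with the two expressions side by side. The function names are hypothetical wrappers around the expressions quoted in the commit message.

```rust
/// Old behavior: keep only the text after the last '/'.
pub fn branch_name_old(refs_heads_branch: &str) -> Option<&str> {
    refs_heads_branch.rsplit('/').next()
}

/// Fixed behavior: strip the "refs/heads/" prefix and keep everything after it.
pub fn branch_name(refs_heads_branch: &str) -> Option<&str> {
    refs_heads_branch.strip_prefix("refs/heads/")
}
```

For `refs/heads/feature/login`, the old expression yields `login`, while the fixed one yields `feature/login`. `strip_prefix` also returns `None` for refs outside `refs/heads/`, which makes the non-branch case explicit.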
2025-09-23 16:07:54 -07:00
Di Xiao
15942e295e Fix git xet release bug (#504)
`gh release create` creates tags and thus requires repo checkout
2025-09-22 16:57:35 -07:00
Di Xiao
55234c489b Build and release git-xet (#499)
This defines the workflow to build git-xet on Linux (amd64 & arm64),
macOS (amd64 & arm64) and Windows (amd64). For the macOS build the
compiled binary is signed and notarized in place.
2025-09-22 15:44:02 -07:00
Hoyt Koepke
610874ab04 Allow Duration and byte sizes in constants for easier use. (#495)
- Duration: Currently, we use a lot of _MS and _SEC suffixes in
constants to denote duration. This PR allows std::time::Duration to be
used directly, with values such as "10sec" or "100ms" or "1d" translated
directly into std::time::Duration.

- ByteSize: It also introduces a new utility type, ByteSize, that simply
wraps a u64 but allows the user to specify "1mb" or "45gb" as the value
when setting constant values. The suffixes mb, mib, kb, kib, gb, gib, b,
etc. are all supported, with the default being the raw value.
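
A rough sketch of how a byte-size value with a unit suffix might be parsed. This `ByteSize` is a hypothetical stand-in, not the PR's type, and the unit multipliers are an assumption (decimal units as powers of 1000, binary units as powers of 1024).

```rust
/// Hypothetical stand-in for the `ByteSize` wrapper described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteSize(pub u64);

impl std::str::FromStr for ByteSize {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim().to_ascii_lowercase();
        // Split into the leading digits and the trailing unit suffix.
        let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
        let (digits, unit) = s.split_at(split);
        let value: u64 = digits
            .parse()
            .map_err(|e| format!("bad number {digits:?}: {e}"))?;
        // Assumption: kb/mb/gb are decimal, kib/mib/gib are binary.
        let multiplier: u64 = match unit.trim() {
            "" | "b" => 1, // no suffix means the raw value
            "kb" => 1_000,
            "mb" => 1_000_000,
            "gb" => 1_000_000_000,
            "kib" => 1 << 10,
            "mib" => 1 << 20,
            "gib" => 1 << 30,
            other => return Err(format!("unknown unit {other:?}")),
        };
        value
            .checked_mul(multiplier)
            .map(ByteSize)
            .ok_or_else(|| "value overflows u64".to_string())
    }
}
```

With this sketch, `"45gb".parse::<ByteSize>()` gives `ByteSize(45_000_000_000)` and a bare `"1024"` is taken as a raw byte count.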
2025-09-19 10:59:11 -07:00
Rajat Arya
f612564c25 MacOS diag scripts (#497)
Adds corresponding Diagnostics script for MacOS.

Lightly tested with hf-xet 1.1.10 and MacOS 15.6.1 - correctly takes
stack traces on interval and writes out to diagnostics folder as
expected.
2025-09-17 15:07:35 -07:00
Di Xiao
fa030edcd5 upgrade rust edition to 2024; upgrade rustc to 1.89 (#494)
- Upgrade Rust edition and rustc version to bring in some nice features,
e.g. let chains instead of nested if block.
- Fix clippy and format due to the upgrade.
- Fix a bug identified by the new rustc:
6cb0a7fb4e/xet_runtime/src/runtime.rs (L195)
```
#[cfg(not(target_family = "wasm"))]
{
    // A new multithreaded runtime with a capped number of threads
    TokioRuntimeBuilder::new_multi_thread().worker_threads(get_num_tokio_worker_threads())
}
```
Here the closing curly bracket drops the temporary builder while a `&mut
Self` to the dropped value is returned. (This may be due to a difference
between compilers regarding how they treat the scope of "{...}" in
`#[cfg(...)] {...}`?)
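
One common way to avoid this class of problem is to bind the builder to a named local before calling the `&mut self` setters. The sketch below uses a hypothetical `RuntimeBuilder` in place of the Tokio builder so that it stays self-contained; it illustrates the pattern and is not necessarily the change made in this PR.

```rust
/// Hypothetical builder whose setters return `&mut Self`.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RuntimeBuilder {
    pub worker_threads: usize,
}

impl RuntimeBuilder {
    pub fn new_multi_thread() -> Self {
        Self::default()
    }

    pub fn worker_threads(&mut self, n: usize) -> &mut Self {
        self.worker_threads = n;
        self
    }
}

pub fn make_builder(n: usize) -> RuntimeBuilder {
    // Problematic shape: a block whose tail expression is
    //     RuntimeBuilder::new_multi_thread().worker_threads(n)
    // evaluates to a `&mut Self` that points at a temporary, and that
    // temporary is dropped at the closing brace of the block.
    //
    // Safer shape: own the builder in a named local, configure it, then
    // return it by value.
    let mut builder = RuntimeBuilder::new_multi_thread();
    builder.worker_threads(n);
    builder
}
```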
2025-09-17 10:28:50 -07:00
Hoyt Koepke
6cb0a7fb4e Improved user-configurable constant handling (#493)
This PR adds in two upgrades to the current configurable_constants!
macro that allows for users to specify the values of configuration
constants using environment variables and the like. It adds two things:
- Allows bool values to be parsed by 0, 1, true, false, on, off, etc.
configurable_bool_constants! is no longer needed.
- Allows Option<T> to be a specified type with a default value of None,
which parses the environment value as type T but puts it in Some(Value)
if it's present and None if it's not specified. This allows us to
determine if a value has been specified, e.g. in the case where the
default depends on other things but can be overridden.
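
The two parsing rules can be sketched as plain functions. This is an illustration only: the function names are hypothetical, the accepted boolean spellings beyond those listed above are an assumption, and the actual macro may treat unparseable values differently.

```rust
/// Parses the boolean spellings listed above ("yes"/"no" are an assumption).
pub fn parse_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "on" | "yes" => Some(true),
        "0" | "false" | "off" | "no" => Some(false),
        _ => None,
    }
}

/// Sketch of the `Option<T>` case: `None` when the variable is unset,
/// `Some(value)` when it is set and parses as `T`.
/// In this sketch an unparseable value also yields `None`.
pub fn parse_optional<T: std::str::FromStr>(raw: Option<&str>) -> Option<T> {
    raw.and_then(|s| s.trim().parse().ok())
}
```

A caller could feed it an environment variable with `parse_optional::<u64>(std::env::var("SOME_NAME").ok().as_deref())`, where `SOME_NAME` is a placeholder, and fall back to a computed default when the result is `None`.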
2025-09-16 12:43:33 -07:00
Hoyt Koepke
a715926cc7 Rename Threadpool class name to XetRuntime to reflect usage (#491)
The Threadpool class does quite a bit more than just manage a
threadpool; this PR simply changes the name to reflect this usage.
2025-09-15 11:30:28 -07:00
Assaf Vayner
22f86db343 Adding README to few crates for documentation (#492)
Added a README to a few crates so that we can link to a crate's directory
and have something rendered as a reference implementation when linking to
a specific file doesn't make sense.
2025-09-15 11:15:05 -07:00
Assaf Vayner
81b0833965 hf_xet 1.1.10 (#490)
update version of hf_xet
v1.1.10-rc0 v1.1.10
2025-09-11 14:57:19 -07:00
Rajat Arya
c762c681ef Diagnostic Scripts + README changes (#489)
- Adds diagnostic scripts to root of repo and references them in README.
- Also reorganizes README to make diagnostics & debugging more visible.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-11 14:54:22 -07:00
Hoyt Koepke
fe71e3dd54 Updated chunker to eliminate spurious boundary triggering. (#487)
We must enforce that the next boundary is actually past the minimum
chunk size. In rare cases, a boundary could be triggered before the
minimum chunk size has passed, and this triggering would be based not on
the content of the file but on the previous state of the rolling hash
function. This PR fixes this case.
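
The invariant can be sketched with a toy content-defined chunker. This is not the repository's chunker: the hash, the `gear` mixing function, and all parameters are stand-ins. It shows only the two properties named above: the hash state starts fresh for each chunk, and a boundary is tested only once the minimum size has been reached.

```rust
/// Returns chunk lengths for `data`; every chunk except possibly the last
/// is at least `min_size` and at most `max_size` bytes long.
pub fn chunk_lengths(data: &[u8], min_size: usize, max_size: usize, mask: u64) -> Vec<usize> {
    assert!(min_size >= 1 && min_size <= max_size);
    let mut lengths = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let remaining = &data[start..];
        let limit = remaining.len().min(max_size);
        // Fresh state: nothing carries over from the previous chunk.
        let mut hash: u64 = 0;
        let mut cut = limit;
        for (i, &byte) in remaining[..limit].iter().enumerate() {
            hash = (hash << 1).wrapping_add(gear(byte));
            let len = i + 1;
            // Only test for a boundary once the minimum size is reached.
            if len >= min_size && (hash & mask) == 0 {
                cut = len;
                break;
            }
        }
        lengths.push(cut);
        start += cut;
    }
    lengths
}

/// Stand-in byte-mixing function (SplitMix64 finalizer) used instead of a
/// precomputed gear table.
fn gear(byte: u8) -> u64 {
    let mut z = u64::from(byte).wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}
```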
2025-09-10 16:52:48 -07:00
Joseph Godlewski
f24c97eedb Adding retry for unhandled io errors when sending requests (#468)
Allows us to retry errors when we receive I/O errors when sending
requests. For example, on macOS, when we see:
```
No buffer space available (os error 55)
```
when downloading from S3, we should wait and retry once the system has
more network resources.
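
A synchronous sketch of the wait-and-retry idea, assuming exponential backoff. The function name, attempt limit, and delays are hypothetical; the real client is asynchronous and may decide which errors are retryable in a different way.

```rust
use std::io;
use std::thread::sleep;
use std::time::Duration;

/// Retries `op` on I/O errors, doubling the delay after each failure.
/// Gives up and returns the last error after `max_attempts` tries.
pub fn retry_io<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt < max_attempts => {
                eprintln!("attempt {attempt} failed ({err}); retrying in {delay:?}");
                // Give the system time to free up network resources.
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}
```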
2025-09-10 16:44:05 -07:00
Hoyt Koepke
263020646f Fix wheel upload for linux for dev/alpha/beta tags (#379)
When using beta tags, the upload process doesn't have the wheels in the
correct directory. This fixes that.
2025-09-09 10:29:27 -07:00
Di Xiao
e2f7861809 Drop "GaiResolverWithAbsolute" (#486)
Removes this custom DNS resolver that forces domain names to be
absolute. Using relative domain names is a valid practice in Kubernetes
clusters to configure proxy servers, e.g.
https://github.com/huggingface/huggingface_hub/issues/3323

The original [PR](https://github.com/huggingface/xet-core/pull/383) that
added this resolver was trying to resolve
https://github.com/huggingface/huggingface_hub/issues/3155, which later
appeared to be a problem of local user misconfiguration.
2025-09-09 10:26:56 -07:00
Di Xiao
0e1f9f4cf0 Git-Xet: LFS custom transfer agent with Xet protocol (#425)
This PR builds a Git integration called `git-xet` that enables users to
upload files using the Xet protocol as part of a standard git push.

This integration builds on the Git LFS custom transfer adapter protocol,
the same mechanism we now use to handle Git LFS uploads for files larger
than 5 GB through multipart PUT.
To enable uploads to Xet, users run `git-xet install`, which writes the
following configuration to the Git config file at a selected scope
[`--system`, `--global` (default), or `--local`]:
```
[lfs "customtransfer.xet"]
	path = git-xet
	args = transfer
	concurrent = true
```
This setup registers a new transfer adapter named xet, allowing Git to
delegate LFS file transfers to the git-xet binary when applicable.

On the server side, support is rolled out in two stages:

Stage 1 (Upload): The Git LFS batch API for the "upload" operation is
updated.

- If a repo is Xet enabled but users didn't run git-xet install,
moon-landing rejects the request when users initiate a git push and
returns an instruction to install git-xet.

- If a repo is Xet enabled and users have git-xet configured correctly,
moon-landing accepts the request and replies with CAS server URL and
access token, which git-xet will use to upload files to Xet.

- If a repo is NOT Xet enabled, upload goes through the LFS path.
2025-09-08 16:08:50 -07:00
Assaf Vayner
e01896e074 use u64 rather than usize in file hashing paths (#485)
Using the file hashing components in WASM exposed a bug: a 32-bit
usize causes errors when hashing the file.

This PR enforces the use of u64 everywhere along that path (and also
pins the wasm-bindgen version)
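
The failure mode can be illustrated by summing chunk lengths in a 32-bit accumulator, which stands in here for `usize` on a 32-bit WASM target. The function names are hypothetical, and wrapping arithmetic is used only to make the overflow visible.

```rust
/// Total length accumulated in 64 bits: correct for files larger than 4 GiB.
pub fn total_len_u64(chunk_lens: &[u32]) -> u64 {
    chunk_lens.iter().map(|&n| u64::from(n)).sum()
}

/// Total length accumulated in 32 bits, as a 32-bit `usize` would:
/// the sum silently wraps once it passes `u32::MAX`.
pub fn total_len_u32(chunk_lens: &[u32]) -> u32 {
    chunk_lens.iter().fold(0u32, |acc, &n| acc.wrapping_add(n))
}
```

For lengths `[u32::MAX, 2]`, the 64-bit total is 4294967297 while the 32-bit total wraps around to 1.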
2025-09-08 14:27:58 -07:00
Hoyt Koepke
4d948d1a76 Rename xet_threadpool to xet_runtime to reflect usage (#484)
The xet_threadpool subdirectory is increasingly the place for utilities
related to the runtime, managing file handle limits, etc. This PR simply
renames this directory to reflect this switch.
2025-09-08 13:32:48 -07:00
Assaf Vayner
6203653ecf update api paths to use plural nouns (#482)
Updates paths used by the clients to use latest CAS paths as defined in
the spec.

All paths now use plural nouns, and shard upload no longer uses the hash;
this removes the prefix and hash from the client trait's upload_shard function.
2025-09-08 13:02:49 -07:00
Eliott C.
3ff4eb2d56 Thin wasm: do not automatically set is_dedup to true for first chunk (#481)
Related to https://github.com/huggingface/huggingface.js/pull/1718

We'll want to edit parts of a file while loading the old data's dedup info.

In those cases we don't always want to load dedup info for the first
chunk (since it may not be at the beginning of the file).

So setting is_dedup = true for the first chunk is handled client side.
2025-09-06 09:31:58 +02:00
Rajat Arya
cc247a9d5a Add input params to Run name in GH Workflow UI (#478) 2025-08-29 09:21:24 -07:00
Rajat Arya
7f53907434 Bumping version to 1.1.9 (#476) v1.1.9-rc1 v1.1.9 1.1.9-rc1 2025-08-27 15:35:45 -07:00
Rajat Arya
003b154284 Update hf_xet/README.md for hf_xet project (#475)
- wrote hf_xet/README.md about hf_xet
- verified sdist build is successful
- moved docs from hf_xet/README.md to xet-core/README.md
2025-08-27 15:27:56 -07:00
Rajat Arya
50ced6cb65 Update PyPI package metadata for hf-xet (#472)
Fixes #465. Adapts #464.

Verified locally using `pip-licenses` and manual inspection of metadata.

Once merged will verify with PyPI through RC build.

@jsulz, @hoytak, @seanses, @assafvayner: Let me know if you want a
different email address as a maintainer - these tie into your PyPI user
profile.

---------

Co-authored-by: Jared Sulzdorf <j.sulzdorf@gmail.com>
2025-08-26 11:15:52 -07:00