Add Python 3.13t and 3.14t to the release builds. They don't follow the standard ABI (i.e. "abi3"), so they need to be built separately.
Update the build and the debug symbol generation and publishing workflow to allow different debug symbols for builds targeting different Python versions.
Update the diagnosis scripts so they find the correct debug symbols.
---------
Co-authored-by: di <di@huggingface.co>
Fix build break due to git safety checks.
The recent logging PR added the automatic logging of the git repository
commit hash that was used to build the wheel. However, this requires
querying for this hash at build time, which requires running git. It
turns out that in the CI, the user checking out the repository and the
user building the wheel are different, which makes git upset. This PR
adds code to tell git everything will be ok.
---------
Co-authored-by: di <di@huggingface.co>
This PR switches the default logging to log events to a file in
'~/.cache/huggingface/xet/logs' (or 'xet/logs' under the specified cache
directory if not `~/.cache/huggingface/`).
In this directory, log files older than 2 weeks are cleaned up on
process start, and if the total size of files in the directory is larger
than 1gb, then log files are deleted by age to get the directory size
under 1gb. Log files are named with a timestamp and PID; by default,
logs newer than 1 day or logs with an active associated PID are never
deleted. All of these are user configurable constants.
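The cleanup policy described above can be sketched as a pure function. This is a minimal illustration, not the actual implementation: the `LogFile` struct, the constant names, and `select_for_deletion` are hypothetical, "1gb" is assumed to mean 10^9 bytes, and the protection for recent or in-use logs is assumed to apply to both the age-based and the size-based pass.

```rust
use std::time::Duration;

/// Hypothetical summary of one log file; the real code may track other fields.
#[derive(Debug, Clone, PartialEq)]
struct LogFile {
    name: String,
    age: Duration,
    size_bytes: u64,
    pid_active: bool,
}

// Assumed defaults, taken from the description above.
const MAX_AGE: Duration = Duration::from_secs(14 * 24 * 60 * 60);
const MIN_DELETION_AGE: Duration = Duration::from_secs(24 * 60 * 60);
const MAX_DIR_SIZE: u64 = 1_000_000_000; // assumption: "1gb" = 10^9 bytes

/// Returns the names of the files to delete, oldest first.
fn select_for_deletion(files: &[LogFile]) -> Vec<String> {
    // Logs newer than one day, or owned by a live process, are never deleted.
    let deletable = |f: &LogFile| f.age >= MIN_DELETION_AGE && !f.pid_active;

    let mut remaining_size: u64 = files.iter().map(|f| f.size_bytes).sum();
    let mut candidates: Vec<&LogFile> = files.iter().filter(|f| deletable(f)).collect();
    // Sort so that the oldest candidates come first.
    candidates.sort_by(|a, b| b.age.cmp(&a.age));

    let mut to_delete = Vec::new();
    for f in candidates {
        // Delete expired files, then keep deleting by age while over the size cap.
        if f.age > MAX_AGE || remaining_size > MAX_DIR_SIZE {
            remaining_size -= f.size_bytes;
            to_delete.push(f.name.clone());
        }
    }
    to_delete
}
```

Because the candidates are sorted oldest first, the expired files are handled before the size cap is considered.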
The current hub client does not pass the revision as an argument, which
causes the moon-landing call to append the `create_pr=1` query param to the
token API and return a 403 error.
fix XET-741
Creates an OpenAPI specification for all CAS APIs following the first
version of the protocol specification.
Adds a Makefile to generate different language clients for the CAS APIs.
Fix python 3.14 build compat
1. `pyo3` dependency updated to `0.26`: this is required, or else it can't be
compiled for Python 3.14
2. version update to `1.1.11-dev0`
```
# removed
Fix compat with Rustc 1.86.0
3. change Rust conditions that were throwing `unstable` errors in `rustc 1.86.0 (05f9846f8 2025-03-31)` (a fairly new version, not the latest)
```
@hoytak @seanses @assafvayner
This PR
- extends the git-xet release workflow to code-sign the Windows executable
"git-xet.exe" using the Microsoft Trusted Signing Service;
- builds a Windows installer for git-xet that places "git-xet.exe" in the
system, modifies the system PATH environment variable, and runs the command
"git-xet install" to configure git-xet. On uninstallation from "Control
Panel\Programs\Programs and Features", it first runs "git-xet uninstall
--all" so that git-xet is deregistered from the git-lfs custom transfer;
- signs the built Windows installer msi file.
Currently, the full message given to a function in error_printer must
always be created. This causes extra work when there are no errors if the
message contains additional data.
This PR introduces `_fn` versions of the existing functions that call a
given function on demand to obtain the message. Thus `.warn_error_fn(||
format!("Error processing {context}"))` would only allocate and create
the error string when an error needs to be logged.
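A minimal sketch of the lazy-message idea follows. Only the method name `warn_error_fn` comes from the description above; the trait name `ErrorPrinterSketch`, the signature, the return type, and the use of `eprintln!` in place of a real logger are assumptions made for illustration.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for the crate's extension trait.
trait ErrorPrinterSketch {
    /// Logs a warning if `self` is an error, building the message on demand.
    fn warn_error_fn<F: FnOnce() -> String>(self, message_fn: F) -> Self;
}

impl<T, E: Debug> ErrorPrinterSketch for Result<T, E> {
    fn warn_error_fn<F: FnOnce() -> String>(self, message_fn: F) -> Self {
        if let Err(e) = &self {
            // The closure runs only on the error path, so the success path
            // pays for no string formatting or allocation.
            eprintln!("WARN: {}: {e:?}", message_fn());
        }
        self
    }
}
```

With this shape, a call such as `result.warn_error_fn(|| format!("Error processing {context}"))` leaves `Ok` values untouched and never evaluates the `format!`.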
fix XET-681
XET protocol specification initial draft
- documentation of core procedures required for file uploads and
downloads
- format specifications for shards and xorbs
The binaries all have the same name, "git-xet", so uploading them directly
leads to a release failure. We thus zip the binaries, with each zip file
named after the name argument in the upload-artifact action, i.e.
"git-xet-[linux|macos|windows]-${{ matrix.platform.target }}".
1. This PR updates the hub client xet access token request to use custom
rev in addition to the default "main". This better supports the
"xet-write-token" API authorization model:
Clients can get a xet write token, if
- the "rev" is a regular branch, with a HF write token;
- the "rev" is a PR branch with a corresponding open PR, with a HF
write or read token;
- it intends to create a PR and the repo is enabled for discussion, with a
HF write or read token.
2. Fixed a bug when getting the current branch name in a repo, which
didn't parse branch names with "/" correctly: change
`refs_heads_branch.rsplit('/').next()` to
`refs_heads_branch.strip_prefix("refs/heads/")`.
3. Also updated the xet transfer agent to use the refresh route in the LFS
Batch API
[response](e3be2b3c8f/server/app/gitHostingRoutes.ts (L1713)).
4. Use the session id in the LFS Batch API
[response](e3be2b3c8f/server/app/gitHostingRoutes.ts (L1657))
for token refresh and CAS requests.
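The branch-name fix in item 2 can be illustrated with the two expressions side by side. The wrapper function names are hypothetical; the bodies are the expressions quoted above.

```rust
/// Old behaviour: keeps only the text after the last '/'.
fn branch_name_old(refs_heads_branch: &str) -> Option<&str> {
    refs_heads_branch.rsplit('/').next()
}

/// New behaviour: strips the fixed prefix and keeps the full branch name.
fn branch_name_new(refs_heads_branch: &str) -> Option<&str> {
    refs_heads_branch.strip_prefix("refs/heads/")
}
```

For the ref `refs/heads/feature/login`, the old expression yields `login`, while the new one yields `feature/login`. The new expression also returns `None` for refs outside `refs/heads/`.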
This defines the workflow to build git-xet on Linux (amd64 & arm64),
macOS (amd64 & arm64) and Windows (amd64). For the macOS build the
compiled binary is signed and notarized in place.
- Duration: Currently, we use a lot of _MS and _SEC suffixes in
constants to denote duration. This PR allows std::time::Duration to be
used directly, with values such as "10sec" or "100ms" or "1d" translated
directly into std::time::Duration.
- ByteSize: It also introduces a new utility type, ByteSize, that simply
wraps a u64 but allows the user to specify "1mb" or "45gb" as the value
when setting constant values. The suffixes mb, mib, kb, kib, gb, gib, b,
etc. are all supported, with the default being the raw value.
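A rough sketch of how such suffixed values might be parsed is shown below. The function names, the exact set of accepted suffixes, and the decimal (kb, mb, gb) versus binary (kib, mib, gib) multipliers are assumptions; only the example values and the `ByteSize` wrapper around a u64 come from the description above.

```rust
use std::time::Duration;

/// Splits "100ms" into (100, "ms"); a bare number yields an empty suffix.
fn split_number_suffix(s: &str) -> Option<(u64, String)> {
    let s = s.trim();
    let digits_end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let value = s[..digits_end].parse::<u64>().ok()?;
    Some((value, s[digits_end..].trim().to_ascii_lowercase()))
}

/// Parses values such as "10sec", "100ms" or "1d".
fn parse_duration(s: &str) -> Option<Duration> {
    let (value, suffix) = split_number_suffix(s)?;
    let secs_per_unit: u64 = match suffix.as_str() {
        "ms" => return Some(Duration::from_millis(value)),
        "s" | "sec" => 1,
        "m" | "min" => 60,
        "h" => 3_600,
        "d" => 86_400,
        _ => return None, // assumption: unknown or missing suffix is rejected
    };
    value.checked_mul(secs_per_unit).map(Duration::from_secs)
}

/// Thin wrapper around a byte count.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ByteSize(u64);

/// Parses values such as "1mb" or "45gb"; a bare number is the raw value.
fn parse_byte_size(s: &str) -> Option<ByteSize> {
    let (value, suffix) = split_number_suffix(s)?;
    let multiplier: u64 = match suffix.as_str() {
        "" | "b" => 1,
        "kb" => 1_000,
        "kib" => 1 << 10,
        "mb" => 1_000_000,
        "mib" => 1 << 20,
        "gb" => 1_000_000_000,
        "gib" => 1 << 30,
        _ => return None,
    };
    value.checked_mul(multiplier).map(ByteSize)
}
```

Using `checked_mul` keeps an oversized value from wrapping silently; it is reported as a parse failure instead.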
Adds a corresponding diagnostics script for macOS.
Lightly tested with hf-xet 1.1.10 and MacOS 15.6.1 - correctly takes
stack traces on interval and writes out to diagnostics folder as
expected.
- Upgrade Rust edition and rustc version to bring in some nice features,
e.g. let chains instead of nested if block.
- Fix clippy and format due to the upgrade.
- Fix a bug identified by the new rustc:
6cb0a7fb4e/xet_runtime/src/runtime.rs (L195)
```
#[cfg(not(target_family = "wasm"))]
{
// A new multithreaded runtime with a capped number of threads
TokioRuntimeBuilder::new_multi_thread().worker_threads(get_num_tokio_worker_threads())
}
```
Here the closing curly brace drops the temporary builder while a `&mut
Self` to the dropped value is returned. (This may be due to a difference
between compilers in how they treat the scope of the "{...}" in
`#[cfg(...)] {...}`.)
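One common way to avoid this class of problem is to bind the builder to a named variable, so it outlives every `&mut Self` returned by its setters. The sketch below uses a hypothetical stand-in type rather than the real Tokio builder, and it is not necessarily the fix applied in this PR.

```rust
/// Hypothetical stand-in for a builder whose setters return `&mut Self`.
struct RuntimeBuilderSketch {
    worker_threads: usize,
}

impl RuntimeBuilderSketch {
    fn new_multi_thread() -> Self {
        Self { worker_threads: 1 }
    }

    fn worker_threads(&mut self, n: usize) -> &mut Self {
        self.worker_threads = n;
        self
    }

    /// Stand-in for building the runtime; returns the configured thread count.
    fn build(&mut self) -> usize {
        self.worker_threads
    }
}

fn build_runtime_sketch(num_threads: usize) -> usize {
    // The builder is owned by this function, not by a temporary inside a
    // `#[cfg(...)] { ... }` block, so no reference can outlive it.
    let mut builder = RuntimeBuilderSketch::new_multi_thread();
    builder.worker_threads(num_threads);
    builder.build()
}
```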
This PR adds two upgrades to the current configurable_constants!
macro, which lets users specify the values of configuration
constants through environment variables and the like:
- Allows bool values to be parsed by 0, 1, true, false, on, off, etc.
configurable_bool_constants! is no longer needed.
- Allows Option<T> to be a specified type with a default value of None,
which parses the environment value as type T but puts it in Some(Value)
if it's present and None if it's not specified. This allows us to
determine if a value has been specified, e.g. in the case where the
default depends on other things but can be overridden.
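The two parsing behaviours might look roughly like the sketch below. The function names are hypothetical, the functions take the raw string rather than reading the environment so they are easy to test, and only the six boolean spellings listed above are handled; any further spellings the macro accepts are not guessed here.

```rust
use std::str::FromStr;

/// Parses the boolean spellings listed above, case-insensitively.
fn parse_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "on" => Some(true),
        "0" | "false" | "off" => Some(false),
        _ => None,
    }
}

/// Models an `Option<T>` constant: `None` when the variable is not set,
/// `Some(value)` when it is set and parses as `T`.
/// Assumption: an unparsable value is also treated as `None`.
fn parse_optional<T: FromStr>(raw: Option<&str>) -> Option<T> {
    raw.and_then(|s| s.trim().parse::<T>().ok())
}
```

A caller could feed the second function from the environment with `std::env::var(name).ok().as_deref()`, then fall back to a computed default when the result is `None`.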
Added READMEs to a few crates so that we can link to a crate's directory
and have something rendered as a reference implementation when linking
to a specific file doesn't make sense.
- Adds diagnostic scripts to root of repo and references them in README.
- Also reorganizes README to make diagnostics & debugging more visible.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
We must enforce that the next boundary is actually past the minimum
chunk size. In rare cases, a boundary could be triggered before the
minimum chunk size has passed, and this triggering would be based not on
the content of the file but on the previous state of the rolling hash
function. This PR fixes this case.
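To illustrate the invariant, here is a toy content-defined chunker that only accepts a boundary once the minimum chunk size has been reached. The hash, the table, the parameters, and the function names are illustrative assumptions and may differ from the actual chunker in this repository.

```rust
/// Deterministic pseudo-random table for the toy rolling hash.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        // One xorshift64 step per table entry.
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        *entry = state;
    }
    table
}

/// Returns the chunk lengths for `data`. Requires `min_size <= max_size`.
fn chunk_lengths(data: &[u8], min_size: usize, max_size: usize, mask: u64) -> Vec<usize> {
    let table = gear_table();
    let mut lengths = Vec::new();
    let mut hash: u64 = 0;
    let mut len = 0usize;
    for &byte in data {
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        len += 1;
        let hash_hit = (hash & mask) == 0;
        // A hash hit only counts once the chunk has reached the minimum size.
        if (hash_hit && len >= min_size) || len >= max_size {
            lengths.push(len);
            // Reset so the next chunk does not depend on earlier state.
            hash = 0;
            len = 0;
        }
    }
    if len > 0 {
        lengths.push(len); // the trailing chunk may be shorter than min_size
    }
    lengths
}
```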
Allows us to retry errors when we receive I/O errors when sending
requests. For example, on macOS, when we see:
```
No buffer space available (os error 55)
```
when downloading from S3, we should wait and retry once the system has
more network resources.
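A retry loop of this kind could be sketched as follows. The helper name `retry_io`, the exponential backoff, and the choice to retry every I/O error are assumptions for illustration; the real client presumably decides which errors are retryable.

```rust
use std::io;
use std::thread::sleep;
use std::time::Duration;

/// Runs `op` up to `max_attempts` times, sleeping between failed attempts.
fn retry_io<T, F>(max_attempts: u32, base_delay: Duration, mut op: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    let mut attempt: u32 = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(_) if attempt < max_attempts => {
                // Wait longer after each failure to let the system recover.
                let factor = 2u32.saturating_pow(attempt - 1);
                sleep(base_delay.saturating_mul(factor));
                attempt += 1;
            }
            // Out of attempts: surface the last error to the caller.
            Err(e) => return Err(e),
        }
    }
}
```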
This PR builds a Git integration called `git-xet` that enables users to
upload files using the Xet protocol as part of a standard git push.
This integration builds on the Git LFS custom transfer adapter protocol,
the same mechanism we now use to handle Git LFS uploads for files larger
than 5 GB through multipart PUT.
To enable uploads to Xet, users run `git-xet install`, which writes the
following configuration to the Git config file at a selected scope
[`--system`, `--global` (default), or `--local`]:
```
[lfs "customtransfer.xet"]
path = git-xet
args = transfer
concurrent = true
```
This setup registers a new transfer adapter named xet, allowing Git to
delegate LFS file transfers to the git-xet binary when applicable.
On the server side, support is rolled out in two stages:
Stage 1 (Upload): The Git LFS batch API for the "upload" operation is
updated.
- If a repo is Xet enabled but users have not run `git-xet install`,
moon-landing rejects the request when users initiate `git push` and
returns an instruction to install git-xet.
- If a repo is Xet enabled and users have git-xet configured correctly,
moon-landing accepts the request and replies with CAS server URL and
access token, which git-xet will use to upload files to Xet.
- If a repo is NOT Xet enabled, upload goes through the LFS path.
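The three Stage 1 cases above amount to a small decision table, sketched here. The enum and function names are hypothetical and do not reflect the server's actual code.

```rust
/// Hypothetical outcome of the LFS batch "upload" request.
#[derive(Debug, PartialEq)]
enum UploadDecision {
    /// Xet-enabled repo, client without git-xet: reject with instructions.
    RejectWithInstallInstructions,
    /// Xet-enabled repo, client with git-xet: reply with CAS URL and token.
    AcceptViaXet,
    /// Repo not Xet enabled: use the regular LFS path.
    FallBackToLfs,
}

fn decide_upload(repo_xet_enabled: bool, client_has_git_xet: bool) -> UploadDecision {
    match (repo_xet_enabled, client_has_git_xet) {
        (true, false) => UploadDecision::RejectWithInstallInstructions,
        (true, true) => UploadDecision::AcceptViaXet,
        (false, _) => UploadDecision::FallBackToLfs,
    }
}
```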
Using the file hashing components in WASM revealed a bug: a 32-bit
usize causes errors when hashing the file.
This PR enforces the use of u64 everywhere along that path (and also
pins the wasm-bindgen version).
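The failure mode can be illustrated by summing chunk lengths with a 32-bit accumulator, standing in for usize on wasm32, next to a u64 accumulator. Both function names are hypothetical, and this is not the actual hashing path.

```rust
/// Sums lengths in u64, which has ample headroom for file sizes.
fn total_len_u64(chunk_lens: &[u64]) -> u64 {
    chunk_lens.iter().sum()
}

/// Simulates the same sum on a 32-bit target, with u32 standing in for usize.
/// Returns None when a length or the running total does not fit in 32 bits.
fn total_len_32bit(chunk_lens: &[u64]) -> Option<u32> {
    let mut total: u32 = 0;
    for &len in chunk_lens {
        if len > u64::from(u32::MAX) {
            return None;
        }
        total = total.checked_add(len as u32)?;
    }
    Some(total)
}
```

Two 3 GB pieces fit individually in 32 bits, but their total does not, which is the kind of overflow that using u64 throughout avoids.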
The xet_threadpool subdirectory is increasingly the place for utilities
related to the runtime, managing file handle limits, etc. This PR simply
renames this directory to reflect this switch.
Updates the paths used by the clients to the latest CAS paths as defined in
the spec.
All paths now use plural nouns, and shard upload no longer uses the hash.
This PR also removes the prefix and hash from the client trait's upload_shard function.
Related to https://github.com/huggingface/huggingface.js/pull/1718
We'll want to edit parts of a file while loading the old data's dedup info.
In those cases we don't always want to load dedup info for the first
chunk (since it may not be at the beginning of the file),
so the is_dedup = true for the first chunk is handled client side.
Fixes #465. Adapts #464.
Verified locally using `pip-licenses` and manual inspection of metadata.
Once merged will verify with PyPI through RC build.
@jsulz, @hoytak, @seanses, @assafvayner: Let me know if you want a
different email address as a maintainer - these tie into your PyPI user
profile.
---------
Co-authored-by: Jared Sulzdorf <j.sulzdorf@gmail.com>