1. This PR updates the hub client xet access token request to accept a
custom rev in addition to the default "main". This better supports the
"xet-write-token" API authorization model:
Clients can get a xet write token if
- the "rev" is a regular branch, with a HF write token;
- the "rev" is a PR branch with a corresponding open PR, with a HF
write or read token;
- the client intends to create a PR and the repo has discussions
enabled, with a HF write or read token.
2. Fixed a bug when getting the current branch name in a repo, which
didn't parse branch names with "/" correctly: change
`refs_heads_branch.rsplit('/').next()` to
`refs_heads_branch.strip_prefix("refs/heads/")`.
3. Also updated the xet transfer agent to use the refresh route in the
LFS Batch API
[response](e3be2b3c8f/server/app/gitHostingRoutes.ts (L1713)).
4. Use the session id in the LFS Batch API
[response](e3be2b3c8f/server/app/gitHostingRoutes.ts (L1657))
for token refresh and CAS requests.
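The branch-name fix in item 2 can be illustrated with a small sketch.
The function names below are hypothetical; only the two expressions
inside them come from the PR description.

```rust
/// Old behaviour: keep only the text after the last '/'.
/// This loses everything before the final slash of the branch name.
fn branch_name_old(refs_heads_branch: &str) -> Option<&str> {
    refs_heads_branch.rsplit('/').next()
}

/// New behaviour: drop the "refs/heads/" prefix and keep the rest intact.
/// Returns None when the ref is not a branch ref at all.
fn branch_name_new(refs_heads_branch: &str) -> Option<&str> {
    refs_heads_branch.strip_prefix("refs/heads/")
}
```

For `refs/heads/feature/login` the old expression yields `login`, while
the new one yields `feature/login`.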
This PR defines the workflow to build git-xet on Linux (amd64 & arm64),
macOS (amd64 & arm64) and Windows (amd64). For the macOS build the
compiled binary is signed and notarized in place.
- Duration: Currently, we use a lot of _MS and _SEC suffixes in
constants to denote duration. This PR allows std::time::Duration to be
used directly, with values such as "10sec" or "100ms" or "1d" translated
directly into std::time::Duration.
- ByteSize: It also introduces a new utility type, ByteSize, that simply
wraps a u64 but allows the user to specify "1mb" or "45gb" as the value
when setting constant values. The suffixes mb, mib, kb, kib, gb, gib, b,
etc. are all supported; a value with no suffix is taken as the raw
value.
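A minimal sketch of how such suffixed values could be parsed, not the
actual implementation. The function names, the exact suffix set, and
the choice of decimal multipliers for kb/mb/gb versus binary
multipliers for kib/mib/gib are all assumptions made for illustration.

```rust
use std::time::Duration;

/// Split "100ms" into the numeric part (100) and the lowercased suffix ("ms").
fn split_value(s: &str) -> Option<(u64, String)> {
    let s = s.trim();
    let idx = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let number = s[..idx].parse::<u64>().ok()?;
    Some((number, s[idx..].trim().to_ascii_lowercase()))
}

/// Hypothetical duration parser; the accepted suffixes are an assumption.
fn parse_duration(s: &str) -> Option<Duration> {
    let (n, unit) = split_value(s)?;
    let seconds_per_unit: u64 = match unit.as_str() {
        "ms" => return Some(Duration::from_millis(n)),
        "s" | "sec" => 1,
        "m" | "min" => 60,
        "h" => 3_600,
        "d" => 86_400,
        _ => return None,
    };
    n.checked_mul(seconds_per_unit).map(Duration::from_secs)
}

/// Hypothetical byte-size parser; a bare number is taken as a raw byte count.
fn parse_byte_size(s: &str) -> Option<u64> {
    let (n, unit) = split_value(s)?;
    let multiplier: u64 = match unit.as_str() {
        "" | "b" => 1,
        "kb" => 1_000,
        "kib" => 1 << 10,
        "mb" => 1_000_000,
        "mib" => 1 << 20,
        "gb" => 1_000_000_000,
        "gib" => 1 << 30,
        _ => return None,
    };
    n.checked_mul(multiplier)
}
```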
Adds the corresponding diagnostics script for macOS.
Lightly tested with hf-xet 1.1.10 and macOS 15.6.1: it correctly takes
stack traces at each interval and writes them out to the diagnostics
folder as expected.
- Upgrade Rust edition and rustc version to bring in some nice features,
e.g. let chains instead of nested if blocks.
- Fix clippy and format due to the upgrade.
- Fix a bug identified by the new rustc:
6cb0a7fb4e/xet_runtime/src/runtime.rs (L195)
```
#[cfg(not(target_family = "wasm"))]
{
// A new multithreaded runtime with a capped number of threads
TokioRuntimeBuilder::new_multi_thread().worker_threads(get_num_tokio_worker_threads())
}
```
Here the closing curly brace drops the temporary builder while a `&mut
Self` referring to the dropped value is returned. (This may be due to a
difference between compiler versions in how they treat the scope of the
`{...}` in `#[cfg(...)] {...}`.)
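One way to avoid returning a reference to a dropped temporary is to
bind the builder to a named variable before calling its `&mut self`
setters. The sketch below uses a stand-in `RuntimeBuilder` type (an
assumption, so the example needs no tokio dependency); it is not
necessarily the change made in this PR.

```rust
/// Stand-in for a builder whose setters take and return `&mut Self`.
struct RuntimeBuilder {
    worker_threads: usize,
}

impl RuntimeBuilder {
    fn new_multi_thread() -> Self {
        RuntimeBuilder { worker_threads: 1 }
    }

    fn worker_threads(&mut self, n: usize) -> &mut Self {
        self.worker_threads = n;
        self
    }

    /// A real builder would return a runtime; this returns the thread count.
    fn build(&mut self) -> usize {
        self.worker_threads
    }
}

fn build_runtime(num_threads: usize) -> usize {
    // The builder is owned by a named variable, so it outlives every
    // `&mut Self` handed back by the setters called inside the cfg block.
    let mut builder = RuntimeBuilder::new_multi_thread();

    #[cfg(not(target_family = "wasm"))]
    {
        builder.worker_threads(num_threads);
    }

    builder.build()
}
```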
This PR adds two upgrades to the current configurable_constants! macro,
which lets users specify the values of configuration constants using
environment variables and the like:
- Allows bool values to be parsed from 0, 1, true, false, on, off, etc.
configurable_bool_constants! is no longer needed.
- Allows Option<T> to be a specified type with a default value of None,
which parses the environment value as type T but puts it in Some(Value)
if it's present and None if it's not specified. This allows us to
determine if a value has been specified, e.g. in the case where the
default depends on other things but can be overridden.
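The two parsing rules above can be sketched as plain functions. This is
an illustration under assumptions, not the macro's expansion: the
function names are hypothetical, and "yes"/"no" are guesses at what the
"etc." covers.

```rust
use std::str::FromStr;

/// Parse a bool from common spellings; returns None if unrecognised.
fn parse_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "on" | "yes" => Some(true),
        "0" | "false" | "off" | "no" => Some(false),
        _ => None,
    }
}

/// Option<T> semantics: `raw` is the environment value if one was set.
/// A set value is parsed as T and wrapped in Some; an unset value is None.
/// Treating an unparsable value as None is an assumption of this sketch.
fn parse_optional<T: FromStr>(raw: Option<&str>) -> Option<T> {
    raw.and_then(|value| value.trim().parse::<T>().ok())
}
```

In real use `raw` could come from
`std::env::var("SOME_NAME").ok().as_deref()`, where `SOME_NAME` is a
placeholder.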
Added a README to a few crates so that, when linking to a specific file
doesn't make sense, we can link to the crate directory as the reference
implementation and have something rendered there.
- Adds diagnostic scripts to root of repo and references them in README.
- Also reorganizes README to make diagnostics & debugging more visible.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
We must enforce that the next boundary is actually past the minimum
chunk size. In rare cases, a boundary could be triggered before the
minimum chunk size has passed, and this triggering would be based not on
the content of the file but on the previous state of the rolling hash
function. This PR fixes this case.
Allows us to retry when we receive I/O errors while sending requests.
For example, on macOS, when we see:
```
No buffer space available (os error 55)
```
when downloading from S3, we should wait and retry once the system has
more network resources.
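The wait-and-retry behaviour can be sketched with the standard library
alone. The helper name, the exponential backoff, and the blocking
`sleep` are assumptions of this sketch; the client itself is async and
uses its own retry utility.

```rust
use std::io;
use std::thread::sleep;
use std::time::Duration;

/// Run `op`, retrying on I/O errors up to `max_attempts` total attempts.
/// The delay doubles after every failed attempt.
fn retry_io<T, F>(max_attempts: u32, base_delay: Duration, mut op: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    let mut attempt: u32 = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                if attempt >= max_attempts {
                    return Err(err); // out of attempts, surface the last error
                }
                // Cap the exponent so the multiplier cannot overflow.
                let factor = 1u32 << (attempt - 1).min(10);
                sleep(base_delay * factor);
                attempt += 1;
            }
        }
    }
}
```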
This PR builds a Git integration called `git-xet` that enables users to
upload files using the Xet protocol as part of a standard git push.
This integration builds on the Git LFS custom transfer adapter protocol,
the same mechanism we now use to handle Git LFS uploads for files larger
than 5 GB through multipart PUT.
To enable uploads to Xet, users run `git-xet install`, which writes the
following configuration to the Git config file at a selected scope
[`--system`, `--global` (default), or `--local`]:
```
[lfs "customtransfer.xet"]
path = git-xet
args = transfer
concurrent = true
```
This setup registers a new transfer adapter named xet, allowing Git to
delegate LFS file transfers to the git-xet binary when applicable.
On the server side, support is rolled out in two stages:
Stage 1 (Upload): The Git LFS batch API for the "upload" operation is
updated.
- If a repo is Xet enabled but users haven't run git-xet install,
moon-landing rejects the request when users initiate a git push and
returns instructions to install git-xet.
- If a repo is Xet enabled and users have git-xet configured correctly,
moon-landing accepts the request and replies with the CAS server URL
and an access token, which git-xet will use to upload files to Xet.
- If a repo is NOT Xet enabled, upload goes through the LFS path.
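The three Stage 1 cases amount to a small decision table, sketched
below. This is an illustration only and not the moon-landing server
code: the type and function names are hypothetical, and detecting a
configured client from the transfer adapter names it advertises in the
batch request is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum UploadDecision {
    /// Xet repo, client lacks git-xet: reject and explain how to install it.
    RejectWithInstallInstructions,
    /// Xet repo, client has git-xet: reply with CAS server URL and token.
    UseXet,
    /// Repo is not Xet enabled: regular LFS upload path.
    UseLfs,
}

fn route_upload(repo_xet_enabled: bool, client_transfers: &[&str]) -> UploadDecision {
    let client_has_xet = client_transfers.iter().any(|name| *name == "xet");
    match (repo_xet_enabled, client_has_xet) {
        (false, _) => UploadDecision::UseLfs,
        (true, true) => UploadDecision::UseXet,
        (true, false) => UploadDecision::RejectWithInstallInstructions,
    }
}
```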
Using the file hashing components in WASM exposed a bug: the 32-bit
usize on that target causes errors when hashing the file.
This PR enforces the use of u64 everywhere along that path (and also
pins the wasm-bindgen version).
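On wasm32 targets usize is 32 bits wide, so offsets and lengths beyond
4 GiB cannot be represented. The sketch below simulates that with an
explicit u32 so it behaves the same on any host; the function names are
hypothetical.

```rust
/// What a 32-bit usize effectively does to a large offset: silent wrap-around.
fn offset_truncated_to_32_bits(offset: u64) -> u32 {
    offset as u32
}

/// A checked conversion makes the overflow visible instead of wrapping.
fn offset_checked_32_bits(offset: u64) -> Option<u32> {
    if offset <= u64::from(u32::MAX) {
        Some(offset as u32)
    } else {
        None
    }
}

/// Keeping the arithmetic in u64 avoids the problem entirely.
fn end_offset(start: u64, len: u64) -> Option<u64> {
    start.checked_add(len)
}
```

A 5 GiB offset (5_368_709_120) wraps to 1 GiB (1_073_741_824) when
squeezed into 32 bits, which is the kind of error the u64 change
prevents.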
The xet_threadpool subdirectory is increasingly the place for utilities
related to the runtime, managing file handle limits, etc. This PR simply
renames this directory to reflect this switch.
Updates the paths used by the clients to the latest CAS paths as defined
in the spec.
All paths now use plural nouns, and shard upload no longer uses the
hash; this PR removes the prefix and hash from the client trait's
upload_shard function.
Related to https://github.com/huggingface/huggingface.js/pull/1718
We'll want to edit parts of a file while loading the old data's dedup
info. In those cases we don't always want to load dedup info for the
first chunk (since it may not be at the beginning of the file), so
is_dedup = true for the first chunk is now handled client side.
Fixes #465. Adapts #464.
Verified locally using `pip-licenses` and manual inspection of metadata.
Once merged will verify with PyPI through RC build.
@jsulz, @hoytak, @seanses, @assafvayner: Let me know if you want a
different email address as a maintainer - these tie into your PyPI user
profile.
---------
Co-authored-by: Jared Sulzdorf <j.sulzdorf@gmail.com>
We had one case of using "raw" tokio_retry rather than the retry
utility. This was due to special custom parsing logic for chunks rather
than the built-in JSON functionality. This PR adds a custom
run_and_extract variant that lets a user specify the function used to
parse the response body.
On OSX, raise the soft file handle limits to the hard limits (which
cannot be changed from within the process).
---------
Co-authored-by: Assaf Vayner <assaf@huggingface.co>
This PR adds an explicit lint command on the hf_xet directory. This is
necessary because it is excluded from the workspace. Other excluded
directories aren't touched very often and are less important for now.
Removes the async_scoped dependency and creates a new parallel utility
to replace tokio_par_for_each, which relied on async_scoped.
Every usage of tokio_par_for_each, the only function used from
parutils, has been replaced with the new
`tokio_run_max_concurrency_fold_result_with_semaphore`.
TODO:
- [x] add more tests
- [x] use semaphore acquired from the global semaphore provider where/if
relevant.
This PR caches the reqwest::Client in a runtime if it exists and reuses
it in all wrapped clients in the `RemoteClient` object. This effectively
shares the connection pool and thus reduces the number of sockets
opened.
Fix XET-704
Different programming languages and platforms may lay out the bytes of a
u64 in memory in different orders. This is not an issue when we typecast
a [u8;32] to [u64;4] (which is essentially a pointer cast and thus
doesn't reorder bytes at all) until these bytes are actually used as
u64s. For such cases we now explicitly use little-endian order.
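A minimal sketch of the explicit little-endian conversion using the
standard library's `from_le_bytes` and `to_le_bytes`. The function
names are hypothetical and not taken from the codebase.

```rust
/// Interpret a 32-byte hash as four u64 words, always little-endian,
/// regardless of the host's native byte order.
fn hash_words_le(bytes: &[u8; 32]) -> [u64; 4] {
    let mut words = [0u64; 4];
    for (word, chunk) in words.iter_mut().zip(bytes.chunks_exact(8)) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        *word = u64::from_le_bytes(buf);
    }
    words
}

/// Inverse conversion: serialise four u64 words back to 32 bytes.
fn hash_bytes_le(words: &[u64; 4]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for (chunk, word) in bytes.chunks_exact_mut(8).zip(words.iter()) {
        chunk.copy_from_slice(&word.to_le_bytes());
    }
    bytes
}
```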
This PR fixes a regression in which part of the retry logic for
downloading was accidentally removed. The restored retry logic
complements the retry for deserializing the data stream of responses.
When a singleflight `Group` is called by many tasks for a particular
key, one of these tasks is chosen as the `owner` to actually perform the
work. The other tasks are considered `waiters` and will wait until the
`owner` completes the work.
In the normal case, the `owner` runs the work, takes the result and
provides it to any `waiter` tasks. It is also the responsibility of the
`owner` to clean up the state it added to the singleflight `Group`,
namely, the `Call` record in the `Group`'s `CallMap`.
However, if the `owner` is `drop`'ed while waiting for the work to
complete (e.g. by its task being canceled), then the work will still
finish and any waiter tasks will be notified (as those actions are part
of a separately spawned task), but the state the owner added to the
`Group` is not cleaned up (i.e. there is still a lingering mapping of
key -> `Call`).
What this means is that if the `owner` is dropped before it can remove
the mapping, then the results of the work are permanently "cached" in
the singleflight `Group` and any subsequent tasks looking for the key
will always get back those results until the `Group` is dropped (usually
on process restart).
Since much of the work we use singleflight for is downloading immutable
blobs of data, "caching" the `Call` results doesn't sound that terrible
(we already cache some stuff outside of singleflight). However, this is
caching the `Result`, not just the `Ok(…)` variant, so if the work
returned an `Error`, then that becomes what is permanently cached,
which, for intermittent networking issues can cause problems.
This PR fixes the issue by adding a RAII guard struct when the `Call` is
added to the `CallMap` so that the `work()` function will remove the
`Call` when the owner exits the function (either successfully, due to
error/panic, or if it is dropped).
See the `test_dropped_owner` test for an example of how this situation
can occur.
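The guard pattern described above can be sketched with a plain
`Mutex<HashMap>`. The `Call` placeholder, `CallGuard`, and
`register_call` are simplified stand-ins, not the crate's actual types;
the point is only that `Drop` removes the entry on every exit path.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Placeholder for the per-key record kept while work is in flight.
struct Call;

type CallMap = Arc<Mutex<HashMap<String, Call>>>;

/// RAII guard: removes its key from the map when dropped, whether the
/// owner returns normally, unwinds from a panic, or is cancelled.
struct CallGuard {
    map: CallMap,
    key: String,
}

impl Drop for CallGuard {
    fn drop(&mut self) {
        if let Ok(mut map) = self.map.lock() {
            map.remove(&self.key);
        }
    }
}

/// Insert the record and hand back the guard that will clean it up.
fn register_call(map: &CallMap, key: &str) -> CallGuard {
    map.lock().unwrap().insert(key.to_string(), Call);
    CallGuard {
        map: Arc::clone(map),
        key: key.to_string(),
    }
}
```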
Some more testing found that 64 parallel range gets can still exhaust
the available file handles on OSX; this PR addresses the issue without
(hopefully) impacting download performance.
This PR removes the unused telemetry code from hf_xet.
In addition, it also removes the Mutex around the logging setup, which
appears to cause an intermittent hang when os.fork() gets involved.
This PR ensures that none of the tokio thread state persists across a
call to Python's os.fork() as used in the multiprocessing library. For
an explanation of the issue, see
https://github.com/vllm-project/vllm/blob/main/docs/design/multiprocessing.md#tradeoffs.
It does this by offloading all the async calls to a separate and
transient OS thread, which would not exist after the spawn process. Thus
any possible restart of the tokio runtime due to a spawn would occur in
a clean environment and without thread-local storage causing issues.
To accomplish this, this PR refactors the hf_xet logging layer to
separate it out from the python runtime, as the python runtime is not
Send/Sync. This also simplifies this layer somewhat and isolates the
telemetry reporting logic so that only the background sending thread of
the telemetry logic is restarted after a spawn.
In addition, this PR removes the use of parking_lot, both in
singleflight.rs and as part of tokio. The library is not safe across
fork(); in particular, note
9c810e4a11/core/src/parking_lot.rs (L51).
If a parent process spawns a child process while permits from a static
semaphore are issued, the number of permits available to the child
process will be reduced for the entirety of the child's lifetime, even
when the parent process permits are returned. This could potentially
cause a deadlock or painful slowdown on upload or download. This PR
moves all our static semaphores to ones associated with the runtime, so
after a spawn they are reset.
Currently, tokio spins up async worker threads equal to the number of
cores, which can be quite large on huge machines, e.g. 128. This isn't
needed to keep everything running; we already offload much of the
compute to blocking threads and so here the number is a significant
overkill, especially if hf_xet is used for downloading only a few files.
This PR limits the number of async worker threads to 32 unless
TOKIO_WORKER_THREADS is set, in which case that value is used. It also
removes the cap on the number of blocking threads tokio can spin up as
needed as there is no real reason to not use tokio's default value
there.
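The thread-count rule can be sketched as a pure function. The function
and constant names are hypothetical, and falling back to the capped
default when the override is unparsable or zero is an assumption of
this sketch.

```rust
/// Cap on async worker threads when no override is given.
const MAX_ASYNC_WORKER_THREADS: usize = 32;

/// `env_override` is the value of TOKIO_WORKER_THREADS, if set.
/// `available_cores` is the number of cores on the machine.
fn num_worker_threads(env_override: Option<&str>, available_cores: usize) -> usize {
    let parsed = env_override
        .and_then(|value| value.trim().parse::<usize>().ok())
        .filter(|n| *n > 0);

    match parsed {
        // An explicit setting always wins, even above the cap.
        Some(n) => n,
        // Otherwise use the core count, limited to the cap (and at least 1).
        None => available_cores.min(MAX_ASYNC_WORKER_THREADS).max(1),
    }
}
```

The inputs could be obtained with
`std::env::var("TOKIO_WORKER_THREADS").ok().as_deref()` and
`std::thread::available_parallelism()`.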
---------
Co-authored-by: Di Xiao <di@huggingface.co>