fix XET-681
XET protocol specification initial draft
- documentation of core procedures required for file uploads and
downloads
- format specifications for shards and xorbs
- Upgrade Rust edition and rustc version to bring in some nice features,
e.g. let chains instead of nested `if` blocks.
- Fix clippy and format due to the upgrade.
- Fix a bug identified by the new rustc:
6cb0a7fb4e/xet_runtime/src/runtime.rs (L195)
```
#[cfg(not(target_family = "wasm"))]
{
// A new multithreaded runtime with a capped number of threads
TokioRuntimeBuilder::new_multi_thread().worker_threads(get_num_tokio_worker_threads())
}
```
Here the closing curly bracket drops the temporary builder while a `&mut
Self` borrowed from the dropped value is returned. (This may be due to a
difference between compiler versions in how they treat the scope of the
"{...}" in `#[cfg(...)] {...}`.)
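A minimal sketch of the pattern and one way to avoid it, assuming a builder whose setters return `&mut Self`. The `Builder` type and `make_builder` function below are hypothetical stand-ins, not the actual code in `xet_runtime`. One possible explanation for the new diagnostic is that the Rust 2024 edition drops temporaries in a block's tail expression at the end of the block, but that is not confirmed for this case.

```rust
// Hypothetical stand-in for a builder whose setters return `&mut Self`.
#[derive(Debug, Clone, PartialEq)]
pub struct Builder {
    worker_threads: usize,
}

impl Builder {
    pub fn new_multi_thread() -> Self {
        Builder { worker_threads: 1 }
    }

    // Setter that hands back a mutable borrow of the builder for chaining.
    pub fn worker_threads(&mut self, n: usize) -> &mut Self {
        self.worker_threads = n;
        self
    }

    pub fn threads(&self) -> usize {
        self.worker_threads
    }
}

pub fn make_builder(n: usize) -> Builder {
    // Problematic shape (left commented out): the block's value is a
    // `&mut Builder` borrowed from a temporary created inside the block.
    //
    // let b = { Builder::new_multi_thread().worker_threads(n) };

    // Safer shape: give the builder a named binding that owns it,
    // configure it through that binding, then return the owned value.
    let mut builder = Builder::new_multi_thread();
    builder.worker_threads(n);
    builder
}
```

Binding the builder to a local first means no reference to a temporary has to outlive the block, regardless of how the compiler scopes temporaries.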
This PR builds a Git integration called `git-xet` that enables users to
upload files using the Xet protocol as part of a standard git push.
This integration builds on the Git LFS custom transfer adapter protocol,
the same mechanism we now use to handle Git LFS uploads for files larger
than 5 GB through multipart PUT.
To enable uploads to Xet, users run `git-xet install`, which writes the
following configuration to the Git config file at a selected scope
[`--system`, `--global` (default), or `--local`]:
```
[lfs "customtransfer.xet"]
path = git-xet
args = transfer
concurrent = true
```
This setup registers a new transfer adapter named xet, allowing Git to
delegate LFS file transfers to the git-xet binary when applicable.
On the server side, support is rolled out in two stages:
Stage 1 (Upload): The Git LFS batch API for the "upload" operation is
updated.
- If a repo is Xet enabled but the user hasn't run `git-xet install`,
moon-landing rejects the request when the user runs `git push` and
returns instructions to install git-xet.
- If a repo is Xet enabled and the user has git-xet configured correctly,
moon-landing accepts the request and replies with the CAS server URL and
an access token, which git-xet uses to upload files to Xet.
- If a repo is NOT Xet enabled, the upload goes through the LFS path.
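The three cases above can be sketched as a small decision function. This is an illustration only: `UploadDecision`, `decide_upload`, and its parameters are hypothetical names, and detecting a configured client by looking for `"xet"` in the transfer adapters it advertises is an assumption, not something this PR description states.

```rust
// Hypothetical outcome of the batch API "upload" operation.
#[derive(Debug, PartialEq)]
pub enum UploadDecision {
    // Repo is Xet enabled but the client did not offer the xet adapter.
    RejectWithInstallHint,
    // Repo is Xet enabled and the client offered the xet adapter.
    UseXet { cas_url: String, access_token: String },
    // Repo is not Xet enabled: regular LFS upload.
    UseLfs,
}

// `client_transfers` is assumed to be the list of transfer adapter names
// the client sent with its batch request.
pub fn decide_upload(
    repo_xet_enabled: bool,
    client_transfers: &[&str],
    cas_url: &str,
    access_token: &str,
) -> UploadDecision {
    if !repo_xet_enabled {
        return UploadDecision::UseLfs;
    }
    if client_transfers.contains(&"xet") {
        UploadDecision::UseXet {
            cas_url: cas_url.to_string(),
            access_token: access_token.to_string(),
        }
    } else {
        UploadDecision::RejectWithInstallHint
    }
}
```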
Updates the paths used by the clients to the latest CAS paths as defined
in the spec.
All paths now use plural nouns, and shard upload no longer uses the hash,
so the prefix and hash are removed from the client trait's `upload_shard`
function.
This PR adds an explicit lint command on the hf_xet directory. This is
necessary because it is excluded from the workspace. Other excluded
directories aren't touched very often and are less important for now.
Re-adds a thin wasm crate for JS client development.
The CI job build_and_test-wasm checks that it builds.
At the moment it only includes a wrapper over a chunker and a function to
compute a xorb hash.
This implements uploading through Xet protocol in WASM environment, and
makes necessary changes to make dependent crates WASM compatible.
1. Uploading through Xet protocol is done in hf_xet_wasm crate;
2. Separate Cas Client trait definitions into upload and download
functionality groups and disable download for WASM;
3. Disable Cas Client request retry in WASM environment, which isn't
critical for a POC (until we have a retry strategy that doesn't depend
on time);
4. Disable async CasObject deserialization;
5. Enable in-memory global dedup;
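Item 2 above could look roughly like the following sketch, which splits a client into an upload trait and a download trait and compiles the download side out on WASM targets. All names here (`UploadClient`, `DownloadClient`, `MemoryClient`, and the method signatures) are hypothetical and much simpler than the real Cas Client traits.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Upload functionality: available on every target, including WASM.
pub trait UploadClient {
    // Stores `data` under `key` and returns the number of bytes stored.
    fn upload(&self, key: &str, data: &[u8]) -> usize;
}

// Download functionality: not compiled for WASM targets.
#[cfg(not(target_family = "wasm"))]
pub trait DownloadClient {
    fn download(&self, key: &str) -> Option<Vec<u8>>;
}

// Toy in-memory client used only to exercise the two traits.
#[derive(Default)]
pub struct MemoryClient {
    store: Mutex<HashMap<String, Vec<u8>>>,
}

impl UploadClient for MemoryClient {
    fn upload(&self, key: &str, data: &[u8]) -> usize {
        self.store
            .lock()
            .unwrap()
            .insert(key.to_string(), data.to_vec());
        data.len()
    }
}

#[cfg(not(target_family = "wasm"))]
impl DownloadClient for MemoryClient {
    fn download(&self, key: &str) -> Option<Vec<u8>> {
        self.store.lock().unwrap().get(key).cloned()
    }
}
```

Keeping the two groups in separate traits lets WASM callers depend only on `UploadClient`, so nothing on the download path has to compile for that target.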
---------
Co-authored-by: Assaf Vayner <assaf@huggingface.co>
hf_xet/Cargo.lock keeps going out of date, likely because people don't
always build hf_xet before pushing to the repo. This PR enforces that
hf_xet/Cargo.lock and the root Cargo.lock are up to date; a CI job fails
if they are not.
* Adds a job to run tests on Windows in addition to Linux.
* Fixes linting for Windows builds.
* Fixes the LocalClient that tries to set files as read-only, which,
during tests, breaks on Windows because file deletion behaves
differently than on Linux.
* Identified an issue with the disk cache deleting items if there are
simultaneous `put`s to the cache for the same key and range. This is
fine on Unix, but on Windows it causes errors (again, due to file
deletion behavior differences). This change mitigates the issue,
allowing huggingface_hub tests to pass on Windows, but opens up another
issue: we need to vet our filesystem deletes (e.g. cache eviction) for
correctness on Windows.