Mirror of https://github.com/huggingface/xet-core.git, synced 2026-06-04 13:30:29 +08:00
Add README.md files and Cargo.toml updates needed for publishing hf-xet (#773)
This PR adds crates.io-facing metadata (homepage, readme, keywords, categories) for the publishable crates, along with crate README files and concise crate-level docs so crates.io and docs.rs pages have better context.
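For reviewers checking the new metadata: per the Cargo manifest reference, crates.io accepts at most 5 keywords (ASCII, at most 20 characters, starting with an alphanumeric character, containing only letters, digits, `_`, `-`, or `+`) and at most 5 categories. The sketch below is not part of this commit; `PackageMetadata` and `check_metadata` are hypothetical names used only to illustrate those limits against the values added for `hf-xet`. It does not check category names against the crates.io slug list.

```rust
// Illustrative only: `PackageMetadata` and `check_metadata` are hypothetical
// names, not part of xet-core or Cargo.

const MAX_KEYWORDS: usize = 5;
const MAX_KEYWORD_LEN: usize = 20;
const MAX_CATEGORIES: usize = 5;

/// The crates.io-facing fields this PR fills in.
struct PackageMetadata<'a> {
    homepage: Option<&'a str>,
    readme: Option<&'a str>,
    keywords: &'a [&'a str],
    categories: &'a [&'a str],
}

/// A keyword must start with an ASCII alphanumeric character, use only
/// ASCII letters, digits, `_`, `-`, or `+`, and be at most 20 characters.
fn keyword_is_valid(keyword: &str) -> bool {
    let starts_ok = keyword
        .chars()
        .next()
        .is_some_and(|c| c.is_ascii_alphanumeric());
    let chars_ok = keyword
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '+'));
    starts_ok && chars_ok && keyword.len() <= MAX_KEYWORD_LEN
}

/// Returns one human-readable message per problem found.
fn check_metadata(meta: &PackageMetadata<'_>) -> Vec<String> {
    let mut problems = Vec::new();
    if meta.homepage.is_none() {
        problems.push("missing `homepage`".to_string());
    }
    if meta.readme.is_none() {
        problems.push("missing `readme`".to_string());
    }
    if meta.keywords.len() > MAX_KEYWORDS {
        problems.push(format!("more than {MAX_KEYWORDS} keywords"));
    }
    for &keyword in meta.keywords {
        if !keyword_is_valid(keyword) {
            problems.push(format!("invalid keyword `{keyword}`"));
        }
    }
    if meta.categories.len() > MAX_CATEGORIES {
        problems.push(format!("more than {MAX_CATEGORIES} categories"));
    }
    problems
}

fn main() {
    // Values copied from the hf-xet manifest as changed in this PR.
    let hf_xet = PackageMetadata {
        homepage: Some("https://github.com/huggingface/xet-core"),
        readme: Some("README.md"),
        keywords: &["huggingface", "datasets", "large-files", "deduplication", "cloud-storage"],
        categories: &["artificial-intelligence", "asynchronous", "data-structures", "filesystem"],
    };
    for problem in check_metadata(&hf_xet) {
        println!("{problem}");
    }
}
```

In practice `cargo publish --dry-run` exercises packaging without uploading, so a check like this is only a quick local sanity pass.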
@@ -18,6 +18,7 @@ exclude = ["simulation/chunk_cache_bench", "hf_xet", "wasm/hf_xet_wasm", "wasm/h
 version = "1.4.0"
 edition = "2024"
 license = "Apache-2.0"
+homepage = "https://github.com/huggingface/xet-core"
 repository = "https://github.com/huggingface/xet-core"

 [profile.release]
README.md (54 lines changed)
@@ -37,6 +37,36 @@ xet-core enables huggingface_hub to utilize xet storage for uploading and downlo

 🔖 **local disk caching**: chunk-based cache that sits alongside the existing [huggingface_hub disk cache](https://huggingface.co/docs/huggingface_hub/guides/manage-cache).

+## Packages
+
+This repository produces the following packages:
+
+### Rust Crates (crates.io)
+
+| Crate | Description |
+|-------|-------------|
+| [`hf-xet`](https://crates.io/crates/hf-xet) | High-level client library for uploading and downloading files with chunk-based deduplication |
+| [`xet-client`](https://crates.io/crates/xet-client) | HTTP client for communicating with Hugging Face Xet storage servers |
+| [`xet-data`](https://crates.io/crates/xet-data) | Data processing pipeline for chunking, deduplication, and file reconstruction |
+| [`xet-core-structures`](https://crates.io/crates/xet-core-structures) | Core data structures including MerkleHash, metadata shards, and Xorb objects |
+| [`xet-runtime`](https://crates.io/crates/xet-runtime) | Async runtime, configuration, logging, and utility infrastructure |
+
+### Python Package (PyPI)
+
+| Package | Description |
+|---------|-------------|
+| [`hf-xet`](https://pypi.org/project/hf-xet/) | Python bindings for the Xet storage system, used by [huggingface_hub](https://github.com/huggingface/huggingface_hub) |
+
+Built from the [`hf_xet/`](./hf_xet) directory using [maturin](https://github.com/PyO3/maturin).
+
+### CLI Binary
+
+| Binary | Description |
+|--------|-------------|
+| `git-xet` | Git LFS compatible command-line tool for Xet storage |
+
+Built from the [`git_xet/`](./git_xet) directory. Distributed via [GitHub releases](https://github.com/huggingface/xet-core/releases).
+
 ## Contributions (feature requests, bugs, etc.) are encouraged & appreciated 💙💚💛💜🧡❤️

 Please join us in making xet-core better. We value everyone's contributions. Code is not the only way to help. Answering questions, helping each other, improving documentation, filing issues all help immensely. If you are interested in contributing (please do!), check out the [contribution guide](https://github.com/huggingface/xet-core/blob/main/CONTRIBUTING.md) for this repository.
@@ -73,21 +103,17 @@ HF_XET_LOG_FILE=/tmp/xet.log # write logs to a file (defaults to stdout)

 ## Local Development

-### Repo Organization - Rust Crates
+### Repo Organization

-* [cas_client](./cas_client): communication with CAS backend services, which include APIs for Xorbs and Shards.
-* [cas_object](./cas_object): CAS object (Xorb) format and associated APIs, including chunks (ranges within Xorbs).
-* [cas_types](./cas_types): common types shared across crates in xet-core and xetcas.
-* [chunk_cache](./chunk_cache): local disk cache of Xorb chunks.
-* [chunk_cache_bench](./chunk_cache_bench): benchmarking crate for chunk_cache.
-* [data](./data): main driver for client operations - FilePointerTranslator drives hydrating or shrinking files, chunking + deduplication here.
-* [error_printer](./error_printer): utility for printing errors conveniently.
-* [file_utils](./file_utils): SafeFileCreator utility, used by chunk_cache.
-* [hf_xet](./hf_xet): Python integration with Rust code, uses maturin to build `hf-xet` Python package. Main integration with HF Hub Python package.
-* [mdb_shard](./mdb_shard): Shard operations, including Shard format, dedupe probing, benchmarks, and utilities.
-* [merklehash](./merklehash): MerkleHash type, 256-bit hash, widely used across many crates.
-* [progress_reporting](./progress_reporting): offers ReportedWriter so progress for Writer operations can be displayed.
-* [utils](./utils): general utilities, including singleflight, progress, serialization_utils and threadpool.
+* [`xet_pkg/`](./xet_pkg) (`hf-xet`): High-level session API for uploading and downloading files with deduplication.
+* [`xet_client/`](./xet_client) (`xet-client`): HTTP client for CAS and Hub backend services.
+* [`xet_data/`](./xet_data) (`xet-data`): Chunking, deduplication, and file reconstruction pipeline.
+* [`xet_core_structures/`](./xet_core_structures) (`xet-core-structures`): MerkleHash, metadata shards, Xorb objects, and shared data structures.
+* [`xet_runtime/`](./xet_runtime) (`xet-runtime`): Async runtime, configuration, logging, and utilities.
+* [`hf_xet/`](./hf_xet): Python bindings (maturin/PyO3), produces the `hf-xet` PyPI package.
+* [`git_xet/`](./git_xet): Git LFS compatible CLI tool (`git-xet`).
+* [`wasm/`](./wasm): WebAssembly builds (`hf_xet_wasm`, `hf_xet_thin_wasm`).
+* [`simulation/`](./simulation): Simulation and benchmarking infrastructure.

 ### Build, Test & Benchmark

@@ -3,8 +3,12 @@ name = "xet-client"
 version.workspace = true
 edition.workspace = true
 license.workspace = true
+homepage.workspace = true
 repository.workspace = true
-description = "HTTPS client for communicating with Hugging Face Xet storage servers"
+description = "Client library for communicating with Hugging Face Xet storage servers. Use through the hf-xet crate."
+readme = "README.md"
+keywords = ["huggingface"]
+categories = ["artificial-intelligence", "network-programming"]

 [lib]
 name = "xet_client"
@@ -42,7 +46,13 @@ url = { workspace = true }
 urlencoding = { workspace = true }

 [target.'cfg(target_family = "wasm")'.dependencies]
-tokio = { workspace = true, features = ["sync", "macros", "io-util", "rt", "time"] }
+tokio = { workspace = true, features = [
+    "sync",
+    "macros",
+    "io-util",
+    "rt",
+    "time",
+] }
 web-time = { workspace = true }

 [target.'cfg(not(target_family = "wasm"))'.dependencies]
@@ -65,7 +75,14 @@ rustls-tls = ["reqwest/rustls"]
 native-tls = ["reqwest/native-tls"]
 native-tls-vendored = ["reqwest/native-tls-vendored"]
 analysis = []
-simulation = ["dep:axum", "dep:humantime", "dep:futures-util", "dep:human-bandwidth", "dep:tower-http", "xet-core-structures/simulation"]
+simulation = [
+    "dep:axum",
+    "dep:humantime",
+    "dep:futures-util",
+    "dep:human-bandwidth",
+    "dep:tower-http",
+    "xet-core-structures/simulation",
+]

 [[bin]]
 name = "local_cas_server"
xet_client/README.md (new file, 17 lines added)
@@ -0,0 +1,17 @@
+# xet-client
+
+[](https://crates.io/crates/xet-client)
+[](https://docs.rs/xet-client)
+[](https://github.com/huggingface/xet-core/blob/main/LICENSE)
+
+Client for communicating with Hugging Face Xet storage servers.
+
+## Overview
+
+Upload and download data and metadata objects from the backend Hugging Face Xet storage servers. Features automatic concurrency adaptations, connection pooling, and retry resiliency. Intended to be used through the API in the hf-xet package.
+
+This crate is part of [xet-core](https://github.com/huggingface/xet-core).
+
+## License
+
+Apache-2.0
@@ -1,3 +1,9 @@
+//! HTTPS client for communicating with Hugging Face Xet storage servers.
+//!
+//! Includes the [`cas_client`] for uploading/downloading Xorb objects and
+//! metadata shards, the [`hub_client`] for Hugging Face Hub API
+//! interactions, and a local [`chunk_cache`] with LRU eviction.
+
 #![cfg_attr(feature = "strict", deny(warnings))]

 pub mod error;
@@ -3,8 +3,12 @@ name = "xet-core-structures"
 version.workspace = true
 edition.workspace = true
 license.workspace = true
+homepage.workspace = true
 repository.workspace = true
-description = "Core data structures including MerkleHash, metadata shards, and xorb objects"
+description = "Core data structures including MerkleHash, metadata shards, and Xorb objects."
+readme = "README.md"
+keywords = ["huggingface"]
+categories = ["artificial-intelligence", "data-structures"]

 [lib]
 name = "xet_core_structures"
@@ -48,13 +52,27 @@ tracing = { workspace = true }

 [target.'cfg(not(target_family = "wasm"))'.dependencies]
 bytemuck = { workspace = true }
-tokio = { workspace = true, features = ["time", "rt", "macros", "sync", "test-util", "io-util", "rt-multi-thread"] }
+tokio = { workspace = true, features = [
+    "time",
+    "rt",
+    "macros",
+    "sync",
+    "test-util",
+    "io-util",
+    "rt-multi-thread",
+] }
 tokio-util = { workspace = true, features = ["io"] }
 uuid = { workspace = true, features = ["v4"] }

 [target.'cfg(target_family = "wasm")'.dependencies]
 getrandom = { workspace = true, features = ["wasm_js"] }
-tokio = { workspace = true, features = ["sync", "macros", "io-util", "rt", "time"] }
+tokio = { workspace = true, features = [
+    "sync",
+    "macros",
+    "io-util",
+    "rt",
+    "time",
+] }
 uuid = { workspace = true, features = ["v4", "js"] }
 web-time = { workspace = true }

xet_core_structures/README.md (new file, 23 lines added)
@@ -0,0 +1,23 @@
+# xet-core-structures
+
+[](https://crates.io/crates/xet-core-structures)
+[](https://docs.rs/xet-core-structures)
+[](https://github.com/huggingface/xet-core/blob/main/LICENSE)
+
+Core data structures for the
+[Hugging Face Xet](https://github.com/huggingface/xet-core) storage system,
+including Merkle hashes, metadata shards, and Xorb objects.
+
+## Overview
+
+- **MerkleHash** — 256-bit content-addressed hash used throughout the system
+- **Metadata shards** — Compact shard format mapping file ranges to Xorb chunks
+- **Xorb objects** — Content-addressed storage objects with byte-grouping compression
+- **Data structures** — Specialized hash maps and utilities for deduplication
+
+This crate is part of [xet-core](https://github.com/huggingface/xet-core),
+the Rust backend for [huggingface_hub](https://github.com/huggingface/huggingface_hub).
+
+## License
+
+Apache-2.0
@@ -1,3 +1,10 @@
+//! Core data structures for the Hugging Face Xet storage system.
+//!
+//! Provides [`merklehash::MerkleHash`] (256-bit content-addressed hashes),
+//! [`metadata_shard`] (compact shard format mapping file ranges to Xorb
+//! chunks), and [`xorb_object`] (content-addressed storage objects with
+//! byte-grouping compression).
+
 #![cfg_attr(feature = "strict", deny(warnings))]

 pub mod error;
@@ -3,8 +3,12 @@ name = "xet-data"
 version.workspace = true
 edition.workspace = true
 license.workspace = true
+homepage.workspace = true
 repository.workspace = true
-description = "Data processing pipeline for chunking, deduplication, and file reconstruction; used in the Hugging Face Xet client tools"
+description = "Data processing pipeline for chunking, deduplication, and file reconstruction; used in the Hugging Face Xet client tools. Intended to be used through the API in the hf-xet package."
+readme = "README.md"
+keywords = ["huggingface"]
+categories = ["artificial-intelligence", "data-structures", "filesystem"]

 [lib]
 name = "xet_data"
@@ -33,13 +37,19 @@ tempfile = { workspace = true }
 thiserror = { workspace = true }
 tokio-util = { workspace = true }
 tracing = { workspace = true }
-ulid = {workspace = true }
+ulid = { workspace = true }
 url = { workspace = true }
 walkdir = { workspace = true }
 pyo3 = { version = "0.26", features = ["abi3-py37"], optional = true }

 [target.'cfg(target_family = "wasm")'.dependencies]
-tokio = { workspace = true, features = ["sync", "macros", "io-util", "rt", "time"] }
+tokio = { workspace = true, features = [
+    "sync",
+    "macros",
+    "io-util",
+    "rt",
+    "time",
+] }

 [target.'cfg(not(target_family = "wasm"))'.dependencies]
 tokio = { workspace = true, features = ["rt-multi-thread", "rt", "time"] }
xet_data/README.md (new file, 20 lines added)
@@ -0,0 +1,20 @@
+# xet-data
+
+[](https://crates.io/crates/xet-data)
+[](https://docs.rs/xet-data)
+[](https://github.com/huggingface/xet-core/blob/main/LICENSE)
+
+Data processing pipeline for chunking, deduplication, and file reconstruction. Intended to be used through the API in the hf-xet package.
+
+## Overview
+
+- **Content-defined chunking** — Gear-hash based chunking for deduplication
+- **Deduplication** — Probe and register chunks against metadata shards
+- **File reconstruction** — Reassemble files from deduplicated chunk references
+- **Progress tracking** — Hooks for upload/download progress reporting
+
+This crate is part of [xet-core](https://github.com/huggingface/xet-core).
+
+## License
+
+Apache-2.0
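Note on the "Content-defined chunking" bullet in the new xet-data README: the following is a minimal sketch of the general gear-hash technique, not xet-core's implementation. The gear table (filled here from a SplitMix64 generator), the boundary mask, and the minimum and maximum chunk sizes are all assumptions chosen for illustration, and `gear_table` and `chunk_lengths` are hypothetical names.

```rust
// Illustrative gear-hash content-defined chunking. All constants are
// assumptions for this sketch, not values taken from xet-core.

/// Boundary mask: a chunk ends when the top 10 bits of the hash are zero,
/// which happens on average about once every 1024 bytes.
const MASK: u64 = ((1u64 << 10) - 1) << 54;

/// SplitMix64 step, used only to produce deterministic pseudo-random values.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// One pseudo-random 64-bit "gear" value per possible byte.
fn gear_table() -> [u64; 256] {
    let mut state = 0x1234_5678_9ABC_DEF0u64;
    let mut table = [0u64; 256];
    for entry in table.iter_mut() {
        *entry = splitmix64(&mut state);
    }
    table
}

/// Splits `data` into content-defined chunks and returns their lengths.
/// A boundary is placed where the rolling hash matches `mask` (once the
/// chunk has at least `min` bytes) or when the chunk reaches `max` bytes.
fn chunk_lengths(data: &[u8], min: usize, max: usize, mask: u64) -> Vec<usize> {
    let table = gear_table();
    let mut lengths = Vec::new();
    let mut hash = 0u64;
    let mut start = 0usize;
    for (i, &byte) in data.iter().enumerate() {
        // Gear update: shift out the oldest byte's influence, mix in the new byte.
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        let len = i + 1 - start;
        if (len >= min && hash & mask == 0) || len >= max {
            lengths.push(len);
            start = i + 1;
            hash = 0;
        }
    }
    if start < data.len() {
        // Trailing bytes that never hit a boundary form the final chunk.
        lengths.push(data.len() - start);
    }
    lengths
}

fn main() {
    let mut state = 42u64;
    let data: Vec<u8> = (0..64 * 1024).map(|_| splitmix64(&mut state) as u8).collect();
    let lengths = chunk_lengths(&data, 256, 4096, MASK);
    println!(
        "{} chunks covering {} bytes",
        lengths.len(),
        lengths.iter().sum::<usize>()
    );
}
```

Because boundaries depend only on nearby bytes, an edit early in a file tends to shift only the chunks around it, which is what lets unchanged chunks deduplicate against earlier uploads.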
@@ -1,3 +1,10 @@
+//! Data processing pipeline for chunking, deduplication, and file
+//! reconstruction, used in the Hugging Face Xet storage tools.
+//!
+//! Provides content-defined chunking via gear hashing, deduplication
+//! against metadata shards, and file reconstruction from deduplicated
+//! chunk references.
+
 #![cfg_attr(feature = "strict", deny(warnings))]

 pub mod error;
@@ -3,10 +3,23 @@ name = "hf-xet"
 version.workspace = true
 edition.workspace = true
 license.workspace = true
+homepage.workspace = true
 repository.workspace = true
-description = "Client library and tooling for the Hugging Face Xet data storage system"
-keywords = ["huggingface", "xet", "data", "ai"]
-categories = ["api-bindings"]
+description = "Client library and tooling for the Hugging Face Xet data storage system."
+readme = "README.md"
+keywords = [
+    "huggingface",
+    "datasets",
+    "large-files",
+    "deduplication",
+    "cloud-storage",
+]
+categories = [
+    "artificial-intelligence",
+    "asynchronous",
+    "data-structures",
+    "filesystem",
+]

 [lib]
 name = "xet"
@@ -46,6 +59,11 @@ serde_json = { workspace = true }
 serial_test = { workspace = true }
 smol = { workspace = true }
 tempfile = { workspace = true }
-tokio = { workspace = true, features = ["rt-multi-thread", "rt", "time", "macros"] }
+tokio = { workspace = true, features = [
+    "rt-multi-thread",
+    "rt",
+    "time",
+    "macros",
+] }
 tracing-subscriber = { workspace = true }
 wiremock = { workspace = true }
xet_pkg/README.md (new file, 34 lines added)
@@ -0,0 +1,34 @@
+# hf-xet
+
+[](https://crates.io/crates/hf-xet)
+[](https://docs.rs/hf-xet)
+[](https://github.com/huggingface/xet-core/blob/main/LICENSE)
+
+Client library for the [Hugging Face Xet](https://github.com/huggingface/xet-core)
+data storage system. Provides the high-level session API for uploading and
+downloading files with chunk-based deduplication.
+
+## Overview
+
+- **XetSession** — Top-level session managing authentication, configuration,
+  and concurrent file transfers
+- **Upload & download** — Stream files to/from Hugging Face Hub with automatic
+  chunking, deduplication, and local caching
+
+## Crate Ecosystem
+
+`hf-xet` ties together the lower-level xet-core crates:
+
+| Crate | Role |
+|-------|------|
+| [`xet-runtime`](https://crates.io/crates/xet-runtime) | Async runtime, config, logging |
+| [`xet-core-structures`](https://crates.io/crates/xet-core-structures) | Merkle hashes, shards, Xorb objects |
+| [`xet-client`](https://crates.io/crates/xet-client) | HTTP client for CAS and Hub APIs |
+| [`xet-data`](https://crates.io/crates/xet-data) | Chunking, dedup, file reconstruction |
+
+This crate is part of [xet-core](https://github.com/huggingface/xet-core),
+the Rust backend for [huggingface_hub](https://github.com/huggingface/huggingface_hub).
+
+## License
+
+Apache-2.0
@@ -1,3 +1,10 @@
+//! Client library for the Hugging Face Xet data storage system.
+//!
+//! Provides the high-level [`xet_session::XetSession`] API for uploading
+//! and downloading files with chunk-based deduplication, tying together
+//! the lower-level [`xet_runtime`], [`xet_core_structures`],
+//! [`xet_client`], and [`xet_data`] crates.
+
 pub mod error;
 pub use error::XetError;
 #[cfg(feature = "python")]
@@ -3,8 +3,12 @@ name = "xet-runtime"
 version.workspace = true
 edition.workspace = true
 license.workspace = true
+homepage.workspace = true
 repository.workspace = true
-description = "Async runtime, configuration, logging, and utility infrastructure for the Hugging Face Xet client tools"
+description = "Async runtime, configuration, logging, and utility infrastructure for the Hugging Face Xet client tools."
+readme = "README.md"
+keywords = ["huggingface"]
+categories = ["artificial-intelligence", "asynchronous"]

 [lib]
 name = "xet_runtime"
@@ -48,11 +52,25 @@ winapi = { workspace = true }
 whoami = { workspace = true }

 [target.'cfg(target_family = "wasm")'.dependencies]
-tokio = { workspace = true, features = ["sync", "macros", "io-util", "rt", "time"] }
+tokio = { workspace = true, features = [
+    "sync",
+    "macros",
+    "io-util",
+    "rt",
+    "time",
+] }

 [target.'cfg(not(target_family = "wasm"))'.dependencies]
 shellexpand = { workspace = true, features = ["path"] }
-tokio = { workspace = true, features = ["time", "rt", "macros", "sync", "test-util", "io-util", "rt-multi-thread"] }
+tokio = { workspace = true, features = [
+    "time",
+    "rt",
+    "macros",
+    "sync",
+    "test-util",
+    "io-util",
+    "rt-multi-thread",
+] }
 tokio-util = { workspace = true, features = ["io"] }

 [target.'cfg(target_os = "macos")'.dependencies]
xet_runtime/README.md (new file, 25 lines added)
@@ -0,0 +1,25 @@
+# xet-runtime
+
+[](https://crates.io/crates/xet-runtime)
+[](https://docs.rs/xet-runtime)
+[](https://github.com/huggingface/xet-core/blob/main/LICENSE)
+
+Async runtime, configuration storage, logging, and utility infrastructure for the
+[Hugging Face Xet](https://github.com/huggingface/xet-core) storage tools. This is meant to be used through the API in the hf-xet package.
+
+## Overview
+
+`xet-runtime` provides the shared foundation used by all crates in the
+xet-core ecosystem:
+
+- **Async runtime** — Tokio-based runtime with configurable thread pools
+- **Configuration** — Hierarchical configuration for Xet clients
+- **Structured logging** — Tracing-based logging with file and console outputs
+- **Error handling** — `RuntimeError` type for the runtime layer
+- **Utilities** — File operations, sync primitives, and platform abstractions
+
+This crate is part of [xet-core](https://github.com/huggingface/xet-core).
+
+## License
+
+Apache-2.0
@@ -1,3 +1,10 @@
+//! Async runtime, configuration, logging, and utility infrastructure for
+//! the Hugging Face Xet storage tools.
+//!
+//! This crate provides the shared foundation used by all crates in the
+//! xet-core ecosystem: a Tokio-based async runtime, hierarchical
+//! configuration, structured tracing-based logging, and common error types.
+
 #![cfg_attr(feature = "strict", deny(warnings))]

 pub mod error;