Mirror of https://github.com/huggingface/xet-core.git, synced 2026-06-04 13:30:29 +08:00.
This PR adds crates.io-facing metadata (homepage, readme, keywords, categories) for the publishable crates, along with crate README files and concise crate-level docs so crates.io and docs.rs pages have better context.
xet-data
Data processing pipeline for chunking, deduplication, and file reconstruction. Intended to be used through the API in the hf-xet package.
Overview
- Content-defined chunking — Gear-hash based chunking for deduplication
- Deduplication — Probe and register chunks against metadata shards
- File reconstruction — Reassemble files from deduplicated chunk references
- Progress tracking — Hooks for upload/download progress reporting
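To make the first three items concrete, here is a minimal, self-contained sketch of gear-hash chunking, chunk deduplication, and file reconstruction. It is not the xet-data API: every name in it (`gear_table`, `chunk_boundaries`, `store_file`, `reconstruct`) is hypothetical, the size limits and boundary mask are illustrative values, the in-memory `HashMap` stands in for metadata shards, and the standard library's `DefaultHasher` stands in for whatever content hash the crate actually uses.

```rust
use std::collections::hash_map::{DefaultHasher, Entry};
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Illustrative parameters only; not the values xet-data uses.
const MIN_CHUNK: usize = 64;
const MAX_CHUNK: usize = 1024;
// A cut is declared when the top 8 bits of the rolling hash are all zero.
const MASK: u64 = 0xFF00_0000_0000_0000;

/// Build a deterministic table of 256 pseudo-random values (splitmix64).
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Return the exclusive end offset of every chunk in `data`.
fn chunk_boundaries(data: &[u8], gear: &[u64; 256]) -> Vec<usize> {
    let mut cuts = Vec::new();
    let mut start = 0;
    let mut hash: u64 = 0;
    for (i, &byte) in data.iter().enumerate() {
        // Gear hash update: shift out old bytes, mix in the new one.
        hash = (hash << 1).wrapping_add(gear[byte as usize]);
        let len = i + 1 - start;
        if (len >= MIN_CHUNK && (hash & MASK) == 0) || len >= MAX_CHUNK {
            cuts.push(i + 1);
            start = i + 1;
            hash = 0;
        }
    }
    if start < data.len() {
        cuts.push(data.len()); // trailing partial chunk
    }
    cuts
}

/// Stand-in content key; a real system would use a cryptographic hash.
fn chunk_key(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Chunk `data`, register unseen chunks in `store`, and return the ordered
/// chunk references plus the number of chunks that were newly stored.
fn store_file(
    data: &[u8],
    gear: &[u64; 256],
    store: &mut HashMap<u64, Vec<u8>>,
) -> (Vec<u64>, usize) {
    let mut refs = Vec::new();
    let mut new_chunks = 0;
    let mut start = 0;
    for end in chunk_boundaries(data, gear) {
        let chunk = &data[start..end];
        let key = chunk_key(chunk);
        if let Entry::Vacant(slot) = store.entry(key) {
            slot.insert(chunk.to_vec());
            new_chunks += 1;
        }
        refs.push(key);
        start = end;
    }
    (refs, new_chunks)
}

/// Reassemble a file from its chunk references; `None` if a chunk is missing.
fn reconstruct(refs: &[u64], store: &HashMap<u64, Vec<u8>>) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for key in refs {
        out.extend_from_slice(store.get(key)?);
    }
    Some(out)
}
```

In this sketch, calling `store_file` a second time on identical bytes returns the same references and reports zero new chunks, and passing those references to `reconstruct` yields the original bytes. Because cut points depend on content rather than fixed offsets, an edit in one part of a file tends to leave chunks elsewhere unchanged, which is what makes the deduplication step effective.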
This crate is part of xet-core.
License
Apache-2.0