Mirror of https://github.com/huggingface/xet-core.git, synced 2026-06-04 13:30:29 +08:00.
Replaces the old `upload_files` / `download_files` / `hash_files` Python functions with a new object-oriented API that exposes `XetSession` and its child objects directly as PyO3 classes. This gives Python callers full control over session lifecycle, connection pooling, and progress reporting. The previous module-level functions are kept under `hf_xet/src/legacy/` and remain importable as `from hf_xet import upload_files` etc., but each call now emits a `DeprecationWarning`.
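From a caller's point of view, a deprecation shim like this behaves as a wrapper that warns and then delegates to the old code. The real legacy functions are compiled Rust exposed through PyO3, so the following is only a pure-Python sketch of that observable behavior. `deprecated` and `legacy_hash_files` are hypothetical names, and the placeholder body does no real hashing.

```python
import functools
import warnings
from typing import Any, Callable, List


def deprecated(replacement: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Wrap a function so each call emits a DeprecationWarning naming `replacement`."""
    def decorate(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller, not this wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorate


# Hypothetical legacy function; the body is a placeholder, not hf_xet's real behavior.
@deprecated("XetSession")
def legacy_hash_files(paths: List[str]) -> List[str]:
    return [f"<hash of {p}>" for p in paths]
```

Python's default warning filters only display a `DeprecationWarning` when it is triggered by code in `__main__`, so callers of a library usually surface these warnings explicitly, for example with `python -W error::DeprecationWarning` or `warnings.simplefilter` in their tests.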
xet-data
Data processing pipeline for chunking, deduplication, and file reconstruction. It is intended to be used through the API of the hf-xet package.
Overview
- Content-defined chunking — Gear-hash based chunking for deduplication
- Deduplication — Probe and register chunks against metadata shards
- File reconstruction — Reassemble files from deduplicated chunk references
- Progress tracking — Hooks for upload/download progress reporting
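The four stages above can be illustrated together in a small, self-contained sketch. It is not xet-core's implementation: the Gear table seed, the `MIN_CHUNK` / `MAX_CHUNK` / `MASK_BITS` values, SHA-256 as the chunk hash, and the in-memory `ChunkStore` (a hypothetical stand-in for metadata shards) are all assumptions chosen for readability.

```python
import hashlib
import random
from typing import Callable, Dict, List, Optional, Tuple

# Illustrative parameters (assumptions, not xet-core's values).
MIN_CHUNK = 2 * 1024   # never cut before this many bytes
MAX_CHUNK = 16 * 1024  # always cut once a chunk reaches this many bytes
MASK_BITS = 13         # a candidate boundary occurs roughly every 2**13 bytes
_U64 = (1 << 64) - 1

# Gear table: one pseudo-random 64-bit value per byte value, seeded for reproducibility.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def gear_chunks(data: bytes) -> List[bytes]:
    """Split data at content-defined boundaries found by a Gear rolling hash."""
    chunks: List[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add: after 64 bytes, older bytes are shifted out of the 64-bit state.
        h = ((h << 1) + GEAR[byte]) & _U64
        length = i + 1 - start
        if length < MIN_CHUNK:
            continue
        # Cut when the top MASK_BITS bits are zero, or when the chunk hits MAX_CHUNK.
        if (h >> (64 - MASK_BITS)) == 0 or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than MIN_CHUNK
    return chunks


def chunk_key(chunk: bytes) -> str:
    # SHA-256 stands in for whatever chunk hash the real pipeline uses.
    return hashlib.sha256(chunk).hexdigest()


class ChunkStore:
    """Toy in-memory stand-in for metadata shards: chunk key -> chunk bytes."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def upload(self, data: bytes,
               progress: Optional[Callable[[int], None]] = None) -> Tuple[List[str], int]:
        """Chunk and deduplicate data; return (ordered chunk keys, new bytes stored)."""
        refs: List[str] = []
        new_bytes = 0
        for chunk in gear_chunks(data):
            key = chunk_key(chunk)
            if key not in self.chunks:    # probe: is this chunk already known?
                self.chunks[key] = chunk  # register the new chunk
                new_bytes += len(chunk)
            refs.append(key)
            if progress is not None:
                progress(len(chunk))      # progress hook: bytes in the chunk just processed
        return refs, new_bytes

    def reconstruct(self, refs: List[str]) -> bytes:
        """Reassemble a file from its ordered chunk references."""
        return b"".join(self.chunks[key] for key in refs)
```

Because each boundary decision depends only on the bytes inside the hash window, inserting data in the middle of a file leaves every chunk before the edit unchanged, and chunks after it typically realign within a few boundaries. Uploading the edited file through `ChunkStore.upload` therefore stores only the chunks around the change.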
This crate is part of xet-core.
License
Apache-2.0