mirror of
https://github.com/huggingface/xet-core.git
synced 2026-06-04 13:30:29 +08:00
Introduces a client benchmark utility that tracks the system resource usage (CPU, memory, disk I/O, and network I/O) of a process, so we don't need to write OS-specific scripts to capture usage stats. This is especially helpful when I benchmark on Python notebook instances, e.g. Google Colab, where a system monitor is not easily accessible or running a separate monitor script is impractical.

# Usage #

Users can enable monitoring by setting `HF_XET_SYSTEM_MONITOR_ENABLED` to true and set the usage sample interval with `HF_XET_SYSTEM_MONITOR_SAMPLE_INTERVAL`. By default, metrics are output to the tracing stream at `INFO` level. In addition, these metrics can be redirected to a separate file by setting the sample log path with `HF_XET_SYSTEM_MONITOR_LOG_PATH`.

# Output #

The stats are output in JSON format, which can be queried using tools like `jq`, e.g.

1. Trace of peak memory usage: `jq '.memory.peak_used_bytes' [HF_XET_SYSTEM_MONITOR_LOG_PATH]`
2. Trace of disk write speed: `jq '.disk.average_write_speed' [HF_XET_SYSTEM_MONITOR_LOG_PATH]`
3. Trace of network receive speed: `jq '.network.average_rx_speed' [HF_XET_SYSTEM_MONITOR_LOG_PATH]`
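
The `jq` queries above can also be reproduced in Python, which may be convenient in notebook environments where `jq` is not installed. The following is a minimal sketch, not part of the utility itself: it assumes the log file holds one JSON object per line, `read_metric` is a hypothetical helper, and the sample values are synthetic placeholders rather than real measurements. Only the field names are taken from the queries above.

```python
import json
import os
import tempfile


def read_metric(log_path, *keys):
    """Return the value at the nested key path for every sample in the log.

    Assumes one JSON object per line (an assumption about the log format).
    """
    values = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            sample = json.loads(line)
            for key in keys:
                sample = sample[key]  # descend one level per key
            values.append(sample)
    return values


# Synthetic samples for demonstration only; real logs may carry more fields.
samples = [
    {"memory": {"peak_used_bytes": 1000},
     "disk": {"average_write_speed": 10.0},
     "network": {"average_rx_speed": 5.0}},
    {"memory": {"peak_used_bytes": 2000},
     "disk": {"average_write_speed": 12.5},
     "network": {"average_rx_speed": 7.5}},
]

# Write the samples to a temporary file standing in for the real log path.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    for s in samples:
        tmp.write(json.dumps(s) + "\n")
    demo_path = tmp.name

# Equivalent of: jq '.memory.peak_used_bytes' <log path>
peak_trace = read_metric(demo_path, "memory", "peak_used_bytes")
# Equivalent of: jq '.network.average_rx_speed' <log path>
rx_trace = read_metric(demo_path, "network", "average_rx_speed")
print(peak_trace, rx_trace)

os.remove(demo_path)
```

In practice, `demo_path` would be replaced by the path set in `HF_XET_SYSTEM_MONITOR_LOG_PATH`.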
hf_xet_thin_wasm
Exports limited functionality from xet-core in a WebAssembly-compatible form, primarily for use by huggingface.js.
Exports:
- Xorb hash computation
- File hash computation
- Verification range hash computation
- Chunker struct/class
  - Generate chunk boundaries
  - Compute chunk hashes
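
To illustrate what "generate chunk boundaries" and "compute chunk hashes" mean in general, here is a toy content-defined chunking sketch in Python. It is not xet-core's algorithm, parameters, or API: the gear-style rolling hash, the mask, the size limits, the use of SHA-256, and the names `chunk_boundaries` and `chunk_hashes` are all assumptions chosen for illustration.

```python
import hashlib

# Toy gear table: one pseudo-random 64-bit value per byte value (illustrative).
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:8], "big")
        for b in range(256)]
MASK64 = (1 << 64) - 1


def chunk_boundaries(data, mask=0xFC00000000000000, min_size=16, max_size=256):
    """Return the exclusive end offset of each chunk in `data`."""
    boundaries = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear-style rolling hash: older bytes shift out of the 64-bit state.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i + 1 - start
        if size < min_size:
            continue  # never cut before the minimum chunk size
        # Cut when the top hash bits are all zero, or the chunk is full.
        if (h & mask) == 0 or size >= max_size:
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))  # trailing partial chunk
    return boundaries


def chunk_hashes(data, boundaries):
    """Hash each chunk delimited by `boundaries` (SHA-256 as a stand-in)."""
    hashes = []
    start = 0
    for end in boundaries:
        hashes.append(hashlib.sha256(data[start:end]).hexdigest())
        start = end
    return hashes


data = bytes(range(256)) * 8
bounds = chunk_boundaries(data)
hashes = chunk_hashes(data, bounds)
print(len(bounds), "chunks")
```

Because boundaries depend on the content rather than on fixed offsets, an insertion near the start of a file tends to change only nearby chunks, which is the usual motivation for this style of chunking.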