mirror of
https://github.com/huggingface/xet-core.git
synced 2026-06-04 13:30:29 +08:00
Updated diagnostics scripts to collect logs (#542)

- also updated README
- added analysis script to load latest dump collected

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
79
README.md
@@ -43,7 +43,7 @@ Please join us in making xet-core better. We value everyone's contributions. Cod

## Issues, Diagnostics & Debugging

If you encounter an issue when using `hf-xet` please help us fix the issue by collecting diagnostic information and attaching that when creating a [new Issue](https://github.com/huggingface/xet-core/issues/new/choose). Download the [hf-xet-diag-linux.sh](hf-xet-diag-linux.sh) or [hf-xet-diag-windows.sh](hf-xet-diag-windows.sh) script based on your operating system and then re-run the Python command that resulted in the issue. The diagnostic scripts will download and install debug symbols, set up logging, and take periodic stack traces throughout process execution in a diagnostics directory that is easy to analyze, package, and upload.
If you encounter an issue when using `hf-xet` please help us fix the issue by collecting diagnostic information and attaching that when creating a [new Issue](https://github.com/huggingface/xet-core/issues/new/choose). Download the [hf-xet-diag-linux.sh](hf-xet-diag-linux.sh), [hf-xet-diag-macos.sh](hf-xet-diag-macos.sh), or [hf-xet-diag-windows.sh](hf-xet-diag-windows.sh) script based on your operating system and then re-run the Python command that resulted in the issue. The diagnostic scripts will download and install debug symbols, set up logging, and take periodic stack traces throughout process execution in a diagnostics directory that is easy to analyze, package, and upload.

### Diagnostics - Linux (`hf-xet-diag-linux.sh`)

@@ -103,7 +103,7 @@ sudo xcode-select --install

### Output Layout

Both scripts produce a diagnostics directory named:
The diagnostic scripts produce a diagnostics directory named:

```
diag_<command>_<timestamp>/
@@ -120,53 +120,70 @@ This unified layout makes it easier to compare diagnostics across platforms.

### Analyzing Dumps

### Usage
Use the [hf-xet-diag-analyze-latest.sh](hf-xet-diag-analyze-latest.sh) script to automatically find and open the most recent dump in the appropriate debugger for your platform.

From your repo root:
**Usage:**

```bash
./analyze-latest.sh
./hf-xet-diag-analyze-latest.sh
```

* Finds the most recent `diag_*` directory.
* Opens the latest dump inside:
* Auto-detects your OS (Linux, macOS, or Windows)
* Finds the most recent `diag_*` directory
* Opens the latest dump in the platform-appropriate debugger:
  * **Linux:** `gdb` with core dumps from `dumps/`
  * **macOS:** `lldb` with `.core` files from `dumps/`
  * **Windows (Git-Bash):** `windbg` with `.dmp` files from `stacks/`

  * **Linux:** opens `dumps/core_*` in `gdb`.
  * **Windows (Git-Bash):** opens `stacks/*.dmp` in **WinDbg** (`windbg` must be on PATH).
* You can also pass a base directory if your diagnostics are stored elsewhere:
You can also specify a diagnostics directory:

```bash
./analyze-latest.sh /path/to/diagnostics
```
```bash
./hf-xet-diag-analyze-latest.sh diag_python_hfxet_test_20250127120000
```
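
The "finds the most recent `diag_*` directory" step can be sketched in plain Bash. This is a minimal illustration under stated assumptions, not the code the script ships: `latest_diag_dir` is a hypothetical helper name, and "most recent" is taken to mean newest modification time. It uses Bash's `-nt` file test in a loop, so an empty match produces a failure status.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified diag_* directory
# under a base directory (default: current directory); return 1 if none exist.
latest_diag_dir() {
  local base="${1:-.}" newest="" d
  for d in "$base"/diag_*/; do
    [[ -d "$d" ]] || continue          # the glob matched nothing
    d="${d%/}"                         # drop the trailing slash
    # -nt is true when the left path has a newer modification time
    if [[ -z "$newest" || "$d" -nt "$newest" ]]; then
      newest="$d"
    fi
  done
  [[ -n "$newest" ]] || return 1
  printf '%s\n' "$newest"
}
```

A caller might write `dir=$(latest_diag_dir /path/to/diagnostics) || echo "no diagnostics found"`. One reason to prefer an explicit loop: with GNU `xargs`, a pipeline such as `find ... -print0 | xargs -0 ls -dt` still runs `ls` once when `find` matches nothing, unless `-r` is given.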

**Manual Analysis**

If you prefer to analyze dumps manually:

**Linux**

* Stack traces are saved under `stacks/` as plain text.
* Core dumps (`dumps/core_*`) can be analyzed with gdb:

* Stack traces: `stacks/*.txt` (plain text, captured periodically)
* Core dumps: `dumps/core_*`
* Analysis:
```bash
gdb python dumps/core_<pid>
(gdb) bt # backtrace
(gdb) thread apply all bt
gdb python dumps/core_<timestamp>.<pid>
(gdb) bt # backtrace of current thread
(gdb) thread apply all bt # backtrace of all threads
(gdb) info threads # list all threads
```
* Ensure the matching debug symbols (`hf_xet-*.dbg`) are in the `hf_xet` package directory.
* Ensure debug symbols (`hf_xet-*.so.dbg`) are in the `hf_xet` package directory

**macOS**
* Stack traces: `stacks/*.txt` (from `sample` command)
* Core dumps: `dumps/dump_<pid>_<timestamp>.core`
* Analysis:
```bash
lldb -c dumps/dump_<pid>_<timestamp>.core python3
(lldb) bt # backtrace of current thread
(lldb) thread backtrace all # backtrace of all threads
(lldb) thread list # list all threads
```
* Ensure debug symbols (`hf_xet-*.dylib.dSYM`) are in the `hf_xet` package directory

**Windows**

* Dumps are saved under `stacks/` as `.dmp` files.
* Open `.dmp` files in **WinDbg** (install via [Windows SDK](https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk/)):

* Dumps: `stacks/dump_<timestamp>.dmp`
* Install [WinDbg via Windows SDK](https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/)
* Analysis:
```cmd
windbg -z dump_20250101_120000.dmp
windbg -z stacks\dump_<timestamp>.dmp
```
* Common WinDbg commands:

```
!analyze -v # Automatic analysis
~* kb # Show stack traces for all threads
lm # List loaded modules (verify hf_xet.pdb loaded)
!analyze -v # automatic analysis
~* kb # backtrace of all threads
~ # list all threads
lm # list loaded modules (verify hf_xet.pdb loaded)
```
* Ensure `hf_xet.pdb` is installed in the `hf_xet` package directory so symbols load correctly.
* Ensure debug symbols (`hf_xet.pdb`) are in the `hf_xet` package directory
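
For unattended triage on Linux and macOS, the interactive commands listed above can also be passed on the debugger command line. The sketch below only prints the command it would run and executes nothing. `batch_backtrace_cmd` is a hypothetical helper, and the `python3` default is an assumption. `gdb -batch -ex` and `lldb --batch -o` are standard options of those debuggers; whether the resulting traces are symbolicated still depends on the debug symbols being installed as described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a non-interactive "backtrace of all threads"
# command for a dump file. Nothing is executed here.
#   $1 = platform (linux | macos)
#   $2 = path to the dump file
#   $3 = interpreter binary (assumed default: python3)
batch_backtrace_cmd() {
  local os="$1" dump="$2" python="${3:-python3}"
  case "$os" in
    linux)
      # gdb: -batch exits after running the -ex commands
      printf "gdb -batch -ex 'thread apply all bt' %q %q\n" "$python" "$dump"
      ;;
    macos)
      # lldb: --batch exits after the -o commands; -c names the core file
      printf "lldb --batch -o 'thread backtrace all' -c %q %q\n" "$dump" "$python"
      ;;
    *)
      echo "unsupported platform: $os" >&2
      return 1
      ;;
  esac
}
```

Redirecting the printed command's output to a file inside the diagnostics directory would keep the backtrace next to the dump it came from.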

---

246
hf-xet-diag-analyze-latest.sh
Executable file
@@ -0,0 +1,246 @@
#!/usr/bin/env bash
# hf-xet-diag-analyze-latest.sh — Cross-platform dump analyzer
# Finds the latest diagnostics directory and opens the most recent dump
# in the appropriate debugger for your platform (gdb, lldb, or WinDbg).

set -Eeuo pipefail

print_usage() {
  cat <<'USAGE'
Usage: hf-xet-diag-analyze-latest.sh [diagnostics-directory]

Finds and analyzes the latest dump from a diagnostics collection.

Arguments:
  diagnostics-directory   Path to a specific diag_* directory
                          (default: latest diag_* in current directory)

Examples:
  ./hf-xet-diag-analyze-latest.sh
  ./hf-xet-diag-analyze-latest.sh diag_python_hfxet_test_20250127120000

This script will:
  - Auto-detect your OS (Linux, macOS, or Windows)
  - Find the most recent dump file
  - Launch the appropriate debugger:
    * Linux: gdb with core dumps from dumps/
    * macOS: lldb with .core files from dumps/
    * Windows: WinDbg with .dmp files from stacks/
USAGE
}

# --- option parsing ---
if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
  print_usage
  exit 0
fi

DIAG_DIR="${1:-}"

# --- find diagnostics directory ---
if [[ -z "$DIAG_DIR" ]]; then
  # Find the latest diag_* directory in current directory
  DIAG_DIR=$(find . -maxdepth 1 -type d -name "diag_*" -print0 2>/dev/null | \
    xargs -0 ls -dt 2>/dev/null | head -1 || true)

  if [[ -z "$DIAG_DIR" ]]; then
    echo "ERROR: No diag_* directories found in current directory."
    echo "Please specify a diagnostics directory or run from a directory containing diag_* folders."
    exit 1
  fi

  echo "Found latest diagnostics directory: $DIAG_DIR"
elif [[ ! -d "$DIAG_DIR" ]]; then
  echo "ERROR: Directory not found: $DIAG_DIR"
  exit 1
fi

# --- detect OS ---
OS_TYPE=""
case "${OSTYPE:-}" in
  linux*) OS_TYPE="linux" ;;
  darwin*) OS_TYPE="macos" ;;
  msys*|mingw*|cygwin*) OS_TYPE="windows" ;;
  *)
    # Fallback: check uname
    UNAME=$(uname -s 2>/dev/null || echo "")
    case "$UNAME" in
      Linux*) OS_TYPE="linux" ;;
      Darwin*) OS_TYPE="macos" ;;
      MINGW*|MSYS*|CYGWIN*) OS_TYPE="windows" ;;
      *)
        echo "ERROR: Unsupported OS: ${OSTYPE:-unknown} / ${UNAME:-unknown}"
        exit 1
        ;;
    esac
    ;;
esac

echo "Detected OS: $OS_TYPE"

# --- find latest dump file ---
DUMP_FILE=""

case "$OS_TYPE" in
  linux)
    # Linux: look for core dumps in dumps/ directory
    if [[ -d "$DIAG_DIR/dumps" ]]; then
      DUMP_FILE=$(find "$DIAG_DIR/dumps" -type f -name "core_*" -print0 2>/dev/null | \
        xargs -0 ls -t 2>/dev/null | head -1 || true)
    fi

    if [[ -z "$DUMP_FILE" ]]; then
      echo "ERROR: No core dumps found in $DIAG_DIR/dumps/"
      echo "Core dumps should be named: core_<timestamp>.<pid>"
      exit 1
    fi
    ;;

  macos)
    # macOS: look for .core files in dumps/ directory
    if [[ -d "$DIAG_DIR/dumps" ]]; then
      DUMP_FILE=$(find "$DIAG_DIR/dumps" -type f -name "*.core" -print0 2>/dev/null | \
        xargs -0 ls -t 2>/dev/null | head -1 || true)
    fi

    if [[ -z "$DUMP_FILE" ]]; then
      echo "ERROR: No core dumps found in $DIAG_DIR/dumps/"
      echo "Core dumps should be named: dump_<pid>_<timestamp>.core"
      exit 1
    fi
    ;;

  windows)
    # Windows: look for .dmp files in stacks/ directory
    if [[ -d "$DIAG_DIR/stacks" ]]; then
      DUMP_FILE=$(find "$DIAG_DIR/stacks" -type f -name "*.dmp" -print0 2>/dev/null | \
        xargs -0 ls -t 2>/dev/null | head -1 || true)
    fi

    if [[ -z "$DUMP_FILE" ]]; then
      echo "ERROR: No dump files found in $DIAG_DIR/stacks/"
      echo "Dump files should be named: dump_<timestamp>.dmp"
      exit 1
    fi
    ;;
esac

echo "Found dump file: $DUMP_FILE"

# --- determine python executable ---
PYTHON_EXE=""
for py_candidate in python3 python; do
  if command -v "$py_candidate" >/dev/null 2>&1; then
    PYTHON_EXE=$(command -v "$py_candidate")
    break
  fi
done

if [[ -z "$PYTHON_EXE" ]]; then
  echo "WARNING: Could not find python executable. Using 'python' as fallback."
  PYTHON_EXE="python"
else
  echo "Using python executable: $PYTHON_EXE"
fi

# --- launch debugger ---
case "$OS_TYPE" in
  linux)
    if ! command -v gdb >/dev/null 2>&1; then
      echo "ERROR: gdb not found. Install with: sudo apt-get install gdb"
      exit 1
    fi

    echo ""
    echo "======================================"
    echo "Opening dump in GDB..."
    echo "======================================"
    echo "Useful commands:"
    echo " (gdb) bt # backtrace of current thread"
    echo " (gdb) thread apply all bt # backtrace of all threads"
    echo " (gdb) info threads # list all threads"
    echo " (gdb) quit # exit gdb"
    echo "======================================"
    echo ""

    exec gdb "$PYTHON_EXE" "$DUMP_FILE"
    ;;

  macos)
    if ! command -v lldb >/dev/null 2>&1; then
      echo "ERROR: lldb not found. Install with: xcode-select --install"
      exit 1
    fi

    echo ""
    echo "======================================"
    echo "Opening dump in LLDB..."
    echo "======================================"
    echo "Useful commands:"
    echo " (lldb) bt # backtrace of current thread"
    echo " (lldb) thread backtrace all # backtrace of all threads"
    echo " (lldb) thread list # list all threads"
    echo " (lldb) quit # exit lldb"
    echo "======================================"
    echo ""

    exec lldb -c "$DUMP_FILE" "$PYTHON_EXE"
    ;;

  windows)
    # Check for various WinDbg installations
    WINDBG_EXE=""

    # Check if windbg is on PATH
    if command -v windbg.exe >/dev/null 2>&1; then
      WINDBG_EXE="windbg.exe"
    elif command -v windbgx.exe >/dev/null 2>&1; then
      WINDBG_EXE="windbgx.exe"
    else
      # Common installation paths
      for dbg_path in \
        "/c/Program Files (x86)/Windows Kits/10/Debuggers/x64/windbg.exe" \
        "/c/Program Files (x86)/Windows Kits/10/Debuggers/x86/windbg.exe" \
        "$PROGRAMFILES/Windows Kits/10/Debuggers/x64/windbg.exe" \
        "${PROGRAMFILES_X86}/Windows Kits/10/Debuggers/x64/windbg.exe"
      do
        if [[ -f "$dbg_path" ]]; then
          WINDBG_EXE="$dbg_path"
          break
        fi
      done
    fi

    if [[ -z "$WINDBG_EXE" ]]; then
      echo "ERROR: WinDbg not found."
      echo ""
      echo "Please install WinDbg from:"
      echo " https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/"
      echo ""
      echo "Or add WinDbg to your PATH."
      echo ""
      echo "You can manually open the dump file:"
      echo " windbg -z \"$DUMP_FILE\""
      exit 1
    fi

    echo ""
    echo "======================================"
    echo "Opening dump in WinDbg..."
    echo "======================================"
    echo "Useful commands:"
    echo " !analyze -v # automatic analysis"
    echo " ~* kb # backtrace of all threads"
    echo " ~ # list all threads"
    echo " lm # list loaded modules"
    echo " q # quit"
    echo "======================================"
    echo ""

    # Convert to Windows path format
    DUMP_FILE_WIN=$(cygpath -w "$DUMP_FILE" 2>/dev/null || echo "$DUMP_FILE")

    exec "$WINDBG_EXE" -z "$DUMP_FILE_WIN"
    ;;
esac
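
One detail worth noting when reading the WinDbg lookup: the script runs under `set -u`, so expanding a variable that is not defined aborts a non-interactive shell with an unbound-variable error. The sketch below shows one defensive way to build the candidate list with `${VAR:-}` defaults. `windbg_candidates` is a hypothetical helper, and whether `PROGRAMFILES` or `PROGRAMFILES_X86` is actually set in a particular Git-Bash environment is an assumption not verified here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list candidate WinDbg paths without tripping `set -u`.
# "${VAR:-}" expands to an empty string when VAR is unset, instead of failing.
windbg_candidates() {
  local root
  for root in "/c/Program Files (x86)" "${PROGRAMFILES:-}" "${PROGRAMFILES_X86:-}"; do
    [[ -n "$root" ]] || continue   # variable unset or empty: skip it
    printf '%s\n' "$root/Windows Kits/10/Debuggers/x64/windbg.exe"
  done
}
```

A caller could read the output line by line and keep the first path for which `[[ -f "$path" ]]` succeeds, mirroring the loop in the script.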

@@ -167,6 +167,7 @@ else
fi

# --- launch target ---
SCRIPT_START_TIME=$(date +%s)
echo "Launching target at $(date -Is) ..." | tee -a "$CONSOLE_LOG"

LAUNCH_ENV=()
@@ -275,5 +276,22 @@ while kill -0 "$TARGET_PID" 2>/dev/null; do
done

echo "Process $TARGET_PID has exited at $(date -Is)." | tee -a "$CONSOLE_LOG"

# --- collect xet log files from this execution ---
HF_HOME="${HF_HOME:-$HOME/.cache/huggingface}"
XET_LOG_DIR="$HF_HOME/xet/logs"
if [[ -d "$XET_LOG_DIR" ]]; then
  echo "Collecting xet logs from $XET_LOG_DIR ..." | tee -a "$CONSOLE_LOG"
  mkdir -p "$OUTDIR/xet_logs"

  # Find log files created during or after script start time using GNU find
  find "$XET_LOG_DIR" -name "xet_*.log" -type f -newermt "@$SCRIPT_START_TIME" 2>/dev/null | while read -r logfile; do
    cp "$logfile" "$OUTDIR/xet_logs/" 2>/dev/null && \
      echo " Copied: $(basename "$logfile")" | tee -a "$CONSOLE_LOG"
  done
else
  echo "No xet log directory found at $XET_LOG_DIR" | tee -a "$CONSOLE_LOG"
fi

echo "Logs and stacks are in: $OUTDIR"
disown "$LOGGER_BG" 2>/dev/null || true

@@ -132,8 +132,12 @@ else
fi

# --- launch target ---
SCRIPT_START_TIME=$(date +%s)
REF_FILE="$OUTDIR/.ref_timestamp"
touch "$REF_FILE" # Reference file for finding logs created after this point
# Ensure REF_FILE is cleaned up on exit
trap 'rm -f "$REF_FILE"' EXIT
echo "Launching target at $(date "+%Y-%m-%dT%H:%M:%S%z") ..." | tee -a "$CONSOLE_LOG"

(
  "${CMD[@]}" & echo $! > "$PID_FILE"
) 2>&1 | tee -a "$CONSOLE_LOG" &
@@ -213,6 +217,23 @@ while kill -0 "$TARGET_PID" 2>/dev/null; do
done

echo "Process $TARGET_PID has exited at $(date "+%Y-%m-%dT%H:%M:%S%z")." | tee -a "$CONSOLE_LOG"

# --- collect xet log files from this execution ---
HF_HOME="${HF_HOME:-$HOME/.cache/huggingface}"
XET_LOG_DIR="$HF_HOME/xet/logs"
if [[ -d "$XET_LOG_DIR" ]]; then
  echo "Collecting xet logs from $XET_LOG_DIR ..." | tee -a "$CONSOLE_LOG"
  mkdir -p "$OUTDIR/xet_logs"

  # Find log files created after script start using reference file
  find "$XET_LOG_DIR" -name "xet_*.log" -type f -newer "$REF_FILE" 2>/dev/null | while read -r logfile; do
    cp "$logfile" "$OUTDIR/xet_logs/" 2>/dev/null && \
      echo " Copied: $(basename "$logfile")" | tee -a "$CONSOLE_LOG"
  done
else
  echo "No xet log directory found at $XET_LOG_DIR" | tee -a "$CONSOLE_LOG"
fi

echo "Logs and stacks are in: $OUTDIR"
disown "$LOGGER_BG" 2>/dev/null || true
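
The reference-file technique used in this hunk can be isolated as a small function. `touch` records a point in time as a file's modification time, and `find -newer` then selects files modified after it. `-newer` is part of POSIX `find`, so this sketch should behave the same with GNU and BSD userlands. The function name `collect_logs_since` and its three parameters are hypothetical and are not part of the scripts in this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy xet_*.log files modified after a reference file.
#   $1 = reference file (its mtime marks "script start")
#   $2 = directory holding the logs
#   $3 = destination directory (created if missing)
collect_logs_since() {
  local ref_file="$1" log_dir="$2" out_dir="$3"
  mkdir -p "$out_dir"
  # -newer keeps only files whose mtime is later than that of ref_file;
  # -exec passes each path as a single argument, so spaces in names are safe.
  find "$log_dir" -type f -name 'xet_*.log' -newer "$ref_file" \
    -exec cp {} "$out_dir"/ \;
}
```

Typical use would be `touch "$ref"` before launching the target and `collect_logs_since "$ref" "$XET_LOG_DIR" "$OUTDIR/xet_logs"` after it exits. Using `-exec` instead of a `while read` loop is a design choice of the sketch: it avoids splitting on unusual file names.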

@@ -119,6 +119,7 @@ else
fi

# --- launch target ---
SCRIPT_START_TIME=$(date +%s)
(
  "${CMD[@]}" & echo $! > "$PID_FILE"
) 2>&1 | tee "$CONSOLE_LOG" &
@@ -159,5 +160,22 @@ while kill -0 "$TARGET_PID" 2>/dev/null; do
done

echo "Process $TARGET_PID has exited at $(date -Is)." | tee -a "$CONSOLE_LOG"

# --- collect xet log files from this execution ---
HF_HOME="${HF_HOME:-$HOME/.cache/huggingface}"
XET_LOG_DIR="$HF_HOME/xet/logs"
if [[ -d "$XET_LOG_DIR" ]]; then
  echo "Collecting xet logs from $XET_LOG_DIR ..." | tee -a "$CONSOLE_LOG"
  mkdir -p "$OUTDIR/xet_logs"

  # Find log files created during or after script start time using GNU find
  find "$XET_LOG_DIR" -name "xet_*.log" -type f -newermt "@$SCRIPT_START_TIME" 2>/dev/null | while read -r logfile; do
    cp "$logfile" "$OUTDIR/xet_logs/" 2>/dev/null && \
      echo " Copied: $(basename "$logfile")" | tee -a "$CONSOLE_LOG"
  done
else
  echo "No xet log directory found at $XET_LOG_DIR" | tee -a "$CONSOLE_LOG"
fi

echo "Logs and dumps are in: $OUTDIR"
disown "$LOGGER_BG" 2>/dev/null || true