mirror of
https://github.com/Tencent/WeKnora.git
synced 2026-06-04 13:30:32 +08:00
Adds a five-segment progress model for the document parsing pipeline so that the UI (PR③) can render a timeline showing where each document is (DocReader → Chunking → Embedding → Multimodal → PostProcess) and, when something breaks, which stage failed and with what error code.

- New table `knowledge_processing_stages` (migration 000052) with one row per (knowledge_id, stage). The UPSERT on Begin/Done/Fail bumps an attempt counter, so re-parses don't lose history.
- A StageTracker service exposes Begin/Done/Fail/Skip. All calls are best-effort: a persistence failure is never allowed to break the pipeline.
- Stable error codes (DOCREADER_TIMEOUT / EMBEDDING_RATE_LIMIT / VECTORSTORE_WRITE_FAILED / ...) that the UI can map to localized remediation hints.
- Tracker call sites added at the pipeline's meaningful failure points: convert (DocReader), CreateChunks (Chunking), BatchIndex (Embedding), enqueueImageMultimodalTasks (Multimodal start), and KnowledgePostProcess.Handle (Multimodal close + PostProcess).
- New endpoint `GET /api/v1/knowledge/:id/stages` returns the five canonical stages; missing rows are synthesized as "pending" so the timeline always renders five segments. The response also includes `current_stage` and a `last_error` block.