WeKnora/internal at baa4e75d4ae6adbed7bb04bc44f6bc42c517139e

pub_soft/WeKnora

mirror of https://github.com/Tencent/WeKnora.git synced 2026-06-04 13:30:32 +08:00

wizardchen 5dc0a49a9b feat(timeline): enrich per-image multimodal subspan output

The per-image multimodal subspan only captured image_url / enable_ocr /
enable_caption on input and chunk_id on output, so the trace viewer
could not answer "what did THIS image actually produce?" without
joining back to the chunks table.

Adds to the per-image span output:
- vlm_model_id (or "legacy_inline" for inline-config KBs)
- image_bytes (read size)
- ocr_prompt: "default" | "scanned_pdf"
- ocr_chars + ocr_preview (sanitized text, capped at 200 runes)
- caption_chars + caption_preview
- chunks_created (count of OCR/caption child chunks)
- indexed (true after BatchIndex completes)
- per-step error fields (read_error / ocr_error / caption_error /
  skipped reason) when something fails

Also adds parent_chunk_id to the span input so the trace links back to
the text chunk this image hangs off — useful when a doc has hundreds
of inline images and you need to know WHERE in the text this one came
from.
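The span fields described in this commit message could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the struct names, the `sanitizePreview` helper, and the choice to count `ocr_chars` in runes are all assumptions made for the example. Only the JSON field names and the 200-rune preview cap come from the message itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// previewCap mirrors the 200-rune cap mentioned in the commit message.
const previewCap = 200

// imageSpanInput is a hypothetical shape for the per-image span input.
type imageSpanInput struct {
	ImageURL      string `json:"image_url"`
	EnableOCR     bool   `json:"enable_ocr"`
	EnableCaption bool   `json:"enable_caption"`
	ParentChunkID string `json:"parent_chunk_id"` // text chunk the image hangs off
}

// imageSpanOutput is a hypothetical shape for the enriched span output.
type imageSpanOutput struct {
	ChunkID        string `json:"chunk_id"`
	VLMModelID     string `json:"vlm_model_id"` // or "legacy_inline"
	ImageBytes     int    `json:"image_bytes"`
	OCRPrompt      string `json:"ocr_prompt"` // "default" or "scanned_pdf"
	OCRChars       int    `json:"ocr_chars"`
	OCRPreview     string `json:"ocr_preview,omitempty"`
	CaptionChars   int    `json:"caption_chars"`
	CaptionPreview string `json:"caption_preview,omitempty"`
	ChunksCreated  int    `json:"chunks_created"`
	Indexed        bool   `json:"indexed"`
	ReadError      string `json:"read_error,omitempty"`
	OCRError       string `json:"ocr_error,omitempty"`
	CaptionError   string `json:"caption_error,omitempty"`
	Skipped        string `json:"skipped,omitempty"`
}

// sanitizePreview (assumed helper) turns whitespace into single spaces,
// drops other control characters, and truncates to previewCap runes so a
// multi-byte character is never split.
func sanitizePreview(s string) string {
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return ' '
		}
		if unicode.IsControl(r) {
			return -1 // drop the rune
		}
		return r
	}, s)
	cleaned = strings.Join(strings.Fields(cleaned), " ")
	runes := []rune(cleaned)
	if len(runes) > previewCap {
		runes = runes[:previewCap]
	}
	return string(runes)
}

func main() {
	in := imageSpanInput{
		ImageURL:      "https://example.invalid/img.png", // placeholder URL
		EnableOCR:     true,
		ParentChunkID: "chunk-parent",
	}
	ocrText := "Invoice\n\tNo. 42\r\nTotal: 100"
	out := imageSpanOutput{
		ChunkID:       "chunk-child",
		VLMModelID:    "legacy_inline",
		OCRPrompt:     "default",
		OCRChars:      utf8.RuneCountInString(ocrText), // assumption: chars are runes
		OCRPreview:    sanitizePreview(ocrText),
		ChunksCreated: 1,
		Indexed:       true,
	}
	for _, v := range []any{in, out} {
		b, err := json.MarshalIndent(v, "", "  ")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(b))
	}
}
```

Truncating on a rune slice rather than a byte slice keeps the preview valid UTF-8, which matters for OCR output in CJK text.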

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-28 15:14:45 +08:00

fix(agent): harden tool parameter parsing against LLM type mismatches (#1505)

2026-05-28 07:50:26 +08:00

feat(timeline): enrich per-image multimodal subspan output

2026-05-28 15:14:45 +08:00

feat(assets): add ASR test audio file and embed it in the application

2026-04-02 21:27:27 +08:00

feat: Implement deadlock retry mechanism for chunk creation

2026-04-22 21:17:21 +08:00

fix(knowledge): prevent documents from getting stuck in "processing"

2026-05-28 15:14:45 +08:00

refactor(knowledge): replace flat stage table with langfuse-style span tree

2026-05-28 15:14:45 +08:00

feat(system-info): surface DB migration errors with troubleshooting links

2026-05-14 16:34:50 +08:00

fix(datasource): support Yuque team token in connector

2026-05-26 20:46:22 +08:00

feat(knowledge): track per-stage parsing progress with /stages API

2026-05-28 15:14:45 +08:00

feat(agent): human-in-the-loop approval for MCP tool calls (#1173)

2026-05-10 22:57:12 +08:00

fix(knowledge): infer synthesized stage status from parse_status

2026-05-28 15:14:45 +08:00

fix(im): make presigned URL flow diagnosable end-to-end

2026-05-28 08:03:57 +08:00

feat(retriever): add OpenSearch driver skeleton + interface stubs (PR 2a of 3)

2026-05-26 20:54:58 +08:00

refactor(logger): support LOG_FORMAT template and harden level coloring

2026-05-22 20:31:54 +08:00

feat(mcp): implement reconnection logic for MCP tool calls and tool listing

2026-03-31 11:57:15 +08:00

fix(knowledge): prevent documents from getting stuck in "processing"

2026-05-28 15:14:45 +08:00

docs(chat): clarify cached-token semantics for explicit-cache providers

2026-05-25 16:47:14 +08:00

fix(knowledge): close root span on terminal state, enrich stage metadata

2026-05-28 15:14:45 +08:00

chore(runtime): silence gin per-route logs and emit env config banner at startup

2026-05-17 15:27:52 +08:00

Add Windows build support for sandbox; the default implementation is Linux-only, so compiling directly on Windows fails

2026-05-25 16:57:56 +08:00

fix(summary): preserve image caption/OCR text in document summaries

2026-05-22 17:25:39 +08:00

feat(redis): add REDIS_USERNAME support for Redis ACL

2026-02-04 19:38:40 +08:00

feat(observability): extend Langfuse tracing across asynq pipeline

2026-04-24 13:16:47 +08:00

feat(knowledge): instrument postprocess subspans + polish timeline UI

2026-05-28 15:14:45 +08:00

feat(system): consolidate system admin and settings into one Settings panel

2026-05-26 21:13:56 +08:00