WeKnora/migrations
wizardchen d0144f3586 perf(wiki): move ingest log to event table and index to on-demand API
Fixes O(n²) write amplification during wiki ingest on large KBs. Previously
every ingest/retract op re-wrote the single `slug='log'` row end-to-end and
every batch re-wrote the entire `slug='index'` directory markdown. On a 40k-
doc KB the log row grew to tens of MB and the index row to several MB, so
each batch triggered giant TOAST updates that dominated ingest wall time.
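
The quadratic cost above can be illustrated with a small sketch. It
assumes every op appends a fixed-size entry and rewrites the whole row;
`rewriteBytes`, `appendBytes`, and the sizes used in `main` are
hypothetical and not measurements from this change.

```go
package main

import "fmt"

// rewriteBytes models the old behavior: op k rewrites a row that already
// holds k entries, so the total is entry * (1 + 2 + ... + n).
func rewriteBytes(n, entry int64) int64 {
	return entry * n * (n + 1) / 2
}

// appendBytes models an event table: each op writes only its own entry.
func appendBytes(n, entry int64) int64 {
	return entry * n
}

func main() {
	// Illustrative values only (assumed, not taken from the commit).
	const docs, entry = 40_000, 500
	fmt.Println("rewrite-per-op bytes:", rewriteBytes(docs, entry))
	fmt.Println("append-only bytes:   ", appendBytes(docs, entry))
}
```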

Log: new `wiki_log_entries` event table (`id DESC` indexed per KB) replaces
the single TEXT row. Batch ingest now collects entries and flushes them
once per batch via `AppendBatch`. Each entry stores `pages_affected` as
JSONB `[{slug,title}]` so the UI can render real titles; custom Scan falls
back to legacy `[]string` so older rows still deserialize.
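
A minimal sketch of how such a fallback `Scan` could look. The type
names `PageRef` and `PagesAffected` are assumptions, as is the choice to
leave `Title` empty for legacy rows; the actual code in this change may
differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PageRef is a hypothetical shape for one element of pages_affected.
type PageRef struct {
	Slug  string `json:"slug"`
	Title string `json:"title"`
}

// PagesAffected is a hypothetical slice type implementing sql.Scanner.
type PagesAffected []PageRef

// Scan decodes the JSONB column, accepting both the new object form and
// the legacy array-of-strings form.
func (p *PagesAffected) Scan(src any) error {
	var raw []byte
	switch v := src.(type) {
	case nil:
		*p = nil
		return nil
	case []byte:
		raw = v
	case string:
		raw = []byte(v)
	default:
		return fmt.Errorf("pages_affected: unsupported type %T", src)
	}

	// New format: [{"slug": "...", "title": "..."}].
	var refs []PageRef
	if err := json.Unmarshal(raw, &refs); err == nil {
		*p = refs
		return nil
	}

	// Legacy format: ["slug-a", "slug-b"]. Title stays empty (assumption).
	var slugs []string
	if err := json.Unmarshal(raw, &slugs); err != nil {
		return fmt.Errorf("pages_affected: %w", err)
	}
	out := make(PagesAffected, 0, len(slugs))
	for _, s := range slugs {
		out = append(out, PageRef{Slug: s})
	}
	*p = out
	return nil
}

func main() {
	var newRow, legacyRow PagesAffected
	_ = newRow.Scan([]byte(`[{"slug":"go-generics","title":"Go Generics"}]`))
	_ = legacyRow.Scan([]byte(`["go-generics"]`))
	fmt.Printf("%+v\n%+v\n", newRow, legacyRow)
}
```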

Index: `wiki_pages[slug=index].content` keeps only the LLM-generated intro
(a few KB). The directory is now served by a structured paginated API
(`GetIndexView`) that reads `slug/title/summary` per type with cursor
pagination, so the agent and the frontend only pull the slice they need.
`RebuildIndexPage` degrades to a no-op; agent `wiki_read_page('index')`
synthesizes a small top-K overview and points callers at `wiki_search`.
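
The cursor pagination described above can be sketched in memory as
follows. `PageLight` and `listAfter` are hypothetical names, and using
the slug as the cursor key is an assumption; the real `GetIndexView`
query may order by a different column.

```go
package main

import (
	"fmt"
	"sort"
)

// PageLight is a hypothetical light projection of a wiki page.
type PageLight struct {
	Slug, Title, Summary string
}

// listAfter returns up to limit pages whose slug sorts strictly after
// cursor, plus the cursor for the next call ("" when nothing is left).
// pages must be sorted ascending by Slug. A SQL equivalent would be
// roughly: WHERE slug > $1 ORDER BY slug LIMIT $2.
func listAfter(pages []PageLight, cursor string, limit int) ([]PageLight, string) {
	if limit <= 0 {
		return nil, ""
	}
	start := sort.Search(len(pages), func(i int) bool {
		return pages[i].Slug > cursor
	})
	end := start + limit
	if end > len(pages) {
		end = len(pages)
	}
	window := pages[start:end]
	next := ""
	if end < len(pages) {
		next = window[len(window)-1].Slug
	}
	return window, next
}

func main() {
	pages := []PageLight{
		{Slug: "a"}, {Slug: "b"}, {Slug: "c"}, {Slug: "d"}, {Slug: "e"},
	}
	cursor := ""
	for {
		window, next := listAfter(pages, cursor, 2)
		fmt.Println(len(window), "pages, next cursor:", next)
		if next == "" {
			break
		}
		cursor = next
	}
}
```

Because each call seeks by key rather than by offset, the cost of a page
does not grow with how deep the caller has scrolled.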

Ingest resilience: LLM calls are now wrapped in a 3-attempt exponential
backoff on transient errors (5xx/408/429, transport resets/timeouts). Summary/extract
failures now bubble up so the batch's failed-op requeue path runs instead
of silently dropping the doc.
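
A rough sketch of the retry shape described above. `HTTPError`,
`isTransient`, and `retry` are hypothetical stand-ins, not the
`isTransientLLMError` classifier from this change, and the base delay
is an assumed value.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// HTTPError is a hypothetical error carrying an upstream status code.
type HTTPError struct{ Status int }

func (e *HTTPError) Error() string {
	return fmt.Sprintf("upstream status %d", e.Status)
}

// isTransient reports whether err looks retryable: 5xx/408/429 responses,
// timeouts, or a connection reset.
func isTransient(err error) bool {
	var he *HTTPError
	if errors.As(err, &he) {
		return he.Status >= 500 || he.Status == 408 || he.Status == 429
	}
	if errors.Is(err, syscall.ECONNRESET) {
		return true
	}
	var te interface{ Timeout() bool }
	if errors.As(err, &te) {
		return te.Timeout()
	}
	return false
}

// retry runs call up to attempts times, sleeping base, 2*base, ... between
// tries. Non-transient errors and the final error are returned to the
// caller so a requeue path can see them.
func retry(attempts int, base time.Duration, call func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = call(); err == nil || !isTransient(err) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(base << i)
		}
	}
	return err
}

func main() {
	calls := 0
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return &HTTPError{Status: 503}
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```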

Frontend: sidebar Index/Log entries switch to dedicated views. Index view
streams intro → Summary → Entity → Concept → Synthesis → Comparison via
IntersectionObserver (with a nextTick re-check so small KBs still load
every section). Log view uses cursor pagination; `pages_affected` renders
titles with the slug as a tooltip.

- new migration: migrations/versioned/000040_wiki_log_entries.{up,down}.sql
- tests: log repo pagination + legacy Scan, ListByTypeLight windowing,
  renderIndexOverviewForAgent output, isTransientLLMError classifier

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:48:52 +08:00