WeKnora/internal/searchutil
wizardchen c0e4a1d2f1 fix(summary): preserve image caption/OCR text in document summaries
Documents whose only payload is an embedded image (e.g. a docx with a
single picture) intermittently produced the refusal line "No textual
content was extractable from this document." even though the vision
model had successfully extracted a caption.

Three coordinated fixes:

- Clarify in the summary prompt that text inside `<image_caption>` and
  `<image_ocr>` is first-class extracted content, not an image
  reference, so the model takes the empty-content branch only when
  the body is genuinely textless.
- For image-dominated documents (fewer than 200 runes of real text
  after stripping image markup), include OCR alongside captions so
  screenshots and scanned figures contribute their actual content;
  text-heavy documents continue to use caption-only enrichment to
  avoid OCR noise from incidental figures.
- Add `EnrichContentCaptionAndOCR` which embeds caption + OCR text
  inline next to the original Markdown image link, deliberately
  omitting the `<image url=...>` and `<image_original>` wrapper
  blocks. Those wrappers carry only opaque export hashes that consume
  tokens and have been observed to retrigger the LLM's "image
  reference with no extracted text" heuristic.
2026-05-22 17:25:39 +08:00