Lets users stop an in-flight document parse to free up LLM / worker
resources without losing the chunks and index already written. The
core insight is that parse_status previously flipped to completed as
soon as the primary chunks landed, while the most expensive subtasks
(graph extract = N LLM calls per chunk, plus summary and question
generation) were still running in the background, so "completed"
wasn't actually terminal from a resource standpoint.
State machine
  pending -> processing -> finalizing -> completed
     |
     +-> cancelled  (any of the three in-flight states)
     +-> failed
     +-> deleting
`finalizing` is the new post-process fan-out window. parse_status
only promotes to `completed` once pending_subtasks_count (a new
column tracking summary + question + per-chunk graph extract)
drains to zero via atomic FinalizeSubtask. Wiki ingest is
intentionally excluded from the counter — it's a KB-scoped
debounced batch and would otherwise pin parse_status in
`finalizing` for the wiki batch window.
Backend
- New ParseStatusFinalizing + pending_subtasks_count column with
migration 000056.
- knowledgeRepository.SetFinalizing transitions processing -> finalizing
conditionally so a racing cancel cannot be clobbered.
- knowledgeRepository.FinalizeSubtask atomically decrements the
counter and self-promotes the row to completed when it hits zero.
- KnowledgePostProcess restructured to compute expected subtask
count up front, flip to finalizing (or completed when no
enrichment is enabled), and only then fan out subtasks. Subtask
handlers (summary, question, graph extract) defer-decrement on
terminal exit using the existing isFinalAsynqAttempt convention.
- New POST /api/v1/knowledge/{id}/cancel-parse handler accepting
pending / processing / finalizing. Marks the row cancelled,
zeroes the counter, best-effort dequeues asynq tasks via a new
TaskInspector abstraction (asynq-mode walks pending/scheduled/
retry queues; Lite-mode noop), and scrubs wiki ingest pending op.
- SpanTracker.AbortAttempt flat-sweeps every still-running span
for the attempt via a new repo.CancelAllOpenSpans helper so the
trace viewer's striped bars all flip to cancelled, even leaf
generations whose parent stage already EndSpan'd (multimodal
fan-out pattern). knowledge_post_process closes its postSpan
via SkipSpan on the cancel/deleting entry guard so a worker
that opens a span AFTER the cancel sweep doesn't leak it.
- Housekeeping and resetPendingTasks sweep finalizing rows
identically to processing so a crash/restart can't strand them.
- DeleteKnowledge/DeleteKnowledgeList proactively dequeue
downstream tasks via the same TaskInspector path.
- ChunkExtractService gets a cancel entry guard so the most
expensive enrichment (graph extract) bails immediately when the
parent knowledge is aborted.
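The counter contract can be sketched in memory as below. This is a minimal sketch, assuming a mutex stands in for the conditional, atomic UPDATE statements the repository actually issues; the method names follow the text, but the `knowledgeRow` type, its fields and the method bodies are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// ParseStatus values from the state machine above.
type ParseStatus string

const (
	Pending    ParseStatus = "pending"
	Processing ParseStatus = "processing"
	Finalizing ParseStatus = "finalizing"
	Completed  ParseStatus = "completed"
	Cancelled  ParseStatus = "cancelled"
)

// knowledgeRow is a hypothetical in-memory stand-in for one knowledge row.
// The mutex plays the role of the row-level conditional UPDATE.
type knowledgeRow struct {
	mu      sync.Mutex
	status  ParseStatus
	pending int // pending_subtasks_count
}

// SetFinalizing moves processing -> finalizing only if the row is still
// processing, so a cancel that landed first is not overwritten.
func (k *knowledgeRow) SetFinalizing(expected int) bool {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.status != Processing {
		return false
	}
	if expected == 0 {
		// No enrichment enabled: go straight to completed.
		k.status = Completed
		return true
	}
	k.status = Finalizing
	k.pending = expected
	return true
}

// FinalizeSubtask decrements the counter and self-promotes the row to
// completed when the last subtask drains. No-op unless the row is finalizing.
func (k *knowledgeRow) FinalizeSubtask() {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.status != Finalizing || k.pending == 0 {
		return
	}
	k.pending--
	if k.pending == 0 {
		k.status = Completed
	}
}

// Cancel accepts only the three in-flight states and zeroes the counter.
func (k *knowledgeRow) Cancel() bool {
	k.mu.Lock()
	defer k.mu.Unlock()
	switch k.status {
	case Pending, Processing, Finalizing:
		k.status = Cancelled
		k.pending = 0
		return true
	}
	return false
}

func main() {
	row := &knowledgeRow{status: Processing}
	row.SetFinalizing(3) // summary + question + one graph-extract chunk
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); row.FinalizeSubtask() }()
	}
	wg.Wait()
	fmt.Println(row.status) // completed
}
```

Because both SetFinalizing and FinalizeSubtask check the current status before writing, a cancel that wins the race leaves the row in `cancelled` no matter how many late subtask decrements arrive.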
Frontend
- New cancelKnowledgeParse API client + "Stop parsing" entry in
both list view and card view more menus, gated on
pending/processing/finalizing.
- Polling predicate refactored to a shared isParseInFlight helper
that recognises `finalizing` (previously the doc list silently
stopped polling once parse_status flipped from processing).
- Knowledge processing timeline: isPolling includes finalizing,
new isHardTerminal short-circuits LIVE for cancelled/failed/
completed so stranded child spans cannot pin LIVE on.
- DocumentListView.computeStatus distinguishes finalizing
("增强中", "Enhancing") from completed and shows the previous
"生成摘要中" ("Generating summary") copy when summary_status is still
pending under finalizing. Also adds a cancelled badge.
- i18n: statusFinalizing / statusCancelled / cancelParse* keys
across zh-CN, en-US, ko-KR, ru-RU.
Docs / SDK
- docs/api/knowledge.md: documents the new finalizing state,
cancel-parse semantics, and which statuses accept cancel.
- client (Go SDK): CancelKnowledgeParse with docstring listing
the cancellable statuses.
This commit introduces several improvements to the knowledge processing timeline and related components. Key changes include:
1. Added a `gracePoll` prop to the `KnowledgeProcessingTimeline` component to manage polling behavior more effectively.
2. Enhanced the UI by displaying the document title in the drawer, improving user visibility of the current document context.
3. Implemented new CSS classes for better styling of the drawer title bar, ensuring a more polished appearance.
4. Updated the backend to support the new `WikiSpan` tracking, allowing for detailed monitoring of document processing stages.
These changes aim to improve user experience and provide better insights into the document processing workflow.
This commit introduces a new method to open the trace drawer directly from the card menu, enhancing user experience by allowing immediate access to trace details without navigating through the document detail drawer. The implementation includes updates to the `handleViewTrace` function to ensure the correct knowledge ID and parse status are set before opening the trace drawer. Additionally, minor adjustments were made to the UI for better consistency and clarity.
Three fixes in response to user feedback:
1. Span input disappearing on End/Fail
The Upsert's DoUpdates always listed input/output/metadata, so calls
that only set output (EndSpan) or only set error_* (FailSpan) wrote
NULL into input/metadata, clobbering whatever Begin had recorded.
Build the column list dynamically: skip input/output/metadata when
the incoming row's value is nil. nil now means "preserve existing",
matching the user's intuition that "Begin recorded it, End shouldn't erase it".
2. Subspans not auto-expanded
Stages with children (multimodal.image[*], postprocess.summary,
postprocess.question, postprocess.graph.chunk[*]) required a click
on the ▸ caret to surface — easy to miss. On the FIRST successful
fetch per (knowledgeId × attempt), auto-expand any stage that has
children. Subsequent polls honor whatever the user has collapsed,
so manual collapse mid-parse stays collapsed.
3. Auto-poll still not firing
Force-arm the polling interval in onMounted regardless of state.
The per-tick callback decides whether to actually fetch based on
current parse_status — so the loop can never get stranded waiting
on a status transition that already happened. Added a console.debug
when the interval arms, so we can verify from the DevTools console that
polling is actually running.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The per-image multimodal subspan only captured image_url / enable_ocr /
enable_caption on input and chunk_id on output, so the trace viewer
could not answer "what did THIS image actually produce?" without
joining back to the chunks table.
Adds to the per-image span output:
- vlm_model_id (or "legacy_inline" for inline-config KBs)
- image_bytes (read size)
- ocr_prompt: "default" | "scanned_pdf"
- ocr_chars + ocr_preview (sanitized text, capped at 200 runes)
- caption_chars + caption_preview
- chunks_created (count of OCR/caption child chunks)
- indexed (true after BatchIndex completes)
- per-step error fields (read_error / ocr_error / caption_error /
skipped reason) when something fails
Also adds parent_chunk_id to the span input so the trace links back to
the text chunk this image hangs off — useful when a doc has hundreds
of inline images and you need to know WHERE in the text this one came
from.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-process stage closes in ~9ms (it only enqueues work), but its async
subspans (postprocess.summary, postprocess.question, postprocess.graph)
keep producing rows for tens of seconds AFTER the root finalizes. The
old timeline used trace.duration_ms as the time axis maximum, which
clipped those subspan bars past the right edge.
Timeline:
- totalMs now always takes max(trace.duration_ms, observed-tail), so
the axis stretches to fit the latest descendant end regardless of
parse_status.
- Render a faint dashed wrapping outline behind a parent span when
its descendants extend past its own finished_at, so the postprocess
stage row visibly spans the full window without overloading the
9ms self-time bar.
- Tree expand/collapse caret bumped from 10 to 14px in a 16x16 hit
area; copy icons in detail panel bumped from 11/14 to 14/18px;
.kp-kv-copy button grown from 18 to 22px.
- Short input/output payloads (<= 8 entries / <= 600 bytes JSON)
auto-expand inline so users see the actual data without an extra
click; longer payloads keep the click-to-expand summary.
Span payloads (subspans only - root keeps the canonical identity, no
duplicate knowledge_id/kb_id/tenant_id on every child):
- extract.go: graph subspan output gains chunk_chars, chunk_preview,
sample_nodes, sample_relations.
- summary subspan output gains model_id, summary_preview.
- question subspan output gains model_id and a sample_question
captured from the first non-empty LLM response.
i18n: new key knowledgeStages.detail.includingChildren for the
wrapping-bar tooltip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend: summary, question, and graph-extract async tasks now record
real processing time as subspans hanging off the (closed) postprocess
stage, so the trace viewer no longer caps the postprocess row at the
~10 ms enqueue duration. Carries Attempt through SummaryGeneration /
QuestionGeneration / ExtractChunk payloads so cross-process workers
can resolve the right parent attempt.
Frontend: drawer now uses attach="body" so the secondary 820px detail
drawer escapes the 654px container of the main drawer; timeline
timestamps include date prefix (MM-DD, or YYYY-MM-DD across years);
"updated X ago" caption only shows during live polling.
Tests: 4 new cases covering postprocess subspan attaching under a
closed parent, missing-parent fallthrough, and Attempt JSON round-trip
on the three task payloads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-tracker historical knowledge has zero rows in
knowledge_processing_spans but parse_status correctly reads
"completed" or "failed". The /spans handler was synthesizing five
"pending" placeholders unconditionally, so legacy completed documents
rendered as if they were stuck waiting in the queue forever.
buildSpanTree now takes parse_status and chooses the placeholder
status accordingly:
- ParseStatusCompleted -> done
- ParseStatusFailed -> failed
- everything else -> pending (existing behaviour)
Real rows always take precedence; this only changes what we put in
the gaps. So healthy in-flight parses (parse_status=processing,
some real rows, some still pending) keep showing pending placeholders
exactly as before — the synthesized "completed" inference only fires
when the parse already hit terminal state.
Adds TestBuildSpanTree_LegacyCompletedRendersAsDone covering both
the completed-legacy and failed-legacy branches.
The root span created by OpenAttempt was never closed: PostProcess only
ended the postprocess stage, so the root row stayed at status=running
forever even after parse_status flipped to completed/failed. The
timeline rendered "进行中" ("In progress") indefinitely on the root, defeating the
whole "is the document done" question the timeline is meant to answer.
- SpanTracker.FinalizeAttempt(kid, attempt, status, output, code, msg):
closes the root row idempotently. Re-closing a terminal root no-ops
so success / cascade-fail / dead-letter paths can fire without
coordination.
- PostProcess.Handle calls FinalizeAttempt(done) after EndSpan(postprocess)
on the success path. Async downstream work (summary/question/wiki/
graph) still records its own spans; their completion extends the
trace's wall-clock end-time but doesn't reopen the root.
- FailSpan auto-closes the root when a MAIN pipeline stage fails
(docreader / chunking / embedding / multimodal / postprocess).
Siblings cancelled by the cascade stay closed, as before.
- Dead-letter callback (router/task.go) accepts the SpanTracker via
DI and calls FinalizeAttempt(failed, TASK_TIMEOUT) when a
document-related task exhausts retries. The probe payload now
extracts the Attempt field that Document/Manual/PostProcess
payloads already thread through.
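Idempotent closing means the first terminal write wins and later ones are ignored. A minimal in-memory sketch, assuming a hypothetical `rootSpan` type; in the database the same effect would come from a conditional UPDATE that only matches non-terminal rows.

```go
package main

import (
	"fmt"
	"sync"
)

// rootSpan is a hypothetical in-memory root row.
type rootSpan struct {
	mu     sync.Mutex
	status string // "running" until closed
	code   string
}

// isTerminal covers the terminal members of the span status set.
func isTerminal(s string) bool {
	switch s {
	case "done", "failed", "skipped", "cancelled":
		return true
	}
	return false
}

// FinalizeAttempt closes the root once; later calls are no-ops, so the
// success, cascade-fail and dead-letter paths need no coordination.
// It returns true only for the call that actually closed the row.
func (r *rootSpan) FinalizeAttempt(status, code string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if isTerminal(r.status) {
		return false
	}
	r.status, r.code = status, code
	return true
}

func main() {
	root := &rootSpan{status: "running"}
	fmt.Println(root.FinalizeAttempt("done", ""))               // true
	fmt.Println(root.FinalizeAttempt("failed", "TASK_TIMEOUT")) // false: already terminal
	fmt.Println(root.status)                                    // done
}
```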
Stage spans were also being recorded with nil input/output, leaving
the new detail panel with timestamps only. Each Begin/End site now
emits useful work metrics:
- DocReader    input:  file_name, file_type, is_url, url
               output: text_length, images_found, is_audio, pages
- Chunking     input:  chunks_planned
               output: chunks_written, total_text_chars
- Embedding    input:  chunks_to_embed, model_id, dim
               output: vectors_written, storage_bytes
- Multimodal   input:  image_count, enable_ocr, enable_caption
- PostProcess  output: chunks_total, enqueued_summary, _question, _wiki, _graph
i18n: add knowledgeStages.root ("Knowledge processing") so the UI
can render a localized name instead of the raw span identifier.
- Collapse migrations 000052 (flat stages) + 000053 (span tree) into a
single 000052_knowledge_processing_spans migration; the flat stages
table never escaped this branch and the create-then-drop sequence had
no value.
- BeginStage: detect an existing (kid, attempt, stage) row before
inserting and reuse its span_id with reset state, so re-entry from
asynq retries or adjacent code paths no longer produces duplicate
timeline segments.
- FailSpan: when sibling-stage cascade flips a dependent stage to
cancelled, also CancelDescendants on its subtree so already-running
subspans (embedding.batch[i] etc.) cannot remain as orphan running
rows under a cancelled parent.
- Dead-letter callback: replace two sequential UpdateKnowledgeColumn
writes with a single UpdateKnowledgeColumns map update so we cannot
end up with parse_status=failed and stale error_message (or vice
versa) when one of the two writes fails.
- touchKnowledgeHeartbeat: skip subspan/generation transitions; only
root and stage transitions poke knowledge.updated_at. Spans-table
MAX(updated_at) already covers subspan progress for housekeeping, so
this avoids 2*N+ UPDATE bursts on the same hot row when a multimodal
stage fans out to many images.
- Add regression tests for the BeginStage idempotency contract and the
cascade-into-subspans behaviour.
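The BeginStage idempotency contract, sketched with a map keyed by (knowledge_id, attempt, stage). The `tracker` type and the `span-N` ID scheme are hypothetical; only the reuse-and-reset behaviour comes from the text.

```go
package main

import "fmt"

// stageKey identifies a stage row within one attempt.
type stageKey struct {
	KnowledgeID string
	Attempt     int
	Stage       string
}

type stageRow struct {
	SpanID string
	Status string
}

// tracker is a hypothetical in-memory stand-in for the spans table.
type tracker struct {
	rows   map[stageKey]*stageRow
	nextID int
}

// BeginStage reuses the span_id of an existing (kid, attempt, stage) row and
// resets its state (here just the status) instead of inserting a duplicate
// row when asynq retries the task.
func (t *tracker) BeginStage(kid string, attempt int, stage string) string {
	k := stageKey{kid, attempt, stage}
	if row, ok := t.rows[k]; ok {
		row.Status = "running"
		return row.SpanID
	}
	t.nextID++
	id := fmt.Sprintf("span-%d", t.nextID)
	t.rows[k] = &stageRow{SpanID: id, Status: "running"}
	return id
}

func main() {
	t := &tracker{rows: map[stageKey]*stageRow{}}
	a := t.BeginStage("k1", 1, "embedding")
	b := t.BeginStage("k1", 1, "embedding") // asynq retry
	fmt.Println(a == b, len(t.rows))        // true 1
}
```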
Addresses three review concerns from the prior PR:
1. No tests existed for any of the stuck-parsing fixes (PR① / PR②.5).
This commit adds coverage for the four most regression-prone
surfaces: span repo upsert/cascade/attempt isolation, span
tracker cascade-cancel and cross-process LookupStage,
housekeeping false-kill protection, and handler tree assembly.
2. Housekeeping was using only knowledge.updated_at as its staleness
signal, but knowledge.updated_at advances only at parse_status
transitions — a long DocReader call (or large embedding batch)
can run for an hour with no updated_at change, so a tight
DocumentProcessTimeout setting would falsely flip an actively
running parse to "failed".
The sweep now does a two-stage check: candidates by knowledge
updated_at, then filtered by MAX(spans.updated_at). Every
SpanTracker Begin/End/Fail/Skip now also pokes
knowledge.updated_at as a side-channel heartbeat, so the
filter sees recent activity even when no parse_status
transition has fired.
3. parseHeartbeatTime accepts the timestamp formats both Postgres
and SQLite emit for an aggregated MAX() column (the SQLite
driver doesn't auto-cast aggregates to time.Time the way
Postgres does), so the same code path works in Lite mode.
The new TestHousekeeping_NoFalseKill_ActiveSpan is the regression
test for the user-flagged scenario: a 3-hour-stale knowledge.updated_at
combined with a 2-minute-fresh span row must NOT be killed.
Addresses review feedback that the PR② design had four shortcomings:
1. The pipeline is a DAG, not a sequence — Embedding and Multimodal
are independent of each other, both downstream of Chunking, both
upstream of PostProcess. The flat (knowledge_id, stage) table
couldn't represent that, so a Chunking failure left dependents
stranded as "pending" forever instead of being marked as
impossible-to-run.
2. No history across attempts. A reparse erased the previous run's
status before the new run started, leaving operators with no way
to investigate "why did this fail twice?".
3. Stages had only status + duration. Operators want to know how big
the work was — pages parsed, chunks created, tokens embedded, VLM
calls made — to distinguish "slow because the file is huge" from
"slow because the docreader is wedged".
4. Multimodal fans out N image tasks; Embedding fans out M batches;
PostProcess fans out into Summary/Question/Wiki/Graph. Each unit
is interesting on its own (Langfuse already captures this for
LLM calls). The flat model couldn't express it.
The redesign mirrors Langfuse's trace/span/generation hierarchy:
* Migration 000053 supersedes 000052: knowledge_processing_spans
table with (knowledge_id, attempt, span_id) primary key, plus
parent_span_id, kind ∈ {root, stage, subspan, generation},
status ∈ {pending,running,done,failed,skipped,cancelled}, and
JSONB input/output/metadata fields.
* SpanTracker (replacing StageTracker) exposes OpenAttempt /
BeginStage / BeginSubSpan / EndSpan / FailSpan / SkipSpan /
LookupStage. Cross-process workers (image_multimodal) get the
parent's attempt + span via payload + LookupStage so subspans
attach correctly.
* StageDependencies declares the DAG; FailSpan now cascades —
descendants of the failed span and dependent stages are flipped
to "cancelled" with a UPSTREAM_FAILED code. The UI sees a clear
blast radius instead of orphan spinners.
* Reparse now calls OpenAttempt up front so the timeline reflects
"new attempt, all pending" instead of letting the previous run's
status linger until the worker picks up the task.
* Image_multimodal records each image as a generation subspan with
its own success/failure on the parent attempt's multimodal stage.
The finalize-on-last-attempt counter logic is preserved unchanged.
* GET /api/v1/knowledge/:id/spans (also kept /stages alias) returns
a tree shape with synthesized pending placeholders so the
frontend always renders five timeline segments. ?attempt=N
enables history navigation.
Adds a five-segment progress model for the document parsing pipeline so
the UI (PR③) can render a timeline showing where each document is
(DocReader → Chunking → Embedding → Multimodal → PostProcess) and
which stage failed with what error code.
- New table `knowledge_processing_stages` (migration 000052) with one
row per (knowledge_id, stage). UPSERT on Begin/Done/Fail bumps an
attempt counter so re-parses don't lose history.
- StageTracker service exposes Begin/Done/Fail/Skip; all calls are
best-effort and never break the pipeline if persistence fails.
- Stable error codes (DOCREADER_TIMEOUT / EMBEDDING_RATE_LIMIT /
VECTORSTORE_WRITE_FAILED / ...) the UI can map to localized
remediation hints.
- Tracker call sites added at the four meaningful failure points:
convert (DocReader), CreateChunks (Chunking), BatchIndex (Embedding),
enqueueImageMultimodalTasks (Multimodal start),
KnowledgePostProcess.Handle (Multimodal close + PostProcess).
- New endpoint `GET /api/v1/knowledge/:id/stages` returns the five
canonical stages — missing rows are synthesized as "pending" so
the timeline always renders five segments. Includes current_stage
and last_error block.
Several failure modes left Knowledge.parse_status pinned at "processing"
forever, with no signal to users beyond a permanent spinner. This commit
addresses the root causes and adds a safety net.
- Asynq worker pool: explicit Concurrency (default 16, env-tunable via
WEKNORA_ASYNQ_CONCURRENCY) so batch uploads don't queue behind a
CPU-count-sized worker pool. Redis op timeouts raised to 500ms/1000ms
(WEKNORA_REDIS_OP_TIMEOUT_MS) to absorb bursty multimodal counter ops.
- DocReader RPC: cap each call with WEKNORA_DOCREADER_CALL_TIMEOUT
(default 30m). Without this, a hung docreader pinned a worker for the
full DocumentProcessTimeout window.
- ImageMultimodal: finalize-on-last-attempt semantics. A permanently
failing single image no longer strands the parent — the asynq retry
is allowed to run, but on the final attempt we count the image
regardless of outcome. Redis DECR errors fall back to enqueuing the
post-process task instead of returning silently.
- Dead-letter callback: when DocumentProcess / KnowledgePostProcess /
ManualProcess exhausts retries, immediately mark the corresponding
Knowledge as failed with the last error. This surfaces the failure
in the UI without waiting for the housekeeping sweep.
- HousekeepingService: 5-minute cron that flips knowledge rows stuck
in "processing" for longer than DocumentProcessTimeout + 10m to
failed, and does the same for summary rows stuck for more than 1h.
Catches anything the other safety nets miss (worker SIGKILL
mid-handler, etc.). Disable with WEKNORA_HOUSEKEEPING_ENABLED=false.
- Distributed startup recovery: previously the post-restart sweep was
skipped whenever REDIS_ADDR was set, even though Asynq does not
reschedule the task that was actively running on the dead instance.
Now the sweep runs in distributed mode too, but only against rows
older than 30 minutes to avoid racing peer instances.
When IM image rendering breaks, operators previously had no log line to
correlate against the IM platform's fetch attempt. Add observability
hooks on both ends and unblock HEAD probes so common IM previews work
at all.
- log rewriteStorageURLs success/failure/no-op with the full signed URL
(operators can copy it from logs and verify public reachability)
- log presigned handler 4xx with client_ip + UA + tenant_id + file_path
so failures correlate against IM platform fetch logs; use the request
context so trace IDs are preserved
- accept HEAD on /api/v1/files/presigned: IM platforms (Feishu, Slack
etc.) probe with HEAD before GET when rendering image previews, and a
401 there is enough to break the inline image even when the GET would
have succeeded
- add Admin-only GET /api/v1/files/presigned-preview that returns the
exact URL an IM channel would embed for the calling tenant, for
self-service verification without sending a real IM message
- clarify APP_EXTERNAL_URL and MINIO_ENDPOINT public-reachability rules
in .env.example; misconfigured endpoints are the most common cause of
"image broken in IM" reports
Second half of the gated OpenSearch k-NN driver (#1440). Replaces the
PR 2a stubs for Save / BatchSave / DeleteBy* / Retrieve with their
real implementations, drops the stubs from `stubs.go`, and adds the
27 corresponding test cases that were intentionally held back. The
driver remains gated dead code (no registry / factory / env path
mentions OpenSearch — PR 3 ships the activation switch); the
behaviour added here only fires once PR 3 lands.
Depends on PR 2a of #1440 (driver skeleton).
What lands in PR 2b
-------------------
Three new production files (+762 LoC):
- `query.go` (+154) — typed retrieveFilters struct as the only entry
to query construction (closes the map[string]any injection surface;
hostile chunk_id stays JSON-escaped). buildKNNQuery (pure k-NN,
applies min_score when Threshold > 0) and buildKeywordQuery (BM25);
native hybrid intentionally omitted (out of scope for Phase 3 —
service-layer RRF stays in charge). Filter struct carries
ExcludeKnowledgeIDs and an opt-in IncludeDisabled for admin-mode
use cases.
- `retrieve.go` (+204) — Retrieve resolves the per-dim alias from
(1) params.AdditionalParams["dim"], (2) len(params.Embedding), or
(3) the cross-dim wildcard <base>_* for BM25-only callers. Always
returns exactly one *RetrieveResult (the Phase 2 fan-out contract).
Hit parsing caps response body at 16 MB and warns if _id !=
_source.chunk_id. Score passes through raw in [0, 1] — the k-NN
plugin's SpaceType.COSINESIMIL.scoreTranslation has already mapped
(1 + cos) / 2 before the client sees it.
- `crud.go` (+405) — Save / BatchSave with _id = chunk_id for
idempotency, dim-aware pre-marshal NDJSON size cap (rejects with
ErrBatchTooLarge before allocating a 50 MB body for a too-big
batch), and per-item error inspection that does NOT leak cluster-
side Reason strings into wrapped public errors. additionalParams
contract verified against elasticsearch/structs.go
ToDBVectorEmbedding + keywords_vector_hybrid_indexer.go: embedding
is map[string][]float32 keyed by SourceID, chunk_enabled is
map[string]bool keyed by ChunkID. Missing embedding routes Save /
BatchSave to the dim-less <base>_keywords index. Sync DeleteBy*
paths capped at 1000 IDs per call with ErrBatchTooLarge.
toDoc writes the is_recommended boolean from IndexInfo. The
matching mapping entry was added in PR 2a alongside is_enabled —
the pair stays in sync between the vector and the keyword-only
index so the field round-trips through CopyIndices (PR 3) and any
future FAQ-priority filtering.
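The dim resolution order in retrieve.go can be sketched as follows. `retrieveParams`, `resolveDim`, `targetIndex` and the `<base>_<dim>` alias naming are assumptions for illustration; the sketch only encodes the precedence stated above.

```go
package main

import (
	"fmt"
	"strconv"
)

// retrieveParams is a hypothetical subset of the retrieval parameters.
type retrieveParams struct {
	Embedding        []float32
	AdditionalParams map[string]any
}

// resolveDim applies the stated precedence: an explicit positive "dim" in
// AdditionalParams, then the embedding length, else 0 (dimension unknown).
func resolveDim(p retrieveParams) int {
	if v, ok := p.AdditionalParams["dim"]; ok {
		switch d := v.(type) {
		case int:
			if d > 0 {
				return d
			}
		case float64: // JSON-decoded numbers arrive as float64
			if d > 0 {
				return int(d)
			}
		}
	}
	return len(p.Embedding)
}

// targetIndex names the alias to search; dim 0 falls back to the cross-dim
// wildcard used by BM25-only callers.
func targetIndex(base string, p retrieveParams) string {
	if dim := resolveDim(p); dim > 0 {
		return base + "_" + strconv.Itoa(dim)
	}
	return base + "_*"
}

func main() {
	fmt.Println(targetIndex("weknora", retrieveParams{AdditionalParams: map[string]any{"dim": 1024}, Embedding: make([]float32, 768)})) // weknora_1024
	fmt.Println(targetIndex("weknora", retrieveParams{Embedding: make([]float32, 768)}))                                                 // weknora_768
	fmt.Println(targetIndex("weknora", retrieveParams{}))                                                                                // weknora_*
}
```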
Stubs removed (`stubs.go`, -86 LoC): Save / BatchSave / Retrieve /
DeleteByChunkIDList / DeleteBySourceIDList / DeleteByKnowledgeIDList
no longer return ErrFeatureNotEnabled — they delegate to the
production implementations above. The remaining stubs (CopyIndices,
BatchUpdateChunkEnabledStatus, BatchUpdateChunkTagID, swapToVersion,
EstimateStorageSize lower-bound) stay until PR 3.
Tests (+490 net to repository_test.go, 28 new cases):
- retrieveFilters: JSON injection guard, ExcludeKnowledgeIDs,
IncludeDisabled skip semantics, default is_enabled=true pin.
- buildKNNQuery: min_score applied when Threshold > 0; omitted at
threshold zero.
- buildKeywordQuery: content match appended after typed filters.
- toDoc: embedding field omitted when empty, present when set,
source_type serialised as integer, **is_recommended round-trips
both true and false**.
- lookupEmbedding / lookupChunkEnabled: SourceID / ChunkID key
contract; nil-params and wrong-type degrade paths.
- extractBatchEmbeddings: mixed-dim rejection; all-empty embeddings
yield dim=0.
- estimateBulkBodyBytes: size estimator boundary.
- effectiveTopK: clamping when caller leaves TopK at the magic-10
default.
- resolveDim: AdditionalParams takes precedence; Embedding fallback;
zero-zero triggers cross-dim multi-index search.
- inspectBulkResponse: all-succeeded fast path; total-count + leak-
guard for cluster-side Reason strings.
- BatchSave cap: 1001 infos returns ErrBatchTooLarge (pre-marshal);
empty list is a no-op.
- DeleteByChunkIDList cap: 1001 IDs returns ErrBatchTooLarge; empty
list is a no-op.
SDK quirk addressed in PR 2b
----------------------------
DocumentDeleteByQueryParams.Refresh is *bool, so the wire-level
"wait_for" value documented in the OpenSearch REST API is not
directly expressible. deleteByTerms uses Refresh:&true which forces
immediate segment flush (read-your-writes guaranteed but more
expensive than wait_for). A follow-up PR can drop to
*opensearchapi.Client.Transport.Perform for wait_for if telemetry
shows the cost is material.
Backward compatibility
----------------------
- Additive within the opensearch package only. No file outside the
package is touched.
- Driver still unreachable: PR 3 ships the activation switch.
- The PR 1 normalizer case for OpenSearch remains unreachable here
(no driver instance produces a result yet).
Test plan
---------
- [x] go build ./... clean
- [x] go vet ./... clean
- [x] go test -race -count=1 ./internal/application/repository/retriever/opensearch/... passes
- [x] grep -r "case types.OpenSearchRetrieverEngineType" internal/
shows only PR 1's normalizer case + this driver's EngineType()
and tests — no activation path.
- [x] grep -r "case \"opensearch\"" internal/ shows no hits.
* feat(rbac): add multi-use share-link invitations for invite_only mode
When the auth.registration_mode toggle flips to invite_only, the public
/auth/register endpoint returns 403 — but until now there was no
channel for unregistered users to actually join a tenant. The existing
invitation table required a registered invitee, so new emails could
not enter the system at all.
This change introduces a share-link model:
- Owner generates a multi-use registration link with a per-tenant role
- Recipient opens the link, registers with their own email, and is
added to the tenant on the same request
- Token is stored in plaintext (not hashed) so the management UI can
re-display the URL on demand without a "copy now or revoke" trap.
Threat model is bounded by 7-day TTL, revocability, and the fact
that all the link grants is membership in one tenant.
- accepted_count tracks how many users joined through the link so the
Owner can tell whether a link is fresh or has already been spread
Per-user invitations (registered email -> in-app inbox accept) are
unchanged. Share-link rows use empty invitee_user_id as discriminator;
the partial unique index on (tenant_id, invitee_user_id) was relaxed
to skip empty values so multiple share links can coexist per tenant.
Frontend reuses Login.vue for /register?token=xxx instead of a
parallel page, and adds icon-only revoke / remove actions plus a
distinct "active" status tag for share-link rows.
Migration 000054 adds the token column, the accepted_count column,
and the relaxed pending unique index in a single step.
* feat(auth): update invitation token handling and add rate limiting
- Refactor the invitation token retrieval to use POST instead of GET, enhancing security by preventing the plaintext token from appearing in logs and browser history.
- Update the API endpoint from `/auth/invitations/:token` to `/auth/invitations/lookup` to reflect the new method.
- Introduce a rate limiter for unauthenticated share-link endpoints to mitigate brute-force attacks and abuse, ensuring a maximum of 30 requests per minute per IP.
- Adjust the registration flow to maintain user experience while securing the invitation process.
The recovery branch added in c578fdba (for #1046) checked
`errors.Is(err, gorm.ErrRecordNotFound)`, but the session repository
translates the GORM error into `apperrors.ErrSessionNotFound` before it
reaches the IM service. The predicate therefore never matched, the
stale ChannelSession mapping was never recycled, and the bot stayed
permanently unresponsive after the underlying session was deleted from
the UI — exactly the symptom reported in #1499.
Extract the predicate into `isSessionNotFound` that matches both
sentinels (application sentinel as actually returned today; GORM
sentinel kept as a safety net against future repository reverts), and
add a focused regression test guarding the invariant.
Refs: https://github.com/Tencent/WeKnora/issues/1499
Two test surfaces, picked for cost/value:
internal/types/audit_log_test.go — extend the existing invariant suite
to include the system namespace:
- DotNamespaceConvention now covers system.setting_changed,
system.admin_promoted, system.admin_revoked.
- NoCollisionsAcrossNamespaces guards against duplicates across all
three new constants.
- New SystemNamespacePrefix test pins the shared "system." prefix —
this is the contract by which GET /system/admin/audit-log filters
out per-tenant rbac.* rows. Drift here would either leak per-tenant
events into the platform feed or hide platform events from
SystemAdmin.
- New SystemWireValues test pins the exact wire strings consumed by
the new frontend audit drawer, Langfuse exporters, and future SIEM
integrations; changes to these are a breaking change.
internal/handler/system_admin_audit_test.go — direct unit tests for
SystemHandler.emitAdminAudit, the helper that promote/revoke /
ApplyDefaultStorageQuotaToAllTenants all delegate to. Uses a
capturingAuditService stub (interface-embedded so any other method
call surfaces drift loudly) and a minimal SystemHandler with only
auditSvc wired — the helper deliberately doesn't touch other deps.
Coverage:
- NilServiceIsNoop: degraded-mode contract — a handler built without
an audit service must not panic on the audit hook.
- PopulatesCanonicalFields: every responsibility of the helper —
TenantID=0 (system scope), actor from ctx, role hard-pinned to
"system_admin", action passed through, outcome=success,
TargetType="user", TargetID/TargetUserID echoing user.ID, details
round-tripping through JSON.
- NilDetailsLeavesEmptyPayload: nil details map must NOT fabricate a
payload; the DB column defaults to '{}' and emitting an explicit
null would muddle "no extra context" filters.
- NilTargetStillEmitsRow: guards the nil-target defensive branch —
promote/revoke always supply one today, but the row still goes out
with empty target ids rather than crashing.
- IdempotentBranchSurvivesMarshal: pins the two boolean discriminator
flags (promote.idempotent, revoke.changed) so the audit reader can
distinguish a real grant from a probe and a real revoke from a noop.
Regression guard against accidentally swapping the payload to
stringly-typed shapes.
- LogErrorIsSwallowed: best-effort contract — a failing audit write
must NOT propagate, because the underlying privilege change has
already succeeded and bubbling the error would force the caller to
retry or roll back, both strictly worse than log-and-continue.
Mirrors the existing TestAuditLogHandler_* suite for the new
GET /system/admin/audit-log endpoint:
- AlwaysQueriesTenantZero: the defining contract — handler must call
AuditLogService.List with tenant_id=0 unconditionally, regardless of
any URL/header input. A regression here would leak per-tenant rbac.*
rows into the platform feed (or hide system.* rows from SystemAdmin).
- PassesQueryFiltersThrough: every advertised query key (after_id,
limit, action, outcome, actor) propagates exactly. Catches typos in
the param-key list.
- EmptyResultProducesZeroCursor: an empty service response must
collapse next_cursor to 0 so the drawer's infinite-scroll watcher
stops paginating.
- GarbageCursorAndLimitTolerated: malformed after_id / non-positive
limit fall back to defaults (matches ListTenantAuditLog) instead of
hard-failing, so stale URL params never blank-screen the drawer.
- ServiceErrorReturns500: List() errors surface as 500 via
errors.NewInternalServerError + ErrorHandler middleware, with a
non-empty body so the drawer alert has something to render.
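The tolerant cursor/limit handling and the zero-cursor rule can be sketched as two small helpers. The names and the default limit of 50 are hypothetical, chosen for illustration. The point is that unparsable input degrades to defaults instead of an error, and an empty page yields a cursor of 0.

```go
package main

import "strconv"

// Hypothetical default; the real handler's value may differ.
const defaultAuditLimit = 50

// parseAuditCursor tolerates garbage input: a malformed after_id means
// "start from the top", and a non-positive or unparsable limit falls back
// to the default, so stale URL params never turn into an error response.
func parseAuditCursor(afterIDRaw, limitRaw string) (afterID int64, limit int) {
	if v, err := strconv.ParseInt(afterIDRaw, 10, 64); err == nil && v > 0 {
		afterID = v
	}
	limit = defaultAuditLimit
	if v, err := strconv.Atoi(limitRaw); err == nil && v > 0 {
		limit = v
	}
	return afterID, limit
}

// nextCursor returns the id of the last row on the page, or 0 for an empty
// page so the drawer's infinite-scroll watcher knows to stop.
func nextCursor(ids []int64) int64 {
	if len(ids) == 0 {
		return 0
	}
	return ids[len(ids)-1]
}
```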
The system_settings update, system admin promote/revoke, and
apply-default-storage-quota routes have been writing audit rows since
the prior commits, but with TenantID=0 (system-scope). The per-tenant
GET /tenants/:id/audit-log endpoint filters by tenant_id and never
returns them, so until now those rows existed only in the DB with no UI
surface. This commit closes the loop:
Backend:
- GET /api/v1/system/admin/audit-log: new SystemAdmin-gated endpoint
reusing AuditLogService.List with tenant_id=0. Same cursor-paged
shape and query params (after_id / limit / action / outcome / actor)
as the per-tenant feed, so the frontend reuses the same client logic.
- Wired through RegisterSystemAdminRoutes (mounted on the existing
adminRoutes group so it inherits the SystemAdmin() guard). The
handler dependency is optional: nil auditLogHandler skips the route,
mirroring RegisterTenantRoutes' /audit-log handling.
Frontend — new platform audit drawer in SystemSettings.vue:
- "审计日志" ("Audit log") entry button in the section header opens a side drawer
(880px) listing system-scope events. Lazy-loaded on first open;
refresh is explicit via a button inside the drawer.
- Table columns: stacked date/time (so 50 events in the same minute
remain distinguishable), stacked actor/role, action tag, structured
target (subject key + diff line), outcome. The dead request column
(system actions don't go through middleware path capture) is dropped
in favour of richer target rendering.
- Per-action target formatters:
* system.setting_changed: subject = registry key, diff = `old → new`
(JSON-encoded, 80-char truncation). Reset shows `old → (空)` ("(empty)").
* tenant_storage_quota bulk apply: subject = "批量同步" ("bulk sync"), diff =
"applied to N tenants (X GB)".
* system.admin_promoted / revoked: subject = "name (email)", diff
annotates idempotent / noop branches so an audit reader can tell
a real grant from a probe.
- Click-to-expand row reveals the full audit context: actor UUID,
target_user_id / target_type / target_id, and raw details JSON in a
monospaced, scroll-capped block. No psql round-trip needed for
forensic spot checks.
- Sticky thead pinned to the scroll container so column labels survive
long scrolls. Cells are vertically centered so single-line tag cells
stay visually balanced against multi-line target cells. No zebra
stripes: the stacked content already provides row separation, and
stripes on top read as noise.
Frontend — same polish back-ported to TenantMembers.vue audit drawer:
- Same drawer width, stacked time / actor cells, structured target +
diff layout, expandable raw-details row, sticky thead, vertically
centered cells, no stripes. The refresh button is now a labelled text
button (previously an outlined, icon-only square).
- request_path column kept (rbac.access_denied carries meaningful
paths) but empty values render as a placeholder dash so they don't
read as broken.
- Diff line now covers rbac.invitation_sent / invitation_revoked role
in addition to the existing role_changed / access_denied details.
API:
- frontend/src/api/system/index.ts: listSystemAuditLog() reuses the
AuditLog / ListAuditLogParams types from @/api/tenant/audit-log
(re-exported) so consumers don't need to cross-import.
i18n (zh-CN / en-US / ko-KR / ru-RU):
- system.globalSettings.audit.*: full drawer copy + per-action labels
(system.setting_changed / admin_promoted / admin_revoked) + target
diff templates + expanded-row labels.
- tenantMember.audit.expanded.*: expanded-row labels added so the
shared drawer treatment renders cleanly under tenant scope.
Replace the standalone /platform/system/* routes with a single
"系统设置" ("System settings") section inside the canonical Settings modal. The previous
SystemLayout.vue / SystemAdmins.vue surfaces are removed and their
functionality (system admin roster, global settings) is hosted directly
in SystemSettings.vue under the standard `.section-header` /
`.settings-group` skeleton. Legacy URLs redirect to the modal section
so external bookmarks don't 404.
Backend:
- SystemSettingService.Reset + repo.Delete: drop the DB override for a
key so the 3-tier resolver falls back to ENV / built-in default.
Idempotent (resetting a never-persisted key returns nil); emits an
audit row only on real deletions, invalidates the local cache, and
publishes to peers via the existing pubsub channel.
- TenantService.BulkSetStorageQuota: overwrite every tenant's
storage_quota in one statement. Powers POST /system/admin/tenants/
apply-default-storage-quota; bypasses the per-tenant whitelist on
PUT /tenants/:id which intentionally forbids storage_quota edits.
- AuditAction{SystemAdminPromoted,SystemAdminRevoked} constants and
emitAdminAudit() in SystemHandler — promote / revoke now leave an
audit trail with TenantID=0 and {target_email,target_username,
idempotent|changed} in details.
- SystemSetting.LastModifiedByName: derived per-request display label
(username, email fallback) so the UI shows "wizardchen" instead of
a UUID prefix without storing a denormalised column.
Frontend:
- SystemSettings.vue rewritten against the Settings modal skeleton with
auto-persisting controls (switch/select @change, input/inputnumber
@blur, tag-input + per-delta popconfirm for SSRF whitelist and admin
roster). auth.registration_mode change goes through an inline
popconfirm; cancel rolls back. Reset / bulk-apply also inline
popconfirms — no dialog modal for these per-row affordances.
- Priority hint panel surfaces the DB > ENV > default resolver order
so operators can reason about "I set the env but it doesn't show up".
- Router: /platform/system, /platform/system/settings, /platform/system/
admins are now compatibility redirects to /platform/settings?section=
system-global.
- Settings modal resized from 900x700 to 1080x780 and content-wrapper from 600
to 760 so the wider tables (members, system settings) have room; viewports
narrower than 1100px still flex to the screen width.
- i18n: system.globalSettings.* keys for title / description / loading /
empty / badges / priority hint / per-key labels / reset / bulkApply /
admins (label, placeholder, save messages, popconfirm copy) across
zh-CN, en-US, ko-KR, ru-RU.
Misc:
- internal/utils/filesize.go doc: clarify MAX_FILE_SIZE_MB is a
deploy-time-only knob (nginx + docreader + frontend each cache the
env at startup); a SystemAdmin UI override would mislead operators
because nginx would still return 413. Until all four layers can hot-reload
the limit in lockstep, this stays env-only.
- internal/utils/security.go: SSRF whitelist parser/runtime now drives
off SystemSettingService for live updates; ENV remains the fallback
for never-overridden deployments.
- Added RevokeSystemAdmin to the user service and repository, with atomic checks guarding against self-revocation and revocation of the last remaining system admin.
- Updated the system handler to utilize the new revocation method, improving error handling for various edge cases.
- Enhanced the bootstrap process to prevent unintended promotions when system admins already exist.
- Refactored related comments and documentation for clarity on the new behavior and safeguards in place.
- Added WEKNORA_BOOTSTRAP_SYSTEM_ADMIN_EMAIL environment variable to promote a specified user to system admin on startup.
- Introduced a new bootstrap process in `bootstrap.go` to handle the promotion logic.
- Updated `.env.example` to document the new environment variable and its behavior.
- Created new views for managing system administrators and system settings, including listing, promoting, and revoking admin privileges.
- Enhanced the frontend to reflect the new system admin features, including UI elements for admin management and settings configuration.
- Updated API interfaces to support system admin functionalities, ensuring proper data handling and user management.
The intent prompt override logic in query_understand applied
strings.TrimSpace to the value before assigning it as the system prompt
override, which silently stripped trailing newlines and intentional
formatting from agent-supplied prompts. Use TrimSpace only to detect
emptiness (so whitespace-only strings still fall back to the global
default) while passing the raw string through verbatim.
Extract the resolution into applyIntentPromptOverride and add unit tests
covering agent-wins, whitespace preservation, blank fallback, no-config,
and global-only paths.
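The split between "detect emptiness" and "pass through verbatim" comes down to a few lines. The sketch below assumes a hypothetical signature `(agentPrompt, globalPrompt string) string`; the real helper may take config structs instead. `strings.TrimSpace` is used only as a probe, and the returned value is the untouched original.

```go
package main

import "strings"

// applyIntentPromptOverride picks the system prompt for intent detection.
// Hypothetical signature: agentPrompt is the per-agent override (may be
// empty), globalPrompt is the configured default.
func applyIntentPromptOverride(agentPrompt, globalPrompt string) string {
	// TrimSpace is only an emptiness probe; the agent's prompt is returned
	// unmodified so trailing newlines and intentional formatting survive.
	if strings.TrimSpace(agentPrompt) != "" {
		return agentPrompt
	}
	return globalPrompt // whitespace-only or empty: fall back
}
```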
First half of the gated OpenSearch k-NN driver introduced in PR 1
(#1445) by way of #1440. PR 2a ships a hollow, interface-compliant
shell of the `internal/application/repository/retriever/opensearch/`
package — every behavioural method (Save / BatchSave / DeleteBy* /
Retrieve, plus the previously-stubbed CopyIndices / BatchUpdate* /
EstimateStorageSize / swapToVersion) returns `ErrFeatureNotEnabled`
or a conservative sentinel value. PR 2b lands the real read/write
implementations in dedicated files (`query.go` + `retrieve.go` +
`crud.go`) and replaces the stubs accordingly.
Strict feature-gate (unchanged from PR 1): no entry is added to
validEngineTypes / GetVectorStoreTypes / retrieverEngineMapping /
BuildEnvVectorStores / container env path / engine factory switch,
so the driver remains unreachable. Attempting to register an
`engine_type=opensearch` VectorStore continues to fail with the
existing "not a valid engine type" error.
What lands in PR 2a
-------------------
Driver skeleton (6 production files + 2 test files, ~1170 + ~1115 LoC):
- `repository.go` — Repository struct + NewRepository constructor
that validates cluster reachability + OS version (2.4+ / 3.x;
primary tested 3.3.2) + k-NN plugin presence on every cluster
node. sync.Once-guarded ensureReady(ctx, dim) for lazy per-
dimension index creation, with transient errors not cached so a
momentary cluster blip does not permanently poison a dim.
sanitizeIndexName enforces a strict OS-compatible name spec.
probeVersion uses robust strings.Split/Atoi parsing for
pre-release suffixes and missing-patch versions. EngineType
returns the PR 1 constant; Support returns [keywords, vector].
- `transport.go` — newOpenSearchClient ships TLS posture
(MinVersion TLS 1.2, opt-in InsecureSkipVerify, forward-secrecy-
only cipher list) and transport tuning for the driver. Caller
exists only in PR 3 (container.go + engine_factory.go); PR 2a
remains gated dead code.
- `mapping.go` — buildIndexMapping(cfg, dim) produces the full
knn_vector + HNSW + content-analyzer mapping with every *_id
field as an explicit keyword and source_type as integer.
buildKeywordsMapping ships the dim-less keyword-only index
mapping used by the no-embedding save path. createIndexAndAlias
creates <alias>_v1 and aliases <alias> to it, with best-effort
orphan cleanup and mapping-drift detection.
- `config.go` — internalCfg (value type) applying OpenSearch
defaults (hnsw_m=16, ef_construction=100, ef_search=100,
shards=4, replicas=1, engine=lucene).
- `errors.go` — nine sentinels (ErrIndexNotFound,
ErrDimensionMismatch, ErrAuth, ErrTransport,
ErrVersionUnsupported, ErrConfigInvalid, ErrFeatureNotEnabled,
ErrBatchTooLarge, ErrCircuitBreaker). Repository never imports
apperrors; PR 3's engine factory wraps these to typed AppError
2200/2201.
- `stubs.go` — every behavioural method returns
ErrFeatureNotEnabled. EstimateStorageSize returns a conservative
HNSW lower-bound estimate (not 0) so the Phase 2 KB-delete guard
fails-closed for non-empty KBs.
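The version handling described for probeVersion (strings.Split/Atoi, tolerant of pre-release suffixes and a missing patch) might look like the sketch below. The function names are hypothetical. The gate is read as "2.4 or later within 2.x, or any 3.x". The real probe also rejects Elasticsearch responses, which this sketch does not model.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSemver splits "MAJOR.MINOR[.PATCH][-pre]" into integers. A missing
// patch defaults to 0 and any pre-release suffix such as "-rc1" is dropped.
func parseSemver(v string) (major, minor, patch int, err error) {
	core, _, _ := strings.Cut(v, "-")
	parts := strings.Split(core, ".")
	if len(parts) < 2 || len(parts) > 3 {
		return 0, 0, 0, fmt.Errorf("unparsable version %q", v)
	}
	nums := make([]int, 3) // patch stays 0 when only two parts are present
	for i, p := range parts {
		n, convErr := strconv.Atoi(p)
		if convErr != nil {
			return 0, 0, 0, fmt.Errorf("unparsable version %q: %w", v, convErr)
		}
		nums[i] = n
	}
	return nums[0], nums[1], nums[2], nil
}

// versionSupported applies the 2.4+ / 3.x gate; unparsable input fails closed.
func versionSupported(v string) bool {
	major, minor, _, err := parseSemver(v)
	if err != nil {
		return false
	}
	return major == 3 || (major == 2 && minor >= 4)
}
```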
Tests (~1115 LoC, 50 cases):
- `repository_test.go` — interface satisfaction, sentinel mapping,
sanitizeIndexName positive/negative matrix, semver parsing
(pre-release / missing-patch), buildIndexMapping JSON shape pin
(Lucene + Faiss + Keywords), probeVersion matrix (OS 1.x / 2.2 /
2.5 / 2.11 / 3.x / 3.0.0-rc1 / ES rejection), probeKNNPlugin
multi-node coverage, ensureReady concurrency + per-dim isolation
+ transient retry, NewRepository storeID validation, all 11
stubs (CopyIndices, BatchUpdate*, EstimateStorageSize,
SwapToVersion + Save / BatchSave / Retrieve / DeleteBy*),
wrapTransport sentinel mapping + leak guard, isNotFound /
isAlreadyExistsError, drainAndClose / limitedDecode helpers.
- `transport_test.go` — TLS defaults / opt-in InsecureSkipVerify /
TLS 1.2 pinning / cipher list / transport tuning.
Single dependency addition: github.com/opensearch-project/
opensearch-go/v4 v4.6.0 in go.mod/go.sum.
SDK quirks discovered (opensearch-go v4.6.0)
--------------------------------------------
PR 2a includes workarounds for two of the three SDK limitations
that surfaced during the full implementation (the third, Refresh:*bool,
only affects the delete path that ships in PR 2b):
- AliasExists method passes dataPointer=nil to its internal do(),
which means non-2xx responses come back as a plain
*errors.errorString ("status: 404 Not Found") rather than as
*opensearch.StructError. aliasExists therefore inspects
resp.StatusCode directly (resp is returned even when err is
non-nil) and only falls back to wrapTransport for the "no
response at all" case.
- sync.OnceReset is not in the standard library; the keyword-only
index uses a mutex + ready/err flag pattern so transient failures
can be retried by the next caller. The per-dimension path uses
the `once map[int]*sync.Once` delete-and-recreate trick.
Test fixes folded in
--------------------
While running a full `go test ./...` against main with PR 1 merged, two
deterministic regressions surfaced that block a clean run-everything
signal. Both are unrelated to the driver and are folded into PR 2a
so the PR's own CI run is green:
(1) Follow-up to #1445 — fanout test missed the new normalizer policy
(internal/application/service/knowledgebase_search_fanout_test.go,
+46 / -6). #1445 changed EngineAwareNormalizer for ES /
ElasticFaiss / OpenSearch / Weaviate / Postgres / SQLite /
Qdrant / TencentVectorDB / Doris from (score+1)/2 to clamp01
passthrough (those engines surface non-negative cosine to the
normalizer per Lucene script_score non-negative invariant for
ES, k-NN plugin SpaceType.COSINESIMIL.scoreTranslation for
OpenSearch, engine-internal or IR-normalized conversions for
the rest). Milvus is now the only engine that still surfaces
raw signed cosine in [-1, 1].
TestRetrieveFromStores_MixedEngine_Normalizes still asserted
the old cosine-shift behaviour for ES (raw -0.4 → expected 0.3)
which under passthrough now becomes clamp01(-0.4) = 0. The
normalizer's own _test.go was updated at #1445 time, but this
fan-out integration test was not.
Fix: rewrite the godoc to spell out the two engine groups;
restate sub-case 2 as ES passthrough on a production-possible
mid-range cosine (0.3 → 0.3, PG out-ranks ES); add sub-case 3
pinning the cosine-shift branch via Milvus -0.4 → 0.3.
(2) Pre-existing — SSRF whitelist singleton race surfaced by this run
(internal/utils/security.go + internal/utils/security_test.go +
internal/infrastructure/web_search/searxng_test.go,
+33 / -9). loadSSRFWhitelist in internal/utils/security.go is
cached via sync.Once on first call. The internal reset helper
resetSSRFWhitelistForTest was unexported, so tests in other
packages could not reset and saw whatever whitelist was cached
by the first sync.Once.Do() in the same test binary. In
internal/infrastructure/web_search/, TestValidateProxyURL runs
before TestValidateSearxngBaseURL alphabetically and exercises
ValidateURLForSSRF with no SSRF_WHITELIST set, caching an empty
whitelist; the later setenv in searxng_test then has no effect
and 127.0.0.1 is rejected with "hostname 127.0.0.1 is restricted".
Pre-existing on main; surfaced now because this PR was the
first to do a full `go test ./...` run on top of #1445.
Fix: capitalize the helper to ResetSSRFWhitelistForTest (the
ForTest suffix is the test-only contract); update in-package
callers; in web_search/searxng_test.go import internal/utils
and call ResetSSRFWhitelistForTest around the env mutation in
both TestValidateSearxngBaseURL and TestSearxngProvider_Search.
No production code path changes.
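The two engine groups from fix (1) reduce to two score mappings. The sketch below is illustrative only: the function names and the `"milvus"` engine string are hypothetical, and the real logic lives in EngineAwareNormalizer. It reproduces the test's arithmetic. Milvus -0.4 shifts to 0.3, ES -0.4 clamps to 0, and ES 0.3 passes through as 0.3.

```go
package main

// clamp01 is the passthrough policy for engines that already surface
// non-negative cosine: the score is only clamped into [0, 1].
func clamp01(s float64) float64 {
	if s < 0 {
		return 0
	}
	if s > 1 {
		return 1
	}
	return s
}

// cosineShift maps raw signed cosine in [-1, 1] onto [0, 1] via (s+1)/2.
func cosineShift(s float64) float64 {
	return clamp01((s + 1) / 2)
}

// normalizeScore picks a branch by a hypothetical engine name; per the
// description above, Milvus is the only engine left on the shift branch.
func normalizeScore(engine string, raw float64) float64 {
	if engine == "milvus" {
		return cosineShift(raw)
	}
	return clamp01(raw)
}
```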
Roadmap
-------
- PR 2b (next, depends on this PR) — read/write implementations:
query.go + retrieve.go + crud.go land their real bodies; stubs
for Save / BatchSave / DeleteBy* / Retrieve in stubs.go are
removed; corresponding CRUD/retrieve/filter test cases (~430
LoC) join repository_test.go.
- PR 3 — activation switch + async paths (CopyIndices,
BatchUpdate*, large-batch async deletes) + i18n + docker-compose
dev profile. After PR 3 merges, the OpenSearch driver becomes
reachable via either `engine_type=opensearch` VectorStore or
`RETRIEVE_DRIVER=opensearch` env.
Backward compatibility
----------------------
- New package — additive only. No existing file modified except
go.mod / go.sum, the two test files in (1)/(2), and the
test-only export rename in utils/security.go.
- Driver is unreachable: no registry path activates it.
- No SQL migration.
- The PR 1 normalizer case for OpenSearch remains unreachable
here (no driver instance produces a result yet).
Test plan
---------
- [x] go build ./... clean
- [x] go vet ./... clean
- [x] go test -race -count=1 ./internal/application/repository/retriever/opensearch/... passes
- [x] grep -r "case types.OpenSearchRetrieverEngineType" internal/
shows only PR 1's normalizer case + this driver's EngineType()
and tests — no activation path.
- [x] grep -r "case \"opensearch\"" internal/ shows no hits.
- Add Type field to v2User struct to distinguish personal/team tokens
- Route team tokens directly to ListGroupRepos (skip ListUserGroups)
- Gracefully handle ListUserGroups 404 for personal tokens without teams
- Add rate-limiting delay between GetDocDetail calls to avoid API throttling
When the rerank model returns an error (e.g. 401 Unauthorized), the
pipeline previously discarded all retrieved candidates and returned
empty results to the caller.
Now p.rerank returns ([]RankResult, error) to distinguish API failure
from threshold-filtered empty results. On API error, the pipeline
falls back to the original retrieval candidates (directLoad +
candidatesToRerank) and continues to CHUNK_MERGE/FILTER_TOP_K,
so users still get results even when the rerank model is misconfigured.
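The key distinction is between an error and an empty-but-successful result. It can be sketched with hypothetical minimal types, where `rerank` is injected as a function. The real pipeline falls back to directLoad + candidatesToRerank; here that merged list is simply `cands`.

```go
package main

import "errors"

// Candidate and RankResult are hypothetical minimal types for the sketch.
type Candidate struct {
	ID    string
	Score float64
}

type RankResult struct {
	Index int
	Score float64
}

// errRerankUnauthorized stands in for e.g. a 401 from the rerank API.
var errRerankUnauthorized = errors.New("rerank: 401 Unauthorized")

// rerankOrFallback separates "the reranker failed" (err != nil), which falls
// back to the original candidates, from "the reranker ran and the threshold
// filtered everything out" (empty, nil), which stays empty.
func rerankOrFallback(cands []Candidate, rerank func([]Candidate) ([]RankResult, error)) []Candidate {
	results, err := rerank(cands)
	if err != nil {
		return cands // keep unreranked candidates for the later merge/top-k steps
	}
	out := make([]Candidate, 0, len(results))
	for _, r := range results {
		if r.Index < 0 || r.Index >= len(cands) {
			continue // ignore out-of-range indices from the provider
		}
		c := cands[r.Index]
		c.Score = r.Score
		out = append(out, c)
	}
	return out
}
```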
Problem:
Doris does not support ? placeholders for LIMIT and OFFSET clauses, causing
VectorRetrieve, KeywordsRetrieve, and CopyIndices queries to fail at runtime
with SQL parse errors.
Root cause:
1. Query builders passed TopK/pageSize/offset as ? bind parameters (valid in
MySQL/PostgreSQL but not in Doris's SQL dialect)
2. Doris MySQL driver was not configured to interpolate parameters, preventing
automatic client-side placeholder replacement
Fix:
1. Enable parameter interpolation in Doris connection DSN:
- Add &interpolateParams=true to mysql.Open() DSN in container.go
- This enables driver-level parameter substitution at query time
2. Inline LIMIT/OFFSET as integer literals in Doris queries:
- VectorRetrieve: LIMIT ? → LIMIT %d (params.TopK)
- KeywordsRetrieve: LIMIT ? → LIMIT %d (params.TopK)
- CopyIndices: LIMIT ? OFFSET ? → LIMIT %d OFFSET %d (pageSize, offset)
- Remove the inlined values from the args slice
Result:
Queries are now built with literal LIMIT/OFFSET values instead of placeholders,
compatible with Doris's SQL parser.
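The inlining step can be sketched as follows. The table and column names are hypothetical, and a real vector query would order by a distance expression. Because the values are Go `int`s formatted with `%d`, the inlined text can only be a decimal integer, so moving them out of the bind args does not open an injection path. Remaining filters still use `?` placeholders.

```go
package main

import "fmt"

// buildVectorQuery inlines TopK as an integer literal because Doris rejects
// "?" in LIMIT; the knowledge-base filter stays a bind parameter.
// Table and column names are hypothetical.
func buildVectorQuery(topK int, kbID string) (string, []any) {
	query := fmt.Sprintf(
		"SELECT id, content FROM embeddings WHERE knowledge_base_id = ? LIMIT %d", topK)
	return query, []any{kbID} // TopK is no longer in args
}

// buildCopyPage inlines both LIMIT and OFFSET for paged copies.
func buildCopyPage(pageSize, offset int) string {
	return fmt.Sprintf("SELECT id FROM embeddings ORDER BY id LIMIT %d OFFSET %d", pageSize, offset)
}
```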
Three correctness fixes that the lifecycle PR deliberately deferred:
1. ID length / struct-tag drift
- models.id is varchar(64) on both PG and SQLite (per the init
migrations), but Model.ID's GORM tag said varchar(36) — a remnant
from when the field only held UUIDs. The mismatch is harmless under
golang-migrate (struct tag is ignored), but misleading on AutoMigrate
paths and in IDE tooltips. Tag now matches the real column width.
- New ModelIDMaxLen constant (=64) is the single source of truth for
anyone accepting user-provided ids. The YAML loader uses it to
reject too-long ids up front with a clear message instead of letting
the INSERT explode with a generic "value too long for type" error.
2. Field validation in the YAML loader
- Type, Source, and Status are typed strings but YAML can supply any
value. Misspellings (e.g. `type: knowledgeqa` lowercase, `type: LLM`)
were previously persisted as-is and produced rows that looked fine
in the table but failed at provider-factory lookup time, which is
hard to debug.
- validateBuiltinModelEntry now checks: empty id, id length, empty
type, type ∈ {KnowledgeQA, Embedding, Rerank, VLLM, ASR}, and
status ∈ {active, downloading, download_failed, empty}. Source is
intentionally NOT validated because the provider matrix in
internal/models/* keeps growing and a strict allow-list here would
force changes in two places per new provider.
- Invalid entries are warned + skipped (not aborting the whole load),
and excluded from the keep-set so the drift sweep does not delete
existing matching rows on the strength of a typo'd YAML retry.
3. Magic number cleanup
- DefaultBuiltinModelTenantID (=10000) replaces the hard-coded `10000`
literal in toModel(). The invariant lives in three places already
(PG migration, SQLite migration, this constant); naming it makes
the cross-reference explicit and grep-able.
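The checks listed under item 2 might be sketched like this. The entry struct is hypothetical and trimmed. "empty" in the status list is interpreted here as an omitted status (the empty string), which is an assumption. Length is measured in bytes via `len`. Source is deliberately left unchecked, matching the rationale above.

```go
package main

import "fmt"

const ModelIDMaxLen = 64

// builtinModelEntry is a trimmed, hypothetical YAML entry shape.
type builtinModelEntry struct {
	ID     string
	Type   string
	Source string
	Status string
}

var (
	validTypes = map[string]bool{
		"KnowledgeQA": true, "Embedding": true, "Rerank": true, "VLLM": true, "ASR": true,
	}
	// Assumption: "empty" means the status field may be omitted ("").
	validStatuses = map[string]bool{
		"active": true, "downloading": true, "download_failed": true, "": true,
	}
)

// validateBuiltinModelEntry rejects entries that would otherwise persist
// fine but fail later at provider-factory lookup. Type matching is exact,
// so "knowledgeqa" and "LLM" are both rejected.
func validateBuiltinModelEntry(e builtinModelEntry) error {
	switch {
	case e.ID == "":
		return fmt.Errorf("empty id")
	case len(e.ID) > ModelIDMaxLen:
		return fmt.Errorf("id is %d bytes, max %d", len(e.ID), ModelIDMaxLen)
	case e.Type == "":
		return fmt.Errorf("model %s: empty type", e.ID)
	case !validTypes[e.Type]:
		return fmt.Errorf("model %s: unknown type %q", e.ID, e.Type)
	case !validStatuses[e.Status]:
		return fmt.Errorf("model %s: unknown status %q", e.ID, e.Status)
	}
	return nil // Source intentionally not validated
}
```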
Tests:
- New TestLoadBuiltinModelsConfig_RejectsInvalidEntries with five
sub-cases (id-too-long, missing-type, lowercase-type, unknown-type,
unknown-status) asserts the table stays empty after each.
- All 11 existing tests still pass.
Original PR #1453 used fmt.Printf, which lands as unstructured noise in
release/JSON log pipelines. The natural fix is logger.Infof/Warnf, but
internal/logger itself imports internal/types — using it from here
would create an import cycle (this is the same constraint that forces
model.go's crypto error logs onto stdlib log).
Switch all loader output to log.Printf with a stable "[builtin-models]"
prefix (mirroring the "[crypto]" convention already established in
model.go). The prefix gives operators a grep handle even though the
lines stay unstructured.
Levels are encoded inline as "WARN:" for warnings so log shippers and
humans can still discriminate.
Extend the builtin_models.yaml loader so the YAML file becomes a complete
source of truth for the rows it owns. Builds on the previous commit's
managed_by column.
Lifecycle contract:
- Every UPSERTed entry is tagged managed_by="yaml".
- The DoUpdates list now includes deleted_at, so an entry that was
soft-deleted (e.g. via UI/API) is automatically resurrected when it
reappears in the file. Closes the "ghost row that exists but is
invisible" failure mode.
- After all UPSERTs, the loader soft-deletes rows where
managed_by='yaml' AND id NOT IN (current YAML id set). Removing an
entry from YAML is now the supported way to retire a built-in model —
no manual SQL needed.
- Rows tagged managed_by='' (UI/API/SQL-seeded built-ins) are invisible
to the reconcile path and never touched.
- When a YAML entry sets is_default=true, the loader first clears
is_default on any other rows in the same (tenant_id, type) bucket,
mirroring the invariant enforced by the API path
(repository.UnsetDefaultModel).
Failure handling stays defensive:
- File missing / not a regular file / parse error: warn and skip; the
drift sweep is NOT executed so a malformed file cannot wipe rows.
- Per-entry UPSERT error: warn, drop the id from the keep-set so the
sweep also leaves the existing row alone ("leave alone on failure").
Tests cover: file-missing, parse-error, basic upsert + defaults,
idempotency, ${ENV} interpolation (set vs unset), drift sweep removing
YAML rows, drift sweep ignoring manual rows, soft-delete resurrection,
is_default cleanup across tenant+type, explicit empty list sweeping all
yaml-managed rows, and a regression guard ensuring BeforeCreate does not
overwrite YAML-supplied stable ids.
Docs are rewritten so operators see "delete from YAML and restart" as
the supported removal path; SQL is retained only for the legacy
managed_by='' slice.
Introduce a `managed_by` varchar column on `models` so future declarative
loaders can claim ownership of a subset of rows without disturbing entries
created via the UI/API or seeded by hand-written SQL.
- versioned/000052_models_managed_by.{up,down}.sql add the column with a
default of '' and a partial index on non-empty values to keep startup
reconciliation cheap.
- sqlite/000000_init.up.sql is updated in place (the Lite init migration
is a single file per project convention) so fresh SQLite databases get
the column from the start.
- Model.ManagedBy mirrors the column. Existing rows default to '' which
the YAML loader treats as "manually managed, never touch".
Schema half of the YAML-driven built-in-model lifecycle work that follows
up on #1453; the reconciler that uses the column lands in the next commit.
Allow built-in models to be declared in config/builtin_models.yaml
instead of inserting rows via SQL. On every startup the file is read
and each entry is UPSERT-ed into the models table (is_builtin=true)
by stable id.
Any string field may reference an environment variable with ${NAME}.
Unset variables are left as the literal placeholder so
misconfiguration surfaces clearly in provider calls rather than
failing silently with an empty token.
The file is optional: missing file, parse errors, and per-entry
upsert failures all log a warning without aborting startup.
docker-compose.yml adds env_file (.env, required:false) so
deployment-specific variables are passed through automatically.
When a custom agent has `MultiTurnEnabled=false`, `applyAgentOverridesToChatManage`
sets `chatManage.MaxRounds = 0` to signal "no history". Two pipeline plugins
mistreated this zero value as "use the global default" and silently re-loaded
session history into the LLM context:
- `PluginLoadHistory` fell back to `Conversation.MaxRounds` when
`chatManage.MaxRounds == 0`.
- `PluginQueryUnderstand.loadHistory` had the same fallback, and even when
`LOAD_HISTORY` was skipped it would re-populate `chatManage.History`,
leaking previous turns into rewrite, image analysis, and the final answer.
The RAG branch in `session_knowledge_qa` also added `LOAD_HISTORY`
unconditionally, unlike the pure-chat branch which guarded it with `hasHistory`.
This change:
- Treats `chatManage.MaxRounds <= 0` as an explicit disable in both plugins;
no fallback to global config.
- Makes the RAG pipeline consistent with the pure-chat path by gating
`LOAD_HISTORY` on `hasHistory`.
- Removes the duplicated `Current Time: {{current_time}}` line from
`agent_system_prompt.yaml`. The agent already receives a fresh
`<runtime_context><current_time>` block with each turn from
`observe.buildRuntimeContextBlock`, so the static placeholder was
redundant.
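The zero-value bug and its fix fit in two hypothetical helpers. In the old behaviour, 0 silently meant "unset, use the global default". In the new one, any value <= 0 is an explicit "no history" and never falls back. The parameter names mirror `chatManage.MaxRounds` and `Conversation.MaxRounds`, but the functions themselves are illustrative.

```go
package main

// loadRoundsBuggy reproduces the old fallback: chatManage.MaxRounds == 0
// was read as "unset" and replaced by Conversation.MaxRounds.
func loadRoundsBuggy(chatMaxRounds, convDefault int) int {
	if chatMaxRounds == 0 {
		return convDefault
	}
	return chatMaxRounds
}

// loadRoundsFixed treats <= 0 as an explicit disable (MultiTurnEnabled=false)
// with no fallback to global config.
func loadRoundsFixed(chatMaxRounds int) int {
	if chatMaxRounds <= 0 {
		return 0
	}
	return chatMaxRounds
}
```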
The ReAct agent path (`session_agent_qa`) already checked `MultiTurnEnabled`
directly and is not affected.
Closes#1479
- Refactored checkStorageEngineConfigured to be a method of knowledgeService, enhancing encapsulation and readability.
- Updated logic to allow fallback to global file service when no storage provider is configured at the KB or tenant level, improving error handling.
- Added detailed comments to clarify the method's behavior and internal logic, ensuring better understanding for future maintenance.
- Removed the VideoInfo field from the Chunk struct in chunk.go, streamlining the data model.
- This change reflects a shift in focus away from video information storage within the Chunk type.
Reuse enqueueKnowledgeListDelete inside DeleteKnowledge so that single-item
delete shares the same hardening as BatchDeleteKnowledge / ClearKnowledgeBase:
asynq retries, business-aware queue routing, and marking-as-deleting inside
the worker.
The endpoint now returns 200 once the delete task has been enqueued; the
response body carries the asynq task_id and the message is updated to
"Delete task submitted". Swagger annotations, generated docs and the Go
client SDK comment are updated to reflect the new asynchronous semantics.
Note: this is a behavior change. Callers that previously assumed the
knowledge was already gone on a 200 response should poll the task status
or accept eventual consistency, matching the existing BatchDeleteKnowledge
contract.
`cached_tokens` is reported by every OpenAI-compatible provider that
supports prompt caching, but how it becomes non-zero differs by mode:
- Implicit caching (OpenAI, Azure OpenAI, DeepSeek, …) populates the
field automatically whenever a prompt prefix matches a previous
request within the provider's cache TTL. No client-side opt-in.
- Explicit caching (Qwen on Aliyun, Anthropic Claude, …) only
populates the field after the caller attaches `cache_control:
{"type": "ephemeral"}` to the relevant message / content block.
Until that opt-in is applied upstream of the request, the field
stays zero even when the prefix is otherwise byte-stable.
Without this distinction documented, the previous commit reads as if
`TokenUsage.CachedTokens` will show non-zero values for Qwen / Claude
once this PR lands — which is not the case. The plumbing here is a
prerequisite (stable prefix via sorted tools) and a meter (visibility
of the field), but the explicit-cache opt-in itself is out of scope
and lives elsewhere.
Document this on `TokenUsage.CachedTokens` and the `cachedTokens`
helper so callers do not mistake observability for activation.
OpenAI-compatible providers (Qwen, DeepSeek, OpenAI, Azure, etc.) report
prompt-cache hits in `usage.prompt_tokens_details.cached_tokens`. This
value was being read by go-openai but dropped at the WeKnora boundary,
so there was no way to tell whether prompt caching was actually working.
This change plumbs the field end-to-end:
- `types.TokenUsage.CachedTokens` (json:"cached_tokens,omitempty") —
zero-values are omitted so payloads stay quiet for providers that
never report cache hits.
- `cachedTokens` helper in remote_api.go guards against nil
`PromptTokensDetails` (Ollama and older OpenAI-compat backends omit
the details block entirely).
- All three response-parsing paths populate it:
* `parseCompletionResponse` (non-streaming)
* `processStream` (SDK streaming)
* `processRawHTTPStream` (raw-HTTP streaming, used when callers
need to inject custom fields like `cache_control`)
- The five `[LLM Usage]` log lines now print `cached_tokens=%d` so
cache hit rate is visible in `journalctl` / log tail without going
through metrics.
Together with the deterministic tool ordering from the previous commit,
this makes Qwen explicit caching observable: a warmed prefix should
show `cached_tokens` ≈ system + tools token count (typically several
thousand) on subsequent requests within the 5-minute TTL.
Tests:
- `TestCachedTokensHelper` — nil safety + round-trip
- `TestParseCompletionResponse_CachedTokens` — populated + missing
details paths through `parseCompletionResponse`
- `TestTokenUsage_CachedTokensJSONOmitempty` — zero is omitted,
non-zero is emitted
`ToolRegistry.GetFunctionDefinitions` and `ListTools` previously ranged
over the internal map directly. Go map iteration is intentionally
randomized, so the resulting `tools` array reshuffled on every request.
That reshuffling silently breaks provider-side prompt caches that key on
a byte-level prefix match — most visibly Qwen explicit caching, which
requires the messages (system + tools + history) to be byte-identical up
to the `cache_control` marker. With random ordering the serialized tools
block changes every call, so the cache prefix never matches and the
hit rate stays at 0%.
Sort by tool name in both functions. Output is now byte-stable across
calls and `cache_control: ephemeral` can actually take effect.
Tests in registry_test.go cover:
- Deterministic ordering across 50 iterations
- JSON byte-stability across 20 iterations (the real motivation)
- Field projection (Name / Description / Parameters)
- Empty registry returns `[]` not `null`
- ListTools sorting
- First-wins duplicate registration policy (GHSA-67q9-58vj-32qx)
Lays type-system groundwork for the upcoming OpenSearch k-NN driver
(Phase 3 PR 2, see Tencent/WeKnora#1440), with a strict feature gate:
this PR ships only inert constants, schema extensions, and an
unreachable normalizer case. No path activates an OpenSearch
VectorStore yet — creation continues to fail with "not a valid engine
type" until the activation switch lands in Phase 3 PR 3.
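The gate pattern is simply "define the constant, but leave it out of the allow-list". A minimal sketch, assuming `validEngineTypes` behaves like a set; the map representation, the `postgres` wire value, and the `validateEngineType` helper and its error format are hypothetical, and only the `"opensearch"` wire value comes from this change.

```go
package main

import "fmt"

type RetrieverEngineType string

const (
	PostgresRetrieverEngineType   RetrieverEngineType = "postgres" // hypothetical wire value
	OpenSearchRetrieverEngineType RetrieverEngineType = "opensearch"
)

// validEngineTypes is the allow-list consulted on VectorStore creation.
var validEngineTypes = map[RetrieverEngineType]bool{
	PostgresRetrieverEngineType: true,
	// OpenSearchRetrieverEngineType is intentionally absent until activation.
}

// validateEngineType rejects any engine not on the allow-list, so the new
// constant exists in the type system but cannot be used yet.
func validateEngineType(t RetrieverEngineType) error {
	if !validEngineTypes[t] {
		return fmt.Errorf("%q is not a valid engine type", string(t))
	}
	return nil
}

func main() {
	fmt.Println(validateEngineType(OpenSearchRetrieverEngineType)) // "opensearch" is not a valid engine type
	fmt.Println(validateEngineType(PostgresRetrieverEngineType))   // <nil>
}
```

Activation then reduces to adding one entry to the allow-list, which keeps the gated invariant easy to test.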
Changes:
- OpenSearchRetrieverEngineType constant ("opensearch") in
internal/types/retriever.go. Not added to validEngineTypes /
GetVectorStoreTypes / retrieverEngineMapping yet (gated).
- ConnectionConfig.InsecureSkipVerify (bool, default false) in
internal/types/vectorstore.go, placed inside the // Common section
because it is a cross-driver TLS option. Distinct from the
Qdrant-specific UseTLS, which enables TLS on gRPC — InsecureSkipVerify
only controls verification of an already-TLS HTTPS connection.
AES-GCM Value/Scan round-trips the field as plaintext (it travels
alongside the encrypted Password / APIKey but is not sensitive itself).
- VectorStoreFieldInfo gains four optional fields: Immutable (bool),
Min/Max (*float64), Enum ([]string). All omitempty so existing UI
schema entries serialize identically. The fields will drive the UI
in Phase 3 PR 3 (read-only on Edit + range/enum constraints).
- Six new AuditAction constants under vector_store.* and opensearch.*
namespaces. Definitions only — emission lands in Phase 3 PR 3.
- EngineAwareNormalizer.Normalize is restructured to group engines by
the effective score range observed by the normalizer (not the
theoretical raw cosine range):
Range [-1, 1] (raw cosine, mapped via (score + 1) / 2):
- Milvus (COSINE metric mode). Milvus docs explicitly state the
COSINE metric range is [-1, 1].
Range [0, 1] (passthrough — already on the target scale):
- Elasticsearch v8 / ElasticFaiss. The driver issues a
cosineSimilarity(...) script_score script, and Lucene rejects
negative final scores ("Final relevance scores from the
script_score query cannot be negative" — ES docs); the
effective range observed by the normalizer is therefore [0, 1]
for IR-normalized embeddings. ES was previously grouped with
Milvus and over-corrected via (score + 1) / 2, which pushed
every ES score up into [0.5, 1] (cos=0.5 became 0.75, a 50%
inflation) and skewed ES results in mixed-engine RRF fusion.
- OpenSearch. The k-NN plugin's
SpaceType.COSINESIMIL.scoreTranslation maps the underlying
Lucene/Faiss distance (1 - cosine) to (1 + cosine) / 2 ∈ [0, 1]
before the score reaches us. Source:
github.com/opensearch-project/k-NN at
src/main/java/org/opensearch/knn/index/SpaceType.java
(COSINESIMIL enum, scoreTranslation method).
- Weaviate. The driver requests `certainty`, defined by Weaviate
as (2 - distance) / 2 = (1 + cosine) / 2, intrinsically [0, 1].
- Postgres pgvector / SQLite sqlite-vec / Qdrant /
TencentVectorDB / Doris. These drivers compute (1 - cosine_distance)
or normalized inner_product, whose theoretical range is [-1, 1].
The IR-normalized positive-component unit vectors WeKnora
targets (BGE / OpenAI text-embedding-3 / Cohere /
sentence-transformers) keep the observed range in [0, 1];
scores from embedding models that produce negative cosines would
be silently clamped to 0 downstream, which is explicitly documented
as the IR-normalization caveat in the struct godoc.
Dead enum references (ElasticFaiss, Infinity) are flagged in the
godoc with a pointer to internal/types/vectorstore.go's existing
"legacy/experimental, no standalone deployable instance" annotation.
Their case labels are kept for switch exhaustiveness.
Test coverage:
- OpenSearch constant wire value + collision check against the 10
existing engines + gated invariant (NOT in validEngineTypes).
- ConnectionConfig.InsecureSkipVerify backward-compat (missing JSON
field deserializes as false) + round-trip + AES-GCM coexistence
with encrypted Password / APIKey.
- VectorStoreFieldInfo omitempty preservation + new-field serialization
+ *float64 pointer distinction (min=0 vs nil).
- AuditAction dot-namespace convention enforcement, prefix invariants
for vector_store.* and opensearch.*, no wire-string collisions, exact
wire-value pins for the six new constants.
- EngineAwareNormalizer:
- CosineRange retains its (score + 1) / 2 coverage but now only for
Milvus.
- UnitInterval now covers the full passthrough group (ES /
ElasticFaiss / OpenSearch / Weaviate / Postgres / SQLite / Qdrant /
Infinity / TencentVectorDB / Doris).
- New TestEngineAwareNormalizer_ElasticsearchCosinePassthrough is an
explicit regression guard for the score-range correction: cos=0.5
maps to 0.5 (not (0.5 + 1) / 2 = 0.75 as an earlier draft assumed).
- OpenSearch passthrough across (0 / 0.5 / 0.75 / 1) + engine drift
(1.0001) + defensive negative + ±Inf / NaN edges + keyword
passthrough + nil-ctx safety.
Backward compatibility is preserved at every layer:
- All new struct fields are omitempty / pointer-tagged so existing
rows and existing wire formats remain unchanged.
- The normalizer's new OpenSearch case is unreachable until the driver
lands. The ES regrouping changes the post-normalization value for
every ES vector search result (a correctness fix, not a feature);
ES vector retrieval is currently the only production path affected.
- AuditAction constants emit no audit_log rows in this PR.
- engine_type=opensearch VectorStore creation is still rejected (gated).