Lets users stop an in-flight document parse to free up LLM / worker
resources without losing the chunks and index already written. The
core insight is that parse_status previously flipped to completed as
soon as primary chunks landed, while the most expensive subtasks
(graph extract = N LLM calls per chunk, plus summary and question
generation) were still running in the background, so "completed"
wasn't actually terminal from a resource standpoint.
State machine
  pending -> processing -> finalizing -> completed
                 |
                 +-> cancelled (any of the three
                 |              in-flight states)
                 +-> failed
                 +-> deleting
`finalizing` is the new post-process fan-out window. parse_status
only promotes to `completed` once pending_subtasks_count (a new
column tracking summary + question + per-chunk graph extract)
drains to zero via atomic FinalizeSubtask. Wiki ingest is
intentionally excluded from the counter — it's a KB-scoped
debounced batch and would otherwise pin parse_status in
`finalizing` for the wiki batch window.
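The forward path and the cancel rule above can be sketched as a small transition check. This is a minimal illustration, not the repository's code: the type and function names are hypothetical, and the failed / deleting exits are deliberately left out because the message does not spell out which states they leave from.

```go
package main

// ParseStatus mirrors the parse_status values named in this commit.
type ParseStatus string

const (
	StatusPending    ParseStatus = "pending"
	StatusProcessing ParseStatus = "processing"
	StatusFinalizing ParseStatus = "finalizing"
	StatusCompleted  ParseStatus = "completed"
	StatusCancelled  ParseStatus = "cancelled"
)

// IsInFlight reports whether a parse may still be consuming resources.
func IsInFlight(s ParseStatus) bool {
	switch s {
	case StatusPending, StatusProcessing, StatusFinalizing:
		return true
	}
	return false
}

// CanTransition covers the forward path plus cancellation.
// failed / deleting are not modelled in this sketch.
func CanTransition(from, to ParseStatus) bool {
	switch to {
	case StatusProcessing:
		return from == StatusPending
	case StatusFinalizing:
		return from == StatusProcessing
	case StatusCompleted:
		// processing -> completed is the "no enrichment enabled" shortcut.
		return from == StatusProcessing || from == StatusFinalizing
	case StatusCancelled:
		return IsInFlight(from)
	}
	return false
}
```

For example, `CanTransition(StatusFinalizing, StatusCancelled)` is true, while a completed row can no longer be cancelled.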
Backend
- New ParseStatusFinalizing + pending_subtasks_count column with
migration 000056.
- knowledgeRepository.SetFinalizing transitions processing -> finalizing
conditionally so a racing cancel cannot be clobbered.
- knowledgeRepository.FinalizeSubtask atomically decrements the
counter and self-promotes the row to completed when it hits zero.
- KnowledgePostProcess restructured to compute expected subtask
count up front, flip to finalizing (or completed when no
enrichment is enabled), and only then fan out subtasks. Subtask
handlers (summary, question, graph extract) defer-decrement on
terminal exit using the existing isFinalAsynqAttempt convention.
- New POST /api/v1/knowledge/{id}/cancel-parse handler accepting
pending / processing / finalizing. Marks the row cancelled,
zeroes the counter, best-effort dequeues asynq tasks via a new
TaskInspector abstraction (asynq mode walks the pending/scheduled/
retry queues; Lite mode is a no-op), and scrubs the pending wiki
ingest op.
- SpanTracker.AbortAttempt flat-sweeps every still-running span
for the attempt via a new repo.CancelAllOpenSpans helper so the
trace viewer's striped bars all flip to cancelled, even leaf
generations whose parent stage already EndSpan'd (multimodal
fan-out pattern). knowledge_post_process closes its postSpan
via SkipSpan on the cancel/deleting entry guard so a worker
that opens a span AFTER the cancel sweep doesn't leak it.
- Housekeeping and resetPendingTasks sweep finalizing rows
identically to processing so a crash/restart can't strand them.
- DeleteKnowledge/DeleteKnowledgeList proactively dequeue
downstream tasks via the same TaskInspector path.
- ChunkExtractService gets a cancel entry guard so the most
expensive enrichment (graph extract) bails immediately when the
parent knowledge is aborted.
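The interplay between SetFinalizing, FinalizeSubtask and a racing cancel can be sketched with an in-memory stand-in for the repository. This is an assumption-laden illustration: the real methods presumably issue conditional SQL updates, whereas here a mutex plays that role, and `memRepo`, `knowledgeRow` and `Cancel` are hypothetical names.

```go
package main

import "sync"

// knowledgeRow is a hypothetical in-memory stand-in for a knowledge record.
type knowledgeRow struct {
	Status  string
	Pending int // plays the role of pending_subtasks_count
}

type memRepo struct {
	mu   sync.Mutex
	rows map[string]*knowledgeRow
}

// SetFinalizing moves processing -> finalizing only if the row is still
// processing, so a cancel that landed first is not overwritten.
func (r *memRepo) SetFinalizing(id string, subtasks int) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	row, ok := r.rows[id]
	if !ok || row.Status != "processing" {
		return false
	}
	row.Status, row.Pending = "finalizing", subtasks
	return true
}

// FinalizeSubtask decrements the counter and promotes the row to
// completed when it reaches zero. It is a no-op outside finalizing.
func (r *memRepo) FinalizeSubtask(id string) (promoted bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	row, ok := r.rows[id]
	if !ok || row.Status != "finalizing" || row.Pending <= 0 {
		return false
	}
	row.Pending--
	if row.Pending == 0 {
		row.Status = "completed"
		return true
	}
	return false
}

// Cancel marks an in-flight row cancelled and zeroes the counter.
func (r *memRepo) Cancel(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	row, ok := r.rows[id]
	if !ok {
		return false
	}
	switch row.Status {
	case "pending", "processing", "finalizing":
		row.Status, row.Pending = "cancelled", 0
		return true
	}
	return false
}
```

A subtask that finishes after a cancel calls FinalizeSubtask, finds the row is no longer finalizing, and leaves the cancelled status untouched.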
Frontend
- New cancelKnowledgeParse API client + "Stop parsing" entry in
both list view and card view more menus, gated on
pending/processing/finalizing.
- Polling predicate refactored to a shared isParseInFlight helper
that recognises `finalizing` (previously the doc list silently
stopped polling once parse_status flipped from processing).
- Knowledge processing timeline: isPolling includes finalizing,
new isHardTerminal short-circuits LIVE for cancelled/failed/
completed so stranded child spans cannot pin LIVE on.
- DocumentListView.computeStatus distinguishes finalizing
("增强中", i.e. "enriching") from completed and shows the previous
"生成摘要中" ("generating summary") copy when summary_status is
still pending under finalizing. Also adds a cancelled badge.
- i18n: statusFinalizing / statusCancelled / cancelParse* keys
across zh-CN, en-US, ko-KR, ru-RU.
Docs / SDK
- docs/api/knowledge.md: documents the new finalizing state,
cancel-parse semantics, and which statuses accept cancel.
- client (Go SDK): CancelKnowledgeParse with docstring listing
the cancellable statuses.
This commit introduces several improvements to the knowledge processing timeline and related components. Key changes include:
1. Added a `gracePoll` prop to the `KnowledgeProcessingTimeline` component to manage polling behavior more effectively.
2. Enhanced the UI by displaying the document title in the drawer, improving user visibility of the current document context.
3. Implemented new CSS classes for better styling of the drawer title bar, ensuring a more polished appearance.
4. Updated the backend to support the new `WikiSpan` tracking, allowing for detailed monitoring of document processing stages.
These changes aim to improve user experience and provide better insights into the document processing workflow.
Pre-tracker historical knowledge has zero rows in
knowledge_processing_spans but parse_status correctly reads
"completed" or "failed". The /spans handler was synthesizing five
"pending" placeholders unconditionally, so legacy completed documents
rendered as if they were stuck waiting in the queue forever.
buildSpanTree now takes parse_status and chooses the placeholder
status accordingly:
- ParseStatusCompleted -> done
- ParseStatusFailed -> failed
- everything else -> pending (existing behaviour)
Real rows always take precedence; this only changes what we put in
the gaps. So healthy in-flight parses (parse_status=processing,
some real rows, some still pending) keep showing pending placeholders
exactly as before — the synthesized "completed" inference only fires
when the parse already hit terminal state.
Adds TestBuildSpanTree_LegacyCompletedRendersAsDone covering both
the completed-legacy and failed-legacy branches.
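The gap-filling rule can be sketched as follows. This is an illustrative sketch rather than the actual buildSpanTree: the lowercase stage keys, `placeholderStatus` and `fillGaps` are assumed names, and the real handler returns a tree rather than a flat slice.

```go
package main

// canonicalStages lists the five timeline segments; the lowercase keys
// are an assumption made for this sketch.
var canonicalStages = []string{
	"docreader", "chunking", "embedding", "multimodal", "postprocess",
}

// placeholderStatus picks the status used for stages with no real row.
func placeholderStatus(parseStatus string) string {
	switch parseStatus {
	case "completed":
		return "done"
	case "failed":
		return "failed"
	default:
		return "pending"
	}
}

// fillGaps returns one status per canonical stage. Real rows always
// win; placeholders only fill the stages that have no row.
func fillGaps(realRows map[string]string, parseStatus string) []string {
	out := make([]string, 0, len(canonicalStages))
	for _, stage := range canonicalStages {
		if status, ok := realRows[stage]; ok {
			out = append(out, status)
			continue
		}
		out = append(out, placeholderStatus(parseStatus))
	}
	return out
}
```

With no rows and parse_status=completed every segment renders as done; with parse_status=processing and a single real row, the remaining four stay pending as before.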
Addresses three review concerns from the prior PR:
1. No tests existed for any of the stuck-parsing fixes (PR① / PR②.5).
This commit adds coverage for the four most regression-prone
surfaces: span repo upsert/cascade/attempt isolation, span
tracker cascade-cancel and cross-process LookupStage,
housekeeping false-kill protection, and handler tree assembly.
2. Housekeeping was using only knowledge.updated_at as its staleness
signal, but knowledge.updated_at advances only at parse_status
transitions — a long DocReader call (or large embedding batch)
can run for an hour with no updated_at change, so a tight
DocumentProcessTimeout setting would falsely flip an actively
running parse to "failed".
The sweep now does a two-stage check: candidates by knowledge
updated_at, then filtered by MAX(spans.updated_at). Every
SpanTracker Begin/End/Fail/Skip now also pokes
knowledge.updated_at as a side-channel heartbeat, so the
filter sees recent activity even when no parse_status
transition has fired.
3. parseHeartbeatTime accepts the timestamp formats both Postgres
and SQLite emit for an aggregated MAX() column (the SQLite
driver doesn't auto-cast aggregates to time.Time the way
Postgres does), so the same code path works in Lite mode.
The new TestHousekeeping_NoFalseKill_ActiveSpan is the regression
test for the user-flagged scenario: a 3-hour-stale knowledge.updated_at
combined with a 2-minute-fresh span row must NOT be killed.
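A rough sketch of the two-stage staleness check and the tolerant timestamp parsing follows. The layouts in `heartbeatLayouts`, the `exampleTimeout` value and the function names are assumptions for illustration; the formats the real parseHeartbeatTime accepts may differ.

```go
package main

import (
	"fmt"
	"time"
)

// exampleTimeout stands in for DocumentProcessTimeout (hypothetical value).
const exampleTimeout = time.Hour

// heartbeatLayouts lists the string formats this sketch accepts. They
// are assumptions, not copied from the repository.
var heartbeatLayouts = []string{
	time.RFC3339Nano,
	"2006-01-02 15:04:05.999999999-07:00",
	"2006-01-02 15:04:05",
}

// parseHeartbeat accepts a driver value that may already be a time.Time
// or may arrive as text, as can happen for an aggregated MAX() column.
func parseHeartbeat(v any) (time.Time, error) {
	switch t := v.(type) {
	case time.Time:
		return t, nil
	case []byte:
		return parseHeartbeat(string(t))
	case string:
		for _, layout := range heartbeatLayouts {
			if ts, err := time.Parse(layout, t); err == nil {
				return ts, nil
			}
		}
	}
	return time.Time{}, fmt.Errorf("unrecognised heartbeat value %v", v)
}

// shouldKill applies both stages: the knowledge row must be stale AND
// the newest span heartbeat must be stale too.
func shouldKill(knowledgeUpdated, lastSpan, now time.Time, timeout time.Duration) bool {
	if now.Sub(knowledgeUpdated) < timeout {
		return false // stage 1: not even a candidate
	}
	return now.Sub(lastSpan) >= timeout // stage 2: filter by span activity
}
```

A knowledge row last updated three hours ago with a span touched two minutes ago is a stage-1 candidate but is filtered out in stage 2.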
Addresses review feedback that the PR② design had four shortcomings:
1. The pipeline is a DAG, not a sequence — Embedding and Multimodal
are independent of each other, both downstream of Chunking, both
upstream of PostProcess. The flat (knowledge_id, stage) table
couldn't represent that, so a Chunking failure left dependents
stranded as "pending" forever instead of being marked as
impossible-to-run.
2. No history across attempts. A reparse erased the previous run's
status before the new run started, leaving operators with no way
to investigate "why did this fail twice?".
3. Stages had only status + duration. Operators want to know how big
the work was — pages parsed, chunks created, tokens embedded, VLM
calls made — to distinguish "slow because the file is huge" from
"slow because the docreader is wedged".
4. Multimodal fans out N image tasks; Embedding fans out M batches;
PostProcess fans out into Summary/Question/Wiki/Graph. Each unit
is interesting on its own (Langfuse already captures this for
LLM calls). The flat model couldn't express it.
The redesign mirrors Langfuse's trace/span/generation hierarchy:
* Migration 000053 supersedes 000052: knowledge_processing_spans
table with (knowledge_id, attempt, span_id) primary key, plus
parent_span_id, kind ∈ {root, stage, subspan, generation},
status ∈ {pending,running,done,failed,skipped,cancelled}, and
JSONB input/output/metadata fields.
* SpanTracker (replacing StageTracker) exposes OpenAttempt /
BeginStage / BeginSubSpan / EndSpan / FailSpan / SkipSpan /
LookupStage. Cross-process workers (image_multimodal) get the
parent's attempt + span via payload + LookupStage so subspans
attach correctly.
* StageDependencies declares the DAG; FailSpan now cascades:
descendants of the failed span and dependent stages are flipped
to "cancelled" with an UPSTREAM_FAILED code. The UI sees a clear
blast radius instead of orphan spinners.
* Reparse now calls OpenAttempt up front so the timeline reflects
"new attempt, all pending" instead of letting the previous run's
status linger until the worker picks up the task.
* Image_multimodal records each image as a generation subspan with
its own success/failure on the parent attempt's multimodal stage.
The finalize-on-last-attempt counter logic is preserved unchanged.
* GET /api/v1/knowledge/:id/spans (also kept /stages alias) returns
a tree shape with synthesized pending placeholders so the
frontend always renders five timeline segments. ?attempt=N
enables history navigation.
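The cascade over the stage DAG can be sketched as a transitive walk over a dependency map. This is only an illustration: the map below is inferred from the description (Embedding and Multimodal both downstream of Chunking and upstream of PostProcess), the Chunking-on-DocReader edge is assumed from the pipeline order, and `cascadeCancel` is a hypothetical name.

```go
package main

import "sort"

// stageDependencies maps each stage to the stages it depends on.
// The edges are inferred from the commit text, not copied from the code.
var stageDependencies = map[string][]string{
	"docreader":   nil,
	"chunking":    {"docreader"},
	"embedding":   {"chunking"},
	"multimodal":  {"chunking"},
	"postprocess": {"embedding", "multimodal"},
}

// cascadeCancel returns every stage that can no longer run once the
// given stage has failed, i.e. its transitive dependents, sorted.
func cascadeCancel(failed string) []string {
	hit := map[string]bool{failed: true}
	for changed := true; changed; {
		changed = false
		for stage, deps := range stageDependencies {
			if hit[stage] {
				continue
			}
			for _, dep := range deps {
				if hit[dep] {
					hit[stage] = true
					changed = true
					break
				}
			}
		}
	}
	delete(hit, failed) // the failed stage itself is "failed", not "cancelled"
	out := make([]string, 0, len(hit))
	for stage := range hit {
		out = append(out, stage)
	}
	sort.Strings(out)
	return out
}
```

Under these assumed edges, a Chunking failure cancels embedding, multimodal and postprocess, while an Embedding failure cancels only postprocess.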
Adds a five-segment progress model for the document parsing pipeline so
the UI (PR③) can render a timeline showing where each document is
(DocReader → Chunking → Embedding → Multimodal → PostProcess) and
which stage failed with what error code.
- New table `knowledge_processing_stages` (migration 000052) with one
row per (knowledge_id, stage). UPSERT on Begin/Done/Fail bumps an
attempt counter so re-parses don't lose history.
- StageTracker service exposes Begin/Done/Fail/Skip; all calls are
best-effort and never break the pipeline if persistence fails.
- Stable error codes (DOCREADER_TIMEOUT / EMBEDDING_RATE_LIMIT /
VECTORSTORE_WRITE_FAILED / ...) the UI can map to localized
remediation hints.
- Tracker call sites added at the five meaningful failure points:
convert (DocReader), CreateChunks (Chunking), BatchIndex (Embedding),
enqueueImageMultimodalTasks (Multimodal start),
KnowledgePostProcess.Handle (Multimodal close + PostProcess).
- New endpoint `GET /api/v1/knowledge/:id/stages` returns the five
canonical stages — missing rows are synthesized as "pending" so
the timeline always renders five segments. Includes current_stage
and last_error block.
* feat(rbac): add multi-use share-link invitations for invite_only mode
When the auth.registration_mode toggle flips to invite_only, the public
/auth/register endpoint returns 403 — but until now there was no
channel for unregistered users to actually join a tenant. The existing
invitation table required a registered invitee, so new emails could
not enter the system at all.
This change introduces a share-link model:
- Owner generates a multi-use registration link with a per-tenant role
- Recipient opens the link, registers with their own email, and is
added to the tenant on the same request
- Token is stored in plaintext (not hashed) so the management UI can
re-display the URL on demand without a "copy now or revoke" trap.
Threat model is bounded by 7-day TTL, revocability, and the fact
that all the link grants is membership in one tenant.
- accepted_count tracks how many users joined through the link so the
Owner can tell whether a link is fresh or has already been spread
Per-user invitations (registered email -> in-app inbox accept) are
unchanged. Share-link rows use empty invitee_user_id as discriminator;
the partial unique index on (tenant_id, invitee_user_id) was relaxed
to skip empty values so multiple share links can coexist per tenant.
Frontend reuses Login.vue for /register?token=xxx instead of a
parallel page, and adds icon-only revoke / remove actions plus a
distinct "active" status tag for share-link rows.
Migration 000054 adds the token column, the accepted_count column,
and the relaxed pending unique index in a single step.
* feat(auth): update invitation token handling and add rate limiting
- Refactor the invitation token retrieval to use POST instead of GET, enhancing security by preventing the plaintext token from appearing in logs and browser history.
- Update the API endpoint from `/auth/invitations/:token` to `/auth/invitations/lookup` to reflect the new method.
- Introduce a rate limiter for unauthenticated share-link endpoints to mitigate brute-force attacks and abuse, ensuring a maximum of 30 requests per minute per IP.
- Adjust the registration flow to maintain user experience while securing the invitation process.
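A per-IP limit of 30 requests per minute could look roughly like the fixed-window counter below. The commit does not say which algorithm the real limiter uses, so the fixed-window choice and all names here (`ipLimiter`, `newShareLinkLimiter`, `allowAt`) are assumptions.

```go
package main

import (
	"sync"
	"time"
)

// window tracks one IP's request count within the current period.
type window struct {
	start int64 // unix seconds at which the window opened
	count int
}

// ipLimiter is a hypothetical fixed-window limiter keyed by client IP.
type ipLimiter struct {
	mu     sync.Mutex
	limit  int
	period int64 // window length in seconds
	hits   map[string]*window
}

// newShareLinkLimiter uses the 30-per-minute figure from the commit.
func newShareLinkLimiter() *ipLimiter {
	return &ipLimiter{
		limit:  30,
		period: int64(time.Minute / time.Second),
		hits:   map[string]*window{},
	}
}

// Allow checks the limit against the wall clock.
func (l *ipLimiter) Allow(ip string) bool {
	return l.allowAt(ip, time.Now().Unix())
}

// allowAt takes an explicit timestamp so the logic is easy to test.
func (l *ipLimiter) allowAt(ip string, nowUnix int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	w, ok := l.hits[ip]
	if !ok || nowUnix-w.start >= l.period {
		l.hits[ip] = &window{start: nowUnix, count: 1}
		return true
	}
	if w.count >= l.limit {
		return false
	}
	w.count++
	return true
}
```

A production limiter would also evict idle entries so the map does not grow without bound; that is omitted here for brevity.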
Two test surfaces, picked for cost/value:
internal/types/audit_log_test.go — extend the existing invariant suite
to include the system namespace:
- DotNamespaceConvention now covers system.setting_changed,
system.admin_promoted, system.admin_revoked.
- NoCollisionsAcrossNamespaces guards against duplicates across all
three new constants.
- New SystemNamespacePrefix test pins the shared "system." prefix —
this is the contract by which GET /system/admin/audit-log filters
out per-tenant rbac.* rows. Drift here would either leak per-tenant
events into the platform feed or hide platform events from
SystemAdmin.
- New SystemWireValues test pins the exact wire strings consumed by
the new frontend audit drawer, Langfuse exporters, and future SIEM
integrations; changes to these are a breaking change.
internal/handler/system_admin_audit_test.go — direct unit tests for
SystemHandler.emitAdminAudit, the helper that promote/revoke /
ApplyDefaultStorageQuotaToAllTenants all delegate to. Uses a
capturingAuditService stub (interface-embedded so any other method
call surfaces drift loudly) and a minimal SystemHandler with only
auditSvc wired — the helper deliberately doesn't touch other deps.
Coverage:
- NilServiceIsNoop: degraded-mode contract — a handler built without
an audit service must not panic on the audit hook.
- PopulatesCanonicalFields: every responsibility of the helper —
TenantID=0 (system scope), actor from ctx, role hard-pinned to
"system_admin", action passed through, outcome=success,
TargetType="user", TargetID/TargetUserID echoing user.ID, details
round-tripping through JSON.
- NilDetailsLeavesEmptyPayload: nil details map must NOT fabricate a
payload; the DB column defaults to '{}' and emitting an explicit
null would muddle "no extra context" filters.
- NilTargetStillEmitsRow: guards the nil-target defensive branch —
promote/revoke always supply one today, but the row still goes out
with empty target ids rather than crashing.
- IdempotentBranchSurvivesMarshal: pins the two boolean discriminator
flags (promote.idempotent, revoke.changed) so the audit reader can
distinguish a real grant from a probe and a real revoke from a noop.
Regression guard against accidentally swapping the payload to
stringly-typed shapes.
- LogErrorIsSwallowed: best-effort contract — a failing audit write
must NOT propagate, because the underlying privilege change has
already succeeded and bubbling the error would force the caller to
retry or roll back, both strictly worse than log-and-continue.
Mirrors the existing TestAuditLogHandler_* suite for the new
GET /system/admin/audit-log endpoint:
- AlwaysQueriesTenantZero: the defining contract — handler must call
AuditLogService.List with tenant_id=0 unconditionally, regardless of
any URL/header input. A regression here would leak per-tenant rbac.*
rows into the platform feed (or hide system.* rows from SystemAdmin).
- PassesQueryFiltersThrough: every advertised query key (after_id,
limit, action, outcome, actor) propagates exactly. Catches typos in
the param-key list.
- EmptyResultProducesZeroCursor: an empty service response must
collapse next_cursor to 0 so the drawer's infinite-scroll watcher
stops paginating.
- GarbageCursorAndLimitTolerated: malformed after_id / non-positive
limit fall back to defaults (matches ListTenantAuditLog) instead of
hard-failing, so stale URL params never blank-screen the drawer.
- ServiceErrorReturns500: List() errors surface as 500 via
errors.NewInternalServerError + ErrorHandler middleware, with a
non-empty body so the drawer alert has something to render.
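The cursor behaviour pinned by EmptyResultProducesZeroCursor and GarbageCursorAndLimitTolerated can be sketched as two small helpers. The default limit of 50 and the names `parseCursorParams` / `nextCursor` are hypothetical; the real handler's defaults are not stated in this message.

```go
package main

import "strconv"

// defaultLimit is a hypothetical page size used when the caller sends
// nothing usable.
const defaultLimit = 50

// parseCursorParams falls back to defaults instead of failing when
// after_id is malformed or limit is non-positive.
func parseCursorParams(afterIDRaw, limitRaw string) (afterID uint64, limit int) {
	limit = defaultLimit
	if v, err := strconv.ParseUint(afterIDRaw, 10, 64); err == nil {
		afterID = v
	}
	if v, err := strconv.Atoi(limitRaw); err == nil && v > 0 {
		limit = v
	}
	return afterID, limit
}

// nextCursor returns the id of the last row on the page, or 0 for an
// empty page so the client stops paginating.
func nextCursor(ids []uint64) uint64 {
	if len(ids) == 0 {
		return 0
	}
	return ids[len(ids)-1]
}
```

With this shape, a stale URL such as `?after_id=abc&limit=-5` behaves like a request with no parameters at all.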
The system_settings update, system admin promote/revoke, and
apply-default-storage-quota routes have been writing audit rows since
the prior commits, but with TenantID=0 (system-scope). The per-tenant
GET /tenants/:id/audit-log endpoint filters by tenant_id and never
returns them, so until now those rows existed only in the DB with no UI
surface. This commit closes the loop:
Backend:
- GET /api/v1/system/admin/audit-log: new SystemAdmin-gated endpoint
reusing AuditLogService.List with tenant_id=0. Same cursor-paged
shape and query params (after_id / limit / action / outcome / actor)
as the per-tenant feed, so the frontend reuses the same client logic.
- Wired through RegisterSystemAdminRoutes (mounted on the existing
adminRoutes group so it inherits the SystemAdmin() guard). The
handler dependency is optional: nil auditLogHandler skips the route,
mirroring RegisterTenantRoutes' /audit-log handling.
Frontend — new platform audit drawer in SystemSettings.vue:
- "审计日志" ("Audit Log") entry button in the section header opens
a side drawer (880px) listing system-scope events. Lazy-loaded on
first open; refresh is explicit via a button inside the drawer.
- Table columns: stacked date/time (so 50 events in the same minute
remain distinguishable), stacked actor/role, action tag, structured
target (subject key + diff line), outcome. The dead request column
(system actions don't go through middleware path capture) is dropped
in favour of richer target rendering.
- Per-action target formatters:
* system.setting_changed: subject = registry key, diff = `old → new`
(JSON-encoded, 80-char truncation). Reset shows `old → (空)`,
where (空) means "(empty)".
* tenant_storage_quota bulk apply: subject = "批量同步" ("bulk
sync"), diff = "applied to N tenants (X GB)".
* system.admin_promoted / revoked: subject = "name (email)", diff
annotates idempotent / noop branches so an audit reader can tell
a real grant from a probe.
- Click-to-expand row reveals the full audit context: actor UUID,
target_user_id / target_type / target_id, and raw details JSON in
monospaced scroll-capped block. No psql round-trip needed for
forensic spot checks.
- Sticky thead pinned to the scroll container so column labels survive
long scrolls. Cells vertical-aligned middle to keep single-line tag
cells visually balanced against multi-line target cells. No zebra
stripes — the stacked content already provides row separation, and
stripes on top read as noise.
Frontend — same polish back-ported to TenantMembers.vue audit drawer:
- Same drawer width, stacked time / actor cells, structured target +
diff layout, expandable raw-details row, sticky thead, vertical-
align middle, no stripes. Refresh button reformulated as a text
button with label (was an outlined square icon-only).
- request_path column kept (rbac.access_denied carries meaningful
paths) but empty values render as a placeholder dash so they don't
read as broken.
- Diff line now covers rbac.invitation_sent / invitation_revoked role
in addition to the existing role_changed / access_denied details.
API:
- frontend/src/api/system/index.ts: listSystemAuditLog() reuses the
AuditLog / ListAuditLogParams types from @/api/tenant/audit-log
(re-exported) so consumers don't need to cross-import.
i18n (zh-CN / en-US / ko-KR / ru-RU):
- system.globalSettings.audit.*: full drawer copy + per-action labels
(system.setting_changed / admin_promoted / admin_revoked) + target
diff templates + expanded-row labels.
- tenantMember.audit.expanded.*: expanded-row labels added so the
shared drawer treatment renders cleanly under tenant scope.
Replace the standalone /platform/system/* routes with a single
"系统设置" ("System Settings") section inside the canonical Settings
modal. The previous
SystemLayout.vue / SystemAdmins.vue surfaces are removed and their
functionality (system admin roster, global settings) is hosted directly
in SystemSettings.vue under the standard `.section-header` /
`.settings-group` skeleton. Legacy URLs redirect to the modal section
so external bookmarks don't 404.
Backend:
- SystemSettingService.Reset + repo.Delete: drop the DB override for a
key so the 3-tier resolver falls back to ENV / built-in default.
Idempotent (resetting a never-persisted key returns nil); emits an
audit row only on real deletions, invalidates the local cache, and
publishes to peers via the existing pubsub channel.
- TenantService.BulkSetStorageQuota: overwrite every tenant's
storage_quota in one statement. Powers POST /system/admin/tenants/
apply-default-storage-quota; bypasses the per-tenant whitelist on
PUT /tenants/:id which intentionally forbids storage_quota edits.
- AuditAction{SystemAdminPromoted,SystemAdminRevoked} constants and
emitAdminAudit() in SystemHandler — promote / revoke now leave an
audit trail with TenantID=0 and {target_email,target_username,
idempotent|changed} in details.
- SystemSetting.LastModifiedByName: derived per-request display label
(username, email fallback) so the UI shows "wizardchen" instead of
a UUID prefix without storing a denormalised column.
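The DB > ENV > default resolution order, and what Reset does to it, can be sketched with three maps. This is a simplified stand-in: the real service also handles caching, audit rows and pubsub invalidation, and the `resolver` type and the example key are hypothetical.

```go
package main

// resolver is a hypothetical in-memory model of the 3-tier lookup.
type resolver struct {
	db       map[string]string // persisted overrides
	env      map[string]string // values read from the environment
	defaults map[string]string // built-in defaults
}

// Resolve returns the effective value and the tier it came from.
func (r *resolver) Resolve(key string) (value, source string) {
	if v, ok := r.db[key]; ok {
		return v, "db"
	}
	if v, ok := r.env[key]; ok {
		return v, "env"
	}
	return r.defaults[key], "default"
}

// Reset drops the DB override so resolution falls back to ENV or the
// default. It reports whether a row was actually deleted, which is the
// only case where an audit row would be emitted.
func (r *resolver) Reset(key string) (deleted bool) {
	if _, ok := r.db[key]; !ok {
		return false // idempotent: nothing persisted, nothing to do
	}
	delete(r.db, key)
	return true
}
```

This also illustrates the "I set the env but it doesn't show up" case: while a DB override exists, the ENV value is shadowed until the key is reset.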
Frontend:
- SystemSettings.vue rewritten against the Settings modal skeleton with
auto-persisting controls (switch/select @change, input/inputnumber
@blur, tag-input + per-delta popconfirm for SSRF whitelist and admin
roster). auth.registration_mode change goes through an inline
popconfirm; cancel rolls back. Reset / bulk-apply also inline
popconfirms — no dialog modal for these per-row affordances.
- Priority hint panel surfaces the DB > ENV > default resolver order
so operators can reason about "I set the env but it doesn't show up".
- Router: /platform/system, /platform/system/settings, /platform/system/
admins are now compatibility redirects to /platform/settings?section=
system-global.
- Settings modal sized 900x700 → 1080x780 and content-wrapper 600 → 760
so the wider tables (members, system settings) breathe; <1100px
viewport still flexes to the screen.
- i18n: system.globalSettings.* keys for title / description / loading /
empty / badges / priority hint / per-key labels / reset / bulkApply /
admins (label, placeholder, save messages, popconfirm copy) across
zh-CN, en-US, ko-KR, ru-RU.
Misc:
- internal/utils/filesize.go doc: clarify MAX_FILE_SIZE_MB is a
deploy-time-only knob (nginx + docreader + frontend each cache the
env at startup); a SystemAdmin UI override would mislead operators
because nginx would still 413. Until all four layers can hot-reload
the limit in lockstep, this stays env-only.
- internal/utils/security.go: SSRF whitelist parser/runtime now drives
off SystemSettingService for live updates; ENV remains the fallback
for never-overridden deployments.
- Added RevokeSystemAdmin functionality to the user service and repository, ensuring atomic checks for self-revoke and last admin scenarios.
- Updated the system handler to utilize the new revocation method, improving error handling for various edge cases.
- Enhanced the bootstrap process to prevent unintended promotions when system admins already exist.
- Refactored related comments and documentation for clarity on the new behavior and safeguards in place.
- Added WEKNORA_BOOTSTRAP_SYSTEM_ADMIN_EMAIL environment variable to promote a specified user to system admin on startup.
- Introduced a new bootstrap process in `bootstrap.go` to handle the promotion logic.
- Updated `.env.example` to document the new environment variable and its behavior.
- Created new views for managing system administrators and system settings, including listing, promoting, and revoking admin privileges.
- Enhanced the frontend to reflect the new system admin features, including UI elements for admin management and settings configuration.
- Updated API interfaces to support system admin functionalities, ensuring proper data handling and user management.
Reuse enqueueKnowledgeListDelete inside DeleteKnowledge so that single-item
delete shares the same hardening as BatchDeleteKnowledge / ClearKnowledgeBase:
asynq retries, business-aware queue routing, and marking-as-deleting inside
the worker.
The endpoint now returns 200 once the delete task has been enqueued; the
response body carries the asynq task_id and the message is updated to
"Delete task submitted". Swagger annotations, generated docs and the Go
client SDK comment are updated to reflect the new asynchronous semantics.
Note: this is a behavior change. Callers that previously assumed the
knowledge was already gone on a 200 response should poll the task status
or accept eventual consistency, matching the existing BatchDeleteKnowledge
contract.
Backend
-------
ListKnowledgeBases now enriches each row with the resolved vector_store
metadata (name / source / engine_type / status) via a new
buildKBListResponse helper. Store views are batch-resolved once per
request through BatchResolveStoreView so an N-KB list costs one
vector-store service call rather than N — closing the N+1 limitation
flagged in #1372's known-limitations section. Cross-tenant shared KBs
continue to render via SharedStoreDisplay so the owning tenant's store
inventory cannot be correlated across rows; the underlying vector_store_id
UUID is stripped from those responses.
Resolver failures degrade gracefully: bound KBs render as unavailable
instead of breaking the list. Test coverage pins the env / bound /
shared distinction, the batch-call-count invariant, and the
graceful-failure path.
Frontend
--------
KB editor modal gains a new "Vector Store" section. Create mode shows a
dropdown that combines the system default (env store) and the tenant's
configured user stores, fetched once at mount via the existing
listVectorStores API. Edit mode shows the bound store read-only via a
new VectorStoreBadge component with an explicit immutability hint —
matching the backend's `<-:create` GORM tag and the service-layer
UpdateKnowledgeBaseRequest DTO that already omit the field.
KB list cards surface a small engine-type badge for own-tenant bound
KBs, and a warning badge when the bound store is unavailable. Env-bound
and shared KBs render no badge (visual noise control). KB detail header
shows the bound store via the same VectorStoreBadge component; shared
KBs fall through to the badge's internal "shared" branch with no name /
engine / id rendered.
The KB editor's create-time error handler translates the typed
ErrVectorStoreBindingInvalid (2200) and ErrVectorStoreUnavailable
(2201) into localized messages and jumps the user back to the
VectorStore section so they can pick a different store or fall back to
the system default.
The KB row type gains five optional fields (vector_store_id / name /
engine_type / source / status). i18n: 18 new keys added to en-US,
ko-KR, zh-CN; ru-RU receives English placeholders pending translation
(consistent with prior PRs in this locale).
Part of #993 (Phase 2: Per-KB VectorStore Binding).
Phase 2 roadmap item: PR 5 (KB binding UI + list-response enrichment).
Depends on #994, #1310, #1372, #1386 (all backend in the Phase 2 series).
Tenant RBAC headline release: 4-tier role matrix (Owner/Admin/
Contributor/Viewer), per-KB resource ownership, per-tenant audit
log, tenant member management, self-service workspaces.
Also: CLI v0.3/v0.4 GA, KB retrieval fan-out across vector stores,
AES-256-GCM credential encryption at rest, docreader gRPC TLS+Token,
Zhipu embedding, Huawei OBS, vLLM URL for MinerU, Apache Doris compat
modes, server-side user preferences, Go 1.26.0.
See CHANGELOG.md for the full list.
docs(rbac): wire RBAC screenshots into READMEs and RBAC guide
- README.md / README_CN.md / README_JA.md / README_KO.md: replace the
single member-management thumbnail under the v0.6.0 RBAC highlight
with a 2×2 showcase (member management, workspace switcher,
self-service workspace creation, pending invitations).
- docs/RBAC说明.md: add the member-management screenshot to the
existing 前端实际界面 ("actual frontend UI") showcase so the guide
is self-contained and no longer cross-references README for it.
feat(rbac-ui): link tenant member page to RBAC guide
Add an inline doc-link in the Tenant Members settings page that
opens docs/RBAC说明.md on GitHub in a new tab, complementing the
existing in-app role-matrix popover. New i18n key
tenantMember.learnRbacGuide covered for zh-CN / en-US / ko-KR /
ru-RU.
Multi-KB hybrid search now groups KBs by their bound VectorStore (partition
key (storeID, owner_tenant_id)), retrieves in parallel via errgroup with a
SetLimit(4) cap and a per-group timeout (MULTI_STORE_RETRIEVE_TIMEOUT_SEC,
default 30s), and merges results. When the collected results span more than
one engine type, an EngineAwareNormalizer rescales vector scores to [0, 1];
keyword (BM25) scores pass through to the existing RRF fusion. Single-group
calls take the fast path with zero fan-out overhead, preserving today's
behavior for deployments where every KB has vector_store_id = NULL.
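The grouping and bounded fan-out can be sketched with the standard library alone. Note the swap: the commit uses errgroup with SetLimit(4), while this sketch uses a channel semaphore and a WaitGroup to avoid an external dependency, and it omits the per-group timeout, error propagation and score normalization. All type and function names are hypothetical.

```go
package main

import "sync"

// kb is a minimal stand-in for a knowledge base record.
type kb struct {
	ID            string
	StoreID       string
	OwnerTenantID uint64
}

// groupKey is the partition key (storeID, owner_tenant_id).
type groupKey struct {
	StoreID       string
	OwnerTenantID uint64
}

// groupByStore partitions KBs by their bound store and owning tenant.
func groupByStore(kbs []kb) map[groupKey][]kb {
	groups := make(map[groupKey][]kb)
	for _, k := range kbs {
		key := groupKey{StoreID: k.StoreID, OwnerTenantID: k.OwnerTenantID}
		groups[key] = append(groups[key], k)
	}
	return groups
}

// fanOut runs search once per group with at most limit calls in flight
// and merges the results. A single group takes the direct path.
func fanOut(groups map[groupKey][]kb, limit int, search func(groupKey, []kb) []string) []string {
	if len(groups) == 1 {
		for key, members := range groups {
			return search(key, members) // fast path: no goroutines
		}
	}
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		merged []string
		sem    = make(chan struct{}, limit)
	)
	for key, members := range groups {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit searches are running
		go func(key groupKey, members []kb) {
			defer wg.Done()
			defer func() { <-sem }()
			res := search(key, members)
			mu.Lock()
			merged = append(merged, res...)
			mu.Unlock()
		}(key, members)
	}
	wg.Wait()
	return merged
}
```

Result order across groups is not deterministic here; in the described design the merged set is re-ranked afterwards, so ordering at this stage does not matter.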
Embedding-model consistency is now enforced explicitly via
ResolveEmbeddingModelKeys. Multi-KB searches across KBs whose resolved
model identities differ return BadRequest instead of silently producing
incomparable scores. Cross-tenant Organization-shared KBs are preserved by
partitioning on KB.TenantID so the factory's ownership lookup runs against
the source tenant. Foreign-tenant KB UUIDs injected via the request body
are rejected via kbShareService.HasTenantKBPermission (Plan 3 of #1303,
3-D capped) before any retrieval; rejected scopes surface as 404 to avoid
leaking foreign KB existence.
Service-layer typed AppErrors (ErrVectorStoreBindingInvalid 2200 /
ErrVectorStoreUnavailable 2201) are mapped from PR2 sentinel hierarchy and
preserved end-to-end: the iterative FAQ path returns them rather than
swallowing, and the HybridSearch handler routes typed AppErrors to the
client unchanged instead of downgrading to 500.
Part of #993 (Phase 2: Per-KB VectorStore Binding).
Phase 2 roadmap item: PR 4 (Multi-store fan-out search).
Depends on #994, #1310, #1372.
Sessions now record the input-bar state used for the most recent QA
request (agent, model, KB scope, web search). The chat UI hydrates
those settings on session reopen so users see the same configuration
they used last time, instead of the global default.
The state is stored in the existing sessions.agent_config JSONB column
to avoid a new migration. Frontend snapshots the user's global defaults
on session enter and restores them on session leave, so opening an old
session does not pollute new-chat defaults.
Refactor the tenant RBAC configuration to change the default value from false to true, enabling role enforcement by default. This change allows operators to opt into a logging-only rollout window by explicitly setting the configuration to false.
Updates include:
- Modifications to .env.example and docker-compose.yml to reflect the new default.
- Adjustments in rbac.md documentation to clarify the new default behavior and the opt-in process.
- Code changes across various files to utilize the new pointer-based configuration for EnableRBAC, ensuring nil safety and clearer intent.
Beyond the changed default, no functional changes were introduced; the remaining adjustments primarily improve clarity and maintainability of the RBAC feature.
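The pointer-based setting can be sketched as below: a nil pointer means "not configured" and resolves to the new default of true, while an explicit false still opts out. The struct and method names are assumptions; only the EnableRBAC field name comes from the message.

```go
package main

// TenantConfig is a hypothetical holder for the RBAC toggle. A *bool
// distinguishes "unset" (nil) from an explicit true or false.
type TenantConfig struct {
	EnableRBAC *bool
}

// RBACEnabled is nil-safe on both the receiver and the field, and
// defaults to enforcement when nothing was configured.
func (c *TenantConfig) RBACEnabled() bool {
	if c == nil || c.EnableRBAC == nil {
		return true
	}
	return *c.EnableRBAC
}
```

With a plain bool field, an operator's explicit `false` would be indistinguishable from a missing key, which is why the pointer form is needed once the default becomes true.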
DISABLE_REGISTRATION=true used to block /auth/register at the handler
layer only, leaving /auth/config still reporting self_serve. The
frontend therefore kept showing the Register entry even when the env
var was set — clicking it just hit the 403. Two gates, out of sync.
Wire DISABLE_REGISTRATION=true through applyAuthAndTenantDefaults so
it coerces auth.registration_mode to invite_only (env wins over YAML,
matching docs/rbac.md). The handler-side os.Getenv check is now
redundant with IsInviteOnly() and is removed, leaving a single
enforcement path.
Add config tests pinning down the env/YAML matrix, including the
explicit-self_serve override case that would otherwise be the easy
regression to ship.
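The env/YAML matrix can be sketched as below. `resolveRegistrationMode` is a hypothetical helper, not the body of applyAuthAndTenantDefaults, and the sketch assumes that an unset YAML value defaults to self_serve and that only the literal value "true" triggers the coercion.

```go
package main

import "fmt"

const (
	modeSelfServe  = "self_serve"
	modeInviteOnly = "invite_only"
)

// resolveRegistrationMode combines the YAML value of
// auth.registration_mode ("" when unset) with the raw value of the
// DISABLE_REGISTRATION env var.
func resolveRegistrationMode(yamlMode, disableRegistrationEnv string) string {
	if disableRegistrationEnv == "true" {
		// Env wins over YAML, even over an explicit self_serve.
		return modeInviteOnly
	}
	if yamlMode == "" {
		// Assumption: self_serve is the default when nothing is configured.
		return modeSelfServe
	}
	return yamlMode
}

func main() {
	cases := []struct{ yaml, env string }{
		{"", ""},
		{"self_serve", "true"},
		{"invite_only", ""},
	}
	for _, c := range cases {
		fmt.Printf("yaml=%q env=%q -> %s\n", c.yaml, c.env,
			resolveRegistrationMode(c.yaml, c.env))
	}
}
```

The second case is the explicit-self_serve override: YAML asks for self_serve, the env var still forces invite_only.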
Until now a fresh login (new device, expired refresh token, cleared
browser) always dropped the user into their home tenant, even if they
spend most of their time in a peer workspace. The session-local
X-Tenant-ID override in localStorage gave a "same browser sticky"
effect, but never crossed devices or new sessions.
This adds a small server-side preference,
`users.preferences.last_active_tenant_id`, persisted in the existing
jsonb column (no new migration), and threads it through:
* Backend
* `UserPreferences` gains `LastActiveTenantID *uint64` with sentinel
semantics (`*0` from the PATCH endpoint = clear preference).
* `resolveLoginTenantID` validates the stored id (tenant still
exists + active membership, or CanAccessAllTenants) and falls
back to home on any failure, best-effort clearing the stale
preference so subsequent logins don't pay for it again.
* `Login` and `LoginWithOIDC` resolve once and use the result for
both the JWT `tenant_id` claim and the returned `active_tenant`,
keeping the two in sync. `RefreshToken` rides through
`GenerateTokens` so refresh rotations also land the user back in
their preferred tenant instead of bouncing to home.
* `UpdateUserPreferences` learns to merge the new key.
* `PUT /auth/me/preferences` accepts the new field.
* Frontend
* `Login.vue` now writes the user's HOME tenant id into
`user.tenant_id` (matching the field's documented semantics) and
expresses any active-vs-home divergence via `setSelectedTenant`,
so `useHomeTenant` and the "current"/"home" badges stay correct
after the backend honours a remembered preference. `App.vue`'s
OIDC sync does the same reconciliation.
* `TenantSelector`, `UserMenu` and the post-tenant-create handlers
fire `persistLastActiveTenantPreference` after every successful
user-initiated switch (switching to home sends `0` to clear).
The call is raced against the existing reload-grace window so
most writes finish before the page tears down; lost writes are
recoverable on the next switch.
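The validation and fallback rules listed for `resolveLoginTenantID` can be sketched as follows. This is an illustration under assumptions, not the actual function: `tenantDirectory`, `resolveLoginTenant` and `u64` are hypothetical, the lookups are in-memory maps instead of repository calls, and the `stale` return value stands in for the best-effort clearing of the preference.

```go
package main

import "fmt"

// tenantDirectory is a hypothetical in-memory stand-in for the tenant
// and membership lookups a real resolver would perform.
type tenantDirectory struct {
	tenants     map[uint64]bool            // tenant id -> exists
	memberships map[string]map[uint64]bool // user id -> tenant id -> active
}

func (d tenantDirectory) isMember(userID string, tenantID uint64) bool {
	return d.memberships[userID][tenantID]
}

// resolveLoginTenant returns the tenant to log into. stale is true when
// a stored preference failed validation and should be cleared.
func resolveLoginTenant(d tenantDirectory, userID string, home uint64,
	preferred *uint64, canAccessAll bool) (tenantID uint64, stale bool) {
	if preferred == nil || *preferred == 0 {
		return home, false // no preference stored
	}
	if !d.tenants[*preferred] {
		return home, true // tenant no longer exists
	}
	if !canAccessAll && !d.isMember(userID, *preferred) {
		return home, true // membership was revoked
	}
	return *preferred, false
}

func u64(v uint64) *uint64 { return &v }

func main() {
	d := tenantDirectory{
		tenants:     map[uint64]bool{1: true, 2: true},
		memberships: map[string]map[uint64]bool{"alice": {1: true, 2: true}},
	}
	id, stale := resolveLoginTenant(d, "alice", 1, u64(2), false)
	fmt.Println(id, stale) // preferred tenant is still valid
	id, stale = resolveLoginTenant(d, "alice", 1, u64(9), false)
	fmt.Println(id, stale) // preferred tenant is gone: fall back to home
}
```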
No new UI. Users will simply notice that, after re-logging in on
another device, they land back in the workspace they were last
using rather than always in their home tenant.
Note: `make docs` is unrelated-broken on `main` (audit_log.go
references `errors.AppError` which swag can't resolve), so the
Swagger artifacts under docs/ are intentionally not regenerated
in this PR. The handler code is the source of truth.
Modified the AgentQA function to conditionally pass the title disable flag to the executeQA method, so QA execution follows what the request asked for.
Updated the tenant switch logic to consistently redirect users to the platform's knowledge base list after a tenant switch, simplifying the navigation experience. Removed the previous handling of tenant-scoped routes, ensuring a more predictable behavior during tenant transitions. This change enhances user experience by avoiding potential empty states on reload.
Refs: #1303
Updated the memory management logic to utilize a server-side per-user preference for enabling memory. The `enable_memory` field is now conditionally set based on user preferences, allowing for better control in both normal and embedded contexts. Adjusted relevant API handlers and request structures to support this change, ensuring backward compatibility with existing clients.
Refs: #1303
Added functionality for managing user preferences, including the ability to update preferences via a new API endpoint. The preferences are now stored server-side, allowing for synchronization across devices. Updated relevant components to handle user preferences, including the UserMenu and settings views. Enhanced internationalization support for error messages related to preference updates.
Introduced a new optional field, creator_name, in both CustomAgent and KnowledgeBase types to allow the front end to display the creator's name. This enhancement enables better differentiation between resources created by the current user and those created by other members of the same tenant. Updated relevant API handlers to populate this field during list operations, ensuring accurate representation in the UI. Additionally, modified the ResourceOriginBadge component to utilize the creator_name for improved context in resource listings.
Refs: #1303
Added functionality for managing pending invitations, including a polling mechanism to update the invitation count in real-time. Introduced a new dialog component for users to view and respond to their invitations without navigating away from their current context. Updated API endpoints for fetching and managing invitations, and enhanced the user interface with relevant internationalization support.
Refs: #1303
Enhanced the GetOrganization and ListMembers methods to enforce tenant-based access control. Users can now only access organization details if their tenant is a member or if the organization is marked as searchable. This change prevents unauthorized enumeration of organizations and ensures sensitive member information is only accessible to authorized tenants.
Refs: #1303
Introduced a new API client for managing user favorites, allowing users to list, add, and remove starred resources. Enhanced the sidebar to display favorites and recents, improving user navigation. Updated internationalization files to include relevant labels and tooltips for the new features across multiple languages.
Refs: #1303
Added a new configuration option to limit the number of tenants a non-superuser can create via self-service. Introduced a new error type for handling cases where users exceed this limit, returning a 429 status code. Updated the tenant creation and update handlers to enforce this limit and provide appropriate feedback to users. Additionally, refactored the tenant update request structure to ensure only mutable fields are allowed.
Refs: #1303
Added a new API endpoint for creating tenants, allowing any logged-in user to create their own workspace. The backend automatically assigns the creator as the Owner of the new tenant. A new CreateTenantDialog component was introduced in the frontend, enabling users to input the workspace name and an optional description. Internationalization files were updated to include relevant messages for this feature across multiple languages.
Refs: #1303
Updated the organization membership model to operate at the tenant level, aligning with Plan 3. This change introduces new interfaces and modifies existing ones to accommodate tenant-centric operations, including the `InviteMemberRequest` and `TenantInviteCandidate`. The search functionality for inviting members has been updated to reflect this shift, allowing for tenant-based searches instead of user-based. Additionally, internationalization files have been updated to ensure consistent messaging across languages regarding tenant membership and roles.
Refs: #1303
Plan 3 (#1303) lifted Org membership from per-user to per-tenant, and
40fdb978 pinned the org's owner tenant in DB as organizations.owner_tenant_id.
The OrganizationSettingsModal members list, however, still decided
"which row is the owner" by comparing member.user_id against
orgInfo.owner_id. After Plan 3 the members list is keyed on tenant_id,
so the user-id check is structurally wrong: the representative user of
the owner tenant happens to be the owner today, but if that user is
moved to another tenant — or a different user of the owner tenant
becomes the representative — the modal would either lose the "(owner)"
tag, or re-tag a non-owner row, and would offer the role-select /
remove-button affordances on what is in fact the owner tenant row.
This change:
- Adds OwnerTenantID to OrganizationResponse and populates it in
toOrgResponse from the persisted column. IsOwner now compares the
caller's current tenant against owner_tenant_id (with a user-id
fallback for legacy rows where owner_tenant_id == 0).
- Adds owner_tenant_id to the frontend Organization type and a small
isOwnerMember(member) helper in OrganizationSettingsModal, which
tenant-matches by default and falls back to user-id only on
pre-000046 rows. The role <select>, role tag suffix, and the
delete button on member rows now all use the helper.
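The owner check with its legacy fallback can be sketched as below. The `Organization` struct and `isOwner` function are simplified, hypothetical stand-ins; the field types are assumptions.

```go
package main

import "fmt"

// Organization is a simplified stand-in carrying only the two owner fields.
type Organization struct {
	OwnerID       string // legacy per-user owner
	OwnerTenantID uint64 // 0 on legacy rows that predate the column
}

// isOwner matches on tenant by default and falls back to the user id
// only for legacy rows where OwnerTenantID is unset.
func isOwner(org Organization, callerUserID string, callerTenantID uint64) bool {
	if org.OwnerTenantID != 0 {
		return org.OwnerTenantID == callerTenantID
	}
	return org.OwnerID == callerUserID
}

func main() {
	modern := Organization{OwnerID: "u1", OwnerTenantID: 10}
	legacy := Organization{OwnerID: "u1"}
	fmt.Println(isOwner(modern, "u2", 10)) // another user of the owner tenant
	fmt.Println(isOwner(modern, "u1", 20)) // original user, now in another tenant
	fmt.Println(isOwner(legacy, "u1", 20)) // legacy row: user-id fallback
}
```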
The "is-me" CSS class and the me-tag in the member name still match
on user_id — that one really is a per-user marker ("this row's user is
me, the signed-in user").
Follow-up to the FAQ + Tag refactor in this same PR. Three additional
handler areas now run through the route-level g.KBAccess* guard:
- chunk.go: full refactor. The helper effectiveCtxForKnowledge and
validateAndGetChunk's access check are gone; the handler's
chunk-fetch + ownership-mismatch check stay as defence-in-depth
(a same-tenant attacker can't pass mismatched knowledge_id +
chunk_id).
- knowledge.go and knowledgebase.go: routes now run the guard BEFORE
the handler. The in-handler validateKnowledgeBaseAccess /
resolveKnowledgeAndValidateKBAccess helpers stay for now because
they branch on the agent_id query param (specific-agent check vs.
any-agent fallback); that handler-specific branch can collapse into
the guard in a follow-up. The duplicated DB roundtrip is negligible
compared to the handler's main work.
New middleware exports:
- KBIDFromChunkIDParam: walks chunk_id → KnowledgeBaseID via the
chunk's denormalised column.
- g.KBAccessReadFromChunkIDParam / g.KBAccessWriteFromChunkIDParam:
convenience methods on rbacGuards.
Route changes are purely additive (existing role / ownership guards
stay in place; the new KB-access guard runs after them and rewrites
the request's tenant context to the source tenant for shared KBs).
No new dependencies on the kbShareService / agentShareService at the
handler level for chunk.go — those move to the route layer.
Tests pass; behaviour unchanged.
Five handler files used to carry near-identical 30-line helpers
(effectiveCtxForKB / validateAndGetKnowledgeBase) that did the same
own-or-shared resolution before every KB-scoped operation:
1. KB belongs to caller's tenant -> grant own access
2. Org-shared KB -> grant min(share, role) cap
3. Shared agent carries the KB -> grant Viewer (read-only)
A bug found in any one of them had to be fixed in five places, and
several already drifted apart (different log levels, slightly
different error messages, one path missing the share-cap check).
Move the resolution to a route-level gin.HandlerFunc:
middleware.RequireKBAccess, exposed on rbacGuards as
g.KBAccessRead("id") / g.KBAccessWrite("id"). The guard runs after
the role/ownership checks, then on success:
- stashes the resolved (KB + effective tenant id + permission)
on c.Keys under middleware.KBAccessContextKey, and
- rewrites c.Request.Context() to carry the effective tenant id
so handlers downstream just read tenant the way they always did
(types.MustTenantIDFromContext) without a per-handler helper.
Refactor FAQ and Tag handlers to drop their helper methods; their
service constructors lose the kbShareService / agentShareService
dependencies (now lifted to the route layer). Routes:
faq.GET("/entries", g.Viewer(), g.KBAccessRead("id"), ...)
faq.POST("/entries", g.OwnedKBOrAdmin(), g.KBAccessWrite("id"), ...)
kbTags.GET("", g.Viewer(), g.KBAccessRead("id"), ...)
kbTags.POST("", g.OwnedKBOrAdmin(), g.KBAccessWrite("id"), ...)
The middleware also exports KBIDFromKnowledgeIDParam so chunk routes
(URL :knowledge_id) can adopt the same pattern in a follow-up PR
without reshaping the helper.
Five new tests cover own-KB, not-found, shared-with-sufficient-perm,
shared-but-below-min, and missing-tenant-context paths. Handler
tests still pass (no behaviour change).
Before this change, the `organization_members` table keyed on user_id —
joining an org as a Viewer in tenant T meant only that one user got
visibility on shared KBs/agents, while the rest of T (including its
Owner) saw nothing. The mismatch surfaced as confusing UX bugs:
"my colleague shared this with us, why can't I see it?"
This change replaces `organization_members` with
`organization_tenant_members`. Membership and Org-RBAC role now key on
(org_id, tenant_id); `representative_user_id` stays as a display-only
label of the user who first brought the tenant in.
Permission resolution gains a three-way cap so tenant-RBAC stays honest
under cross-tenant sharing:
effective = min(share.Permission, tenant_org_role, tenant_role_cap)
where tenant_role_cap pins tenant Viewers to OrgRoleViewer regardless
of the org-level grant. A Viewer in their own tenant cannot edit a
shared KB even if the org granted Editor — preserves the
"Viewer = read-only everywhere" promise.
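A minimal sketch of the cap, assuming roles are ordered integers. Only `OrgRoleViewer` is named in this change; `OrgRoleEditor`, `OrgRoleAdmin`, `minRole`, `tenantRoleCap` and `effectivePermission` are hypothetical names for illustration.

```go
package main

import "fmt"

// OrgRole is ordered so that a higher value grants more.
type OrgRole int

const (
	OrgRoleViewer OrgRole = iota + 1
	OrgRoleEditor
	OrgRoleAdmin // assumed top of the scale for this sketch
)

// minRole returns the lowest of the given roles.
func minRole(first OrgRole, rest ...OrgRole) OrgRole {
	m := first
	for _, r := range rest {
		if r < m {
			m = r
		}
	}
	return m
}

// tenantRoleCap pins tenant Viewers to OrgRoleViewer; every other
// tenant role imposes no extra cap.
func tenantRoleCap(isTenantViewer bool) OrgRole {
	if isTenantViewer {
		return OrgRoleViewer
	}
	return OrgRoleAdmin
}

// effectivePermission = min(share.Permission, tenant_org_role, tenant_role_cap).
func effectivePermission(share, tenantOrgRole OrgRole, isTenantViewer bool) OrgRole {
	return minRole(share, tenantOrgRole, tenantRoleCap(isTenantViewer))
}

func main() {
	// The org granted Editor, but the caller is a Viewer in their own tenant.
	fmt.Println(effectivePermission(OrgRoleEditor, OrgRoleEditor, true) == OrgRoleViewer)
}
```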
Migration 000045 backfills `organization_tenant_members` from the old
table by collapsing each (org, tenant) group to its max role, then
renames the old table to `organization_members_pre_plan3` for safe
rollback. SQLite mirror (Lite mode) is fresh-bootstrap: the init
schema replaces the old table directly.
Routes: member-management URLs change from `:user_id` to `:tenant_id`.
Org create / join / leave now require Admin+ in the caller's tenant —
joining an Org affects everyone in T, so the decision can't sit with
a Viewer.
Frontend: OrganizationSettingsModal + store update to address members
by tenant_id.
AgentQA used to trust request.AgentEnabled even when
resolveAgent() returned nil (no agent_id, or unresolvable id) — the
request reached the async goroutine, AgentQA service errored with
'custom agent configuration is required', and the user saw a broken
SSE stream with no actionable error.
Most likely root cause is a stale frontend settings store where
isAgentEnabled stayed true after selectedAgentId got blanked
(typically a cross-tenant switch where the previously selected
agent isn't visible in the new tenant). The frontend fix lives in
selectAgent() syncing both flags, but the handler shouldn't trust
that contract.
Add an early 400 with a clear message before executeQA dispatches.
Logs the actual request.AgentID (sanitized) so future stuck-state
reports are diagnosable from a single log line.
DELETE /chunks/by-id/:id/questions used to be gated at flat
Contributor+ as a deliberate carve-out: the URL only carries the
chunk's own id, so the chunk -> knowledge -> KB ownership chain
wasn't reachable from request params alone. The carve-out had two
problems:
- Inconsistent with every other chunk mutation. PUT
/chunks/:knowledge_id/:id is OwnedKBOrAdmin (KB creator OR
Admin+); a Contributor with no KB ownership can't update a
chunk but COULD delete a generated question on the same chunk.
- User-visible: a Contributor in tenant A who never created any
KB still saw the 'delete question' button work, while every
other chunk control 403'd.
Wire the missing hop:
KBCreatorLookupFromChunkIDParam (handler) reads :id, fetches the
chunk via the existing GetChunkByIDOnly (unscoped repo call),
re-checks chunk.TenantID against the caller's tenant context to
block cross-tenant chunk-id probing, then defers to the existing
resolveKBCreatorByKnowledgeID helper for the rest of the chain.
OwnedChunkKBOrAdminFromChunkID (router) wraps it in
RequireOwnershipOrRole(Admin, ...) — same matrix as every other
chunk/knowledge/wiki mutation guard.
chunks.DELETE switches from g.Contributor() to the new guard.
4 unit tests cover happy path, chunk-not-found, cross-tenant
chunk-id probing, and KB-not-found upstream — same edge-case shape
as the existing KBCreatorLookup_* tests so future refactors get
caught loudly.
The Tenant.ConversationConfig field was marked deprecated when
CustomAgent (builtin-quick-answer) replaced the per-tenant defaults
path, but the JSONB column, GET/PUT KV handlers, write helpers and
read fall-throughs all stayed wired. This had two concrete failure
modes:
- Input-field.vue silently PUT-ed per-user model picks back into
/tenants/kv/conversation-config, which requires Admin+ — Viewers
and Contributors got a 403 every time they switched chat models.
- The same write also overwrote the tenant-shared default for every
other user, turning a per-user UX into a per-tenant footgun.
Remove the field, the struct, the validate helper, the PUT case and
the two dead Summary helpers in session/helpers.go. KnowledgeSearch
loses its tenant-conversation fallback (one chunk-tool reader); the
global config + hardcoded defaults still cover it. session_agent_qa
already required agents to declare their own rerank_model_id, so no
behaviour change there.
Frontend follows:
- Input-field.vue: per-user 'last selected chat model' moves to
localStorage, no API call.
- AgentEditorModal: defaults for builtin-quick-answer creation now
come from prompt-templates (system/context/rewrite/fallback) and
retrieval-config (top_k / threshold), the two endpoints that
actually own those settings now.
- AgentSettings.vue: orphan settings page with no router/import
referencing it; deleted.
- api/system: getConversationConfig / updateConversationConfig and
the ConversationConfig interface go away.
Net: 137 insertions, 2761 deletions.
Sweep over the route table found four classes of mutating endpoints
that had no role guard:
- POST /tenants — creating a brand-new tenant. Now g.CrossTenant():
only org-level superusers may create tenants, matching /tenants/all
and /tenants/search.
- POST /initialization/initialize/:kbId, PUT /initialization/config/:kbId
— these change a KB's embedding/parser/storage configuration, which
is at least as sensitive as PUT /knowledge-bases/:id. New
KBCreatorLookupFromKbIDParam (sister of KBCreatorLookup but reading
:kbId) lets us reuse the OwnedKBOrAdmin matrix; routes now use the
new g.OwnedKBOrAdminFromKbIDParam() guard.
- POST /initialization/{ollama,remote,embedding,rerank,asr,multimodal,
extract}/* — system-level model probes / downloads / extraction
pipelines. These trigger LLM calls and outbound network fanout
with tenant credentials; gated to Admin+. The corresponding GETs
(status / list / progress) drop to Viewer+ since they're read-only
observability.
- POST /system/{parser-engines,docreader,storage-engine}/* — same
pattern: GETs that report state are Viewer+, the *-check /
/reconnect actions are Admin+.
POST /evaluation/ runs the eval harness which fans out LLM calls
across the tenant — gated to Admin+.
Operational improvement: log the resolved tenant RBAC state at
LoadConfig completion. air's hot-reload only rebuilds the binary on
Go-source changes; it does NOT re-source .env, so flipping
WEKNORA_TENANT_ENABLE_RBAC=true while the dev loop is already
running has no effect until the parent dev-app process restarts.
This was a real diagnostic loop in PR review — printing the value at
startup makes the "I edited .env but the gates still aren't firing"
trap obvious from the first console line.
The Printf includes both the resolved cfg.Tenant.EnableRBAC value
and the raw env var, so a mismatch between "what the .env file
says" and "what this process actually inherited" is visible at a glance.
When an Owner of tenant A invited user B as a Viewer, B could log in,
switch the active tenant from B's home to A via the user menu, and
still see every write button — Create KB, Upload Document, Edit/Delete
agent — because the role-gate plumbing on the client was wired to the
wrong tenant. The server's RBAC enforcement (PR 2 / PR 5) still blocked
the actual mutations with 403, so no data leaked, but the UI presented
authority the user did not have, which is a confusing and embarrassing
defect.
Five distinct root causes, all on the client / server boundary:
1. `currentTenantRole` (frontend/src/stores/auth.ts) read
`tenant.value.id` — the user's *home* tenant — instead of the active
tenant id. After switching to A, the helper still returned B's home
role (Owner), so every `hasRole(...)` gate in the codebase silently
short-circuited. Fix: read `selectedTenantId` first, fall back to
`tenant.id`.
2. `setMemberships` was only called once, on /auth/login. Hard reloads
(which the tenant switcher triggers) re-ran `getCurrentUser()` via
`App.vue` and `router/index.ts`, but those callers discarded the
`memberships` field of the response, so the auth store kept the
login-time snapshot forever. Role changes between sessions stayed
invisible. Fix: extract memberships in both /auth/me consumers and
feed them into `setMemberships`.
3. `KnowledgeBase.vue`'s `isOwner` compared `kb.tenant_id` to the
active tenant id, so any KB inside the current tenant was treated
as "owned by the caller" — bypassing every role gate below. Fix:
compare against `kb.creator_id` (the field PR 5 introduced) and
require an actual creator match. `canEdit` / `canManage` now layer
`hasRole('admin')` on top so a non-creator with sufficient role can
still manage.
4. `KnowledgeBaseList.vue` and `AgentList.vue` rendered the per-card
more-menu (Settings, Delete, Edit) unconditionally. With the
ownership column from PR 5 this is straightforward to gate on the
client: Settings/Delete now require creator-match or `hasRole('admin')`
per the server's `OwnedKBOrAdmin` / `OwnedAgentOrAdmin` matrix.
5. The axios interceptor (frontend/src/utils/request.ts) had a
"selectedTenantId === defaultTenantId → don't attach X-Tenant-ID"
short-circuit. Any code that wrote the active tenant into
`weknora_tenant` (UserMenu's loadUserInfo, OIDC sync, router
hydrate) made the two values equal, after which the header silently
stopped attaching and subsequent navigation requests landed on the
home tenant. Fix: drop the short-circuit; always attach when
selectedTenantId is set.
Server-side, /auth/me read `user.TenantID` (the immutable home tenant
id stamped at signup) when fetching the tenant for the response, so
the `tenant` field always pointed at home even after the auth
middleware had already resolved an active tenant from X-Tenant-ID.
The frontend then re-keyed `authStore.tenant` to the home tenant on
every refresh, undoing the switch. Fix: read the active tenant id from
context (`types.TenantIDFromContext(ctx)`) and only fall back to
user.TenantID when context is unset.
The server-side enforcement was correct throughout — every fix here is
purely about not lying to the user with the rendered UI.
PR 1-5 of the multi-tenant RBAC series enforce the role matrix; what
they don't do is keep a durable record of who did what. Today the
RBAC middleware logs "[rbac] role insufficient ..." to stderr -
perishable, not searchable. Operators cannot answer "who removed
Alice from tenant T-foo on Tuesday" or "did we 403 a probing user
200 times last hour" after the fact.
This PR adds a generic per-tenant audit log that:
- Captures mutating RBAC actions (member add/remove/role-change/leave)
durably from the tenant_member service after the repo write succeeds.
- Captures enforcement rejections (RequireRole / RequireOwnershipOrRole
denies under EnableRBAC=true) via a gin-context-injected hook.
- Sliding-window dedup (1 minute) on (tenant, actor, action, path)
for denied events so a probing client cannot flood audit_logs.
- Default-on - no extra env flag. Schema is generic enough for future
KB / agent / datasource events to reuse via new action namespaces
without another migration.
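The dedup rule for denied events can be sketched as below. This is an in-memory illustration only: the description above implies a lookup against audit_logs via the (tenant_id, action) index, so `Deduper`, `denyKey`, `ShouldWrite`, `at` and the "rbac.denied" action name are all hypothetical. The window here is measured from the last event actually written.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// denyKey is the dedup identity: (tenant, actor, action, path).
type denyKey struct {
	TenantID uint64
	ActorID  string
	Action   string
	Path     string
}

// Deduper suppresses repeats of the same key inside a time window.
type Deduper struct {
	mu     sync.Mutex
	window time.Duration
	last   map[denyKey]time.Time // time of the last written event per key
}

func NewDeduper(window time.Duration) *Deduper {
	return &Deduper{window: window, last: make(map[denyKey]time.Time)}
}

// ShouldWrite reports whether an event seen at now should be persisted,
// and records it as the latest written event if so.
func (d *Deduper) ShouldWrite(k denyKey, now time.Time) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if t, ok := d.last[k]; ok && now.Sub(t) < d.window {
		return false
	}
	d.last[k] = now
	return true
}

// at builds a timestamp from a second offset, to keep the demo readable.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	d := NewDeduper(time.Minute)
	k := denyKey{TenantID: 7, ActorID: "u1", Action: "rbac.denied", Path: "/tenants/7/members"}
	fmt.Println(d.ShouldWrite(k, at(0)))  // first denial is written
	fmt.Println(d.ShouldWrite(k, at(30))) // repeat inside the window is dropped
	fmt.Println(d.ShouldWrite(k, at(61))) // written again after expiry
}
```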
Schema (migration 000044) mirrors wiki_log_entries' shape: a single
audit_logs table with action, target_type/id, request path/method,
outcome, and a JSONB details column. Three indexes: (tenant_id, id
DESC) for the per-tenant cursor feed, actor_user_id for "what did
Alice do", (tenant_id, action) for both the action filter and the
dedup lookup. SQLite mirror added for Lite mode.
Wiring is nil-safe: the audit field on tenantMemberService and the
gin-context lookup in rbac.go both tolerate a nil service so existing
tests construct without a stub and dormant deployments degrade
gracefully if the provider is misconfigured.
The query API is GET /tenants/:id/audit-log, gated by Admin+ on top
of the existing PathTenantMatch (denial histories should not surface
to ordinary members). Cursor-paginated by descending id (sidesteps
the duplicate-timestamp tie-breaking that BIGSERIAL trivially solves).
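Descending-id cursor pagination can be sketched as below, over an in-memory slice instead of a SQL query. `AuditLog` and `pageDesc` are hypothetical, and the envelope rule (next cursor is the id of the last returned row, or 0 when nothing remains) is an assumption consistent with "an empty page returns next_cursor=0", not a confirmed detail of the handler.

```go
package main

import "fmt"

// AuditLog is a trimmed-down row; ID is unique and monotonically increasing.
type AuditLog struct {
	ID     int64
	Action string
}

// pageDesc expects rows sorted by ID descending. It returns up to limit
// rows with ID < cursor (cursor 0 means "start from the newest") and the
// cursor for the next page, or 0 when no further rows exist.
func pageDesc(rows []AuditLog, cursor int64, limit int) ([]AuditLog, int64) {
	if limit <= 0 {
		return nil, 0
	}
	page := make([]AuditLog, 0, limit)
	for _, r := range rows {
		if cursor != 0 && r.ID >= cursor {
			continue // already served on an earlier page
		}
		if len(page) == limit {
			// At least one more row remains: hand back a cursor.
			return page, page[len(page)-1].ID
		}
		page = append(page, r)
	}
	return page, 0
}

func main() {
	rows := []AuditLog{{5, "e"}, {4, "d"}, {3, "c"}, {2, "b"}, {1, "a"}}
	var cursor int64
	for {
		page, next := pageDesc(rows, cursor, 2)
		fmt.Println(len(page), next)
		if next == 0 {
			break
		}
		cursor = next
	}
}
```

Because ids are unique, `ID < cursor` never skips or repeats a row, which is the tie-breaking problem a timestamp cursor would have to solve separately.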
Frontend: TenantMembers.vue gets a t-tabs wrapper with Members
(existing) and Audit log (new) tabs; the audit tab is gated to
Admin+ to mirror the server, lazy-loads on first open, paginates via
the next_cursor pattern, and renders coloured chips per
action/outcome so an operator can scan for anomalies. i18n keys
added in all four locales.
Tests: 14 new tests across service (6), middleware (4), and handler
(4). Service tests pin the dedup window behaviour (writes again
after expiry, dedup is per-actor-per-path, degrades on lookup
error). Middleware tests pin both reject sites' audit hook and the
"dormant mode does not fire" + "nil audit does not panic"
invariants. Handler tests pin the cursor envelope and that an empty
page returns next_cursor=0.
Out of scope: per-action ACL on the audit log, retention/TTL job,
KB/agent/datasource event wiring (schema supports it - just no
emitters yet), async writes, frontend filter UI.