Adds a five-segment progress model for the document parsing pipeline so
the UI (PR③) can render a timeline showing where each document is
(DocReader → Chunking → Embedding → Multimodal → PostProcess) and
which stage failed with what error code.
- New table `knowledge_processing_stages` (migration 000052) with one
row per (knowledge_id, stage). UPSERT on Begin/Done/Fail bumps an
attempt counter so re-parses don't lose history.
- StageTracker service exposes Begin/Done/Fail/Skip; all calls are
best-effort and never break the pipeline if persistence fails.
- Stable error codes (DOCREADER_TIMEOUT / EMBEDDING_RATE_LIMIT /
VECTORSTORE_WRITE_FAILED / ...) the UI can map to localized
remediation hints.
- Tracker call sites added at the four meaningful failure points:
convert (DocReader), CreateChunks (Chunking), BatchIndex (Embedding),
enqueueImageMultimodalTasks (Multimodal start),
KnowledgePostProcess.Handle (Multimodal close + PostProcess).
- New endpoint `GET /api/v1/knowledge/:id/stages` returns the five
canonical stages; missing rows are synthesized as "pending" so
the timeline always renders five segments. The response also
includes current_stage and a last_error block.
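A minimal sketch of the best-effort tracker, using an in-memory store that emulates the (knowledge_id, stage) UPSERT. Everything except the stage names, the method names Begin/Done/Fail/Skip and the three error codes is hypothetical: `StageRow`, `StageStore`, `memStore` and the status strings are illustrative. It also assumes that only Begin bumps the attempt counter, while Done/Fail/Skip update the current attempt in place.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
	"time"
)

// Stage is one of the five canonical pipeline segments.
type Stage string

const (
	StageDocReader   Stage = "DocReader"
	StageChunking    Stage = "Chunking"
	StageEmbedding   Stage = "Embedding"
	StageMultimodal  Stage = "Multimodal"
	StagePostProcess Stage = "PostProcess"
)

// Error codes named in the description; the full set is elided there.
const (
	CodeDocReaderTimeout       = "DOCREADER_TIMEOUT"
	CodeEmbeddingRateLimit     = "EMBEDDING_RATE_LIMIT"
	CodeVectorStoreWriteFailed = "VECTORSTORE_WRITE_FAILED"
)

// StageRow is a hypothetical shape for one (knowledge_id, stage) row.
type StageRow struct {
	KnowledgeID string
	Stage       Stage
	Status      string // hypothetical: "running", "done", "failed" or "skipped"
	Attempt     int
	ErrorCode   string
	UpdatedAt   time.Time
}

// StageStore abstracts the UPSERT so the tracker can run without a database.
type StageStore interface {
	Upsert(row StageRow, bumpAttempt bool) error
}

// memStore emulates INSERT ... ON CONFLICT (knowledge_id, stage) DO UPDATE.
type memStore struct {
	mu   sync.Mutex
	rows map[string]StageRow
}

func newMemStore() *memStore { return &memStore{rows: map[string]StageRow{}} }

func key(id string, s Stage) string { return id + "/" + string(s) }

func (m *memStore) Upsert(row StageRow, bumpAttempt bool) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	prev, exists := m.rows[key(row.KnowledgeID, row.Stage)]
	row.Attempt = prev.Attempt
	if !exists || bumpAttempt {
		row.Attempt++ // a new row or a fresh Begin starts another attempt
	}
	m.rows[key(row.KnowledgeID, row.Stage)] = row
	return nil
}

// StageTracker never returns errors: persistence failures are only logged.
type StageTracker struct{ store StageStore }

func (t *StageTracker) write(row StageRow, bump bool) {
	row.UpdatedAt = time.Now()
	if err := t.store.Upsert(row, bump); err != nil {
		log.Printf("stage tracker: %s: %v (ignored)", key(row.KnowledgeID, row.Stage), err)
	}
}

func (t *StageTracker) Begin(id string, s Stage) {
	t.write(StageRow{KnowledgeID: id, Stage: s, Status: "running"}, true)
}
func (t *StageTracker) Done(id string, s Stage) {
	t.write(StageRow{KnowledgeID: id, Stage: s, Status: "done"}, false)
}
func (t *StageTracker) Fail(id string, s Stage, code string) {
	t.write(StageRow{KnowledgeID: id, Stage: s, Status: "failed", ErrorCode: code}, false)
}
func (t *StageTracker) Skip(id string, s Stage) {
	t.write(StageRow{KnowledgeID: id, Stage: s, Status: "skipped"}, false)
}

// brokenStore simulates a database outage.
type brokenStore struct{}

func (brokenStore) Upsert(StageRow, bool) error { return errors.New("db unavailable") }

func main() {
	store := newMemStore()
	tr := &StageTracker{store: store}
	tr.Begin("doc-1", StageEmbedding)
	tr.Fail("doc-1", StageEmbedding, CodeEmbeddingRateLimit)
	tr.Begin("doc-1", StageEmbedding) // re-parse
	tr.Done("doc-1", StageEmbedding)
	row := store.rows[key("doc-1", StageEmbedding)]
	fmt.Println(row.Status, row.Attempt) // done 2

	// With a failing store the call only logs; the pipeline carries on.
	(&StageTracker{store: brokenStore{}}).Begin("doc-2", StageDocReader)
}
```

Keeping the tracker's methods void is what makes "never break the pipeline" hold by construction: call sites have no error to propagate.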
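The gap-filling in the stages endpoint can be sketched as a pure function over whatever rows exist. The response field names mirror current_stage and last_error from the description, but the struct shapes are hypothetical. The rule that the current stage is the first segment not yet done or skipped is an assumption, not necessarily what the handler uses.

```go
package main

import "fmt"

type Stage string

// canonicalStages fixes the order of the five timeline segments.
var canonicalStages = []Stage{"DocReader", "Chunking", "Embedding", "Multimodal", "PostProcess"}

// StageView, LastError and StagesResponse are hypothetical response shapes.
type StageView struct {
	Stage     Stage  `json:"stage"`
	Status    string `json:"status"`
	ErrorCode string `json:"error_code,omitempty"`
}

type LastError struct {
	Stage Stage  `json:"stage"`
	Code  string `json:"code"`
}

type StagesResponse struct {
	Stages       []StageView `json:"stages"`
	CurrentStage Stage       `json:"current_stage,omitempty"`
	LastError    *LastError  `json:"last_error,omitempty"`
}

// BuildStagesResponse fills gaps with "pending" so exactly five segments come back.
func BuildStagesResponse(rows map[Stage]StageView) StagesResponse {
	resp := StagesResponse{Stages: make([]StageView, 0, len(canonicalStages))}
	for _, s := range canonicalStages {
		v, ok := rows[s]
		if !ok {
			v = StageView{Status: "pending"}
		}
		v.Stage = s
		resp.Stages = append(resp.Stages, v)

		// Assumption: the current stage is the first one not yet done or skipped.
		if resp.CurrentStage == "" && v.Status != "done" && v.Status != "skipped" {
			resp.CurrentStage = s
		}
		if v.Status == "failed" {
			resp.LastError = &LastError{Stage: s, Code: v.ErrorCode}
		}
	}
	return resp
}

func main() {
	resp := BuildStagesResponse(map[Stage]StageView{
		"DocReader": {Status: "done"},
		"Chunking":  {Status: "done"},
		"Embedding": {Status: "failed", ErrorCode: "EMBEDDING_RATE_LIMIT"},
	})
	fmt.Println(len(resp.Stages), resp.CurrentStage, resp.LastError.Code)
	// 5 Embedding EMBEDDING_RATE_LIMIT
}
```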
Added a configuration option that limits how many tenants a non-superuser can create via self-service. Introduced a new error type for users who exceed this limit; it is returned to the client as HTTP 429. Updated the tenant creation and update handlers to enforce the limit and tell users when they have reached it. Additionally, refactored the tenant update request structure so it accepts only mutable fields.
Refs: #1303
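A rough sketch of the quota check and its 429 mapping, assuming a config field where zero or a negative value means "unlimited" and superusers are exempt. `TenantConfig.MaxTenantsPerUser`, `TenantLimitError` and `CheckTenantQuota` are hypothetical names, not the actual config key or error type.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// TenantConfig holds a hypothetical self-service limit; 0 or less means unlimited.
type TenantConfig struct {
	MaxTenantsPerUser int
}

// TenantLimitError is a hypothetical stand-in for the new error type.
type TenantLimitError struct {
	Limit int
}

func (e *TenantLimitError) Error() string {
	return fmt.Sprintf("tenant limit reached: at most %d self-service tenants per user", e.Limit)
}

// CheckTenantQuota runs before a non-superuser creates another tenant.
func CheckTenantQuota(cfg TenantConfig, isSuperuser bool, ownedTenants int) error {
	if isSuperuser || cfg.MaxTenantsPerUser <= 0 {
		return nil
	}
	if ownedTenants >= cfg.MaxTenantsPerUser {
		return &TenantLimitError{Limit: cfg.MaxTenantsPerUser}
	}
	return nil
}

// httpStatus maps the limit error to 429 Too Many Requests.
func httpStatus(err error) int {
	var limitErr *TenantLimitError
	switch {
	case err == nil:
		return http.StatusCreated
	case errors.As(err, &limitErr):
		return http.StatusTooManyRequests
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	cfg := TenantConfig{MaxTenantsPerUser: 3}
	fmt.Println(httpStatus(CheckTenantQuota(cfg, false, 2))) // 201
	fmt.Println(httpStatus(CheckTenantQuota(cfg, false, 3))) // 429
	fmt.Println(httpStatus(CheckTenantQuota(cfg, true, 10))) // 201
}
```

Using `errors.As` rather than a type switch keeps the 429 mapping working even if the handler wraps the error with extra context.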
Follow-up to the route-level KB-access guard introduced earlier in
this PR. Five reviewer findings, all fixed in middleware + tests
without changing handler call sites:
- Honour `?agent_id` in the agent-share fallback. The in-handler
resolution in knowledgebase.go branches on a specific agent's
KBSelectionMode (all / selected / none); the guard was only ever
calling TenantCanAccessKBViaSomeSharedAgent, so some requests
passed the guard only to be rejected by the handler, and by then
the guard had already rewritten the request's tenant context to
the source tenant. The guard now mirrors the
handler: a specific agent_id pins the resolution to that agent
(no fallback to any-agent), an empty agent_id keeps the existing
any-agent behaviour. The knowledge.go / knowledgebase.go handler
helpers stay in place as defence in depth until they're folded
into the guard in a follow-up.
- Distinguish not-found vs. transient on the knowledge / chunk
resolvers. A DB hiccup used to surface as a 404, both confusing
clients and hiding the underlying failure from monitoring. The
resolvers now return apperror 404 only for the known not-found
sentinels (apprepo.ErrKnowledgeBaseNotFound, ErrResourceNotFound)
and let other errors propagate; the guard maps those to a fresh
NewServiceUnavailableError (503).
- Stop 500-ing on legacy chunks with empty knowledge_base_id. The
chunk is effectively unresolvable to a KB, so the client gets the
same 404 they'd get for a missing chunk instead of a 500 that
pollutes alerting. The warn log preserves the operator signal.
- Honour cfg.Tenant.EnableRBAC. The guard now mirrors RequireRole /
RequireOwnershipOrRole: with enforcement off, 401/403 paths log
the would-be rejection and pass through. 404 still fires either
way (a missing resource is not an authorisation event). Keeps the
rollout window safe — operators can flip enforcement on globally
without code changes elsewhere.
- Document the c.Keys vs. c.Request.Context() split. The guard
intentionally rewrites only the request context (handlers that
still run their own share resolution, currently knowledge.go /
knowledgebase.go, would otherwise see kb.TenantID == c.Keys
tenant and misclassify shared-Viewer access as Admin). The
package-level comment now spells this out so the migration of the
remaining handlers in a follow-up doesn't accidentally regress
`my_permission` rendering.
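The agent_id pinning rule can be sketched as below. `SharedAgent`, `agentCoversKB` and `CanAccessViaAgent` are hypothetical stand-ins (the real TenantCanAccessKBViaSomeSharedAgent signature is not shown here), and role checks such as "Editor required" are left out. The point is only that a non-empty agent_id never falls back to other agents.

```go
package main

import "fmt"

type KBSelectionMode string

const (
	ModeAll      KBSelectionMode = "all"
	ModeSelected KBSelectionMode = "selected"
	ModeNone     KBSelectionMode = "none"
)

// SharedAgent is a hypothetical view of an agent shared into the caller's tenant.
type SharedAgent struct {
	ID           string
	SourceTenant uint64
	Mode         KBSelectionMode
	SelectedKBs  map[string]bool
}

// agentCoversKB applies one agent's KBSelectionMode to a single knowledge base.
func agentCoversKB(a SharedAgent, kbID string, kbTenant uint64) bool {
	if a.SourceTenant != kbTenant {
		return false // the agent cannot grant access to another tenant's KB
	}
	switch a.Mode {
	case ModeAll:
		return true
	case ModeSelected:
		return a.SelectedKBs[kbID]
	default: // ModeNone or anything unrecognised
		return false
	}
}

// CanAccessViaAgent pins resolution to agentID when it is set and
// otherwise accepts any shared agent that covers the KB.
func CanAccessViaAgent(agents []SharedAgent, agentID, kbID string, kbTenant uint64) bool {
	for _, a := range agents {
		if agentID == "" {
			if agentCoversKB(a, kbID, kbTenant) {
				return true
			}
			continue
		}
		if a.ID == agentID {
			return agentCoversKB(a, kbID, kbTenant) // no fallback to other agents
		}
	}
	return false
}

func main() {
	agents := []SharedAgent{
		{ID: "restricted", SourceTenant: 7, Mode: ModeNone},
		{ID: "open", SourceTenant: 7, Mode: ModeAll},
	}
	fmt.Println(CanAccessViaAgent(agents, "", "kb-1", 7))           // true
	fmt.Println(CanAccessViaAgent(agents, "restricted", "kb-1", 7)) // false
}
```

The second call is the case the old guard got wrong: any-agent resolution would have passed via "open", while the handler, pinned to "restricted", rejects.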
Tests:
- The four existing cases still pass; the runGuard signature gained a
guardOpts knob.
- Eight new cases cover the agent-share resolution paths (any-agent
viewer, Editor required → must reject, specific agent in modes
all / selected-match / selected-miss / none, cross-tenant
mismatch) plus the EnableRBAC=false fail-open + 404-still-fires
pair.
Wires KnowledgeBase.VectorStoreID and the ownership-aware retrieve factory
into the user-facing knowledge-base lifecycle:
- POST /knowledge-bases validates the requested vector_store_id against
the caller's tenant scope and the engine registry. New error codes
ErrVectorStoreBindingInvalid (2200) and ErrVectorStoreUnavailable (2201)
distinguish the typed branches without echoing UUIDs to the client.
- GET / POST / PUT / PUT-pin responses embed the bound store's display
metadata (name, source, engine_type, status) without exposing any
connection credentials. Cross-tenant shared KBs receive a suppressed
payload (vector_store_id stripped, source="shared") so operator-chosen
store names cannot be enumerated across tenants.
- POST /knowledge-bases/copy synchronously rejects clones whose target
has a different embedding model or vector store, before the async
clone task is enqueued. The async clone worker re-applies the same
checks for defense in depth.
- DELETE /vector-stores/:id refuses to remove a store with bound KBs,
inside a transaction that row-locks the store on PostgreSQL and
serializes via WAL on SQLite. The unregister-from-registry call is
wrapped in defer/recover so a panic surfaces as a structured warning
instead of silently leaking a stale engine.
- vector_store_id is immutable after creation. The GORM `<-:create` tag
blocks every ORM update path; the service-layer DTO omits the field
entirely; a reflection-based regression test catches any future
maintainer who adds it back to either layer.
- Empty-string vector_store_id is normalized to nil at both the create
path and inside SharesStoreWith, so rows persisted by callers that
did not run Normalize first cannot trip false same-store comparisons.
Part of #993. Depends on #994 and #1310.
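A minimal sketch of the empty-string normalization inside SharesStoreWith and the synchronous copy pre-check built on it. The `KnowledgeBase` fields, `validateCopyTarget` and the error values are hypothetical. The sketch also assumes a nil VectorStoreID means "no explicit binding".

```go
package main

import (
	"errors"
	"fmt"
)

// KnowledgeBase is trimmed to the fields the copy check needs (hypothetical).
// Assumption: a nil VectorStoreID means "no explicit binding".
type KnowledgeBase struct {
	EmbeddingModelID string
	VectorStoreID    *string
}

// normalizeStoreID maps an empty string to nil so both spellings compare equal.
func normalizeStoreID(id *string) *string {
	if id == nil || *id == "" {
		return nil
	}
	return id
}

// SharesStoreWith normalizes both sides itself instead of trusting callers.
func (kb KnowledgeBase) SharesStoreWith(other KnowledgeBase) bool {
	a, b := normalizeStoreID(kb.VectorStoreID), normalizeStoreID(other.VectorStoreID)
	if a == nil || b == nil {
		return a == nil && b == nil
	}
	return *a == *b
}

var (
	errEmbeddingMismatch = errors.New("copy target uses a different embedding model")
	errStoreMismatch     = errors.New("copy target uses a different vector store")
)

// validateCopyTarget is the synchronous pre-check; an async worker could call it again.
func validateCopyTarget(src, dst KnowledgeBase) error {
	if src.EmbeddingModelID != dst.EmbeddingModelID {
		return errEmbeddingMismatch
	}
	if !src.SharesStoreWith(dst) {
		return errStoreMismatch
	}
	return nil
}

func ptr(s string) *string { return &s }

func main() {
	legacy := KnowledgeBase{EmbeddingModelID: "m1", VectorStoreID: ptr("")} // persisted without Normalize
	fresh := KnowledgeBase{EmbeddingModelID: "m1"}
	fmt.Println(validateCopyTarget(legacy, fresh)) // <nil>

	other := KnowledgeBase{EmbeddingModelID: "m1", VectorStoreID: ptr("vs-2")}
	fmt.Println(validateCopyTarget(fresh, other)) // copy target uses a different vector store
}
```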
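The cross-tenant suppression can be sketched as a projection over a display-only summary type that never holds credentials. The struct and its `source` values other than "shared" are hypothetical. The sketch also assumes engine_type and status are withheld for shared KBs, which the description does not state explicitly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VectorStoreSummary carries display metadata only; it has no connection fields at all.
type VectorStoreSummary struct {
	VectorStoreID string `json:"vector_store_id,omitempty"`
	Name          string `json:"name,omitempty"`
	Source        string `json:"source"`
	EngineType    string `json:"engine_type,omitempty"`
	Status        string `json:"status,omitempty"`
}

// summaryFor suppresses everything but source for cross-tenant shared KBs.
// Assumption: engine_type and status are also withheld in the shared case.
func summaryFor(store VectorStoreSummary, kbTenantID, callerTenantID uint64) VectorStoreSummary {
	if kbTenantID != callerTenantID {
		return VectorStoreSummary{Source: "shared"}
	}
	return store
}

func main() {
	// Hypothetical store values for illustration.
	store := VectorStoreSummary{VectorStoreID: "vs-9", Name: "prod-es", Source: "tenant", EngineType: "elasticsearch", Status: "active"}
	out, _ := json.Marshal(summaryFor(store, 1, 2))
	fmt.Println(string(out)) // {"source":"shared"}
}
```

Building the shared payload from a zero value, rather than blanking fields on a copy, means a field added to the summary later stays hidden by default.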
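The DELETE ordering and the defer/recover wrapper can be sketched without a database. `Registry`, `deleteStore` and `safeUnregister` are hypothetical. The bound-KB count and row delete, which run in one locked transaction, are reduced here to an integer argument and a comment.

```go
package main

import (
	"errors"
	"fmt"
)

// Registry is a hypothetical interface for the in-process engine registry.
type Registry interface {
	Unregister(storeID string)
}

var ErrStoreInUse = errors.New("vector store still has bound knowledge bases")

// safeUnregister turns a panic into a returned warning instead of crashing the request.
func safeUnregister(reg Registry, storeID string) (warning error) {
	defer func() {
		if r := recover(); r != nil {
			warning = fmt.Errorf("unregister %s panicked: %v", storeID, r)
		}
	}()
	reg.Unregister(storeID)
	return nil
}

// deleteStore sketches the order of operations. In the real handler the bound-KB
// count and the row delete run inside one transaction.
func deleteStore(boundKBs int, reg Registry, storeID string) (warning, err error) {
	if boundKBs > 0 {
		return nil, ErrStoreInUse
	}
	// ... delete the row and commit here ...
	return safeUnregister(reg, storeID), nil
}

type panickyRegistry struct{}

func (panickyRegistry) Unregister(string) { panic("engine already closed") }

type okRegistry struct{ removed []string }

func (r *okRegistry) Unregister(id string) { r.removed = append(r.removed, id) }

func main() {
	_, err := deleteStore(3, &okRegistry{}, "vs-1")
	fmt.Println(err) // vector store still has bound knowledge bases

	warning, err := deleteStore(0, panickyRegistry{}, "vs-2")
	fmt.Println(warning, err) // unregister vs-2 panicked: engine already closed <nil>
}
```

The named `warning` result is what lets the deferred closure replace the return value after `recover()`.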
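A reflection-based immutability check in the spirit of the regression test could look like this. The model and DTO types are hypothetical. The test asserts that the model field carries `gorm:"<-:create"` (GORM's create-only write permission) and that the update DTO has no field of that name.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Hypothetical model and DTO; only the tag and the missing field matter here.
type KnowledgeBaseModel struct {
	ID            string  `gorm:"primaryKey"`
	Name          string
	VectorStoreID *string `gorm:"<-:create"` // writable on INSERT only
}

type UpdateKnowledgeBaseRequest struct {
	Name        *string
	Description *string
}

// checkCreateOnly reports an error if field is writable after create in either layer.
func checkCreateOnly(model, dto any, field string) error {
	mt := reflect.TypeOf(model)
	f, ok := mt.FieldByName(field)
	if !ok {
		return fmt.Errorf("%s has no field %s", mt.Name(), field)
	}
	if !strings.Contains(f.Tag.Get("gorm"), "<-:create") {
		return fmt.Errorf("%s.%s must carry gorm:\"<-:create\"", mt.Name(), field)
	}
	dt := reflect.TypeOf(dto)
	if _, exposed := dt.FieldByName(field); exposed {
		return fmt.Errorf("%s must not expose %s", dt.Name(), field)
	}
	return nil
}

func main() {
	fmt.Println(checkCreateOnly(KnowledgeBaseModel{}, UpdateKnowledgeBaseRequest{}, "VectorStoreID")) // <nil>
}
```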