WeKnora

mirror of https://github.com/Tencent/WeKnora.git synced 2026-06-04 13:30:32 +08:00

Author	SHA1	Message	Date
wizardchen	04f56f9cda	feat(knowledge): track per-stage parsing progress with /stages API Adds a five-segment progress model for the document parsing pipeline so the UI (PR③) can render a timeline showing where each document is (DocReader → Chunking → Embedding → Multimodal → PostProcess) and which stage failed with what error code. - New table `knowledge_processing_stages` (migration 000052) with one row per (knowledge_id, stage). UPSERT on Begin/Done/Fail bumps an attempt counter so re-parses don't lose history. - StageTracker service exposes Begin/Done/Fail/Skip; all calls are best-effort and never break the pipeline if persistence fails. - Stable error codes (DOCREADER_TIMEOUT / EMBEDDING_RATE_LIMIT / VECTORSTORE_WRITE_FAILED / ...) the UI can map to localized remediation hints. - Tracker call sites added at the four meaningful failure points: convert (DocReader), CreateChunks (Chunking), BatchIndex (Embedding), enqueueImageMultimodalTasks (Multimodal start), KnowledgePostProcess.Handle (Multimodal close + PostProcess). - New endpoint `GET /api/v1/knowledge/:id/stages` returns the five canonical stages — missing rows are synthesized as "pending" so the timeline always renders five segments. Includes current_stage and last_error block.	2026-05-28 15:14:45 +08:00
wizardchen	7d030a6f6a	feat(tenant): implement tenant creation limit and error handling Added a new configuration option to limit the number of tenants a non-superuser can create via self-service. Introduced a new error type for handling cases where users exceed this limit, returning a 429 status code. Updated the tenant creation and update handlers to enforce this limit and provide appropriate feedback to users. Additionally, refactored the tenant update request structure to ensure only mutable fields are allowed. Refs: #1303	2026-05-18 17:28:58 +08:00
wizardchen	8efe1b4187	refactor(rbac): address kb-access guard review findings Follow-up to the route-level KB-access guard introduced earlier in this PR. Five reviewer findings, all fixed in middleware + tests without changing handler call sites: - Honour `?agent_id` in the agent-share fallback. The in-handler resolution in knowledgebase.go branches on a specific agent's KBSelectionMode (all / selected / none); the guard was only ever calling TenantCanAccessKBViaSomeSharedAgent, which let some requests pass at the guard layer that the handler would then reject, and rewrote the request's tenant context to the source tenant before the rejection happened. The guard now mirrors the handler: a specific agent_id pins the resolution to that agent (no fallback to any-agent), an empty agent_id keeps the existing any-agent behaviour. The knowledge.go / knowledgebase.go handler helpers stay in place as defence in depth until they're folded into the guard in a follow-up. - Distinguish not-found vs. transient on the knowledge / chunk resolvers. A DB hiccup used to surface as a 404, both confusing clients and hiding the underlying failure from monitoring. The resolvers now return apperror 404 only for the known not-found sentinels (apprepo.ErrKnowledgeBaseNotFound, ErrResourceNotFound) and let other errors propagate; the guard maps those to a fresh NewServiceUnavailableError (503). - Stop 500-ing on legacy chunks with empty knowledge_base_id. The chunk is effectively unresolvable to a KB, so the client gets the same 404 they'd get for a missing chunk instead of a 500 that pollutes alerting. The warn log preserves the operator signal. - Honour cfg.Tenant.EnableRBAC. The guard now mirrors RequireRole / RequireOwnershipOrRole: with enforcement off, 401/403 paths log the would-be rejection and pass through. 404 still fires either way (a missing resource is not an authorisation event). Keeps the rollout window safe: operators can flip enforcement on globally without code changes elsewhere. - Document the c.Keys vs. c.Request.Context() split. The guard intentionally rewrites only the request context (handlers that still run their own share resolution, currently knowledge.go / knowledgebase.go, would otherwise see kb.TenantID == c.Keys tenant and misclassify shared-Viewer access as Admin). The package-level comment now spells this out so the migration of the remaining handlers in a follow-up doesn't accidentally regress `my_permission` rendering. Tests: - Existing four cases keep working; runGuard signature gained a guardOpts knob. - Eight new cases cover the agent-share resolution paths (any-agent viewer, Editor required → must reject, specific agent in modes all / selected-match / selected-miss / none, cross-tenant mismatch) plus the EnableRBAC=false fail-open + 404-still-fires pair.	2026-05-18 17:28:58 +08:00
ochan.kwon	0e8de6192c	feat(knowledge-base): validate vector store bindings on create, copy, and delete Wires KnowledgeBase.VectorStoreID and the ownership-aware retrieve factory into the user-facing knowledge-base lifecycle: - POST /knowledge-bases validates the requested vector_store_id against the caller's tenant scope and the engine registry. New error codes ErrVectorStoreBindingInvalid (2200) and ErrVectorStoreUnavailable (2201) distinguish the typed branches without echoing UUIDs to the client. - GET / POST / PUT / PUT-pin responses embed the bound store's display metadata (name, source, engine_type, status) without exposing any connection credentials. Cross-tenant shared KBs receive a suppressed payload (vector_store_id stripped, source="shared") so operator-chosen store names cannot be enumerated across tenants. - POST /knowledge-bases/copy synchronously rejects clones whose target has a different embedding model or vector store, before the async clone task is enqueued. The async clone worker re-applies the same checks for defense in depth. - DELETE /vector-stores/:id refuses to remove a store with bound KBs, inside a transaction that row-locks the store on PostgreSQL and serializes via WAL on SQLite. unregister-from-registry is wrapped in defer/recover so a panic surfaces as a structured warning instead of silently leaking a stale engine. - vector_store_id is immutable after creation. The GORM <-:create tag blocks every ORM update path; the service-layer DTO omits the field entirely; a reflection-based regression test catches any future maintainer who adds it back to either layer. - Empty-string vector_store_id is normalized to nil at both the create path and inside SharesStoreWith, so rows persisted by callers that did not run Normalize first cannot trip false same-store comparisons. Part of #993. Depends on #994 and #1310.	2026-05-18 15:58:46 +08:00
wizardchen	4fa3adbf3b	feat: Add agent configuration and cleanup scripts for database migrations	2025-11-05 23:18:44 +08:00
wizardchen	56eb2bce33	init commit	2025-08-05 15:08:07 +08:00
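
Commit 04f56f9cda says the `/stages` endpoint always returns the five canonical stages and synthesizes missing rows as "pending". The following is a minimal sketch of that fill-in step only, not the repository's code. The type `StageRow`, the function `FillStages`, the lowercase stage identifiers, and every status string other than "pending" are assumptions made for illustration.

```go
package main

import "fmt"

// canonicalStages lists the pipeline order named in the commit message.
// The lowercase identifiers are assumed; the real ones may differ.
var canonicalStages = []string{
	"docreader", "chunking", "embedding", "multimodal", "postprocess",
}

// StageRow is a hypothetical stand-in for one row of
// knowledge_processing_stages (one row per knowledge_id and stage).
type StageRow struct {
	Stage     string
	Status    string // e.g. "pending", "done", "failed" (assumed values)
	ErrorCode string // e.g. "EMBEDDING_RATE_LIMIT"
	Attempt   int
}

// FillStages returns exactly one entry per canonical stage, in pipeline
// order. Stages with no persisted row are synthesized as "pending".
func FillStages(rows []StageRow) []StageRow {
	byStage := make(map[string]StageRow, len(rows))
	for _, r := range rows {
		byStage[r.Stage] = r
	}
	out := make([]StageRow, 0, len(canonicalStages))
	for _, name := range canonicalStages {
		if r, ok := byStage[name]; ok {
			out = append(out, r)
			continue
		}
		out = append(out, StageRow{Stage: name, Status: "pending"})
	}
	return out
}

func main() {
	persisted := []StageRow{
		{Stage: "docreader", Status: "done", Attempt: 1},
		{Stage: "embedding", Status: "failed", ErrorCode: "EMBEDDING_RATE_LIMIT", Attempt: 2},
	}
	for _, s := range FillStages(persisted) {
		fmt.Printf("%-12s %-8s %s\n", s.Stage, s.Status, s.ErrorCode)
	}
}
```

Keeping the canonical order in one slice means the response has the same five segments whether or not a document has been parsed yet.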

6 Commits