mirror of
https://github.com/Tencent/WeKnora.git
synced 2026-06-04 13:30:32 +08:00
Phase 3 (#1440) gate flip. PR 1 (#1445) + PR 2a (#1481) + PR 2b (#1482) laid the type prep + driver skeleton + read/write paths as gated dead code; this PR wires every activation surface so opensearch becomes a registerable VectorStore engine.

Activation wiring
- internal/types: validEngineTypes / GetVectorStoreTypes (with HNSW bounds + knn_engine enum + Immutable hints) / retrieverEngineMapping / buildEnvStoreForDriver. Every gated surface now recognises "opensearch". IndexConfig grows four omitempty HNSW fields (HNSWM / HNSWEFConstruction / HNSWEFSearch / KNNEngine), keeping other engines' serialised config byte-identical.
- internal/container: createOpenSearchEngine + the switch case in createEngineServiceFromStore; the RETRIEVE_DRIVER=opensearch env path in initRetrieveEngineRegistry; NewEngineFactory now closes over the AuditLogService (the EngineFactory type itself is unchanged).
- internal/application/service/vectorstore_healthcheck.go: a testOpenSearchConnection case so CreateStore's connectivity probe accepts opensearch instead of returning 400.
- internal/application/repository/retriever/opensearch/transport.go: NewOpenSearchClient is exported so the factory and env path can build the TLS-hardened client; healthcheck.go reuses the unexported probeVersion / probeKNNPlugin for the service-layer probe.

Service-layer validation
- validateOpenSearchIndexConfig validates the HNSW caps (m 2-100, ef_construction 2-4096, ef_search 1-10000, knn_engine ∈ lucene|faiss). Shards/replicas continue to be enforced by the flat ValidateIndexConfig. Create-only: UpdateStore mutates the name only.
- validateConnectionConfig requires addr for opensearch.

Sync implementations (stubs.go shrinks)
- CopyIndices (copy.go) mirrors the Elasticsearch / Qdrant pattern (search → BatchSave with the source_id remap for generated questions), so dim/keyword routing and the source_id contract come from BatchSave for free. embeddingMap is keyed by the *target* SourceID because OpenSearch's BatchSave looks up embeddings by SourceID (lookupEmbedding), not by chunk_id (the ES driver's convention). Pagination is from/size; copies larger than max_result_window (default 10000) need the scroll-based async path that lands later.
- BatchUpdateChunkEnabledStatus / BatchUpdateChunkTagID (bulk_update.go) group the input by target value and issue one _update_by_query per group over the cross-dim <base>_* pattern. Caller values flow through bound script params only, never string-interpolated into the Painless source, closing the script-injection surface.
- inspectByQueryResponse (byquery.go) mirrors inspectBulkResponse: the full failure reason goes to the debug log only; the returned error carries the bounded id + type.
- UpdateByQueryParams.Refresh is *bool in opensearch-go v4.6.0 (the same shape as DeleteByQuery's quirk), so refresh=wait_for is not expressible; we use refresh=true.

Driver-owned audit (DIP)
- A new opensearch.AuditSink interface (with nopSink + WithAuditSink functional option) lets the driver emit opensearch.index_created and opensearch.reindex_executed events without importing any service package; the service layer implements the interface. NewRepository takes opts, so existing 4-arg test call sites keep compiling unchanged.
- internal/container/audit_sink.go bridges AuditSink to AuditLogService. When the context carries no tenant (the env-path registration ctx during boot, for example) the adapter skips the emit with a warning rather than silently writing tenant_id=0, which would collide with the system-scope sentinel.

Frontend + polish
- FieldSchema (frontend/src/api/vector-store.ts) gains min/max/enum/immutable. VectorStoreSettings.vue is now schema-driven: a closed `enum` renders a t-select; number inputs use the schema's `:min`/`:max` and fall back to the legacy replica-vs-shard heuristic only when the schema does not pin them; a danger-coloured warning fires when insecure_skip_verify is toggled on (the switch and warning are wrapped in a vertical stack so the warning sits on its own row below the switch).
- i18n: labels for hnsw_m / hnsw_ef_construction / hnsw_ef_search / knn_engine / insecure_skip_verify plus the warning copy in en-US, ko-KR, zh-CN, ru-RU.
- docker-compose.dev.yml: an opensearch profile (single-node 3.3.2 with security plugin disabled for dev only). OpenSearch Dashboards lives in a separate, opt-in opensearch-ui profile so the heavy UI container is not forced up alongside the cluster (the driver e2e is fully curl-verifiable against :9200). The new docs/dev/opensearch-integration-test.md covers the end-to-end exercise and the single-node guidance (set replicas=0 to keep the cluster Green).

Gating-guard tests flipped
- The "OpenSearch is NOT in validEngineTypes / mapping / types list / env builder / stubs" guard tests from PR 1 / PR 2 are replaced by their positive counterparts in this PR. The test suite was the activation checklist; the activation flip is its diff.

Backward compatibility
- Additive everywhere. IndexConfig's new HNSW fields are omitempty so other engines' serialised config is byte-identical. Existing Elasticsearch / Qdrant / Milvus / Weaviate / Doris / TencentVectorDB stores are untouched. No migrations.

Test plan
- go build ./... clean
- go vet ./... clean
- gofmt -l clean on touched files
- go test ./... passes except for TestOssEnsureBucket_CreateFails (Aliyun OSS endpoint), the docreader gRPC tests, and the doris SQL-shape tests; all three failures are pre-existing on upstream/main and untouched by this PR.
- New tests across internal/types, opensearch, service and container, including a full end-to-end env-path test that exercises initRetrieveEngineRegistry with RETRIEVE_DRIVER=opensearch against an httptest cluster.
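
The HNSW bounds listed under service-layer validation can be illustrated with a small range check. This is only a sketch under stated assumptions: `hnswConfig`, `validateHNSW` and `inRange` are hypothetical names, not the repository's validateOpenSearchIndexConfig, and treating a zero or empty value as "unset" is an assumption inferred from the fields being omitempty.

```go
package main

import "fmt"

// hnswConfig is a hypothetical stand-in for the four HNSW fields this commit
// adds to IndexConfig; the real struct and its tags may differ.
type hnswConfig struct {
	M              int
	EFConstruction int
	EFSearch       int
	KNNEngine      string
}

// inRange rejects v outside [lo, hi]. A zero value is treated as "unset" and
// accepted (an assumption, see the note above the block).
func inRange(name string, v, lo, hi int) error {
	if v == 0 {
		return nil
	}
	if v < lo || v > hi {
		return fmt.Errorf("%s must be in [%d, %d], got %d", name, lo, hi, v)
	}
	return nil
}

// validateHNSW applies the caps quoted in the commit message:
// m 2-100, ef_construction 2-4096, ef_search 1-10000, knn_engine lucene|faiss.
func validateHNSW(c hnswConfig) error {
	if err := inRange("hnsw_m", c.M, 2, 100); err != nil {
		return err
	}
	if err := inRange("hnsw_ef_construction", c.EFConstruction, 2, 4096); err != nil {
		return err
	}
	if err := inRange("hnsw_ef_search", c.EFSearch, 1, 10000); err != nil {
		return err
	}
	switch c.KNNEngine {
	case "", "lucene", "faiss":
		return nil
	}
	return fmt.Errorf("knn_engine must be lucene or faiss, got %q", c.KNNEngine)
}

func main() {
	ok := hnswConfig{M: 16, EFConstruction: 128, EFSearch: 100, KNNEngine: "lucene"}
	fmt.Println("valid config:", validateHNSW(ok))
	fmt.Println("m too small:", validateHNSW(hnswConfig{M: 1}))
	fmt.Println("bad engine:", validateHNSW(hnswConfig{KNNEngine: "nmslib"}))
}
```

Returning on the first violation keeps the error message tied to a single field, which matches the "bounded id + type" spirit of the error handling described above; collecting all violations would be an equally reasonable choice.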
67 lines
2.5 KiB
Go
package container

import (
	"context"
	"encoding/json"

	"github.com/Tencent/WeKnora/internal/application/repository/retriever/opensearch"
	"github.com/Tencent/WeKnora/internal/logger"
	"github.com/Tencent/WeKnora/internal/types"
	"github.com/Tencent/WeKnora/internal/types/interfaces"
)

// auditSinkAdapter bridges the OpenSearch driver's AuditSink (which the driver
// owns so it imports no service package) to the service-layer AuditLogService.
// This keeps the dependency one-way: the driver depends only on its own
// AuditSink abstraction; the container implements it.
type auditSinkAdapter struct {
	svc interfaces.AuditLogService
}

// newAuditSinkAdapter returns an opensearch.AuditSink backed by svc. A nil svc
// yields a sink whose emits are no-ops.
func newAuditSinkAdapter(svc interfaces.AuditLogService) opensearch.AuditSink {
	return auditSinkAdapter{svc: svc}
}

func (a auditSinkAdapter) EmitIndexCreated(ctx context.Context, alias string, dim int) {
	a.emit(ctx, types.AuditActionOpenSearchIndexCreated, alias,
		map[string]any{"alias": alias, "dim": dim})
}

func (a auditSinkAdapter) EmitReindexExecuted(ctx context.Context, srcAlias, dstAlias string, docs int64) {
	a.emit(ctx, types.AuditActionOpenSearchReindexExecuted, dstAlias,
		map[string]any{"src_alias": srcAlias, "dst_alias": dstAlias, "docs": docs})
}

// emit writes one audit entry. It skips (with a warning) when the context
// carries no tenant: driver events can fire from background task contexts
// (e.g. lazy index creation under an async copy task), and writing tenant_id=0
// would collide with the system-scope sentinel and corrupt the audit trail.
func (a auditSinkAdapter) emit(ctx context.Context, action types.AuditAction, target string, detail map[string]any) {
	if a.svc == nil {
		return
	}
	tid, ok := types.TenantIDFromContext(ctx)
	if !ok {
		logger.GetLogger(ctx).Warnf("[audit] %s: no tenant in context, skipping audit (target=%s)", action, target)
		return
	}
	// Details is a typed JSON blob: only bounded, non-secret fields. Never
	// include cluster reason strings or connection secrets.
	b, err := json.Marshal(detail)
	if err != nil {
		logger.GetLogger(ctx).Warnf("[audit] %s: marshal details failed: %v", action, err)
		b = []byte("{}")
	}
	if err := a.svc.Log(ctx, &types.AuditLog{
		TenantID:   tid,
		Action:     action,
		TargetType: "opensearch_index",
		TargetID:   target,
		Details:    types.JSON(b),
	}); err != nil {
		logger.GetLogger(ctx).Warnf("[audit] %s emit failed: %v", action, err)
	}
}