WeKnora/internal/application/service/vectorstore_healthcheck.go
ochan.kwon 40b74e2efa feat(retriever): activate OpenSearch k-NN driver (PR 3 of 3)
Phase 3 (#1440) gate flip. PR 1 (#1445) + PR 2a (#1481) + PR 2b (#1482)
laid the type prep + driver skeleton + read/write paths as gated dead
code; this PR wires every activation surface so opensearch becomes a
registerable VectorStore engine.

Activation wiring
- internal/types: validEngineTypes / GetVectorStoreTypes (with HNSW
  bounds + knn_engine enum + Immutable hints) / retrieverEngineMapping /
  buildEnvStoreForDriver — every gated surface now recognises
  "opensearch". IndexConfig grows four omitempty HNSW fields (HNSWM /
  HNSWEFConstruction / HNSWEFSearch / KNNEngine), keeping other engines'
  serialised config byte-identical.
- internal/container: createOpenSearchEngine + the switch case in
  createEngineServiceFromStore; the RETRIEVE_DRIVER=opensearch env path
  in initRetrieveEngineRegistry; NewEngineFactory now closes over the
  AuditLogService (the EngineFactory type itself is unchanged).
- internal/application/service/vectorstore_healthcheck.go: a
  testOpenSearchConnection case so CreateStore's connectivity probe
  accepts opensearch instead of returning 400.
- internal/application/repository/retriever/opensearch/transport.go:
  NewOpenSearchClient is exported so the factory and env path can build
  the TLS-hardened client; healthcheck.go reuses the unexported
  probeVersion / probeKNNPlugin for the service-layer probe.

Service-layer validation
- validateOpenSearchIndexConfig validates the HNSW caps (m 2-100,
  ef_construction 2-4096, ef_search 1-10000, knn_engine ∈ lucene|faiss).
  Shards/replicas continue to be enforced by the flat ValidateIndexConfig.
  Create-only: UpdateStore mutates the name only.
- validateConnectionConfig requires addr for opensearch.
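
A minimal sketch of the HNSW bounds check described above, for illustration only. The hnswConfig type, the validateHNSW name and the "zero means unset" rule are assumptions, not the code in this PR; only the numeric ranges and the lucene|faiss enum come from the list above.

```go
package main

import "fmt"

// hnswConfig is a hypothetical stand-in for the HNSW fields on IndexConfig.
// Assumption: a zero value means "not set, use the engine default".
type hnswConfig struct {
	M              int
	EFConstruction int
	EFSearch       int
	KNNEngine      string
}

// inRange accepts an unset (zero) value or one inside [lo, hi].
func inRange(name string, v, lo, hi int) error {
	if v == 0 {
		return nil
	}
	if v < lo || v > hi {
		return fmt.Errorf("%s must be between %d and %d, got %d", name, lo, hi, v)
	}
	return nil
}

// validateHNSW applies the caps from the commit message:
// m 2-100, ef_construction 2-4096, ef_search 1-10000, knn_engine in lucene|faiss.
func validateHNSW(c hnswConfig) error {
	if err := inRange("hnsw_m", c.M, 2, 100); err != nil {
		return err
	}
	if err := inRange("hnsw_ef_construction", c.EFConstruction, 2, 4096); err != nil {
		return err
	}
	if err := inRange("hnsw_ef_search", c.EFSearch, 1, 10000); err != nil {
		return err
	}
	switch c.KNNEngine {
	case "", "lucene", "faiss":
		return nil
	default:
		return fmt.Errorf("knn_engine must be lucene or faiss, got %q", c.KNNEngine)
	}
}

func main() {
	fmt.Println(validateHNSW(hnswConfig{M: 16, EFConstruction: 128, KNNEngine: "faiss"}))
	fmt.Println(validateHNSW(hnswConfig{M: 101}))
}
```

Because the check is create-only, a sketch like this would run in the CreateStore path and not in UpdateStore.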

Sync implementations (stubs.go shrinks)
- CopyIndices (copy.go) mirrors the Elasticsearch / Qdrant pattern —
  search → BatchSave with the source_id remap for generated questions —
  so dim/keyword routing and the source_id contract come from BatchSave
  for free. embeddingMap is keyed by the *target* SourceID because
  OpenSearch's BatchSave looks up embeddings by SourceID
  (lookupEmbedding), not by chunk_id (the ES driver's convention).
  Pagination is from/size; copies larger than max_result_window
  (default 10000) need the scroll-based async path that lands later.
- BatchUpdateChunkEnabledStatus / BatchUpdateChunkTagID (bulk_update.go)
  group the input by target value and issue one _update_by_query per
  group over the cross-dim <base>_* pattern. Caller values flow through
  bound script params only — never string-interpolated into the Painless
  source — closing the script-injection surface.
- inspectByQueryResponse (byquery.go) mirrors inspectBulkResponse: the
  full failure reason goes to the debug log only; the returned error
  carries the bounded id + type.
- UpdateByQueryParams.Refresh is *bool in opensearch-go v4.6.0 (the same
  shape as DeleteByQuery's quirk), so refresh=wait_for is not
  expressible; we use refresh=true.
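
The group-then-update shape with bound script params can be sketched as below. This is an illustration under assumptions: the chunk_id and is_enabled field names, and the groupByEnabled / buildUpdateBody helpers, are hypothetical and do not mirror bulk_update.go. The point it shows is that the Painless source is a constant and the caller's value only ever appears under script.params.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// groupByEnabled inverts chunkID -> enabled into enabled -> sorted chunk IDs,
// so each distinct target value needs only one _update_by_query request.
func groupByEnabled(status map[string]bool) map[bool][]string {
	groups := make(map[bool][]string)
	for id, enabled := range status {
		groups[enabled] = append(groups[enabled], id)
	}
	for _, ids := range groups {
		sort.Strings(ids) // deterministic request bodies
	}
	return groups
}

// buildUpdateBody builds an _update_by_query body. The script source is a
// fixed string; the caller-supplied value is passed as a bound parameter and
// is never concatenated into the script. Field names are assumptions.
func buildUpdateBody(chunkIDs []string, enabled bool) ([]byte, error) {
	body := map[string]any{
		"query": map[string]any{
			"terms": map[string]any{"chunk_id": chunkIDs},
		},
		"script": map[string]any{
			"lang":   "painless",
			"source": "ctx._source.is_enabled = params.enabled",
			"params": map[string]any{"enabled": enabled},
		},
	}
	return json.Marshal(body)
}

func main() {
	groups := groupByEnabled(map[string]bool{"a": true, "b": false, "c": true})
	for _, enabled := range []bool{true, false} {
		body, err := buildUpdateBody(groups[enabled], enabled)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(body))
	}
}
```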

Driver-owned audit (DIP)
- A new opensearch.AuditSink interface (with nopSink + WithAuditSink
  functional option) lets the driver emit opensearch.index_created and
  opensearch.reindex_executed events without importing any service
  package — the service layer implements the interface. NewRepository
  takes opts, so existing 4-arg test call sites keep compiling unchanged.
- internal/container/audit_sink.go bridges AuditSink to AuditLogService.
  When the context carries no tenant (the env-path registration ctx
  during boot, for example) the adapter skips the emit with a warning
  rather than silently writing tenant_id=0, which would collide with the
  system-scope sentinel.
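
A minimal sketch of the sink-plus-functional-option pattern, assuming a single Emit method. The method signature, the Repository fields and the recordingSink test double are hypothetical; only the names AuditSink, nopSink, WithAuditSink, NewRepository and the event name come from the description above.

```go
package main

import (
	"context"
	"fmt"
)

// AuditSink is the driver-side interface; the service layer implements it.
// The Emit signature here is an assumption for illustration.
type AuditSink interface {
	Emit(ctx context.Context, event string, detail map[string]any)
}

// nopSink is the default so the driver never has to nil-check.
type nopSink struct{}

func (nopSink) Emit(context.Context, string, map[string]any) {}

// Repository is a reduced, hypothetical stand-in for the driver repository.
type Repository struct {
	audit AuditSink
}

// Option mutates a Repository during construction.
type Option func(*Repository)

// WithAuditSink swaps the default nopSink for a real sink.
func WithAuditSink(s AuditSink) Option {
	return func(r *Repository) {
		if s != nil {
			r.audit = s
		}
	}
}

// NewRepository takes variadic options, so call sites that pass none still compile.
func NewRepository(opts ...Option) *Repository {
	r := &Repository{audit: nopSink{}}
	for _, opt := range opts {
		opt(r)
	}
	return r
}

func (r *Repository) createIndex(ctx context.Context, name string) {
	// ... index creation would happen here ...
	r.audit.Emit(ctx, "opensearch.index_created", map[string]any{"index": name})
}

// recordingSink collects event names; useful in tests.
type recordingSink struct{ events []string }

func (s *recordingSink) Emit(_ context.Context, event string, _ map[string]any) {
	s.events = append(s.events, event)
}

// demo exercises both the default and the injected sink.
func demo() []string {
	ctx := context.Background()
	NewRepository().createIndex(ctx, "kb_768") // default sink: silently dropped
	rec := &recordingSink{}
	NewRepository(WithAuditSink(rec)).createIndex(ctx, "kb_768")
	return rec.events
}

func main() {
	fmt.Println(demo())
}
```

With this shape the driver package depends only on its own interface, which is what lets the service layer supply the implementation without an import cycle.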

Frontend + polish
- FieldSchema (frontend/src/api/vector-store.ts) gains min/max/enum/
  immutable. VectorStoreSettings.vue is now schema-driven: a closed
  `enum` renders a t-select; number inputs use the schema's `:min`/`:max`
  and fall back to the legacy replica-vs-shard heuristic only when the
  schema does not pin them; a danger-coloured warning fires when
  insecure_skip_verify is toggled on (the switch and warning are wrapped
  in a vertical stack so the warning sits on its own row below the switch).
- i18n: labels for hnsw_m / hnsw_ef_construction / hnsw_ef_search /
  knn_engine / insecure_skip_verify plus the warning copy in en-US,
  ko-KR, zh-CN, ru-RU.
- docker-compose.dev.yml: an opensearch profile (single-node 3.3.2 with
  security plugin disabled for dev only). OpenSearch Dashboards lives in a
  separate, opt-in opensearch-ui profile so the heavy UI container is not
  forced up alongside the cluster (the driver e2e is fully curl-verifiable
  against :9200). The new docs/dev/opensearch-integration-test.md covers the
  end-to-end exercise and the single-node guidance (set replicas=0 to keep
  the cluster Green).

Gating-guard tests flipped
- The "OpenSearch is NOT in validEngineTypes / mapping / types list /
  env builder / stubs" guard tests from PR 1 / PR 2 are replaced by
  their positive counterparts in this PR. The test suite was the
  activation checklist; the activation flip is its diff.

Backward compatibility
- Additive everywhere. IndexConfig's new HNSW fields are omitempty so
  other engines' serialised config is byte-identical. Existing
  Elasticsearch / Qdrant / Milvus / Weaviate / Doris / TencentVectorDB
  stores are untouched. No migrations.
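
The byte-identical claim follows from how encoding/json treats omitempty: zero-valued fields are left out of the output entirely. A small sketch, using hypothetical legacyConfig / indexConfig types and JSON key names that are assumptions and not the real IndexConfig definition:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// legacyConfig models the config shape before the HNSW fields were added.
type legacyConfig struct {
	Shards   int `json:"shards,omitempty"`
	Replicas int `json:"replicas,omitempty"`
}

// indexConfig adds four omitempty HNSW fields after the existing ones.
type indexConfig struct {
	Shards             int    `json:"shards,omitempty"`
	Replicas           int    `json:"replicas,omitempty"`
	HNSWM              int    `json:"hnsw_m,omitempty"`
	HNSWEFConstruction int    `json:"hnsw_ef_construction,omitempty"`
	HNSWEFSearch       int    `json:"hnsw_ef_search,omitempty"`
	KNNEngine          string `json:"knn_engine,omitempty"`
}

// mustJSON marshals v and panics on error; fine for a demo.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A non-OpenSearch store leaves the HNSW fields at their zero values,
	// so the new struct serialises exactly like the old one.
	fmt.Println(mustJSON(legacyConfig{Shards: 3, Replicas: 1}))
	fmt.Println(mustJSON(indexConfig{Shards: 3, Replicas: 1}))
	// An OpenSearch store that sets them gets the extra keys.
	fmt.Println(mustJSON(indexConfig{Shards: 3, Replicas: 1, HNSWM: 16, KNNEngine: "faiss"}))
}
```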

Test plan
- go build ./... clean
- go vet ./... clean
- gofmt -l clean on touched files
- go test ./... — only TestOssEnsureBucket_CreateFails (Aliyun OSS
  endpoint), the docreader gRPC tests, and the doris SQL-shape tests
  fail; all three are pre-existing on upstream/main and untouched by
  this PR.
- New tests across internal/types, opensearch, service and container —
  including a full end-to-end env-path test that exercises
  initRetrieveEngineRegistry with RETRIEVE_DRIVER=opensearch against an
  httptest cluster.
2026-05-29 16:32:27 +08:00

331 lines
12 KiB
Go
package service

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
	"io"
	"net"
	"net/http"
	"strings"
	"time"

	openSearchRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/opensearch"
	"github.com/Tencent/WeKnora/internal/errors"
	"github.com/Tencent/WeKnora/internal/logger"
	"github.com/Tencent/WeKnora/internal/types"
	"github.com/go-sql-driver/mysql"   // MySQL driver for database/sql, used by Doris connection test
	_ "github.com/jackc/pgx/v5/stdlib" // pgx driver for database/sql
	"github.com/qdrant/go-client/qdrant"
	"github.com/tencent/vectordatabase-sdk-go/tcvectordb"
	"github.com/weaviate/weaviate-go-client/v5/weaviate"
	"github.com/weaviate/weaviate-go-client/v5/weaviate/auth"
	wgrpc "github.com/weaviate/weaviate-go-client/v5/weaviate/grpc"
)

const connectionTestTimeout = 10 * time.Second

// TestConnection tests connectivity to a vector database.
// Returns the detected server version on success (e.g., "7.10.1"), empty string if unknown.
func (s *vectorStoreService) TestConnection(
	ctx context.Context,
	engineType types.RetrieverEngineType,
	config types.ConnectionConfig,
) (string, error) {
	switch engineType {
	case types.ElasticsearchRetrieverEngineType:
		return testElasticsearchConnection(ctx, config)
	case types.PostgresRetrieverEngineType:
		return testPostgresConnection(ctx, config)
	case types.QdrantRetrieverEngineType:
		return testQdrantConnection(ctx, config)
	case types.MilvusRetrieverEngineType:
		return testMilvusConnection(ctx, config)
	case types.TencentVectorDBRetrieverEngineType:
		return testTencentVectorDBConnection(ctx, config)
	case types.WeaviateRetrieverEngineType:
		return testWeaviateConnection(ctx, config)
	case types.DorisRetrieverEngineType:
		return testDorisConnection(ctx, config)
	case types.OpenSearchRetrieverEngineType:
		return testOpenSearchConnection(ctx, config)
	case types.SQLiteRetrieverEngineType:
		// SQLite is file-based, no remote connection to test
		return "", nil
	default:
		return "", errors.NewBadRequestError(
			fmt.Sprintf("connection test not supported for engine type: %s", engineType))
	}
}

func testElasticsearchConnection(ctx context.Context, config types.ConnectionConfig) (string, error) {
	// Use plain HTTP GET to the root endpoint instead of the go-elasticsearch SDK.
	// The v8 SDK's TypedClient performs a product check that rejects ES7 servers,
	// so we use a raw HTTP request to support both v7 and v8.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, config.Addr, nil)
	if err != nil {
		return "", errors.NewBadRequestError("failed to create elasticsearch request: invalid address")
	}
	if config.Username != "" {
		req.SetBasicAuth(config.Username, config.Password)
	}
	client := &http.Client{Timeout: connectionTestTimeout}
	resp, err := client.Do(req)
	if err != nil {
		logger.Warnf(ctx, "Elasticsearch connection test failed: %v", err)
		return "", errors.NewBadRequestError("failed to connect to elasticsearch: connection refused or authentication failed")
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		logger.Warnf(ctx, "Elasticsearch connection test returned status %d", resp.StatusCode)
		return "", errors.NewBadRequestError("failed to connect to elasticsearch: authentication failed or server error")
	}
	// Parse version from response: {"version": {"number": "7.10.1"}, ...}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 4096))
	if err != nil {
		return "", nil // connected but version unknown
	}
	var esInfo struct {
		Version struct {
			Number string `json:"number"`
		} `json:"version"`
	}
	if err := json.Unmarshal(body, &esInfo); err != nil {
		return "", nil // connected but version unparseable
	}
	return esInfo.Version.Number, nil
}

func testPostgresConnection(ctx context.Context, config types.ConnectionConfig) (string, error) {
	testCtx, cancel := context.WithTimeout(ctx, connectionTestTimeout)
	defer cancel()
	if config.UseDefaultConnection {
		// Using the default app DB connection — always reachable if the app is running.
		// Cannot query version without a DB handle; return empty.
		return "", nil
	}
	db, err := sql.Open("pgx", config.Addr)
	if err != nil {
		return "", errors.NewBadRequestError("failed to create postgres connection: invalid configuration")
	}
	defer db.Close()
	if err := db.PingContext(testCtx); err != nil {
		logger.Warnf(ctx, "Postgres connection test failed: %v", err)
		return "", errors.NewBadRequestError("failed to connect to postgres: connection refused or authentication failed")
	}
	// Detect version
	var version string
	if err := db.QueryRowContext(testCtx, "SHOW server_version").Scan(&version); err != nil {
		logger.Warnf(ctx, "Postgres version detection failed: %v", err)
		return "", nil // connected but version unknown
	}
	return version, nil
}

func testQdrantConnection(ctx context.Context, config types.ConnectionConfig) (string, error) {
	testCtx, cancel := context.WithTimeout(ctx, connectionTestTimeout)
	defer cancel()
	port := config.Port
	if port == 0 {
		port = 6334
	}
	client, err := qdrant.NewClient(&qdrant.Config{
		Host:   config.Host,
		Port:   port,
		APIKey: config.APIKey,
		UseTLS: config.UseTLS,
	})
	if err != nil {
		return "", errors.NewBadRequestError("failed to create qdrant client: invalid configuration")
	}
	defer client.Close()
	result, err := client.HealthCheck(testCtx)
	if err != nil {
		logger.Warnf(ctx, "Qdrant connection test failed: %v", err)
		return "", errors.NewBadRequestError("failed to connect to qdrant: connection refused or authentication failed")
	}
	return result.GetVersion(), nil
}

func testMilvusConnection(ctx context.Context, config types.ConnectionConfig) (string, error) {
	// Use TCP dial instead of the Milvus SDK to avoid protobuf namespace conflict
	// between milvus-proto and qdrant-client (both register "common.proto").
	// A TCP dial is sufficient for connectivity verification; the Milvus SDK client
	// creation in container.go (PR 3) will validate full gRPC connectivity.
	// Version detection is not possible with TCP dial alone.
	testCtx, cancel := context.WithTimeout(ctx, connectionTestTimeout)
	defer cancel()
	addr := config.Addr
	if addr == "" {
		addr = "localhost:19530"
	}
	conn, err := (&net.Dialer{}).DialContext(testCtx, "tcp", addr)
	if err != nil {
		logger.Warnf(ctx, "Milvus connection test failed: %v", err)
		return "", errors.NewBadRequestError("failed to connect to milvus: connection refused or server unreachable")
	}
	defer conn.Close()
	return "", nil
}

func testTencentVectorDBConnection(ctx context.Context, config types.ConnectionConfig) (string, error) {
	testCtx, cancel := context.WithTimeout(ctx, connectionTestTimeout)
	defer cancel()
	client, err := tcvectordb.NewRpcClient(config.Addr, config.Username, config.APIKey, &tcvectordb.ClientOption{
		ReadConsistency: tcvectordb.EventualConsistency,
		Timeout:         connectionTestTimeout,
	})
	if err != nil {
		logger.Warnf(ctx, "Tencent VectorDB connection test failed: %v", err)
		return "", errors.NewBadRequestError("failed to connect to tencent vectordb: connection refused or authentication failed")
	}
	defer client.Close()
	if _, err := client.ListDatabase(testCtx); err != nil {
		logger.Warnf(ctx, "Tencent VectorDB list database failed: %v", err)
		return "", errors.NewBadRequestError("failed to connect to tencent vectordb: authentication failed or server error")
	}
	return "", nil
}

func testWeaviateConnection(ctx context.Context, config types.ConnectionConfig) (string, error) {
	testCtx, cancel := context.WithTimeout(ctx, connectionTestTimeout)
	defer cancel()
	host := config.Host
	if host == "" {
		host = "weaviate:8080"
	}
	grpcAddress := config.GrpcAddress
	if grpcAddress == "" {
		grpcAddress = "weaviate:50051"
	}
	scheme := config.Scheme
	if scheme == "" {
		scheme = "http"
	}
	weaviateCfg := weaviate.Config{
		Host: host,
		GrpcConfig: &wgrpc.Config{
			Host: grpcAddress,
		},
		Scheme: scheme,
	}
	if config.APIKey != "" {
		weaviateCfg.AuthConfig = auth.ApiKey{Value: config.APIKey}
	}
	// Weaviate Go client v5 does not expose a Close() method;
	// it uses HTTP + gRPC transports that are managed internally.
	client, err := weaviate.NewClient(weaviateCfg)
	if err != nil {
		logger.Warnf(ctx, "Weaviate connection test failed: %v", err)
		return "", errors.NewBadRequestError("failed to create weaviate client: invalid configuration")
	}
	isReady, err := client.Misc().ReadyChecker().Do(testCtx)
	if err != nil || !isReady {
		logger.Warnf(ctx, "Weaviate connection test failed: ready=%v, err=%v", isReady, err)
		return "", errors.NewBadRequestError("failed to connect to weaviate: server not ready or authentication failed")
	}
	// Detect version via /v1/meta
	meta, err := client.Misc().MetaGetter().Do(testCtx)
	if err != nil || meta == nil {
		return "", nil // connected but version unknown
	}
	return meta.Version, nil
}

// testDorisConnection pings the Doris FE over the MySQL protocol
// (database/sql + go-sql-driver) and queries @@version.
//
// Doris reports @@version in the form "5.7.99 Doris-4.1.0": the first part is
// the MySQL protocol compatibility string, and the real version follows
// "Doris-". Only the bare version (e.g. "4.1.0") is returned, which keeps the
// format consistent with the Postgres/ES paths.
func testDorisConnection(ctx context.Context, config types.ConnectionConfig) (string, error) {
	testCtx, cancel := context.WithTimeout(ctx, connectionTestTimeout)
	defer cancel()
	if config.Addr == "" {
		return "", errors.NewBadRequestError("failed to create doris connection: addr is required")
	}
	// Database is optional. When none is given for the ping, fall back to
	// information_schema (every MySQL-compatible service has it).
	database := config.Database
	if database == "" {
		database = "information_schema"
	}
	// Build the DSN with mysql.Config.FormatDSN() so that special characters
	// such as `@`, `:` and `/` in the username or password cannot break a
	// hand-concatenated literal (fmt.Sprintf goes wrong here; see issue #1234-style problems).
	cfg := mysql.NewConfig()
	cfg.User = config.Username
	cfg.Passwd = config.Password
	cfg.Net = "tcp"
	cfg.Addr = config.Addr
	cfg.DBName = database
	cfg.Timeout = 5 * time.Second
	db, err := sql.Open("mysql", cfg.FormatDSN())
	if err != nil {
		return "", errors.NewBadRequestError("failed to create doris connection: invalid configuration")
	}
	defer db.Close()
	if err := db.PingContext(testCtx); err != nil {
		logger.Warnf(ctx, "Doris connection test failed: %v", err)
		return "", errors.NewBadRequestError("failed to connect to doris: connection refused or authentication failed")
	}
	var version string
	if err := db.QueryRowContext(testCtx, "SELECT @@version").Scan(&version); err != nil {
		logger.Warnf(ctx, "Doris version detection failed: %v", err)
		return "", nil // connected but version unknown
	}
	if i := strings.Index(version, "Doris-"); i >= 0 {
		return strings.TrimSpace(version[i+len("Doris-"):]), nil
	}
	return version, nil
}

// testOpenSearchConnection verifies the cluster is reachable, runs a
// supported OpenSearch version, and has the k-NN plugin installed. The driver
// owns the probe logic; a generic message is returned on failure so cluster
// internals are not surfaced to the API caller.
func testOpenSearchConnection(ctx context.Context, config types.ConnectionConfig) (string, error) {
	if config.Addr == "" {
		return "", errors.NewBadRequestError("failed to create opensearch connection: addr is required")
	}
	testCtx, cancel := context.WithTimeout(ctx, connectionTestTimeout)
	defer cancel()
	if err := openSearchRepo.TestConnection(testCtx, &config); err != nil {
		logger.Warnf(ctx, "OpenSearch connection test failed: %v", err)
		return "", errors.NewBadRequestError(
			"failed to connect to opensearch: check address, credentials, version (>= 2.4), and that the k-NN plugin is installed")
	}
	// Version is detected during the probe but not surfaced here; lazy index
	// creation re-validates on first use.
	return "", nil
}