feat(retriever): activate OpenSearch k-NN driver (PR 3 of 3)

Phase 3 (#1440) gate flip. PR 1 (#1445) + PR 2a (#1481) + PR 2b (#1482)
laid the type prep + driver skeleton + read/write paths as gated dead
code; this PR wires every activation surface so opensearch becomes a
registerable VectorStore engine.

Activation wiring
- internal/types: validEngineTypes / GetVectorStoreTypes (with HNSW
  bounds + knn_engine enum + Immutable hints) / retrieverEngineMapping /
  buildEnvStoreForDriver — every gated surface now recognises
  "opensearch". IndexConfig grows four omitempty HNSW fields (HNSWM /
  HNSWEFConstruction / HNSWEFSearch / KNNEngine), keeping other engines'
  serialised config byte-identical.
- internal/container: createOpenSearchEngine + the switch case in
  createEngineServiceFromStore; the RETRIEVE_DRIVER=opensearch env path
  in initRetrieveEngineRegistry; NewEngineFactory now closes over the
  AuditLogService (the EngineFactory type itself is unchanged).
- internal/application/service/vectorstore_healthcheck.go: a
  testOpenSearchConnection case so CreateStore's connectivity probe
  accepts opensearch instead of returning 400.
- internal/application/repository/retriever/opensearch/transport.go:
  NewOpenSearchClient is exported so the factory and env path can build
  the TLS-hardened client; healthcheck.go reuses the unexported
  probeVersion / probeKNNPlugin for the service-layer probe.

Service-layer validation
- validateOpenSearchIndexConfig validates the HNSW caps (m 2-100,
  ef_construction 2-4096, ef_search 1-10000, knn_engine ∈ lucene|faiss).
  Shards/replicas continue to be enforced by the flat ValidateIndexConfig.
  Create-only: UpdateStore mutates the name only.
- validateConnectionConfig requires addr for opensearch.
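
The HNSW caps above can be sketched as a small standalone check. This is
only an illustration, not the code in this PR: hnswConfig, inRange and
validateHNSW are hypothetical names, and treating a zero or empty value as
"unset, use the driver default" is an assumption based on the omitempty
fields.

```go
package main

import (
	"errors"
	"fmt"
)

// hnswConfig is a hypothetical stand-in for the four HNSW fields on IndexConfig.
type hnswConfig struct {
	M              int
	EFConstruction int
	EFSearch       int
	KNNEngine      string
}

// inRange reports whether v is unset (zero) or lies within [lo, hi].
func inRange(v, lo, hi int) bool {
	return v == 0 || (v >= lo && v <= hi)
}

// validateHNSW applies the caps listed in the commit message.
// Assumption: zero / empty means "unset" and is accepted.
func validateHNSW(c hnswConfig) error {
	if !inRange(c.M, 2, 100) {
		return fmt.Errorf("hnsw_m must be in [2, 100], got %d", c.M)
	}
	if !inRange(c.EFConstruction, 2, 4096) {
		return fmt.Errorf("hnsw_ef_construction must be in [2, 4096], got %d", c.EFConstruction)
	}
	if !inRange(c.EFSearch, 1, 10000) {
		return fmt.Errorf("hnsw_ef_search must be in [1, 10000], got %d", c.EFSearch)
	}
	switch c.KNNEngine {
	case "", "lucene", "faiss":
		return nil
	default:
		return errors.New("knn_engine must be lucene or faiss")
	}
}

func main() {
	fmt.Println(validateHNSW(hnswConfig{M: 16, EFConstruction: 100, KNNEngine: "lucene"}))
	fmt.Println(validateHNSW(hnswConfig{M: 101}))
}
```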

Sync implementations (stubs.go shrinks)
- CopyIndices (copy.go) mirrors the Elasticsearch / Qdrant pattern —
  search → BatchSave with the source_id remap for generated questions —
  so dim/keyword routing and the source_id contract come from BatchSave
  for free. embeddingMap is keyed by the *target* SourceID because
  OpenSearch's BatchSave looks up embeddings by SourceID
  (lookupEmbedding), not by chunk_id (the ES driver's convention).
  Pagination is from/size; copies larger than max_result_window
  (default 10000) need the scroll-based async path that lands later.
- BatchUpdateChunkEnabledStatus / BatchUpdateChunkTagID (bulk_update.go)
  group the input by target value and issue one _update_by_query per
  group over the cross-dim <base>_* pattern. Caller values flow through
  bound script params only — never string-interpolated into the Painless
  source — closing the script-injection surface.
- inspectByQueryResponse (byquery.go) mirrors inspectBulkResponse: the
  full failure reason goes to the debug log only; the returned error
  carries the bounded id + type.
- UpdateByQueryParams.Refresh is *bool in opensearch-go v4.6.0 (the same
  shape as DeleteByQuery's quirk), so refresh=wait_for is not
  expressible; we use refresh=true.
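
The group-by-value step and the bound-params body can be illustrated in
isolation. The sketch below is a simplified, hypothetical version
(groupIDsByValue and updateBody are not names from this PR) and it only
builds the request body; it does not talk to a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// groupIDsByValue inverts id -> value into value -> sorted ids, so that one
// request can be issued per distinct target value.
func groupIDsByValue(m map[string]string) map[string][]string {
	groups := make(map[string][]string)
	for id, v := range m {
		groups[v] = append(groups[v], id)
	}
	for _, ids := range groups {
		sort.Strings(ids) // deterministic order within each group
	}
	return groups
}

// updateBody builds an _update_by_query body. The Painless source is a
// constant; the caller-supplied value travels only through script params.
func updateBody(ids []string, value string) ([]byte, error) {
	return json.Marshal(map[string]any{
		"query": map[string]any{"terms": map[string]any{"chunk_id": ids}},
		"script": map[string]any{
			"lang":   "painless",
			"source": "ctx._source.tag_id = params.v",
			"params": map[string]any{"v": value},
		},
	})
}

func main() {
	groups := groupIDsByValue(map[string]string{"c1": "t1", "c2": "t2", "c3": "t1"})
	body, err := updateBody(groups["t1"], "t1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because the value is JSON-encoded as data rather than spliced into the
script text, a hostile tag string cannot change what the script does.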

Driver-owned audit (DIP)
- A new opensearch.AuditSink interface (with nopSink + WithAuditSink
  functional option) lets the driver emit opensearch.index_created and
  opensearch.reindex_executed events without importing any service
  package — the service layer implements the interface. NewRepository
  takes opts, so existing 4-arg test call sites keep compiling unchanged.
- internal/container/audit_sink.go bridges AuditSink to AuditLogService.
  When the context carries no tenant (the env-path registration ctx
  during boot, for example) the adapter skips the emit with a warning
  rather than silently writing tenant_id=0, which would collide with the
  system-scope sentinel.
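
The skip-on-missing-tenant behaviour can be sketched as follows. Every
name here (tenantKey, tenantFrom, auditWriter, sinkAdapter, memWriter,
demo) is hypothetical and stands in for the project's real context helper
and AuditLogService, whose signatures are not shown in this commit.

```go
package main

import (
	"context"
	"fmt"
)

// tenantKey is a hypothetical context key; the real project has its own.
type tenantKey struct{}

// tenantFrom returns the tenant id and whether a non-zero one was present.
func tenantFrom(ctx context.Context) (uint64, bool) {
	id, ok := ctx.Value(tenantKey{}).(uint64)
	return id, ok && id != 0
}

// auditWriter is a hypothetical stand-in for the audit log service.
type auditWriter interface {
	Write(tenantID uint64, event string)
}

// sinkAdapter bridges driver events to the audit writer.
type sinkAdapter struct{ w auditWriter }

// EmitIndexCreated skips the emit (with a warning) when the context has no
// tenant, instead of recording the event under tenant_id=0.
func (a sinkAdapter) EmitIndexCreated(ctx context.Context, alias string, dim int) {
	tenantID, ok := tenantFrom(ctx)
	if !ok {
		fmt.Printf("warn: skipping audit for %s: no tenant in context\n", alias)
		return
	}
	a.w.Write(tenantID, fmt.Sprintf("opensearch.index_created alias=%s dim=%d", alias, dim))
}

// memWriter records events in memory for the demo.
type memWriter struct{ events []string }

func (m *memWriter) Write(tenantID uint64, event string) {
	m.events = append(m.events, fmt.Sprintf("tenant=%d %s", tenantID, event))
}

// demo emits once without a tenant (skipped) and once with tenant 7.
func demo() []string {
	w := &memWriter{}
	a := sinkAdapter{w: w}
	a.EmitIndexCreated(context.Background(), "weknora_768", 768)
	ctx := context.WithValue(context.Background(), tenantKey{}, uint64(7))
	a.EmitIndexCreated(ctx, "weknora_768", 768)
	return w.events
}

func main() {
	fmt.Println(demo())
}
```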

Frontend + polish
- FieldSchema (frontend/src/api/vector-store.ts) gains min/max/enum/
  immutable. VectorStoreSettings.vue is now schema-driven: a closed
  `enum` renders a t-select; number inputs use the schema's `:min`/`:max`
  and fall back to the legacy replica-vs-shard heuristic only when the
  schema does not pin them; a danger-coloured warning fires when
  insecure_skip_verify is toggled on (the switch and warning are wrapped
  in a vertical stack so the warning sits on its own row below the switch).
- i18n: labels for hnsw_m / hnsw_ef_construction / hnsw_ef_search /
  knn_engine / insecure_skip_verify plus the warning copy in en-US,
  ko-KR, zh-CN, ru-RU.
- docker-compose.dev.yml: an opensearch profile (single-node 3.3.2 with
  security plugin disabled for dev only). OpenSearch Dashboards lives in a
  separate, opt-in opensearch-ui profile so the heavy UI container is not
  forced up alongside the cluster (the driver e2e is fully curl-verifiable
  against :9200). The new docs/dev/opensearch-integration-test.md covers the
  end-to-end exercise and the single-node guidance (set replicas=0 to keep
  the cluster Green).

Gating-guard tests flipped
- The "OpenSearch is NOT in validEngineTypes / mapping / types list /
  env builder / stubs" guard tests from PR 1 / PR 2 are replaced by
  their positive counterparts in this PR. The test suite was the
  activation checklist; the activation flip is its diff.

Backward compatibility
- Additive everywhere. IndexConfig's new HNSW fields are omitempty so
  other engines' serialised config is byte-identical. Existing
  Elasticsearch / Qdrant / Milvus / Weaviate / Doris / TencentVectorDB
  stores are untouched. No migrations.
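
A minimal sketch of why omitempty keeps other engines' config
byte-identical. indexConfig below is a trimmed, hypothetical mirror of
IndexConfig; the JSON keys follow the request example in the docs, and
omitempty on the two pre-existing fields is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexConfig is a trimmed, hypothetical mirror of IndexConfig.
type indexConfig struct {
	NumberOfShards     int    `json:"number_of_shards,omitempty"`
	NumberOfReplicas   int    `json:"number_of_replicas,omitempty"`
	HNSWM              int    `json:"hnsw_m,omitempty"`
	HNSWEFConstruction int    `json:"hnsw_ef_construction,omitempty"`
	HNSWEFSearch       int    `json:"hnsw_ef_search,omitempty"`
	KNNEngine          string `json:"knn_engine,omitempty"`
}

// mustJSON marshals v and panics on error (fine for a demo).
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A store that never sets the HNSW fields serialises without them.
	fmt.Println(mustJSON(indexConfig{NumberOfShards: 2, NumberOfReplicas: 1}))
	// An OpenSearch store that sets some of them gets only those keys.
	fmt.Println(mustJSON(indexConfig{NumberOfShards: 1, HNSWM: 24, KNNEngine: "faiss"}))
}
```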

Test plan
- go build ./... clean
- go vet ./... clean
- gofmt -l clean on touched files
- go test ./...: only TestOssEnsureBucket_CreateFails (Aliyun OSS
  endpoint), the docreader gRPC tests, and the doris SQL-shape tests
  fail; all three groups already fail on upstream/main and are untouched
  by this PR.
- New tests across internal/types, opensearch, service and container —
  including a full end-to-end env-path test that exercises
  initRetrieveEngineRegistry with RETRIEVE_DRIVER=opensearch against an
  httptest cluster.
ochan.kwon
2026-05-28 18:01:17 +09:00
committed by lyingbug
parent 19a1b15106
commit 40b74e2efa
35 changed files with 1781 additions and 169 deletions


@@ -119,6 +119,61 @@ services:
- qdrant
- full
# OpenSearch k-NN (Phase 3 driver). Single-node dev profile with the
# security plugin disabled → plain HTTP on :9200, no auth/TLS. The image
# bundles the opensearch-knn plugin. For production use a secured,
# multi-node cluster. See docs/dev/opensearch-integration-test.md.
opensearch:
image: opensearchproject/opensearch:3.3.2
container_name: WeKnora-opensearch-dev
environment:
- discovery.type=single-node
# dev only: plain HTTP on :9200, no TLS/auth. The entrypoint script
# honours DISABLE_SECURITY_PLUGIN (env var) to skip both the demo
# install and the OPENSEARCH_INITIAL_ADMIN_PASSWORD requirement.
- DISABLE_SECURITY_PLUGIN=true
- DISABLE_INSTALL_DEMO_CONFIG=true
- OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m
- bootstrap.memory_lock=true
ulimits:
memlock:
soft: -1
hard: -1
ports:
- "${OPENSEARCH_PORT:-9200}:9200"
volumes:
- opensearch_data_dev:/usr/share/opensearch/data
networks:
- WeKnora-network-dev
restart: unless-stopped
profiles:
- opensearch
- full
# Also a member of opensearch-ui so the Dashboards depends_on resolves
# when only that profile is active (`--profile opensearch-ui up`).
- opensearch-ui
# Optional UI for visual index/mapping/query inspection. Decoupled from the
# "opensearch" / "full" profiles so the heavy Dashboards container is never
# forced up alongside the cluster — the driver e2e is fully curl-verifiable
# against :9200. Start it on demand with `--profile opensearch-ui up -d`
# (depends_on pulls the cluster in automatically).
opensearch-dashboards:
image: opensearchproject/opensearch-dashboards:3.3.0
container_name: WeKnora-opensearch-dashboards-dev
environment:
- OPENSEARCH_HOSTS=["http://opensearch:9200"]
- DISABLE_SECURITY_DASHBOARDS_PLUGIN=true
ports:
- "${OPENSEARCH_DASHBOARDS_PORT:-5601}:5601"
networks:
- WeKnora-network-dev
depends_on:
- opensearch
restart: unless-stopped
profiles:
- opensearch-ui
milvus:
image: milvusdb/milvus:v2.6.11
container_name: WeKnora-milvus-dev
@@ -468,6 +523,7 @@ volumes:
neo4j-data-dev:
jaeger_data_dev:
qdrant_data_dev:
opensearch_data_dev:
milvus_data_dev:
docreader-tmp-dev:
langfuse_clickhouse_data_dev:


@@ -0,0 +1,109 @@
# OpenSearch k-NN driver — local integration test
This guide brings up a single-node OpenSearch cluster and exercises the
OpenSearch retrieve engine end to end. The driver lives in
`internal/application/repository/retriever/opensearch/`.
## 1. Start a dev cluster
```bash
docker compose -f docker-compose.dev.yml --profile opensearch up -d
```
This starts:
- `opensearch` on `http://localhost:9200` — single-node, **security plugin
disabled** (plain HTTP, no auth/TLS). The image bundles the
`opensearch-knn` plugin.
> **OpenSearch Dashboards is optional** and lives in a separate
> `opensearch-ui` profile, so it is *not* started by `--profile opensearch`.
> The whole integration test below is curl-verifiable against `:9200`. If you
> want the web UI (Dev Tools console / visual index inspection), start it on
> demand:
>
> ```bash
> docker compose -f docker-compose.dev.yml --profile opensearch-ui up -d
> # opensearch-dashboards on http://localhost:5601 (depends_on pulls the cluster in)
> ```
Verify:
```bash
curl -s localhost:9200 | jq '.version.distribution, .version.number'
# "opensearch" "3.3.2"
curl -s 'localhost:9200/_cat/plugins?format=json' | jq -r '.[].component' | grep opensearch-knn
```
> Production clusters must enable the security plugin (TLS + auth). The dev
> profile disables it only to keep local setup trivial. When connecting to a
> secured cluster, set `username` / `password` and — for self-signed certs in
> dev only — `insecure_skip_verify=true`.
## 2. Register the store
### Option A — DB store (UI / API)
`POST /api/v1/vector-stores`:
```json
{
"name": "opensearch-local",
"engine_type": "opensearch",
"connection_config": { "addr": "http://localhost:9200" },
"index_config": {
"number_of_shards": 1,
"number_of_replicas": 0,
"hnsw_m": 16,
"hnsw_ef_construction": 100,
"knn_engine": "lucene"
}
}
```
`CreateStore` runs the connection probe (version + k-NN plugin) before
persisting; a bad address / unsupported version / missing plugin is rejected
with `400`.
### Option B — env store
```bash
export RETRIEVE_DRIVER=opensearch
export OPENSEARCH_ADDR=http://localhost:9200
# export OPENSEARCH_USERNAME / OPENSEARCH_PASSWORD for a secured cluster
# export OPENSEARCH_INSECURE_SKIP_VERIFY=true # self-signed dev TLS only
```
## 3. Single-node note (important)
On a single-node cluster, any index created with `number_of_replicas >= 1`
leaves its replica shard **unassigned**, so the index health goes **Yellow**.
Yellow does **not** block reads or writes — it is safe for local testing — but
to keep the cluster Green set **`number_of_replicas: 0`** at store
registration (as in the Option A example above). The driver default is `1`
(it assumes a ≥2-node cluster).
## 4. Exercise the flow
1. Bind a knowledge base to the store and ingest a few documents.
2. Confirm the per-dimension index appears:
`curl -s 'localhost:9200/_cat/indices?v' | grep weknora`
(e.g. `weknora_<storeprefix>_768` + alias, plus `weknora_<storeprefix>_keywords`).
3. Run a retrieval query against the bound KB and confirm hits come back.
4. Copy the KB to another KB and confirm the docs are reindexed
(`opensearch.reindex_executed` audit event).
5. Toggle chunk enabled-status / tag and confirm `_update_by_query` applies it.
## 5. Tear down
```bash
docker compose -f docker-compose.dev.yml --profile opensearch down -v
```
## Scope notes
- Large-batch async reindex / delete (task polling) is a follow-up; the sync
paths handle typical KB sizes (pagination is bounded by `max_result_window`,
default 10000).
- Native `hybrid` query + search pipeline is out of scope — fusion stays at the
service layer (RRF).
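
The `max_result_window` bound on from/size pagination can be illustrated
with a short Go sketch. `pageOffsets` is a hypothetical helper, not part of
the driver; it assumes the cluster rejects any page where `from + size`
exceeds the window, and uses 500 as the page size to match the driver's
`copyBatchSize`.

```go
package main

import "fmt"

// pageOffsets lists the `from` offsets needed to scan `total` documents in
// pages of `size`. It returns an error if any page would violate
// from + size <= maxWindow.
func pageOffsets(total, size, maxWindow int) ([]int, error) {
	if size <= 0 {
		return nil, fmt.Errorf("page size must be positive, got %d", size)
	}
	var offsets []int
	for from := 0; from < total; from += size {
		if from+size > maxWindow {
			return nil, fmt.Errorf("from=%d size=%d exceeds max_result_window=%d", from, size, maxWindow)
		}
		offsets = append(offsets, from)
	}
	return offsets, nil
}

func main() {
	offs, err := pageOffsets(1200, 500, 10000)
	fmt.Println(offs, err)

	// One document past the window cannot be reached with from/size.
	_, err = pageOffsets(10001, 500, 10000)
	fmt.Println(err)
}
```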


@@ -29,6 +29,16 @@ export interface FieldSchema {
sensitive?: boolean
description?: string
default?: any
// Inclusive bounds for number fields (omitempty on the backend). When
// absent the UI falls back to per-field heuristics (isReplicaField).
min?: number
max?: number
// Closed value set for string fields (e.g. knn_engine ∈ lucene|faiss).
// When non-empty the UI renders a select instead of a free-text input.
enum?: string[]
// Marks a field that cannot change after store creation. Informational
// for now (edit mode is fully read-only); kept for forward use.
immutable?: boolean
}
// ===== API Functions =====


@@ -1107,11 +1107,17 @@ export default {
shards_num: 'Shards',
replica_number: 'In-memory Replicas',
desired_shard_count: 'Shard Count',
insecure_skip_verify: 'Skip TLS Verification',
hnsw_m: 'HNSW M (graph degree)',
hnsw_ef_construction: 'HNSW ef_construction',
hnsw_ef_search: 'HNSW ef_search',
knn_engine: 'k-NN Engine',
},
envTag: 'DEFAULT',
testConnection: 'Test Connection',
testing: 'Testing...',
immutableNotice: 'Engine type, connection, and index settings cannot be changed after creation.\nTo change these, delete and recreate.',
insecureSkipVerifyWarning: 'Disabling TLS certificate verification exposes the connection to man-in-the-middle attacks. Use only for self-signed development clusters — never in production.',
validation: {
nameRequired: 'Name is required',
engineTypeRequired: 'Engine type is required',


@@ -967,11 +967,17 @@ export default {
shards_num: "샤드 수",
replica_number: "인메모리 레플리카",
desired_shard_count: "샤드 수",
insecure_skip_verify: "TLS 인증서 검증 생략",
hnsw_m: "HNSW M (그래프 차수)",
hnsw_ef_construction: "HNSW ef_construction",
hnsw_ef_search: "HNSW ef_search",
knn_engine: "k-NN 엔진",
},
envTag: "DEFAULT",
testConnection: "연결 테스트",
testing: "테스트 중...",
immutableNotice: "엔진 타입, 연결 정보, 인덱스 설정은 생성 후 변경할 수 없습니다.\n변경이 필요하면 삭제 후 다시 생성하세요.",
insecureSkipVerifyWarning: "TLS 인증서 검증을 끄면 중간자 공격에 노출됩니다. 자체 서명 인증서를 쓰는 개발 클러스터에서만 사용하고, 운영 환경에서는 절대 사용하지 마세요.",
validation: {
nameRequired: "이름은 필수입니다",
engineTypeRequired: "엔진 타입은 필수입니다",


@@ -1019,11 +1019,17 @@ export default {
shards_num: 'Шарды',
replica_number: 'Реплики в памяти',
desired_shard_count: 'Количество шардов',
insecure_skip_verify: 'Пропустить проверку TLS',
hnsw_m: 'HNSW M (степень графа)',
hnsw_ef_construction: 'HNSW ef_construction',
hnsw_ef_search: 'HNSW ef_search',
knn_engine: 'Движок k-NN',
},
envTag: 'DEFAULT',
testConnection: 'Тест подключения',
testing: 'Тестирование...',
immutableNotice: 'Тип движка, подключение и настройки индекса нельзя изменить после создания.\nДля изменения удалите и создайте заново.',
insecureSkipVerifyWarning: 'Отключение проверки сертификата TLS делает соединение уязвимым для атак «человек посередине». Используйте только для dev-кластеров с самоподписанными сертификатами — никогда в продакшене.',
validation: {
nameRequired: 'Название обязательно',
engineTypeRequired: 'Тип движка обязателен',


@@ -965,11 +965,17 @@ export default {
shards_num: "分片数",
replica_number: "内存副本数",
desired_shard_count: "分片数",
insecure_skip_verify: "跳过 TLS 证书校验",
hnsw_m: "HNSW M图度数",
hnsw_ef_construction: "HNSW ef_construction",
hnsw_ef_search: "HNSW ef_search",
knn_engine: "k-NN 引擎",
},
envTag: "DEFAULT",
testConnection: "测试连接",
testing: "测试中...",
immutableNotice: "创建后无法更改引擎类型、连接和索引设置。\n如需更改请删除后重新创建。",
insecureSkipVerifyWarning: "关闭 TLS 证书校验会使连接面临中间人攻击风险。仅可用于自签名证书的开发集群,切勿在生产环境使用。",
validation: {
nameRequired: "名称为必填项",
engineTypeRequired: "引擎类型为必填项",


@@ -165,10 +165,15 @@
:label="fieldLabel(field.name)"
:name="`connection_config.${field.name}`"
>
- <t-switch
- v-if="field.type === 'boolean'"
- v-model="form.connection_config[field.name]"
- />
+ <div v-if="field.type === 'boolean'" class="boolean-field">
+ <t-switch v-model="form.connection_config[field.name]" />
+ <div
+ v-if="field.name === 'insecure_skip_verify' && form.connection_config[field.name]"
+ class="field-warning"
+ >
+ {{ t('vectorStoreSettings.insecureSkipVerifyWarning') }}
+ </div>
+ </div>
<t-input
v-else-if="field.type === 'string' && field.sensitive"
v-model="form.connection_config[field.name]"
@@ -201,12 +206,20 @@
<template v-if="showAdvanced">
<template v-for="field in selectedType.index_fields" :key="field.name">
<t-form-item :label="fieldLabel(field.name)" :name="`index_config.${field.name}`">
<!-- Closed value set dropdown (e.g. knn_engine) -->
<t-select
v-if="field.enum && field.enum.length"
v-model="form.index_config[field.name]"
:placeholder="field.default?.toString() || ''"
>
<t-option v-for="opt in field.enum" :key="opt" :value="opt" :label="opt" />
</t-select>
<t-input-number
- v-if="field.type === 'number'"
+ v-else-if="field.type === 'number'"
v-model="form.index_config[field.name]"
:placeholder="field.default?.toString()"
- :min="1"
- :max="isReplicaField(field.name) ? 10 : 64"
+ :min="field.min ?? 1"
+ :max="field.max ?? (isReplicaField(field.name) ? 10 : 64)"
theme="normal"
style="width: 100%;"
/>
@@ -861,6 +874,23 @@ onMounted(async () => {
white-space: pre-line;
}
/* Wrap switch + warning in a vertical stack so the warning sits on its own
line below the switch, independent of TDesign's form-item content flex
(which is a nowrap row — margin-top alone has no visible effect there). */
.boolean-field {
display: flex;
flex-direction: column;
align-items: flex-start;
width: 100%;
}
.field-warning {
margin-top: 8px;
font-size: 12px;
line-height: 1.4;
color: var(--td-error-color, #d54941);
}
.readonly-fields {
padding: 10px 14px;
background: var(--td-bg-color-secondarycontainer);


@@ -0,0 +1,50 @@
package opensearch
import "context"
// AuditSink receives audit events emitted from within the driver at the
// exact moment they occur (index provisioned, reindex executed). The driver
// owns this abstraction so it imports no service package — the dependency
// arrow stays one-way (service implements AuditSink; the driver only invokes
// it). A nil sink is a no-op, so tests and the env-path (no audit service)
// need no special casing.
type AuditSink interface {
// EmitIndexCreated fires once when the driver provisions a new k-NN
// index. alias is the per-dimension alias (or the keyword-only index
// name); dim is the embedding dimension (0 for the keyword-only index).
EmitIndexCreated(ctx context.Context, alias string, dim int)
// EmitReindexExecuted fires when CopyIndices finishes copying docs from
// one index to another.
EmitReindexExecuted(ctx context.Context, srcAlias, dstAlias string, docs int64)
}
// nopSink is the null-object used when no sink is configured, so emit call
// sites never need a nil check.
type nopSink struct{}
func (nopSink) EmitIndexCreated(context.Context, string, int) {}
func (nopSink) EmitReindexExecuted(context.Context, string, string, int64) {}
var _ AuditSink = nopSink{}
// Option configures a Repository at construction time.
type Option func(*Repository)
// WithAuditSink injects an audit sink. A nil sink is ignored (the Repository
// keeps its default no-op behavior).
func WithAuditSink(s AuditSink) Option {
return func(r *Repository) {
if s != nil {
r.sink = s
}
}
}
// auditSink returns the configured sink, or a no-op if none was set (e.g. a
// Repository built directly in tests, or constructed without WithAuditSink).
func (r *Repository) auditSink() AuditSink {
if r.sink == nil {
return nopSink{}
}
return r.sink
}


@@ -0,0 +1,153 @@
package opensearch
import (
"context"
"strings"
"sync"
"testing"
"github.com/Tencent/WeKnora/internal/types"
)
// spySink records audit events for assertions.
type spySink struct {
mu sync.Mutex
indexCreated []indexCreatedEvent
reindex []reindexEvent
}
type indexCreatedEvent struct {
alias string
dim int
}
type reindexEvent struct {
src, dst string
docs int64
}
func (s *spySink) EmitIndexCreated(_ context.Context, alias string, dim int) {
s.mu.Lock()
defer s.mu.Unlock()
s.indexCreated = append(s.indexCreated, indexCreatedEvent{alias, dim})
}
func (s *spySink) EmitReindexExecuted(_ context.Context, src, dst string, docs int64) {
s.mu.Lock()
defer s.mu.Unlock()
s.reindex = append(s.reindex, reindexEvent{src, dst, docs})
}
// TestBuildInternalCfg_ReadsHNSWFields verifies the HNSW IndexConfig fields
// flow into internalCfg, falling back to defaults when unset.
func TestBuildInternalCfg_ReadsHNSWFields(t *testing.T) {
t.Run("reads set fields", func(t *testing.T) {
cfg, err := buildInternalCfg(&types.IndexConfig{
HNSWM: 24,
HNSWEFConstruction: 200,
HNSWEFSearch: 128,
KNNEngine: "faiss",
NumberOfShards: 2,
})
if err != nil {
t.Fatal(err)
}
if cfg.hnswM != 24 || cfg.hnswEFConstruction != 200 || cfg.efSearch != 128 || cfg.knnEngine != "faiss" {
t.Errorf("HNSW not wired: %+v", cfg)
}
if cfg.shards != 2 {
t.Errorf("shards: want 2, got %d", cfg.shards)
}
})
t.Run("defaults when unset", func(t *testing.T) {
cfg, err := buildInternalCfg(&types.IndexConfig{})
if err != nil {
t.Fatal(err)
}
if cfg.hnswM != 16 || cfg.hnswEFConstruction != 100 || cfg.efSearch != 100 || cfg.knnEngine != "lucene" {
t.Errorf("defaults not applied: %+v", cfg)
}
})
t.Run("nil config is all defaults", func(t *testing.T) {
cfg, err := buildInternalCfg(nil)
if err != nil {
t.Fatal(err)
}
if cfg.knnEngine != "lucene" || cfg.hnswM != 16 {
t.Errorf("nil defaults wrong: %+v", cfg)
}
})
}
// TestBuildIndexMapping_ReflectsHNSWConfig verifies the end-to-end path
// IndexConfig → buildInternalCfg → buildIndexMapping carries the operator's
// HNSW values into the cluster mapping JSON (regression guard for the wire-
// through, since defaults coincide with common values).
func TestBuildIndexMapping_ReflectsHNSWConfig(t *testing.T) {
cfg, err := buildInternalCfg(&types.IndexConfig{
HNSWM: 24,
HNSWEFConstruction: 200,
HNSWEFSearch: 128,
KNNEngine: "faiss",
})
if err != nil {
t.Fatal(err)
}
body, err := buildIndexMapping(cfg, 768)
if err != nil {
t.Fatal(err)
}
s := string(body)
for _, want := range []string{`"m":24`, `"ef_construction":200`, `"engine":"faiss"`, `"knn.algo_param.ef_search":128`} {
if !strings.Contains(s, want) {
t.Errorf("mapping JSON missing %q\n%s", want, s)
}
}
}
// TestWithAuditSink_SetsAndNilSafe verifies the functional option.
func TestWithAuditSink_SetsAndNilSafe(t *testing.T) {
spy := &spySink{}
r := &Repository{}
WithAuditSink(spy)(r)
if r.sink != spy {
t.Fatal("WithAuditSink did not set the sink")
}
// nil must not clobber an already-set sink
WithAuditSink(nil)(r)
if r.sink != spy {
t.Fatal("WithAuditSink(nil) clobbered the sink")
}
}
// TestAuditSink_NopByDefault verifies a Repository with no sink does not panic
// when the audit accessor is used (nopSink fallback).
func TestAuditSink_NopByDefault(t *testing.T) {
var _ AuditSink = nopSink{} // compile-time assertion
r := &Repository{} // sink left nil
// Must not panic.
r.auditSink().EmitIndexCreated(context.Background(), "weknora_768", 768)
r.auditSink().EmitReindexExecuted(context.Background(), "a", "b", 3)
}
// TestAuditSink_EmitIndexCreated_OnEnsureReady verifies createIndexAndAlias
// emits exactly one index-created event with the per-dim alias when a new
// index is provisioned.
func TestAuditSink_EmitIndexCreated_OnEnsureReady(t *testing.T) {
repo, ts := newTestRepo(t, (&indexLifecycleHandler{}).ServeHTTP)
defer ts.Close()
spy := &spySink{}
repo.sink = spy
if err := repo.ensureReady(context.Background(), 768); err != nil {
t.Fatalf("ensureReady: %v", err)
}
if len(spy.indexCreated) != 1 {
t.Fatalf("want 1 index_created event, got %d", len(spy.indexCreated))
}
got := spy.indexCreated[0]
if got.alias != "weknora_test_768" || got.dim != 768 {
t.Errorf("event mismatch: %+v", got)
}
}


@@ -0,0 +1,107 @@
package opensearch
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"sort"
osapi "github.com/opensearch-project/opensearch-go/v4/opensearchapi"
)
// BatchUpdateChunkEnabledStatus flips is_enabled for the given chunks. The
// status map is grouped by value into one _update_by_query per distinct value
// (mirrors the Qdrant grouping pattern), so the request body carries the
// chunk ids via a terms filter and the new value via bound script params —
// never per-chunk string interpolation.
//
// Targets the cross-dim <base>_* pattern: a chunk's embedding dimension is
// not known here, and the same chunk_id is unique across the store's dim
// indices + the keyword-only index.
func (r *Repository) BatchUpdateChunkEnabledStatus(ctx context.Context, chunkStatusMap map[string]bool) error {
if len(chunkStatusMap) == 0 {
return nil
}
groups := map[bool][]string{}
for id, v := range chunkStatusMap {
groups[v] = append(groups[v], id)
}
// Deterministic order (false then true) for predictable behavior/tests.
for _, v := range []bool{false, true} {
ids := groups[v]
if len(ids) == 0 {
continue
}
sort.Strings(ids)
if err := r.updateByQueryScript(ctx, ids,
"ctx._source.is_enabled = params.v", map[string]any{"v": v}); err != nil {
return err
}
}
return nil
}
// BatchUpdateChunkTagID sets tag_id for the given chunks, grouped by tag.
func (r *Repository) BatchUpdateChunkTagID(ctx context.Context, chunkTagMap map[string]string) error {
if len(chunkTagMap) == 0 {
return nil
}
groups := map[string][]string{}
for id, tag := range chunkTagMap {
groups[tag] = append(groups[tag], id)
}
tags := make([]string, 0, len(groups))
for tag := range groups {
tags = append(tags, tag)
}
sort.Strings(tags)
for _, tag := range tags {
ids := groups[tag]
sort.Strings(ids)
if err := r.updateByQueryScript(ctx, ids,
"ctx._source.tag_id = params.v", map[string]any{"v": tag}); err != nil {
return err
}
}
return nil
}
// updateByQueryScript runs an _update_by_query over the cross-dim <base>_*
// pattern, matching the given chunk ids via a terms filter and applying a
// constant Painless source with caller values flowing only through bound
// params (Painless-injection-safe).
func (r *Repository) updateByQueryScript(
ctx context.Context, chunkIDs []string, source string, params map[string]any,
) error {
body, err := json.Marshal(map[string]any{
"query": map[string]any{
"terms": map[string]any{"chunk_id": chunkIDs},
},
"script": map[string]any{
"lang": "painless",
"source": source,
"params": params,
},
})
if err != nil {
return fmt.Errorf("opensearch: marshal update_by_query body: %w", err)
}
// Q2: UpdateByQueryParams.Refresh is *bool — the wire value "wait_for" is
// not expressible via the typed SDK, so we force an immediate refresh.
refresh := true
resp, err := r.client.UpdateByQuery(ctx, osapi.UpdateByQueryReq{
Indices: []string{r.baseIndex + "_*"},
Body: bytes.NewReader(body),
Params: osapi.UpdateByQueryParams{Refresh: &refresh},
})
if err != nil {
return wrapTransport(err)
}
if resp == nil {
return nil
}
defer drainAndClose(resp.Inspect().Response.Body)
return inspectByQueryResponse(io.LimitReader(resp.Inspect().Response.Body, 16<<20))
}


@@ -0,0 +1,51 @@
package opensearch
import (
"context"
"encoding/json"
"fmt"
"io"
"strings"
"github.com/Tencent/WeKnora/internal/logger"
)
// inspectByQueryResponse parses an _update_by_query / _delete_by_query
// response and surfaces partial failures without leaking cluster-side reason
// strings (which may embed document content). Mirrors inspectBulkResponse:
// the full reason goes to the debug log only; the returned error carries the
// bounded id + type. A non-zero version_conflicts count with no hard failures
// is logged as a warning but not treated as an error.
func inspectByQueryResponse(body io.Reader) error {
var r struct {
VersionConflicts int `json:"version_conflicts"`
Failures []struct {
ID string `json:"id"`
Cause struct {
Type string `json:"type"`
Reason string `json:"reason"`
} `json:"cause"`
} `json:"failures"`
}
if err := json.NewDecoder(body).Decode(&r); err != nil {
return fmt.Errorf("opensearch: parse by-query response: %w", ErrTransport)
}
log := logger.GetLogger(context.Background())
if len(r.Failures) == 0 {
if r.VersionConflicts > 0 {
log.Warnf("[OpenSearch] by-query had %d version conflicts (proceeded)", r.VersionConflicts)
}
return nil
}
var msgs []string
for _, f := range r.Failures {
// Full reason → debug log only (may contain document content).
log.Debugf("[OpenSearch] by-query failure: id=%s type=%s reason=%s",
f.ID, f.Cause.Type, f.Cause.Reason)
if len(msgs) < 5 {
msgs = append(msgs, fmt.Sprintf("[%s] %s", f.ID, f.Cause.Type))
}
}
return fmt.Errorf("opensearch: by-query partial failure (%d failed, first 5: %s): %w",
len(r.Failures), strings.Join(msgs, "; "), ErrTransport)
}


@@ -15,16 +15,15 @@ type internalCfg struct {
}
// buildInternalCfg projects IndexConfig to the driver-internal view,
- // substituting defaults for unset fields. Validation of value ranges
- // (e.g. hnsw_m / ef_construction caps) is a service-layer concern handled
- // elsewhere; this function applies defaults only and never rejects.
+ // substituting defaults for unset (zero / empty) fields. Validation of value
+ // ranges (e.g. hnsw_m / ef_construction caps) is a service-layer concern
+ // handled elsewhere (validateOpenSearchIndexConfig at CreateStore); this
+ // function applies defaults only and never rejects. The env-path bypasses
+ // service validation entirely, so the defaults below are its safety net.
//
- // OpenSearch-specific overrides (knn_engine, hnsw_m, hnsw_ef_construction,
- // hnsw_ef_search) are intentionally NOT read from IndexConfig here:
- // IndexConfig is a schema shared across all drivers, and adding OpenSearch-
- // specific fields would surface them in the shared VectorStoreFieldInfo
- // form visible to every driver's create UI. Wiring those fields through to
- // IndexConfig is a follow-up that lands alongside the activation switch.
+ // The OpenSearch-specific HNSW fields (knn_engine, hnsw_m,
+ // hnsw_ef_construction, hnsw_ef_search) are read here. They are omitempty on
+ // IndexConfig, so they do not affect other drivers' serialized config.
func buildInternalCfg(c *types.IndexConfig) (internalCfg, error) {
cfg := internalCfg{
shards: 4, // matches the keyword-index default upstream
@@ -43,5 +42,17 @@ func buildInternalCfg(c *types.IndexConfig) (internalCfg, error) {
if c.NumberOfReplicas > 0 { if c.NumberOfReplicas > 0 {
cfg.replicas = c.NumberOfReplicas cfg.replicas = c.NumberOfReplicas
} }
if c.KNNEngine != "" {
cfg.knnEngine = c.KNNEngine
}
if c.HNSWM > 0 {
cfg.hnswM = c.HNSWM
}
if c.HNSWEFConstruction > 0 {
cfg.hnswEFConstruction = c.HNSWEFConstruction
}
if c.HNSWEFSearch > 0 {
cfg.efSearch = c.HNSWEFSearch
}
return cfg, nil return cfg, nil
} }
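
The hunk above follows a defaults-then-override projection: start from a fully populated config, then copy over only the fields the caller actually set. A minimal sketch of that pattern is below. `indexConfig`, `driverCfg`, and `project` are hypothetical names, and the default values are illustrative assumptions rather than the driver's real defaults.

```go
package main

import "fmt"

// indexConfig is a hypothetical stand-in for types.IndexConfig; only the
// HNSW-related fields are modelled.
type indexConfig struct {
	KNNEngine          string
	HNSWM              int
	HNSWEFConstruction int
	HNSWEFSearch       int
}

// driverCfg is a hypothetical stand-in for the driver-internal view.
type driverCfg struct {
	knnEngine          string
	hnswM              int
	hnswEFConstruction int
	efSearch           int
}

// project applies defaults first, then overrides them with any set field.
// It never rejects: zero, empty, and negative values all keep the default.
// The defaults here are assumed values chosen for the example.
func project(c *indexConfig) driverCfg {
	cfg := driverCfg{knnEngine: "lucene", hnswM: 16, hnswEFConstruction: 128, efSearch: 100}
	if c == nil {
		return cfg
	}
	if c.KNNEngine != "" {
		cfg.knnEngine = c.KNNEngine
	}
	if c.HNSWM > 0 {
		cfg.hnswM = c.HNSWM
	}
	if c.HNSWEFConstruction > 0 {
		cfg.hnswEFConstruction = c.HNSWEFConstruction
	}
	if c.HNSWEFSearch > 0 {
		cfg.efSearch = c.HNSWEFSearch
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", project(nil))
	fmt.Printf("%+v\n", project(&indexConfig{HNSWM: 32, KNNEngine: "faiss"}))
}
```

Note that a projection of this kind only guards against unset values. A positive but out-of-range value passes straight through, so on a path that skips service-layer validation the range caps are not enforced by this step.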


@@ -0,0 +1,194 @@
package opensearch
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"strings"
"github.com/google/uuid"
osapi "github.com/opensearch-project/opensearch-go/v4/opensearchapi"
"github.com/Tencent/WeKnora/internal/logger"
"github.com/Tencent/WeKnora/internal/types"
)
// copyBatchSize is the pagination size for the source scan. Kept under the
// BatchSave per-call cap so each copied page is a single bulk request.
const copyBatchSize = 500
// copySourceDoc is the full _source read during CopyIndices — it includes the
// embedding vector and is_recommended, which the retrieve-path hit struct
// omits because retrieval does not need them.
type copySourceDoc struct {
Content string `json:"content"`
SourceID string `json:"source_id"`
SourceType int `json:"source_type"`
ChunkID string `json:"chunk_id"`
KnowledgeID string `json:"knowledge_id"`
KnowledgeBaseID string `json:"knowledge_base_id"`
TagID string `json:"tag_id"`
IsEnabled bool `json:"is_enabled"`
IsRecommended bool `json:"is_recommended"`
Embedding []float32 `json:"embedding"`
}
// transformSourceID mirrors the sibling drivers' source_id remap:
// - regular chunk (source_id == chunk_id) → target chunk id
// - generated question (source_id == "<chunk>-<q>") → "<targetChunk>-<q>"
// - anything else → a fresh uuid
func transformSourceID(sourceID, chunkID, targetChunkID string) string {
switch {
case sourceID == chunkID:
return targetChunkID
case strings.HasPrefix(sourceID, chunkID+"-"):
return targetChunkID + "-" + strings.TrimPrefix(sourceID, chunkID+"-")
default:
return uuid.New().String()
}
}
// CopyIndices copies all docs of one knowledge base into another (within the
// same store) by scanning the source and re-saving via BatchSave — mirroring
// the Elasticsearch / Qdrant drivers (search→BatchSave), which yields the
// source_id transformation and dim/keyword routing for free. Runs
// synchronously and paginates; the large-batch background-task path is a
// later change.
//
// NOTE: from/size pagination is bounded by the index's max_result_window
// (default 10000). Copies larger than that require the scroll-based async
// path (a later change).
func (r *Repository) CopyIndices(
ctx context.Context,
sourceKnowledgeBaseID string,
sourceToTargetKBIDMap map[string]string, // keyed by source knowledge_id (mirrors sibling drivers)
sourceToTargetChunkIDMap map[string]string,
targetKnowledgeBaseID string,
dimension int,
knowledgeType string,
) error {
log := logger.GetLogger(ctx)
if len(sourceToTargetChunkIDMap) == 0 {
log.Warn("[OpenSearch] CopyIndices: empty chunk mapping, skipping")
return nil
}
if dimension <= 0 {
return fmt.Errorf("opensearch: CopyIndices requires dim > 0, got %d: %w",
dimension, ErrDimensionMismatch)
}
if err := r.ensureReady(ctx, dimension); err != nil {
return err
}
alias := r.indexAlias(dimension)
var total int64
for from := 0; ; from += copyBatchSize {
docs, err := r.copyScanBatch(ctx, alias, sourceKnowledgeBaseID, from, copyBatchSize)
if err != nil {
return err
}
if len(docs) == 0 {
break
}
infos := make([]*types.IndexInfo, 0, len(docs))
embMap := make(map[string][]float32, len(docs))
enabledMap := make(map[string]bool, len(docs))
for i := range docs {
d := &docs[i]
targetChunkID, ok := sourceToTargetChunkIDMap[d.ChunkID]
if !ok {
log.Warnf("[OpenSearch] CopyIndices: source chunk %s not mapped, skipping", d.ChunkID)
continue
}
targetKnowledgeID, ok := sourceToTargetKBIDMap[d.KnowledgeID]
if !ok {
log.Warnf("[OpenSearch] CopyIndices: source knowledge %s not mapped, skipping", d.KnowledgeID)
continue
}
targetSourceID := transformSourceID(d.SourceID, d.ChunkID, targetChunkID)
if len(d.Embedding) > 0 {
// BatchSave looks up embeddings by SourceID (lookupEmbedding),
// so key by the target source id — not the chunk id, which is
// the Elasticsearch driver's convention.
embMap[targetSourceID] = d.Embedding
}
enabledMap[targetChunkID] = d.IsEnabled
infos = append(infos, &types.IndexInfo{
Content: d.Content,
SourceID: targetSourceID,
SourceType: types.SourceType(d.SourceType),
ChunkID: targetChunkID,
KnowledgeID: targetKnowledgeID,
KnowledgeBaseID: targetKnowledgeBaseID,
KnowledgeType: knowledgeType,
TagID: d.TagID,
IsEnabled: d.IsEnabled,
IsRecommended: d.IsRecommended,
})
}
if len(infos) > 0 {
params := map[string]any{
"embedding": embMap,
"chunk_enabled": enabledMap,
}
if err := r.BatchSave(ctx, infos, params); err != nil {
return fmt.Errorf("opensearch: CopyIndices batch save: %w", err)
}
total += int64(len(infos))
}
if len(docs) < copyBatchSize {
break
}
}
log.Infof("[OpenSearch] CopyIndices: copied %d docs (KB %s → %s, dim=%d)",
total, sourceKnowledgeBaseID, targetKnowledgeBaseID, dimension)
r.auditSink().EmitReindexExecuted(ctx, alias, alias, total)
return nil
}
// copyScanBatch reads one page of docs belonging to sourceKB from the per-dim
// index, decoding the full _source (including the embedding vector).
func (r *Repository) copyScanBatch(
ctx context.Context, index, sourceKB string, from, size int,
) ([]copySourceDoc, error) {
body, err := json.Marshal(map[string]any{
"from": from,
"size": size,
"query": map[string]any{
"bool": map[string]any{
"filter": []any{
map[string]any{"term": map[string]any{"knowledge_base_id": sourceKB}},
},
},
},
})
if err != nil {
return nil, fmt.Errorf("opensearch: marshal copy scan body: %w", err)
}
req := osapi.SearchReq{Indices: []string{index}, Body: bytes.NewReader(body)}
resp, err := r.client.Search(ctx, &req)
if err != nil {
if isNotFound(err) {
return nil, fmt.Errorf("opensearch: index %s missing: %w", index, ErrIndexNotFound)
}
return nil, wrapTransport(err)
}
defer drainAndClose(resp.Inspect().Response.Body)
var parsed struct {
Hits struct {
Hits []struct {
Source copySourceDoc `json:"_source"`
} `json:"hits"`
} `json:"hits"`
}
if err := json.NewDecoder(io.LimitReader(resp.Inspect().Response.Body, 64<<20)).Decode(&parsed); err != nil {
return nil, fmt.Errorf("opensearch: parse copy scan response: %w", ErrTransport)
}
out := make([]copySourceDoc, len(parsed.Hits.Hits))
for i, h := range parsed.Hits.Hits {
out[i] = h.Source
}
return out, nil
}
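
The NOTE on CopyIndices says from/size paging is bounded by the index's max_result_window. One way to make that bound explicit is to check it before each request, so the caller gets a clear error instead of a cluster-side rejection. The sketch below shows the paging loop with such a guard against an in-memory source. `scanAll`, `fetchPage`, `sliceFetcher`, and `errWindowExceeded` are hypothetical names, and the guard itself is an assumption; the PR does not include one.

```go
package main

import (
	"errors"
	"fmt"
)

// errWindowExceeded is a hypothetical sentinel for "paging past the window".
var errWindowExceeded = errors.New("result window exceeded")

// fetchPage is a hypothetical page reader: up to size items starting at from.
type fetchPage func(from, size int) ([]string, error)

// scanAll pages through fetch until an empty or short page, calling visit on
// each page. It refuses to issue a request whose from+size would exceed
// maxWindow, and returns the number of items visited so far.
func scanAll(fetch fetchPage, pageSize, maxWindow int, visit func([]string) error) (int, error) {
	total := 0
	for from := 0; ; from += pageSize {
		if from+pageSize > maxWindow {
			return total, fmt.Errorf("from=%d size=%d: %w", from, pageSize, errWindowExceeded)
		}
		page, err := fetch(from, pageSize)
		if err != nil {
			return total, err
		}
		if len(page) == 0 {
			return total, nil
		}
		if err := visit(page); err != nil {
			return total, err
		}
		total += len(page)
		if len(page) < pageSize {
			return total, nil // short page: nothing left to read
		}
	}
}

// sliceFetcher serves pages from an in-memory slice, for demonstration only.
func sliceFetcher(items []string) fetchPage {
	return func(from, size int) ([]string, error) {
		if from >= len(items) {
			return nil, nil
		}
		end := from + size
		if end > len(items) {
			end = len(items)
		}
		return items[from:end], nil
	}
}

func main() {
	items := []string{"a", "b", "c", "d", "e", "f", "g"}
	n, err := scanAll(sliceFetcher(items), 3, 100, func(p []string) error {
		fmt.Println(p)
		return nil
	})
	fmt.Println(n, err)
}
```

As in the driver's loop, a source holding exactly maxWindow items still ends in an error here, because the final full page forces one more request. The sketch also assumes the source returns items in a stable order across pages.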


@@ -0,0 +1,193 @@
package opensearch
import (
"context"
"io"
"net/http"
"strings"
"sync"
"testing"
)
// TestTransformSourceID covers the generated-question / regular / fallback
// branches that CopyIndices uses to remap source_id.
func TestTransformSourceID(t *testing.T) {
t.Run("regular chunk uses target chunk id", func(t *testing.T) {
if got := transformSourceID("chunk1", "chunk1", "tgt1"); got != "tgt1" {
t.Errorf("want tgt1, got %s", got)
}
})
t.Run("generated question preserves question id", func(t *testing.T) {
if got := transformSourceID("chunk1-q7", "chunk1", "tgt1"); got != "tgt1-q7" {
t.Errorf("want tgt1-q7, got %s", got)
}
})
t.Run("unrelated source id gets fresh uuid", func(t *testing.T) {
got := transformSourceID("totally-different", "chunk1", "tgt1")
if got == "totally-different" || got == "tgt1" || len(got) != 36 {
t.Errorf("want fresh uuid, got %q", got)
}
})
}
// TestCopyIndices_EmptyMapping_NoOp verifies an empty chunk map short-circuits
// before any HTTP call.
func TestCopyIndices_EmptyMapping_NoOp(t *testing.T) {
repo, ts := newTestRepo(t, func(w http.ResponseWriter, r *http.Request) {
t.Errorf("unexpected HTTP call: %s %s", r.Method, r.URL.Path)
})
defer ts.Close()
err := repo.CopyIndices(context.Background(), "kbSrc", map[string]string{}, map[string]string{}, "kbDst", 768, "manual")
if err != nil {
t.Fatalf("want nil, got %v", err)
}
}
// TestCopyIndices_ScanThenBatchSave verifies the search→BatchSave path:
// remaps IDs, keys the embedding by the *target source id* (OpenSearch
// BatchSave's lookup key), and emits one reindex audit event.
func TestCopyIndices_ScanThenBatchSave(t *testing.T) {
var (
mu sync.Mutex
bulkBody string
searchCnt int
)
handler := func(w http.ResponseWriter, r *http.Request) {
switch {
case r.Method == http.MethodHead:
w.WriteHeader(http.StatusOK) // alias exists → ensureReady no-op
case strings.Contains(r.URL.Path, "_search"):
mu.Lock()
searchCnt++
first := searchCnt == 1
mu.Unlock()
if first {
_, _ = w.Write([]byte(`{"hits":{"hits":[
{"_source":{"content":"c","source_id":"srcChunk","source_type":1,"chunk_id":"srcChunk","knowledge_id":"srcKnow","knowledge_base_id":"kbSrc","tag_id":"t","is_enabled":true,"is_recommended":false,"embedding":[0.1,0.2,0.3]}}
]}}`))
} else {
_, _ = w.Write([]byte(`{"hits":{"hits":[]}}`))
}
case strings.HasSuffix(r.URL.Path, "/_bulk"):
b, _ := io.ReadAll(r.Body)
mu.Lock()
bulkBody = string(b)
mu.Unlock()
_, _ = w.Write([]byte(`{"errors":false,"items":[]}`))
default:
_, _ = w.Write([]byte(`{}`))
}
}
repo, ts := newTestRepo(t, handler)
defer ts.Close()
spy := &spySink{}
repo.sink = spy
err := repo.CopyIndices(context.Background(), "kbSrc",
map[string]string{"srcKnow": "tgtKnow"}, // knowledge_id remap (sourceToTargetKBIDMap is keyed by knowledge_id, mirroring ES)
map[string]string{"srcChunk": "tgtChunk"},
"kbDst", 768, "manual")
if err != nil {
t.Fatalf("CopyIndices: %v", err)
}
mu.Lock()
defer mu.Unlock()
if bulkBody == "" {
t.Fatal("no bulk request captured")
}
// Target IDs present, source IDs gone from the written doc.
for _, want := range []string{`"chunk_id":"tgtChunk"`, `"knowledge_id":"tgtKnow"`, `"knowledge_base_id":"kbDst"`, `"source_id":"tgtChunk"`} {
if !strings.Contains(bulkBody, want) {
t.Errorf("bulk body missing %q\n%s", want, bulkBody)
}
}
if strings.Contains(bulkBody, `"knowledge_base_id":"kbSrc"`) {
t.Errorf("bulk body leaked source KB id\n%s", bulkBody)
}
// Embedding written (keyed by target source id internally).
if !strings.Contains(bulkBody, "0.1") {
t.Errorf("embedding not written\n%s", bulkBody)
}
if len(spy.reindex) != 1 || spy.reindex[0].docs != 1 {
t.Errorf("want 1 reindex event with docs=1, got %+v", spy.reindex)
}
}
// TestBatchUpdateChunkEnabledStatus_GroupedUpdateByQuery verifies the status
// map is grouped by value into one _update_by_query per distinct value, each
// passing chunk ids via terms + the new value via bound script params.
func TestBatchUpdateChunkEnabledStatus_GroupedUpdateByQuery(t *testing.T) {
var (
mu sync.Mutex
bodies []string
)
handler := func(w http.ResponseWriter, r *http.Request) {
if strings.Contains(r.URL.Path, "_update_by_query") {
b, _ := io.ReadAll(r.Body)
mu.Lock()
bodies = append(bodies, string(b))
mu.Unlock()
_, _ = w.Write([]byte(`{"updated":1,"version_conflicts":0,"failures":[]}`))
return
}
_, _ = w.Write([]byte(`{}`))
}
repo, ts := newTestRepo(t, handler)
defer ts.Close()
err := repo.BatchUpdateChunkEnabledStatus(context.Background(), map[string]bool{
"c1": true, "c2": false, "c3": true,
})
if err != nil {
t.Fatalf("BatchUpdateChunkEnabledStatus: %v", err)
}
mu.Lock()
defer mu.Unlock()
if len(bodies) != 2 {
t.Fatalf("want 2 grouped update_by_query calls (true/false), got %d: %v", len(bodies), bodies)
}
joined := strings.Join(bodies, "\n")
for _, want := range []string{"c1", "c2", "c3", "is_enabled", "params"} {
if !strings.Contains(joined, want) {
t.Errorf("update_by_query bodies missing %q\n%s", want, joined)
}
}
}
func TestBatchUpdateChunkEnabledStatus_Empty_NoOp(t *testing.T) {
repo, ts := newTestRepo(t, func(w http.ResponseWriter, r *http.Request) {
t.Errorf("unexpected HTTP call: %s %s", r.Method, r.URL.Path)
})
defer ts.Close()
if err := repo.BatchUpdateChunkEnabledStatus(context.Background(), map[string]bool{}); err != nil {
t.Fatalf("want nil, got %v", err)
}
}
// TestInspectByQueryResponse covers the success path and the failure path,
// asserting cluster-side reason text is NOT surfaced in the returned error.
func TestInspectByQueryResponse(t *testing.T) {
t.Run("clean response", func(t *testing.T) {
body := strings.NewReader(`{"updated":5,"version_conflicts":0,"failures":[]}`)
if err := inspectByQueryResponse(body); err != nil {
t.Fatalf("want nil, got %v", err)
}
})
t.Run("failures do not leak reason", func(t *testing.T) {
body := strings.NewReader(`{"updated":1,"version_conflicts":0,"failures":[
{"id":"c9","cause":{"type":"version_conflict_engine_exception","reason":"SECRET document body leaked here"}}
]}`)
err := inspectByQueryResponse(body)
if err == nil {
t.Fatal("want error for failures, got nil")
}
if strings.Contains(err.Error(), "SECRET") {
t.Errorf("error leaked cluster reason: %v", err)
}
if !strings.Contains(err.Error(), "version_conflict_engine_exception") {
t.Errorf("error should surface bounded type: %v", err)
}
})
}
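
TestBatchUpdateChunkEnabledStatus_GroupedUpdateByQuery above expects one _update_by_query call per distinct status value. The grouping step behind that expectation can be sketched as follows. `groupByEnabled` is a hypothetical helper, since the PR's bulk_update.go is not shown in this section, and the sorting is added only to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// groupByEnabled buckets chunk ids by their target is_enabled value, so each
// distinct value can be sent as a single by-query update. A value with no
// ids gets no bucket at all.
func groupByEnabled(status map[string]bool) map[bool][]string {
	groups := make(map[bool][]string, 2)
	for id, enabled := range status {
		groups[enabled] = append(groups[enabled], id)
	}
	for _, ids := range groups {
		sort.Strings(ids) // map iteration order is random; sort for stable requests
	}
	return groups
}

func main() {
	groups := groupByEnabled(map[string]bool{"c1": true, "c2": false, "c3": true})
	fmt.Println(len(groups), groups[true], groups[false])
}
```

With three chunks split across true and false this yields two buckets, which matches the two requests the test asserts. An empty input yields no buckets, consistent with the no-op test.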


@@ -0,0 +1,23 @@
package opensearch
import (
"context"
"github.com/Tencent/WeKnora/internal/types"
)
// TestConnection verifies an OpenSearch cluster is reachable, runs a
// supported version, and has the k-NN plugin installed on every node. It is
// the connectivity probe used by the VectorStore service's CreateStore
// health-check (the driver's unexported probes are reused here). Returns a
// wrapped sentinel error on failure; nil on success.
func TestConnection(ctx context.Context, cfg *types.ConnectionConfig) error {
client, err := NewOpenSearchClient(cfg)
if err != nil {
return err
}
if err := probeVersion(ctx, client); err != nil {
return err
}
return probeKNNPlugin(ctx, client)
}


@@ -0,0 +1,50 @@
package opensearch
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
osapi "github.com/opensearch-project/opensearch-go/v4/opensearchapi"
"github.com/Tencent/WeKnora/internal/types"
)
// clusterHandler serves both GET / (version info) and /_cat/plugins, so the
// full TestConnection probe (version + k-NN plugin) can run end to end.
func clusterHandler(distribution, number string, plugins []osapi.CatPluginResp) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.URL.Path == "/_cat/plugins" {
_ = json.NewEncoder(w).Encode(plugins)
return
}
_, _ = w.Write([]byte(`{"version":{"distribution":"` + distribution + `","number":"` + number + `"}}`))
}
}
func TestTestConnection_Success(t *testing.T) {
ts := httptest.NewServer(clusterHandler("opensearch", "3.3.2", []osapi.CatPluginResp{
{Name: "node-1", Component: "opensearch-knn"},
}))
defer ts.Close()
if err := TestConnection(context.Background(), &types.ConnectionConfig{Addr: ts.URL}); err != nil {
t.Errorf("healthy cluster: want nil, got %v", err)
}
}
func TestTestConnection_RejectsElasticsearch(t *testing.T) {
ts := httptest.NewServer(clusterHandler("elasticsearch", "8.10.4", nil))
defer ts.Close()
if err := TestConnection(context.Background(), &types.ConnectionConfig{Addr: ts.URL}); err == nil {
t.Error("elasticsearch cluster should be rejected")
}
}
func TestTestConnection_EmptyAddr(t *testing.T) {
if err := TestConnection(context.Background(), &types.ConnectionConfig{}); err == nil {
t.Error("empty addr should be rejected")
}
}


@@ -77,6 +77,11 @@ func (r *Repository) createIndexAndAlias(ctx context.Context, dim int) error {
 		}
 		return fmt.Errorf("put alias %s → %s: %w", alias, realIndex, err)
 	}
+	if indexCreated {
+		// Emit only when we actually provisioned the index (not when a
+		// concurrent writer / existing index short-circuited above).
+		r.auditSink().EmitIndexCreated(ctx, alias, dim)
+	}
 	return nil
 }
@@ -217,6 +222,7 @@ func (r *Repository) ensureKeywordsIndex(ctx context.Context) error {
 		r.keywordsErr = err
 		return err
 	}
+	created := false
 	if err := r.indicesCreate(ctx, name, body); err != nil {
 		if !isAlreadyExistsError(err) {
 			r.keywordsErr = err
@@ -224,9 +230,15 @@ func (r *Repository) ensureKeywordsIndex(ctx context.Context) error {
 		}
 		// resource_already_exists_exception — race with concurrent process,
 		// treat as success.
+	} else {
+		created = true
 	}
 	r.keywordsReady = true
 	r.keywordsErr = nil
+	if created {
+		// dim=0 marks the dim-less keyword-only index.
+		r.auditSink().EmitIndexCreated(ctx, name, 0)
+	}
 	return nil
 }


@@ -64,6 +64,11 @@ type Repository struct {
 	keywordsMu    sync.Mutex
 	keywordsReady bool
 	keywordsErr   error
+
+	// sink receives audit events (index created / reindex executed). nil
+	// means no auditing; use r.auditSink() to get a non-nil sink. Set via
+	// WithAuditSink at construction.
+	sink AuditSink
 }
 
 // Compile-time interface satisfaction (Go best practice — keeps the build
@@ -87,6 +92,10 @@ var _ interfaces.RetrieveEngineRepository = (*Repository)(nil)
 // indexCfg is optional — pass nil to use env var (OPENSEARCH_INDEX) or
 // default ("weknora") values.
 //
+// Optional behavior is configured via functional options (e.g.
+// WithAuditSink). Passing no options keeps audit emission as a no-op, so the
+// env-path and tests need no extra wiring.
+//
 // Returns a typed sentinel error wrapped with %w; callers translate to
 // AppError at the engine-factory boundary.
 func NewRepository(
@@ -94,6 +103,7 @@ func NewRepository(
 	client *osapi.Client,
 	storeID string,
 	indexCfg *types.IndexConfig,
+	opts ...Option,
 ) (interfaces.RetrieveEngineRepository, error) {
 	log := logger.GetLogger(ctx)
@@ -135,6 +145,9 @@ func NewRepository(
 		once:    make(map[int]*sync.Once),
 		initErr: make(map[int]error),
 	}
+	for _, opt := range opts {
+		opt(r)
+	}
 	log.Infof("[OpenSearch] repository ready (baseIndex=%s, knn_engine=%s, hnsw_m=%d)",
 		base, icfg.knnEngine, icfg.hnswM)
 	return r, nil
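
The hunks above add an optional audit sink through functional options, with a nil sink meaning "no auditing" and an accessor that always returns something callable. A minimal sketch of that arrangement follows. The definitions of `Option`, `WithAuditSink`, and `auditSink()` are not shown in this section, so the shapes below are assumptions; the sink interface is also reduced to one method without a context argument to keep the example short.

```go
package main

import "fmt"

// AuditSink is a simplified, assumed version of the driver's sink interface.
type AuditSink interface {
	EmitIndexCreated(alias string, dim int)
}

// noopSink discards every event; it backs the nil-sink case.
type noopSink struct{}

func (noopSink) EmitIndexCreated(string, int) {}

// repository is a hypothetical stand-in for the driver's Repository.
type repository struct {
	sink AuditSink
}

// Option mutates a repository during construction (assumed shape).
type Option func(*repository)

// WithAuditSink installs a sink (assumed shape).
func WithAuditSink(s AuditSink) Option {
	return func(r *repository) { r.sink = s }
}

// auditSink never returns nil, so call sites need no nil checks.
func (r *repository) auditSink() AuditSink {
	if r.sink == nil {
		return noopSink{}
	}
	return r.sink
}

// newRepository applies options in order after setting up the zero state.
func newRepository(opts ...Option) *repository {
	r := &repository{}
	for _, opt := range opts {
		opt(r)
	}
	return r
}

// recordingSink collects events, in the spirit of the spySink used in the tests.
type recordingSink struct {
	events []string
}

func (s *recordingSink) EmitIndexCreated(alias string, dim int) {
	s.events = append(s.events, fmt.Sprintf("%s/%d", alias, dim))
}

func main() {
	newRepository().auditSink().EmitIndexCreated("idx_768", 768) // silently dropped

	rec := &recordingSink{}
	newRepository(WithAuditSink(rec)).auditSink().EmitIndexCreated("idx_768", 768)
	fmt.Println(rec.events)
}
```

The variadic parameter is what lets existing callers of the constructor compile unchanged, which is the property the NewRepository doc comment relies on for the env path and tests.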


@@ -1123,34 +1123,14 @@ func TestNewRepository_AcceptsLongStoreID(t *testing.T) {
 }
 
 // ============================================================================
-// Stub coverage — every stub returns the not-enabled sentinel
+// Stub coverage — remaining stubs return the not-enabled sentinel
+//
+// CopyIndices / BatchUpdateChunkEnabledStatus / BatchUpdateChunkTagID are now
+// implemented (see copy_bulk_test.go for their behavioral tests); their stub
+// assertions were removed. EstimateStorageSize keeps its conservative
+// lower-bound until the real _stats-based implementation lands.
 // ============================================================================
 
-func TestStub_CopyIndices_ReturnsFeatureNotEnabled(t *testing.T) {
-	t.Parallel()
-	r := &Repository{}
-	err := r.CopyIndices(context.Background(), "kb1", nil, nil, "kb2", 768, "")
-	if !errors.Is(err, ErrFeatureNotEnabled) {
-		t.Errorf("CopyIndices: want ErrFeatureNotEnabled, got %v", err)
-	}
-}
-
-func TestStub_BatchUpdateChunkEnabledStatus_ReturnsFeatureNotEnabled(t *testing.T) {
-	t.Parallel()
-	r := &Repository{}
-	if err := r.BatchUpdateChunkEnabledStatus(context.Background(), nil); !errors.Is(err, ErrFeatureNotEnabled) {
-		t.Errorf("want ErrFeatureNotEnabled, got %v", err)
-	}
-}
-
-func TestStub_BatchUpdateChunkTagID_ReturnsFeatureNotEnabled(t *testing.T) {
-	t.Parallel()
-	r := &Repository{}
-	if err := r.BatchUpdateChunkTagID(context.Background(), nil); !errors.Is(err, ErrFeatureNotEnabled) {
-		t.Errorf("want ErrFeatureNotEnabled, got %v", err)
-	}
-}
-
 func TestStub_EstimateStorageSize_EmptyZero_NonEmptyPositive(t *testing.T) {
 	t.Parallel()
 	r := &Repository{}


@@ -6,43 +6,12 @@ import (
 	"github.com/Tencent/WeKnora/internal/types"
 )
 
-// This file holds the remaining stubs for methods whose real
-// implementation has not landed yet — the async / batch paths and the
-// rolling-reindex swap. Each stub returns ErrFeatureNotEnabled (or, for
-// EstimateStorageSize, a conservative lower-bound) so any accidental
-// invocation surfaces loudly. The driver as a whole is still gated dead
-// code (no registry / factory / env path mentions it); these stubs
-// disappear when their behaviours arrive in follow-up commits.
+// This file holds the remaining stubs for methods whose real implementation
+// has not landed yet — the rolling-reindex swap (swapToVersion) and the
+// precise storage estimate. CopyIndices (copy.go), BatchUpdateChunkEnabledStatus
+// and BatchUpdateChunkTagID (bulk_update.go) are now implemented. Each
+// remaining stub returns ErrFeatureNotEnabled (or, for EstimateStorageSize, a
+// conservative lower-bound) so any accidental invocation surfaces loudly.
 
-// CopyIndices: the async _reindex path with task polling for >10K-doc
-// batches arrives in a later change.
-func (r *Repository) CopyIndices(
-	_ context.Context,
-	_ string, // sourceKnowledgeBaseID
-	_ map[string]string, // sourceToTargetKBIDMap
-	_ map[string]string, // sourceToTargetChunkIDMap
-	_ string, // targetKnowledgeBaseID
-	_ int, // dimension
-	_ string, // knowledgeType
-) error {
-	return ErrFeatureNotEnabled
-}
-
-// BatchUpdateChunkEnabledStatus: the _update_by_query path arrives in a
-// later change.
-func (r *Repository) BatchUpdateChunkEnabledStatus(
-	_ context.Context, _ map[string]bool,
-) error {
-	return ErrFeatureNotEnabled
-}
-
-// BatchUpdateChunkTagID: the _update_by_query path arrives in a later
-// change.
-func (r *Repository) BatchUpdateChunkTagID(
-	_ context.Context, _ map[string]string,
-) error {
-	return ErrFeatureNotEnabled
-}
-
 // EstimateStorageSize: the real implementation that reads cluster
 // `_stats` for the per-dim alias arrives in a later change. For now we


@@ -12,12 +12,10 @@ import (
 	"github.com/Tencent/WeKnora/internal/types"
 )
 
-// newOpenSearchClient builds a TLS-hardened, pool-tuned *osapi.Client for
+// NewOpenSearchClient builds a TLS-hardened, pool-tuned *osapi.Client for
 // the OpenSearch driver. The caller wires it into the registry from the
 // env path (container) and the DB-store path (engine factory); the
-// Repository constructor itself receives the pre-built client. While the
-// driver is still gated dead code, no code path here is reachable from
-// production — the activation switch lands in a later change.
+// Repository constructor itself receives the pre-built client.
 //
 // TLS posture:
 // - MinVersion: TLS 1.2 (TLS 1.3 negotiated when both ends support).
@@ -33,7 +31,7 @@ import (
 // - IdleConnTimeout: 90s (typical LB keep-alive)
 // - ResponseHeaderTimeout: 30s (per-request safety net)
 // - ExpectContinueTimeout: 1s
-func newOpenSearchClient(cfg *types.ConnectionConfig) (*osapi.Client, error) {
+func NewOpenSearchClient(cfg *types.ConnectionConfig) (*osapi.Client, error) {
 	if cfg == nil || cfg.Addr == "" {
 		return nil, fmt.Errorf("opensearch: ConnectionConfig.Addr required: %w", ErrConfigInvalid)
 	}


@@ -14,11 +14,11 @@ import (
 // transport error later.
 func TestNewOpenSearchClient_RejectsEmptyAddr(t *testing.T) {
 	t.Parallel()
-	_, err := newOpenSearchClient(&types.ConnectionConfig{Addr: ""})
+	_, err := NewOpenSearchClient(&types.ConnectionConfig{Addr: ""})
 	if !errors.Is(err, ErrConfigInvalid) {
 		t.Fatalf("empty addr: want ErrConfigInvalid, got %v", err)
 	}
-	_, err = newOpenSearchClient(nil)
+	_, err = NewOpenSearchClient(nil)
 	if !errors.Is(err, ErrConfigInvalid) {
 		t.Fatalf("nil cfg: want ErrConfigInvalid, got %v", err)
 	}
@@ -30,13 +30,13 @@ func TestNewOpenSearchClient_RejectsEmptyAddr(t *testing.T) {
 // returns successfully.
 func TestNewOpenSearchClient_Succeeds_OnValidAddr(t *testing.T) {
 	t.Parallel()
-	client, err := newOpenSearchClient(&types.ConnectionConfig{
+	client, err := NewOpenSearchClient(&types.ConnectionConfig{
 		Addr:     "https://opensearch.example.com:9200",
 		Username: "admin",
 		Password: "secret", // not a real password — wire-format only
 	})
 	if err != nil {
-		t.Fatalf("newOpenSearchClient: %v", err)
+		t.Fatalf("NewOpenSearchClient: %v", err)
 	}
 	if client == nil {
 		t.Fatal("client must be non-nil on success")


@@ -67,6 +67,14 @@ func (s *vectorStoreService) CreateStore(ctx context.Context, store *types.Vecto
 		return err
 	}
 
+	// 2.6. Engine-specific index config validation (OpenSearch HNSW bounds).
+	// Create-only: UpdateStore mutates just the name, so this is not re-run there.
+	if store.EngineType == types.OpenSearchRetrieverEngineType {
+		if err := validateOpenSearchIndexConfig(store.IndexConfig); err != nil {
+			return err
+		}
+	}
+
 	// 3. Duplicate check — DB stores
 	endpoint := store.ConnectionConfig.GetEndpoint()
 	indexName := store.IndexConfig.GetIndexNameOrDefault(store.EngineType)
@@ -452,8 +460,51 @@ func validateConnectionConfig(engineType types.RetrieverEngineType, config types
 		if config.Database == "" {
 			return errors.NewValidationError("database is required for doris")
 		}
+	case types.OpenSearchRetrieverEngineType:
+		if config.Addr == "" {
+			return errors.NewValidationError("addr is required for opensearch")
+		}
 	case types.SQLiteRetrieverEngineType:
 		// No connection config needed for SQLite
 	}
 	return nil
 }
+
+// OpenSearch HNSW bound constants. Shards / replicas are NOT validated here —
+// the flat types.ValidateIndexConfig already enforces those caps for every
+// engine. These caps mirror the GetVectorStoreTypes Min/Max so the UI and
+// backend agree. A zero / empty field means "use the driver default" and is
+// always accepted.
+const (
+	osHNSWMMin              = 2
+	osHNSWMMax              = 100
+	osHNSWEFConstructionMin = 2
+	osHNSWEFConstructionMax = 4096
+	osHNSWEFSearchMin       = 1
+	osHNSWEFSearchMax       = 10000
+)
+
+// validateOpenSearchIndexConfig validates the OpenSearch-specific HNSW fields.
+// Called from CreateStore only (the store is create-only; UpdateStore mutates
+// just the name). Unset fields (zero / empty) fall back to driver defaults and
+// are accepted.
+func validateOpenSearchIndexConfig(ic types.IndexConfig) error {
+	if ic.HNSWM != 0 && (ic.HNSWM < osHNSWMMin || ic.HNSWM > osHNSWMMax) {
+		return errors.NewValidationError(
+			fmt.Sprintf("hnsw_m must be between %d and %d", osHNSWMMin, osHNSWMMax))
+	}
+	if ic.HNSWEFConstruction != 0 &&
+		(ic.HNSWEFConstruction < osHNSWEFConstructionMin || ic.HNSWEFConstruction > osHNSWEFConstructionMax) {
+		return errors.NewValidationError(
+			fmt.Sprintf("hnsw_ef_construction must be between %d and %d", osHNSWEFConstructionMin, osHNSWEFConstructionMax))
+	}
+	if ic.HNSWEFSearch != 0 &&
+		(ic.HNSWEFSearch < osHNSWEFSearchMin || ic.HNSWEFSearch > osHNSWEFSearchMax) {
+		return errors.NewValidationError(
+			fmt.Sprintf("hnsw_ef_search must be between %d and %d", osHNSWEFSearchMin, osHNSWEFSearchMax))
+	}
+	if ic.KNNEngine != "" && ic.KNNEngine != "lucene" && ic.KNNEngine != "faiss" {
+		return errors.NewValidationError(`knn_engine must be "lucene" or "faiss"`)
+	}
+	return nil
+}
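
The three integer checks in validateOpenSearchIndexConfig share one rule: zero means "unset, use the driver default", and any other value must fall inside a closed range. As a possible refactoring only, that rule can be captured in one helper. `checkRange` below is hypothetical and returns a plain error rather than the service's validation error type.

```go
package main

import "fmt"

// checkRange accepts v when it is zero (treated as unset) or within [lo, hi].
// Negative values are non-zero and below lo, so they are rejected.
func checkRange(name string, v, lo, hi int) error {
	if v == 0 {
		return nil
	}
	if v < lo || v > hi {
		return fmt.Errorf("%s must be between %d and %d", name, lo, hi)
	}
	return nil
}

func main() {
	fmt.Println(checkRange("hnsw_m", 0, 2, 100))   // unset: accepted
	fmt.Println(checkRange("hnsw_m", 16, 2, 100))  // in range: accepted
	fmt.Println(checkRange("hnsw_m", 101, 2, 100)) // above the cap: rejected
}
```

The ranges used in main are the hnsw_m caps quoted in the PR description (2 to 100). The boundary matrix in the accompanying test file would exercise a helper like this the same way it exercises the inline checks.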


@@ -11,6 +11,7 @@ import (
 	"strings"
 	"time"
 
+	openSearchRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/opensearch"
 	"github.com/Tencent/WeKnora/internal/errors"
 	"github.com/Tencent/WeKnora/internal/logger"
 	"github.com/Tencent/WeKnora/internal/types"
@@ -47,6 +48,8 @@ func (s *vectorStoreService) TestConnection(
 		return testWeaviateConnection(ctx, config)
 	case types.DorisRetrieverEngineType:
 		return testDorisConnection(ctx, config)
+	case types.OpenSearchRetrieverEngineType:
+		return testOpenSearchConnection(ctx, config)
 	case types.SQLiteRetrieverEngineType:
 		// SQLite is file-based, no remote connection to test
 		return "", nil
@@ -305,3 +308,23 @@ func testDorisConnection(ctx context.Context, config types.ConnectionConfig) (st
 	}
 	return version, nil
 }
+
+// testOpenSearchConnection verifies the cluster is reachable, runs a
+// supported OpenSearch version, and has the k-NN plugin installed. The driver
+// owns the probe logic; a generic message is returned on failure so cluster
+// internals are not surfaced to the API caller.
+func testOpenSearchConnection(ctx context.Context, config types.ConnectionConfig) (string, error) {
+	if config.Addr == "" {
+		return "", errors.NewBadRequestError("failed to create opensearch connection: addr is required")
+	}
+	testCtx, cancel := context.WithTimeout(ctx, connectionTestTimeout)
+	defer cancel()
+	if err := openSearchRepo.TestConnection(testCtx, &config); err != nil {
+		logger.Warnf(ctx, "OpenSearch connection test failed: %v", err)
+		return "", errors.NewBadRequestError(
+			"failed to connect to opensearch: check address, credentials, version (>= 2.4), and that the k-NN plugin is installed")
+	}
+	// Version is detected during the probe but not surfaced here; lazy index
+	// creation re-validates on first use.
+	return "", nil
+}


@@ -0,0 +1,47 @@
package service
import (
"testing"
"github.com/Tencent/WeKnora/internal/types"
)
func TestValidateConnectionConfig_OpenSearch_RequiresAddr(t *testing.T) {
if err := validateConnectionConfig(types.OpenSearchRetrieverEngineType, types.ConnectionConfig{}); err == nil {
t.Error("empty addr should be rejected for opensearch")
}
if err := validateConnectionConfig(types.OpenSearchRetrieverEngineType,
types.ConnectionConfig{Addr: "https://os:9200"}); err != nil {
t.Errorf("valid addr should pass: %v", err)
}
}
func TestValidateOpenSearchIndexConfig_BoundaryMatrix(t *testing.T) {
tests := []struct {
name string
ic types.IndexConfig
wantErr bool
}{
{"all unset (defaults)", types.IndexConfig{}, false},
{"valid mid-range", types.IndexConfig{HNSWM: 16, HNSWEFConstruction: 100, HNSWEFSearch: 100, KNNEngine: "lucene"}, false},
{"valid faiss", types.IndexConfig{KNNEngine: "faiss"}, false},
{"valid boundaries", types.IndexConfig{HNSWM: 2, HNSWEFConstruction: 2, HNSWEFSearch: 1}, false},
{"valid upper boundaries", types.IndexConfig{HNSWM: 100, HNSWEFConstruction: 4096, HNSWEFSearch: 10000}, false},
{"hnsw_m too low", types.IndexConfig{HNSWM: 1}, true},
{"hnsw_m too high", types.IndexConfig{HNSWM: 101}, true},
{"ef_construction too high", types.IndexConfig{HNSWEFConstruction: 4097}, true},
{"ef_search too high", types.IndexConfig{HNSWEFSearch: 10001}, true},
{"invalid engine", types.IndexConfig{KNNEngine: "nmslib"}, true},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := validateOpenSearchIndexConfig(tt.ic)
if tt.wantErr && err == nil {
t.Errorf("want error, got nil")
}
if !tt.wantErr && err != nil {
t.Errorf("want nil, got %v", err)
}
})
}
}
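
The boundary matrix above implies a validator along the following lines. This is a sketch under stated assumptions, not the PR's validateOpenSearchIndexConfig (whose body is not part of this diff): it assumes zero or empty means "unset, use the driver default", which matches the "all unset" test case and the IndexConfig field comment. `indexConfig`, `inRange`, and `validateHNSW` are hypothetical names.

```go
package main

import "fmt"

// indexConfig mirrors only the four HNSW fields added in this PR.
type indexConfig struct {
	HNSWM              int
	HNSWEFConstruction int
	HNSWEFSearch       int
	KNNEngine          string
}

// inRange treats 0 as unset and otherwise requires lo <= v <= hi.
func inRange(name string, v, lo, hi int) error {
	if v == 0 {
		return nil
	}
	if v < lo || v > hi {
		return fmt.Errorf("%s must be between %d and %d, got %d", name, lo, hi, v)
	}
	return nil
}

// validateHNSW applies the caps listed in the commit message:
// m 2-100, ef_construction 2-4096, ef_search 1-10000, engine lucene|faiss.
func validateHNSW(ic indexConfig) error {
	if err := inRange("hnsw_m", ic.HNSWM, 2, 100); err != nil {
		return err
	}
	if err := inRange("hnsw_ef_construction", ic.HNSWEFConstruction, 2, 4096); err != nil {
		return err
	}
	if err := inRange("hnsw_ef_search", ic.HNSWEFSearch, 1, 10000); err != nil {
		return err
	}
	switch ic.KNNEngine {
	case "", "lucene", "faiss":
		return nil
	}
	return fmt.Errorf("knn_engine must be lucene or faiss, got %q", ic.KNNEngine)
}

func main() {
	fmt.Println(validateHNSW(indexConfig{HNSWM: 16, KNNEngine: "lucene"}))
	fmt.Println(validateHNSW(indexConfig{HNSWM: 101}))
	fmt.Println(validateHNSW(indexConfig{KNNEngine: "nmslib"}))
}
```

One consequence of the zero-means-unset convention is that an explicit 0 cannot be distinguished from an omitted field, which is harmless here because 0 is below every lower bound anyway.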


@@ -0,0 +1,66 @@
package container
import (
"context"
"encoding/json"
"github.com/Tencent/WeKnora/internal/application/repository/retriever/opensearch"
"github.com/Tencent/WeKnora/internal/logger"
"github.com/Tencent/WeKnora/internal/types"
"github.com/Tencent/WeKnora/internal/types/interfaces"
)
// auditSinkAdapter bridges the OpenSearch driver's AuditSink (which the driver
// owns so it imports no service package) to the service-layer AuditLogService.
// This keeps the dependency one-way: the driver depends only on its own
// AuditSink abstraction; the container implements it.
type auditSinkAdapter struct {
svc interfaces.AuditLogService
}
// newAuditSinkAdapter returns an opensearch.AuditSink backed by svc. A nil svc
// yields a sink whose emits are no-ops.
func newAuditSinkAdapter(svc interfaces.AuditLogService) opensearch.AuditSink {
return auditSinkAdapter{svc: svc}
}
func (a auditSinkAdapter) EmitIndexCreated(ctx context.Context, alias string, dim int) {
a.emit(ctx, types.AuditActionOpenSearchIndexCreated, alias,
map[string]any{"alias": alias, "dim": dim})
}
func (a auditSinkAdapter) EmitReindexExecuted(ctx context.Context, srcAlias, dstAlias string, docs int64) {
a.emit(ctx, types.AuditActionOpenSearchReindexExecuted, dstAlias,
map[string]any{"src_alias": srcAlias, "dst_alias": dstAlias, "docs": docs})
}
// emit writes one audit entry. It skips (with a warning) when the context
// carries no tenant — driver events can fire from background task contexts
// (e.g. lazy index creation under an async copy task), and writing tenant_id=0
// would collide with the system-scope sentinel and corrupt the audit trail.
func (a auditSinkAdapter) emit(ctx context.Context, action types.AuditAction, target string, detail map[string]any) {
if a.svc == nil {
return
}
tid, ok := types.TenantIDFromContext(ctx)
if !ok {
logger.GetLogger(ctx).Warnf("[audit] %s: no tenant in context, skipping audit (target=%s)", action, target)
return
}
// Details is a typed JSON blob — only bounded, non-secret fields. Never
// include cluster reason strings or connection secrets.
b, err := json.Marshal(detail)
if err != nil {
logger.GetLogger(ctx).Warnf("[audit] %s: marshal details failed: %v", action, err)
b = []byte("{}")
}
if err := a.svc.Log(ctx, &types.AuditLog{
TenantID: tid,
Action: action,
TargetType: "opensearch_index",
TargetID: target,
Details: types.JSON(b),
}); err != nil {
logger.GetLogger(ctx).Warnf("[audit] %s emit failed: %v", action, err)
}
}


@@ -0,0 +1,74 @@
package container
import (
"context"
"testing"
"github.com/gin-gonic/gin"
"github.com/Tencent/WeKnora/internal/types"
"github.com/Tencent/WeKnora/internal/types/interfaces"
)
// fakeAuditSvc records Log calls.
type fakeAuditSvc struct {
logged []*types.AuditLog
err error
}
func (f *fakeAuditSvc) Log(_ context.Context, e *types.AuditLog) error {
f.logged = append(f.logged, e)
return f.err
}
func (f *fakeAuditSvc) LogDenied(context.Context, *gin.Context, uint64, string, string, types.TenantRole) error {
return nil
}
func (f *fakeAuditSvc) List(context.Context, uint64, *interfaces.AuditLogQuery) ([]*types.AuditLog, error) {
return nil, nil
}
func (f *fakeAuditSvc) Purge(context.Context, int) (int64, error) { return 0, nil }
func ctxWithTenant(id uint64) context.Context {
return context.WithValue(context.Background(), types.TenantIDContextKey, id)
}
func TestAuditSinkAdapter_EmitsWithTenant(t *testing.T) {
f := &fakeAuditSvc{}
sink := newAuditSinkAdapter(f)
sink.EmitIndexCreated(ctxWithTenant(42), "weknora_768", 768)
if len(f.logged) != 1 {
t.Fatalf("want 1 audit entry, got %d", len(f.logged))
}
e := f.logged[0]
if e.TenantID != 42 {
t.Errorf("tenant: want 42, got %d", e.TenantID)
}
if e.Action != types.AuditActionOpenSearchIndexCreated {
t.Errorf("action: want %s, got %s", types.AuditActionOpenSearchIndexCreated, e.Action)
}
if e.Details == nil {
t.Error("details should be populated")
}
sink.EmitReindexExecuted(ctxWithTenant(7), "src", "dst", 9)
if len(f.logged) != 2 || f.logged[1].Action != types.AuditActionOpenSearchReindexExecuted {
t.Errorf("reindex audit not recorded: %+v", f.logged)
}
}
func TestAuditSinkAdapter_SkipsWithoutTenant(t *testing.T) {
f := &fakeAuditSvc{}
sink := newAuditSinkAdapter(f)
// background ctx carries no tenant → adapter must skip (never write tenant=0)
sink.EmitIndexCreated(context.Background(), "weknora_768", 768)
sink.EmitReindexExecuted(context.Background(), "a", "b", 1)
if len(f.logged) != 0 {
t.Errorf("want 0 audit entries without tenant, got %d", len(f.logged))
}
}
func TestAuditSinkAdapter_NilServiceNoPanic(t *testing.T) {
sink := newAuditSinkAdapter(nil)
sink.EmitIndexCreated(ctxWithTenant(1), "x", 1) // must not panic
}


@@ -40,6 +40,7 @@ import (
elasticsearchRepoV8 "github.com/Tencent/WeKnora/internal/application/repository/retriever/elasticsearch/v8"
milvusRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/milvus"
neo4jRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/neo4j"
openSearchRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/opensearch"
postgresRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/postgres"
qdrantRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/qdrant"
sqliteRetrieverRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/sqlite"
@@ -785,10 +786,16 @@ func initFileService(cfg *config.Config) (interfaces.FileService, error) {
// Returns:
// - Configured retrieval engine registry
// - Error if initialization fails
-func initRetrieveEngineRegistry(db *gorm.DB, cfg *config.Config) (interfaces.RetrieveEngineRegistry, error) {
func initRetrieveEngineRegistry(
db *gorm.DB, cfg *config.Config, auditSvc interfaces.AuditLogService,
) (interfaces.RetrieveEngineRegistry, error) {
registry := retriever.NewRetrieveEngineRegistry()
retrieveDriver := strings.Split(os.Getenv("RETRIEVE_DRIVER"), ",")
log := logger.GetLogger(context.Background())
// Audit sink for OpenSearch driver events (index created / reindex). Driver
// events fire under a tenant-scoped ctx at indexing time; the env-path
// registration ctx below has no tenant, so those emits self-skip.
auditSink := newAuditSinkAdapter(auditSvc)
if slices.Contains(retrieveDriver, "postgres") {
postgresRepo := postgresRepo.NewPostgresRetrieveEngineRepository(db)
@@ -854,6 +861,29 @@ func initRetrieveEngineRegistry(db *gorm.DB, cfg *config.Config) (interfaces.Ret
}
}
if slices.Contains(retrieveDriver, "opensearch") {
cc := &types.ConnectionConfig{
Addr: os.Getenv("OPENSEARCH_ADDR"),
Username: os.Getenv("OPENSEARCH_USERNAME"),
Password: os.Getenv("OPENSEARCH_PASSWORD"),
InsecureSkipVerify: strings.EqualFold(os.Getenv("OPENSEARCH_INSECURE_SKIP_VERIFY"), "true"),
}
client, err := openSearchRepo.NewOpenSearchClient(cc)
if err != nil {
log.Errorf("Create opensearch client failed: %v", err)
} else if repo, err := openSearchRepo.NewRepository(
context.Background(), client, "", nil, openSearchRepo.WithAuditSink(auditSink),
); err != nil {
log.Errorf("Create opensearch repository failed: %v", err)
} else if err := registry.Register(
retriever.NewKVHybridRetrieveEngine(repo, types.OpenSearchRetrieverEngineType),
); err != nil {
log.Errorf("Register opensearch retrieve engine failed: %v", err)
} else {
log.Infof("Register opensearch retrieve engine success")
}
}
if slices.Contains(retrieveDriver, "qdrant") {
qdrantHost := os.Getenv("QDRANT_HOST")
if qdrantHost == "" {
@@ -1061,7 +1091,7 @@ func initRetrieveEngineRegistry(db *gorm.DB, cfg *config.Config) (interfaces.Ret
}
// ─── DB store registration (byStoreID) ───
if storeReg, ok := registry.(*retriever.RetrieveEngineRegistry); ok {
-loadDBStoresIntoRegistry(storeReg, db, cfg)
loadDBStoresIntoRegistry(storeReg, db, cfg, auditSink)
}
return registry, nil
@@ -1069,7 +1099,9 @@ func initRetrieveEngineRegistry(db *gorm.DB, cfg *config.Config) (interfaces.Ret
// loadDBStoresIntoRegistry loads VectorStore records from DB and registers them
// in the registry's byStoreID map. Failures are logged and skipped (non-fatal).
-func loadDBStoresIntoRegistry(storeRegistry interfaces.StoreRegistry, db *gorm.DB, cfg *config.Config) {
func loadDBStoresIntoRegistry(
storeRegistry interfaces.StoreRegistry, db *gorm.DB, cfg *config.Config, auditSink openSearchRepo.AuditSink,
) {
ctx := context.Background()
log := logger.GetLogger(ctx)
@@ -1086,7 +1118,7 @@ func loadDBStoresIntoRegistry(storeRegistry interfaces.StoreRegistry, db *gorm.D
log.Infof("Loading %d vector store(s) from database", len(stores))
for _, store := range stores {
-svc, err := createEngineServiceFromStore(ctx, store, db, cfg)
svc, err := createEngineServiceFromStore(ctx, store, db, cfg, auditSink)
if err != nil {
log.Errorf("Failed to create engine for store %s (%s): %v", store.ID, store.Name, err)
continue


@@ -23,6 +23,7 @@ import (
elasticsearchRepoV7 "github.com/Tencent/WeKnora/internal/application/repository/retriever/elasticsearch/v7"
elasticsearchRepoV8 "github.com/Tencent/WeKnora/internal/application/repository/retriever/elasticsearch/v8"
milvusRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/milvus"
openSearchRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/opensearch"
postgresRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/postgres"
qdrantRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/qdrant"
sqliteRetrieverRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/sqlite"
@@ -35,21 +36,27 @@ import (
"github.com/tencent/vectordatabase-sdk-go/tcvectordb"
)
-// NewEngineFactory returns an EngineFactory function closed over db and cfg.
-// Registered in dig and injected into VectorStoreService for dynamic registry updates.
-func NewEngineFactory(db *gorm.DB, cfg *config.Config) interfaces.EngineFactory {
// NewEngineFactory returns an EngineFactory function closed over db, cfg, and
// an audit sink (built from the AuditLogService). Registered in dig and
// injected into VectorStoreService for dynamic registry updates. The
// EngineFactory type itself is unchanged — the audit sink is captured in the
// closure rather than added to the signature.
func NewEngineFactory(db *gorm.DB, cfg *config.Config, auditSvc interfaces.AuditLogService) interfaces.EngineFactory {
sink := newAuditSinkAdapter(auditSvc)
return func(ctx context.Context, store types.VectorStore) (interfaces.RetrieveEngineService, error) {
-return createEngineServiceFromStore(ctx, store, db, cfg)
return createEngineServiceFromStore(ctx, store, db, cfg, sink)
}
}
// createEngineServiceFromStore creates a RetrieveEngineService from a VectorStore's config.
// This is the DB store counterpart of the env-based initialization in initRetrieveEngineRegistry.
// auditSink may be nil (audit becomes a no-op).
func createEngineServiceFromStore(
ctx context.Context,
store types.VectorStore,
db *gorm.DB,
cfg *config.Config,
auditSink openSearchRepo.AuditSink,
) (interfaces.RetrieveEngineService, error) {
switch store.EngineType {
case types.PostgresRetrieverEngineType:
@@ -68,11 +75,40 @@ func createEngineServiceFromStore(
return createSQLiteEngine(store, db)
case types.TencentVectorDBRetrieverEngineType:
return createTencentVectorDBEngine(store)
case types.OpenSearchRetrieverEngineType:
return createOpenSearchEngine(ctx, store, auditSink)
default:
return nil, fmt.Errorf("unsupported engine type: %s", store.EngineType)
}
}
// createOpenSearchEngine builds an OpenSearch k-NN retrieve engine. Mirrors
// createElasticsearchV8Engine but uses the driver's TLS-hardened client
// constructor and injects the audit sink. NewRepository probes the cluster
// (version + k-NN plugin), so an unreachable cluster fails here at
// registration rather than on first query.
func createOpenSearchEngine(
ctx context.Context, store types.VectorStore, auditSink openSearchRepo.AuditSink,
) (interfaces.RetrieveEngineService, error) {
client, err := openSearchRepo.NewOpenSearchClient(&store.ConnectionConfig)
if err != nil {
return nil, fmt.Errorf("create opensearch client: %w", err)
}
// Env stores share the cluster without a per-store index prefix; DB stores
// fold their (>=16-char) ID into the index name. NewRepository enforces the
// length rule, so map env-store IDs to "".
storeID := store.ID
if types.IsEnvStoreID(storeID) {
storeID = ""
}
repo, err := openSearchRepo.NewRepository(ctx, client, storeID, &store.IndexConfig,
openSearchRepo.WithAuditSink(auditSink))
if err != nil {
return nil, fmt.Errorf("create opensearch repository: %w", err)
}
return retriever.NewKVHybridRetrieveEngine(repo, types.OpenSearchRetrieverEngineType), nil
}
func createPostgresEngine(store types.VectorStore, db *gorm.DB) (interfaces.RetrieveEngineService, error) {
if store.ConnectionConfig.UseDefaultConnection {
repo := postgresRepo.NewPostgresRetrieveEngineRepository(db)
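
The store-ID mapping inside createOpenSearchEngine (env stores get no index prefix, DB stores fold their ID into the index name) can be sketched as below. The real `types.IsEnvStoreID` is not shown in this diff, so `isEnvStoreID` here is a hypothetical stand-in that assumes env stores use IDs shaped like `__env_opensearch__` and `__env_qdrant__`; `indexPrefix` is likewise an illustrative name.

```go
package main

import (
	"fmt"
	"strings"
)

// isEnvStoreID is a hypothetical stand-in for types.IsEnvStoreID. It assumes
// env-store IDs have the form "__env_<driver>__".
func isEnvStoreID(id string) bool {
	return strings.HasPrefix(id, "__env_") && strings.HasSuffix(id, "__")
}

// indexPrefix returns the store ID to fold into index names: empty for env
// stores (shared cluster, no prefix), the ID itself for DB stores.
func indexPrefix(storeID string) string {
	if isEnvStoreID(storeID) {
		return ""
	}
	return storeID
}

func main() {
	fmt.Printf("%q\n", indexPrefix("__env_opensearch__"))
	fmt.Printf("%q\n", indexPrefix("0123456789abcdef"))
}
```

Mapping to the empty string before calling the repository constructor keeps the length rule in one place: the constructor only ever sees either no prefix or a full-length DB store ID.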


@@ -0,0 +1,104 @@
package container
import (
"context"
"net/http"
"net/http/httptest"
"testing"
"gorm.io/driver/sqlite"
"gorm.io/gorm"
"github.com/Tencent/WeKnora/internal/config"
"github.com/Tencent/WeKnora/internal/types"
)
// osClusterHandler simulates an OpenSearch cluster for the driver's
// construction probe: GET / (version) + /_cat/plugins (k-NN on every node).
func osClusterHandler(distribution, number string, knnInstalled bool) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.URL.Path == "/_cat/plugins" {
if knnInstalled {
_, _ = w.Write([]byte(`[{"name":"node-1","component":"opensearch-knn"}]`))
} else {
_, _ = w.Write([]byte(`[{"name":"node-1","component":"opensearch-sql"}]`))
}
return
}
_, _ = w.Write([]byte(`{"version":{"distribution":"` + distribution + `","number":"` + number + `"}}`))
}
}
func TestCreateOpenSearchEngine_WiresClientAndRepo(t *testing.T) {
ts := httptest.NewServer(osClusterHandler("opensearch", "3.3.2", true))
defer ts.Close()
store := types.VectorStore{
ID: "", // env-style (no index prefix)
EngineType: types.OpenSearchRetrieverEngineType,
ConnectionConfig: types.ConnectionConfig{Addr: ts.URL},
}
svc, err := createOpenSearchEngine(context.Background(), store, nil)
if err != nil {
t.Fatalf("createOpenSearchEngine: %v", err)
}
if svc == nil {
t.Fatal("want non-nil engine service")
}
if svc.EngineType() != types.OpenSearchRetrieverEngineType {
t.Errorf("engine type: want opensearch, got %s", svc.EngineType())
}
}
func TestCreateOpenSearchEngine_RejectsBadCluster(t *testing.T) {
// Elasticsearch distribution must be rejected at construction.
ts := httptest.NewServer(osClusterHandler("elasticsearch", "8.10.4", true))
defer ts.Close()
_, err := createOpenSearchEngine(context.Background(),
types.VectorStore{EngineType: types.OpenSearchRetrieverEngineType,
ConnectionConfig: types.ConnectionConfig{Addr: ts.URL}}, nil)
if err == nil {
t.Error("elasticsearch cluster should be rejected at engine creation")
}
}
func TestCreateEngineServiceFromStore_OpenSearchCaseReached(t *testing.T) {
ts := httptest.NewServer(osClusterHandler("opensearch", "2.11.0", true))
defer ts.Close()
svc, err := createEngineServiceFromStore(context.Background(),
types.VectorStore{EngineType: types.OpenSearchRetrieverEngineType,
ConnectionConfig: types.ConnectionConfig{Addr: ts.URL}},
nil, &config.Config{}, nil)
if err != nil {
t.Fatalf("createEngineServiceFromStore (opensearch case): %v", err)
}
if svc == nil || svc.EngineType() != types.OpenSearchRetrieverEngineType {
t.Errorf("opensearch case not wired correctly: %v", svc)
}
}
// TestInitRetrieveEngineRegistry_OpenSearchEnvPath exercises the
// RETRIEVE_DRIVER=opensearch env-path registration block end to end.
func TestInitRetrieveEngineRegistry_OpenSearchEnvPath(t *testing.T) {
ts := httptest.NewServer(osClusterHandler("opensearch", "3.3.2", true))
defer ts.Close()
// In-memory DB: the vector_stores table is absent, so loadDBStores logs
// and returns (non-fatal) — only the env-path block matters here.
db, err := gorm.Open(sqlite.Open(":memory:"), &gorm.Config{})
if err != nil {
t.Fatalf("open in-mem db: %v", err)
}
t.Setenv("RETRIEVE_DRIVER", "opensearch")
t.Setenv("OPENSEARCH_ADDR", ts.URL)
registry, err := initRetrieveEngineRegistry(db, &config.Config{}, &fakeAuditSvc{})
if err != nil {
t.Fatalf("initRetrieveEngineRegistry: %v", err)
}
if _, err := registry.GetRetrieveEngineService(types.OpenSearchRetrieverEngineType); err != nil {
t.Errorf("opensearch engine not registered via env path: %v", err)
}
}
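
The fake cluster in these tests serves a root document of the form `{"version":{"distribution":...,"number":...}}`. A version gate over that document might look like the sketch below. It is an assumption-laden illustration rather than the driver's probeVersion (which is not part of this diff): `rootInfo` and `checkVersion` are hypothetical names, and the 2.4 floor is taken from the connection-test error message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// rootInfo captures the two fields the fake cluster handler returns from GET /.
type rootInfo struct {
	Version struct {
		Distribution string `json:"distribution"`
		Number       string `json:"number"`
	} `json:"version"`
}

// checkVersion accepts only an OpenSearch distribution at 2.4 or newer.
// Major and minor are compared numerically, so "2.11.0" ranks above "2.4.0".
func checkVersion(body []byte) error {
	var info rootInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return fmt.Errorf("decode root response: %w", err)
	}
	v := info.Version
	if v.Distribution != "opensearch" {
		return fmt.Errorf("unsupported distribution %q", v.Distribution)
	}
	parts := strings.SplitN(v.Number, ".", 3)
	if len(parts) < 2 {
		return fmt.Errorf("malformed version %q", v.Number)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return fmt.Errorf("malformed major version in %q", v.Number)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return fmt.Errorf("malformed minor version in %q", v.Number)
	}
	if major < 2 || (major == 2 && minor < 4) {
		return fmt.Errorf("version %s is older than 2.4", v.Number)
	}
	return nil
}

func main() {
	for _, body := range []string{
		`{"version":{"distribution":"opensearch","number":"2.11.0"}}`,
		`{"version":{"distribution":"elasticsearch","number":"8.10.4"}}`,
	} {
		fmt.Println(checkVersion([]byte(body)))
	}
}
```

The 2.11.0 case used in TestCreateEngineServiceFromStore_OpenSearchCaseReached is a useful one to keep: a plain string comparison would rank it below 2.4.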


@@ -50,6 +50,10 @@ var retrieverEngineMapping = map[string][]RetrieverEngineParams{
{RetrieverType: KeywordsRetrieverType, RetrieverEngineType: TencentVectorDBRetrieverEngineType},
{RetrieverType: VectorRetrieverType, RetrieverEngineType: TencentVectorDBRetrieverEngineType},
},
"opensearch": {
{RetrieverType: KeywordsRetrieverType, RetrieverEngineType: OpenSearchRetrieverEngineType},
{RetrieverType: VectorRetrieverType, RetrieverEngineType: OpenSearchRetrieverEngineType},
},
}
// GetRetrieverEngineMapping returns the retriever engine mapping


@@ -86,6 +86,7 @@ var validEngineTypes = map[RetrieverEngineType]bool{
WeaviateRetrieverEngineType: true,
DorisRetrieverEngineType: true,
TencentVectorDBRetrieverEngineType: true,
OpenSearchRetrieverEngineType: true,
}
// IsValidEngineType checks whether the given engine type is valid for VectorStore.
@@ -250,6 +251,14 @@ type IndexConfig struct {
DesiredShardCount int `yaml:"desired_shard_count" json:"desired_shard_count,omitempty"` // Weaviate: number of shards per collection
BucketsNum int `yaml:"buckets_num" json:"buckets_num,omitempty"` // Doris: number of buckets per table (DISTRIBUTED BY HASH ... BUCKETS N)
ReplicationNum int `yaml:"replication_num" json:"replication_num,omitempty"` // Doris: replication_num PROPERTIES
// --- OpenSearch k-NN HNSW fields ---
// All omitempty so other engines' serialized IndexConfig is unchanged.
// Zero / empty values fall back to the driver defaults in buildInternalCfg.
HNSWM int `yaml:"hnsw_m" json:"hnsw_m,omitempty"` // OpenSearch: HNSW graph degree (M)
HNSWEFConstruction int `yaml:"hnsw_ef_construction" json:"hnsw_ef_construction,omitempty"` // OpenSearch: HNSW index-build candidate list size
HNSWEFSearch int `yaml:"hnsw_ef_search" json:"hnsw_ef_search,omitempty"` // OpenSearch: HNSW search candidate list size (faiss; lucene reads at query time)
KNNEngine string `yaml:"knn_engine" json:"knn_engine,omitempty"` // OpenSearch: k-NN backend ("lucene" | "faiss")
}
// Value implements the driver.Valuer interface.
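
The "byte-identical for other engines" claim rests on how encoding/json treats `omitempty`: zero-valued ints and empty strings are dropped from the output entirely. A minimal sketch, using a cut-down hypothetical `indexConfig` (the `index_name` and `number_of_shards` tags are assumed here; only the HNSW tags appear in this diff):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexConfig is a cut-down stand-in for types.IndexConfig.
type indexConfig struct {
	IndexName      string `json:"index_name,omitempty"`
	NumberOfShards int    `json:"number_of_shards,omitempty"`
	HNSWM          int    `json:"hnsw_m,omitempty"`
	KNNEngine      string `json:"knn_engine,omitempty"`
}

// mustJSON marshals v and panics on failure, which cannot happen for this type.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A non-OpenSearch store never sets the HNSW fields, so they never appear.
	fmt.Println(mustJSON(indexConfig{IndexName: "weknora", NumberOfShards: 4}))
	// An OpenSearch store that sets them gets them serialized.
	fmt.Println(mustJSON(indexConfig{HNSWM: 24, KNNEngine: "faiss"}))
}
```

The same mechanism is why an explicit zero cannot be stored: a field set to 0 serializes exactly like an unset one, which fits the "zero falls back to the driver default" convention.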
@@ -727,9 +736,32 @@ func GetVectorStoreTypes() []VectorStoreTypeInfo {
{Name: "replication_num", Type: "number", Required: false, Description: "Replication Num", Default: 1},
},
},
{
Type: "opensearch",
DisplayName: "OpenSearch",
ConnectionFields: []VectorStoreFieldInfo{
{Name: "addr", Type: "string", Required: true, Description: "URL", Default: "https://localhost:9200"},
{Name: "username", Type: "string", Required: false, Description: "Username", Default: "admin"},
{Name: "password", Type: "string", Required: false, Sensitive: true, Description: "Password"},
{Name: "insecure_skip_verify", Type: "boolean", Required: false, Default: false,
Description: "Skip TLS certificate verification. For self-signed dev clusters only — never enable in production."},
},
IndexFields: []VectorStoreFieldInfo{
{Name: "index_name", Type: "string", Required: false, Description: "Index Name", Default: "weknora"},
{Name: "number_of_shards", Type: "number", Required: false, Description: "Shards", Default: 4, Min: floatPtr(1), Max: floatPtr(64)},
{Name: "number_of_replicas", Type: "number", Required: false, Description: "Replicas", Default: 1, Min: floatPtr(0), Max: floatPtr(10)},
{Name: "hnsw_m", Type: "number", Required: false, Description: "HNSW graph degree (M). Immutable after index creation.", Default: 16, Min: floatPtr(2), Max: floatPtr(100), Immutable: true},
{Name: "hnsw_ef_construction", Type: "number", Required: false, Description: "HNSW build candidate list. Higher (e.g. 200-512) improves recall at the cost of build time. Immutable after creation.", Default: 100, Min: floatPtr(2), Max: floatPtr(4096), Immutable: true},
{Name: "hnsw_ef_search", Type: "number", Required: false, Description: "HNSW search candidate list. Effective on the faiss engine; the lucene engine reads it at query time. Immutable (no settings-update path).", Default: 100, Min: floatPtr(1), Max: floatPtr(10000), Immutable: true},
{Name: "knn_engine", Type: "string", Required: false, Description: "k-NN backend.", Default: "lucene", Enum: []string{"lucene", "faiss"}, Immutable: true},
},
},
}
}
// floatPtr returns a pointer to v, for setting VectorStoreFieldInfo Min/Max.
func floatPtr(v float64) *float64 { return &v }
// ---------------------------------------------------------------------------
// BuildEnvVectorStores — virtual stores from RETRIEVE_DRIVER env var
// ---------------------------------------------------------------------------
@@ -821,6 +853,21 @@ func buildEnvStoreForDriver(driver string, envLookup EnvLookupFunc) *VectorStore
IndexName: envLookup("ELASTICSEARCH_INDEX"),
},
}
case "opensearch":
return &VectorStore{
ID: "__env_opensearch__",
Name: "OpenSearch",
EngineType: OpenSearchRetrieverEngineType,
ConnectionConfig: ConnectionConfig{
Addr: envLookup("OPENSEARCH_ADDR"),
Username: envLookup("OPENSEARCH_USERNAME"),
Password: envLookup("OPENSEARCH_PASSWORD"),
InsecureSkipVerify: strings.EqualFold(envLookup("OPENSEARCH_INSECURE_SKIP_VERIFY"), "true"),
},
IndexConfig: IndexConfig{
IndexName: envLookup("OPENSEARCH_INDEX"),
},
}
case "qdrant":
return &VectorStore{
ID: "__env_qdrant__",
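
The OPENSEARCH_INSECURE_SKIP_VERIFY handling in the opensearch case relies on `strings.EqualFold(value, "true")`, which is deliberately strict. A small sketch of what that accepts and rejects, with `envBool` as a hypothetical helper name and a function-typed lookup standing in for EnvLookupFunc:

```go
package main

import (
	"fmt"
	"strings"
)

// envBool reports true only when the looked-up value is "true" in any letter
// case. Every other value, including "1", "yes", and the empty string, is false.
func envBool(lookup func(string) string, key string) bool {
	return strings.EqualFold(lookup(key), "true")
}

func main() {
	for _, v := range []string{"true", "TRUE", "1", "yes", ""} {
		lookup := func(string) string { return v }
		fmt.Printf("%q -> %v\n", v, envBool(lookup, "OPENSEARCH_INSECURE_SKIP_VERIFY"))
	}
}
```

Failing closed on anything other than a literal "true" means a typo or an unset variable leaves TLS verification on, which is the safe default for a flag that weakens transport security.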


@@ -0,0 +1,154 @@
package types
import (
"encoding/json"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// findVSType returns the VectorStoreTypeInfo with the given Type, or fails.
func findVSType(t *testing.T, typeName string) VectorStoreTypeInfo {
t.Helper()
for _, vt := range GetVectorStoreTypes() {
if vt.Type == typeName {
return vt
}
}
t.Fatalf("GetVectorStoreTypes() has no entry for %q", typeName)
return VectorStoreTypeInfo{}
}
// findField returns the field with the given Name from a slice, or fails.
func findField(t *testing.T, fields []VectorStoreFieldInfo, name string) VectorStoreFieldInfo {
t.Helper()
for _, f := range fields {
if f.Name == name {
return f
}
}
t.Fatalf("no field named %q", name)
return VectorStoreFieldInfo{}
}
// TestIndexConfig_OpenSearchFieldsOmittedForOtherEngines verifies the new HNSW
// fields are omitempty so other engines' serialized IndexConfig is unchanged.
func TestIndexConfig_OpenSearchFieldsOmittedForOtherEngines(t *testing.T) {
t.Run("omitted when unset", func(t *testing.T) {
b, err := json.Marshal(IndexConfig{IndexName: "weknora", NumberOfShards: 4})
require.NoError(t, err)
s := string(b)
assert.NotContains(t, s, "hnsw_m")
assert.NotContains(t, s, "hnsw_ef_construction")
assert.NotContains(t, s, "hnsw_ef_search")
assert.NotContains(t, s, "knn_engine")
})
t.Run("present when set", func(t *testing.T) {
b, err := json.Marshal(IndexConfig{
HNSWM: 24,
HNSWEFConstruction: 200,
HNSWEFSearch: 128,
KNNEngine: "faiss",
})
require.NoError(t, err)
s := string(b)
assert.Contains(t, s, `"hnsw_m":24`)
assert.Contains(t, s, `"hnsw_ef_construction":200`)
assert.Contains(t, s, `"hnsw_ef_search":128`)
assert.Contains(t, s, `"knn_engine":"faiss"`)
})
}
// TestIsValidEngineType_OpenSearch verifies OpenSearch is now a valid DB-store engine.
func TestIsValidEngineType_OpenSearch(t *testing.T) {
assert.True(t, IsValidEngineType(OpenSearchRetrieverEngineType))
}
// TestGetVectorStoreTypes_OpenSearchEntry verifies the OpenSearch metadata entry
// exposes the connection + HNSW index fields with the right bounds/enum/immutable.
func TestGetVectorStoreTypes_OpenSearchEntry(t *testing.T) {
vt := findVSType(t, "opensearch")
assert.Equal(t, "OpenSearch", vt.DisplayName)
// Connection fields
insecure := findField(t, vt.ConnectionFields, "insecure_skip_verify")
assert.Equal(t, "boolean", insecure.Type)
assert.Equal(t, false, insecure.Default) // never default-true
pw := findField(t, vt.ConnectionFields, "password")
assert.True(t, pw.Sensitive)
// HNSW index fields: bounds match the flat-validator-aligned caps (14-D)
m := findField(t, vt.IndexFields, "hnsw_m")
require.NotNil(t, m.Min)
require.NotNil(t, m.Max)
assert.Equal(t, 2.0, *m.Min)
assert.Equal(t, 100.0, *m.Max)
assert.True(t, m.Immutable)
shards := findField(t, vt.IndexFields, "number_of_shards")
require.NotNil(t, shards.Max)
assert.Equal(t, 64.0, *shards.Max) // flat ValidateIndexConfig maxShards
replicas := findField(t, vt.IndexFields, "number_of_replicas")
require.NotNil(t, replicas.Max)
assert.Equal(t, 10.0, *replicas.Max) // flat maxReplicas
eng := findField(t, vt.IndexFields, "knn_engine")
assert.ElementsMatch(t, []string{"lucene", "faiss"}, eng.Enum)
assert.True(t, eng.Immutable)
efs := findField(t, vt.IndexFields, "hnsw_ef_search")
assert.True(t, efs.Immutable) // no PutSettings path → immutable
}
// TestBuildEnvVectorStores_OpenSearch verifies the env-store builder case.
func TestBuildEnvVectorStores_OpenSearch(t *testing.T) {
lookup := mockEnvLookup(map[string]string{
"OPENSEARCH_ADDR": "https://os:9200",
"OPENSEARCH_USERNAME": "admin",
"OPENSEARCH_PASSWORD": "secret",
"OPENSEARCH_INDEX": "weknora",
"OPENSEARCH_INSECURE_SKIP_VERIFY": "true",
})
stores := BuildEnvVectorStores("opensearch", lookup)
require.Len(t, stores, 1)
s := stores[0]
assert.Equal(t, "__env_opensearch__", s.ID)
assert.Equal(t, OpenSearchRetrieverEngineType, s.EngineType)
assert.Equal(t, "https://os:9200", s.ConnectionConfig.Addr)
assert.Equal(t, "admin", s.ConnectionConfig.Username)
assert.Equal(t, "secret", s.ConnectionConfig.Password)
assert.True(t, s.ConnectionConfig.InsecureSkipVerify)
assert.Equal(t, "weknora", s.IndexConfig.IndexName)
}
// TestBuildEnvVectorStores_OpenSearch_InsecureDefaultsFalse verifies the TLS
// skip flag is false unless the env var is explicitly "true".
func TestBuildEnvVectorStores_OpenSearch_InsecureDefaultsFalse(t *testing.T) {
lookup := mockEnvLookup(map[string]string{"OPENSEARCH_ADDR": "https://os:9200"})
stores := BuildEnvVectorStores("opensearch", lookup)
require.Len(t, stores, 1)
assert.False(t, stores[0].ConnectionConfig.InsecureSkipVerify)
}
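The "false unless explicitly true" rule tested above can be pictured with a small sketch. The `envLookup` type, `mapLookup` helper, and `insecureSkipVerify` function are hypothetical names for this example, and the exact-match comparison against "true" is an assumption; the real builder may accept other spellings.

```go
package main

import "fmt"

// envLookup has the same shape as os.LookupEnv (assumed here; the
// signature of the PR's mockEnvLookup is not shown in this section).
type envLookup func(key string) (string, bool)

// mapLookup builds an envLookup backed by a map, for illustration.
func mapLookup(m map[string]string) envLookup {
	return func(key string) (string, bool) {
		v, ok := m[key]
		return v, ok
	}
}

// insecureSkipVerify returns true only when the variable is present
// and set to exactly "true"; anything else keeps TLS verification on.
func insecureSkipVerify(lookup envLookup) bool {
	v, ok := lookup("OPENSEARCH_INSECURE_SKIP_VERIFY")
	return ok && v == "true"
}

func main() {
	unset := mapLookup(map[string]string{"OPENSEARCH_ADDR": "https://os:9200"})
	set := mapLookup(map[string]string{"OPENSEARCH_INSECURE_SKIP_VERIFY": "true"})
	fmt.Println(insecureSkipVerify(unset), insecureSkipVerify(set))
}
```

Defaulting to the secure value and requiring an explicit opt-in means a missing or mistyped variable cannot silently disable certificate checks.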
// TestRetrieverEngineMapping_OpenSearch verifies the RETRIEVE_DRIVER mapping.
func TestRetrieverEngineMapping_OpenSearch(t *testing.T) {
m := GetRetrieverEngineMapping()
params, ok := m["opensearch"]
require.True(t, ok, `retrieverEngineMapping missing "opensearch"`)
require.Len(t, params, 2)
var hasKeywords, hasVector bool
for _, p := range params {
assert.Equal(t, OpenSearchRetrieverEngineType, p.RetrieverEngineType)
switch p.RetrieverType {
case KeywordsRetrieverType:
hasKeywords = true
case VectorRetrieverType:
hasVector = true
}
}
assert.True(t, hasKeywords && hasVector, "expected both Keywords and Vector retriever types")
}

@@ -246,7 +246,7 @@ func TestGetVectorStoreTypes(t *testing.T) {
 types := GetVectorStoreTypes()
 t.Run("returns supported external engine types (excludes postgres and sqlite)", func(t *testing.T) {
-assert.Len(t, types, 6)
+assert.Len(t, types, 7)
 })
 t.Run("type names match engine constants", func(t *testing.T) {
@@ -260,6 +260,7 @@ func TestGetVectorStoreTypes(t *testing.T) {
 assert.Contains(t, typeNames, "tencent_vectordb")
 assert.Contains(t, typeNames, "weaviate")
 assert.Contains(t, typeNames, "doris")
+assert.Contains(t, typeNames, "opensearch")
 assert.NotContains(t, typeNames, "postgres")
 assert.NotContains(t, typeNames, "sqlite")
 })
@@ -399,9 +400,10 @@ func TestIsValidEngineType(t *testing.T) {
 // GetVectorStoreTypes does not list them, Validate rejects them, and
 // env stores reach the engine registry through BuildEnvVectorStores
 // instead of through CreateStore.
+// Note: opensearch is now a VALID DB-store engine (activated in this PR);
+// see TestIsValidEngineType_OpenSearch in vectorstore_opensearch_test.go.
 invalidTypes := []RetrieverEngineType{
 "unknown",
-"opensearch",
 "",
 PostgresRetrieverEngineType,
 SQLiteRetrieverEngineType,
@@ -1350,64 +1352,3 @@ func TestOpenSearchRetrieverEngineType_DistinctFromExisting(t *testing.T) {
 "OpenSearch wire value must not collide with %s", e)
 }
 }
-// TestOpenSearchRetrieverEngineType_NotInValidEngineTypes verifies
-// that PR 1 does NOT add the new engine type to validEngineTypes —
-// this is the gate that keeps OpenSearch VectorStore registration
-// rejected until activation lands in a later PR.
-func TestOpenSearchRetrieverEngineType_NotInValidEngineTypes(t *testing.T) {
-assert.False(t, IsValidEngineType(OpenSearchRetrieverEngineType),
-"OpenSearch must remain invalid for VectorStore registration "+
-"until activation lands in a later PR (gated activation)")
-}
-// The next three tests are defense-in-depth companions to
-// TestOpenSearchRetrieverEngineType_NotInValidEngineTypes. Each pins
-// a separate activation surface that must remain closed in PR 1 (and
-// in PR 2). Activation lands together with a coordinated flip in
-// PR 3, at which point each of these `assert.False` / `assert.NotContains`
-// / `assert.Nil` lines flips to its positive counterpart in the same
-// diff — the test suite becomes the activation checklist.
-// TestRetrieverEngineMapping_OpenSearchNotRegistered pins that
-// `retrieverEngineMapping` does not have an `"opensearch"` key. Without
-// this entry, setting `RETRIEVE_DRIVER=opensearch` is a silent no-op
-// (GetDefaultRetrieverEngines drops the unknown driver from its loop
-// at tenant.go).
-func TestRetrieverEngineMapping_OpenSearchNotRegistered(t *testing.T) {
-mapping := GetRetrieverEngineMapping()
-_, ok := mapping["opensearch"]
-assert.False(t, ok,
-"retrieverEngineMapping must not register opensearch until "+
-"activation lands in a later PR (gated activation)")
-}
-// TestGetVectorStoreTypes_OmitsOpenSearch pins that the
-// /api/v1/vector-stores/types response does NOT list opensearch.
-// Without this entry, the UI dropdown cannot offer OpenSearch as a
-// store type even if a frontend renderer accidentally tries.
-func TestGetVectorStoreTypes_OmitsOpenSearch(t *testing.T) {
-listed := GetVectorStoreTypes()
-for _, info := range listed {
-assert.NotEqual(t, "opensearch", info.Type,
-"GetVectorStoreTypes must not surface opensearch until "+
-"activation lands in a later PR (gated activation)")
-}
-}
-// TestBuildEnvVectorStores_OpenSearchSkipped pins that
-// `BuildEnvVectorStores("opensearch", lookup)` returns nil — the
-// `default:` arm of `buildEnvStoreForDriver`. Without an explicit
-// `case "opensearch":` arm, container.go cannot synthesize an env
-// store for the driver, completing the third lock on the activation
-// chain.
-func TestBuildEnvVectorStores_OpenSearchSkipped(t *testing.T) {
-stores := BuildEnvVectorStores("opensearch", mockEnvLookup(map[string]string{
-"OPENSEARCH_ADDR": "https://os:9200",
-"OPENSEARCH_USERNAME": "admin",
-}))
-assert.Empty(t, stores,
-"BuildEnvVectorStores must not synthesize an env store for "+
-"opensearch until activation lands in a later PR (gated "+
-"activation)")
-}