mirror of
https://github.com/Tencent/WeKnora.git
synced 2026-06-04 13:30:32 +08:00
feat(retriever): activate OpenSearch k-NN driver (PR 3 of 3)
Phase 3 (#1440) gate flip. PR 1 (#1445), PR 2a (#1481) and PR 2b (#1482) laid down the type prep, the driver skeleton and the read/write paths as gated dead code; this PR wires every activation surface so that opensearch becomes a registerable VectorStore engine.

Activation wiring
- internal/types: validEngineTypes, GetVectorStoreTypes (with HNSW bounds, the knn_engine enum and Immutable hints), retrieverEngineMapping and buildEnvStoreForDriver: every gated surface now recognises "opensearch". IndexConfig gains four omitempty HNSW fields (HNSWM / HNSWEFConstruction / HNSWEFSearch / KNNEngine), so other engines' serialised config stays byte-identical.
- internal/container: createOpenSearchEngine plus the switch case in createEngineServiceFromStore; the RETRIEVE_DRIVER=opensearch env path in initRetrieveEngineRegistry; NewEngineFactory now closes over the AuditLogService (the EngineFactory type itself is unchanged).
- internal/application/service/vectorstore_healthcheck.go: a testOpenSearchConnection case, so CreateStore's connectivity probe accepts opensearch instead of returning 400.
- internal/application/repository/retriever/opensearch/transport.go: NewOpenSearchClient is exported so the factory and the env path can build the TLS-hardened client; healthcheck.go reuses the unexported probeVersion / probeKNNPlugin for the service-layer probe.

Service-layer validation
- validateOpenSearchIndexConfig validates the HNSW caps (m 2-100, ef_construction 2-4096, ef_search 1-10000, knn_engine ∈ lucene|faiss). Shards/replicas are still enforced by the flat ValidateIndexConfig. These settings are create-only: UpdateStore mutates the name only.
- validateConnectionConfig requires addr for opensearch.

Sync implementations (stubs.go shrinks)
- CopyIndices (copy.go) mirrors the Elasticsearch / Qdrant pattern (search, then BatchSave with the source_id remap for generated questions), so dim/keyword routing and the source_id contract come from BatchSave for free. embeddingMap is keyed by the *target* SourceID because OpenSearch's BatchSave looks up embeddings by SourceID (lookupEmbedding), not by chunk_id (the ES driver's convention). Pagination is from/size; copies larger than max_result_window (default 10000) need the scroll-based async path that lands later.
- BatchUpdateChunkEnabledStatus / BatchUpdateChunkTagID (bulk_update.go) group the input by target value and issue one _update_by_query per group over the cross-dim <base>_* pattern. Caller values flow only through bound script params and are never string-interpolated into the Painless source, which closes the script-injection surface.
- inspectByQueryResponse (byquery.go) mirrors inspectBulkResponse: the full failure reason goes to the debug log only; the returned error carries the bounded id + type.
- UpdateByQueryParams.Refresh is *bool in opensearch-go v4.6.0 (the same quirk as DeleteByQuery), so refresh=wait_for is not expressible; we use refresh=true.

Driver-owned audit (DIP)
- A new opensearch.AuditSink interface (with a nopSink and a WithAuditSink functional option) lets the driver emit opensearch.index_created and opensearch.reindex_executed events without importing any service package; the service layer implements the interface. NewRepository takes variadic opts, so existing 4-arg test call sites keep compiling unchanged.
- internal/container/audit_sink.go bridges AuditSink to AuditLogService. When the context carries no tenant (for example, the env-path registration ctx during boot), the adapter skips the emit with a warning rather than silently writing tenant_id=0, which would collide with the system-scope sentinel.

Frontend + polish
- FieldSchema (frontend/src/api/vector-store.ts) gains min / max / enum / immutable. VectorStoreSettings.vue is now schema-driven: a closed `enum` renders a t-select; number inputs use the schema's `:min`/`:max` and fall back to the legacy replica-vs-shard heuristic only when the schema does not pin them; a danger-coloured warning appears when insecure_skip_verify is toggled on (the switch and warning are wrapped in a vertical stack so the warning sits on its own row below the switch).
- i18n: labels for hnsw_m / hnsw_ef_construction / hnsw_ef_search / knn_engine / insecure_skip_verify, plus the warning copy, in en-US, ko-KR, zh-CN and ru-RU.
- docker-compose.dev.yml: an opensearch profile (single-node 3.3.2 with the security plugin disabled, for dev only). OpenSearch Dashboards lives in a separate, opt-in opensearch-ui profile so the heavy UI container is not forced up alongside the cluster (the driver e2e is fully curl-verifiable against :9200). The new docs/dev/opensearch-integration-test.md covers the end-to-end exercise and the single-node guidance (set replicas=0 to keep the cluster Green).

Gating-guard tests flipped
- The "OpenSearch is NOT in validEngineTypes / mapping / types list / env builder / stubs" guard tests from PR 1 / PR 2 are replaced by their positive counterparts in this PR. The test suite was the activation checklist; the activation flip is its diff.

Backward compatibility
- Additive everywhere. IndexConfig's new HNSW fields are omitempty, so other engines' serialised config is byte-identical. Existing Elasticsearch / Qdrant / Milvus / Weaviate / Doris / TencentVectorDB stores are untouched. No migrations.

Test plan
- go build ./... clean
- go vet ./... clean
- gofmt -l clean on touched files
- go test ./...: only TestOssEnsureBucket_CreateFails (Aliyun OSS endpoint), the docreader gRPC tests and the doris SQL-shape tests fail; all three are pre-existing on upstream/main and untouched by this PR.
- New tests across internal/types, opensearch, service and container, including a full end-to-end env-path test that exercises initRetrieveEngineRegistry with RETRIEVE_DRIVER=opensearch against an httptest cluster.
@@ -119,6 +119,61 @@ services:
      - qdrant
      - full

  # OpenSearch k-NN (Phase 3 driver). Single-node dev profile with the
  # security plugin disabled → plain HTTP on :9200, no auth/TLS. The image
  # bundles the opensearch-knn plugin. For production use a secured,
  # multi-node cluster. See docs/dev/opensearch-integration-test.md.
  opensearch:
    image: opensearchproject/opensearch:3.3.2
    container_name: WeKnora-opensearch-dev
    environment:
      - discovery.type=single-node
      # dev only: plain HTTP on :9200, no TLS/auth. The entrypoint script
      # honours DISABLE_SECURITY_PLUGIN (env var) to skip both the demo
      # install and the OPENSEARCH_INITIAL_ADMIN_PASSWORD requirement.
      - DISABLE_SECURITY_PLUGIN=true
      - DISABLE_INSTALL_DEMO_CONFIG=true
      - OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m
      - bootstrap.memory_lock=true
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - "${OPENSEARCH_PORT:-9200}:9200"
    volumes:
      - opensearch_data_dev:/usr/share/opensearch/data
    networks:
      - WeKnora-network-dev
    restart: unless-stopped
    profiles:
      - opensearch
      - full
      # Also a member of opensearch-ui so the Dashboards depends_on resolves
      # when only that profile is active (`--profile opensearch-ui up`).
      - opensearch-ui

  # Optional UI for visual index/mapping/query inspection. Decoupled from the
  # "opensearch" / "full" profiles so the heavy Dashboards container is never
  # forced up alongside the cluster — the driver e2e is fully curl-verifiable
  # against :9200. Start it on demand with `--profile opensearch-ui up -d`
  # (depends_on pulls the cluster in automatically).
  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:3.3.0
    container_name: WeKnora-opensearch-dashboards-dev
    environment:
      - OPENSEARCH_HOSTS=["http://opensearch:9200"]
      - DISABLE_SECURITY_DASHBOARDS_PLUGIN=true
    ports:
      - "${OPENSEARCH_DASHBOARDS_PORT:-5601}:5601"
    networks:
      - WeKnora-network-dev
    depends_on:
      - opensearch
    restart: unless-stopped
    profiles:
      - opensearch-ui

  milvus:
    image: milvusdb/milvus:v2.6.11
    container_name: WeKnora-milvus-dev
@@ -468,6 +523,7 @@ volumes:
  neo4j-data-dev:
  jaeger_data_dev:
  qdrant_data_dev:
  opensearch_data_dev:
  milvus_data_dev:
  docreader-tmp-dev:
  langfuse_clickhouse_data_dev:
109
docs/dev/opensearch-integration-test.md
Normal file
@@ -0,0 +1,109 @@
# OpenSearch k-NN driver — local integration test

This guide brings up a single-node OpenSearch cluster and exercises the
OpenSearch retrieve engine end to end. The driver lives in
`internal/application/repository/retriever/opensearch/`.

## 1. Start a dev cluster

```bash
docker compose -f docker-compose.dev.yml --profile opensearch up -d
```

This starts:

- `opensearch` on `http://localhost:9200` — single-node, **security plugin
  disabled** (plain HTTP, no auth/TLS). The image bundles the
  `opensearch-knn` plugin.

> **OpenSearch Dashboards is optional** and lives in a separate
> `opensearch-ui` profile, so it is *not* started by `--profile opensearch`.
> The whole integration test below is curl-verifiable against `:9200`. If you
> want the web UI (Dev Tools console / visual index inspection), start it on
> demand:
>
> ```bash
> docker compose -f docker-compose.dev.yml --profile opensearch-ui up -d
> # opensearch-dashboards on http://localhost:5601 (depends_on pulls the cluster in)
> ```

Verify:

```bash
curl -s localhost:9200 | jq '.version.distribution, .version.number'
# "opensearch" "3.3.2"
curl -s 'localhost:9200/_cat/plugins?format=json' | jq -r '.[].component' | grep opensearch-knn
```

> Production clusters must enable the security plugin (TLS + auth). The dev
> profile disables it only to keep local setup trivial. When connecting to a
> secured cluster, set `username` / `password` and — for self-signed certs in
> dev only — `insecure_skip_verify=true`.

## 2. Register the store

### Option A — DB store (UI / API)

`POST /api/v1/vector-stores`:

```json
{
  "name": "opensearch-local",
  "engine_type": "opensearch",
  "connection_config": { "addr": "http://localhost:9200" },
  "index_config": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "hnsw_m": 16,
    "hnsw_ef_construction": 100,
    "knn_engine": "lucene"
  }
}
```

`CreateStore` runs the connection probe (version + k-NN plugin) before
persisting; a bad address / unsupported version / missing plugin is rejected
with `400`.

### Option B — env store

```bash
export RETRIEVE_DRIVER=opensearch
export OPENSEARCH_ADDR=http://localhost:9200
# export OPENSEARCH_USERNAME / OPENSEARCH_PASSWORD for a secured cluster
# export OPENSEARCH_INSECURE_SKIP_VERIFY=true # self-signed dev TLS only
```

## 3. Single-node note (important)

On a single-node cluster, any index created with `number_of_replicas >= 1`
leaves its replica shard **unassigned**, so the index health goes **Yellow**.
Yellow does **not** block reads or writes — it is safe for local testing — but
to keep the cluster Green set **`number_of_replicas: 0`** at store
registration (as in the Option A example above). The driver default is `1`
(it assumes a ≥2-node cluster).

## 4. Exercise the flow

1. Bind a knowledge base to the store and ingest a few documents.
2. Confirm the per-dimension index appears:
   `curl -s 'localhost:9200/_cat/indices?v' | grep weknora`
   (e.g. `weknora_<storeprefix>_768` + alias, plus `weknora_<storeprefix>_keywords`).
3. Run a retrieval query against the bound KB and confirm hits come back.
4. Copy the KB to another KB and confirm the docs are reindexed
   (`opensearch.reindex_executed` audit event).
5. Toggle chunk enabled-status / tag and confirm `_update_by_query` applies it.

## 5. Tear down

```bash
docker compose -f docker-compose.dev.yml --profile opensearch down -v
```

## Scope notes

- Large-batch async reindex / delete (task polling) is a follow-up; the sync
  paths handle typical KB sizes (pagination is bounded by `max_result_window`,
  default 10000).
- Native `hybrid` query + search pipeline is out of scope — fusion stays at the
  service layer (RRF).
@@ -29,6 +29,16 @@ export interface FieldSchema {
  sensitive?: boolean
  description?: string
  default?: any
  // Inclusive bounds for number fields (omitempty on the backend). When
  // absent the UI falls back to per-field heuristics (isReplicaField).
  min?: number
  max?: number
  // Closed value set for string fields (e.g. knn_engine ∈ lucene|faiss).
  // When non-empty the UI renders a select instead of a free-text input.
  enum?: string[]
  // Marks a field that cannot change after store creation. Informational
  // for now (edit mode is fully read-only); kept for forward use.
  immutable?: boolean
}

// ===== API Functions =====
@@ -1107,11 +1107,17 @@ export default {
      shards_num: 'Shards',
      replica_number: 'In-memory Replicas',
      desired_shard_count: 'Shard Count',
      insecure_skip_verify: 'Skip TLS Verification',
      hnsw_m: 'HNSW M (graph degree)',
      hnsw_ef_construction: 'HNSW ef_construction',
      hnsw_ef_search: 'HNSW ef_search',
      knn_engine: 'k-NN Engine',
    },
    envTag: 'DEFAULT',
    testConnection: 'Test Connection',
    testing: 'Testing...',
    immutableNotice: 'Engine type, connection, and index settings cannot be changed after creation.\nTo change these, delete and recreate.',
    insecureSkipVerifyWarning: 'Disabling TLS certificate verification exposes the connection to man-in-the-middle attacks. Use only for self-signed development clusters — never in production.',
    validation: {
      nameRequired: 'Name is required',
      engineTypeRequired: 'Engine type is required',
@@ -967,11 +967,17 @@ export default {
      shards_num: "샤드 수",
      replica_number: "인메모리 레플리카",
      desired_shard_count: "샤드 수",
      insecure_skip_verify: "TLS 인증서 검증 생략",
      hnsw_m: "HNSW M (그래프 차수)",
      hnsw_ef_construction: "HNSW ef_construction",
      hnsw_ef_search: "HNSW ef_search",
      knn_engine: "k-NN 엔진",
    },
    envTag: "DEFAULT",
    testConnection: "연결 테스트",
    testing: "테스트 중...",
    immutableNotice: "엔진 타입, 연결 정보, 인덱스 설정은 생성 후 변경할 수 없습니다.\n변경이 필요하면 삭제 후 다시 생성하세요.",
    insecureSkipVerifyWarning: "TLS 인증서 검증을 끄면 중간자 공격에 노출됩니다. 자체 서명 인증서를 쓰는 개발 클러스터에서만 사용하고, 운영 환경에서는 절대 사용하지 마세요.",
    validation: {
      nameRequired: "이름은 필수입니다",
      engineTypeRequired: "엔진 타입은 필수입니다",
@@ -1019,11 +1019,17 @@ export default {
      shards_num: 'Шарды',
      replica_number: 'Реплики в памяти',
      desired_shard_count: 'Количество шардов',
      insecure_skip_verify: 'Пропустить проверку TLS',
      hnsw_m: 'HNSW M (степень графа)',
      hnsw_ef_construction: 'HNSW ef_construction',
      hnsw_ef_search: 'HNSW ef_search',
      knn_engine: 'Движок k-NN',
    },
    envTag: 'DEFAULT',
    testConnection: 'Тест подключения',
    testing: 'Тестирование...',
    immutableNotice: 'Тип движка, подключение и настройки индекса нельзя изменить после создания.\nДля изменения удалите и создайте заново.',
    insecureSkipVerifyWarning: 'Отключение проверки сертификата TLS делает соединение уязвимым для атак «человек посередине». Используйте только для dev-кластеров с самоподписанными сертификатами — никогда в продакшене.',
    validation: {
      nameRequired: 'Название обязательно',
      engineTypeRequired: 'Тип движка обязателен',
@@ -965,11 +965,17 @@ export default {
      shards_num: "分片数",
      replica_number: "内存副本数",
      desired_shard_count: "分片数",
      insecure_skip_verify: "跳过 TLS 证书校验",
      hnsw_m: "HNSW M(图度数)",
      hnsw_ef_construction: "HNSW ef_construction",
      hnsw_ef_search: "HNSW ef_search",
      knn_engine: "k-NN 引擎",
    },
    envTag: "DEFAULT",
    testConnection: "测试连接",
    testing: "测试中...",
    immutableNotice: "创建后无法更改引擎类型、连接和索引设置。\n如需更改,请删除后重新创建。",
    insecureSkipVerifyWarning: "关闭 TLS 证书校验会使连接面临中间人攻击风险。仅可用于自签名证书的开发集群,切勿在生产环境使用。",
    validation: {
      nameRequired: "名称为必填项",
      engineTypeRequired: "引擎类型为必填项",
@@ -165,10 +165,15 @@
   :label="fieldLabel(field.name)"
   :name="`connection_config.${field.name}`"
 >
-  <t-switch
-    v-if="field.type === 'boolean'"
-    v-model="form.connection_config[field.name]"
-  />
+  <div v-if="field.type === 'boolean'" class="boolean-field">
+    <t-switch v-model="form.connection_config[field.name]" />
+    <div
+      v-if="field.name === 'insecure_skip_verify' && form.connection_config[field.name]"
+      class="field-warning"
+    >
+      {{ t('vectorStoreSettings.insecureSkipVerifyWarning') }}
+    </div>
+  </div>
   <t-input
     v-else-if="field.type === 'string' && field.sensitive"
     v-model="form.connection_config[field.name]"
@@ -201,12 +206,20 @@
 <template v-if="showAdvanced">
   <template v-for="field in selectedType.index_fields" :key="field.name">
     <t-form-item :label="fieldLabel(field.name)" :name="`index_config.${field.name}`">
+      <!-- Closed value set → dropdown (e.g. knn_engine) -->
+      <t-select
+        v-if="field.enum && field.enum.length"
+        v-model="form.index_config[field.name]"
+        :placeholder="field.default?.toString() || ''"
+      >
+        <t-option v-for="opt in field.enum" :key="opt" :value="opt" :label="opt" />
+      </t-select>
       <t-input-number
-        v-if="field.type === 'number'"
+        v-else-if="field.type === 'number'"
         v-model="form.index_config[field.name]"
         :placeholder="field.default?.toString()"
-        :min="1"
-        :max="isReplicaField(field.name) ? 10 : 64"
+        :min="field.min ?? 1"
+        :max="field.max ?? (isReplicaField(field.name) ? 10 : 64)"
         theme="normal"
         style="width: 100%;"
       />
@@ -861,6 +874,23 @@ onMounted(async () => {
  white-space: pre-line;
}

/* Wrap switch + warning in a vertical stack so the warning sits on its own
   line below the switch, independent of TDesign's form-item content flex
   (which is a nowrap row — margin-top alone has no visible effect there). */
.boolean-field {
  display: flex;
  flex-direction: column;
  align-items: flex-start;
  width: 100%;
}

.field-warning {
  margin-top: 8px;
  font-size: 12px;
  line-height: 1.4;
  color: var(--td-error-color, #d54941);
}

.readonly-fields {
  padding: 10px 14px;
  background: var(--td-bg-color-secondarycontainer);
@@ -0,0 +1,50 @@
package opensearch

import "context"

// AuditSink receives audit events emitted from within the driver at the
// exact moment they occur (index provisioned, reindex executed). The driver
// owns this abstraction so it imports no service package — the dependency
// arrow stays one-way (service implements AuditSink; the driver only invokes
// it). A nil sink is a no-op, so tests and the env-path (no audit service)
// need no special casing.
type AuditSink interface {
	// EmitIndexCreated fires once when the driver provisions a new k-NN
	// index. alias is the per-dimension alias (or the keyword-only index
	// name); dim is the embedding dimension (0 for the keyword-only index).
	EmitIndexCreated(ctx context.Context, alias string, dim int)
	// EmitReindexExecuted fires when CopyIndices finishes copying docs from
	// one index to another.
	EmitReindexExecuted(ctx context.Context, srcAlias, dstAlias string, docs int64)
}

// nopSink is the null-object used when no sink is configured, so emit call
// sites never need a nil check.
type nopSink struct{}

func (nopSink) EmitIndexCreated(context.Context, string, int) {}
func (nopSink) EmitReindexExecuted(context.Context, string, string, int64) {}

var _ AuditSink = nopSink{}

// Option configures a Repository at construction time.
type Option func(*Repository)

// WithAuditSink injects an audit sink. A nil sink is ignored (the Repository
// keeps its default no-op behavior).
func WithAuditSink(s AuditSink) Option {
	return func(r *Repository) {
		if s != nil {
			r.sink = s
		}
	}
}

// auditSink returns the configured sink, or a no-op if none was set (e.g. a
// Repository built directly in tests, or constructed without WithAuditSink).
func (r *Repository) auditSink() AuditSink {
	if r.sink == nil {
		return nopSink{}
	}
	return r.sink
}
@@ -0,0 +1,153 @@
package opensearch

import (
	"context"
	"strings"
	"sync"
	"testing"

	"github.com/Tencent/WeKnora/internal/types"
)

// spySink records audit events for assertions.
type spySink struct {
	mu           sync.Mutex
	indexCreated []indexCreatedEvent
	reindex      []reindexEvent
}

type indexCreatedEvent struct {
	alias string
	dim   int
}
type reindexEvent struct {
	src, dst string
	docs     int64
}

func (s *spySink) EmitIndexCreated(_ context.Context, alias string, dim int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.indexCreated = append(s.indexCreated, indexCreatedEvent{alias, dim})
}

func (s *spySink) EmitReindexExecuted(_ context.Context, src, dst string, docs int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.reindex = append(s.reindex, reindexEvent{src, dst, docs})
}

// TestBuildInternalCfg_ReadsHNSWFields verifies the HNSW IndexConfig fields
// flow into internalCfg, falling back to defaults when unset.
func TestBuildInternalCfg_ReadsHNSWFields(t *testing.T) {
	t.Run("reads set fields", func(t *testing.T) {
		cfg, err := buildInternalCfg(&types.IndexConfig{
			HNSWM:              24,
			HNSWEFConstruction: 200,
			HNSWEFSearch:       128,
			KNNEngine:          "faiss",
			NumberOfShards:     2,
		})
		if err != nil {
			t.Fatal(err)
		}
		if cfg.hnswM != 24 || cfg.hnswEFConstruction != 200 || cfg.efSearch != 128 || cfg.knnEngine != "faiss" {
			t.Errorf("HNSW not wired: %+v", cfg)
		}
		if cfg.shards != 2 {
			t.Errorf("shards: want 2, got %d", cfg.shards)
		}
	})

	t.Run("defaults when unset", func(t *testing.T) {
		cfg, err := buildInternalCfg(&types.IndexConfig{})
		if err != nil {
			t.Fatal(err)
		}
		if cfg.hnswM != 16 || cfg.hnswEFConstruction != 100 || cfg.efSearch != 100 || cfg.knnEngine != "lucene" {
			t.Errorf("defaults not applied: %+v", cfg)
		}
	})

	t.Run("nil config is all defaults", func(t *testing.T) {
		cfg, err := buildInternalCfg(nil)
		if err != nil {
			t.Fatal(err)
		}
		if cfg.knnEngine != "lucene" || cfg.hnswM != 16 {
			t.Errorf("nil defaults wrong: %+v", cfg)
		}
	})
}

// TestBuildIndexMapping_ReflectsHNSWConfig verifies the end-to-end path
// IndexConfig → buildInternalCfg → buildIndexMapping carries the operator's
// HNSW values into the cluster mapping JSON (regression guard for the wire-
// through, since defaults coincide with common values).
func TestBuildIndexMapping_ReflectsHNSWConfig(t *testing.T) {
	cfg, err := buildInternalCfg(&types.IndexConfig{
		HNSWM:              24,
		HNSWEFConstruction: 200,
		HNSWEFSearch:       128,
		KNNEngine:          "faiss",
	})
	if err != nil {
		t.Fatal(err)
	}
	body, err := buildIndexMapping(cfg, 768)
	if err != nil {
		t.Fatal(err)
	}
	s := string(body)
	for _, want := range []string{`"m":24`, `"ef_construction":200`, `"engine":"faiss"`, `"knn.algo_param.ef_search":128`} {
		if !strings.Contains(s, want) {
			t.Errorf("mapping JSON missing %q\n%s", want, s)
		}
	}
}

// TestWithAuditSink_SetsAndNilSafe verifies the functional option.
func TestWithAuditSink_SetsAndNilSafe(t *testing.T) {
	spy := &spySink{}
	r := &Repository{}
	WithAuditSink(spy)(r)
	if r.sink != spy {
		t.Fatal("WithAuditSink did not set the sink")
	}
	// nil must not clobber an already-set sink
	WithAuditSink(nil)(r)
	if r.sink != spy {
		t.Fatal("WithAuditSink(nil) clobbered the sink")
	}
}

// TestAuditSink_NopByDefault verifies a Repository with no sink does not panic
// when the audit accessor is used (nopSink fallback).
func TestAuditSink_NopByDefault(t *testing.T) {
	var _ AuditSink = nopSink{} // compile-time assertion
	r := &Repository{}          // sink left nil
	// Must not panic.
	r.auditSink().EmitIndexCreated(context.Background(), "weknora_768", 768)
	r.auditSink().EmitReindexExecuted(context.Background(), "a", "b", 3)
}

// TestAuditSink_EmitIndexCreated_OnEnsureReady verifies createIndexAndAlias
// emits exactly one index-created event with the per-dim alias when a new
// index is provisioned.
func TestAuditSink_EmitIndexCreated_OnEnsureReady(t *testing.T) {
	repo, ts := newTestRepo(t, (&indexLifecycleHandler{}).ServeHTTP)
	defer ts.Close()
	spy := &spySink{}
	repo.sink = spy

	if err := repo.ensureReady(context.Background(), 768); err != nil {
		t.Fatalf("ensureReady: %v", err)
	}
	if len(spy.indexCreated) != 1 {
		t.Fatalf("want 1 index_created event, got %d", len(spy.indexCreated))
	}
	got := spy.indexCreated[0]
	if got.alias != "weknora_test_768" || got.dim != 768 {
		t.Errorf("event mismatch: %+v", got)
	}
}
@@ -0,0 +1,107 @@
package opensearch

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"sort"

	osapi "github.com/opensearch-project/opensearch-go/v4/opensearchapi"
)

// BatchUpdateChunkEnabledStatus flips is_enabled for the given chunks. The
// status map is grouped by value into one _update_by_query per distinct value
// (mirrors the Qdrant grouping pattern), so the request body carries the
// chunk ids via a terms filter and the new value via bound script params —
// never per-chunk string interpolation.
//
// Targets the cross-dim <base>_* pattern: a chunk's embedding dimension is
// not known here, and the same chunk_id is unique across the store's dim
// indices + the keyword-only index.
func (r *Repository) BatchUpdateChunkEnabledStatus(ctx context.Context, chunkStatusMap map[string]bool) error {
	if len(chunkStatusMap) == 0 {
		return nil
	}
	groups := map[bool][]string{}
	for id, v := range chunkStatusMap {
		groups[v] = append(groups[v], id)
	}
	// Deterministic order (false then true) for predictable behavior/tests.
	for _, v := range []bool{false, true} {
		ids := groups[v]
		if len(ids) == 0 {
			continue
		}
		sort.Strings(ids)
		if err := r.updateByQueryScript(ctx, ids,
			"ctx._source.is_enabled = params.v", map[string]any{"v": v}); err != nil {
			return err
		}
	}
	return nil
}

// BatchUpdateChunkTagID sets tag_id for the given chunks, grouped by tag.
func (r *Repository) BatchUpdateChunkTagID(ctx context.Context, chunkTagMap map[string]string) error {
	if len(chunkTagMap) == 0 {
		return nil
	}
	groups := map[string][]string{}
	for id, tag := range chunkTagMap {
		groups[tag] = append(groups[tag], id)
	}
	tags := make([]string, 0, len(groups))
	for tag := range groups {
		tags = append(tags, tag)
	}
	sort.Strings(tags)
	for _, tag := range tags {
		ids := groups[tag]
		sort.Strings(ids)
		if err := r.updateByQueryScript(ctx, ids,
			"ctx._source.tag_id = params.v", map[string]any{"v": tag}); err != nil {
			return err
		}
	}
	return nil
}

// updateByQueryScript runs an _update_by_query over the cross-dim <base>_*
// pattern, matching the given chunk ids via a terms filter and applying a
// constant Painless source with caller values flowing only through bound
// params (Painless-injection-safe).
func (r *Repository) updateByQueryScript(
	ctx context.Context, chunkIDs []string, source string, params map[string]any,
) error {
	body, err := json.Marshal(map[string]any{
		"query": map[string]any{
			"terms": map[string]any{"chunk_id": chunkIDs},
		},
		"script": map[string]any{
			"lang":   "painless",
			"source": source,
			"params": params,
		},
	})
	if err != nil {
		return fmt.Errorf("opensearch: marshal update_by_query body: %w", err)
	}
	// Q2: UpdateByQueryParams.Refresh is *bool — the wire value "wait_for" is
	// not expressible via the typed SDK, so we force an immediate refresh.
	refresh := true
	resp, err := r.client.UpdateByQuery(ctx, osapi.UpdateByQueryReq{
		Indices: []string{r.baseIndex + "_*"},
		Body:    bytes.NewReader(body),
		Params:  osapi.UpdateByQueryParams{Refresh: &refresh},
	})
	if err != nil {
		return wrapTransport(err)
	}
	if resp == nil {
		return nil
	}
	defer drainAndClose(resp.Inspect().Response.Body)
	return inspectByQueryResponse(io.LimitReader(resp.Inspect().Response.Body, 16<<20))
}
@@ -0,0 +1,51 @@
package opensearch

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"strings"

	"github.com/Tencent/WeKnora/internal/logger"
)

// inspectByQueryResponse parses an _update_by_query / _delete_by_query
// response and surfaces partial failures without leaking cluster-side reason
// strings (which may embed document content). Mirrors inspectBulkResponse:
// the full reason goes to the debug log only; the returned error carries the
// bounded id + type. A non-zero version_conflicts count with no hard failures
// is logged as a warning but not treated as an error.
func inspectByQueryResponse(body io.Reader) error {
	var r struct {
		VersionConflicts int `json:"version_conflicts"`
		Failures         []struct {
			ID    string `json:"id"`
			Cause struct {
				Type   string `json:"type"`
				Reason string `json:"reason"`
			} `json:"cause"`
		} `json:"failures"`
	}
	if err := json.NewDecoder(body).Decode(&r); err != nil {
		return fmt.Errorf("opensearch: parse by-query response: %w", ErrTransport)
	}
	log := logger.GetLogger(context.Background())
	if len(r.Failures) == 0 {
		if r.VersionConflicts > 0 {
			log.Warnf("[OpenSearch] by-query had %d version conflicts (proceeded)", r.VersionConflicts)
		}
		return nil
	}
	var msgs []string
	for _, f := range r.Failures {
		// Full reason → debug log only (may contain document content).
		log.Debugf("[OpenSearch] by-query failure: id=%s type=%s reason=%s",
			f.ID, f.Cause.Type, f.Cause.Reason)
		if len(msgs) < 5 {
			msgs = append(msgs, fmt.Sprintf("[%s] %s", f.ID, f.Cause.Type))
		}
	}
	return fmt.Errorf("opensearch: by-query partial failure (%d failed, first 5: %s): %w",
		len(r.Failures), strings.Join(msgs, "; "), ErrTransport)
}
@@ -15,16 +15,15 @@ type internalCfg struct {
 }
 
 // buildInternalCfg projects IndexConfig to the driver-internal view,
-// substituting defaults for unset fields. Validation of value ranges
-// (e.g. hnsw_m / ef_construction caps) is a service-layer concern handled
-// elsewhere; this function applies defaults only and never rejects.
+// substituting defaults for unset (zero / empty) fields. Validation of value
+// ranges (e.g. hnsw_m / ef_construction caps) is a service-layer concern
+// handled elsewhere (validateOpenSearchIndexConfig at CreateStore); this
+// function applies defaults only and never rejects. The env-path bypasses
+// service validation entirely, so the defaults below are its safety net.
 //
-// OpenSearch-specific overrides (knn_engine, hnsw_m, hnsw_ef_construction,
-// hnsw_ef_search) are intentionally NOT read from IndexConfig here:
-// IndexConfig is a schema shared across all drivers, and adding OpenSearch-
-// specific fields would surface them in the shared VectorStoreFieldInfo
-// form visible to every driver's create UI. Wiring those fields through to
-// IndexConfig is a follow-up that lands alongside the activation switch.
+// The OpenSearch-specific HNSW fields (knn_engine, hnsw_m,
+// hnsw_ef_construction, hnsw_ef_search) are read here. They are omitempty on
+// IndexConfig, so they do not affect other drivers' serialized config.
 func buildInternalCfg(c *types.IndexConfig) (internalCfg, error) {
 	cfg := internalCfg{
 		shards: 4, // matches the keyword-index default upstream
@@ -43,5 +42,17 @@ func buildInternalCfg(c *types.IndexConfig) (internalCfg, error) {
 	if c.NumberOfReplicas > 0 {
 		cfg.replicas = c.NumberOfReplicas
 	}
+	if c.KNNEngine != "" {
+		cfg.knnEngine = c.KNNEngine
+	}
+	if c.HNSWM > 0 {
+		cfg.hnswM = c.HNSWM
+	}
+	if c.HNSWEFConstruction > 0 {
+		cfg.hnswEFConstruction = c.HNSWEFConstruction
+	}
+	if c.HNSWEFSearch > 0 {
+		cfg.efSearch = c.HNSWEFSearch
+	}
 	return cfg, nil
 }
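buildInternalCfg only resolves values; how they reach the index mapping is not part of this excerpt. As a hedged sketch, assuming the resolved engine, `m` and `ef_construction` feed a standard OpenSearch k-NN `knn_vector` field with an `hnsw` method block, it might look like this. The `hnswCfg` type, the function name, the `space_type` value and the example numbers are illustrative assumptions, not the driver's defaults; `ef_search` is a search-time knob and is deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hnswCfg is a hypothetical stand-in for the driver's resolved internalCfg.
type hnswCfg struct {
	knnEngine          string // "lucene" or "faiss"
	hnswM              int
	hnswEFConstruction int
}

// knnVectorMapping renders a knn_vector field that builds an HNSW graph with
// the resolved engine and graph-construction parameters.
// spaceType is an assumption; the driver's actual similarity is not shown.
func knnVectorMapping(dim int, c hnswCfg, spaceType string) map[string]any {
	return map[string]any{
		"type":      "knn_vector",
		"dimension": dim,
		"method": map[string]any{
			"name":       "hnsw",
			"engine":     c.knnEngine,
			"space_type": spaceType,
			"parameters": map[string]any{
				"m":               c.hnswM,
				"ef_construction": c.hnswEFConstruction,
			},
		},
	}
}

func main() {
	m := knnVectorMapping(768, hnswCfg{knnEngine: "lucene", hnswM: 16, hnswEFConstruction: 128}, "cosinesimil")
	out, _ := json.MarshalIndent(map[string]any{"embedding": m}, "", "  ")
	fmt.Println(string(out))
}
```

Because `m` and `ef_construction` shape the graph at build time, they only take effect on a newly created index, which fits the create-only treatment in the service layer.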
194
internal/application/repository/retriever/opensearch/copy.go
Normal file
@@ -0,0 +1,194 @@
package opensearch

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"strings"

	"github.com/google/uuid"
	osapi "github.com/opensearch-project/opensearch-go/v4/opensearchapi"

	"github.com/Tencent/WeKnora/internal/logger"
	"github.com/Tencent/WeKnora/internal/types"
)

// copyBatchSize is the pagination size for the source scan. Kept under the
// BatchSave per-call cap so each copied page is a single bulk request.
const copyBatchSize = 500

// copySourceDoc is the full _source read during CopyIndices — it includes the
// embedding vector and is_recommended, which the retrieve-path hit struct
// omits because retrieval does not need them.
type copySourceDoc struct {
	Content         string    `json:"content"`
	SourceID        string    `json:"source_id"`
	SourceType      int       `json:"source_type"`
	ChunkID         string    `json:"chunk_id"`
	KnowledgeID     string    `json:"knowledge_id"`
	KnowledgeBaseID string    `json:"knowledge_base_id"`
	TagID           string    `json:"tag_id"`
	IsEnabled       bool      `json:"is_enabled"`
	IsRecommended   bool      `json:"is_recommended"`
	Embedding       []float32 `json:"embedding"`
}

// transformSourceID mirrors the sibling drivers' source_id remap:
// - regular chunk (source_id == chunk_id) → target chunk id
// - generated question (source_id == "<chunk>-<q>") → "<targetChunk>-<q>"
// - anything else → a fresh uuid
func transformSourceID(sourceID, chunkID, targetChunkID string) string {
	switch {
	case sourceID == chunkID:
		return targetChunkID
	case strings.HasPrefix(sourceID, chunkID+"-"):
		return targetChunkID + "-" + strings.TrimPrefix(sourceID, chunkID+"-")
	default:
		return uuid.New().String()
	}
}

// CopyIndices copies all docs of one knowledge base into another (within the
// same store) by scanning the source and re-saving via BatchSave — mirroring
// the Elasticsearch / Qdrant drivers (search→BatchSave), which yields the
// source_id transformation and dim/keyword routing for free. Runs
// synchronously and paginates; the large-batch background-task path is a
// later change.
//
// NOTE: from/size pagination is bounded by the index's max_result_window
// (default 10000). Once from + size exceeds that window the scan request is
// rejected, so a larger copy fails partway through, after the earlier pages
// have already been written. Such copies require the scroll-based async path
// (a later change).
func (r *Repository) CopyIndices(
	ctx context.Context,
	sourceKnowledgeBaseID string,
	sourceToTargetKBIDMap map[string]string, // keyed by source knowledge_id (mirrors sibling drivers)
	sourceToTargetChunkIDMap map[string]string,
	targetKnowledgeBaseID string,
	dimension int,
	knowledgeType string,
) error {
	log := logger.GetLogger(ctx)
	if len(sourceToTargetChunkIDMap) == 0 {
		log.Warn("[OpenSearch] CopyIndices: empty chunk mapping, skipping")
		return nil
	}
	if dimension <= 0 {
		return fmt.Errorf("opensearch: CopyIndices requires dim > 0, got %d: %w",
			dimension, ErrDimensionMismatch)
	}
	if err := r.ensureReady(ctx, dimension); err != nil {
		return err
	}
	alias := r.indexAlias(dimension)

	var total int64
	for from := 0; ; from += copyBatchSize {
		docs, err := r.copyScanBatch(ctx, alias, sourceKnowledgeBaseID, from, copyBatchSize)
		if err != nil {
			return err
		}
		if len(docs) == 0 {
			break
		}
		infos := make([]*types.IndexInfo, 0, len(docs))
		embMap := make(map[string][]float32, len(docs))
		enabledMap := make(map[string]bool, len(docs))
		for i := range docs {
			d := &docs[i]
			targetChunkID, ok := sourceToTargetChunkIDMap[d.ChunkID]
			if !ok {
				log.Warnf("[OpenSearch] CopyIndices: source chunk %s not mapped, skipping", d.ChunkID)
				continue
			}
			targetKnowledgeID, ok := sourceToTargetKBIDMap[d.KnowledgeID]
			if !ok {
				log.Warnf("[OpenSearch] CopyIndices: source knowledge %s not mapped, skipping", d.KnowledgeID)
				continue
			}
			targetSourceID := transformSourceID(d.SourceID, d.ChunkID, targetChunkID)
			if len(d.Embedding) > 0 {
				// BatchSave looks up embeddings by SourceID (lookupEmbedding),
				// so key by the target source id — not the chunk id, which is
				// the Elasticsearch driver's convention.
				embMap[targetSourceID] = d.Embedding
			}
			enabledMap[targetChunkID] = d.IsEnabled
			infos = append(infos, &types.IndexInfo{
				Content:         d.Content,
				SourceID:        targetSourceID,
				SourceType:      types.SourceType(d.SourceType),
				ChunkID:         targetChunkID,
				KnowledgeID:     targetKnowledgeID,
				KnowledgeBaseID: targetKnowledgeBaseID,
				KnowledgeType:   knowledgeType,
				TagID:           d.TagID,
				IsEnabled:       d.IsEnabled,
				IsRecommended:   d.IsRecommended,
			})
		}
		if len(infos) > 0 {
			params := map[string]any{
				"embedding":     embMap,
				"chunk_enabled": enabledMap,
			}
			if err := r.BatchSave(ctx, infos, params); err != nil {
				return fmt.Errorf("opensearch: CopyIndices batch save: %w", err)
			}
			total += int64(len(infos))
		}
		if len(docs) < copyBatchSize {
			break
		}
	}
	log.Infof("[OpenSearch] CopyIndices: copied %d docs (KB %s → %s, dim=%d)",
		total, sourceKnowledgeBaseID, targetKnowledgeBaseID, dimension)
	r.auditSink().EmitReindexExecuted(ctx, alias, alias, total)
	return nil
}

// copyScanBatch reads one page of docs belonging to sourceKB from the per-dim
// index, decoding the full _source (including the embedding vector).
func (r *Repository) copyScanBatch(
	ctx context.Context, index, sourceKB string, from, size int,
) ([]copySourceDoc, error) {
	body, err := json.Marshal(map[string]any{
		"from": from,
		"size": size,
		"query": map[string]any{
			"bool": map[string]any{
				"filter": []any{
					map[string]any{"term": map[string]any{"knowledge_base_id": sourceKB}},
				},
			},
		},
	})
	if err != nil {
		return nil, fmt.Errorf("opensearch: marshal copy scan body: %w", err)
	}
	req := osapi.SearchReq{Indices: []string{index}, Body: bytes.NewReader(body)}
	resp, err := r.client.Search(ctx, &req)
	if err != nil {
		if isNotFound(err) {
			return nil, fmt.Errorf("opensearch: index %s missing: %w", index, ErrIndexNotFound)
		}
		return nil, wrapTransport(err)
	}
	defer drainAndClose(resp.Inspect().Response.Body)
	var parsed struct {
		Hits struct {
			Hits []struct {
				Source copySourceDoc `json:"_source"`
			} `json:"hits"`
		} `json:"hits"`
	}
	if err := json.NewDecoder(io.LimitReader(resp.Inspect().Response.Body, 64<<20)).Decode(&parsed); err != nil {
		return nil, fmt.Errorf("opensearch: parse copy scan response: %w", ErrTransport)
	}
	out := make([]copySourceDoc, len(parsed.Hits.Hits))
	for i, h := range parsed.Hits.Hits {
		out[i] = h.Source
	}
	return out, nil
}
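To make the max_result_window limit on the CopyIndices scan concrete: with copyBatchSize = 500 and the default window of 10000, the accepted page offsets are from = 0, 500, …, 9500, and the request at from = 10000 is the first one rejected. A small sketch of that arithmetic (the function name is hypothetical; only the from + size ≤ window rule is taken from OpenSearch):

```go
package main

import "fmt"

// acceptedPageStarts lists the from offsets a from/size scan can request
// before from+size exceeds the index's max_result_window.
func acceptedPageStarts(maxResultWindow, pageSize int) []int {
	var starts []int
	for from := 0; from+pageSize <= maxResultWindow; from += pageSize {
		starts = append(starts, from)
	}
	return starts
}

func main() {
	starts := acceptedPageStarts(10000, 500)
	last := starts[len(starts)-1]
	fmt.Printf("%d pages, last from=%d, docs reachable=%d\n", len(starts), last, last+500)
}
```

One consequence of the loop as written: a knowledge base with exactly 10000 matching docs still issues the request at from = 10000, because the last page is full and so does not end the loop.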
@@ -0,0 +1,193 @@
package opensearch

import (
	"context"
	"io"
	"net/http"
	"strings"
	"sync"
	"testing"
)

// TestTransformSourceID covers the generated-question / regular / fallback
// branches that CopyIndices uses to remap source_id.
func TestTransformSourceID(t *testing.T) {
	t.Run("regular chunk uses target chunk id", func(t *testing.T) {
		if got := transformSourceID("chunk1", "chunk1", "tgt1"); got != "tgt1" {
			t.Errorf("want tgt1, got %s", got)
		}
	})
	t.Run("generated question preserves question id", func(t *testing.T) {
		if got := transformSourceID("chunk1-q7", "chunk1", "tgt1"); got != "tgt1-q7" {
			t.Errorf("want tgt1-q7, got %s", got)
		}
	})
	t.Run("unrelated source id gets fresh uuid", func(t *testing.T) {
		got := transformSourceID("totally-different", "chunk1", "tgt1")
		if got == "totally-different" || got == "tgt1" || len(got) != 36 {
			t.Errorf("want fresh uuid, got %q", got)
		}
	})
}

// TestCopyIndices_EmptyMapping_NoOp verifies an empty chunk map short-circuits
// before any HTTP call.
func TestCopyIndices_EmptyMapping_NoOp(t *testing.T) {
	repo, ts := newTestRepo(t, func(w http.ResponseWriter, r *http.Request) {
		t.Errorf("unexpected HTTP call: %s %s", r.Method, r.URL.Path)
	})
	defer ts.Close()
	err := repo.CopyIndices(context.Background(), "kbSrc", map[string]string{}, map[string]string{}, "kbDst", 768, "manual")
	if err != nil {
		t.Fatalf("want nil, got %v", err)
	}
}

// TestCopyIndices_ScanThenBatchSave verifies the search→BatchSave path:
// remaps IDs, keys the embedding by the *target source id* (OpenSearch
// BatchSave's lookup key), and emits one reindex audit event.
func TestCopyIndices_ScanThenBatchSave(t *testing.T) {
	var (
		mu        sync.Mutex
		bulkBody  string
		searchCnt int
	)
	handler := func(w http.ResponseWriter, r *http.Request) {
		switch {
		case r.Method == http.MethodHead:
			w.WriteHeader(http.StatusOK) // alias exists → ensureReady no-op
		case strings.Contains(r.URL.Path, "_search"):
			mu.Lock()
			searchCnt++
			first := searchCnt == 1
			mu.Unlock()
			if first {
				_, _ = w.Write([]byte(`{"hits":{"hits":[
					{"_source":{"content":"c","source_id":"srcChunk","source_type":1,"chunk_id":"srcChunk","knowledge_id":"srcKnow","knowledge_base_id":"kbSrc","tag_id":"t","is_enabled":true,"is_recommended":false,"embedding":[0.1,0.2,0.3]}}
				]}}`))
			} else {
				_, _ = w.Write([]byte(`{"hits":{"hits":[]}}`))
			}
		case strings.HasSuffix(r.URL.Path, "/_bulk"):
			b, _ := io.ReadAll(r.Body)
			mu.Lock()
			bulkBody = string(b)
			mu.Unlock()
			_, _ = w.Write([]byte(`{"errors":false,"items":[]}`))
		default:
			_, _ = w.Write([]byte(`{}`))
		}
	}
	repo, ts := newTestRepo(t, handler)
	defer ts.Close()
	spy := &spySink{}
	repo.sink = spy

	err := repo.CopyIndices(context.Background(), "kbSrc",
		map[string]string{"srcKnow": "tgtKnow"}, // knowledge_id remap (sourceToTargetKBIDMap is keyed by knowledge_id, mirroring ES)
		map[string]string{"srcChunk": "tgtChunk"},
		"kbDst", 768, "manual")
	if err != nil {
		t.Fatalf("CopyIndices: %v", err)
	}

	mu.Lock()
	defer mu.Unlock()
	if bulkBody == "" {
		t.Fatal("no bulk request captured")
	}
	// Target IDs present, source IDs gone from the written doc.
	for _, want := range []string{`"chunk_id":"tgtChunk"`, `"knowledge_id":"tgtKnow"`, `"knowledge_base_id":"kbDst"`, `"source_id":"tgtChunk"`} {
		if !strings.Contains(bulkBody, want) {
			t.Errorf("bulk body missing %q\n%s", want, bulkBody)
		}
	}
	if strings.Contains(bulkBody, `"knowledge_base_id":"kbSrc"`) {
		t.Errorf("bulk body leaked source KB id\n%s", bulkBody)
	}
	// Embedding written (keyed by target source id internally).
	if !strings.Contains(bulkBody, "0.1") {
		t.Errorf("embedding not written\n%s", bulkBody)
	}
	if len(spy.reindex) != 1 || spy.reindex[0].docs != 1 {
		t.Errorf("want 1 reindex event with docs=1, got %+v", spy.reindex)
	}
}

// TestBatchUpdateChunkEnabledStatus_GroupedUpdateByQuery verifies the status
// map is grouped by value into one _update_by_query per distinct value, each
// passing chunk ids via terms + the new value via bound script params.
func TestBatchUpdateChunkEnabledStatus_GroupedUpdateByQuery(t *testing.T) {
	var (
		mu     sync.Mutex
		bodies []string
	)
	handler := func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.URL.Path, "_update_by_query") {
			b, _ := io.ReadAll(r.Body)
			mu.Lock()
			bodies = append(bodies, string(b))
			mu.Unlock()
			_, _ = w.Write([]byte(`{"updated":1,"version_conflicts":0,"failures":[]}`))
			return
		}
		_, _ = w.Write([]byte(`{}`))
	}
	repo, ts := newTestRepo(t, handler)
	defer ts.Close()

	err := repo.BatchUpdateChunkEnabledStatus(context.Background(), map[string]bool{
		"c1": true, "c2": false, "c3": true,
	})
	if err != nil {
		t.Fatalf("BatchUpdateChunkEnabledStatus: %v", err)
	}

	mu.Lock()
	defer mu.Unlock()
	if len(bodies) != 2 {
		t.Fatalf("want 2 grouped update_by_query calls (true/false), got %d: %v", len(bodies), bodies)
	}
	joined := strings.Join(bodies, "\n")
	for _, want := range []string{"c1", "c2", "c3", "is_enabled", "params"} {
		if !strings.Contains(joined, want) {
			t.Errorf("update_by_query bodies missing %q\n%s", want, joined)
		}
	}
}

func TestBatchUpdateChunkEnabledStatus_Empty_NoOp(t *testing.T) {
	repo, ts := newTestRepo(t, func(w http.ResponseWriter, r *http.Request) {
		t.Errorf("unexpected HTTP call: %s %s", r.Method, r.URL.Path)
	})
	defer ts.Close()
	if err := repo.BatchUpdateChunkEnabledStatus(context.Background(), map[string]bool{}); err != nil {
		t.Fatalf("want nil, got %v", err)
	}
}

// TestInspectByQueryResponse covers the success path and the failure path,
// asserting cluster-side reason text is NOT surfaced in the returned error.
func TestInspectByQueryResponse(t *testing.T) {
	t.Run("clean response", func(t *testing.T) {
		body := strings.NewReader(`{"updated":5,"version_conflicts":0,"failures":[]}`)
		if err := inspectByQueryResponse(body); err != nil {
			t.Fatalf("want nil, got %v", err)
		}
	})
	t.Run("failures do not leak reason", func(t *testing.T) {
		body := strings.NewReader(`{"updated":1,"version_conflicts":0,"failures":[
			{"id":"c9","cause":{"type":"version_conflict_engine_exception","reason":"SECRET document body leaked here"}}
		]}`)
		err := inspectByQueryResponse(body)
		if err == nil {
			t.Fatal("want error for failures, got nil")
		}
		if strings.Contains(err.Error(), "SECRET") {
			t.Errorf("error leaked cluster reason: %v", err)
		}
		if !strings.Contains(err.Error(), "version_conflict_engine_exception") {
			t.Errorf("error should surface bounded type: %v", err)
		}
	})
}
@@ -0,0 +1,23 @@
package opensearch

import (
	"context"

	"github.com/Tencent/WeKnora/internal/types"
)

// TestConnection verifies an OpenSearch cluster is reachable, runs a
// supported version, and has the k-NN plugin installed on every node. It is
// the connectivity probe used by the VectorStore service's CreateStore
// health-check (the driver's unexported probes are reused here). Returns a
// wrapped sentinel error on failure; nil on success.
func TestConnection(ctx context.Context, cfg *types.ConnectionConfig) error {
	client, err := NewOpenSearchClient(cfg)
	if err != nil {
		return err
	}
	if err := probeVersion(ctx, client); err != nil {
		return err
	}
	return probeKNNPlugin(ctx, client)
}
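probeKNNPlugin itself is not part of this excerpt. One hedged way to express "installed on every node" over `_cat/plugins` rows (one row per node and plugin, with the node in `name` and the plugin in `component`) is sketched below. The `pluginRow` type and both function names are hypothetical; the driver decodes into `osapi.CatPluginResp`. Note the built-in limitation: a node that reports no plugins at all never appears in the rows, so a stricter check would compare against `_cat/nodes`.

```go
package main

import "fmt"

// pluginRow mirrors the two _cat/plugins columns this check needs
// (hypothetical local type).
type pluginRow struct {
	Name      string // node name
	Component string // plugin name
}

// nodesMissingKNN returns, in first-seen order, the nodes that list plugins
// but not opensearch-knn. Nodes absent from the rows are invisible here.
func nodesMissingKNN(rows []pluginRow) []string {
	hasKNN := make(map[string]bool)
	var order []string
	for _, r := range rows {
		if _, seen := hasKNN[r.Name]; !seen {
			hasKNN[r.Name] = false
			order = append(order, r.Name)
		}
		if r.Component == "opensearch-knn" {
			hasKNN[r.Name] = true
		}
	}
	var missing []string
	for _, n := range order {
		if !hasKNN[n] {
			missing = append(missing, n)
		}
	}
	return missing
}

// knnOnAllNodes requires at least one row and no node without k-NN, so an
// empty response is treated as a failure rather than a vacuous pass.
func knnOnAllNodes(rows []pluginRow) bool {
	return len(rows) > 0 && len(nodesMissingKNN(rows)) == 0
}

func main() {
	rows := []pluginRow{
		{Name: "node-1", Component: "opensearch-knn"},
		{Name: "node-1", Component: "opensearch-security"},
		{Name: "node-2", Component: "opensearch-security"},
	}
	fmt.Println(nodesMissingKNN(rows), knnOnAllNodes(rows)) // [node-2] false
}
```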
@@ -0,0 +1,50 @@
package opensearch

import (
	"context"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"

	osapi "github.com/opensearch-project/opensearch-go/v4/opensearchapi"

	"github.com/Tencent/WeKnora/internal/types"
)

// clusterHandler serves both GET / (version info) and /_cat/plugins, so the
// full TestConnection probe (version + k-NN plugin) can run end to end.
func clusterHandler(distribution, number string, plugins []osapi.CatPluginResp) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		if r.URL.Path == "/_cat/plugins" {
			_ = json.NewEncoder(w).Encode(plugins)
			return
		}
		_, _ = w.Write([]byte(`{"version":{"distribution":"` + distribution + `","number":"` + number + `"}}`))
	}
}

func TestTestConnection_Success(t *testing.T) {
	ts := httptest.NewServer(clusterHandler("opensearch", "3.3.2", []osapi.CatPluginResp{
		{Name: "node-1", Component: "opensearch-knn"},
	}))
	defer ts.Close()
	if err := TestConnection(context.Background(), &types.ConnectionConfig{Addr: ts.URL}); err != nil {
		t.Errorf("healthy cluster: want nil, got %v", err)
	}
}

func TestTestConnection_RejectsElasticsearch(t *testing.T) {
	ts := httptest.NewServer(clusterHandler("elasticsearch", "8.10.4", nil))
	defer ts.Close()
	if err := TestConnection(context.Background(), &types.ConnectionConfig{Addr: ts.URL}); err == nil {
		t.Error("elasticsearch cluster should be rejected")
	}
}

func TestTestConnection_EmptyAddr(t *testing.T) {
	if err := TestConnection(context.Background(), &types.ConnectionConfig{}); err == nil {
		t.Error("empty addr should be rejected")
	}
}
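The Elasticsearch-rejection test exercises probeVersion, whose body is not shown here. A minimal sketch of the distribution-and-version gate it implies, reading `version.distribution` and `version.number` from `GET /`. The `minMajor` floor of 2 and all names are hypothetical; the driver's real minimum is not in this excerpt. Real Elasticsearch root responses typically omit `distribution`, which this check also rejects.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// rootInfo holds the GET / fields the check reads.
type rootInfo struct {
	Version struct {
		Distribution string `json:"distribution"`
		Number       string `json:"number"`
	} `json:"version"`
}

// minMajor is a hypothetical floor, not the driver's actual minimum.
const minMajor = 2

// checkVersion rejects clusters that are not OpenSearch or whose major
// version is below minMajor.
func checkVersion(body []byte) error {
	var info rootInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return fmt.Errorf("parse root info: %w", err)
	}
	if info.Version.Distribution != "opensearch" {
		return fmt.Errorf("unsupported distribution %q", info.Version.Distribution)
	}
	majorStr, _, _ := strings.Cut(info.Version.Number, ".")
	major, err := strconv.Atoi(majorStr)
	if err != nil {
		return fmt.Errorf("unparseable version %q", info.Version.Number)
	}
	if major < minMajor {
		return fmt.Errorf("opensearch %s is below minimum major %d", info.Version.Number, minMajor)
	}
	return nil
}

func main() {
	fmt.Println(checkVersion([]byte(`{"version":{"distribution":"opensearch","number":"3.3.2"}}`)))
	fmt.Println(checkVersion([]byte(`{"version":{"distribution":"elasticsearch","number":"8.10.4"}}`)))
}
```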
@@ -77,6 +77,11 @@ func (r *Repository) createIndexAndAlias(ctx context.Context, dim int) error {
 		}
 		return fmt.Errorf("put alias %s → %s: %w", alias, realIndex, err)
 	}
+	if indexCreated {
+		// Emit only when we actually provisioned the index (not when a
+		// concurrent writer / existing index short-circuited above).
+		r.auditSink().EmitIndexCreated(ctx, alias, dim)
+	}
 	return nil
 }
 
@@ -217,6 +222,7 @@ func (r *Repository) ensureKeywordsIndex(ctx context.Context) error {
 		r.keywordsErr = err
 		return err
 	}
+	created := false
 	if err := r.indicesCreate(ctx, name, body); err != nil {
 		if !isAlreadyExistsError(err) {
 			r.keywordsErr = err
@@ -224,9 +230,15 @@ func (r *Repository) ensureKeywordsIndex(ctx context.Context) error {
 		}
 		// resource_already_exists_exception — race with concurrent process,
 		// treat as success.
+	} else {
+		created = true
 	}
 	r.keywordsReady = true
 	r.keywordsErr = nil
+	if created {
+		// dim=0 marks the dim-less keyword-only index.
+		r.auditSink().EmitIndexCreated(ctx, name, 0)
+	}
 	return nil
 }
 
@@ -64,6 +64,11 @@ type Repository struct {
 	keywordsMu sync.Mutex
 	keywordsReady bool
 	keywordsErr error
+
+	// sink receives audit events (index created / reindex executed). nil
+	// means no auditing; use r.auditSink() to get a non-nil sink. Set via
+	// WithAuditSink at construction.
+	sink AuditSink
 }
 
 // Compile-time interface satisfaction (Go best practice — keeps the build
@@ -87,6 +92,10 @@ var _ interfaces.RetrieveEngineRepository = (*Repository)(nil)
 // indexCfg is optional — pass nil to use env var (OPENSEARCH_INDEX) or
 // default ("weknora") values.
 //
+// Optional behavior is configured via functional options (e.g.
+// WithAuditSink). Passing no options keeps audit emission as a no-op, so the
+// env-path and tests need no extra wiring.
+//
 // Returns a typed sentinel error wrapped with %w; callers translate to
 // AppError at the engine-factory boundary.
 func NewRepository(
@@ -94,6 +103,7 @@ func NewRepository(
 	client *osapi.Client,
 	storeID string,
 	indexCfg *types.IndexConfig,
+	opts ...Option,
 ) (interfaces.RetrieveEngineRepository, error) {
 	log := logger.GetLogger(ctx)
 
@@ -135,6 +145,9 @@ func NewRepository(
 		once: make(map[int]*sync.Once),
 		initErr: make(map[int]error),
 	}
+	for _, opt := range opts {
+		opt(r)
+	}
 	log.Infof("[OpenSearch] repository ready (baseIndex=%s, knn_engine=%s, hnsw_m=%d)",
 		base, icfg.knnEngine, icfg.hnswM)
 	return r, nil
@@ -1123,34 +1123,14 @@ func TestNewRepository_AcceptsLongStoreID(t *testing.T) {
 }
 
 // ============================================================================
-// Stub coverage — every stub returns the not-enabled sentinel
+// Stub coverage — remaining stubs return the not-enabled sentinel
+//
+// CopyIndices / BatchUpdateChunkEnabledStatus / BatchUpdateChunkTagID are now
+// implemented (see copy_bulk_test.go for their behavioral tests); their stub
+// assertions were removed. EstimateStorageSize keeps its conservative
+// lower-bound until the real _stats-based implementation lands.
 // ============================================================================
 
-func TestStub_CopyIndices_ReturnsFeatureNotEnabled(t *testing.T) {
-	t.Parallel()
-	r := &Repository{}
-	err := r.CopyIndices(context.Background(), "kb1", nil, nil, "kb2", 768, "")
-	if !errors.Is(err, ErrFeatureNotEnabled) {
-		t.Errorf("CopyIndices: want ErrFeatureNotEnabled, got %v", err)
-	}
-}
-
-func TestStub_BatchUpdateChunkEnabledStatus_ReturnsFeatureNotEnabled(t *testing.T) {
-	t.Parallel()
-	r := &Repository{}
-	if err := r.BatchUpdateChunkEnabledStatus(context.Background(), nil); !errors.Is(err, ErrFeatureNotEnabled) {
-		t.Errorf("want ErrFeatureNotEnabled, got %v", err)
-	}
-}
-
-func TestStub_BatchUpdateChunkTagID_ReturnsFeatureNotEnabled(t *testing.T) {
-	t.Parallel()
-	r := &Repository{}
-	if err := r.BatchUpdateChunkTagID(context.Background(), nil); !errors.Is(err, ErrFeatureNotEnabled) {
-		t.Errorf("want ErrFeatureNotEnabled, got %v", err)
-	}
-}
-
 func TestStub_EstimateStorageSize_EmptyZero_NonEmptyPositive(t *testing.T) {
 	t.Parallel()
 	r := &Repository{}
@@ -6,43 +6,12 @@ import (
 	"github.com/Tencent/WeKnora/internal/types"
 )
 
-// This file holds the remaining stubs for methods whose real
-// implementation has not landed yet — the async / batch paths and the
-// rolling-reindex swap. Each stub returns ErrFeatureNotEnabled (or, for
-// EstimateStorageSize, a conservative lower-bound) so any accidental
-// invocation surfaces loudly. The driver as a whole is still gated dead
-// code (no registry / factory / env path mentions it); these stubs
-// disappear when their behaviours arrive in follow-up commits.
-
-// CopyIndices: the async _reindex path with task polling for >10K-doc
-// batches arrives in a later change.
-func (r *Repository) CopyIndices(
-	_ context.Context,
-	_ string, // sourceKnowledgeBaseID
-	_ map[string]string, // sourceToTargetKBIDMap
-	_ map[string]string, // sourceToTargetChunkIDMap
-	_ string, // targetKnowledgeBaseID
-	_ int, // dimension
-	_ string, // knowledgeType
-) error {
-	return ErrFeatureNotEnabled
-}
-
-// BatchUpdateChunkEnabledStatus: the _update_by_query path arrives in a
-// later change.
-func (r *Repository) BatchUpdateChunkEnabledStatus(
-	_ context.Context, _ map[string]bool,
-) error {
-	return ErrFeatureNotEnabled
-}
-
-// BatchUpdateChunkTagID: the _update_by_query path arrives in a later
-// change.
-func (r *Repository) BatchUpdateChunkTagID(
-	_ context.Context, _ map[string]string,
-) error {
-	return ErrFeatureNotEnabled
-}
+// This file holds the remaining stubs for methods whose real implementation
+// has not landed yet — the rolling-reindex swap (swapToVersion) and the
+// precise storage estimate. CopyIndices (copy.go), BatchUpdateChunkEnabledStatus
+// and BatchUpdateChunkTagID (bulk_update.go) are now implemented. Each
+// remaining stub returns ErrFeatureNotEnabled (or, for EstimateStorageSize, a
+// conservative lower-bound) so any accidental invocation surfaces loudly.
 
 // EstimateStorageSize: the real implementation that reads cluster
 // `_stats` for the per-dim alias arrives in a later change. For now we
@@ -12,12 +12,10 @@ import (
 	"github.com/Tencent/WeKnora/internal/types"
 )
 
-// newOpenSearchClient builds a TLS-hardened, pool-tuned *osapi.Client for
+// NewOpenSearchClient builds a TLS-hardened, pool-tuned *osapi.Client for
 // the OpenSearch driver. The caller wires it into the registry from the
 // env path (container) and the DB-store path (engine factory); the
-// Repository constructor itself receives the pre-built client. While the
-// driver is still gated dead code, no code path here is reachable from
-// production — the activation switch lands in a later change.
+// Repository constructor itself receives the pre-built client.
 //
 // TLS posture:
 // - MinVersion: TLS 1.2 (TLS 1.3 negotiated when both ends support).
@@ -33,7 +31,7 @@ import (
 // - IdleConnTimeout: 90s (typical LB keep-alive)
 // - ResponseHeaderTimeout: 30s (per-request safety net)
 // - ExpectContinueTimeout: 1s
-func newOpenSearchClient(cfg *types.ConnectionConfig) (*osapi.Client, error) {
+func NewOpenSearchClient(cfg *types.ConnectionConfig) (*osapi.Client, error) {
 	if cfg == nil || cfg.Addr == "" {
 		return nil, fmt.Errorf("opensearch: ConnectionConfig.Addr required: %w", ErrConfigInvalid)
 	}
@@ -14,11 +14,11 @@ import (
 // transport error later.
 func TestNewOpenSearchClient_RejectsEmptyAddr(t *testing.T) {
 	t.Parallel()
-	_, err := newOpenSearchClient(&types.ConnectionConfig{Addr: ""})
+	_, err := NewOpenSearchClient(&types.ConnectionConfig{Addr: ""})
 	if !errors.Is(err, ErrConfigInvalid) {
 		t.Fatalf("empty addr: want ErrConfigInvalid, got %v", err)
 	}
-	_, err = newOpenSearchClient(nil)
+	_, err = NewOpenSearchClient(nil)
 	if !errors.Is(err, ErrConfigInvalid) {
 		t.Fatalf("nil cfg: want ErrConfigInvalid, got %v", err)
 	}
@@ -30,13 +30,13 @@ func TestNewOpenSearchClient_RejectsEmptyAddr(t *testing.T) {
 // returns successfully.
 func TestNewOpenSearchClient_Succeeds_OnValidAddr(t *testing.T) {
 	t.Parallel()
-	client, err := newOpenSearchClient(&types.ConnectionConfig{
+	client, err := NewOpenSearchClient(&types.ConnectionConfig{
 		Addr:     "https://opensearch.example.com:9200",
 		Username: "admin",
 		Password: "secret", // not a real password — wire-format only
 	})
 	if err != nil {
-		t.Fatalf("newOpenSearchClient: %v", err)
+		t.Fatalf("NewOpenSearchClient: %v", err)
 	}
 	if client == nil {
 		t.Fatal("client must be non-nil on success")
@@ -67,6 +67,14 @@ func (s *vectorStoreService) CreateStore(ctx context.Context, store *types.Vecto
 		return err
 	}
 
+	// 2.6. Engine-specific index config validation (OpenSearch HNSW bounds).
+	// Create-only: UpdateStore mutates just the name, so this is not re-run there.
+	if store.EngineType == types.OpenSearchRetrieverEngineType {
+		if err := validateOpenSearchIndexConfig(store.IndexConfig); err != nil {
+			return err
+		}
+	}
+
 	// 3. Duplicate check — DB stores
 	endpoint := store.ConnectionConfig.GetEndpoint()
 	indexName := store.IndexConfig.GetIndexNameOrDefault(store.EngineType)
@@ -452,8 +460,51 @@ func validateConnectionConfig(engineType types.RetrieverEngineType, config types
		if config.Database == "" {
			return errors.NewValidationError("database is required for doris")
		}
	case types.OpenSearchRetrieverEngineType:
		if config.Addr == "" {
			return errors.NewValidationError("addr is required for opensearch")
		}
	case types.SQLiteRetrieverEngineType:
		// No connection config needed for SQLite
	}
	return nil
}

// OpenSearch HNSW bound constants. Shards / replicas are NOT validated here —
// the flat types.ValidateIndexConfig already enforces those caps for every
// engine. These caps mirror the GetVectorStoreTypes Min/Max so the UI and
// backend agree. A zero / empty field means "use the driver default" and is
// always accepted.
const (
	osHNSWMMin = 2
	osHNSWMMax = 100
	osHNSWEFConstructionMin = 2
	osHNSWEFConstructionMax = 4096
	osHNSWEFSearchMin = 1
	osHNSWEFSearchMax = 10000
)

// validateOpenSearchIndexConfig validates the OpenSearch-specific HNSW fields.
// Called from CreateStore only (the store is create-only; UpdateStore mutates
// just the name). Unset fields (zero / empty) fall back to driver defaults and
// are accepted.
func validateOpenSearchIndexConfig(ic types.IndexConfig) error {
	if ic.HNSWM != 0 && (ic.HNSWM < osHNSWMMin || ic.HNSWM > osHNSWMMax) {
		return errors.NewValidationError(
			fmt.Sprintf("hnsw_m must be between %d and %d", osHNSWMMin, osHNSWMMax))
	}
	if ic.HNSWEFConstruction != 0 &&
		(ic.HNSWEFConstruction < osHNSWEFConstructionMin || ic.HNSWEFConstruction > osHNSWEFConstructionMax) {
		return errors.NewValidationError(
			fmt.Sprintf("hnsw_ef_construction must be between %d and %d", osHNSWEFConstructionMin, osHNSWEFConstructionMax))
	}
	if ic.HNSWEFSearch != 0 &&
		(ic.HNSWEFSearch < osHNSWEFSearchMin || ic.HNSWEFSearch > osHNSWEFSearchMax) {
		return errors.NewValidationError(
			fmt.Sprintf("hnsw_ef_search must be between %d and %d", osHNSWEFSearchMin, osHNSWEFSearchMax))
	}
	if ic.KNNEngine != "" && ic.KNNEngine != "lucene" && ic.KNNEngine != "faiss" {
		return errors.NewValidationError(`knn_engine must be "lucene" or "faiss"`)
	}
	return nil
}

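All three range checks above follow the same rule: zero means "use the driver default", and any other value must fall inside an inclusive range. The minimal sketch below keeps the caps in a single table, so the backend check and the UI Min/Max from `GetVectorStoreTypes` cannot drift apart. `intBound`, `hnswBounds`, `checkBound` and `uiMinMax` are hypothetical names and are not part of WeKnora.

```go
package main

import "fmt"

// intBound is a hypothetical descriptor (not part of WeKnora): one entry per
// HNSW field, shared by backend validation and UI metadata.
type intBound struct {
	Name     string
	Min, Max int
}

// hnswBounds mirrors the caps enforced by validateOpenSearchIndexConfig.
var hnswBounds = []intBound{
	{"hnsw_m", 2, 100},
	{"hnsw_ef_construction", 2, 4096},
	{"hnsw_ef_search", 1, 10000},
}

// checkBound applies "zero means driver default", then the inclusive range.
func checkBound(b intBound, v int) error {
	if v == 0 {
		return nil // unset: fall back to the driver default
	}
	if v < b.Min || v > b.Max {
		return fmt.Errorf("%s must be between %d and %d", b.Name, b.Min, b.Max)
	}
	return nil
}

// uiMinMax derives the *float64 Min/Max a field-info entry would carry,
// from the same table the backend validates against.
func uiMinMax(b intBound) (lo, hi *float64) {
	l, h := float64(b.Min), float64(b.Max)
	return &l, &h
}
```

Because one table feeds both validation and the UI metadata, changing a cap means editing a single entry.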
@@ -11,6 +11,7 @@ import (
	"strings"
	"time"

	openSearchRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/opensearch"
	"github.com/Tencent/WeKnora/internal/errors"
	"github.com/Tencent/WeKnora/internal/logger"
	"github.com/Tencent/WeKnora/internal/types"
@@ -47,6 +48,8 @@ func (s *vectorStoreService) TestConnection(
		return testWeaviateConnection(ctx, config)
	case types.DorisRetrieverEngineType:
		return testDorisConnection(ctx, config)
	case types.OpenSearchRetrieverEngineType:
		return testOpenSearchConnection(ctx, config)
	case types.SQLiteRetrieverEngineType:
		// SQLite is file-based, no remote connection to test
		return "", nil
@@ -305,3 +308,23 @@ func testDorisConnection(ctx context.Context, config types.ConnectionConfig) (st
	}
	return version, nil
}

// testOpenSearchConnection verifies the cluster is reachable, runs a
// supported OpenSearch version, and has the k-NN plugin installed. The driver
// owns the probe logic; a generic message is returned on failure so cluster
// internals are not surfaced to the API caller.
func testOpenSearchConnection(ctx context.Context, config types.ConnectionConfig) (string, error) {
	if config.Addr == "" {
		return "", errors.NewBadRequestError("failed to create opensearch connection: addr is required")
	}
	testCtx, cancel := context.WithTimeout(ctx, connectionTestTimeout)
	defer cancel()
	if err := openSearchRepo.TestConnection(testCtx, &config); err != nil {
		logger.Warnf(ctx, "OpenSearch connection test failed: %v", err)
		return "", errors.NewBadRequestError(
			"failed to connect to opensearch: check address, credentials, version (>= 2.4), and that the k-NN plugin is installed")
	}
	// Version is detected during the probe but not surfaced here; lazy index
	// creation re-validates on first use.
	return "", nil
}

47
internal/application/service/vectorstore_opensearch_test.go
Normal file
@@ -0,0 +1,47 @@
package service

import (
	"testing"

	"github.com/Tencent/WeKnora/internal/types"
)

func TestValidateConnectionConfig_OpenSearch_RequiresAddr(t *testing.T) {
	if err := validateConnectionConfig(types.OpenSearchRetrieverEngineType, types.ConnectionConfig{}); err == nil {
		t.Error("empty addr should be rejected for opensearch")
	}
	if err := validateConnectionConfig(types.OpenSearchRetrieverEngineType,
		types.ConnectionConfig{Addr: "https://os:9200"}); err != nil {
		t.Errorf("valid addr should pass: %v", err)
	}
}

func TestValidateOpenSearchIndexConfig_BoundaryMatrix(t *testing.T) {
	tests := []struct {
		name string
		ic types.IndexConfig
		wantErr bool
	}{
		{"all unset (defaults)", types.IndexConfig{}, false},
		{"valid mid-range", types.IndexConfig{HNSWM: 16, HNSWEFConstruction: 100, HNSWEFSearch: 100, KNNEngine: "lucene"}, false},
		{"valid faiss", types.IndexConfig{KNNEngine: "faiss"}, false},
		{"valid boundaries", types.IndexConfig{HNSWM: 2, HNSWEFConstruction: 2, HNSWEFSearch: 1}, false},
		{"valid upper boundaries", types.IndexConfig{HNSWM: 100, HNSWEFConstruction: 4096, HNSWEFSearch: 10000}, false},
		{"hnsw_m too low", types.IndexConfig{HNSWM: 1}, true},
		{"hnsw_m too high", types.IndexConfig{HNSWM: 101}, true},
		{"ef_construction too high", types.IndexConfig{HNSWEFConstruction: 4097}, true},
		{"ef_search too high", types.IndexConfig{HNSWEFSearch: 10001}, true},
		{"invalid engine", types.IndexConfig{KNNEngine: "nmslib"}, true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := validateOpenSearchIndexConfig(tt.ic)
			if tt.wantErr && err == nil {
				t.Errorf("want error, got nil")
			}
			if !tt.wantErr && err != nil {
				t.Errorf("want nil, got %v", err)
			}
		})
	}
}
66
internal/container/audit_sink.go
Normal file
@@ -0,0 +1,66 @@
package container

import (
	"context"
	"encoding/json"

	"github.com/Tencent/WeKnora/internal/application/repository/retriever/opensearch"
	"github.com/Tencent/WeKnora/internal/logger"
	"github.com/Tencent/WeKnora/internal/types"
	"github.com/Tencent/WeKnora/internal/types/interfaces"
)

// auditSinkAdapter bridges the OpenSearch driver's AuditSink (which the driver
// owns so it imports no service package) to the service-layer AuditLogService.
// This keeps the dependency one-way: the driver depends only on its own
// AuditSink abstraction; the container implements it.
type auditSinkAdapter struct {
	svc interfaces.AuditLogService
}

// newAuditSinkAdapter returns an opensearch.AuditSink backed by svc. A nil svc
// yields a sink whose emits are no-ops.
func newAuditSinkAdapter(svc interfaces.AuditLogService) opensearch.AuditSink {
	return auditSinkAdapter{svc: svc}
}

func (a auditSinkAdapter) EmitIndexCreated(ctx context.Context, alias string, dim int) {
	a.emit(ctx, types.AuditActionOpenSearchIndexCreated, alias,
		map[string]any{"alias": alias, "dim": dim})
}

func (a auditSinkAdapter) EmitReindexExecuted(ctx context.Context, srcAlias, dstAlias string, docs int64) {
	a.emit(ctx, types.AuditActionOpenSearchReindexExecuted, dstAlias,
		map[string]any{"src_alias": srcAlias, "dst_alias": dstAlias, "docs": docs})
}

// emit writes one audit entry. It skips (with a warning) when the context
// carries no tenant — driver events can fire from background task contexts
// (e.g. lazy index creation under an async copy task), and writing tenant_id=0
// would collide with the system-scope sentinel and corrupt the audit trail.
func (a auditSinkAdapter) emit(ctx context.Context, action types.AuditAction, target string, detail map[string]any) {
	if a.svc == nil {
		return
	}
	tid, ok := types.TenantIDFromContext(ctx)
	if !ok {
		logger.GetLogger(ctx).Warnf("[audit] %s: no tenant in context, skipping audit (target=%s)", action, target)
		return
	}
	// Details is a typed JSON blob — only bounded, non-secret fields. Never
	// include cluster reason strings or connection secrets.
	b, err := json.Marshal(detail)
	if err != nil {
		logger.GetLogger(ctx).Warnf("[audit] %s: marshal details failed: %v", action, err)
		b = []byte("{}")
	}
	if err := a.svc.Log(ctx, &types.AuditLog{
		TenantID: tid,
		Action: action,
		TargetType: "opensearch_index",
		TargetID: target,
		Details: types.JSON(b),
	}); err != nil {
		logger.GetLogger(ctx).Warnf("[audit] %s emit failed: %v", action, err)
	}
}
74
internal/container/audit_sink_test.go
Normal file
@@ -0,0 +1,74 @@
package container

import (
	"context"
	"testing"

	"github.com/gin-gonic/gin"

	"github.com/Tencent/WeKnora/internal/types"
	"github.com/Tencent/WeKnora/internal/types/interfaces"
)

// fakeAuditSvc records Log calls.
type fakeAuditSvc struct {
	logged []*types.AuditLog
	err error
}

func (f *fakeAuditSvc) Log(_ context.Context, e *types.AuditLog) error {
	f.logged = append(f.logged, e)
	return f.err
}
func (f *fakeAuditSvc) LogDenied(context.Context, *gin.Context, uint64, string, string, types.TenantRole) error {
	return nil
}
func (f *fakeAuditSvc) List(context.Context, uint64, *interfaces.AuditLogQuery) ([]*types.AuditLog, error) {
	return nil, nil
}
func (f *fakeAuditSvc) Purge(context.Context, int) (int64, error) { return 0, nil }

func ctxWithTenant(id uint64) context.Context {
	return context.WithValue(context.Background(), types.TenantIDContextKey, id)
}

func TestAuditSinkAdapter_EmitsWithTenant(t *testing.T) {
	f := &fakeAuditSvc{}
	sink := newAuditSinkAdapter(f)

	sink.EmitIndexCreated(ctxWithTenant(42), "weknora_768", 768)
	if len(f.logged) != 1 {
		t.Fatalf("want 1 audit entry, got %d", len(f.logged))
	}
	e := f.logged[0]
	if e.TenantID != 42 {
		t.Errorf("tenant: want 42, got %d", e.TenantID)
	}
	if e.Action != types.AuditActionOpenSearchIndexCreated {
		t.Errorf("action: want %s, got %s", types.AuditActionOpenSearchIndexCreated, e.Action)
	}
	if e.Details == nil {
		t.Error("details should be populated")
	}

	sink.EmitReindexExecuted(ctxWithTenant(7), "src", "dst", 9)
	if len(f.logged) != 2 || f.logged[1].Action != types.AuditActionOpenSearchReindexExecuted {
		t.Errorf("reindex audit not recorded: %+v", f.logged)
	}
}

func TestAuditSinkAdapter_SkipsWithoutTenant(t *testing.T) {
	f := &fakeAuditSvc{}
	sink := newAuditSinkAdapter(f)
	// background ctx carries no tenant → adapter must skip (never write tenant=0)
	sink.EmitIndexCreated(context.Background(), "weknora_768", 768)
	sink.EmitReindexExecuted(context.Background(), "a", "b", 1)
	if len(f.logged) != 0 {
		t.Errorf("want 0 audit entries without tenant, got %d", len(f.logged))
	}
}

func TestAuditSinkAdapter_NilServiceNoPanic(t *testing.T) {
	sink := newAuditSinkAdapter(nil)
	sink.EmitIndexCreated(ctxWithTenant(1), "x", 1) // must not panic
}
@@ -40,6 +40,7 @@ import (
	elasticsearchRepoV8 "github.com/Tencent/WeKnora/internal/application/repository/retriever/elasticsearch/v8"
	milvusRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/milvus"
	neo4jRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/neo4j"
	openSearchRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/opensearch"
	postgresRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/postgres"
	qdrantRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/qdrant"
	sqliteRetrieverRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/sqlite"
@@ -785,10 +786,16 @@ func initFileService(cfg *config.Config) (interfaces.FileService, error) {
// Returns:
// - Configured retrieval engine registry
// - Error if initialization fails
func initRetrieveEngineRegistry(db *gorm.DB, cfg *config.Config) (interfaces.RetrieveEngineRegistry, error) {
func initRetrieveEngineRegistry(
	db *gorm.DB, cfg *config.Config, auditSvc interfaces.AuditLogService,
) (interfaces.RetrieveEngineRegistry, error) {
	registry := retriever.NewRetrieveEngineRegistry()
	retrieveDriver := strings.Split(os.Getenv("RETRIEVE_DRIVER"), ",")
	log := logger.GetLogger(context.Background())
	// Audit sink for OpenSearch driver events (index created / reindex). Driver
	// events fire under a tenant-scoped ctx at indexing time; the env-path
	// registration ctx below has no tenant, so those emits self-skip.
	auditSink := newAuditSinkAdapter(auditSvc)

	if slices.Contains(retrieveDriver, "postgres") {
		postgresRepo := postgresRepo.NewPostgresRetrieveEngineRepository(db)
@@ -854,6 +861,29 @@ func initRetrieveEngineRegistry(db *gorm.DB, cfg *config.Config) (interfaces.Ret
		}
	}

	if slices.Contains(retrieveDriver, "opensearch") {
		cc := &types.ConnectionConfig{
			Addr: os.Getenv("OPENSEARCH_ADDR"),
			Username: os.Getenv("OPENSEARCH_USERNAME"),
			Password: os.Getenv("OPENSEARCH_PASSWORD"),
			InsecureSkipVerify: strings.EqualFold(os.Getenv("OPENSEARCH_INSECURE_SKIP_VERIFY"), "true"),
		}
		client, err := openSearchRepo.NewOpenSearchClient(cc)
		if err != nil {
			log.Errorf("Create opensearch client failed: %v", err)
		} else if repo, err := openSearchRepo.NewRepository(
			context.Background(), client, "", nil, openSearchRepo.WithAuditSink(auditSink),
		); err != nil {
			log.Errorf("Create opensearch repository failed: %v", err)
		} else if err := registry.Register(
			retriever.NewKVHybridRetrieveEngine(repo, types.OpenSearchRetrieverEngineType),
		); err != nil {
			log.Errorf("Register opensearch retrieve engine failed: %v", err)
		} else {
			log.Infof("Register opensearch retrieve engine success")
		}
	}

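Two parsing details in the env path are easy to trip over. `RETRIEVE_DRIVER` is split on bare commas with no trimming, so a list written with spaces after the commas will not match. `OPENSEARCH_INSECURE_SKIP_VERIFY` is enabled only by a case-insensitive `true`. The small sketch below reproduces just those two expressions; the helper names are hypothetical.

```go
package main

import (
	"slices"
	"strings"
)

// driverEnabled mirrors the registry's membership test: split on "," and
// require an exact match. Whitespace is not trimmed.
func driverEnabled(env, name string) bool {
	return slices.Contains(strings.Split(env, ","), name)
}

// insecureFromEnv mirrors the OPENSEARCH_INSECURE_SKIP_VERIFY parse:
// only "true" in any letter case enables it; "1" or "yes" do not.
func insecureFromEnv(v string) bool {
	return strings.EqualFold(v, "true")
}
```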
	if slices.Contains(retrieveDriver, "qdrant") {
		qdrantHost := os.Getenv("QDRANT_HOST")
		if qdrantHost == "" {
@@ -1061,7 +1091,7 @@ func initRetrieveEngineRegistry(db *gorm.DB, cfg *config.Config) (interfaces.Ret
	}
	// ─── DB store registration (byStoreID) ───
	if storeReg, ok := registry.(*retriever.RetrieveEngineRegistry); ok {
		loadDBStoresIntoRegistry(storeReg, db, cfg)
		loadDBStoresIntoRegistry(storeReg, db, cfg, auditSink)
	}

	return registry, nil
@@ -1069,7 +1099,9 @@ func initRetrieveEngineRegistry(db *gorm.DB, cfg *config.Config) (interfaces.Ret

// loadDBStoresIntoRegistry loads VectorStore records from DB and registers them
// in the registry's byStoreID map. Failures are logged and skipped (non-fatal).
func loadDBStoresIntoRegistry(storeRegistry interfaces.StoreRegistry, db *gorm.DB, cfg *config.Config) {
func loadDBStoresIntoRegistry(
	storeRegistry interfaces.StoreRegistry, db *gorm.DB, cfg *config.Config, auditSink openSearchRepo.AuditSink,
) {
	ctx := context.Background()
	log := logger.GetLogger(ctx)

@@ -1086,7 +1118,7 @@ func loadDBStoresIntoRegistry(storeRegistry interfaces.StoreRegistry, db *gorm.D

	log.Infof("Loading %d vector store(s) from database", len(stores))
	for _, store := range stores {
		svc, err := createEngineServiceFromStore(ctx, store, db, cfg)
		svc, err := createEngineServiceFromStore(ctx, store, db, cfg, auditSink)
		if err != nil {
			log.Errorf("Failed to create engine for store %s (%s): %v", store.ID, store.Name, err)
			continue

@@ -23,6 +23,7 @@ import (
	elasticsearchRepoV7 "github.com/Tencent/WeKnora/internal/application/repository/retriever/elasticsearch/v7"
	elasticsearchRepoV8 "github.com/Tencent/WeKnora/internal/application/repository/retriever/elasticsearch/v8"
	milvusRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/milvus"
	openSearchRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/opensearch"
	postgresRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/postgres"
	qdrantRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/qdrant"
	sqliteRetrieverRepo "github.com/Tencent/WeKnora/internal/application/repository/retriever/sqlite"
@@ -35,21 +36,27 @@ import (
	"github.com/tencent/vectordatabase-sdk-go/tcvectordb"
)

// NewEngineFactory returns an EngineFactory function closed over db and cfg.
// Registered in dig and injected into VectorStoreService for dynamic registry updates.
func NewEngineFactory(db *gorm.DB, cfg *config.Config) interfaces.EngineFactory {
// NewEngineFactory returns an EngineFactory function closed over db, cfg, and
// an audit sink (built from the AuditLogService). Registered in dig and
// injected into VectorStoreService for dynamic registry updates. The
// EngineFactory type itself is unchanged — the audit sink is captured in the
// closure rather than added to the signature.
func NewEngineFactory(db *gorm.DB, cfg *config.Config, auditSvc interfaces.AuditLogService) interfaces.EngineFactory {
	sink := newAuditSinkAdapter(auditSvc)
	return func(ctx context.Context, store types.VectorStore) (interfaces.RetrieveEngineService, error) {
		return createEngineServiceFromStore(ctx, store, db, cfg)
		return createEngineServiceFromStore(ctx, store, db, cfg, sink)
	}
}

// createEngineServiceFromStore creates a RetrieveEngineService from a VectorStore's config.
// This is the DB store counterpart of the env-based initialization in initRetrieveEngineRegistry.
// auditSink may be nil (audit becomes a no-op).
func createEngineServiceFromStore(
	ctx context.Context,
	store types.VectorStore,
	db *gorm.DB,
	cfg *config.Config,
	auditSink openSearchRepo.AuditSink,
) (interfaces.RetrieveEngineService, error) {
	switch store.EngineType {
	case types.PostgresRetrieverEngineType:
@@ -68,11 +75,40 @@ func createEngineServiceFromStore(
		return createSQLiteEngine(store, db)
	case types.TencentVectorDBRetrieverEngineType:
		return createTencentVectorDBEngine(store)
	case types.OpenSearchRetrieverEngineType:
		return createOpenSearchEngine(ctx, store, auditSink)
	default:
		return nil, fmt.Errorf("unsupported engine type: %s", store.EngineType)
	}
}

// createOpenSearchEngine builds an OpenSearch k-NN retrieve engine. Mirrors
// createElasticsearchV8Engine but uses the driver's TLS-hardened client
// constructor and injects the audit sink. NewRepository probes the cluster
// (version + k-NN plugin), so an unreachable cluster fails here at
// registration rather than on first query.
func createOpenSearchEngine(
	ctx context.Context, store types.VectorStore, auditSink openSearchRepo.AuditSink,
) (interfaces.RetrieveEngineService, error) {
	client, err := openSearchRepo.NewOpenSearchClient(&store.ConnectionConfig)
	if err != nil {
		return nil, fmt.Errorf("create opensearch client: %w", err)
	}
	// Env stores share the cluster without a per-store index prefix; DB stores
	// fold their (>=16-char) ID into the index name. NewRepository enforces the
	// length rule, so map env-store IDs to "".
	storeID := store.ID
	if types.IsEnvStoreID(storeID) {
		storeID = ""
	}
	repo, err := openSearchRepo.NewRepository(ctx, client, storeID, &store.IndexConfig,
		openSearchRepo.WithAuditSink(auditSink))
	if err != nil {
		return nil, fmt.Errorf("create opensearch repository: %w", err)
	}
	return retriever.NewKVHybridRetrieveEngine(repo, types.OpenSearchRetrieverEngineType), nil
}

func createPostgresEngine(store types.VectorStore, db *gorm.DB) (interfaces.RetrieveEngineService, error) {
	if store.ConnectionConfig.UseDefaultConnection {
		repo := postgresRepo.NewPostgresRetrieveEngineRepository(db)

104
internal/container/engine_factory_opensearch_test.go
Normal file
@@ -0,0 +1,104 @@
package container

import (
	"context"
	"net/http"
	"net/http/httptest"
	"testing"

	"gorm.io/driver/sqlite"
	"gorm.io/gorm"

	"github.com/Tencent/WeKnora/internal/config"
	"github.com/Tencent/WeKnora/internal/types"
)

// osClusterHandler simulates an OpenSearch cluster for the driver's
// construction probe: GET / (version) + /_cat/plugins (k-NN on every node).
func osClusterHandler(distribution, number string, knnInstalled bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		if r.URL.Path == "/_cat/plugins" {
			if knnInstalled {
				_, _ = w.Write([]byte(`[{"name":"node-1","component":"opensearch-knn"}]`))
			} else {
				_, _ = w.Write([]byte(`[{"name":"node-1","component":"opensearch-sql"}]`))
			}
			return
		}
		_, _ = w.Write([]byte(`{"version":{"distribution":"` + distribution + `","number":"` + number + `"}}`))
	}
}

func TestCreateOpenSearchEngine_WiresClientAndRepo(t *testing.T) {
	ts := httptest.NewServer(osClusterHandler("opensearch", "3.3.2", true))
	defer ts.Close()

	store := types.VectorStore{
		ID: "", // env-style (no index prefix)
		EngineType: types.OpenSearchRetrieverEngineType,
		ConnectionConfig: types.ConnectionConfig{Addr: ts.URL},
	}
	svc, err := createOpenSearchEngine(context.Background(), store, nil)
	if err != nil {
		t.Fatalf("createOpenSearchEngine: %v", err)
	}
	if svc == nil {
		t.Fatal("want non-nil engine service")
	}
	if svc.EngineType() != types.OpenSearchRetrieverEngineType {
		t.Errorf("engine type: want opensearch, got %s", svc.EngineType())
	}
}

func TestCreateOpenSearchEngine_RejectsBadCluster(t *testing.T) {
	// Elasticsearch distribution must be rejected at construction.
	ts := httptest.NewServer(osClusterHandler("elasticsearch", "8.10.4", true))
	defer ts.Close()
	_, err := createOpenSearchEngine(context.Background(),
		types.VectorStore{EngineType: types.OpenSearchRetrieverEngineType,
			ConnectionConfig: types.ConnectionConfig{Addr: ts.URL}}, nil)
	if err == nil {
		t.Error("elasticsearch cluster should be rejected at engine creation")
	}
}

func TestCreateEngineServiceFromStore_OpenSearchCaseReached(t *testing.T) {
	ts := httptest.NewServer(osClusterHandler("opensearch", "2.11.0", true))
	defer ts.Close()
	svc, err := createEngineServiceFromStore(context.Background(),
		types.VectorStore{EngineType: types.OpenSearchRetrieverEngineType,
			ConnectionConfig: types.ConnectionConfig{Addr: ts.URL}},
		nil, &config.Config{}, nil)
	if err != nil {
		t.Fatalf("createEngineServiceFromStore (opensearch case): %v", err)
	}
	if svc == nil || svc.EngineType() != types.OpenSearchRetrieverEngineType {
		t.Errorf("opensearch case not wired correctly: %v", svc)
	}
}

// TestInitRetrieveEngineRegistry_OpenSearchEnvPath exercises the
// RETRIEVE_DRIVER=opensearch env-path registration block end to end.
func TestInitRetrieveEngineRegistry_OpenSearchEnvPath(t *testing.T) {
	ts := httptest.NewServer(osClusterHandler("opensearch", "3.3.2", true))
	defer ts.Close()

	// In-memory DB: the vector_stores table is absent, so loadDBStores logs
	// and returns (non-fatal) — only the env-path block matters here.
	db, err := gorm.Open(sqlite.Open(":memory:"), &gorm.Config{})
	if err != nil {
		t.Fatalf("open in-mem db: %v", err)
	}

	t.Setenv("RETRIEVE_DRIVER", "opensearch")
	t.Setenv("OPENSEARCH_ADDR", ts.URL)

	registry, err := initRetrieveEngineRegistry(db, &config.Config{}, &fakeAuditSvc{})
	if err != nil {
		t.Fatalf("initRetrieveEngineRegistry: %v", err)
	}
	if _, err := registry.GetRetrieveEngineService(types.OpenSearchRetrieverEngineType); err != nil {
		t.Errorf("opensearch engine not registered via env path: %v", err)
	}
}
@@ -50,6 +50,10 @@ var retrieverEngineMapping = map[string][]RetrieverEngineParams{
		{RetrieverType: KeywordsRetrieverType, RetrieverEngineType: TencentVectorDBRetrieverEngineType},
		{RetrieverType: VectorRetrieverType, RetrieverEngineType: TencentVectorDBRetrieverEngineType},
	},
	"opensearch": {
		{RetrieverType: KeywordsRetrieverType, RetrieverEngineType: OpenSearchRetrieverEngineType},
		{RetrieverType: VectorRetrieverType, RetrieverEngineType: OpenSearchRetrieverEngineType},
	},
}

// GetRetrieverEngineMapping returns the retriever engine mapping

@@ -86,6 +86,7 @@ var validEngineTypes = map[RetrieverEngineType]bool{
	WeaviateRetrieverEngineType: true,
	DorisRetrieverEngineType: true,
	TencentVectorDBRetrieverEngineType: true,
	OpenSearchRetrieverEngineType: true,
}

// IsValidEngineType checks whether the given engine type is valid for VectorStore.
@@ -250,6 +251,14 @@ type IndexConfig struct {
	DesiredShardCount int `yaml:"desired_shard_count" json:"desired_shard_count,omitempty"` // Weaviate: number of shards per collection
	BucketsNum int `yaml:"buckets_num" json:"buckets_num,omitempty"` // Doris: number of buckets per table (DISTRIBUTED BY HASH ... BUCKETS N)
	ReplicationNum int `yaml:"replication_num" json:"replication_num,omitempty"` // Doris: replication_num PROPERTIES

	// --- OpenSearch k-NN HNSW fields ---
	// All omitempty so other engines' serialized IndexConfig is unchanged.
	// Zero / empty values fall back to the driver defaults in buildInternalCfg.
	HNSWM int `yaml:"hnsw_m" json:"hnsw_m,omitempty"` // OpenSearch: HNSW graph degree (M)
	HNSWEFConstruction int `yaml:"hnsw_ef_construction" json:"hnsw_ef_construction,omitempty"` // OpenSearch: HNSW index-build candidate list size
	HNSWEFSearch int `yaml:"hnsw_ef_search" json:"hnsw_ef_search,omitempty"` // OpenSearch: HNSW search candidate list size (faiss; lucene reads at query time)
	KNNEngine string `yaml:"knn_engine" json:"knn_engine,omitempty"` // OpenSearch: k-NN backend ("lucene" | "faiss")
}

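Two properties of the new fields deserve pinning down: `omitempty` keeps other engines' JSON unchanged, and a zero value defers to a driver default. The sketch below demonstrates both on a trimmed-down struct. Several pieces are assumptions: the `index_name` and `number_of_shards` tags, and the `withHNSWDefaults` helper, which stands in for `buildInternalCfg` because the diff references that function without showing it. The defaults 16 / 100 / 100 / `lucene` are the UI defaults advertised by `GetVectorStoreTypes`.

```go
package main

import "encoding/json"

// indexConfigSketch is a trimmed stand-in for IndexConfig; the HNSW tags match
// the diff, the index_name / number_of_shards tags are assumed.
type indexConfigSketch struct {
	IndexName          string `json:"index_name,omitempty"`
	NumberOfShards     int    `json:"number_of_shards,omitempty"`
	HNSWM              int    `json:"hnsw_m,omitempty"`
	HNSWEFConstruction int    `json:"hnsw_ef_construction,omitempty"`
	HNSWEFSearch       int    `json:"hnsw_ef_search,omitempty"`
	KNNEngine          string `json:"knn_engine,omitempty"`
}

// encode returns the JSON form of the config ("" on error).
func encode(c indexConfigSketch) string {
	b, err := json.Marshal(c)
	if err != nil {
		return ""
	}
	return string(b)
}

// withHNSWDefaults is a hypothetical default fallback: every zero field takes
// the UI default, explicitly set fields are kept.
func withHNSWDefaults(c indexConfigSketch) indexConfigSketch {
	if c.HNSWM == 0 {
		c.HNSWM = 16
	}
	if c.HNSWEFConstruction == 0 {
		c.HNSWEFConstruction = 100
	}
	if c.HNSWEFSearch == 0 {
		c.HNSWEFSearch = 100
	}
	if c.KNNEngine == "" {
		c.KNNEngine = "lucene"
	}
	return c
}
```

Resolving defaults at index-build time, instead of writing them back into the stored config, is what keeps a config that never mentions HNSW free of these keys.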
// Value implements the driver.Valuer interface.
@@ -727,9 +736,32 @@ func GetVectorStoreTypes() []VectorStoreTypeInfo {
				{Name: "replication_num", Type: "number", Required: false, Description: "Replication Num", Default: 1},
			},
		},
		{
			Type: "opensearch",
			DisplayName: "OpenSearch",
			ConnectionFields: []VectorStoreFieldInfo{
				{Name: "addr", Type: "string", Required: true, Description: "URL", Default: "https://localhost:9200"},
				{Name: "username", Type: "string", Required: false, Description: "Username", Default: "admin"},
				{Name: "password", Type: "string", Required: false, Sensitive: true, Description: "Password"},
				{Name: "insecure_skip_verify", Type: "boolean", Required: false, Default: false,
					Description: "Skip TLS certificate verification. For self-signed dev clusters only — never enable in production."},
			},
			IndexFields: []VectorStoreFieldInfo{
				{Name: "index_name", Type: "string", Required: false, Description: "Index Name", Default: "weknora"},
				{Name: "number_of_shards", Type: "number", Required: false, Description: "Shards", Default: 4, Min: floatPtr(1), Max: floatPtr(64)},
				{Name: "number_of_replicas", Type: "number", Required: false, Description: "Replicas", Default: 1, Min: floatPtr(0), Max: floatPtr(10)},
				{Name: "hnsw_m", Type: "number", Required: false, Description: "HNSW graph degree (M). Immutable after index creation.", Default: 16, Min: floatPtr(2), Max: floatPtr(100), Immutable: true},
				{Name: "hnsw_ef_construction", Type: "number", Required: false, Description: "HNSW build candidate list. Higher (e.g. 200-512) improves recall at the cost of build time. Immutable after creation.", Default: 100, Min: floatPtr(2), Max: floatPtr(4096), Immutable: true},
				{Name: "hnsw_ef_search", Type: "number", Required: false, Description: "HNSW search candidate list. Effective on the faiss engine; the lucene engine reads it at query time. Immutable (no settings-update path).", Default: 100, Min: floatPtr(1), Max: floatPtr(10000), Immutable: true},
				{Name: "knn_engine", Type: "string", Required: false, Description: "k-NN backend.", Default: "lucene", Enum: []string{"lucene", "faiss"}, Immutable: true},
			},
		},
	}
}

// floatPtr returns a pointer to v, for setting VectorStoreFieldInfo Min/Max.
func floatPtr(v float64) *float64 { return &v }

// ---------------------------------------------------------------------------
|
||||
// BuildEnvVectorStores — virtual stores from RETRIEVE_DRIVER env var
|
||||
// ---------------------------------------------------------------------------
@@ -821,6 +853,21 @@ func buildEnvStoreForDriver(driver string, envLookup EnvLookupFunc) *VectorStore
				IndexName: envLookup("ELASTICSEARCH_INDEX"),
			},
		}
	case "opensearch":
		return &VectorStore{
			ID: "__env_opensearch__",
			Name: "OpenSearch",
			EngineType: OpenSearchRetrieverEngineType,
			ConnectionConfig: ConnectionConfig{
				Addr: envLookup("OPENSEARCH_ADDR"),
				Username: envLookup("OPENSEARCH_USERNAME"),
				Password: envLookup("OPENSEARCH_PASSWORD"),
				InsecureSkipVerify: strings.EqualFold(envLookup("OPENSEARCH_INSECURE_SKIP_VERIFY"), "true"),
			},
			IndexConfig: IndexConfig{
				IndexName: envLookup("OPENSEARCH_INDEX"),
			},
		}
	case "qdrant":
		return &VectorStore{
			ID: "__env_qdrant__",
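The `opensearch` case reads `OPENSEARCH_INSECURE_SKIP_VERIFY` with `strings.EqualFold(value, "true")`, so TLS verification is skipped only for a case-insensitive `true`. The sketch below isolates that rule; `insecureSkipVerify` and `mapLookup` are hypothetical helpers written for illustration, with `mapLookup` standing in for an env lookup function such as the tests' `mockEnvLookup`:

```go
package main

import (
	"fmt"
	"strings"
)

// insecureSkipVerify mirrors the env-store rule: only a case-insensitive
// "true" disables TLS verification. Anything else, including "1", "yes",
// an empty string, or " true" with stray whitespace, keeps it enabled.
func insecureSkipVerify(lookup func(string) string) bool {
	return strings.EqualFold(lookup("OPENSEARCH_INSECURE_SKIP_VERIFY"), "true")
}

// mapLookup is a hypothetical map-backed lookup, similar in spirit to the
// tests' mockEnvLookup helper; missing keys yield "".
func mapLookup(env map[string]string) func(string) string {
	return func(key string) string { return env[key] }
}

func main() {
	for _, v := range []string{"true", "TRUE", "1", "yes", " true", ""} {
		env := map[string]string{"OPENSEARCH_INSECURE_SKIP_VERIFY": v}
		fmt.Printf("%q -> %v\n", v, insecureSkipVerify(mapLookup(env)))
	}
}
```

Because no whitespace trimming or truthy aliases are applied, a typo in the variable's value fails closed: verification stays on rather than being silently disabled.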
154
internal/types/vectorstore_opensearch_test.go
Normal file
@@ -0,0 +1,154 @@
package types

import (
	"encoding/json"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// findVSType returns the VectorStoreTypeInfo with the given Type, or fails.
func findVSType(t *testing.T, typeName string) VectorStoreTypeInfo {
	t.Helper()
	for _, vt := range GetVectorStoreTypes() {
		if vt.Type == typeName {
			return vt
		}
	}
	t.Fatalf("GetVectorStoreTypes() has no entry for %q", typeName)
	return VectorStoreTypeInfo{}
}

// findField returns the field with the given Name from a slice, or fails.
func findField(t *testing.T, fields []VectorStoreFieldInfo, name string) VectorStoreFieldInfo {
	t.Helper()
	for _, f := range fields {
		if f.Name == name {
			return f
		}
	}
	t.Fatalf("no field named %q", name)
	return VectorStoreFieldInfo{}
}

// TestIndexConfig_OpenSearchFieldsOmittedForOtherEngines verifies the new HNSW
// fields are omitempty so other engines' serialized IndexConfig is unchanged.
func TestIndexConfig_OpenSearchFieldsOmittedForOtherEngines(t *testing.T) {
	t.Run("omitted when unset", func(t *testing.T) {
		b, err := json.Marshal(IndexConfig{IndexName: "weknora", NumberOfShards: 4})
		require.NoError(t, err)
		s := string(b)
		assert.NotContains(t, s, "hnsw_m")
		assert.NotContains(t, s, "hnsw_ef_construction")
		assert.NotContains(t, s, "hnsw_ef_search")
		assert.NotContains(t, s, "knn_engine")
	})

	t.Run("present when set", func(t *testing.T) {
		b, err := json.Marshal(IndexConfig{
			HNSWM: 24,
			HNSWEFConstruction: 200,
			HNSWEFSearch: 128,
			KNNEngine: "faiss",
		})
		require.NoError(t, err)
		s := string(b)
		assert.Contains(t, s, `"hnsw_m":24`)
		assert.Contains(t, s, `"hnsw_ef_construction":200`)
		assert.Contains(t, s, `"hnsw_ef_search":128`)
		assert.Contains(t, s, `"knn_engine":"faiss"`)
	})
}

// TestIsValidEngineType_OpenSearch verifies OpenSearch is now a valid DB-store engine.
func TestIsValidEngineType_OpenSearch(t *testing.T) {
	assert.True(t, IsValidEngineType(OpenSearchRetrieverEngineType))
}

// TestGetVectorStoreTypes_OpenSearchEntry verifies the OpenSearch metadata entry
// exposes the connection + HNSW index fields with the right bounds/enum/immutable.
func TestGetVectorStoreTypes_OpenSearchEntry(t *testing.T) {
	vt := findVSType(t, "opensearch")
	assert.Equal(t, "OpenSearch", vt.DisplayName)

	// Connection fields
	insecure := findField(t, vt.ConnectionFields, "insecure_skip_verify")
	assert.Equal(t, "boolean", insecure.Type)
	assert.Equal(t, false, insecure.Default) // never default-true
	pw := findField(t, vt.ConnectionFields, "password")
	assert.True(t, pw.Sensitive)

	// HNSW index fields: bounds match the flat-validator-aligned caps (14-D)
	m := findField(t, vt.IndexFields, "hnsw_m")
	require.NotNil(t, m.Min)
	require.NotNil(t, m.Max)
	assert.Equal(t, 2.0, *m.Min)
	assert.Equal(t, 100.0, *m.Max)
	assert.True(t, m.Immutable)

	shards := findField(t, vt.IndexFields, "number_of_shards")
	require.NotNil(t, shards.Max)
	assert.Equal(t, 64.0, *shards.Max) // flat ValidateIndexConfig maxShards

	replicas := findField(t, vt.IndexFields, "number_of_replicas")
	require.NotNil(t, replicas.Max)
	assert.Equal(t, 10.0, *replicas.Max) // flat maxReplicas

	eng := findField(t, vt.IndexFields, "knn_engine")
	assert.ElementsMatch(t, []string{"lucene", "faiss"}, eng.Enum)
	assert.True(t, eng.Immutable)

	efs := findField(t, vt.IndexFields, "hnsw_ef_search")
	assert.True(t, efs.Immutable) // no PutSettings path → immutable
}

// TestBuildEnvVectorStores_OpenSearch verifies the env-store builder case.
func TestBuildEnvVectorStores_OpenSearch(t *testing.T) {
	lookup := mockEnvLookup(map[string]string{
		"OPENSEARCH_ADDR": "https://os:9200",
		"OPENSEARCH_USERNAME": "admin",
		"OPENSEARCH_PASSWORD": "secret",
		"OPENSEARCH_INDEX": "weknora",
		"OPENSEARCH_INSECURE_SKIP_VERIFY": "true",
	})
	stores := BuildEnvVectorStores("opensearch", lookup)
	require.Len(t, stores, 1)
	s := stores[0]
	assert.Equal(t, "__env_opensearch__", s.ID)
	assert.Equal(t, OpenSearchRetrieverEngineType, s.EngineType)
	assert.Equal(t, "https://os:9200", s.ConnectionConfig.Addr)
	assert.Equal(t, "admin", s.ConnectionConfig.Username)
	assert.Equal(t, "secret", s.ConnectionConfig.Password)
	assert.True(t, s.ConnectionConfig.InsecureSkipVerify)
	assert.Equal(t, "weknora", s.IndexConfig.IndexName)
}

// TestBuildEnvVectorStores_OpenSearch_InsecureDefaultsFalse verifies the TLS
// skip flag is false unless the env var is explicitly "true".
func TestBuildEnvVectorStores_OpenSearch_InsecureDefaultsFalse(t *testing.T) {
	lookup := mockEnvLookup(map[string]string{"OPENSEARCH_ADDR": "https://os:9200"})
	stores := BuildEnvVectorStores("opensearch", lookup)
	require.Len(t, stores, 1)
	assert.False(t, stores[0].ConnectionConfig.InsecureSkipVerify)
}

// TestRetrieverEngineMapping_OpenSearch verifies the RETRIEVE_DRIVER mapping.
func TestRetrieverEngineMapping_OpenSearch(t *testing.T) {
	m := GetRetrieverEngineMapping()
	params, ok := m["opensearch"]
	require.True(t, ok, `retrieverEngineMapping missing "opensearch"`)
	require.Len(t, params, 2)

	var hasKeywords, hasVector bool
	for _, p := range params {
		assert.Equal(t, OpenSearchRetrieverEngineType, p.RetrieverEngineType)
		switch p.RetrieverType {
		case KeywordsRetrieverType:
			hasKeywords = true
		case VectorRetrieverType:
			hasVector = true
		}
	}
	assert.True(t, hasKeywords && hasVector, "expected both Keywords and Vector retriever types")
}
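`TestIndexConfig_OpenSearchFieldsOmittedForOtherEngines` relies on `encoding/json` dropping zero-valued fields tagged `omitempty`. A standalone sketch of that behaviour, assuming a trimmed, hypothetical `indexConfig` whose `hnsw_*` and `knn_engine` keys match the ones the test asserts (the `index_name` tag is an assumption, not taken from the real struct):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexConfig is a hypothetical, trimmed stand-in for types.IndexConfig.
// The hnsw_* / knn_engine keys match the test; "index_name" is assumed.
type indexConfig struct {
	IndexName          string `json:"index_name,omitempty"`
	HNSWM              int    `json:"hnsw_m,omitempty"`
	HNSWEFConstruction int    `json:"hnsw_ef_construction,omitempty"`
	HNSWEFSearch       int    `json:"hnsw_ef_search,omitempty"`
	KNNEngine          string `json:"knn_engine,omitempty"`
}

// marshal encodes c, panicking on the (impossible here) encoding error.
func marshal(c indexConfig) string {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(indexConfig{IndexName: "weknora"}))                               // {"index_name":"weknora"}
	fmt.Println(marshal(indexConfig{IndexName: "weknora", HNSWM: 24, KNNEngine: "faiss"})) // {"index_name":"weknora","hnsw_m":24,"knn_engine":"faiss"}
	fmt.Println(marshal(indexConfig{IndexName: "weknora", HNSWM: 0}))                      // {"index_name":"weknora"}
}
```

The third call shows the trade-off: an explicit `HNSWM: 0` is indistinguishable from an unset field. That is harmless here because no HNSW parameter accepts 0, and it is what keeps other engines' serialized config unchanged.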
@@ -246,7 +246,7 @@ func TestGetVectorStoreTypes(t *testing.T) {
	types := GetVectorStoreTypes()

	t.Run("returns supported external engine types (excludes postgres and sqlite)", func(t *testing.T) {
		assert.Len(t, types, 6)
		assert.Len(t, types, 7)
	})

	t.Run("type names match engine constants", func(t *testing.T) {
@@ -260,6 +260,7 @@ func TestGetVectorStoreTypes(t *testing.T) {
		assert.Contains(t, typeNames, "tencent_vectordb")
		assert.Contains(t, typeNames, "weaviate")
		assert.Contains(t, typeNames, "doris")
		assert.Contains(t, typeNames, "opensearch")
		assert.NotContains(t, typeNames, "postgres")
		assert.NotContains(t, typeNames, "sqlite")
	})
@@ -399,9 +400,10 @@ func TestIsValidEngineType(t *testing.T) {
	// GetVectorStoreTypes does not list them, Validate rejects them, and
	// env stores reach the engine registry through BuildEnvVectorStores
	// instead of through CreateStore.
	// Note: opensearch is now a VALID DB-store engine (activated in this PR);
	// see TestIsValidEngineType_OpenSearch in vectorstore_opensearch_test.go.
	invalidTypes := []RetrieverEngineType{
		"unknown",
		"opensearch",
		"",
		PostgresRetrieverEngineType,
		SQLiteRetrieverEngineType,
@@ -1350,64 +1352,3 @@ func TestOpenSearchRetrieverEngineType_DistinctFromExisting(t *testing.T) {
			"OpenSearch wire value must not collide with %s", e)
	}
}

// TestOpenSearchRetrieverEngineType_NotInValidEngineTypes verifies
// that PR 1 does NOT add the new engine type to validEngineTypes —
// this is the gate that keeps OpenSearch VectorStore registration
// rejected until activation lands in a later PR.
func TestOpenSearchRetrieverEngineType_NotInValidEngineTypes(t *testing.T) {
	assert.False(t, IsValidEngineType(OpenSearchRetrieverEngineType),
		"OpenSearch must remain invalid for VectorStore registration "+
			"until activation lands in a later PR (gated activation)")
}

// The next three tests are defense-in-depth companions to
// TestOpenSearchRetrieverEngineType_NotInValidEngineTypes. Each pins
// a separate activation surface that must remain closed in PR 1 (and
// in PR 2). Activation lands together with a coordinated flip in
// PR 3, at which point each of these `assert.False` / `assert.NotContains`
// / `assert.Nil` lines flips to its positive counterpart in the same
// diff — the test suite becomes the activation checklist.

// TestRetrieverEngineMapping_OpenSearchNotRegistered pins that
// `retrieverEngineMapping` does not have an `"opensearch"` key. Without
// this entry, setting `RETRIEVE_DRIVER=opensearch` is a silent no-op
// (GetDefaultRetrieverEngines drops the unknown driver from its loop
// at tenant.go).
func TestRetrieverEngineMapping_OpenSearchNotRegistered(t *testing.T) {
	mapping := GetRetrieverEngineMapping()
	_, ok := mapping["opensearch"]
	assert.False(t, ok,
		"retrieverEngineMapping must not register opensearch until "+
			"activation lands in a later PR (gated activation)")
}

// TestGetVectorStoreTypes_OmitsOpenSearch pins that the
// /api/v1/vector-stores/types response does NOT list opensearch.
// Without this entry, the UI dropdown cannot offer OpenSearch as a
// store type even if a frontend renderer accidentally tries.
func TestGetVectorStoreTypes_OmitsOpenSearch(t *testing.T) {
	listed := GetVectorStoreTypes()
	for _, info := range listed {
		assert.NotEqual(t, "opensearch", info.Type,
			"GetVectorStoreTypes must not surface opensearch until "+
				"activation lands in a later PR (gated activation)")
	}
}

// TestBuildEnvVectorStores_OpenSearchSkipped pins that
// `BuildEnvVectorStores("opensearch", lookup)` returns nil — the
// `default:` arm of `buildEnvStoreForDriver`. Without an explicit
// `case "opensearch":` arm, container.go cannot synthesize an env
// store for the driver, completing the third lock on the activation
// chain.
func TestBuildEnvVectorStores_OpenSearchSkipped(t *testing.T) {
	stores := BuildEnvVectorStores("opensearch", mockEnvLookup(map[string]string{
		"OPENSEARCH_ADDR": "https://os:9200",
		"OPENSEARCH_USERNAME": "admin",
	}))
	assert.Empty(t, stores,
		"BuildEnvVectorStores must not synthesize an env store for "+
			"opensearch until activation lands in a later PR (gated "+
			"activation)")
}