Phase 3 (#1440) gate flip. PR 1 (#1445) + PR 2a (#1481) + PR 2b (#1482)
laid the type prep + driver skeleton + read/write paths as gated dead
code; this PR wires every activation surface so opensearch becomes a
registerable VectorStore engine.
Activation wiring
- internal/types: validEngineTypes / GetVectorStoreTypes (with HNSW
bounds + knn_engine enum + Immutable hints) / retrieverEngineMapping /
buildEnvStoreForDriver — every gated surface now recognises
"opensearch". IndexConfig grows four omitempty HNSW fields (HNSWM /
HNSWEFConstruction / HNSWEFSearch / KNNEngine), keeping other engines'
serialised config byte-identical.
- internal/container: createOpenSearchEngine + the switch case in
createEngineServiceFromStore; the RETRIEVE_DRIVER=opensearch env path
in initRetrieveEngineRegistry; NewEngineFactory now closes over the
AuditLogService (the EngineFactory type itself is unchanged).
- internal/application/service/vectorstore_healthcheck.go: a
testOpenSearchConnection case so CreateStore's connectivity probe
accepts opensearch instead of returning 400.
- internal/application/repository/retriever/opensearch/transport.go:
NewOpenSearchClient is exported so the factory and env path can build
the TLS-hardened client; healthcheck.go reuses the unexported
probeVersion / probeKNNPlugin for the service-layer probe.
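
The omitempty claim in the internal/types bullet above can be checked with a small sketch. The struct below is a trimmed, hypothetical stand-in: only the four HNSW field names come from this PR, while the JSON tags and the Shards/Replicas fields are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IndexConfig is a hypothetical, trimmed stand-in for the real struct.
// The four HNSW field names come from the PR text; the JSON tags and
// the Shards/Replicas fields are assumptions.
type IndexConfig struct {
	Shards             int    `json:"shards,omitempty"`
	Replicas           int    `json:"replicas,omitempty"`
	HNSWM              int    `json:"hnsw_m,omitempty"`
	HNSWEFConstruction int    `json:"hnsw_ef_construction,omitempty"`
	HNSWEFSearch       int    `json:"hnsw_ef_search,omitempty"`
	KNNEngine          string `json:"knn_engine,omitempty"`
}

// mustJSON serialises a config and panics on failure (fine for a demo).
func mustJSON(c IndexConfig) string {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A config that never sets the HNSW fields: they are omitted entirely,
	// so the output matches what the struct produced before the fields existed.
	fmt.Println(mustJSON(IndexConfig{Shards: 3, Replicas: 1}))

	// An opensearch config: only the fields that were set appear.
	fmt.Println(mustJSON(IndexConfig{Shards: 1, HNSWM: 16, KNNEngine: "lucene"}))
}
```

Because zero-valued omitempty fields are dropped by encoding/json, a store that never sets them serialises exactly as it did before the fields were added.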
Service-layer validation
- validateOpenSearchIndexConfig validates the HNSW caps (m 2-100,
ef_construction 2-4096, ef_search 1-10000, knn_engine ∈ lucene|faiss).
Shards/replicas continue to be enforced by the flat ValidateIndexConfig.
The check runs on create only, because UpdateStore mutates only the name.
- validateConnectionConfig requires addr for opensearch.
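
The HNSW caps listed above can be sketched as a plain range check. This is not the body of validateOpenSearchIndexConfig: hnswConfig, checkRange and validateHNSW are hypothetical names, and treating a zero value as "unset, use the engine default" is an assumption that follows from the fields being omitempty.

```go
package main

import "fmt"

// hnswConfig is a hypothetical stand-in for the HNSW part of IndexConfig.
// A zero value is treated as "not set" (an assumption).
type hnswConfig struct {
	M              int
	EFConstruction int
	EFSearch       int
	KNNEngine      string
}

// checkRange accepts an unset (zero) value or one inside [lo, hi].
func checkRange(name string, v, lo, hi int) error {
	if v == 0 {
		return nil
	}
	if v < lo || v > hi {
		return fmt.Errorf("%s must be between %d and %d, got %d", name, lo, hi, v)
	}
	return nil
}

// validateHNSW applies the caps quoted in the PR text and returns the
// first violation it finds.
func validateHNSW(c hnswConfig) error {
	checks := []error{
		checkRange("hnsw_m", c.M, 2, 100),
		checkRange("hnsw_ef_construction", c.EFConstruction, 2, 4096),
		checkRange("hnsw_ef_search", c.EFSearch, 1, 10000),
	}
	for _, err := range checks {
		if err != nil {
			return err
		}
	}
	switch c.KNNEngine {
	case "", "lucene", "faiss":
		return nil
	default:
		return fmt.Errorf("knn_engine must be lucene or faiss, got %q", c.KNNEngine)
	}
}

func main() {
	fmt.Println(validateHNSW(hnswConfig{M: 16, EFConstruction: 128, EFSearch: 100, KNNEngine: "faiss"}))
	fmt.Println(validateHNSW(hnswConfig{M: 101}))
	fmt.Println(validateHNSW(hnswConfig{KNNEngine: "nmslib"}))
}
```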
Sync implementations (stubs.go shrinks)
- CopyIndices (copy.go) mirrors the Elasticsearch / Qdrant pattern —
search → BatchSave with the source_id remap for generated questions —
so dim/keyword routing and the source_id contract come from BatchSave
for free. embeddingMap is keyed by the *target* SourceID because
OpenSearch's BatchSave looks up embeddings by SourceID
(lookupEmbedding), not by chunk_id (the ES driver's convention).
Pagination is from/size; copies larger than max_result_window
(default 10000) need the scroll-based async path that lands later.
- BatchUpdateChunkEnabledStatus / BatchUpdateChunkTagID (bulk_update.go)
group the input by target value and issue one _update_by_query per
group over the cross-dim <base>_* pattern. Caller values flow through
bound script params only — never string-interpolated into the Painless
source — closing the script-injection surface.
- inspectByQueryResponse (byquery.go) mirrors inspectBulkResponse: the
full failure reason goes to the debug log only; the returned error
carries the bounded id + type.
- UpdateByQueryParams.Refresh is *bool in opensearch-go v4.6.0 (the same
shape as DeleteByQuery's quirk), so refresh=wait_for is not
expressible; we use refresh=true.
Driver-owned audit (DIP)
- A new opensearch.AuditSink interface (with nopSink + WithAuditSink
functional option) lets the driver emit opensearch.index_created and
opensearch.reindex_executed events without importing any service
package — the service layer implements the interface. NewRepository
takes opts, so existing 4-arg test call sites keep compiling unchanged.
- internal/container/audit_sink.go bridges AuditSink to AuditLogService.
When the context carries no tenant (the env-path registration ctx
during boot, for example) the adapter skips the emit with a warning
rather than silently writing tenant_id=0, which would collide with the
system-scope sentinel.
Frontend + polish
- FieldSchema (frontend/src/api/vector-store.ts) gains min/max/enum/
immutable. VectorStoreSettings.vue is now schema-driven: a closed
`enum` renders a t-select; number inputs use the schema's `:min`/`:max`
and fall back to the legacy replica-vs-shard heuristic only when the
schema does not pin them; a danger-coloured warning fires when
insecure_skip_verify is toggled on (the switch and warning are wrapped
in a vertical stack so the warning sits on its own row below the switch).
- i18n: labels for hnsw_m / hnsw_ef_construction / hnsw_ef_search /
knn_engine / insecure_skip_verify plus the warning copy in en-US,
ko-KR, zh-CN, ru-RU.
- docker-compose.dev.yml: an opensearch profile (single-node 3.3.2 with
the security plugin disabled, for dev only). OpenSearch Dashboards lives in a
separate, opt-in opensearch-ui profile so the heavy UI container is not
forced up alongside the cluster (the driver e2e is fully curl-verifiable
against :9200). The new docs/dev/opensearch-integration-test.md covers the
end-to-end exercise and the single-node guidance (set replicas=0 to keep
the cluster Green).
Gating-guard tests flipped
- The "OpenSearch is NOT in validEngineTypes / mapping / types list /
env builder / stubs" guard tests from PR 1 / PR 2 are replaced by
their positive counterparts in this PR. The test suite was the
activation checklist; the activation flip is its diff.
Backward compatibility
- Additive everywhere. IndexConfig's new HNSW fields are omitempty so
other engines' serialised config is byte-identical. Existing
Elasticsearch / Qdrant / Milvus / Weaviate / Doris / TencentVectorDB
stores are untouched. No migrations.
Test plan
- go build ./... clean
- go vet ./... clean
- gofmt -l clean on touched files
- go test ./...: the only failures are TestOssEnsureBucket_CreateFails
(Aliyun OSS endpoint), the docreader gRPC tests, and the doris SQL-shape
tests; all three also fail on upstream/main and are untouched by
this PR.
- New tests across internal/types, opensearch, service and container —
including a full end-to-end env-path test that exercises
initRetrieveEngineRegistry with RETRIEVE_DRIVER=opensearch against an
httptest cluster.
Add VectorStoreService with CRUD validation, duplicate checking (DB + env
stores), and engine-specific health checks for 6 vector database types.
Include VectorStoreResponse DTO, env store builder, engine type metadata,
and comprehensive unit tests.