Files
WeKnora/go.mod
issunion 4cce6f2e99 feat(retriever): 接入 Apache Doris 4.1 作为向量数据库
为 RetrieveEngine 体系新增 Doris 后端,与现有 Qdrant/Milvus/Weaviate
等保持完整能力对齐:向量检索、关键词检索、健康检查、环境变量与多实例
DB 配置、前端类型注册、单元测试、Docker Compose 模板。

实现要点:
- 协议分工:主链路用 MySQL 协议(database/sql + go-sql-driver/mysql)
  做 DDL / 查询 / 删除;批量更新走 Stream Load HTTP API,并启用
  partial_update=true、merge_type=APPEND,自动按 1MiB 切分批次并处理
  307 重定向。
- 表结构:UNIQUE KEY(id) + enable_unique_key_merge_on_write=true 以
  支持 upsert/部分列更新;按维度分表(<base>_<dim>),每张表上建
  HNSW ANN 索引(metric_type=cosine_distance)和 INVERTED 索引
  (parser=chinese)。
- 分数语义:使用 cosine_distance_approximate,再以 1 - dist 转换为
  "越大越相似",与现有 KVHybridRetrieveEngine 约定一致。
- 异步索引:ANN 索引为后台构建,ensureTable 通过轮询 SHOW INDEX 等
  待索引就绪后再放行写入,避免首次检索召回为空。
- ARRAY<FLOAT> 序列化:go-sql-driver/mysql 不支持数组占位符,
  embeddingLiteral 将 []float32 转成 SQL 字面量字符串再拼接。

新增文件:
- internal/application/repository/retriever/doris/{structs,schema,
  query,repository,streamload,repository_test}.go
- scripts/e2e-doris.sh:E2E 验证清单
- docs/wiki/集成扩展/Doris改动与上游同步.md:fork-and-rebase 工作流
  与改动清单

修改文件(接线 + 文档):
- internal/types/{retriever,tenant,vectorstore}.go:新增
  DorisRetrieverEngineType、env 解析、表单 schema 与索引参数校验
- internal/container/{container,engine_factory}.go:环境变量驱动
  与 VectorStore 配置驱动两条路径都支持 Doris
- internal/application/service/vectorstore{,_healthcheck}.go:连接
  校验 + Ping/Version 健康检查
- docker-compose.yml:新增 doris-fe / doris-be 服务(profile=doris)
- .env.example:DORIS_* 环境变量与示例
- docs/{使用其他向量数据库,wiki/集成扩展/集成向量数据库}.md:
  使用说明与索引/分数行为说明

依赖:go.mod/go.sum 新增 github.com/go-sql-driver/mysql(运行时)和
github.com/DATA-DOG/go-sqlmock(测试)。

测试:repository 层 SQL 形状、Stream Load HTTP 行为、whereBuilder
逻辑、embeddingLiteral 往返、健康检查错误路径均有单测覆盖。
2026-05-08 21:59:35 +08:00

329 lines
16 KiB
Modula-2

module github.com/Tencent/WeKnora
go 1.24.11
require (
github.com/JohannesKaufmann/html-to-markdown/v2 v2.5.0
github.com/PuerkitoBio/goquery v1.10.3
github.com/aliyun/alibabacloud-oss-go-sdk-v2 v1.4.1
github.com/asg017/sqlite-vec-go-bindings v0.1.6
github.com/aws/aws-sdk-go-v2 v1.41.3
github.com/aws/aws-sdk-go-v2/config v1.29.14
github.com/aws/aws-sdk-go-v2/credentials v1.19.11
github.com/aws/aws-sdk-go-v2/service/s3 v1.83.0
github.com/chromedp/chromedp v0.14.2
github.com/duckdb/duckdb-go/v2 v2.5.4
github.com/elastic/go-elasticsearch/v7 v7.17.10
github.com/elastic/go-elasticsearch/v8 v8.18.0
github.com/gin-contrib/cors v1.7.5
github.com/gin-gonic/gin v1.11.0
github.com/go-openapi/strfmt v0.25.0
github.com/go-viper/mapstructure/v2 v2.4.0
github.com/golang-jwt/jwt/v5 v5.3.0
github.com/golang-migrate/migrate/v4 v4.19.1
github.com/google/jsonschema-go v0.4.2
github.com/google/uuid v1.6.0
github.com/gorilla/websocket v1.5.3
github.com/hibiken/asynq v0.25.1
github.com/jackc/pgx/v5 v5.7.2
github.com/joho/godotenv v1.5.1
github.com/larksuite/oapi-sdk-go/v3 v3.5.3
github.com/longbridgeapp/opencc v0.3.13
github.com/mark3labs/mcp-go v0.43.0
github.com/milvus-io/milvus/client/v2 v2.6.2
github.com/minio/minio-go/v7 v7.0.91
github.com/neo4j/neo4j-go-driver/v6 v6.0.0
github.com/ollama/ollama v0.11.4
github.com/open-dingtalk/dingtalk-stream-sdk-go v0.9.1
github.com/panjf2000/ants/v2 v2.11.3
github.com/parquet-go/parquet-go v0.25.0
github.com/pganalyze/pg_query_go/v6 v6.1.0
github.com/pgvector/pgvector-go v0.3.0
github.com/qdrant/go-client v1.16.1
github.com/redis/go-redis/v9 v9.14.0
github.com/robfig/cron/v3 v3.0.1
github.com/sashabaranov/go-openai v1.40.5
github.com/sirupsen/logrus v1.9.3
github.com/slack-go/slack v0.18.0-rc2
github.com/spf13/viper v1.20.1
github.com/stretchr/testify v1.11.1
github.com/swaggo/files v1.0.1
github.com/swaggo/gin-swagger v1.6.1
github.com/swaggo/swag v1.16.6
github.com/tencentyun/cos-go-sdk-v5 v0.7.65
github.com/tiktoken-go/tokenizer v0.7.0
github.com/volcengine/ve-tos-golang-sdk/v2 v2.7.23
github.com/wailsapp/wails/v2 v2.12.0
github.com/weaviate/weaviate v1.33.0-rc.1
github.com/weaviate/weaviate-go-client/v5 v5.5.0
github.com/xuri/excelize/v2 v2.10.1
github.com/yanyiwu/gojieba v1.4.5
go.opentelemetry.io/otel v1.38.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.37.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.37.0
go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.35.0
go.opentelemetry.io/otel/sdk v1.38.0
go.opentelemetry.io/otel/trace v1.38.0
go.uber.org/dig v1.18.1
golang.org/x/crypto v0.48.0
golang.org/x/mod v0.32.0
golang.org/x/net v0.50.0
golang.org/x/sync v0.19.0
golang.org/x/time v0.14.0
google.golang.org/api v0.259.0
google.golang.org/grpc v1.78.0
google.golang.org/protobuf v1.36.11
gopkg.in/natefinch/lumberjack.v2 v2.2.1
gopkg.in/yaml.v3 v3.0.1
gorm.io/driver/postgres v1.5.11
gorm.io/driver/sqlite v1.6.0
gorm.io/gorm v1.30.0
)
require (
cloud.google.com/go/auth v0.18.0 // indirect
cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
cloud.google.com/go/compute/metadata v0.9.0 // indirect
filippo.io/edwards25519 v1.2.0 // indirect
git.sr.ht/~jackmordaunt/go-toast/v2 v2.0.3 // indirect
github.com/DATA-DOG/go-sqlmock v1.5.2 // indirect
github.com/JohannesKaufmann/dom v0.2.0 // indirect
github.com/KyleBanks/depth v1.2.1 // indirect
github.com/andybalholm/brotli v1.2.0 // indirect
github.com/andybalholm/cascadia v1.3.3 // indirect
github.com/apache/arrow-go/v18 v18.4.1 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.6 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.19 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.19 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.19 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.5 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.20 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.6 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.11 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.19 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.19 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.30.12 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.16 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.41.8 // indirect
github.com/aws/smithy-go v1.24.2 // indirect
github.com/bahlo/generic-list-go v0.2.0 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/bep/debounce v1.2.1 // indirect
github.com/blang/semver/v4 v4.0.0 // indirect
github.com/buger/jsonparser v1.1.1 // indirect
github.com/bytedance/sonic v1.14.0 // indirect
github.com/bytedance/sonic/loader v0.3.0 // indirect
github.com/cenkalti/backoff/v4 v4.3.0 // indirect
github.com/cenkalti/backoff/v5 v5.0.2 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/chromedp/cdproto v0.0.0-20250724212937-08a3db8b4327 // indirect
github.com/chromedp/sysutil v1.1.0 // indirect
github.com/cilium/ebpf v0.11.0 // indirect
github.com/clbanning/mxj v1.8.4 // indirect
github.com/cloudwego/base64x v0.1.6 // indirect
github.com/cockroachdb/errors v1.9.1 // indirect
github.com/cockroachdb/logtags v0.0.0-20211118104740-dabe8e521a4f // indirect
github.com/cockroachdb/redact v1.1.3 // indirect
github.com/containerd/cgroups/v3 v3.0.3 // indirect
github.com/coreos/go-semver v0.3.0 // indirect
github.com/coreos/go-systemd/v22 v22.3.2 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
github.com/dlclark/regexp2 v1.11.5 // indirect
github.com/docker/go-units v0.5.0 // indirect
github.com/duckdb/duckdb-go-bindings v0.1.24 // indirect
github.com/duckdb/duckdb-go-bindings/darwin-amd64 v0.1.24 // indirect
github.com/duckdb/duckdb-go-bindings/darwin-arm64 v0.1.24 // indirect
github.com/duckdb/duckdb-go-bindings/linux-amd64 v0.1.24 // indirect
github.com/duckdb/duckdb-go-bindings/linux-arm64 v0.1.24 // indirect
github.com/duckdb/duckdb-go-bindings/windows-amd64 v0.1.24 // indirect
github.com/duckdb/duckdb-go/arrowmapping v0.0.27 // indirect
github.com/duckdb/duckdb-go/mapping v0.0.27 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/elastic/elastic-transport-go/v8 v8.7.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/form3tech-oss/jwt-go v3.2.5+incompatible // indirect
github.com/fsnotify/fsnotify v1.9.0 // indirect
github.com/fxamacker/cbor/v2 v2.7.0 // indirect
github.com/gabriel-vasile/mimetype v1.4.8 // indirect
github.com/getsentry/sentry-go v0.30.0 // indirect
github.com/gin-contrib/sse v1.1.0 // indirect
github.com/go-ini/ini v1.67.0 // indirect
github.com/go-json-experiment/json v0.0.0-20250725192818-e39067aee2d2 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-ole/go-ole v1.3.0 // indirect
github.com/go-openapi/analysis v0.24.1 // indirect
github.com/go-openapi/errors v0.22.4 // indirect
github.com/go-openapi/jsonpointer v0.22.1 // indirect
github.com/go-openapi/jsonreference v0.21.3 // indirect
github.com/go-openapi/loads v0.23.2 // indirect
github.com/go-openapi/runtime v0.29.2 // indirect
github.com/go-openapi/spec v0.22.1 // indirect
github.com/go-openapi/swag v0.23.0 // indirect
github.com/go-openapi/swag/conv v0.25.1 // indirect
github.com/go-openapi/swag/fileutils v0.25.1 // indirect
github.com/go-openapi/swag/jsonname v0.25.1 // indirect
github.com/go-openapi/swag/jsonutils v0.25.1 // indirect
github.com/go-openapi/swag/loading v0.25.1 // indirect
github.com/go-openapi/swag/mangling v0.25.1 // indirect
github.com/go-openapi/swag/stringutils v0.25.1 // indirect
github.com/go-openapi/swag/typeutils v0.25.1 // indirect
github.com/go-openapi/swag/yamlutils v0.25.1 // indirect
github.com/go-openapi/validate v0.25.1 // indirect
github.com/go-playground/locales v0.14.1 // indirect
github.com/go-playground/universal-translator v0.18.1 // indirect
github.com/go-playground/validator/v10 v10.27.0 // indirect
github.com/go-sql-driver/mysql v1.10.0 // indirect
github.com/gobwas/httphead v0.1.0 // indirect
github.com/gobwas/pool v0.2.1 // indirect
github.com/gobwas/ws v1.4.0 // indirect
github.com/goccy/go-json v0.10.5 // indirect
github.com/goccy/go-yaml v1.18.0 // indirect
github.com/godbus/dbus/v5 v5.1.0 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/google/btree v1.1.3 // indirect
github.com/google/flatbuffers v25.9.23+incompatible // indirect
github.com/google/go-querystring v1.1.0 // indirect
github.com/google/s2a-go v0.1.9 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.7 // indirect
github.com/googleapis/gax-go/v2 v2.16.0 // indirect
github.com/grpc-ecosystem/go-grpc-middleware v1.4.0 // indirect
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0 // indirect
github.com/grpc-ecosystem/grpc-gateway v1.16.0 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.1 // indirect
github.com/invopop/jsonschema v0.13.0 // indirect
github.com/jackc/pgpassfile v1.0.0 // indirect
github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
github.com/jackc/puddle/v2 v2.2.2 // indirect
github.com/jchv/go-winloader v0.0.0-20210711035445-715c2860da7e // indirect
github.com/jinzhu/inflection v1.0.0 // indirect
github.com/jinzhu/now v1.1.5 // indirect
github.com/jonboulle/clockwork v0.5.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.13-0.20220915233716-71ac16282d12 // indirect
github.com/klauspost/compress v1.18.2 // indirect
github.com/klauspost/cpuid/v2 v2.3.0 // indirect
github.com/kr/pretty v0.3.1 // indirect
github.com/kr/text v0.2.0 // indirect
github.com/ks3sdklib/aws-sdk-go v1.11.0 // indirect
github.com/labstack/echo/v4 v4.13.3 // indirect
github.com/labstack/gommon v0.4.2 // indirect
github.com/leaanthony/go-ansi-parser v1.6.1 // indirect
github.com/leaanthony/gosod v1.0.4 // indirect
github.com/leaanthony/slicer v1.6.0 // indirect
github.com/leaanthony/u v1.1.1 // indirect
github.com/leodido/go-urn v1.4.0 // indirect
github.com/lib/pq v1.10.9 // indirect
github.com/liuzl/cedar-go v0.0.0-20170805034717-80a9c64b256d // indirect
github.com/liuzl/da v0.0.0-20180704015230-14771aad5b1d // indirect
github.com/lufia/plan9stats v0.0.0-20251013123823-9fd1530e3ec3 // indirect
github.com/mailru/easyjson v0.9.0 // indirect
github.com/mattn/go-colorable v0.1.13 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-runewidth v0.0.16 // indirect
github.com/mattn/go-sqlite3 v1.14.22 // indirect
github.com/milvus-io/milvus-proto/go-api/v2 v2.6.8-0.20251223041313-25746c47c1a7 // indirect
github.com/milvus-io/milvus/pkg/v2 v2.6.7-0.20251201120310-af64f2acba38 // indirect
github.com/minio/crc64nvme v1.0.1 // indirect
github.com/minio/md5-simd v1.1.2 // indirect
github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/mozillazg/go-httpheader v0.2.1 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/oklog/ulid v1.3.1 // indirect
github.com/olekukonko/tablewriter v0.0.5 // indirect
github.com/opencontainers/runtime-spec v1.0.2 // indirect
github.com/pelletier/go-toml/v2 v2.2.4 // indirect
github.com/pierrec/lz4/v4 v4.1.22 // indirect
github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect
github.com/prometheus/client_golang v1.20.5 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.65.0 // indirect
github.com/prometheus/procfs v0.15.1 // indirect
github.com/quic-go/qpack v0.5.1 // indirect
github.com/quic-go/quic-go v0.54.0 // indirect
github.com/richardlehane/mscfb v1.0.6 // indirect
github.com/richardlehane/msoleps v1.0.6 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/rogpeppe/go-internal v1.14.1 // indirect
github.com/rs/xid v1.6.0 // indirect
github.com/sagikazarmark/locafero v0.7.0 // indirect
github.com/samber/lo v1.49.1 // indirect
github.com/shirou/gopsutil/v3 v3.23.12 // indirect
github.com/shoenig/go-m1cpu v0.1.6 // indirect
github.com/soheilhy/cmux v0.1.5 // indirect
github.com/sourcegraph/conc v0.3.0 // indirect
github.com/spaolacci/murmur3 v1.1.0 // indirect
github.com/spf13/afero v1.15.0 // indirect
github.com/spf13/cast v1.10.0 // indirect
github.com/spf13/pflag v1.0.6 // indirect
github.com/subosito/gotenv v1.6.0 // indirect
github.com/tidwall/gjson v1.17.1 // indirect
github.com/tidwall/match v1.1.1 // indirect
github.com/tidwall/pretty v1.2.0 // indirect
github.com/tiendc/go-deepcopy v1.7.2 // indirect
github.com/tklauser/go-sysconf v0.3.16 // indirect
github.com/tklauser/numcpus v0.11.0 // indirect
github.com/tkrajina/go-reflector v0.5.8 // indirect
github.com/tmc/grpc-websocket-proxy v0.0.0-20201229170055-e5319fda7802 // indirect
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
github.com/twpayne/go-geom v1.6.1 // indirect
github.com/uber/jaeger-client-go v2.30.0+incompatible // indirect
github.com/ugorji/go/codec v1.3.0 // indirect
github.com/valyala/bytebufferpool v1.0.0 // indirect
github.com/valyala/fasttemplate v1.2.2 // indirect
github.com/wailsapp/go-webview2 v1.0.22 // indirect
github.com/wailsapp/mimetype v1.4.1 // indirect
github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
github.com/x448/float16 v0.8.4 // indirect
github.com/xiang90/probing v0.0.0-20190116061207-43a291ad63a2 // indirect
github.com/xuri/efp v0.0.1 // indirect
github.com/xuri/nfp v0.0.2-0.20250530014748-2ddeb826f9a9 // indirect
github.com/yosida95/uritemplate/v3 v3.0.2 // indirect
github.com/yusufpapurcu/wmi v1.2.4 // indirect
github.com/zeebo/xxh3 v1.0.2 // indirect
go.etcd.io/bbolt v1.4.3 // indirect
go.etcd.io/etcd/api/v3 v3.5.5 // indirect
go.etcd.io/etcd/client/pkg/v3 v3.5.5 // indirect
go.etcd.io/etcd/client/v2 v2.305.5 // indirect
go.etcd.io/etcd/client/v3 v3.5.5 // indirect
go.etcd.io/etcd/pkg/v3 v3.5.5 // indirect
go.etcd.io/etcd/raft/v3 v3.5.5 // indirect
go.etcd.io/etcd/server/v3 v3.5.5 // indirect
go.mongodb.org/mongo-driver v1.17.6 // indirect
go.opentelemetry.io/auto/sdk v1.2.1 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.63.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.63.0 // indirect
go.opentelemetry.io/otel/metric v1.38.0 // indirect
go.opentelemetry.io/proto/otlp v1.7.1 // indirect
go.uber.org/atomic v1.11.0 // indirect
go.uber.org/automaxprocs v1.5.3 // indirect
go.uber.org/mock v0.5.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
go.uber.org/zap v1.27.0 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
golang.org/x/arch v0.20.0 // indirect
golang.org/x/exp v0.0.0-20251209150349-8475f28825e9 // indirect
golang.org/x/oauth2 v0.34.0 // indirect
golang.org/x/sys v0.41.0 // indirect
golang.org/x/telemetry v0.0.0-20260109210033-bd525da824e2 // indirect
golang.org/x/text v0.34.0 // indirect
golang.org/x/tools v0.41.0 // indirect
golang.org/x/xerrors v0.0.0-20240903120638-7835f813f4da // indirect
google.golang.org/genproto v0.0.0-20251202230838-ff82c1b0f217 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20251202230838-ff82c1b0f217 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20251222181119-0a764e51fe1b // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
k8s.io/apimachinery v0.32.3 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)
replace go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc => go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.59.0