WeKnora/internal
wizardchen 500c821817 feat(builtin-models): validate YAML entries and align ID length with schema
Three correctness fixes that the lifecycle PR deliberately deferred:

1. ID length / struct-tag drift
   - models.id is varchar(64) on both PG and SQLite (per the init
     migrations), but Model.ID's GORM tag said varchar(36), a remnant
     from when the field only held UUIDs. The mismatch is harmless under
     golang-migrate (the struct tag is ignored) but misleading on
     AutoMigrate paths and in IDE tooltips. The tag now matches the real
     column width.
   - New ModelIDMaxLen constant (=64) is the single source of truth for
     anyone accepting user-provided ids. The YAML loader uses it to
     reject too-long ids up front with a clear message instead of letting
     the INSERT explode with a generic "value too long for type" error.
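
A minimal sketch of the up-front length check described in item 1. Only
ModelIDMaxLen (=64) and the varchar(64) width come from this commit; the
trimmed Model struct, the exact tag string, and the checkModelID helper
are hypothetical and shown for illustration only.

```go
package main

import "fmt"

// ModelIDMaxLen mirrors the width of the models.id column (varchar(64)).
const ModelIDMaxLen = 64

// Model is a trimmed, hypothetical stand-in for the real struct; the tag
// is only meant to show the declared width agreeing with the constant.
type Model struct {
	ID string `gorm:"type:varchar(64);primaryKey"`
}

// checkModelID is a hypothetical helper that rejects ids the column could
// not hold. len counts bytes, so this is a conservative check: a multi-byte
// id may be rejected even though it would fit a character-counted column.
func checkModelID(id string) error {
	if id == "" {
		return fmt.Errorf("model id is empty")
	}
	if len(id) > ModelIDMaxLen {
		return fmt.Errorf("model id is %d bytes, max is %d", len(id), ModelIDMaxLen)
	}
	return nil
}
```

Failing here lets the loader report which entry is too long, rather than
surfacing a database error with no reference to the YAML source.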

2. Field validation in the YAML loader
   - Type, Source, and Status are typed strings, but YAML can supply any
     value. Misspelled or unknown values (e.g. lowercase
     `type: knowledgeqa`, or `type: LLM`) were previously persisted as-is
     and produced rows that looked fine in the table but failed at
     provider-factory lookup time, which is hard to debug.
   - validateBuiltinModelEntry now rejects: an empty id, an over-long id,
     an empty type, a type ∉ {KnowledgeQA, Embedding, Rerank, VLLM, ASR},
     and a status ∉ {active, downloading, download_failed, empty}. Source is
     intentionally NOT validated because the provider matrix in
     internal/models/* keeps growing and a strict allow-list here would
     force changes in two places per new provider.
   - Invalid entries are warned + skipped (not aborting the whole load),
     and excluded from the keep-set so the drift sweep does not delete
     existing matching rows on the strength of a typo'd YAML retry.
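
The validation rules in item 2 can be sketched roughly as follows. The
function name validateBuiltinModelEntry and the allowed values come from
this commit; the builtinModelEntry struct, the partitionEntries helper,
and the reading of "empty" as the empty string are assumptions. How the
skipped ids feed into the keep-set and drift sweep is not shown.

```go
package main

import (
	"fmt"
	"log"
)

const ModelIDMaxLen = 64

// builtinModelEntry is a hypothetical shape for one YAML entry.
type builtinModelEntry struct {
	ID, Type, Source, Status string
}

var validTypes = map[string]bool{
	"KnowledgeQA": true, "Embedding": true, "Rerank": true, "VLLM": true, "ASR": true,
}

// Assumption: "empty" in the commit message means an unset status.
var validStatuses = map[string]bool{
	"": true, "active": true, "downloading": true, "download_failed": true,
}

// validateBuiltinModelEntry checks id, type, and status. Source is
// deliberately left unchecked, as the commit message describes.
func validateBuiltinModelEntry(e builtinModelEntry) error {
	switch {
	case e.ID == "":
		return fmt.Errorf("empty id")
	case len(e.ID) > ModelIDMaxLen:
		return fmt.Errorf("id longer than %d bytes", ModelIDMaxLen)
	case e.Type == "":
		return fmt.Errorf("empty type")
	case !validTypes[e.Type]: // case-sensitive, so "knowledgeqa" fails
		return fmt.Errorf("unknown type %q", e.Type)
	case !validStatuses[e.Status]:
		return fmt.Errorf("unknown status %q", e.Status)
	}
	return nil
}

// partitionEntries (hypothetical) warns about and skips invalid entries
// instead of aborting, returning the valid entries and the skipped ids.
func partitionEntries(in []builtinModelEntry) (valid []builtinModelEntry, skipped []string) {
	for _, e := range in {
		if err := validateBuiltinModelEntry(e); err != nil {
			log.Printf("skipping builtin model %q: %v", e.ID, err)
			skipped = append(skipped, e.ID)
			continue
		}
		valid = append(valid, e)
	}
	return valid, skipped
}
```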

3. Magic number cleanup
   - DefaultBuiltinModelTenantID (=10000) replaces the hard-coded `10000`
     literal in toModel(). The invariant lives in three places already
     (PG migration, SQLite migration, this constant); naming it makes
     the cross-reference explicit and grep-able.
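
As a rough illustration of item 3, with the constant name and value taken
from this commit and everything else assumed (the uint64 type, the
trimmed structs, and the toModel signature are hypothetical):

```go
package main

// DefaultBuiltinModelTenantID must stay in sync with the tenant id used
// by the PG and SQLite init migrations. The uint64 type is an assumption.
const DefaultBuiltinModelTenantID uint64 = 10000

// builtinModelEntry and Model are trimmed, hypothetical stand-ins.
type builtinModelEntry struct {
	ID, Type string
}

type Model struct {
	ID       string
	TenantID uint64
	Type     string
}

// toModel uses the named constant where the bare literal 10000 used to be.
func toModel(e builtinModelEntry) Model {
	return Model{
		ID:       e.ID,
		TenantID: DefaultBuiltinModelTenantID,
		Type:     e.Type,
	}
}
```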

Tests:
- New TestLoadBuiltinModelsConfig_RejectsInvalidEntries with five
  sub-cases (id-too-long, missing-type, lowercase-type, unknown-type,
  unknown-status) asserts the table stays empty after each.
- All 11 existing tests still pass.
2026-05-26 11:37:03 +08:00