mirror of
https://github.com/Tencent/WeKnora.git
synced 2026-06-04 13:30:32 +08:00
feat(system-info): surface DB migration errors with troubleshooting links
When a startup database migration fails (e.g. issue #1319: pg_trgm not available so 000041 cannot build its trigram index), the application intentionally keeps booting so operators can reach the UI to diagnose. However, before this change the system info page silently dropped the "DB Version" row because the value was empty:

- migration.go only cached the version after a successful m.Up(); the error path returned early and left migrationVersionSet=false.
- system.go used CachedMigrationVersion's ok=false to skip emitting db_version, and the JSON tag was already omitempty.
- SystemInfo.vue gated the entire row on v-if="systemInfo?.db_version".

The end result was the most useful diagnostic surface vanishing in the exact failure mode that needs it most: Wiki ingest and KG features would silently produce nothing with no UI hint.

Changes:

- migration.go: replace sync.Once-based setter with an RWMutex-guarded state struct holding {version, dirty, err}. Every failure path now calls captureMigrationFailure(m, err), which best-effort reads m.Version() so the cached value still reflects the partial state.
- system.go (handler): always emit db_version (falling back to "unknown" when no version could be read), append " (failed)" when an error is recorded, and add db_migration_error to the response.
- swagger / client SDK: keep the API contract in sync with the new response field.
- SystemInfo.vue: render the DB version row whenever either field is present, show a "Migration failed" danger tag, and add a full-width alert below the row carrying the error message plus two links:
  1. View troubleshooting guide -> new docs/migration-troubleshooting.md
  2. Report an issue -> github.com/Tencent/WeKnora/issues/new, prefilled with the captured error and environment metadata.
- docs/migration-troubleshooting.md: new self-service guide covering the common failure modes (missing extension, dirty state, privileges, out of disk, schema drift) with concrete psql / make commands.
- i18n: add the new keys to zh-CN, en-US, ko-KR, ru-RU.

Refs #1319.

Co-authored-by: Cursor <cursoragent@cursor.com>
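
The display rules described above for the system.go handler can be summarized in a small sketch. It is written in TypeScript (the language of the client SDK touched by this commit) rather than Go, and `MigrationState` and `formatDBVersion` are hypothetical names used only for illustration; the real logic lives in the Go handler shown further down in the diff.

```typescript
// Hypothetical mirror of the handler's display rules; names are illustrative.
interface MigrationState {
  version: number
  dirty: boolean
  versionKnown: boolean // false when the version could not be read at all
  error: string // empty when migrations succeeded
}

function formatDBVersion(s: MigrationState): string {
  if (s.versionKnown) {
    let out = String(s.version)
    if (s.dirty) {
      out += ' (dirty)'
    }
    if (s.error !== '') {
      out += ' (failed)'
    }
    return out
  }
  // No version could be read: emit a placeholder only if a failure was
  // recorded, so the frontend still renders the row and the banner.
  return s.error !== '' ? 'unknown' : ''
}

console.log(formatDBVersion({ version: 41, dirty: true, versionKnown: true, error: 'boom' }))
```

Under these rules a clean start yields a bare version number, while a failure that occurs before any version can be read still yields a non-empty value.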
@@ -18,6 +18,7 @@ type SystemInfo struct {
|
||||
GraphDatabaseEngine string `json:"graph_database_engine,omitempty"`
|
||||
MinioEnabled bool `json:"minio_enabled,omitempty"`
|
||||
DBVersion string `json:"db_version,omitempty"`
|
||||
DBMigrationError string `json:"db_migration_error,omitempty"`
|
||||
}
|
||||
|
||||
// ParserEngine represents a document parser engine
|
||||
|
||||
@@ -15933,6 +15933,10 @@ const docTemplate = `{
|
||||
"commit_id": {
|
||||
"type": "string"
|
||||
},
|
||||
"db_migration_error": {
|
||||
"description": "DBMigrationError carries the human-readable error message recorded when\nthe most recent startup migration attempt failed. Empty when migrations\nsucceeded; non-empty values let the frontend surface a troubleshooting\nbanner instead of silently hiding the DB version row (see issue #1319).",
|
||||
"type": "string"
|
||||
},
|
||||
"db_version": {
|
||||
"type": "string"
|
||||
},
|
||||
|
||||
181
docs/migration-troubleshooting.md
Normal file
@@ -0,0 +1,181 @@
# Database migration troubleshooting

This guide is linked from the system info page when WeKnora's startup database
migration fails. It covers the most common causes, how to diagnose them, and
how to recover without losing data.

If none of these match your situation, jump to
[Reporting an issue](#reporting-an-issue) at the bottom.

---

## What "migration failed" means

WeKnora auto-runs `golang-migrate` migrations on every startup. When a
migration fails, the application **still finishes starting up** (so the UI
remains reachable to help you diagnose the problem), but:

- The failing migration is rolled back, leaving the database at the previous
  version. Any tables / indexes introduced by that migration **are not
  created**.
- Downstream features depending on those tables (Wiki ingest, knowledge graph,
  task queues, …) may silently produce nothing.
- The system info page shows the partial DB version + a red "Migration failed"
  tag and the captured error.

The cached error message you see in the UI is the same one logged at startup
under `Database migration failed: ...`. Recent container logs are the
authoritative source; copy them before doing anything destructive.

---

## Common causes

### 1. Missing PostgreSQL extension

Many migrations require extensions (`pg_trgm`, `vector`, `pg_search`) created
by `CREATE EXTENSION IF NOT EXISTS`. **`IF NOT EXISTS` only skips creation when
the extension is already registered in the catalog; it does not guarantee that
the extension is usable.** If the extension's files are missing on the server
or the role lacks the privilege to create it, the extension may never be
created, and a later migration that uses it (e.g. building a `gin_trgm_ops`
index) will fail.

**Symptoms in the error**:

```
ERROR: operator class "gin_trgm_ops" does not exist for access method "gin"
ERROR: type "vector" does not exist
ERROR: function ... does not exist
```

**Fix**:

```sql
-- Connect as a superuser (typically `postgres`):
\c your_weknora_database
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS vector;     -- if RETRIEVE_DRIVER includes pgvector
CREATE EXTENSION IF NOT EXISTS pg_search;  -- only on ParadeDB

-- Verify they are installed in this database:
SELECT extname, extversion FROM pg_extension WHERE extname IN ('pg_trgm','vector','pg_search');
```

Then restart WeKnora. The next startup will pick up where the failing
migration left off.

If `CREATE EXTENSION` itself errors with **"could not open extension control
file"**, the extension is not installed on your PostgreSQL server: install the
corresponding OS package (e.g. `postgresql-contrib` for `pg_trgm`) or switch to
an image that ships it preinstalled, then retry. If it errors with
**"permission denied"**, the role lacks the required privileges; see section 3.

### 2. Dirty migration state

If a migration crashed partway through (OOM, container kill, network blip),
`golang-migrate` marks the schema as "dirty" at the failing version. By
default, WeKnora's startup tries to auto-recover; if you disabled that with
`AUTO_RECOVER_DIRTY=false` you'll see:

```
database is in dirty state at version N. ...
```

**Fix**: use the bundled helpers:

```bash
# Check the recorded version
make migrate-version

# Force the version to the last successful migration (N - 1 in the message)
make migrate-force version=<N-1>

# Re-run pending migrations
make migrate-up
```

After that, restart WeKnora.

Or set `AUTO_RECOVER_DIRTY=true` (the default in recent versions) and just
restart; startup will perform the same `force` + retry automatically.
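
As a rough illustration of the `N - 1` rule in this section, the following TypeScript sketch computes the version to pass to `make migrate-force`. `forceTargetVersion` is a hypothetical helper, not part of WeKnora. It assumes the last successful migration is numbered exactly one below the dirty version and clamps the result at 0, so check the files under `migrations/versioned/` if your numbering has gaps.

```typescript
// Hypothetical helper: derive the argument for `make migrate-force version=...`
// from the dirty version N reported in the error message.
function forceTargetVersion(dirtyVersion: number): number {
  if (dirtyVersion < 0 || Math.floor(dirtyVersion) !== dirtyVersion) {
    throw new Error('dirty version must be a non-negative integer')
  }
  // Assumption: the last successful migration is N - 1; never go below 0.
  return dirtyVersion === 0 ? 0 : dirtyVersion - 1
}

console.log(forceTargetVersion(41)) // prints 40
```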

### 3. Insufficient privileges on the database role

Some migrations create extensions or alter shared catalogs, which require
either superuser rights or, for extensions marked as trusted (PostgreSQL 13+),
the `CREATE` privilege on the database. Errors look like:

```
ERROR: permission denied to create extension "pg_trgm"
ERROR: must be owner of database ...
```

**Fix**: grant the role used by `DB_USER` the necessary privileges, or
pre-create the extensions / objects as a superuser ahead of time, then
restart. The migration's `CREATE EXTENSION IF NOT EXISTS` will then no-op.

### 4. Out-of-disk during `CREATE INDEX`

GIN / pgvector indexes can require significant temporary space. Errors:

```
ERROR: could not extend file ...: No space left on device
ERROR: cannot create temporary tables in transaction
```

**Fix**: free disk on the volume backing `PGDATA`, then restart. The
migration will retry the index build.

### 5. Schema drift from manual edits

If you previously edited tables / columns by hand and a later migration
expects the original shape, it will fail with mismatched-type errors. The
safest recovery is to align the live schema with the previous successful
migration's `*.up.sql` and then re-run pending migrations.
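
The symptom strings quoted in sections 1 to 4 can be matched mechanically as a first triage step. The sketch below is a hypothetical TypeScript helper (`classifyMigrationError` and the category names are not part of WeKnora). It only knows the messages quoted in this guide, and real PostgreSQL messages may differ by version or locale, so treat the result as a hint rather than a diagnosis. Schema drift (section 5) has no characteristic message here and falls through to `unknown`.

```typescript
// Hypothetical triage helper; the categories mirror sections 1-4 of this guide.
type MigrationCause =
  | 'dirty-state'
  | 'out-of-disk'
  | 'insufficient-privileges'
  | 'missing-extension'
  | 'unknown'

// Patterns are taken from the error strings quoted in this guide.
const causePatterns: Array<[MigrationCause, RegExp]> = [
  ['dirty-state', /dirty state at version/],
  ['out-of-disk', /No space left on device/],
  ['insufficient-privileges', /permission denied|must be owner of/],
  [
    'missing-extension',
    /operator class "[^"]+" does not exist|type "[^"]+" does not exist|function .+ does not exist|could not open extension control file/,
  ],
]

function classifyMigrationError(message: string): MigrationCause {
  for (const [cause, pattern] of causePatterns) {
    if (pattern.test(message)) {
      return cause
    }
  }
  return 'unknown'
}

console.log(classifyMigrationError('ERROR: type "vector" does not exist'))
```

The first matching pattern wins, which is why the more specific checks (dirty state, disk, privileges) come before the broader "does not exist" patterns.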

---

## Generic diagnostic checklist

1. **Read the full error**: the message cached in the UI is not truncated; it
   is the complete `golang-migrate` error, shown in a scrollable box. The
   container log shows the same content with stack context.
2. **Identify the failing migration**: the version number in the error (or
   `make migrate-version`) points to a file under `migrations/versioned/`.
   Open `migrations/versioned/<version>_*.up.sql` and look for the statement
   matching the error type (extension, index, function, foreign key, …).
3. **Run the failing statement manually** against the DB using `psql`. The
   error will be far more specific than the migration wrapper's.
4. **Fix the underlying cause** (install extension, fix privileges, free
   disk, …), then either:
   - Restart WeKnora and let auto-recovery retry; **or**
   - Run `make migrate-up` from a checkout to apply migrations outside the
     server process.
5. **Verify**: the system info page should now show the DB version without
   the "Migration failed" tag, and the previously broken feature (Wiki, KG,
   …) should start producing output.
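
For step 2, the file pattern can be derived from the version number. The TypeScript sketch below uses a hypothetical helper, `migrationFileGlob`. It assumes migration files carry a zero-padded, six-digit numeric prefix (inferred from the migration referred to as `000041` in issue #1319), so adjust `width` if your checkout uses a different naming scheme.

```typescript
// Hypothetical helper: build the glob for a migration's up-file.
// Assumption: file names use a zero-padded numeric prefix (default width 6).
function migrationFileGlob(version: number, width: number = 6): string {
  let digits = String(version)
  while (digits.length < width) {
    digits = '0' + digits
  }
  return 'migrations/versioned/' + digits + '_*.up.sql'
}

console.log(migrationFileGlob(41)) // prints migrations/versioned/000041_*.up.sql
```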

---

## Reporting an issue

If you've worked through the checklist and the migration still fails, please
open an issue at:

<https://github.com/Tencent/WeKnora/issues/new?template=bug_report.yml>

Include:

- WeKnora version + commit ID (from the system info page).
- The full error from the system info page (or container logs).
- PostgreSQL version (`SELECT version();`) and how it was deployed (vanilla,
  ParadeDB, Aurora, Aliyun RDS, …).
- The output of:
  ```sql
  SELECT extname, extversion FROM pg_extension;
  ```
- Any non-default values of `RETRIEVE_DRIVER`, `AUTO_MIGRATE`, and
  `AUTO_RECOVER_DIRTY`.

The "Report issue" link on the system info page pre-fills a body with the
captured error for you; clicking it is the fastest path.

@@ -15926,6 +15926,10 @@
|
||||
"commit_id": {
|
||||
"type": "string"
|
||||
},
|
||||
"db_migration_error": {
|
||||
"description": "DBMigrationError carries the human-readable error message recorded when\nthe most recent startup migration attempt failed. Empty when migrations\nsucceeded; non-empty values let the frontend surface a troubleshooting\nbanner instead of silently hiding the DB version row (see issue #1319).",
|
||||
"type": "string"
|
||||
},
|
||||
"db_version": {
|
||||
"type": "string"
|
||||
},
|
||||
|
||||
@@ -3515,6 +3515,13 @@ definitions:
|
||||
type: string
|
||||
commit_id:
|
||||
type: string
|
||||
db_migration_error:
|
||||
description: |-
|
||||
DBMigrationError carries the human-readable error message recorded when
|
||||
the most recent startup migration attempt failed. Empty when migrations
|
||||
succeeded; non-empty values let the frontend surface a troubleshooting
|
||||
banner instead of silently hiding the DB version row (see issue #1319).
|
||||
type: string
|
||||
db_version:
|
||||
type: string
|
||||
edition:
|
||||
|
||||
@@ -11,6 +11,10 @@ export interface SystemInfo {
|
||||
graph_database_engine?: string
|
||||
minio_enabled?: boolean
|
||||
db_version?: string
|
||||
/** Human-readable error message when the startup migration failed.
|
||||
* When non-empty, the system info view should surface a troubleshooting
|
||||
* banner (see docs/migration-troubleshooting.md). */
|
||||
db_migration_error?: string
|
||||
}
|
||||
|
||||
export interface ToolDefinition {
|
||||
|
||||
@@ -2416,6 +2416,11 @@ export default {
|
||||
goVersionDescription: 'Go language version used by the backend',
|
||||
dbVersionLabel: 'Database Version',
|
||||
dbVersionDescription: 'Current database migration version',
|
||||
dbMigrationFailedTag: 'Migration failed',
|
||||
dbMigrationFailedTitle: 'Database migration failed',
|
||||
dbMigrationFailedDesc: 'The startup database migration did not complete successfully. Some tables or indexes may be missing, which can break Wiki ingest, the knowledge graph, and other features. Check the troubleshooting guide below first; if the issue persists, report it via the link.',
|
||||
dbMigrationViewDocs: 'View troubleshooting guide',
|
||||
dbMigrationReportIssue: "Can't fix it? Report an issue",
|
||||
keywordIndexEngineLabel: 'Keyword Index Engine',
|
||||
keywordIndexEngineDescription: 'Currently used keyword index engine',
|
||||
vectorStoreEngineLabel: 'Vector Store Engine',
|
||||
|
||||
@@ -1674,6 +1674,11 @@ export default {
|
||||
goVersionDescription: "백엔드에서 사용하는 Go 언어 버전",
|
||||
dbVersionLabel: "데이터베이스 버전",
|
||||
dbVersionDescription: "현재 데이터베이스 마이그레이션 버전",
|
||||
dbMigrationFailedTag: "마이그레이션 실패",
|
||||
dbMigrationFailedTitle: "데이터베이스 마이그레이션 실패",
|
||||
dbMigrationFailedDesc: "시작 시 데이터베이스 마이그레이션이 정상적으로 완료되지 않았습니다. 일부 테이블이나 인덱스가 생성되지 않았을 수 있으며, Wiki와 지식 그래프 기능 등이 동작하지 않을 수 있습니다. 먼저 아래 문제 해결 문서를 확인하여 직접 복구해 보고, 그래도 해결되지 않으면 이슈를 등록해 주세요.",
|
||||
dbMigrationViewDocs: "문제 해결 문서 보기",
|
||||
dbMigrationReportIssue: "해결되지 않나요? 이슈 등록",
|
||||
keywordIndexEngineLabel: "키워드 인덱스 엔진",
|
||||
keywordIndexEngineDescription: "현재 사용 중인 키워드 인덱스 엔진",
|
||||
vectorStoreEngineLabel: "벡터 저장소 엔진",
|
||||
|
||||
@@ -1491,6 +1491,11 @@ export default {
|
||||
goVersionDescription: 'Версия языка Go, используемая backend',
|
||||
dbVersionLabel: 'Версия базы данных',
|
||||
dbVersionDescription: 'Текущая версия миграции базы данных',
|
||||
dbMigrationFailedTag: 'Миграция не удалась',
|
||||
dbMigrationFailedTitle: 'Сбой миграции базы данных',
|
||||
dbMigrationFailedDesc: 'Миграция базы данных при запуске не завершилась успешно. Часть таблиц или индексов могла не создаться, из-за чего Wiki, граф знаний и другие функции могут не работать. Сначала ознакомьтесь с руководством по диагностике ниже; если проблема сохраняется, сообщите о ней по ссылке.',
|
||||
dbMigrationViewDocs: 'Открыть руководство по диагностике',
|
||||
dbMigrationReportIssue: 'Не удаётся починить? Создать issue',
|
||||
keywordIndexEngineLabel: 'Движок индексации ключевых слов',
|
||||
keywordIndexEngineDescription: 'Используемый в настоящее время движок индексации ключевых слов',
|
||||
vectorStoreEngineLabel: 'Движок векторного хранилища',
|
||||
|
||||
@@ -1654,6 +1654,11 @@ export default {
|
||||
goVersionDescription: "后端使用的 Go 语言版本",
|
||||
dbVersionLabel: "数据库版本",
|
||||
dbVersionDescription: "当前数据库迁移版本号",
|
||||
dbMigrationFailedTag: "迁移失败",
|
||||
dbMigrationFailedTitle: "数据库迁移失败",
|
||||
dbMigrationFailedDesc: "启动时数据库迁移未成功完成,部分表或索引可能未创建,会导致 Wiki、知识图谱等功能异常。建议先查看排查文档自助修复;如仍无法解决,请通过下方链接反馈。",
|
||||
dbMigrationViewDocs: "查看排查文档",
|
||||
dbMigrationReportIssue: "无法修复?提交 Issue",
|
||||
keywordIndexEngineLabel: "关键词索引引擎",
|
||||
keywordIndexEngineDescription: "当前使用的关键词索引引擎",
|
||||
vectorStoreEngineLabel: "向量存储引擎",
|
||||
|
||||
@@ -88,16 +88,50 @@
|
||||
</div>
|
||||
|
||||
<!-- DB Version -->
|
||||
<div v-if="systemInfo?.db_version" class="setting-row">
|
||||
<div v-if="systemInfo?.db_version || systemInfo?.db_migration_error" class="setting-row">
|
||||
<div class="setting-info">
|
||||
<label>{{ $t('system.dbVersionLabel') }}</label>
|
||||
<p class="desc">{{ $t('system.dbVersionDescription') }}</p>
|
||||
</div>
|
||||
<div class="setting-control">
|
||||
<span class="info-value">{{ systemInfo.db_version }}</span>
|
||||
<span class="info-value">
|
||||
{{ systemInfo?.db_version || $t('system.unknown') }}
|
||||
<t-tag
|
||||
v-if="systemInfo?.db_migration_error"
|
||||
theme="danger"
|
||||
variant="light"
|
||||
size="small"
|
||||
style="margin-left: 8px;"
|
||||
>{{ $t('system.dbMigrationFailedTag') }}</t-tag>
|
||||
</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- DB migration error: full-width banner under the row -->
|
||||
<div v-if="systemInfo?.db_migration_error" class="setting-row migration-error-row">
|
||||
<t-alert theme="error" :title="$t('system.dbMigrationFailedTitle')" style="width: 100%;">
|
||||
<template #default>
|
||||
<p class="migration-error-desc">{{ $t('system.dbMigrationFailedDesc') }}</p>
|
||||
<pre class="migration-error-detail">{{ systemInfo.db_migration_error }}</pre>
|
||||
<div class="migration-error-actions">
|
||||
<t-link
|
||||
theme="primary"
|
||||
:href="troubleshootingDocsURL"
|
||||
target="_blank"
|
||||
rel="noopener noreferrer"
|
||||
>{{ $t('system.dbMigrationViewDocs') }}</t-link>
|
||||
<span class="migration-error-actions-sep">·</span>
|
||||
<t-link
|
||||
theme="primary"
|
||||
:href="reportIssueURL"
|
||||
target="_blank"
|
||||
rel="noopener noreferrer"
|
||||
>{{ $t('system.dbMigrationReportIssue') }}</t-link>
|
||||
</div>
|
||||
</template>
|
||||
</t-alert>
|
||||
</div>
|
||||
|
||||
<!-- Keyword Index Engine -->
|
||||
<div class="setting-row">
|
||||
<div class="setting-info">
|
||||
@@ -136,7 +170,7 @@
|
||||
</template>
|
||||
|
||||
<script setup lang="ts">
|
||||
import { ref, onMounted } from 'vue'
|
||||
import { ref, computed, onMounted } from 'vue'
|
||||
import { getSystemInfo, type SystemInfo } from '@/api/system'
|
||||
import { useI18n } from 'vue-i18n'
|
||||
|
||||
@@ -148,6 +182,37 @@ const loading = ref(true)
|
||||
const error = ref('')
|
||||
const frontendVersion = __FRONTEND_VERSION__
|
||||
|
||||
const troubleshootingDocsURL =
|
||||
'https://github.com/Tencent/WeKnora/blob/main/docs/migration-troubleshooting.md'
|
||||
|
||||
// Pre-fills a new issue with the current migration error so users don't have to
|
||||
// paste it manually. Body is intentionally minimal — the bug template will fill
|
||||
// in the rest. Encode aggressively to survive newlines / quotes.
|
||||
const reportIssueURL = computed(() => {
|
||||
const base = 'https://github.com/Tencent/WeKnora/issues/new'
|
||||
const params = new URLSearchParams({
|
||||
template: 'bug_report.yml',
|
||||
title: '[Bug]: Database migration failed at startup',
|
||||
labels: 'bug',
|
||||
})
|
||||
const errMsg = systemInfo.value?.db_migration_error
|
||||
if (errMsg) {
|
||||
const body = [
|
||||
'### Environment',
|
||||
`- WeKnora version: ${systemInfo.value?.version || 'unknown'}`,
|
||||
`- Commit: ${systemInfo.value?.commit_id || 'unknown'}`,
|
||||
`- DB version reported: ${systemInfo.value?.db_version || 'unknown'}`,
|
||||
'',
|
||||
'### Migration error',
|
||||
'```',
|
||||
errMsg,
|
||||
'```',
|
||||
].join('\n')
|
||||
params.set('body', body)
|
||||
}
|
||||
return `${base}?${params.toString()}`
|
||||
})
|
||||
|
||||
// Methods
|
||||
const loadInfo = async () => {
|
||||
try {
|
||||
@@ -250,6 +315,44 @@ onMounted(() => {
|
||||
}
|
||||
}
|
||||
|
||||
.migration-error-row {
|
||||
display: block;
|
||||
padding: 0 0 20px 0;
|
||||
border-bottom: 1px solid var(--td-component-stroke);
|
||||
}
|
||||
|
||||
.migration-error-desc {
|
||||
margin: 0 0 8px 0;
|
||||
font-size: 13px;
|
||||
line-height: 1.5;
|
||||
color: var(--td-text-color-primary);
|
||||
}
|
||||
|
||||
.migration-error-detail {
|
||||
margin: 0 0 12px 0;
|
||||
padding: 8px 12px;
|
||||
background: var(--td-bg-color-container-hover);
|
||||
border-radius: 4px;
|
||||
font-size: 12px;
|
||||
line-height: 1.5;
|
||||
white-space: pre-wrap;
|
||||
word-break: break-word;
|
||||
max-height: 200px;
|
||||
overflow: auto;
|
||||
color: var(--td-text-color-secondary);
|
||||
}
|
||||
|
||||
.migration-error-actions {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 8px;
|
||||
font-size: 13px;
|
||||
|
||||
.migration-error-actions-sep {
|
||||
color: var(--td-text-color-placeholder);
|
||||
}
|
||||
}
|
||||
|
||||
.setting-control {
|
||||
flex-shrink: 0;
|
||||
min-width: 280px;
|
||||
|
||||
@@ -16,24 +16,66 @@ import (
|
||||
)
|
||||
|
||||
var (
|
||||
migrationStateMu sync.RWMutex
|
||||
currentMigrationVersion uint
|
||||
currentMigrationDirty bool
|
||||
migrationVersionOnce sync.Once
|
||||
migrationVersionSet bool
|
||||
currentMigrationError string
|
||||
)
|
||||
|
||||
// CachedMigrationVersion returns the migration version captured at startup.
|
||||
// Returns (version, dirty, ok). ok is false if the version was never captured.
|
||||
//
|
||||
// Note: when migrations fail mid-way, the cache may still be populated via a
|
||||
// best-effort m.Version() call inside RunMigrationsWithOptions so the system
|
||||
// info endpoint can surface the partial state. Check CachedMigrationError() to
|
||||
// distinguish a clean version reading from a recorded-after-failure one.
|
||||
func CachedMigrationVersion() (uint, bool, bool) {
|
||||
migrationStateMu.RLock()
|
||||
defer migrationStateMu.RUnlock()
|
||||
return currentMigrationVersion, currentMigrationDirty, migrationVersionSet
|
||||
}
|
||||
|
||||
func setMigrationVersion(version uint, dirty bool) {
|
||||
migrationVersionOnce.Do(func() {
|
||||
// CachedMigrationError returns the error message captured when the most recent
|
||||
// migration attempt failed at startup. Empty string means migrations either
|
||||
// succeeded or were never run.
|
||||
func CachedMigrationError() string {
|
||||
migrationStateMu.RLock()
|
||||
defer migrationStateMu.RUnlock()
|
||||
return currentMigrationError
|
||||
}
|
||||
|
||||
// setMigrationState records the latest known migration state. Unlike the old
|
||||
// sync.Once-based setter, this is intentionally idempotent-overwrite so the
|
||||
// failure path (which runs after Up() errored) can replace the pre-migration
|
||||
// snapshot taken from the initial m.Version() call.
|
||||
func setMigrationState(version uint, dirty bool, errMsg string, versionKnown bool) {
|
||||
migrationStateMu.Lock()
|
||||
defer migrationStateMu.Unlock()
|
||||
if versionKnown {
|
||||
currentMigrationVersion = version
|
||||
currentMigrationDirty = dirty
|
||||
migrationVersionSet = true
|
||||
})
|
||||
}
|
||||
currentMigrationError = errMsg
|
||||
}
|
||||
|
||||
// captureMigrationFailure best-effort queries m for the current version so the
|
||||
// system info endpoint can show "N (failed)" instead of vanishing the row, and
|
||||
// stores the human-readable error message. Always returns the original error.
|
||||
func captureMigrationFailure(m *migrate.Migrate, err error) error {
|
||||
versionKnown := false
|
||||
var ver uint
|
||||
var dirty bool
|
||||
if m != nil {
|
||||
v, d, vErr := m.Version()
|
||||
if vErr == nil {
|
||||
versionKnown = true
|
||||
ver, dirty = v, d
|
||||
}
|
||||
}
|
||||
setMigrationState(ver, dirty, err.Error(), versionKnown)
|
||||
return err
|
||||
}
|
||||
|
||||
// RunMigrations executes all pending database migrations
|
||||
@@ -71,25 +113,33 @@ func RunMigrationsWithOptions(dsn string, opts MigrationOptions) error {
|
||||
sqlDB, err := sql.Open("sqlite3", opts.SQLiteDBPath)
|
||||
if err != nil {
|
||||
logger.Errorf(ctx, "Failed to open sqlite db for migration: %v", err)
|
||||
return fmt.Errorf("failed to open sqlite db for migration: %w", err)
|
||||
wrapped := fmt.Errorf("failed to open sqlite db for migration: %w", err)
|
||||
setMigrationState(0, false, wrapped.Error(), false)
|
||||
return wrapped
|
||||
}
|
||||
driver, err := sqlite3migrate.WithInstance(sqlDB, &sqlite3migrate.Config{})
|
||||
if err != nil {
|
||||
sqlDB.Close()
|
||||
logger.Errorf(ctx, "Failed to create sqlite3 migrate driver: %v", err)
|
||||
return fmt.Errorf("failed to create sqlite3 migrate driver: %w", err)
|
||||
wrapped := fmt.Errorf("failed to create sqlite3 migrate driver: %w", err)
|
||||
setMigrationState(0, false, wrapped.Error(), false)
|
||||
return wrapped
|
||||
}
|
||||
m, err = migrate.NewWithDatabaseInstance(migrationsPath, "sqlite3", driver)
|
||||
if err != nil {
|
||||
logger.Errorf(ctx, "Failed to create migrate instance: %v", err)
|
||||
return fmt.Errorf("failed to create migrate instance: %w", err)
|
||||
wrapped := fmt.Errorf("failed to create migrate instance: %w", err)
|
||||
setMigrationState(0, false, wrapped.Error(), false)
|
||||
return wrapped
|
||||
}
|
||||
} else {
|
||||
var err error
|
||||
m, err = migrate.New(migrationsPath, dsn)
|
||||
if err != nil {
|
||||
logger.Errorf(ctx, "Failed to create migrate instance: %v", err)
|
||||
return fmt.Errorf("failed to create migrate instance: %w", err)
|
||||
wrapped := fmt.Errorf("failed to create migrate instance: %w", err)
|
||||
setMigrationState(0, false, wrapped.Error(), false)
|
||||
return wrapped
|
||||
}
|
||||
}
|
||||
defer m.Close()
|
||||
@@ -98,7 +148,7 @@ func RunMigrationsWithOptions(dsn string, opts MigrationOptions) error {
|
||||
oldVersion, oldDirty, versionErr := m.Version()
|
||||
if versionErr != nil && versionErr != migrate.ErrNilVersion {
|
||||
logger.Errorf(ctx, "Failed to get migration version: %v", versionErr)
|
||||
return fmt.Errorf("failed to get migration version: %w", versionErr)
|
||||
return captureMigrationFailure(m, fmt.Errorf("failed to get migration version: %w", versionErr))
|
||||
}
|
||||
|
||||
if versionErr == migrate.ErrNilVersion {
|
||||
@@ -113,7 +163,7 @@ func RunMigrationsWithOptions(dsn string, opts MigrationOptions) error {
|
||||
if opts.AutoRecoverDirty {
|
||||
logger.Infof(ctx, "AutoRecoverDirty is enabled, attempting recovery...")
|
||||
if err := recoverFromDirtyState(ctx, m, oldVersion); err != nil {
|
||||
return err
|
||||
return captureMigrationFailure(m, err)
|
||||
}
|
||||
// Update oldVersion after recovery
|
||||
oldVersion, _, _ = m.Version()
|
||||
@@ -123,7 +173,7 @@ func RunMigrationsWithOptions(dsn string, opts MigrationOptions) error {
|
||||
if oldVersion == 0 || forceVersion < 0 {
|
||||
forceVersion = 0
|
||||
}
|
||||
return fmt.Errorf(
|
||||
return captureMigrationFailure(m, fmt.Errorf(
|
||||
"database is in dirty state at version %d. This usually means a migration failed partway through. "+
|
||||
"To fix this:\n"+
|
||||
"1. Check if the migration partially applied changes and manually fix if needed\n"+
|
||||
@@ -136,7 +186,7 @@ func RunMigrationsWithOptions(dsn string, opts MigrationOptions) error {
|
||||
forceVersion,
|
||||
forceVersion,
|
||||
forceVersion,
|
||||
)
|
||||
))
|
||||
}
|
||||
}
|
||||
|
||||
@@ -152,13 +202,13 @@ func RunMigrationsWithOptions(dsn string, opts MigrationOptions) error {
|
||||
logger.Infof(ctx, "Attempting to recover from dirty state...")
|
||||
// Try to recover and retry
|
||||
if recoverErr := recoverFromDirtyState(ctx, m, currentVersion); recoverErr != nil {
|
||||
return recoverErr
|
||||
return captureMigrationFailure(m, recoverErr)
|
||||
}
|
||||
// Retry migration after recovery
|
||||
logger.Infof(ctx, "Retrying migration after recovery...")
|
||||
if retryErr := m.Up(); retryErr != nil && retryErr != migrate.ErrNoChange {
|
||||
logger.Errorf(ctx, "Migration failed after recovery attempt: %v", retryErr)
|
||||
return fmt.Errorf("migration failed after recovery attempt: %w", retryErr)
|
||||
return captureMigrationFailure(m, fmt.Errorf("migration failed after recovery attempt: %w", retryErr))
|
||||
}
|
||||
} else {
|
||||
// Calculate the version to force to (usually the previous version)
|
||||
@@ -166,7 +216,7 @@ func RunMigrationsWithOptions(dsn string, opts MigrationOptions) error {
|
||||
if currentVersion == 0 {
|
||||
forceVersion = 0
|
||||
}
|
||||
return fmt.Errorf(
|
||||
return captureMigrationFailure(m, fmt.Errorf(
|
||||
"migration failed and database is now in dirty state at version %d. "+
|
||||
"To fix this:\n"+
|
||||
"1. Check if the migration partially applied changes and manually fix if needed\n"+
|
||||
@@ -179,20 +229,20 @@ func RunMigrationsWithOptions(dsn string, opts MigrationOptions) error {
|
||||
forceVersion,
|
||||
forceVersion,
|
||||
forceVersion,
|
||||
)
|
||||
))
|
||||
}
|
||||
} else {
|
||||
return fmt.Errorf("failed to run migrations: %w", err)
|
||||
return captureMigrationFailure(m, fmt.Errorf("failed to run migrations: %w", err))
|
||||
}
|
||||
}
|
||||
|
||||
// Get current version after migration
|
||||
version, dirty, err := m.Version()
|
||||
if err != nil && err != migrate.ErrNilVersion {
|
||||
return fmt.Errorf("failed to get migration version: %w", err)
|
||||
return captureMigrationFailure(m, fmt.Errorf("failed to get migration version: %w", err))
|
||||
}
|
||||
|
||||
setMigrationVersion(version, dirty)
|
||||
setMigrationState(version, dirty, "", true)
|
||||
|
||||
if oldVersion != version {
|
||||
logger.Infof(ctx, "Database migrated from version %d to %d", oldVersion, version)
|
||||
|
||||
@@ -57,6 +57,11 @@ type GetSystemInfoResponse struct {
|
||||
GraphDatabaseEngine string `json:"graph_database_engine,omitempty"`
|
||||
MinioEnabled bool `json:"minio_enabled,omitempty"`
|
||||
DBVersion string `json:"db_version,omitempty"`
|
||||
// DBMigrationError carries the human-readable error message recorded when
|
||||
// the most recent startup migration attempt failed. Empty when migrations
|
||||
// succeeded; non-empty values let the frontend surface a troubleshooting
|
||||
// banner instead of silently hiding the DB version row (see issue #1319).
|
||||
DBMigrationError string `json:"db_migration_error,omitempty"`
|
||||
}
|
||||
|
||||
// Version information injected at compile time
|
||||
@@ -91,12 +96,21 @@ func (h *SystemHandler) GetSystemInfo(c *gin.Context) {
|
||||
// Get MinIO enabled status
|
||||
minioEnabled := h.isMinioConfigured(c)
|
||||
|
||||
dbMigrationErr := database.CachedMigrationError()
|
||||
var dbVersion string
|
||||
if ver, dirty, ok := database.CachedMigrationVersion(); ok {
|
||||
dbVersion = fmt.Sprintf("%d", ver)
|
||||
if dirty {
|
||||
dbVersion += " (dirty)"
|
||||
}
|
||||
if dbMigrationErr != "" {
|
||||
dbVersion += " (failed)"
|
||||
}
|
||||
} else if dbMigrationErr != "" {
|
||||
// Failure happened before m.Version() could be read (e.g. could not
|
||||
// open the database). Still emit a placeholder so the frontend renders
|
||||
// the row and shows the troubleshooting banner.
|
||||
dbVersion = "unknown"
|
||||
}
|
||||
|
||||
response := GetSystemInfoResponse{
|
||||
@@ -110,6 +124,7 @@ func (h *SystemHandler) GetSystemInfo(c *gin.Context) {
|
||||
GraphDatabaseEngine: graphDatabaseEngine,
|
||||
MinioEnabled: minioEnabled,
|
||||
DBVersion: dbVersion,
|
||||
DBMigrationError: dbMigrationErr,
|
||||
}
|
||||
|
||||
logger.Info(ctx, "System info retrieved successfully")
|
||||
|
||||