WeKnora/REFACTORING_PLAN.md
wizardchen e147209e77 feat: Implement comprehensive wiki feature for knowledge bases
This commit introduces the complete wiki feature for WeKnora, enabling AI-powered wiki page generation and management. The implementation includes:

**Backend Changes:**
- Wiki data model: WikiPage type with support for multiple page types (Summary, Entity, Concept, Index, Log)
- Database schema: wiki_pages table with full migration support
- WikiPageService: CRUD operations and page management
- WikiPageRepository: GORM-based persistence layer
- Wiki ingest pipeline: Automated generation of wiki pages from knowledge documents
  * Summary page generation using LLM
  * Entity and concept extraction in a single LLM call
  * Synthesis opportunity detection
  * Index page rebuilding
  * Log page maintenance
- Wiki boost feature: Enhance chat retrieval with wiki context
- Wiki linting: Maintenance and validation utilities
- Agent wiki tools: Enable agents to query and interact with wiki pages
- Wiki prompts: Comprehensive LLM prompt templates for all wiki generation tasks
- Language support: Reuse existing middleware language infrastructure for LLM prompts

**Frontend Changes:**
- Wiki browser UI: View all wiki pages with filtering and search
- Wiki API client: Knowledge base wiki management endpoints
- Knowledge base editor: Configure wiki settings (language, auto-ingest, synthesis model)
- i18n updates: Support for English, Korean, Russian, and Chinese interfaces

**Configuration:**
- Container DI: Wire up all wiki services
- Router: Register wiki API endpoints
- Task handling: Support async wiki ingest tasks

**Testing:**
- Unit tests for wiki page types
- Service layer tests
- Endpoint tests for wiki operations
- Integration tests with LLM mocking

**Documentation:**
- Language refactoring analysis and guides
- Implementation completion reports
- Quick reference guides for developers

**Key Features:**
- LLM-powered wiki page generation from documents
- Multi-language support (9+ languages)
- Automatic extraction of entities and concepts
- Synthesis opportunity detection
- Index and log page maintenance
- Progressive wiki building across multiple documents
- Agent-based wiki interaction
- Chat retrieval enhancement with wiki context
- Full frontend UI for wiki browsing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 21:17:18 +08:00


Wiki Ingest Language Refactoring - Implementation Plan

Problem Statement

The wiki ingest pipeline contains hardcoded language mapping logic (lines 135-141 in wiki_ingest.go) that:

  • Only supports 2 languages (Chinese and English)
  • Duplicates logic already available in types.LanguageLocaleName()
  • Uses inconsistent naming compared to the existing language middleware
  • Cannot be easily extended for new languages

Solution

Reuse the existing types.LanguageLocaleName() function from /internal/types/context_helpers.go which:

  • Supports 9+ languages (Chinese, English, Korean, Japanese, Russian, French, German, Spanish, Portuguese)
  • Is already tested and used throughout the codebase
  • Provides consistent language name formatting for LLM prompts
  • Handles unknown languages gracefully
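
To make the idea of a single, centralized mapping concrete, here is a minimal sketch of a locale-to-name function. It is a local stand-in written for illustration, not the actual body of types.LanguageLocaleName(): the name localeName is hypothetical, and the code/name pairs are copied from the supported-languages table later in this plan, so the real function may differ in detail.

```go
package main

import "fmt"

// localeName is a hypothetical stand-in for types.LanguageLocaleName.
// The mapping mirrors the table in this plan; the real function may differ.
func localeName(locale string) string {
	switch locale {
	case "zh-CN", "zh", "zh-Hans":
		return "Chinese (Simplified)"
	case "zh-TW", "zh-HK", "zh-Hant":
		return "Chinese (Traditional)"
	case "en-US", "en", "en-GB":
		return "English"
	case "ko-KR", "ko":
		return "Korean"
	case "ja-JP", "ja":
		return "Japanese"
	case "ru-RU", "ru":
		return "Russian"
	case "fr-FR", "fr":
		return "French"
	case "de-DE", "de":
		return "German"
	case "es-ES", "es":
		return "Spanish"
	case "pt-BR", "pt":
		return "Portuguese"
	default:
		// Unknown codes are passed through unchanged.
		return locale
	}
}

func main() {
	for _, code := range []string{"zh", "ko-KR", "xx-YY"} {
		fmt.Printf("%q -> %q\n", code, localeName(code))
	}
}
```

Keeping the mapping in one switch means a new language is a one-line change in one place, which is the maintainability argument this plan relies on.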

Implementation

Change Location

  • File: /internal/application/service/wiki_ingest.go
  • Method: ProcessWikiIngest()
  • Lines: 135-141

Before (Current Code)

135	    // Determine language
136	    lang := "the same language as the source document"
137	    if kb.WikiConfig.WikiLanguage == "zh" {
138	        lang = "Chinese (中文)"
139	    } else if kb.WikiConfig.WikiLanguage == "en" {
140	        lang = "English"
141	    }

After (Refactored Code)

135	    // Determine language - reuse middleware infrastructure for consistent naming
136	    // Supports: Chinese (Simplified/Traditional), English, Korean, Japanese, Russian, French, German, Spanish, Portuguese
137	    lang := types.LanguageLocaleName(kb.WikiConfig.WikiLanguage)

Code Diff

-   // Determine language
-   lang := "the same language as the source document"
-   if kb.WikiConfig.WikiLanguage == "zh" {
-       lang = "Chinese (中文)"
-   } else if kb.WikiConfig.WikiLanguage == "en" {
-       lang = "English"
-   }
+   // Determine language - reuse middleware infrastructure for consistent naming
+   // Supports: Chinese (Simplified/Traditional), English, Korean, Japanese, Russian, French, German, Spanish, Portuguese
+   lang := types.LanguageLocaleName(kb.WikiConfig.WikiLanguage)

Verification

The types package is already imported at line 14:

"github.com/Tencent/WeKnora/internal/types"

No additional imports needed.

Impact Analysis

Benefits

  • Code Reduction: 7 lines → 3 lines (lines 135-141 become 135-137, roughly 57% fewer)
  • Language Coverage: 2 languages → 9+ languages
  • Consistency: Aligns with middleware language naming conventions
  • Maintainability: Centralized language mapping (single source of truth)
  • Extensibility: Adding new languages only requires updating LanguageLocaleName()
  • Testing: Reuses existing, tested function

Supported Languages (After Refactoring)

| Code | Output |
|------|--------|
| zh-CN, zh, zh-Hans | Chinese (Simplified) |
| zh-TW, zh-HK, zh-Hant | Chinese (Traditional) |
| en-US, en, en-GB | English |
| ko-KR, ko | Korean |
| ja-JP, ja | Japanese |
| ru-RU, ru | Russian |
| fr-FR, fr | French |
| de-DE, de | German |
| es-ES, es | Spanish |
| pt-BR, pt | Portuguese |
| (unknown) | Returns the locale code as-is |

Backward Compatibility

Fully Compatible

  • Existing KB configs with WikiLanguage: "zh" and WikiLanguage: "en" continue to work
  • The function handles short codes: "zh" → "Chinese (Simplified)", "en" → "English"
  • No database schema changes required
  • No migration needed
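
One case is worth checking before relying on full compatibility. The pre-refactor code defaulted to "the same language as the source document" whenever WikiLanguage was neither "zh" nor "en", while the fallback described in this plan returns the code as-is, which is an empty string for an unset value. If existing configs can have an empty WikiLanguage, a small guard would keep the old default. The sketch below is one way to do that; promptLanguage and defaultPromptLanguage are hypothetical names, and the name function is passed in so the example runs without the WeKnora packages.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPromptLanguage is the fallback string used by the pre-refactor code.
const defaultPromptLanguage = "the same language as the source document"

// promptLanguage is a hypothetical helper: it keeps the old default for an
// empty WikiLanguage and otherwise defers to the supplied name function
// (types.LanguageLocaleName in the real code base).
func promptLanguage(code string, name func(string) string) string {
	code = strings.TrimSpace(code)
	if code == "" {
		return defaultPromptLanguage
	}
	return name(code)
}

func main() {
	// Tiny stub standing in for types.LanguageLocaleName in this example.
	stub := func(code string) string {
		if code == "zh" {
			return "Chinese (Simplified)"
		}
		return code
	}
	fmt.Println(promptLanguage("zh", stub))
	fmt.Println(promptLanguage("", stub))
}
```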

Testing Recommendations

Test Cases

// Requires the standard "testing" package and the types import shown earlier.
func TestLanguageLocaleName(t *testing.T) {
    cases := []struct{ locale, want string }{
        // Test 1: Existing short code support
        {"zh", "Chinese (Simplified)"},
        {"en", "English"},
        // Test 2: Full locale code support
        {"zh-CN", "Chinese (Simplified)"},
        {"en-US", "English"},
        {"ko-KR", "Korean"},
        // Test 3: Unknown locale fallback (returned as-is)
        {"xx-YY", "xx-YY"},
        {"", ""},
    }
    for _, c := range cases {
        if got := types.LanguageLocaleName(c.locale); got != c.want {
            t.Errorf("LanguageLocaleName(%q) = %q, want %q", c.locale, got, c.want)
        }
    }
}

Integration Testing

  1. Create a wiki KB with WikiLanguage: "zh" and verify summary pages are generated in Chinese
  2. Create a wiki KB with WikiLanguage: "en" and verify summary pages are generated in English
  3. (Optional) Create a wiki KB with WikiLanguage: "ko" and verify Korean language support

Future Enhancements

Option 1: Store Full Locale Codes in WikiConfig

Store full locale codes in WikiConfig to align with middleware:

// In WikiConfig struct
type WikiConfig struct {
    WikiLanguage string // Store "zh-CN" instead of "zh"
    // ...
}

// No conversion needed
lang := types.LanguageLocaleName(kb.WikiConfig.WikiLanguage)

Option 2: Context-Aware Language Selection

Use context language as fallback:

// Determine language: prefer KB config, fallback to context/env, then default
lang := kb.WikiConfig.WikiLanguage
if lang == "" {
    // Fallback to context language
    if ctxLang, ok := types.LanguageFromContext(ctx); ok {
        lang = ctxLang
    } else {
        // Fallback to env/default
        lang = types.DefaultLanguage()
    }
}
humanReadableLang := types.LanguageLocaleName(lang)

Implementation Steps

  1. Open /internal/application/service/wiki_ingest.go
  2. Replace lines 135-141 with the refactored code shown above
  3. Save the file
  4. Run tests to verify:
    go test ./internal/application/service/...
    go test ./internal/types/...
  5. (Optional) Run the wiki ingest pipeline with test documents

Rollback Plan

If issues arise:

git checkout HEAD -- internal/application/service/wiki_ingest.go

Definition Location

  • File: /internal/types/context_helpers.go
  • Function: LanguageLocaleName() (lines 85-112)

Other Usage Points (Informational)

  • /internal/types/context_helpers.go - LanguageNameFromContext() function (line 82)

Similar Patterns (For Future Refactoring)

  • Search for other hardcoded language mappings in the codebase that could benefit from centralization

Sign-Off

  • Requested By: User (who noted that "the middleware already has language-detection logic")
  • Implementation Date: [TBD]
  • Reviewer: [TBD]
  • Approval: [TBD]