feat(openmaic-classroom): add micro-classroom generation from knowledge-graph concepts; update the requirement-building templates

bingxiang.cheng
2026-05-02 09:19:22 +08:00
committed by lyingbug
parent 19cd17c0a6
commit db8abd3f9b
3 changed files with 572 additions and 2 deletions


@@ -1,6 +1,6 @@
---
name: openmaic-classroom
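description: Converts RAG retrieval results or document chunks into OpenMAIC interactive courses. Use this skill when the user asks to turn knowledge-base content, retrieved document fragments, or uploaded documents into teaching courseware or an interactive classroom. Supports requirement-only generation and course generation based on PDF content.
description: Converts RAG retrieval results, document chunks, or knowledge-graph concepts into OpenMAIC interactive courses. Use this skill when the user asks to turn knowledge-base content, retrieved document fragments, uploaded documents, or concepts from a knowledge graph into teaching courseware or interactive classrooms in bulk. Supports requirement-only generation, course generation based on PDF content, and batch classroom generation by traversing the concept graph.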
---
# OpenMAIC Classroom Generator
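@@ -12,6 +12,7 @@ description: Converts RAG retrieval results or document chunks into OpenMAIC interactive courses.
1. **RAG → course**: distill knowledge-retrieval results into a teaching requirement (requirement), then generate an interactive course through the OpenMAIC API
2. **PDF → course**: parse a PDF uploaded by the user and generate a course from its content
3. **Document chunks → course set**: organize multiple document chunks or knowledge fragments into a multi-stage course set
4. **Concept-graph traversal → batch micro-classrooms**: traverse every concept page in the knowledge graph and generate one micro-classroom per concept
## Capability Boundaries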
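@@ -62,12 +63,13 @@ OpenMAIC has two usage modes; **choose according to the user's scenario**
- "Generate a course from the retrieval results"
- "Create an interactive classroom for this knowledge point"
- "Convert the knowledge-base content into teaching material"
- "Generate courses in bulk" / "Turn every concept in the knowledge graph into a classroom" / "Generate micro-classrooms from the concept graph"
## Workflow
### Phase 1: Confirm the input source
Confirm the input source for course generation (choose one of three):
Confirm the input source for course generation (choose one of four):
1. **Requirement-only generation**: the user describes the teaching topic directly; no extra documents are needed
→ Use the user's description as the `requirement` directly; **no script call is needed**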
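@@ -75,6 +77,8 @@ OpenMAIC has two usage modes; **choose according to the user's scenario**
→ Use the `scripts/rag-to-requirement.py` script to convert the retrieval results into a structured requirement (see Phase 1.1)
3. **PDF file**: the user provides a PDF file path; parse it first, then call the generation API
→ Extract the PDF text and build the requirement from it; **no script call is needed**
4. **Batch generation by concept-graph traversal**: traverse every concept page in the knowledge graph and generate one micro-classroom per concept
→ Use the `scripts/concept-to-requirement.py` script to convert each concept and its linked entities into a structured requirement (see Phase 1.2)
### Phase 1.1: RAG results → Requirement conversion (scenario 2 only)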
@@ -101,6 +105,82 @@ execute_skill_script(
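- **Do not** call this script without any arguments; it will exit with an error
- If the script fails, build the requirement manually from the retrieval results
### Phase 1.2: Concept Graph → Requirement conversion (scenario 4 only)
When scenario 4 calls for batch classroom generation from knowledge-graph concepts, follow these steps:
**Step 1: List all concept pages**
Use the `wiki_search` tool to find every page of type concept: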
```
wiki_search("^concept/", limit=50)
```
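If there are more than 50 concepts, call the tool repeatedly to page through the results until all of them are retrieved.
**Step 2: Fetch the details and linked entities for each concept**
For each concept page:
1. Call `wiki_read_page([concept_slug])` to get the page details (including OutLinks and InLinks)
2. From OutLinks and InLinks, keep the slugs that match `entity/*`
3. Determine the link_type of each entity:
- Appears in both OutLinks and InLinks → `bidirectional`
- Appears only in OutLinks → `outlink`
- Appears only in InLinks → `inlink`
4. Call `wiki_read_page([entity_slugs])` to read the linked entities in one batch (take only title + summary, not the full content)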
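The link_type rule in item 3 can be sketched as a small set operation. This is only an illustration: it assumes OutLinks and InLinks arrive as plain lists of slug strings, and `classify_links` is a hypothetical helper, not part of the skill.

```python
def classify_links(out_links, in_links, prefix="entity/"):
    """Map each entity slug to bidirectional / outlink / inlink.

    Assumes both arguments are lists of slug strings (hypothetical shape).
    """
    outs = {s for s in out_links if s.startswith(prefix)}
    ins = {s for s in in_links if s.startswith(prefix)}
    link_types = {}
    for slug in sorted(outs | ins):
        if slug in outs and slug in ins:
            link_types[slug] = "bidirectional"
        elif slug in outs:
            link_types[slug] = "outlink"
        else:
            link_types[slug] = "inlink"
    return link_types


links = classify_links(
    out_links=["entity/vector-db", "entity/embedding", "concept/llm"],
    in_links=["entity/embedding", "entity/reranker"],
)
print(links)
```

Non-entity slugs such as `concept/llm` are dropped before classification, so only `entity/*` pages reach the conversion script.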
**Step 3: Convert to a requirement**
For each concept, call `scripts/concept-to-requirement.py` to convert the concept and its linked entities into a requirement:
```
execute_skill_script(
skill_name: "openmaic-classroom",
script_path: "scripts/concept-to-requirement.py",
input: '{"concept": {"slug": "...", "title": "...", "summary": "...", "content": "..."}, "entities": [{"slug": "...", "title": "...", "summary": "...", "link_type": "..."}], "language": "zh-CN", "depth": "intermediate"}'
)
```
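**Format of the input parameter (a JSON string; it must be passed through the `input` parameter):**
- `concept` (required): the concept page object, containing `slug`, `title`, `summary`, and `content`
- `entities` (optional): array of linked entities; each item contains `slug`, `title`, `summary`, and `link_type`
- `language` (optional): `zh-CN|en-US`, default `zh-CN`
- `depth` (optional): `beginner|intermediate|advanced`, default `intermediate`
- `audience` (optional): description of the target audience, default "相关领域的学习者" (learners in the relevant field)
**Step 4: Call the OpenMAIC API sequentially**
For each concept's requirement, call the OpenMAIC generation API **sequentially** (concurrency=1):
- Each concept → one micro-classroom
- Mark the requirement as `micro-classroom`
- Do not run calls in parallel, to avoid quota conflicts
**Step 5: Write a manifest (resumability)**
Write a manifest JSON that records the generation status of each concept: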
```json
{
"kb_id": "...",
"total_concepts": 10,
"generated": ["concept/rag", "concept/llm"],
"failed": [{"slug": "concept/embedding", "error": "..."}],
"pending": ["concept/vector-db"]
}
```
On failure, resume from the checkpoint: skip the concepts listed in `generated` and start from the first entry in `pending`.
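A minimal sketch of the resume logic, processing one concept at a time as Step 4 requires. The `run_batch` helper, the `fake_generate` stand-in for the OpenMAIC call, and the `demo` kb_id are all hypothetical; only the manifest field names come from the example above.

```python
import json


def run_batch(manifest, generate):
    """Process pending concepts sequentially, updating the manifest in place."""
    done = set(manifest["generated"])
    for slug in list(manifest["pending"]):
        manifest["pending"].remove(slug)
        if slug in done:
            continue  # already generated in an earlier run
        try:
            generate(slug)  # one call at a time (concurrency=1)
        except Exception as exc:
            manifest["failed"].append({"slug": slug, "error": str(exc)})
        else:
            manifest["generated"].append(slug)
            done.add(slug)
    return manifest


def fake_generate(slug):
    """Hypothetical stand-in for the real generation call."""
    if slug == "concept/embedding":
        raise RuntimeError("quota exceeded")


manifest = {
    "kb_id": "demo",
    "total_concepts": 3,
    "generated": ["concept/rag"],
    "failed": [],
    "pending": ["concept/rag", "concept/embedding", "concept/vector-db"],
}
result = run_batch(manifest, fake_generate)
print(json.dumps(result, ensure_ascii=False, indent=2))
```

Persisting the manifest after every concept, rather than once at the end, would keep the checkpoint accurate if the run is interrupted.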
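**Key constraints**:
- Wiki reads and script conversion may be batched
- OpenMAIC API generation runs with concurrency=1
- concept.Summary is the core anchor of the requirement
- For entities, take only title + summary, not the full content
- A concept with no linked entities can still produce a classroom (without the practice section)
### Phase 2: Build the Generation Request
Build the request body from the input source. **Field descriptions**: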
@@ -120,6 +200,7 @@ execute_skill_script(
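- **Scenario 1 (requirement only)**: `requirement` is the user's description as-is
- **Scenario 2 (RAG results)**: `requirement` is the `requirement` field from the Phase 1.1 script output
- **Scenario 3 (PDF)**: `requirement` is built from the text extracted from the PDF, and `pdfContent` holds the parsed result
- **Scenario 4 (concept-graph traversal)**: `requirement` is the `requirement` field from the Phase 1.2 script output; call the API separately for each concept
### Phase 3: Call the OpenMAIC API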
@@ -261,6 +342,23 @@ Classroom URL:
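4. If the MCP tool is unavailable, tell the user to deploy mcp_api_requester first (see the MCP availability check)
5. Collect and return all Classroom URLs
## Concept-Graph Traversal → Batch Micro-Classrooms
When the user needs classrooms generated in bulk from knowledge-graph concepts (scenario 4), follow the full Phase 1.2 workflow.
**MVP course orchestration strategy**: one concept → one micro-classroom
**Course type label**: mark the requirement as `micro-classroom`
**Batch resumability**: write a manifest JSON that records the generation status of each concept, so a failed run can resume from the checkpoint.
**Key constraints**:
- Wiki reads and script conversion may be batched
- OpenMAIC API generation runs with concurrency=1
- concept.Summary is the core anchor of the requirement
- For entities, take only title + summary, not the full content
- A concept with no linked entities can still produce a classroom (without the practice section)
## Notes
- Scripts run in a Docker sandbox, and **the sandbox disables network access by default**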


@@ -70,6 +70,66 @@ The OpenMAIC `requirement` field must be a **structured description of the teaching requirement**
- 重点覆盖:[关键主题列表]
```
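### Template 5: Concept-Graph Traversal (Concept Graph)
Use this template when generating a micro-classroom from a knowledge-graph concept page and its linked entities. It is produced automatically by `scripts/concept-to-requirement.py`.
**Input structure**: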
```json
{
"concept": { "slug": "concept/rag", "title": "RAG 检索增强生成", "summary": "...", "content": "..." },
"entities": [
{ "slug": "entity/vector-db", "title": "向量数据库", "summary": "...", "link_type": "outlink" },
{ "slug": "entity/embedding", "title": "Embedding 模型", "summary": "...", "link_type": "bidirectional" }
],
"language": "zh-CN",
"depth": "intermediate",
"audience": "相关领域的学习者"
}
```
**Output requirement structure**:
```
基于知识图谱概念「[concept.title]」,为[audience]创建一个[depth]微课堂(micro-classroom)。
教学锚点:[concept.summary]
学习目标:
- 理解[concept.summary 中的关键句]
核心知识点:
- [从 concept.content 解析的定义/机制]
关联实体(实践环节):
- 案例:[entity.title]([entity.summary])
- 工具:[entity.title]([entity.summary])
- 应用场景:[entity.title]([entity.summary])
- 前置知识:[entity.title]
实践任务:
- 通过 [entity.title] 实践 [concept.title] 的应用
常见误区检查:
- [从 concept.content 解析的误区]
评估提示:
- 请解释 [concept.title] 的核心定义
请使用中文生成课程内容。
```
**Entity ranking and classification rules** (plain text operations, no LLM or embeddings):
- link_type weight: bidirectional (+3) > outlink (+2) > inlink (+1)
- title token overlap: number of tokens shared between the tokenized concept title and the entity title (+1 per hit, cap +2)
- summary keyword hit: concept summary keywords that appear in the entity summary (+1 per hit, cap +2)
- slug token hit: a concept slug token appears in the entity slug (+1)
- Empty summary penalty (-2)
- Take the top 3-5 entities and sort them into Examples / Tools / Application Scenarios / Prerequisites
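As a worked example of the weights above, the sketch below scores two made-up entities against a made-up concept. It assumes tokens are split on whitespace, `_`, `-`, and `/`, and that summary keywords are `\w+` runs longer than one character; the sample data and the `score` helper are illustrative only.

```python
import re

LINK_WEIGHT = {"bidirectional": 3, "outlink": 2, "inlink": 1}


def tokenize(text):
    """Split on whitespace, underscore, hyphen, and slash; lowercase the tokens."""
    return [t.lower() for t in re.split(r"[\s_\-/]+", text) if t]


def score(entity, concept):
    """Apply the listed weights: link type, title, summary, slug, empty-summary penalty."""
    total = LINK_WEIGHT.get(entity.get("link_type", ""), 0)
    title_hits = set(tokenize(entity["title"])) & set(tokenize(concept["title"]))
    total += min(len(title_hits), 2)
    keywords = [w.lower() for w in re.findall(r"\w+", concept["summary"]) if len(w) > 1]
    summary = (entity.get("summary") or "").lower()
    total += min(sum(1 for kw in keywords if kw in summary), 2)
    slug_hits = set(tokenize(entity["slug"])) & set(tokenize(concept["slug"]))
    total += min(len(slug_hits), 1)
    if not entity.get("summary"):
        total -= 2
    return total


concept = {
    "slug": "concept/rag",
    "title": "RAG pipeline",
    "summary": "Retrieval augmented generation with vector search",
}
entity_a = {
    "slug": "entity/vector-db",
    "title": "Vector database",
    "summary": "Stores vector embeddings for search",
    "link_type": "bidirectional",
}
entity_b = {"slug": "entity/gpu", "title": "GPU", "summary": "", "link_type": "inlink"}

score_a = score(entity_a, concept)  # 3 (bidirectional) + 2 (summary hits: vector, search)
score_b = score(entity_b, concept)  # 1 (inlink) - 2 (empty summary)
print(score_a, score_b)
```

With a separator-based tokenizer like this one, an unsegmented Chinese title such as "检索增强生成" stays a single token, so title overlap on Chinese titles only fires on exact token matches.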
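**Concept content parsing logic**:
- Prefer the markdown structure: heading lists (definition, mechanism, example, and misconception sections)
- Fall back to the first N characters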
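The structure-first, prefix-fallback idea can be sketched as follows. The `key_text` helper, the heading names it looks for, and the default `n=200` are assumptions made for illustration, not the script's actual interface.

```python
import re


def key_text(content, wanted=("定义", "Definition"), n=200):
    """Return the body under the first wanted heading, else the first n characters."""
    heading = None
    bodies = {}
    for line in content.split("\n"):
        match = re.match(r"^(#{1,4})\s+(.+)$", line)
        if match:
            heading = match.group(2).strip()
            bodies[heading] = []
        elif heading is not None:
            bodies[heading].append(line)
    for name, lines in bodies.items():
        if any(w in name for w in wanted):
            body = "\n".join(lines).strip()
            if body:
                return body
    # No usable section found: fall back to a plain prefix of the content.
    return content.strip()[:n]


structured = "# RAG\n\n## Definition\nRetrieval plus generation.\n\n## Pitfalls\n- Stale index"
plain = "RAG combines retrieval with generation."

from_heading = key_text(structured)
from_prefix = key_text(plain)
print(from_heading)
print(from_prefix)
```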
## Examples
### Example 1: Technical documentation → course


@@ -0,0 +1,412 @@
#!/usr/bin/env python3
"""
Concept Graph Material → OpenMAIC Requirement converter

Converts a concept page from the WeKnora wiki knowledge graph, together with its
linked entities, into a structured OpenMAIC course-generation requirement.
Two-stage conversion:
  1. Concept Graph Material → Pedagogical Design JSON
  2. Pedagogical Design JSON → requirement string

This script only transforms data; it makes no network calls.

Usage:
  echo '{"concept": {...}, "entities": [...]}' | python scripts/concept-to-requirement.py
  python scripts/concept-to-requirement.py --file input.json
"""

import json
import re
import sys
from typing import Any


def _tokenize(text: str) -> list[str]:
    """Simple whitespace + punctuation tokenizer for title/slug overlap."""
    return [t.lower() for t in re.split(r"[\s_\-/]+", text) if t]


def _score_entity(
    entity: dict[str, Any],
    concept_title_tokens: list[str],
    concept_summary_keywords: list[str],
    concept_slug_tokens: list[str],
) -> int:
    """Score an entity by relevance to the concept (pure text heuristics)."""
    score = 0

    # link_type scoring
    link_type = entity.get("link_type", "")
    if link_type == "bidirectional":
        score += 3
    elif link_type == "outlink":
        score += 2
    elif link_type == "inlink":
        score += 1

    # title token overlap
    entity_title_tokens = set(_tokenize(entity.get("title", "")))
    overlap = entity_title_tokens & set(concept_title_tokens)
    score += min(len(overlap), 2)

    # summary keyword hit
    entity_summary = (entity.get("summary") or "").lower()
    keyword_hits = sum(1 for kw in concept_summary_keywords if kw in entity_summary)
    score += min(keyword_hits, 2)

    # slug token hit
    entity_slug_tokens = set(_tokenize(entity.get("slug", "")))
    slug_overlap = entity_slug_tokens & set(concept_slug_tokens)
    score += min(len(slug_overlap), 1)

    # penalty for empty summary
    if not entity.get("summary"):
        score -= 2

    return score


def _classify_entity(
    entity: dict[str, Any], concept_title: str
) -> str:
    """Classify an entity into a pedagogical role."""
    title = (entity.get("title") or "").lower()
    summary = (entity.get("summary") or "").lower()
    text = f"{title} {summary}"

    # Keyword lists must not contain an empty string: "" is a substring of every
    # text and would classify every entity as "Tools".
    tool_keywords = ["工具", "平台", "框架", "库", "sdk", "api", "tool", "platform", "framework", "library"]
    example_keywords = ["案例", "实例", "示例", "应用", "case", "example", "application", "demo"]
    prereq_keywords = ["前提", "基础", "前置", "先决", "prerequisite", "foundation", "basic"]

    if any(kw in text for kw in tool_keywords):
        return "Tools"
    if any(kw in text for kw in example_keywords):
        return "Examples"
    if any(kw in text for kw in prereq_keywords):
        return "Prerequisites"
    return "Application Scenarios"


def _parse_markdown_sections(content: str) -> dict[str, str]:
    """Parse markdown content into sections keyed by heading."""
    sections: dict[str, str] = {}
    current_heading = ""
    current_lines: list[str] = []

    for line in content.split("\n"):
        heading_match = re.match(r"^(#{1,4})\s+(.+)$", line)
        if heading_match:
            # Close the previous section before starting a new one.
            if current_heading:
                sections[current_heading] = "\n".join(current_lines).strip()
            current_heading = heading_match.group(2).strip()
            current_lines = []
        else:
            current_lines.append(line)

    if current_heading:
        sections[current_heading] = "\n".join(current_lines).strip()

    return sections


def _extract_key_points(sections: dict[str, str]) -> list[str]:
    """Extract key points from markdown sections (definitions, mechanisms)."""
    points: list[str] = []
    definition_headings = {"定义", "概念", "概述", "简介", "Definition", "Overview", "Introduction"}
    mechanism_headings = {"机制", "原理", "工作原理", "Mechanism", "How it works", "Principle"}

    for heading, body in sections.items():
        if any(d in heading for d in definition_headings):
            first_para = body.split("\n\n")[0].strip()
            if first_para:
                points.append(first_para[:200])
        elif any(m in heading for m in mechanism_headings):
            # Keep bullet lines only; an empty prefix would match every line.
            bullets = [
                line.strip().lstrip("-*• ")
                for line in body.split("\n")
                if line.strip().startswith(("- ", "* ", "• "))
            ]
            points.extend(bullets[:3])

    return points[:5]


def _extract_examples(sections: dict[str, str]) -> list[str]:
    """Extract examples from markdown sections."""
    example_headings = {"案例", "示例", "实例", "应用场景", "Example", "Use Case", "Application"}
    examples: list[str] = []

    for heading, body in sections.items():
        if any(e in heading for e in example_headings):
            # Keep bullet lines only; an empty prefix would match every line.
            bullets = [
                line.strip().lstrip("-*• ")
                for line in body.split("\n")
                if line.strip().startswith(("- ", "* ", "• "))
            ]
            examples.extend(bullets[:3])

    return examples[:3]


def _extract_misconceptions(sections: dict[str, str]) -> list[str]:
    """Extract common misconceptions from markdown sections."""
    misconception_headings = {"误区", "常见错误", "误解", "Misconception", "Common mistake", "Pitfall"}
    misconceptions: list[str] = []

    for heading, body in sections.items():
        if any(m in heading for m in misconception_headings):
            # Keep bullet lines only; an empty prefix would match every line.
            bullets = [
                line.strip().lstrip("-*• ")
                for line in body.split("\n")
                if line.strip().startswith(("- ", "* ", "• "))
            ]
            misconceptions.extend(bullets[:3])

    return misconceptions[:3]


def build_pedagogical_design(data: dict[str, Any]) -> dict[str, Any]:
    """Stage 1: Concept Graph Material → Pedagogical Design JSON."""
    concept = data["concept"]
    entities = data.get("entities", [])
    language = data.get("language", "zh-CN")
    depth = data.get("depth", "intermediate")
    audience = data.get("audience", "相关领域的学习者")

    concept_title = concept.get("title", "")
    concept_summary = concept.get("summary", "")
    concept_content = concept.get("content", "")
    concept_slug = concept.get("slug", "")

    # Parse markdown sections from content
    sections = _parse_markdown_sections(concept_content) if concept_content else {}

    # Extract pedagogical elements from content
    key_points = _extract_key_points(sections)
    examples_from_content = _extract_examples(sections)
    misconceptions = _extract_misconceptions(sections)

    # Score and rank entities
    concept_title_tokens = _tokenize(concept_title)
    concept_summary_keywords = [w.lower() for w in re.findall(r"\w+", concept_summary) if len(w) > 1]
    concept_slug_tokens = _tokenize(concept_slug)

    scored_entities = []
    for entity in entities:
        score = _score_entity(entity, concept_title_tokens, concept_summary_keywords, concept_slug_tokens)
        scored_entities.append((score, entity))
    scored_entities.sort(key=lambda x: x[0], reverse=True)

    # Select top entities (at most 5; the slice tolerates fewer than 3)
    top_count = min(max(3, len(scored_entities)), 5)
    top_entities = scored_entities[:top_count]

    # Classify entities into pedagogical roles
    classified: dict[str, list[dict[str, Any]]] = {
        "Examples": [],
        "Tools": [],
        "Application Scenarios": [],
        "Prerequisites": [],
    }
    for score, entity in top_entities:
        role = _classify_entity(entity, concept_title)
        classified[role].append({
            "slug": entity.get("slug", ""),
            "title": entity.get("title", ""),
            "summary": entity.get("summary", ""),
            "link_type": entity.get("link_type", ""),
            "relevance_score": score,
        })

    # Build learning objectives from concept summary
    learning_objectives: list[str] = []
    if concept_summary:
        sentences = re.split(r"[。!?.!?]", concept_summary)
        learning_objectives = [f"理解{s.strip()}" for s in sentences if s.strip()][:3]
    if not learning_objectives:
        learning_objectives = [f"掌握 {concept_title} 的核心概念"]

    # Build practice tasks from entity examples
    practice_tasks: list[str] = []
    for ent in classified.get("Examples", []):
        practice_tasks.append(f"通过 {ent['title']} 实践 {concept_title} 的应用")
    for ent in classified.get("Application Scenarios", []):
        practice_tasks.append(f"分析 {ent['title']} 在 {concept_title} 中的作用")
    if not practice_tasks and top_entities:
        _, first_ent = top_entities[0]
        practice_tasks.append(f"结合 {first_ent.get('title', '相关实体')} 理解 {concept_title} 的实际应用")

    # Build prerequisites from entity prerequisites
    prerequisites: list[str] = []
    for ent in classified.get("Prerequisites", []):
        prerequisites.append(ent["title"])

    # Build assessment prompts
    assessment_prompts: list[str] = []
    if concept_summary:
        assessment_prompts.append(f"请解释 {concept_title} 的核心定义")
    if key_points:
        assessment_prompts.append(f"请描述 {concept_title} 的工作机制")
    if classified.get("Examples"):
        assessment_prompts.append(f"请举例说明 {concept_title} 的实际应用")

    # Build warnings from misconceptions
    warnings: list[str] = []
    for m in misconceptions:
        warnings.append(f"常见误区:{m}")

    return {
        "concept_slug": concept_slug,
        "title": concept_title,
        "teaching_anchor": concept_summary or concept_title,
        "learning_objectives": learning_objectives,
        "key_points": key_points,
        "examples": examples_from_content,
        "practice_tasks": practice_tasks,
        "prerequisites": prerequisites,
        "misconception_checks": misconceptions,
        "assessment_prompts": assessment_prompts,
        "warnings": warnings,
        "classified_entities": classified,
    }


def build_requirement(design: dict[str, Any], data: dict[str, Any]) -> str:
    """Stage 2: Pedagogical Design JSON → requirement string."""
    concept = data["concept"]
    depth = data.get("depth", "intermediate")
    audience = data.get("audience", "相关领域的学习者")
    language = data.get("language", "zh-CN")

    depth_map = {"beginner": "入门", "intermediate": "中级", "advanced": "高级"}
    depth_cn = depth_map.get(depth, "中级")

    parts: list[str] = []

    # Header
    parts.append(f"基于知识图谱概念「{design['title']}」,为{audience}创建一个{depth_cn}微课堂(micro-classroom)。")
    parts.append("")

    # Teaching anchor
    parts.append(f"教学锚点:{design['teaching_anchor']}")
    parts.append("")

    # Learning objectives
    if design["learning_objectives"]:
        parts.append("学习目标:")
        for obj in design["learning_objectives"]:
            parts.append(f" - {obj}")
        parts.append("")

    # Key points
    if design["key_points"]:
        parts.append("核心知识点:")
        for kp in design["key_points"]:
            parts.append(f" - {kp}")
        parts.append("")

    # Classified entities as practice context
    classified = design.get("classified_entities", {})
    entity_sections = []
    for role in ("Examples", "Tools", "Application Scenarios", "Prerequisites"):
        ents = classified.get(role, [])
        if ents:
            role_cn = {
                "Examples": "案例",
                "Tools": "工具",
                "Application Scenarios": "应用场景",
                "Prerequisites": "前置知识",
            }[role]
            # Title, followed by a truncated summary in parentheses when present.
            ent_descs = [f"{e['title']}" + (f"({e['summary'][:80]})" if e.get("summary") else "") for e in ents]
            entity_sections.append(f"{role_cn}:{'、'.join(ent_descs)}")
    if entity_sections:
        parts.append("关联实体(实践环节):")
        for section in entity_sections:
            parts.append(f" - {section}")
        parts.append("")

    # Practice tasks
    if design["practice_tasks"]:
        parts.append("实践任务:")
        for task in design["practice_tasks"]:
            parts.append(f" - {task}")
        parts.append("")

    # Misconception checks
    if design["misconception_checks"]:
        parts.append("常见误区检查:")
        for mc in design["misconception_checks"]:
            parts.append(f" - {mc}")
        parts.append("")

    # Assessment
    if design["assessment_prompts"]:
        parts.append("评估提示:")
        for ap in design["assessment_prompts"]:
            parts.append(f" - {ap}")
        parts.append("")

    # Language directive
    if language == "zh-CN":
        parts.append("请使用中文生成课程内容。")

    # Concept content fallback
    concept_content = concept.get("content", "")
    if concept_content and len(concept_content) > 200:
        parts.append("")
        parts.append(f"参考内容(前500字):{concept_content[:500]}")

    return "\n".join(parts)


def process(input_data: dict[str, Any]) -> dict[str, Any]:
    """Main processing: two-stage conversion."""
    concept = input_data.get("concept")
    if not concept:
        return {
            "requirement": "",
            "pedagogical_design": {},
            "metadata": {"error": "Missing 'concept' in input"},
        }

    entities = input_data.get("entities", [])

    # Stage 1: Build pedagogical design
    design = build_pedagogical_design(input_data)

    # Stage 2: Build requirement string
    requirement = build_requirement(design, input_data)

    return {
        "requirement": requirement,
        "pedagogical_design": design,
        "metadata": {
            "concept_slug": concept.get("slug", ""),
            "entity_count": len(entities),
            "depth": input_data.get("depth", "intermediate"),
            "language": input_data.get("language", "zh-CN"),
        },
    }


def main() -> None:
    """Entry point: read from stdin or file, output JSON."""
    import argparse

    parser = argparse.ArgumentParser(description="Concept Graph → OpenMAIC Requirement converter")
    parser.add_argument("--file", "-f", help="path to the input JSON file")
    args = parser.parse_args()

    if args.file:
        with open(args.file, "r", encoding="utf-8") as f:
            input_data = json.load(f)
    else:
        input_text = sys.stdin.read()
        if not input_text.strip():
            print(
                "Error: no input data provided. Usage:\n"
                ' echo \'{"concept": {...}, "entities": [...]}\' | python concept-to-requirement.py\n'
                " python concept-to-requirement.py --file input.json",
                file=sys.stderr,
            )
            sys.exit(1)
        try:
            input_data = json.loads(input_text)
        except json.JSONDecodeError as e:
            print(f"Error: failed to parse input JSON: {e}", file=sys.stderr)
            sys.exit(1)

    result = process(input_data)
    print(json.dumps(result, ensure_ascii=False, indent=2))


if __name__ == "__main__":
    main()