diff --git a/skills/preloaded/openmaic-classroom/SKILL.md b/skills/preloaded/openmaic-classroom/SKILL.md
index 108a0e4f..b42b0e14 100644
--- a/skills/preloaded/openmaic-classroom/SKILL.md
+++ b/skills/preloaded/openmaic-classroom/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: openmaic-classroom
-description: 将 RAG 检索结果或文档块转换为 OpenMAIC 互动课程。当用户要求将知识库内容、检索到的文档片段、或上传的文档转换为教学课件/互动课堂时使用此技能。支持纯需求生成和基于 PDF 内容的课程生成。
+description: 将 RAG 检索结果、文档块或知识图谱概念转换为 OpenMAIC 互动课程。当用户要求将知识库内容、检索到的文档片段、上传的文档、或知识图谱中的概念批量转换为教学课件/互动课堂时使用此技能。支持纯需求生成、基于 PDF 内容的课程生成、和基于概念图遍历的批量课堂生成。
 ---
 
 # OpenMAIC Classroom Generator
@@ -12,6 +12,7 @@ description: 将 RAG 检索结果或文档块转换为 OpenMAIC 互动课程。
 1. **RAG → 课程**: 将知识检索结果提炼为教学需求(requirement),通过 OpenMAIC API 生成互动课程
 2. **PDF → 课程**: 解析用户上传的 PDF,结合内容生成课程
 3. **文档块 → 课程集**: 将多个文档块/知识片段组织为多阶段课程集
+4. **概念图遍历 → 批量微课堂**: 遍历知识图谱中所有 concept 页面,每个 concept 生成一个 micro-classroom
 
 ## 能力边界
 
@@ -62,12 +63,13 @@ OpenMAIC 有两种使用模式,**根据用户场景选择**:
 - "基于检索结果生成课程"
 - "为这个知识点创建互动课堂"
 - "将知识库内容转换为教学材料"
+- "批量生成课程" / "把知识图谱的概念都做成课堂" / "基于概念图生成微课堂"
 
 ## 工作流程
 
 ### Phase 1: 确认输入源
 
-确认课程生成的输入来源(三选一):
+确认课程生成的输入来源(四选一):
 
 1. **纯需求生成**: 用户直接描述教学主题,无需额外文档
    → 直接使用用户描述作为 `requirement`,**无需调用脚本**
@@ -75,6 +77,8 @@ OpenMAIC 有两种使用模式,**根据用户场景选择**:
    → 使用 `scripts/rag-to-requirement.py` 脚本转换检索结果为结构化 requirement(见 Phase 1.1)
 3. **PDF 文件**: 用户提供 PDF 文件路径,先解析再调用生成 API
    → 提取 PDF 文本后构建 requirement,**无需调用脚本**
+4. **概念图遍历批量生成**: 遍历知识图谱中所有 concept 页面,每个 concept 生成一个 micro-classroom
+   → 使用 `scripts/concept-to-requirement.py` 脚本转换 concept + 关联 entity 为结构化 requirement(见 Phase 1.2)
 
 ### Phase 1.1: RAG 结果 → Requirement 转换(仅适用于场景 2)
 
@@ -101,6 +105,82 @@ execute_skill_script(
 - **不要**在没有任何参数的情况下调用此脚本,否则会报错退出
 - 如果脚本执行失败,可直接根据检索结果手动构建 requirement
 
+### Phase 1.2: Concept Graph → Requirement 转换(仅适用于场景 4)
+
+当场景 4 需要基于知识图谱概念批量生成课堂时,执行以下步骤:
+
+**步骤 1:列出所有 concept 页面**
+
+使用 `wiki_search` 工具搜索所有 concept 类型的页面:
+
+```
+wiki_search("^concept/", limit=50)
+```
+
+如果 concept 数量超过 50 个,多次调用翻页直到获取全部。
+
+**步骤 2:对每个 concept 获取详情和关联 entity**
+
+对每个 concept 页面:
+
+1. 调用 `wiki_read_page([concept_slug])` 获取页面详情(含 OutLinks 和 InLinks)
+2. 从 OutLinks 和 InLinks 中筛选出 `entity/*` 开头的 slug
+3. 确定每个 entity 的 link_type:
+   - 同时出现在 OutLinks 和 InLinks 中 → `bidirectional`
+   - 仅出现在 OutLinks 中 → `outlink`
+   - 仅出现在 InLinks 中 → `inlink`
+4. 调用 `wiki_read_page([entity_slugs])` 批量读取关联 entity(只取 title + summary,不取完整 content)
+
+**步骤 3:转换为 requirement**
+
+对每个 concept,调用 `scripts/concept-to-requirement.py` 将 concept + 关联 entity 转换为 requirement:
+
+```
+execute_skill_script(
+  skill_name: "openmaic-classroom",
+  script_path: "scripts/concept-to-requirement.py",
+  input: '{"concept": {"slug": "...", "title": "...", "summary": "...", "content": "..."}, "entities": [{"slug": "...", "title": "...", "summary": "...", "link_type": "..."}], "language": "zh-CN", "depth": "intermediate"}'
+)
+```
+
+**input 参数格式(JSON 字符串,必须通过 `input` 参数传入):**
+- `concept`(必填): concept 页面对象,包含 `slug`、`title`、`summary`、`content`
+- `entities`(可选): 关联 entity 数组,每项包含 `slug`、`title`、`summary`、`link_type`
+- `language`(可选): `zh-CN|en-US`,默认 `zh-CN`
+- `depth`(可选): `beginner|intermediate|advanced`,默认 `intermediate`
+- `audience`(可选): 目标受众描述,默认"相关领域的学习者"
+
+**步骤 4:顺序调用 OpenMAIC API**
+
+对每个 concept 的 requirement,**顺序**调用 OpenMAIC 生成 API(concurrency=1):
+
+- 每个 concept → 一个 micro-classroom
+- requirement 中标注 `micro-classroom`
+- 不可并行,避免配额冲突
+
+**步骤 5:生成 manifest(可恢复性)**
+
+生成 manifest JSON,记录每个 concept 的生成状态:
+
+```json
+{
+  "kb_id": "...",
+  "total_concepts": 10,
+  "generated": ["concept/rag", "concept/llm"],
+  "failed": [{"slug": "concept/embedding", "error": "..."}],
+  "pending": ["concept/vector-db"]
+}
+```
+
+失败时从断点继续:跳过 `generated` 中的 concept,从 `pending` 的第一个开始。
+
+**关键约束**:
+- wiki 读取和脚本转换允许 batching
+- OpenMAIC API 生成 concurrency=1
+- concept.summary 作为 requirement 核心锚定
+- entity 只取 title + summary,不取完整 content
+- 无关联 entity 的 concept 仍可生成课堂(缺少实践环节)
+
 ### Phase 2: 构建 Generation Request
 
 根据输入源构建请求体,**字段说明**:
@@ -120,6 +200,7 @@ execute_skill_script(
 - **场景 1(纯需求)**: `requirement` 直接使用用户描述
 - **场景 2(RAG 结果)**: `requirement` 使用 Phase 1.1 脚本输出中的 `requirement` 字段
 - **场景 3(PDF)**: `requirement` 根据 PDF 提取的文本构建,`pdfContent` 填入解析结果
+- **场景 4(概念图遍历)**: `requirement` 使用 Phase 1.2 脚本输出中的 `requirement` 字段,每个 concept 单独调用 API
 
 ### Phase 3: 调用 OpenMAIC API
 
@@ -261,6 +342,23 @@ Classroom URL:
 4. 如果 MCP 工具不可用,告知用户先部署 mcp_api_requester(见 MCP 可用性检查)
 5. 汇总返回所有 Classroom URL
 
+## 概念图遍历 → 批量微课堂
+
+当用户需要基于知识图谱概念批量生成课堂时(场景 4),遵循 Phase 1.2 的完整流程。
+
+**MVP 课程编排策略**:one concept → one micro-classroom
+
+**课程类型标注**:requirement 中标注 `micro-classroom`
+
+**批处理可恢复性**:生成 manifest JSON,记录每个 concept 的生成状态,失败时可从断点继续。
+
+**关键约束**:
+- wiki 读取和脚本转换允许 batching
+- OpenMAIC API 生成 concurrency=1
+- concept.summary 作为 requirement 核心锚定
+- entity 只取 title + summary,不取完整 content
+- 无关联 entity 的 concept 仍可生成课堂(缺少实践环节)
+
 ## 注意事项
 
 - 脚本在 Docker 沙箱中执行,**沙箱默认禁用网络访问**
diff --git a/skills/preloaded/openmaic-classroom/references/requirement-builder.md b/skills/preloaded/openmaic-classroom/references/requirement-builder.md
index c023e6d5..852befff 100644
--- a/skills/preloaded/openmaic-classroom/references/requirement-builder.md
+++ b/skills/preloaded/openmaic-classroom/references/requirement-builder.md
@@ -70,6 +70,66 @@ OpenMAIC 的 `requirement` 字段需要是**结构化的教学需求描述**,
 - 重点覆盖:[关键主题列表]
 ```
 
+### 模板 5: 基于概念图遍历(Concept Graph)
+
+当从知识图谱 concept 页面及其关联 entity 生成微课堂时使用此模板。由 `scripts/concept-to-requirement.py` 自动生成。
+
+**输入结构**:
+```json
+{
+  "concept": { "slug": "concept/rag", "title": "RAG 检索增强生成", "summary": "...", "content": "..." },
+  "entities": [
+    { "slug": "entity/vector-db", "title": "向量数据库", "summary": "...", "link_type": "outlink" },
+    { "slug": "entity/embedding", "title": "Embedding 模型", "summary": "...", "link_type": "bidirectional" }
+  ],
+  "language": "zh-CN",
+  "depth": "intermediate",
+  "audience": "相关领域的学习者"
+}
+```
+
+**输出 requirement 结构**:
+```
+基于知识图谱概念「[concept.title]」,为[audience]创建一个[depth]微课堂(micro-classroom)。
+
+教学锚点:[concept.summary]
+
+学习目标:
+ - 理解[concept.summary 中的关键句]
+
+核心知识点:
+ - [从 concept.content 解析的定义/机制]
+
+关联实体(实践环节):
+ - 案例:[entity.title]:[entity.summary]
+ - 工具:[entity.title]:[entity.summary]
+ - 应用场景:[entity.title]:[entity.summary]
+ - 前置知识:[entity.title]
+
+实践任务:
+ - 通过 [entity.title] 实践 [concept.title] 的应用
+
+常见误区检查:
+ - [从 concept.content 解析的误区]
+
+评估提示:
+ - 请解释 [concept.title] 的核心定义
+
+请使用中文生成课程内容。
+```
+
+**entity 分类排序规则**(纯文本操作,无 LLM/embedding):
+- link_type 权重:bidirectional (+3) > outlink (+2) > inlink (+1)
+- title token overlap:concept title 分词后与 entity title 的交集数 (+1 per hit, cap +2)
+- summary keyword hit:concept summary 关键词在 entity summary 中出现 (+1 per hit, cap +2)
+- slug token hit:concept slug token 在 entity slug 中出现 (+1)
+- summary 为空扣分 (-2)
+- 取 top 3-5 entities,分为 Examples / Tools / Application Scenarios / Prerequisites
+
+**概念内容解析逻辑**:
+- 优先解析 markdown 结构:标题列表(定义段、机制段、案例段、误区段)
+- fallback 到前 N 字
+
 ## 示例
 
 ### 示例 1: 技术文档 → 课程
diff --git a/skills/preloaded/openmaic-classroom/scripts/concept-to-requirement.py b/skills/preloaded/openmaic-classroom/scripts/concept-to-requirement.py
new file mode 100644
index 00000000..a71a81dd
--- /dev/null
+++ b/skills/preloaded/openmaic-classroom/scripts/concept-to-requirement.py
@@ -0,0 +1,412 @@
+#!/usr/bin/env python3
+"""
+Concept Graph Material → OpenMAIC Requirement 转换器
+
+将 WeKnora wiki 知识图谱中的 concept 页面及其关联 entity 转换为
+结构化的 OpenMAIC 课程生成需求描述。两阶段转换:
+  1. Concept Graph Material → Pedagogical Design JSON
+  2. Pedagogical Design JSON → requirement string
+
+此脚本仅做数据转换,不涉及网络调用。
+
+用法:
+    echo '{"concept": {...}, "entities": [...]}' | python scripts/concept-to-requirement.py
+    python scripts/concept-to-requirement.py --file input.json
+"""
+
+import json
+import re
+import sys
+from typing import Any
+
+
+def _tokenize(text: str) -> list[str]:
+    """Simple whitespace + punctuation tokenizer for title/slug overlap."""
+    return [t.lower() for t in re.split(r"[\s_\-/]+", text) if t]
+
+
+def _score_entity(
+    entity: dict[str, Any],
+    concept_title_tokens: list[str],
+    concept_summary_keywords: list[str],
+    concept_slug_tokens: list[str],
+) -> int:
+    """Score an entity by relevance to the concept (pure text heuristics)."""
+    score = 0
+
+    # link_type scoring
+    link_type = entity.get("link_type", "")
+    if link_type == "bidirectional":
+        score += 3
+    elif link_type == "outlink":
+        score += 2
+    elif link_type == "inlink":
+        score += 1
+
+    # title token overlap
+    entity_title_tokens = set(_tokenize(entity.get("title", "")))
+    overlap = entity_title_tokens & set(concept_title_tokens)
+    score += min(len(overlap), 2)
+
+    # summary keyword hit
+    entity_summary = (entity.get("summary") or "").lower()
+    keyword_hits = sum(1 for kw in concept_summary_keywords if kw in entity_summary)
+    score += min(keyword_hits, 2)
+
+    # slug token hit
+    entity_slug_tokens = set(_tokenize(entity.get("slug", "")))
+    slug_overlap = entity_slug_tokens & set(concept_slug_tokens)
+    score += min(len(slug_overlap), 1)
+
+    # penalty for empty summary
+    if not entity.get("summary"):
+        score -= 2
+
+    return score
+
+
+def _classify_entity(
+    entity: dict[str, Any], concept_title: str
+) -> str:
+    """Classify an entity into a pedagogical role."""
+    title = (entity.get("title") or "").lower()
+    summary = (entity.get("summary") or "").lower()
+    text = f"{title} {summary}"
+
+    tool_keywords = ["工具", "平台", "框架", "库", "sdk", "api", "tool", "platform", "framework", "library"]
+    example_keywords = ["案例", "实例", "示例", "应用", "case", "example", "application", "demo"]
+    prereq_keywords = ["前提", "基础", "前置", "先决", "prerequisite", "foundation", "basic"]
+
+    if any(kw in text for kw in tool_keywords):
+        return "Tools"
+    if any(kw in text for kw in example_keywords):
+        return "Examples"
+    if any(kw in text for kw in prereq_keywords):
+        return "Prerequisites"
+    return "Application Scenarios"
+
+
+def _parse_markdown_sections(content: str) -> dict[str, str]:
+    """Parse markdown content into sections keyed by heading."""
+    sections: dict[str, str] = {}
+    current_heading = ""
+    current_lines: list[str] = []
+
+    for line in content.split("\n"):
+        heading_match = re.match(r"^(#{1,4})\s+(.+)$", line)
+        if heading_match:
+            if current_heading:
+                sections[current_heading] = "\n".join(current_lines).strip()
+            current_heading = heading_match.group(2).strip()
+            current_lines = []
+        else:
+            current_lines.append(line)
+
+    if current_heading:
+        sections[current_heading] = "\n".join(current_lines).strip()
+
+    return sections
+
+
+def _extract_key_points(sections: dict[str, str]) -> list[str]:
+    """Extract key points from markdown sections (definitions, mechanisms)."""
+    points: list[str] = []
+    definition_headings = {"定义", "概念", "概述", "简介", "Definition", "Overview", "Introduction"}
+    mechanism_headings = {"机制", "原理", "工作原理", "Mechanism", "How it works", "Principle"}
+
+    for heading, body in sections.items():
+        if any(d in heading for d in definition_headings):
+            first_para = body.split("\n\n")[0].strip()
+            if first_para:
+                points.append(first_para[:200])
+        elif any(m in heading for m in mechanism_headings):
+            bullets = [l.strip().lstrip("-*• ") for l in body.split("\n") if l.strip().startswith(("- ", "* ", "• "))]
+            points.extend(bullets[:3])
+
+    return points[:5]
+
+
+def _extract_examples(sections: dict[str, str]) -> list[str]:
+    """Extract examples from markdown sections."""
+    example_headings = {"案例", "示例", "实例", "应用场景", "Example", "Use Case", "Application"}
+    examples: list[str] = []
+
+    for heading, body in sections.items():
+        if any(e in heading for e in example_headings):
+            bullets = [l.strip().lstrip("-*• ") for l in body.split("\n") if l.strip().startswith(("- ", "* ", "• "))]
+            examples.extend(bullets[:3])
+
+    return examples[:3]
+
+
+def _extract_misconceptions(sections: dict[str, str]) -> list[str]:
+    """Extract common misconceptions from markdown sections."""
+    misconception_headings = {"误区", "常见错误", "误解", "Misconception", "Common mistake", "Pitfall"}
+    misconceptions: list[str] = []
+
+    for heading, body in sections.items():
+        if any(m in heading for m in misconception_headings):
+            bullets = [l.strip().lstrip("-*• ") for l in body.split("\n") if l.strip().startswith(("- ", "* ", "• "))]
+            misconceptions.extend(bullets[:3])
+
+    return misconceptions[:3]
+
+
+def build_pedagogical_design(data: dict[str, Any]) -> dict[str, Any]:
+    """Stage 1: Concept Graph Material → Pedagogical Design JSON."""
+    concept = data["concept"]
+    entities = data.get("entities", [])
+    language = data.get("language", "zh-CN")
+    depth = data.get("depth", "intermediate")
+    audience = data.get("audience", "相关领域的学习者")
+
+    concept_title = concept.get("title", "")
+    concept_summary = concept.get("summary", "")
+    concept_content = concept.get("content", "")
+    concept_slug = concept.get("slug", "")
+
+    # Parse markdown sections from content
+    sections = _parse_markdown_sections(concept_content) if concept_content else {}
+
+    # Extract pedagogical elements from content
+    key_points = _extract_key_points(sections)
+    examples_from_content = _extract_examples(sections)
+    misconceptions = _extract_misconceptions(sections)
+
+    # Score and rank entities
+    concept_title_tokens = _tokenize(concept_title)
+    concept_summary_keywords = [w.lower() for w in re.findall(r"\w+", concept_summary) if len(w) > 1]
+    concept_slug_tokens = _tokenize(concept_slug)
+
+    scored_entities = []
+    for entity in entities:
+        score = _score_entity(entity, concept_title_tokens, concept_summary_keywords, concept_slug_tokens)
+        scored_entities.append((score, entity))
+    scored_entities.sort(key=lambda x: x[0], reverse=True)
+
+    # Select top entities (3-5)
+    top_count = min(max(3, len(scored_entities)), 5)
+    top_entities = scored_entities[:top_count]
+
+    # Classify entities into pedagogical roles
+    classified: dict[str, list[dict[str, Any]]] = {
+        "Examples": [],
+        "Tools": [],
+        "Application Scenarios": [],
+        "Prerequisites": [],
+    }
+    for score, entity in top_entities:
+        role = _classify_entity(entity, concept_title)
+        classified[role].append({
+            "slug": entity.get("slug", ""),
+            "title": entity.get("title", ""),
+            "summary": entity.get("summary", ""),
+            "link_type": entity.get("link_type", ""),
+            "relevance_score": score,
+        })
+
+    # Build learning objectives from concept summary
+    learning_objectives: list[str] = []
+    if concept_summary:
+        sentences = re.split(r"[。!?.!?]", concept_summary)
+        learning_objectives = [f"理解{s.strip()}" for s in sentences if s.strip()][:3]
+    if not learning_objectives:
+        learning_objectives = [f"掌握 {concept_title} 的核心概念"]
+
+    # Build practice tasks from entity examples
+    practice_tasks: list[str] = []
+    for ent in classified.get("Examples", []):
+        practice_tasks.append(f"通过 {ent['title']} 实践 {concept_title} 的应用")
+    for ent in classified.get("Application Scenarios", []):
+        practice_tasks.append(f"分析 {ent['title']} 在 {concept_title} 中的作用")
+    if not practice_tasks and top_entities:
+        _, first_ent = top_entities[0]
+        practice_tasks.append(f"结合 {first_ent.get('title', '相关实体')} 理解 {concept_title} 的实际应用")
+
+    # Build prerequisites from entity prerequisites
+    prerequisites: list[str] = []
+    for ent in classified.get("Prerequisites", []):
+        prerequisites.append(ent["title"])
+
+    # Build assessment prompts
+    assessment_prompts: list[str] = []
+    if concept_summary:
+        assessment_prompts.append(f"请解释 {concept_title} 的核心定义")
+    if key_points:
+        assessment_prompts.append(f"请描述 {concept_title} 的工作机制")
+    if classified.get("Examples"):
+        assessment_prompts.append(f"请举例说明 {concept_title} 的实际应用")
+
+    # Build warnings from misconceptions
+    warnings: list[str] = []
+    for m in misconceptions:
+        warnings.append(f"常见误区:{m}")
+
+    return {
+        "concept_slug": concept_slug,
+        "title": concept_title,
+        "teaching_anchor": concept_summary or concept_title,
+        "learning_objectives": learning_objectives,
+        "key_points": key_points,
+        "examples": examples_from_content,
+        "practice_tasks": practice_tasks,
+        "prerequisites": prerequisites,
+        "misconception_checks": misconceptions,
+        "assessment_prompts": assessment_prompts,
+        "warnings": warnings,
+        "classified_entities": classified,
+    }
+
+
+def build_requirement(design: dict[str, Any], data: dict[str, Any]) -> str:
+    """Stage 2: Pedagogical Design JSON → requirement string."""
+    concept = data["concept"]
+    depth = data.get("depth", "intermediate")
+    audience = data.get("audience", "相关领域的学习者")
+    language = data.get("language", "zh-CN")
+
+    depth_map = {"beginner": "入门", "intermediate": "中级", "advanced": "高级"}
+    depth_cn = depth_map.get(depth, "中级")
+
+    parts: list[str] = []
+
+    # Header
+    parts.append(f"基于知识图谱概念「{design['title']}」,为{audience}创建一个{depth_cn}微课堂(micro-classroom)。")
+    parts.append("")
+
+    # Teaching anchor
+    parts.append(f"教学锚点:{design['teaching_anchor']}")
+    parts.append("")
+
+    # Learning objectives
+    if design["learning_objectives"]:
+        parts.append("学习目标:")
+        for obj in design["learning_objectives"]:
+            parts.append(f" - {obj}")
+        parts.append("")
+
+    # Key points
+    if design["key_points"]:
+        parts.append("核心知识点:")
+        for kp in design["key_points"]:
+            parts.append(f" - {kp}")
+        parts.append("")
+
+    # Classified entities as practice context
+    classified = design.get("classified_entities", {})
+    entity_sections = []
+    for role in ("Examples", "Tools", "Application Scenarios", "Prerequisites"):
+        ents = classified.get(role, [])
+        if ents:
+            role_cn = {
+                "Examples": "案例",
+                "Tools": "工具",
+                "Application Scenarios": "应用场景",
+                "Prerequisites": "前置知识",
+            }[role]
+            ent_descs = [f"{e['title']}" + (f":{e['summary'][:80]}" if e.get("summary") else "") for e in ents]
+            entity_sections.append(f"{role_cn}:{';'.join(ent_descs)}")
+
+    if entity_sections:
+        parts.append("关联实体(实践环节):")
+        for section in entity_sections:
+            parts.append(f" - {section}")
+        parts.append("")
+
+    # Practice tasks
+    if design["practice_tasks"]:
+        parts.append("实践任务:")
+        for task in design["practice_tasks"]:
+            parts.append(f" - {task}")
+        parts.append("")
+
+    # Misconception checks
+    if design["misconception_checks"]:
+        parts.append("常见误区检查:")
+        for mc in design["misconception_checks"]:
+            parts.append(f" - {mc}")
+        parts.append("")
+
+    # Assessment
+    if design["assessment_prompts"]:
+        parts.append("评估提示:")
+        for ap in design["assessment_prompts"]:
+            parts.append(f" - {ap}")
+        parts.append("")
+
+    # Language directive
+    if language == "zh-CN":
+        parts.append("请使用中文生成课程内容。")
+
+    # Concept content fallback
+    concept_content = concept.get("content", "")
+    if concept_content and len(concept_content) > 200:
+        parts.append("")
+        parts.append(f"参考内容(前500字):{concept_content[:500]}")
+
+    return "\n".join(parts)
+
+
+def process(input_data: dict[str, Any]) -> dict[str, Any]:
+    """Main processing: two-stage conversion."""
+    concept = input_data.get("concept")
+    if not concept:
+        return {
+            "requirement": "",
+            "pedagogical_design": {},
+            "metadata": {"error": "Missing 'concept' in input"},
+        }
+
+    entities = input_data.get("entities", [])
+
+    # Stage 1: Build pedagogical design
+    design = build_pedagogical_design(input_data)
+
+    # Stage 2: Build requirement string
+    requirement = build_requirement(design, input_data)
+
+    return {
+        "requirement": requirement,
+        "pedagogical_design": design,
+        "metadata": {
+            "concept_slug": concept.get("slug", ""),
+            "entity_count": len(entities),
+            "depth": input_data.get("depth", "intermediate"),
+            "language": input_data.get("language", "zh-CN"),
+        },
+    }
+
+
+def main() -> None:
+    """Entry point: read from stdin or file, output JSON."""
+    import argparse
+
+    parser = argparse.ArgumentParser(description="Concept Graph → OpenMAIC Requirement 转换器")
+    parser.add_argument("--file", "-f", help="输入 JSON 文件路径")
+    args = parser.parse_args()
+
+    if args.file:
+        with open(args.file, "r", encoding="utf-8") as f:
+            input_data = json.load(f)
+    else:
+        input_text = sys.stdin.read()
+        if not input_text.strip():
+            print(
+                "错误: 未提供输入数据。用法:\n"
+                ' echo \'{"concept": {...}, "entities": [...]}\' | python concept-to-requirement.py\n'
+                " python concept-to-requirement.py --file input.json",
+                file=sys.stderr,
+            )
+            sys.exit(1)
+        try:
+            input_data = json.loads(input_text)
+        except json.JSONDecodeError as e:
+            print(f"错误: 输入 JSON 解析失败: {e}", file=sys.stderr)
+            sys.exit(1)
+
+    result = process(input_data)
+    print(json.dumps(result, ensure_ascii=False, indent=2))
+
+
+if __name__ == "__main__":
+    main()
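
Note on Phase 1.2, step 2: the `link_type` rule (both lists → `bidirectional`, OutLinks only → `outlink`, InLinks only → `inlink`) can be sketched in a few lines. This is an illustrative sketch, not part of the patch: the function name `classify_link_types` is hypothetical, and it assumes OutLinks and InLinks arrive as plain lists of slug strings.

```python
def classify_link_types(out_links, in_links, prefix="entity/"):
    """Map each related entity slug to its link_type.

    Hypothetical helper: assumes both arguments are lists of slug strings.
    """
    # Keep only entity slugs; concept or other page types are ignored.
    out_set = {slug for slug in out_links if slug.startswith(prefix)}
    in_set = {slug for slug in in_links if slug.startswith(prefix)}

    link_types = {}
    for slug in sorted(out_set | in_set):
        if slug in out_set and slug in in_set:
            link_types[slug] = "bidirectional"
        elif slug in out_set:
            link_types[slug] = "outlink"
        else:
            link_types[slug] = "inlink"
    return link_types
```

The resulting mapping can be merged into each entity object as its `link_type` field before the object is passed to `scripts/concept-to-requirement.py`.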
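
Note on Phase 1.2, step 5: resuming from the manifest could look roughly like the sketch below. It is not part of the patch. `run_batch` and the `generate` callable (a stand-in for the sequential OpenMAIC generation call) are hypothetical names, and recording a failed slug without retrying it in the same run is an assumption, since the skill text only says to skip `generated` and continue from `pending`.

```python
from typing import Any, Callable


def run_batch(manifest: dict[str, Any], generate: Callable[[str], None]) -> dict[str, Any]:
    """Process pending concepts one at a time, updating the manifest in place."""
    generated = manifest.setdefault("generated", [])
    failed = manifest.setdefault("failed", [])

    # Skip anything already generated; keep the original pending order.
    queue = [slug for slug in manifest.get("pending", []) if slug not in generated]
    manifest["pending"] = list(queue)

    for slug in queue:  # strictly sequential, matching concurrency=1
        try:
            generate(slug)  # hypothetical stand-in for the generation call
        except Exception as exc:
            # Assumption: record the failure and move on without retrying.
            failed.append({"slug": slug, "error": str(exc)})
        else:
            generated.append(slug)
        manifest["pending"].remove(slug)
    return manifest
```

Persisting the manifest after each iteration, rather than only at the end, would let an interrupted run resume from the first remaining `pending` entry.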