22 Commits

Author SHA1 Message Date
wizardchen
ef1047bf67 feat(parser): add OpenDataLoader, PaddleOCR-VL engines, and parser improvements
Introduce opendataloader and PaddleOCR-VL parser engines with tenant-level
settings UI, replace liteparse, and harden Excel/PPT/Markdown parsing.
Optional odl-hybrid sidecar stays local-build only and is excluded from
default dev-start and full profiles.
2026-06-03 12:29:13 +08:00
jw_mac
faaf6124b2 add output when the offline protoc install package exists 2026-04-24 22:42:17 +08:00
Liwx
da8191cd57 fix(docker): fix image build compatibility issues 2026-04-21 23:29:41 +08:00
wizardchen
6d88619869 feat: enhance Dockerfile and build scripts for customizable APT mirror
- Added support for customizable APT mirror in the Dockerfile for the docreader service, allowing users to specify a mirror via build arguments.
- Updated docker-compose.yml to pass the APT_MIRROR argument during the build process.
- Modified build_images.sh script to include the APT_MIRROR argument when building the docreader image.
- Updated .gitignore to exclude .cursor/ directory.

This update improves flexibility in package management during the image build process.
2026-03-02 21:21:49 +08:00
wizardchen
397689d2f3 feat: introduce WeKnora Lite edition with lightweight configuration and deployment
- Added a new `.env.lite.example` file for the Lite version, providing a minimal configuration template.
- Updated `.env.example` to remove deprecated variables and include new Docreader settings.
- Enhanced Docker configurations to support the Lite version, including a new Dockerfile for the Docreader service.
- Introduced a Makefile target for building and running the Lite version, along with packaging capabilities.
- Created GitHub workflows for building and releasing Lite binaries, including Homebrew formula support.
- Implemented a new service file for managing the Lite version as a system service.

This update enables a streamlined, single-binary deployment of WeKnora, reducing external dependencies and simplifying setup.
2026-03-02 21:21:49 +08:00
begoniezhao
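2d66abedf0 feat: add document model classes, adjust configuration and parsing logic, optimize logging and imports
Remove logging setup and redundant code; optimize imports, type hints, and OCR backend management
Unify module import paths across all files to absolute imports
Adjust import paths, remove some imports, and improve logging and comments
Upgrade the document parser to Docx2Parser; optimize timeout and image handling logic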
2025-11-18 22:37:01 +08:00
begoniezhao
c1f731e026 chore(docreader): reorganize module files 2025-11-05 12:07:39 +08:00
wizardchen
092b30af3e fix(docreader): Download binary by target arch in docker 2025-09-12 20:21:30 +08:00
wizardchen
74c121f7fb feat: Adjust App & Docreader log output 2025-09-11 23:14:23 +08:00
wizardchen
bff0e742fa fix: try to fix OCR failure when AVX is not supported 2025-09-11 13:21:21 +08:00
wizardchen
6f6ca84dae feat(docreader): add health check 2025-09-10 20:22:14 +08:00
wizardchen
7cfae7e0d3 fix: pre fetch ocr models in docker container 2025-09-10 17:24:26 +08:00
Liwx1014
3aad892a62 fix: docreader build timeout; update OCR config; support PDF table parsing 2025-09-08 14:58:37 +08:00
begoniezhao
a5cdda3c4c build: Removed platform parameters, added multi-platform builds and simplified image tags 2025-08-22 12:27:26 +08:00
wizardchen
785261313f feat: make CONCURRENCY_POOL_SIZE configurable 2025-08-16 13:27:01 +08:00
begoniezhao
a46d15e579 build: Upgrade Python image and LibreOffice, update dependency path and add Playwright dependency 2025-08-15 15:34:21 +08:00
wizardchen
57c6e04e54 fix: adjust the mirror source used for building images 2025-08-14 12:16:08 +08:00
wizardchen
8b43931886 feat: support minio storage 2025-08-14 12:16:08 +08:00
LEOLI
621d9aad37 refactor: rework the start_all.sh startup script and Docker build files to support the new Docker Compose version and CentOS Stream 9 2025-08-13 15:51:31 +08:00
begoniezhao
a347abf829 chore: adjust script background-execution syntax, change how scripts are run, and set line endings 2025-08-11 17:10:29 +08:00
wizardchen
bdabed6bfa feat: Added web page for configuring model information 2025-08-10 17:11:07 +08:00
wizardchen
56eb2bce33 init commit 2025-08-05 15:08:07 +08:00