Introduce opendataloader and PaddleOCR-VL parser engines with tenant-level
settings UI, replace liteparse, and harden Excel/PPT/Markdown parsing.
Optional odl-hybrid sidecar stays local-build only and is excluded from
default dev-start and full profiles.
Mainland China cloud VMs (Tencent Lighthouse, Aliyun, etc.) frequently
cannot reach get.docker.com, github.com, or even community GitHub
mirrors like gh-proxy.com. The cloud-image bootstrap previously had no
escape hatch for this and failed at the very first curl.
This adds a new DOCKER_INSTALL_MIRROR env var to prepare.sh. When set,
it skips get.docker.com and installs docker-ce + compose-plugin from an
apt mirror of Docker's official repo (e.g. mirrors.tencent.com,
mirrors.aliyun.com).
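A minimal sketch of how the mirror switch could build the apt source
line. It assumes DOCKER_INSTALL_MIRROR holds the base URL of a docker-ce
mirror (for example https://mirrors.aliyun.com/docker-ce) and that the
mirror keeps the same linux/<distro> layout as Docker's official repo.
The function names and the keyring path are illustrative, not taken
from prepare.sh.

```shell
# True when a mirror was requested; the caller then skips get.docker.com.
use_mirror() {
  [ -n "${DOCKER_INSTALL_MIRROR:-}" ]
}

# Print the apt source line for the mirrored Docker repo.
# Usage: docker_apt_source_line <distro> <codename> <arch>
# Assumption: the signing key was already saved to the keyring path below.
docker_apt_source_line() {
  _mirror="${DOCKER_INSTALL_MIRROR:-}"
  _mirror="${_mirror%/}"   # drop one trailing slash so the URL joins cleanly
  printf 'deb [arch=%s signed-by=/etc/apt/keyrings/docker.gpg] %s/linux/%s %s stable\n' \
    "$3" "$_mirror" "$1" "$2"
}
```

In this sketch the caller would write the line to
/etc/apt/sources.list.d/docker.list, run `apt-get update`, and install
docker-ce and docker-compose-plugin from the mirror.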
README.md also gets:
- A GH_PROXY env var threaded through bootstrap methods A and B so the
initial script pull can route through gh-proxy / ghfast.
- An explicit recommendation to prefer method C (scp from local) on
mainland China VMs.
- A consolidated "三件套" ("three-piece set") table mapping
WEKNORA_GH_PROXY / DOCKER_INSTALL_MIRROR / DOCKER_REGISTRY_MIRROR to
per-cloud endpoints, so users have one place to copy the full env from.
- Updated cleanup.sh to avoid recreating .env during cleanup, preventing exposure of default passwords before firstboot.
- Modified firstboot.sh to create .env from .env.example only if it doesn't exist, ensuring no sensitive data is present before initialization.
- Added support for Docker Hub and GitHub tarball download acceleration via new environment variables WEKNORA_GH_PROXY and DOCKER_REGISTRY_MIRROR.
- Implemented a mechanism to prune old WeKnora images based on the current version, reducing image size and maintaining a clean environment.
- Enhanced README.md with instructions for using the new acceleration features and image pruning options.
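The GitHub download acceleration can be sketched as a small URL helper.
This assumes WEKNORA_GH_PROXY is the base URL of a gh-proxy-style
service that accepts the full upstream URL appended to its own; the
helper name with_gh_proxy is hypothetical.

```shell
# Print the URL to fetch: unchanged when no proxy is set, otherwise
# "<proxy>/<original-url>" (the convention gh-proxy-style services use).
with_gh_proxy() {
  _proxy="${WEKNORA_GH_PROXY:-}"
  if [ -n "$_proxy" ]; then
    # Strip one trailing slash so we never emit a double slash.
    printf '%s/%s\n' "${_proxy%/}" "$1"
  else
    printf '%s\n' "$1"
  fi
}
```

A download step would then call `curl -fsSL "$(with_gh_proxy "$url")"`,
which keeps the unproxied path as the default for users outside
mainland China.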
Address review feedback on PR #1249:
- prepare.sh: when WEKNORA_REF looks like a version tag (v*), write the
matching WEKNORA_VERSION into .env so docker compose pulls images that
match the compose YAML's git ref (previously stuck on :latest).
- prepare.sh: detect docker binary path via `command -v docker` and
template it into weknora.service (replacing hardcoded /usr/bin/docker
that fails when docker lives in /usr/local/bin).
- firstboot.sh: write a /opt/WeKnora/.firstboot.done marker immediately
after rewriting .env, before `docker compose up -d`. If compose fails
mid-run, the next boot is gated by ConditionPathExists=!marker so we
never regenerate DB_PASSWORD against an already-initialized postgres
volume (which previously bricked the database).
- firstboot.sh: stop deleting its own unit file / script while the
oneshot is still executing; rely on the marker + `systemctl disable`
instead, avoiding "job failed" markings from systemd.
- firstboot.sh: use detected docker path instead of /usr/bin/docker;
add note in credentials file that .env is the source of truth.
- weknora-firstboot.service: add
ConditionPathExists=!/opt/WeKnora/.firstboot.done.
- cleanup.sh: scope docker volume deletion to compose project label
(com.docker.compose.project=<name>) instead of fuzzy substring match
that could nuke unrelated postgres/redis volumes.
- cleanup.sh: also remove .firstboot.done marker, firstboot log, and
any leftover /root/weknora-credentials.txt so the image is clean.
- README.md: clarify how to actually disable registration (edit the
`replace` call list in firstboot.sh rather than running that command
in a shell).
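Two of the fixes above, sketched under assumptions: deriving the image
version from a tag-like ref, and writing the marker before the stack
starts. MARKER, rewrite_env and start_stack are hypothetical names, and
the sketch keeps the tag unchanged as the version, which may differ from
what prepare.sh actually writes. The real gate is the systemd
ConditionPathExists check; the in-script check here only mirrors it.

```shell
# Print the image version for a git ref; fail if the ref is not a v* tag,
# in which case the caller leaves WEKNORA_VERSION untouched.
version_from_ref() {
  case "$1" in
    v*) printf '%s\n' "$1" ;;
    *)  return 1 ;;
  esac
}

# Run the one-time steps at most once. The marker is written right after
# the secrets are generated and before the stack starts, so a failed
# start never leads to a second set of secrets.
firstboot_once() {
  [ -e "$MARKER" ] && return 0   # already initialized: keep existing secrets
  rewrite_env || return 1        # generate secrets into .env (hypothetical)
  : > "$MARKER"                  # commit point: no regeneration after this
  start_stack                    # may fail; a retry must not touch .env
}
```

Placing the marker before `docker compose up -d` trades a possible
manual restart of the stack for the guarantee that DB_PASSWORD always
matches the existing postgres volume.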
Recommend `sudo -i` to avoid the classic `sudo cmd >> file` failure
where the shell redirection runs as the unprivileged user. Also document
the `sudo tee -a` workaround and add a scp option C.
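The `sudo tee -a` workaround can be sketched as a helper. With
`sudo cmd >> file` the calling shell opens the file before sudo runs;
piping into `sudo tee -a` moves the write under sudo instead. The
function name and the SUDO override are hypothetical and exist only so
the helper can also run without privileges.

```shell
# Append one line to a file that only root may write.
# Usage: append_line_as_root <file> <line>
# SUDO defaults to "sudo"; set SUDO="" to run tee directly.
append_line_as_root() {
  # ${SUDO-sudo} is left unquoted on purpose: an empty SUDO expands to
  # no word at all, so the command becomes plain `tee -a`.
  printf '%s\n' "$2" | ${SUDO-sudo} tee -a "$1" > /dev/null
}
```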
Add scripts and docs for packaging WeKnora into cloud images (AMI,
custom images, snapshots) so users can distribute one-click deployable
templates on any cloud provider.
- scripts/cloud-image/: cloud-agnostic prepare/cleanup/firstboot scripts
plus systemd units. Downloads only the 4 runtime files needed by the
compose stack (~100KB) instead of cloning the full repo, and pins to
any git ref via WEKNORA_REF for reproducible builds.
- firstboot.sh randomizes DB/Redis/JWT/AES secrets on first boot,
writes credentials to /root/weknora-credentials.txt and self-removes.
- docs/cloud-image/: per-platform packaging guides. Includes a guide
for Tencent Cloud Lighthouse / CVM covering image creation, sharing,
and marketplace listing.
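The first-boot secret randomization could look roughly like the sketch
below. The helper names are hypothetical, JWT_SECRET is an illustrative
key name, and the actual firstboot.sh may generate and write secrets
differently. Values are hex-only, so they are safe inside the sed
replacement.

```shell
# Print N random bytes as lowercase hex (2*N characters, no newline).
gen_secret() {
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Set KEY=VALUE in an env file: replace an existing line or append one.
# Usage: set_env_var <file> <key> <value>
# Assumption: the value contains no "|" or "&" (true for hex output).
set_env_var() {
  if grep -q "^$2=" "$1"; then
    sed "s|^$2=.*|$2=$3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
  else
    printf '%s=%s\n' "$2" "$3" >> "$1"
  fi
}
```

A caller would run something like
`set_env_var .env DB_PASSWORD "$(gen_secret 16)"` for each secret and
then record the same values in a root-only credentials file.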
Default-on services match the unprofiled compose stack (frontend, app,
docreader, postgres, redis); optional services (qdrant, milvus,
neo4j, langfuse, etc.) remain opt-in via compose profiles to keep the
image size small.
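Opting in to optional services could be sketched as below. It assumes
each optional service sits behind a compose profile named after it
(e.g. `qdrant`), which the compose file may or may not do; the helper
name is hypothetical. `--profile` is the standard docker compose flag.

```shell
# Turn a list of profile names into "--profile NAME" flags.
# With no arguments it prints an empty line, i.e. the default stack.
compose_profile_args() {
  _args=""
  for _p in "$@"; do
    _args="${_args:+$_args }--profile $_p"
  done
  printf '%s\n' "$_args"
}
```

For example, `docker compose $(compose_profile_args qdrant neo4j) up -d`
would start the default services plus the two named profiles; the
command substitution is left unquoted so the flags split into words.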