- Introduced PDFScannedParser as a fallback parser for scanned PDFs that converts pages into images for OCR processing.
- Added PDFScannedParser to the PDFParser parsing chain so that scanned content can be parsed.
- Improved logging for better error tracking during PDF parsing operations.
- Updated the question generation template to clarify the role of surrounding context and main content.
- Enhanced quality rules for generated questions to better align with user search intent.
- Revised output format and added explicit instructions on what not to generate.
- Improved logging and output in the web parser for better visibility of parsed content and metadata.
- Updated DocReaderServicer to pass metadata in responses.
- Modified PipelineParser to accumulate and merge metadata from all parsers.
- Enhanced StdWebParser to extract and log the title from web page content.
- Implemented logic in the knowledge service to update the knowledge title based on extracted metadata.
- Updated regex patterns in MarkdownImageUtil to support alt text containing brackets and handle MIME types with hyphens.
- Implemented new functions in ImageResolver for resolving HTML `<img>` tags with data URIs and bare base64 content, improving image handling in markdown.
- Added comprehensive tests for various image scenarios, ensuring robust handling of data URIs and base64 images.
- Added a new `.env.lite.example` file for the Lite version, providing a minimal configuration template.
- Updated `.env.example` to remove deprecated variables and include new Docreader settings.
- Enhanced Docker configurations to support the Lite version, including a new Dockerfile for the Docreader service.
- Introduced a Makefile target for building and running the Lite version, along with packaging capabilities.
- Created GitHub workflows for building and releasing Lite binaries, including Homebrew formula support.
- Implemented a new service file for managing the Lite version as a system service.
This update enables a streamlined, single-binary deployment of WeKnora, reducing external dependencies and simplifying setup.
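
For the parser-chain and metadata items earlier in this list, the following is a minimal sketch of how a fallback chain and metadata accumulation could work. It is not the project's code: `ParseResult`, `parse_with_fallback`, and `merge_metadata` are hypothetical names, and the "later value overrides earlier value" merge policy is an assumption.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class ParseResult:
    """Hypothetical result type: parsed text plus free-form metadata."""
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


# A parser is modelled here as any callable taking raw bytes (assumption).
Parser = Callable[[bytes], ParseResult]


def parse_with_fallback(parsers: List[Parser], data: bytes) -> Optional[ParseResult]:
    """Try each parser in order; return the first result with non-empty content."""
    for parser in parsers:
        name = getattr(parser, "__name__", repr(parser))
        try:
            result = parser(data)
        except Exception:
            # Log the failure with its traceback, then move on to the next parser.
            logger.exception("parser %s failed, trying next", name)
            continue
        if result.content.strip():
            return result
        logger.info("parser %s returned no text, trying next", name)
    return None


def merge_metadata(results: List[ParseResult]) -> Dict[str, str]:
    """Accumulate metadata from every result; later values override earlier ones."""
    merged: Dict[str, str] = {}
    for result in results:
        merged.update(result.metadata)
    return merged
```

In this sketch an OCR-based parser would simply be placed last in `parsers`, so it only runs when the earlier parsers fail or return no text.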
- Add docstrings and inline comments for key functions and complex logic.
- Unify comment style; eliminate magic numbers and ambiguous variable names.
- No functional changes; maintainability improvements only.
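
For the MarkdownImageUtil item earlier in this list, here is a rough sketch of a pattern that accepts alt text containing brackets and MIME subtypes containing hyphens. It is an illustrative assumption, not the pattern used in the project: `DATA_URI_IMAGE` and `find_data_uri_images` are hypothetical names, and only one level of bracket nesting is handled.

```python
import re

# Hypothetical pattern (assumption), not the one in MarkdownImageUtil.
# Alt text: non-bracket characters, or one level of nested [...].
# MIME subtype: word characters plus '.', '+', '-' (e.g. x-icon, svg+xml).
DATA_URI_IMAGE = re.compile(
    r"!\[(?P<alt>(?:[^\[\]]|\[[^\[\]]*\])*)\]"
    r"\((?P<uri>data:image/(?P<subtype>[\w.+-]+);base64,"
    r"(?P<payload>[A-Za-z0-9+/=]+))\)"
)


def find_data_uri_images(markdown: str) -> list:
    """Return (alt_text, mime_type) pairs for each embedded data-URI image."""
    return [
        (m.group("alt"), "image/" + m.group("subtype"))
        for m in DATA_URI_IMAGE.finditer(markdown)
    ]
```

A simpler alt-text pattern such as `[^\]]*` would stop at the first closing bracket, which is why the sketch allows a nested `[...]` group explicitly.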