WeKnora/docreader/parser
wizardchen 13301ca026 feat(parser): enhance web parser with improved image extension handling and XPath prioritization
- Added a regex pattern for image file extensions to the utils module for better image detection.
- Updated the BODY_XPATH in the xpaths module to prioritize matching specific content structures in web pages.
- These changes aim to improve the accuracy and efficiency of content extraction from web pages using the StdWebParser class.
2026-05-25 19:15:17 +08:00
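The commit message does not include the new regex or the updated BODY_XPATH. As a rough illustration only, here is a minimal Python sketch of the two ideas it describes. The first is matching image URLs by file extension. The second is trying content-container paths in priority order and taking the first match. All names (`IMAGE_EXT_RE`, `is_image_url`, `BODY_PATHS`, `find_body`) and the specific extensions and paths are hypothetical, not taken from WeKnora's utils or xpaths modules. The sketch uses the limited XPath subset in `xml.etree.ElementTree` so that it runs with the standard library alone.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical image-extension pattern (not the one in the utils module).
# Allows an optional query string or fragment after the extension.
IMAGE_EXT_RE = re.compile(
    r"\.(?:png|jpe?g|gif|bmp|webp|svg|tiff?)(?:[?#].*)?$", re.IGNORECASE
)


def is_image_url(url: str) -> bool:
    """Return True if the URL ends with a common image file extension."""
    return IMAGE_EXT_RE.search(url) is not None


# Hypothetical candidate paths, most specific first. These are simplified
# stand-ins for a real BODY_XPATH, limited to what ElementTree supports.
BODY_PATHS = [
    ".//article",
    ".//main",
    ".//div[@id='content']",
    ".//body",
]


def find_body(root: ET.Element) -> Optional[ET.Element]:
    """Return the element matched by the highest-priority path, or None."""
    for path in BODY_PATHS:
        node = root.find(path)
        if node is not None:  # avoid Element truthiness, which is deprecated
            return node
    return None
```

Putting the specific containers ahead of the generic `.//body` fallback lets a focused content block win when one exists. Pages without such a block still yield something.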