mirror of
https://github.com/Tencent/WeKnora.git
synced 2026-06-04 13:30:32 +08:00
- Added a regex pattern for image file extensions to the utils module for better image detection.
- Updated the BODY_XPATH in the xpaths module to prioritize matching specific content structures in web pages.
- These changes aim to improve the accuracy and efficiency of content extraction from web pages using the StdWebParser class.
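
For illustration, an image-extension pattern of the kind described above might look like the following minimal sketch. The actual regex added to the utils module is not shown in this message, so the name `IMAGE_EXTENSION_RE`, the helper `looks_like_image`, and the list of extensions are all assumptions.

```python
import re

# Hypothetical pattern: the real regex in the utils module may differ.
# Matches a common image extension that is followed by the end of the
# string or by the start of a query string (?) or fragment (#).
IMAGE_EXTENSION_RE = re.compile(
    r"\.(?:jpe?g|png|gif|bmp|webp|svg|tiff?)(?=$|[?#])",
    re.IGNORECASE,
)


def looks_like_image(url: str) -> bool:
    """Return True if the URL contains an image extension as described above."""
    return IMAGE_EXTENSION_RE.search(url) is not None
```

The lookahead keeps the check from firing on paths such as `a.png.html`. It does not parse the URL, so an image extension at the end of a query value would also match.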