-
magic_pdf-0.6.2b1-released
Optimized model loading logic, now requiring only a single load during batch processing. Command-line interface now supports batch input. When import fails, prints complete error messages to facilitate troubleshooting. Fixed a bug where overlapping spans were incorrectly removed multiple times. Improved OCR recognition areas, doubling the OCR speed. Embedded language identification models within the whl package for easier offline deployment. Replaced interline_equation_blocks with interline_equations to enhance interline formula recognition capabilities in non-academic paper scenarios. Added page number indexing to the output results of content_list. Locked some dependency versions and adjusted the dependency installation logic to reduce conflicts and redundant installations, cutting down the number of packages by 30% and improving the initial installation success rate.
-
magic_pdf-0.6.1-released
fix: Add two spaces at the end of an image or table row so that the caption renders line breaks properly.
-
magic_pdf-0.6.0-released
This is a major update. Compared to version 0.5, it integrates high-precision models, simplifies installation on each platform as far as possible, and fixes a series of bugs. For installation and usage instructions, refer to the README.
-
magic_pdf-0.5.13-released
fix: 1. fasttext does not support numpy>=2.0.0. 2. When ".pdf" appears more than once in pdf_path, the resulting model_path does not match the expected path.
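
The second item is easy to reproduce in isolation. The sketch below is not the magic_pdf implementation: it assumes, purely for illustration, that the model JSON path is derived from pdf_path by swapping the extension, and both helper names are hypothetical.

```python
from pathlib import PurePosixPath


def model_path_naive(pdf_path: str) -> str:
    # Hypothetical buggy variant: str.replace rewrites every ".pdf"
    # occurrence, including ones inside directory names.
    return pdf_path.replace(".pdf", ".json")


def model_path_by_suffix(pdf_path: str) -> str:
    # Hypothetical safer variant: with_suffix only changes the final extension.
    return str(PurePosixPath(pdf_path).with_suffix(".json"))


path = "data/report.pdf.d/report.pdf"
print(model_path_naive(path))      # the directory name is rewritten as well
print(model_path_by_suffix(path))  # only the file extension changes
```

Replacing only the trailing suffix keeps the directory part of the path intact, which is the behavior a caller would normally expect.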
-
magic_pdf-0.5.12-released
update: clean up requirements.txt fix: wrap the opencv-python and Pillow imports in try blocks
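
A guarded import of this kind might look like the following minimal sketch. The helper optional_import is a hypothetical name introduced here for illustration, not a magic_pdf function; cv2 and PIL.Image are the import names provided by opencv-python and Pillow.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too, since it subclasses ImportError.
        return None


cv2 = optional_import("cv2")              # provided by opencv-python
pil_image = optional_import("PIL.Image")  # provided by Pillow

if cv2 is None or pil_image is None:
    print("Optional imaging dependencies are missing; image features are disabled.")
```

Callers can then check for None before using an image-dependent code path instead of failing at import time.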
-
magic_pdf-0.5.10-released
fix: 1. suppress some log output when not in debug mode 2. use deepcopy to keep the original model json unmodified 3. abnormal img_dir
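
The deepcopy item can be illustrated with a small sketch. The function name, the layout_dets and score keys, and the filtering step are assumptions made for this example and do not describe the actual magic_pdf pipeline.

```python
import copy


def drop_low_score_blocks(model_json, min_score=0.5):
    # Work on a deep copy so the nested dicts and lists in the caller's
    # data are never modified.
    working = copy.deepcopy(model_json)
    for page in working:
        page["layout_dets"] = [
            det for det in page["layout_dets"] if det["score"] >= min_score
        ]
    return working


original = [{"layout_dets": [{"score": 0.9}, {"score": 0.1}]}]
filtered = drop_low_score_blocks(original)
print(len(original[0]["layout_dets"]), len(filtered[0]["layout_dets"]))
```

A shallow copy such as list(model_json) would still share the per-page dicts, so the assignment inside the loop would alter the caller's data; deepcopy avoids that.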
-
magic_pdf-0.5.6-released
fix: 1. use line_lang instead of content_lang when concatenating paragraphs update: 1. improve detection of garbled documents
-
magic_pdf-0.5.5-released
update: 1. lower AVG_TEXT_LEN_THRESHOLD from 200 to 100 2. custom model framework 3. cli output files now include orig_pdf and model_json