Tags · AIModels / opendatalab / MinerU

Tags mark specific points in the commit history as important.

magic_pdf-0.8.1-update-docs

62aa1cbd · Merge pull request #707 from myhloli/master · Oct 09, 2024

magic_pdf-0.8.1-released

c95f3816 · Merge pull request #599 from icecraft/fix/uncorrect_figure_footnote_relation · Sep 12, 2024

magic_pdf-0.8.0-released

9f352df0 · Realese 0.8.0 (#586) · Sep 10, 2024

magic_pdf-0.7.1-released

1dc915a4 · release: release 0.7.1 version (#526) · Sep 02, 2024

magic_pdf-0.7.0b1-released

fa3475a4 · Merge pull request #386 from myhloli/master · Aug 09, 2024

magic_pdf-0.7.0a1-released

29e48c73 · mirror(conda): use tuna mirror for Anaconda download · Aug 05, 2024

magic_pdf-0.6.2b1-released

3aec9c61 · Merge remote-tracking branch 'origin/master' · Jul 31, 2024

- Optimized model loading logic, now requiring only a single load during batch processing.
- Command-line interface now supports batch input.
- When import fails, prints complete error messages to facilitate troubleshooting.
- Fixed a bug where overlapping spans were incorrectly removed multiple times.
- Improved OCR recognition areas, doubling the OCR speed.
- Embedded language identification models within the whl package for easier offline deployment.
- Replaced interline_equation_blocks with interline_equations to enhance interline formula recognition capabilities in non-academic paper scenarios.
- Added page number indexing to the output results of content_list.
- Locked some dependency versions and adjusted the dependency installation logic to reduce conflicts and redundant installations, cutting down the number of packages by 30% and improving the initial installation success rate.

magic_pdf-0.6.1-released

ff13c8e1 · fix(mkmarkdown): add 2 space after image and table URLs · Jul 13, 2024
```
fix: Add two spaces at the end of an image or table row so that the line break before the caption renders properly.
```
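
In Markdown, two trailing spaces before a newline produce a hard line break, which is why the fix above keeps the caption on its own line. A minimal sketch, assuming a hypothetical helper `image_with_caption` that is not part of the package:

```python
def image_with_caption(image_url: str, caption: str) -> str:
    """Return Markdown for an image followed by its caption on a new line.

    The two spaces before the newline form a Markdown hard line break;
    without them, many renderers join the caption onto the image line.
    """
    return f"![]({image_url})  \n{caption}"
```

For example, `image_with_caption("images/a.jpg", "Figure 1")` yields the image line ending in two spaces, followed by the caption line.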

magic_pdf-0.6.0-released

61fab96e · fix(setup): specify paddleocr version to fix compatibility issue · Jul 12, 2024

This is a major update. Compared with version 0.5, it adds high-precision model integration, simplifies installation on each platform as far as possible, and fixes a series of bugs.
For installation and usage instructions, see the README.

magic_pdf-0.5.13-released

1e73b9fc · fix: fasttext not support numpy>=2.0.0 · Jul 07, 2024

fix:
1. fasttext does not support numpy>=2.0.0.
2. When ".pdf" appears more than once in pdf_path, the resulting model_path does not match the expected path.
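
The second fix describes a familiar class of path bug. The sketch below shows one way it can arise and one way to avoid it. It assumes, purely for illustration, that model_path is derived by swapping the ".pdf" extension for ".json"; the function names and that derivation rule are hypothetical and are not taken from the project.

```python
from pathlib import PurePosixPath


def model_path_naive(pdf_path: str) -> str:
    """Replace every ".pdf" occurrence, including ones in directory names."""
    return pdf_path.replace(".pdf", ".json")


def model_path_suffix_only(pdf_path: str) -> str:
    """Replace only the final file extension and leave the rest untouched."""
    return str(PurePosixPath(pdf_path).with_suffix(".json"))
```

With a path such as `data/scan.pdf.d/demo.pdf`, the naive version also rewrites the directory name, while the suffix-only version changes just the file extension.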

magic_pdf-0.5.12-released

6c656af6 · update:cleanup requirements.txt · Jun 28, 2024
```
update:
clean up requirements.txt
fix:
guard the opencv-python and Pillow imports with try
```
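
A guarded import of the kind mentioned in the fix can be sketched as follows. The helper `try_import` is a hypothetical name, and printing the full traceback is an assumption about the desired behaviour, not a description of the actual code.

```python
import importlib
import traceback


def try_import(module_name):
    """Return the imported module, or None after printing the full traceback."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Show the complete error so a missing dependency is easy to diagnose.
        traceback.print_exc()
        return None
```

Usage would look like `cv2 = try_import("cv2")` for opencv-python or `Image = try_import("PIL.Image")` for Pillow, with the caller checking for `None` before using the module.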

magic_pdf-0.5.11-released

6979c926 · update readme · Jun 26, 2024
```
update:
fix the CLI and the logic for using the built-in model
```

magic_pdf-0.5.10-released

6e8e81c9 · update readme · Jun 25, 2024
```
fix:
1. suppress some log output when not in debug mode
2. use deepcopy to keep the original model json unchanged
3. fix abnormal img_dir
```
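
The deepcopy fix guards against a processing step silently altering the caller's data. A minimal sketch, assuming a hypothetical layout in which the model json is a list of pages, each holding a "blocks" list with "score" values; the function name and layout are illustrative only.

```python
import copy


def drop_low_score_blocks(model_json, min_score=0.5):
    """Return a filtered copy; the caller's model_json stays unchanged."""
    working = copy.deepcopy(model_json)  # nested lists and dicts are copied too
    for page in working:
        page["blocks"] = [b for b in page["blocks"] if b["score"] >= min_score]
    return working
```

Because the copy is deep, reassigning `page["blocks"]` affects only the working copy. A shallow copy would share the page dictionaries with the original and modify it as well.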

magic_pdf-0.5.9-released

016f94f6 · update readme · Jun 20, 2024
```
update:
1. add entry points so the tool can be executed from the shell
```

magic_pdf-0.5.8-released

c4fc4d5c · format · Jun 20, 2024
```
update:
1. add some switches to the CLI
```

magic_pdf-0.5.7-released

8998380d · update check invalid_chars algorithm to improve accuracy · Jun 20, 2024

magic_pdf-0.5.6-released

df14c61f · update: Enhance the capability to detect garbled document issues · Jun 19, 2024
```
fix:
1. use line_lang instead of content_lang when concatenating paragraphs
update:
1. enhance the capability to detect garbled documents
```

magic_pdf-0.5.5-released

5f313bd0 · fix local write pdf file name bug · Jun 18, 2024
```
update:
1. lower AVG_TEXT_LEN_THRESHOLD from 200 to 100
2. add a custom model framework
3. CLI output files now include orig_pdf and model_json
```

magic_pdf-0.5.4-released

c69f414b · update pypi upload logic · Jun 17, 2024

magic_pdf-0.5.3-released

35d39735 · update pypi upload logic · Jun 17, 2024
