
Releases: OpenDCAI/Flash-MinerU

Flash-MinerU v1.0.1 Release Note

14 Apr 12:10


What's Changed

Removed the numpy<2 restriction from requirements

Full Changelog: v1.0.0...v1.0.1

Flash-MinerU v1.0.0 Release Note

03 Apr 18:39
d6d6d56




  • Default engine: `MineruEngine` now uses pipeline-parallel inference (overlapping pdf2img → VLM → Markdown).
  • Performance: up to 2.7× faster than v0.0.4, 1.6× faster than manual multi-process, 7.6× faster than a single GPU.
  • Tuning: new `inflight` parameter controls the number of concurrent batches in the DAG.
  • Legacy API: sequential batching moved to `MineruEngineLegacy` (deprecated).
  • Docs: added a pipeline overview and benchmarks (docs/BENCHMARK.md).
  • Fix: resolved a memory leak caused by unreleased PDFium objects (#11).
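The `inflight` idea above can be sketched with only the standard library: three stage threads connected by queues, with a semaphore capping how many batches are admitted into the DAG at once. The stage names mirror the release notes; the function and string formats below are illustrative assumptions, not the Flash-MinerU API.

```python
import threading
import queue

def run_pipeline(batches, inflight=2):
    """Overlap three stages (pdf2img -> VLM -> Markdown) with at most
    `inflight` batches in the pipeline at once (illustrative sketch)."""
    q1, q2 = queue.Queue(), queue.Queue()
    slots = threading.Semaphore(inflight)   # caps concurrent in-flight batches
    results = []

    def pdf2img():
        for b in batches:
            slots.acquire()                 # wait for a free in-flight slot
            q1.put(f"img({b})")
        q1.put(None)                        # end-of-stream sentinel

    def vlm():
        while (item := q1.get()) is not None:
            q2.put(f"vlm({item})")
        q2.put(None)

    def markdown():
        while (item := q2.get()) is not None:
            results.append(f"md({item})")
            slots.release()                 # batch has left the pipeline

    threads = [threading.Thread(target=f) for f in (pdf2img, vlm, markdown)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_pipeline(["b0", "b1", "b2"], inflight=2))
# → ['md(vlm(img(b0)))', 'md(vlm(img(b1)))', 'md(vlm(img(b2)))']
```

Because each stage runs in its own thread, batch N can be rendered to images while batch N-1 is in the VLM, which is where the overlap speedup comes from; a larger `inflight` trades memory for utilization.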


📊 Benchmark

Usage: [Chinese](./docs/BENCHMARK.zh.md) · [English](./docs/BENCHMARK.md)

Results (368 PDFs, single node with 8× A100)

| Setup | Config | Time |
| --- | --- | --- |
| Flash-MinerU v1.0.0 | MineruEngine, 8 replicas, inflight=8 | ~8.5 min |
| MinerU (native) | 8 manual processes (parallel mode, 1 GPU each) | ~14 min |
| Flash-MinerU v0.0.4 | MineruEngineLegacy, batch_size=16 | ~23 min |
| MinerU (native) | vLLM, single GPU | ~65 min |

Takeaways

  • ~2.7× faster vs v0.0.4
  • ~1.6× faster vs manual multi-process MinerU
  • ~7.6× faster vs single-GPU MinerU
  • 🚀 Speedup mainly from pipeline-parallel execution (stage overlap + better utilization)
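The quoted speedups follow directly from the benchmark table: each baseline's wall-clock time divided by v1.0.0's ~8.5 min. A quick sanity check:

```python
# Wall-clock times from the benchmark table (minutes, 368 PDFs, 8x A100)
v1_0_0 = 8.5   # Flash-MinerU v1.0.0, pipeline-parallel
manual = 14    # MinerU native, 8 manual processes
legacy = 23    # Flash-MinerU v0.0.4, sequential batching
single = 65    # MinerU native, vLLM on a single GPU

for name, t in [("vs v0.0.4", legacy),
                ("vs manual multi-process", manual),
                ("vs single GPU", single)]:
    print(f"{name}: {t / v1_0_0:.1f}x")
# → vs v0.0.4: 2.7x
# → vs manual multi-process: 1.6x
# → vs single GPU: 7.6x
```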

What's Changed

  • fix: Fix memory leak from PDFium objects not being closed — @1773899415 in #11
  • feat: Pipelined inference refactor with major speedup — @SunnyHaze in #12
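The leak fix (#11) comes down to closing native PDFium handles (documents, pages, bitmaps) once they are rendered. The actual handles live in pypdfium2; the general cleanup pattern can be sketched with a stand-in `Handle` class and `contextlib.ExitStack`, which guarantees every registered `close()` runs even if a stage raises:

```python
import contextlib

class Handle:
    """Stand-in for a native PDFium object (document, page, bitmap)."""
    def __init__(self, name):
        self.name, self.closed = name, False

    def close(self):
        self.closed = True  # in pypdfium2 this would free native memory

def render_page(handles_log):
    # ExitStack closes every registered handle on exit, even on exceptions,
    # so native memory is released instead of leaking across batches.
    with contextlib.ExitStack() as stack:
        doc = Handle("document")
        stack.callback(doc.close)
        page = Handle("page")
        stack.callback(page.close)
        bitmap = Handle("bitmap")
        stack.callback(bitmap.close)
        handles_log.extend([doc, page, bitmap])
        return f"rendered via {bitmap.name}"

log = []
render_page(log)
print(all(h.closed for h in log))  # → True
```

Without such explicit cleanup, native allocations made behind Python objects survive garbage-collection delays and accumulate across batches, which is exactly the leak profile a long-running pipeline exposes.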

Full Changelog: v0.0.4...v1.0.0

Flash-MinerU v0.0.4 Release Note

10 Mar 14:00


What's Changed

Full Changelog: v0.0.2...v0.0.4

Flash-MinerU v0.0.3 Release Note

10 Mar 13:52
d08ac36


What's Changed

Full Changelog: v0.0.2...v0.0.3

Flash-MinerU v0.0.2 Release Note

12 Feb 12:54
4fa520e


What's Changed

Full Changelog: v0.0.1...v0.0.2

Flash-MinerU v0.0.1 Release Note

04 Feb 18:17
81d3b43


🎉 Flash-MinerU v0.0.1 — Initial Release

We’re excited to announce the first public release of Flash-MinerU 🎉
This version lays down the core foundation for Ray-based parallel acceleration of MinerU’s VLM inference pipeline, focusing on correctness, extensibility, and engineering clarity.


✨ Highlights

  • 🚀 End-to-end MinerU parsing workflow implemented
    Core logic for MinerU PDF parsing is now fully integrated, including all essential parsing and processing functions.

  • 🧩 Clean, minimal dependency setup
    Requirements have been streamlined to remove redundancy while keeping the minimal set needed for stable execution.

  • 🧪 Compatibility with domestic computing environments
    Dependency versions have been adjusted to ensure smooth execution on domestic hardware platforms such as MUXI environments.

  • 🧹 Codebase cleanup & structure refinement
    Improved readability and maintainability in preparation for future parallelization and scaling features.


🔧 What’s Changed

  • Core Features

    • Added the main MinerU parsing logic and all related functions
      (PR #1 by @Lavender1)
  • Dependencies & Environment

    • Added minimal required dependencies and removed redundant entries
      (PR #2 by @SunnyHaze)
    • Updated requirements.txt for compatibility with MUXI environments
      (PR #3 by @SunnyHaze)
  • Refactoring & Cleanup

    • General code cleanup and structure optimization
      (PR #4 by @SunnyHaze)

🤝 New Contributors

Welcome to our first contributors — thanks for helping Flash-MinerU take off 🚀


📜 Full Changelog

🔗 https://github.com/OpenDCAI/Flash-MinerU/commits/v0.0.1


🗺️ What’s Next?

This is just the beginning. Upcoming releases will focus on:

  • Ray-based multi-replica VLM inference acceleration
  • Performance benchmarks (single vs multi-GPU)
  • Additional inference backends and service-oriented deployment

Stay tuned ⚡️📄