Enhanced PDF Parser¶
Guide to using the EnhancedPDFParser
with image restoration.
Overview¶
The EnhancedPDFParser
extends StructuredPDFParser
with DocRes image restoration capabilities. It's ideal for processing scanned documents, low-quality PDFs, or documents with visual distortions.
Key Features¶
- Image Restoration: DocRes integration for document enhancement
- 6 Restoration Tasks: Dewarping, deshadowing, deblurring, and more
- GPU Acceleration: Optional CUDA support for faster processing
- All Base Features: Inherits all
StructuredPDFParser
capabilities
Basic Usage¶
from doctra import EnhancedPDFParser
parser = EnhancedPDFParser(
use_image_restoration=True,
restoration_task="appearance"
)
parser.parse("scanned_document.pdf")
Restoration Tasks¶
Task | Best For |
---|---|
appearance |
General enhancement (default) |
dewarping |
Perspective distortion |
deshadowing |
Shadow removal |
deblurring |
Blur reduction |
binarization |
Clean B&W conversion |
end2end |
Severe degradation |
When to Use¶
Use EnhancedPDFParser
for:
- Scanned documents
- Low-quality PDFs
- Documents with visual distortions
- When OCR accuracy is poor with standard parser
See Also¶
- DocRes Engine - Image restoration details
- Structured Parser - Base parser
- API Reference - Complete API documentation