Skip to content

Command Line Interface

Doctra provides a powerful CLI for document processing automation.

Installation

The CLI is automatically installed with Doctra:

pip install doctra

Verify installation:

doctra --version

Basic Usage

doctra [COMMAND] [OPTIONS] [ARGUMENTS]

Commands

parse

Parse a PDF document with full processing.

doctra parse <pdf_file> [OPTIONS]

Options:

  • --output-dir PATH: Output directory (default: outputs)
  • --dpi INTEGER: Image resolution (default: 200)
  • --min-score FLOAT: Minimum confidence score (default: 0.0)
  • --ocr-lang TEXT: OCR language code (default: eng)
  • --use-vlm: Enable VLM processing
  • --vlm-provider TEXT: VLM provider (openai, gemini, anthropic, openrouter)
  • --vlm-api-key TEXT: VLM API key
  • --vlm-model TEXT: Specific VLM model

Example:

# Basic parsing
doctra parse document.pdf

# With custom settings
doctra parse document.pdf --dpi 300 --output-dir my_outputs

# With VLM
doctra parse document.pdf --use-vlm --vlm-provider openai --vlm-api-key sk-xxx

enhance

Parse with image restoration for low-quality documents.

doctra enhance <pdf_file> [OPTIONS]

Options:

  • All parse options, plus:
  • --restoration-task TEXT: Restoration task (default: appearance)
    • Choices: appearance, dewarping, deshadowing, deblurring, binarization, end2end
  • --restoration-device TEXT: Device (cuda, cpu, or auto)
  • --restoration-dpi INTEGER: DPI for restoration (default: 200)

Example:

# Basic enhancement
doctra enhance scanned.pdf

# Dewarp with GPU
doctra enhance scanned.pdf --restoration-task dewarping --restoration-device cuda

# Full enhancement with VLM
doctra enhance scanned.pdf \
  --restoration-task appearance \
  --restoration-device cuda \
  --use-vlm \
  --vlm-provider openai \
  --vlm-api-key sk-xxx

extract

Extract only charts and/or tables from a document.

doctra extract <type> <pdf_file> [OPTIONS]

Type:

  • charts: Extract only charts
  • tables: Extract only tables
  • both: Extract both charts and tables

Options:

  • --output-dir PATH: Output directory (default: outputs)
  • --dpi INTEGER: Image resolution (default: 200)
  • --use-vlm: Enable VLM for structured data
  • --vlm-provider TEXT: VLM provider
  • --vlm-api-key TEXT: VLM API key
  • --vlm-model TEXT: Specific VLM model

Examples:

# Extract charts only
doctra extract charts report.pdf

# Extract tables with VLM
doctra extract tables report.pdf --use-vlm --vlm-provider gemini --vlm-api-key xxx

# Extract both
doctra extract both report.pdf --output-dir data_extracts

visualize

Visualize layout detection results.

doctra visualize <pdf_file> [OPTIONS]

Options:

  • --num-pages INTEGER: Number of pages to visualize (default: 3)
  • --cols INTEGER: Number of columns in grid (default: 2)
  • --page-width INTEGER: Width of each page (default: 800)
  • --spacing INTEGER: Spacing between pages (default: 40)
  • --output PATH: Save to file instead of displaying
  • --dpi INTEGER: Image resolution (default: 200)

Examples:

# Display first 3 pages
doctra visualize document.pdf

# Save visualization of 6 pages
doctra visualize document.pdf --num-pages 6 --output layout.png

# Custom grid layout
doctra visualize document.pdf --num-pages 9 --cols 3 --page-width 600

analyze

Quick document analysis showing structure.

doctra analyze <pdf_file> [OPTIONS]

Options:

  • --dpi INTEGER: Image resolution (default: 200)

Example:

doctra analyze document.pdf

Output shows:

Document Analysis: document.pdf
=====================================
Total pages: 10

Page 1:
  - Text regions: 5
  - Tables: 1
  - Charts: 0
  - Figures: 2

Page 2:
  ...

info

Display system and configuration information.

doctra info

Shows:

  • Doctra version
  • Python version
  • Installed dependencies
  • GPU availability
  • System information

Example output:

Doctra Information
==================
Version: 0.4.3
Python: 3.10.11

Dependencies:
  - PaddlePaddle: 2.5.0
  - PaddleOCR: 2.7.0
  - PyTesseract: 0.3.10
  - Pillow: 10.0.0

System:
  - OS: Windows 10
  - CUDA Available: Yes
  - GPU: NVIDIA GeForce RTX 3080

Batch Processing

Process Multiple Files

# Using shell globbing
doctra parse *.pdf --output-dir batch_results

# Using find (Linux/Mac)
find ./documents -name "*.pdf" -exec doctra parse {} \;

# Using PowerShell (Windows)
Get-ChildItem *.pdf | ForEach-Object { doctra parse $_.FullName }

Process Directory

# Parse all PDFs in directory
for pdf in directory/*.pdf; do
    doctra parse "$pdf" --output-dir results/
done

Environment Variables

Set default values using environment variables:

# VLM Configuration
export DOCTRA_VLM_PROVIDER=openai
export DOCTRA_VLM_API_KEY=sk-xxx
export DOCTRA_VLM_MODEL=gpt-4o

# Processing Settings
export DOCTRA_DPI=200
export DOCTRA_OCR_LANG=eng
export DOCTRA_DEVICE=cuda

# Then use without flags
doctra parse document.pdf --use-vlm

Configuration File

Create .doctra.yml in your project directory:

# .doctra.yml
vlm:
  provider: openai
  api_key: sk-xxx
  model: gpt-4o

processing:
  dpi: 200
  ocr_lang: eng
  device: cuda

output:
  base_dir: outputs

Then run commands without options:

doctra parse document.pdf

Output Structure

Standard Parse

outputs/
└── document/
    └── full_parse/
        ├── result.md
        ├── result.html
        └── images/
            ├── figures/
            ├── charts/
            └── tables/

Enhanced Parse

outputs/
└── document/
    └── enhanced_parse/
        ├── result.md
        ├── result.html
        ├── document_enhanced.pdf  # Restored PDF
        ├── enhanced_pages/  # Restored page images
        └── images/

Extract

outputs/
└── document/
    └── structured_parsing/
        ├── charts/  # Chart images
        ├── tables/  # Table images
        ├── parsed_tables_charts.xlsx  # If VLM enabled
        ├── parsed_tables_charts.html  # If VLM enabled
        └── vlm_items.json  # If VLM enabled

Examples

Example 1: Basic Document Processing

# Parse a financial report
doctra parse financial_report.pdf

# Output: outputs/financial_report/full_parse/

Example 2: Enhanced Processing with VLM

# Process scanned document with enhancement and VLM
doctra enhance scanned_document.pdf \
  --restoration-task appearance \
  --restoration-device cuda \
  --use-vlm \
  --vlm-provider openai \
  --vlm-api-key $OPENAI_API_KEY \
  --output-dir enhanced_results

Example 3: Extract Data for Analysis

# Extract all tables with VLM to get structured data
doctra extract tables data_report.pdf \
  --use-vlm \
  --vlm-provider gemini \
  --vlm-api-key $GEMINI_API_KEY

# Result: outputs/data_report/structured_parsing/parsed_tables_charts.xlsx

Example 4: Batch Processing Pipeline

#!/bin/bash
# process_documents.sh

INPUT_DIR="./input_pdfs"
OUTPUT_DIR="./processed"

for pdf in "$INPUT_DIR"/*.pdf; do
    echo "Processing: $pdf"

    # First enhance the document
    doctra enhance "$pdf" \
      --restoration-task appearance \
      --restoration-device cuda \
      --output-dir "$OUTPUT_DIR"

    echo "Completed: $pdf"
done

echo "All documents processed!"

Example 5: Quality Check with Visualization

# Visualize layout detection before full processing
doctra visualize document.pdf --num-pages 5 --output viz_check.png

# Review viz_check.png to ensure good detection

# Then proceed with full processing
doctra parse document.pdf --use-vlm

Troubleshooting

Command Not Found

Problem: doctra: command not found

Solution:

# Ensure Doctra is installed
pip install doctra

# Or use module syntax
python -m doctra.cli.main parse document.pdf

API Key Errors

Problem: VLM API key not recognized

Solution:

# Set environment variable
export OPENAI_API_KEY=sk-xxx

# Or pass directly
doctra parse document.pdf --use-vlm --vlm-api-key sk-xxx

Poppler Errors

Problem: pdftoppm not found

Solution: Install Poppler (see Installation Guide)

Memory Errors

Problem: Out of memory during processing

Solution:

# Reduce DPI
doctra parse large.pdf --dpi 150

# Or process pages individually
doctra parse large.pdf --max-pages 10

Advanced Usage

Custom Scripts

Combine CLI with shell scripts:

#!/bin/bash
# Smart processing script

PDF=$1

# Check file size
SIZE=$(du -k "$PDF" | cut -f1)

if [ $SIZE -gt 10000 ]; then
    echo "Large file, using lower DPI..."
    doctra parse "$PDF" --dpi 150
else
    echo "Standard processing..."
    doctra parse "$PDF" --dpi 200 --use-vlm
fi

Integration with Other Tools

# OCR + Search Pipeline
doctra parse document.pdf
grep "keyword" outputs/document/full_parse/result.md

# Extract data and analyze
doctra extract tables report.pdf --use-vlm
python analyze_tables.py outputs/report/structured_parsing/parsed_tables_charts.xlsx

See Also