Skip to content

Chart & Table Extractor

Guide to using the ChartTablePDFParser for targeted extraction.

Overview

The ChartTablePDFParser is a specialized parser focused exclusively on extracting charts and tables from PDF documents. It's optimized for scenarios where you only need these specific elements.

Key Features

  • Focused Extraction: Extract only charts and/or tables
  • Selective Processing: Choose what to extract
  • VLM Integration: Convert visuals to structured data
  • Faster Processing: Skips unnecessary elements

Basic Usage

from doctra import ChartTablePDFParser

parser = ChartTablePDFParser(
    extract_charts=True,
    extract_tables=True
)

parser.parse("data_report.pdf")

Selective Extraction

# Extract only tables
parser = ChartTablePDFParser(
    extract_charts=False,
    extract_tables=True
)

# Extract only charts
parser = ChartTablePDFParser(
    extract_charts=True,
    extract_tables=False
)

With VLM for Structured Data

parser = ChartTablePDFParser(
    extract_charts=True,
    extract_tables=True,
    use_vlm=True,
    vlm_provider="openai",
    vlm_api_key="your-key"
)

parser.parse("report.pdf")
# Outputs: tables.xlsx, tables.html, vlm_items.json

When to Use

Use ChartTablePDFParser when:

  • You only need charts and/or tables
  • Faster processing is important
  • Working with data-heavy documents
  • Extracting data for analysis

See Also