# Development Guide
Thank you for your interest in contributing to Doctra! This guide will help you get started.
## Getting Started

### Development Setup

1. Fork and Clone
2. Create Virtual Environment
3. Install Development Dependencies

   This installs Doctra in editable mode with development tools.

4. Install System Dependencies

   Follow the Installation Guide for Poppler.
### Project Structure

```text
Doctra/
├── doctra/          # Main package
│   ├── parsers/     # PDF parsers
│   ├── engines/     # Processing engines
│   ├── exporters/   # Output formatters
│   ├── ui/          # Web interface
│   ├── cli/         # Command line interface
│   └── utils/       # Utilities
├── tests/           # Test suite
├── docs/            # Documentation
├── examples/        # Example scripts
├── notebooks/       # Jupyter notebooks
└── setup.py         # Package configuration
```
## Development Workflow

### 1. Create a Branch

Branch naming conventions:

- `feature/`: New features
- `fix/`: Bug fixes
- `docs/`: Documentation updates
- `refactor/`: Code refactoring
- `test/`: Test additions/updates
### 2. Make Changes
Write clean, well-documented code following our Code Style.
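### 3. Run Tests

- All tests: `pytest`
- A specific test: `pytest tests/test_parsers.py::test_parser_initialization`
- With coverage: `pytest --cov=doctra --cov-report=html`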
### 4. Format Code

```bash
# Format with Black
black doctra tests

# Sort imports
isort doctra tests

# Lint with Flake8
flake8 doctra tests
```
### 5. Type Checking

Run mypy on the package: `mypy doctra`
### 6. Commit Changes

Commit message format:

- `feat:` New feature
- `fix:` Bug fix
- `docs:` Documentation
- `style:` Formatting
- `refactor:` Code restructuring
- `test:` Tests
- `chore:` Maintenance
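To make the prefix format concrete, here is a minimal sketch of a checker for the first line of a commit message. It is hypothetical and not part of Doctra or its tooling; it assumes the common `type: description` layout (colon, then a space, then text), which the list above implies but does not spell out.

```python
import re

# Commit types listed in this guide
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# Assumed shape of the first line: "<type>: <description>"
SUBJECT_PATTERN = re.compile(r"^(?:" + "|".join(ALLOWED_TYPES) + r"): \S")


def is_valid_commit_message(message: str) -> bool:
    """Return True if the message's first line starts with an allowed type prefix."""
    lines = message.splitlines()
    if not lines:
        return False
    return SUBJECT_PATTERN.match(lines[0]) is not None


if __name__ == "__main__":
    print(is_valid_commit_message("feat: add table export"))  # True
    print(is_valid_commit_message("Added table export"))  # False
```

Only the first line is checked, so a longer body after a blank line is still accepted.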
### 7. Push and Create PR

Push your branch to your fork, then create a Pull Request on GitHub.
## Code Style

### Python Style Guide

We follow PEP 8 with these configurations:

```ini
# .flake8
[flake8]
max-line-length = 88
extend-ignore = E203, W503
exclude = .git,__pycache__,docs,build,dist
```
### Code Formatting

```toml
# Black configuration in pyproject.toml
[tool.black]
line-length = 88
target-version = ['py38', 'py39', 'py310', 'py311', 'py312']
```
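### Import Sorting

Imports are sorted with isort (`isort doctra tests`), grouped as standard library, then third-party packages, then local `doctra` imports.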
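### Example Code

```python
"""Module docstring explaining purpose."""

# Standard library imports first
from typing import Optional, Union

# Third-party imports next
import numpy as np
from PIL import Image

# Local imports last (helper_function is illustrative)
from doctra.utils import helper_function


class MyParser:
    """Class docstring explaining purpose.

    Args:
        param1: Description of param1
        param2: Description of param2

    Attributes:
        attribute1: Description
    """

    def __init__(self, param1: str, param2: int = 10):
        """Initialize the parser."""
        self.param1 = param1
        self.param2 = param2

    def process(self, input_data: Union[str, np.ndarray]) -> Optional[Image.Image]:
        """Process input data.

        Args:
            input_data: Input to process

        Returns:
            Processed image or None

        Raises:
            ValueError: If input is invalid
        """
        if not self._validate(input_data):
            raise ValueError("Invalid input")
        return self._do_process(input_data)

    def _validate(self, data: Union[str, np.ndarray, None]) -> bool:
        """Private helper method."""
        return data is not None

    def _do_process(self, data: Union[str, np.ndarray]) -> Optional[Image.Image]:
        """Private helper that loads the input as a PIL image."""
        if isinstance(data, str):
            return Image.open(data)  # Treat strings as file paths
        return Image.fromarray(data)  # Treat arrays as pixel data
```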
## Testing

### Writing Tests

Create tests in the `tests/` directory:

```python
import pytest

from doctra.parsers import StructuredPDFParser


def test_parser_initialization():
    """Test parser can be initialized."""
    parser = StructuredPDFParser()
    assert parser is not None


def test_parse_basic_pdf():
    """Test parsing a basic PDF."""
    parser = StructuredPDFParser()
    result = parser.parse("test_data/sample.pdf")
    assert result is not None


@pytest.mark.parametrize("dpi", [100, 200, 300])
def test_different_dpi_settings(dpi):
    """Test parser with different DPI settings."""
    parser = StructuredPDFParser(dpi=dpi)
    assert parser.dpi == dpi
```
### Running Tests

```bash
# All tests
pytest

# Specific file
pytest tests/test_parsers.py

# Specific test
pytest tests/test_parsers.py::test_parser_initialization

# With verbose output
pytest -v

# With coverage
pytest --cov=doctra --cov-report=html

# Stop on first failure
pytest -x
```
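### Test Coverage

Aim for more than 80% code coverage. Measure it with `pytest --cov=doctra --cov-report=html`, which writes an HTML report to `htmlcov/`; add `--cov-fail-under=80` to make the run fail when coverage drops below that threshold.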
## Documentation

### Building Documentation

```bash
# Install documentation dependencies
pip install -r docs/requirements.txt

# Build and serve locally
mkdocs serve

# Build static site
mkdocs build
```

While `mkdocs serve` is running, view the site at http://127.0.0.1:8000.
### Writing Documentation
- Use Markdown for all documentation
- Add docstrings to all public APIs
- Include code examples
- Update relevant docs when adding features
### Docstring Format

We use Google-style docstrings:

```python
def function(param1: str, param2: int) -> bool:
    """Short description.

    Longer description if needed.

    Args:
        param1: Description of param1
        param2: Description of param2

    Returns:
        Description of return value

    Raises:
        ValueError: When param1 is invalid

    Examples:
        >>> function("test", 5)
        True
    """
    # Placeholder logic so the Raises and Examples sections hold
    if not param1:
        raise ValueError("param1 must not be empty")
    return len(param1) < param2
```
## Pull Request Guidelines

### Before Submitting

- Tests pass: `pytest`
- Code formatted: `black doctra tests`
- Imports sorted: `isort doctra tests`
- Linting clean: `flake8 doctra tests`
- Type checking passes: `mypy doctra`
- Documentation updated
- `CHANGELOG.md` updated
### PR Description Template

```markdown
## Description
Brief description of changes

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update

## Testing
Describe testing done

## Checklist
- [ ] Tests pass
- [ ] Code formatted
- [ ] Documentation updated
- [ ] CHANGELOG updated
```
### Review Process
- Automated checks run (tests, linting)
- Code review by maintainers
- Requested changes addressed
- Approved and merged
## Common Tasks

### Adding a New Parser

1. Create the parser file: `doctra/parsers/new_parser.py`
2. Implement the parser class
3. Add tests: `tests/test_new_parser.py`
4. Update `doctra/__init__.py`
5. Add documentation: `docs/user-guide/parsers/new-parser.md`
6. Add an API reference entry: `docs/api/parsers.md`
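A minimal sketch of the "Implement parser class" step, following the Code Style conventions (type hints, Google-style docstrings, early validation). Everything here is hypothetical: the class name `NewParser`, the `dpi` parameter (borrowed from the test examples), and the `parse()` return type are illustrative assumptions, not Doctra's actual parser interface. Check an existing parser in `doctra/parsers/` for the real base class and method names.

```python
from pathlib import Path
from typing import List, Union


class NewParser:
    """Hypothetical parser skeleton for doctra/parsers/new_parser.py.

    Args:
        dpi: Resolution used when rendering PDF pages (assumed parameter).

    Attributes:
        dpi: The configured rendering resolution.
    """

    def __init__(self, dpi: int = 200):
        """Initialize the parser, rejecting non-positive resolutions."""
        if dpi <= 0:
            raise ValueError("dpi must be positive")
        self.dpi = dpi

    def parse(self, pdf_path: Union[str, Path]) -> List[str]:
        """Parse a PDF file.

        Args:
            pdf_path: Path to the input PDF.

        Returns:
            Extracted content blocks (always empty in this skeleton).

        Raises:
            ValueError: If the path does not end in .pdf.
            FileNotFoundError: If the file does not exist.
        """
        path = Path(pdf_path)
        if path.suffix.lower() != ".pdf":
            raise ValueError(f"Expected a .pdf file, got: {path.name}")
        if not path.exists():
            raise FileNotFoundError(str(path))
        # Placeholder: real page rendering, layout detection and OCR go here.
        return []
```

Validating inputs up front keeps failures close to the caller, which also makes the matching tests in `tests/test_new_parser.py` simple to write.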
### Adding a New Feature
- Create feature branch
- Implement feature with tests
- Update documentation
- Submit PR with description
### Fixing a Bug
- Create test that reproduces bug
- Fix bug
- Verify test passes
- Submit PR referencing issue
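A minimal sketch of the first step, using a hypothetical helper `normalize_whitespace` (not part of Doctra) to stand in for the code being fixed. The regression test encodes the expected behavior, so it fails against the buggy version described in the docstring and passes once the fix is in place.

```python
def normalize_whitespace(text: str) -> str:
    """Hypothetical helper: collapse runs of whitespace into single spaces.

    A buggy version might have used text.replace("  ", " "), which leaves
    runs of three or more spaces only partially collapsed.
    """
    # Fixed implementation: split on any whitespace, rejoin with one space
    return " ".join(text.split())


def test_normalize_whitespace_collapses_long_runs():
    """Regression test: fails with the buggy replace() version."""
    assert normalize_whitespace("a    b") == "a b"


def test_normalize_whitespace_strips_edges():
    """Leading and trailing whitespace is removed as well."""
    assert normalize_whitespace("  a b  ") == "a b"
```

Keeping the regression test in the PR documents the bug and guards against it returning.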
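## Development Tools

### Pre-commit Hooks

Install the pre-commit hooks with `pre-commit install` (run `pip install pre-commit` first if the tool is not already installed). The hooks then run these checks before each commit:

- Black formatting
- isort import sorting
- Flake8 linting
- Trailing whitespace removal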
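### IDE Setup

#### VS Code

Recommended `settings.json`:

```json
{
    "python.formatting.provider": "black",
    "python.linting.enabled": true,
    "python.linting.flake8Enabled": true,
    "python.linting.mypyEnabled": true,
    "editor.formatOnSave": true,
    "[python]": {
        "editor.codeActionsOnSave": {
            "source.organizeImports": true
        }
    }
}
```

Newer releases of the VS Code Python extension moved formatting and linting into separate extensions (Black Formatter, Flake8, Mypy Type Checker), so the `python.formatting.*` and `python.linting.*` keys may be ignored; in that case, install those extensions and configure them instead.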
#### PyCharm
- Enable Black formatter
- Enable Flake8 linter
- Enable mypy type checker
## Getting Help
- Questions: Open a GitHub Discussion
- Bugs: Report in GitHub Issues
- Chat: Join our community (link in README)
## Code of Conduct
Please read and follow our Code of Conduct.
## License
By contributing, you agree that your contributions will be licensed under the MIT License.