Configuration

Set SILKWEB_STRICT_CONFIG=1 (or true / yes; the value is trimmed and compared case-insensitively) in the environment to make configure(...) raise SilkwebConfigError on unknown top-level keys instead of storing them in extra.

SilkwebConfig dataclass

SilkwebConfig(
    cleaner_model: str = 'ollama/reader-lm-v2',
    schema_model: str = 'ollama/qwen2.5-coder:14b',
    extraction_model: str = 'ollama/qwen2.5:14b',
    selector_model: str = 'ollama/qwen2.5-coder:14b',
    embedding_model: str = 'ollama/nomic-embed-text',
    vision_model: str | None = None,
    default_tier: str | int = 'auto',
    max_tier: int = 3,
    auto_escalate: bool = True,
    timeout: int = 30000,
    llm_timeout_ms: int = 120000,
    user_agent: str = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
    impersonate: str = 'chrome_124',
    headers: dict[str, str] = dict(),
    chunk_strategy: str = 'bm25',
    max_tokens_per_chunk: int = 8000,
    representation: str = 'auto',
    extraction_html_max_chars: int = 400000,
    extraction_prompt_body_max_chars: int = 100000,
    extraction_max_tokens: int = 8192,
    extraction_markdown_max_chars: int = 400000,
    include_provenance: bool = True,
    force_llm: bool = False,
    hydration_first: bool = True,
    hydration_subset: bool = True,
    hydration_max_chars: int = 80000,
    cache_enabled: bool = True,
    cache_backend: str = 'sqlite',
    cache_path: str = '~/.silkweb/cache',
    redis_url: str | None = None,
    http_cache_ttl: int = 3600,
    page_cache_ttl: int = 1800,
    selector_cache_ttl: int | None = None,
    proxies: list[str] = list(),
    proxy_rotation: str = 'on_failure',
    rate_limit_global: int | None = None,
    rate_limit_per_domain: int = 2,
    respect_robots: bool = True,
    max_retries: int = 3,
    retry_backoff: str = 'exponential',
    retry_backoff_base: int = 2,
    prefer_nodriver: bool = False,
    human_mouse: bool = False,
    human_typing: bool = False,
    captcha_solver: str | None = None,
    default_output_format: str = 'python',
    auto_detect_dataframe: bool = True,
    log_level: str = 'WARNING',
    log_format: str = 'text',
    metrics_port: int | None = None,
    telemetry_enabled: bool = False,
    replay_dir: str | None = None,
    extra: dict[str, Any] = dict(),
)
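
The dict() and list() defaults shown for headers, proxies, and extra are most likely how the documentation generator renders field(default_factory=...), since a dataclass rejects a plain mutable default. The sketch below illustrates that pattern on MiniConfig, a hypothetical two-field stand-in that is not part of Silkweb; the real field definitions in silkweb/config.py are not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass
class MiniConfig:
    """Hypothetical stand-in with two of the container fields from SilkwebConfig."""

    # default_factory builds a fresh container for every instance.
    headers: dict[str, str] = field(default_factory=dict)
    proxies: list[str] = field(default_factory=list)


a = MiniConfig()
b = MiniConfig()

# Mutating one instance's headers does not leak into the other.
a.headers["Accept-Language"] = "en"
```

Under that assumption, each SilkwebConfig instance would own its headers, proxies, and extra containers rather than sharing them.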

get_config

get_config() -> SilkwebConfig
Source code in silkweb/config.py
def get_config() -> SilkwebConfig:
    return _CONFIG
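
Because get_config() returns the module-level _CONFIG object rather than a copy, every caller sees the same instance, and attribute changes made through one reference are visible through all others. A minimal sketch of that behaviour, using a hypothetical MiniConfig stand-in instead of the real SilkwebConfig:

```python
from dataclasses import dataclass


@dataclass
class MiniConfig:
    """Hypothetical one-field stand-in for SilkwebConfig."""

    timeout: int = 30000


# Module-level singleton, mirroring the _CONFIG global in the source above.
_CONFIG = MiniConfig()


def get_config() -> MiniConfig:
    return _CONFIG


first = get_config()
second = get_config()

# Both names refer to the same object, so this change is shared.
first.timeout = 5000
same_object = first is second
seen_elsewhere = second.timeout
```

If an isolated snapshot is needed, copying the returned object (for example with dataclasses.replace) is one option; whether Silkweb offers its own helper for this is not stated here.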

configure

configure(**kwargs: Any) -> SilkwebConfig

Update global Silkweb configuration.

Known fields are set on SilkwebConfig; unknown keys go into extra.

When the environment variable SILKWEB_STRICT_CONFIG is 1 / true / yes, unknown top-level keys raise SilkwebConfigError instead of being stored in extra (helps catch typos like configure(timeouts=30)).

Source code in silkweb/config.py
def configure(**kwargs: Any) -> SilkwebConfig:
    """
    Update global Silkweb configuration.

    Known fields are set on :class:`SilkwebConfig`; unknown keys go into ``extra``.

    When environment variable ``SILKWEB_STRICT_CONFIG`` is ``1`` / ``true`` / ``yes``,
    unknown **top-level** keys raise :class:`SilkwebConfigError` instead of being stored
    in ``extra`` (helps catch typos like ``configure(timeouts=30)``).
    """
    strict = os.environ.get("SILKWEB_STRICT_CONFIG", "").strip().lower() in ("1", "true", "yes")
    for key, value in kwargs.items():
        if hasattr(_CONFIG, key):
            setattr(_CONFIG, key, value)
        else:
            if strict:
                raise SilkwebConfigError(
                    message=f"Unknown SilkwebConfig field {key!r}. "
                    f"Set SILKWEB_STRICT_CONFIG=0 or use `extra` via supported keys only.",
                    key=key,
                    value=value,
                )
            _CONFIG.extra[key] = value
    return _CONFIG