Configuration
Set the environment variable SILKWEB_STRICT_CONFIG to 1, true, or yes to make configure(...) raise SilkwebConfigError on unknown top-level keys instead of storing them in extra.
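A minimal sketch of how such an environment flag can be read. It assumes values are compared case-insensitively after trimming whitespace; the docs above list only 1, true, and yes, so that tolerance is an assumption. The helper name strict_config_enabled is hypothetical and not part of Silkweb's API.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Accepted values come from the Silkweb docs; the strip()/lower() normalization
# below is an assumption of this sketch, not confirmed Silkweb behavior.
_TRUTHY = {"1", "true", "yes"}


def strict_config_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True if SILKWEB_STRICT_CONFIG requests strict checking."""
    env = os.environ if environ is None else environ
    value = env.get("SILKWEB_STRICT_CONFIG", "")
    return value.strip().lower() in _TRUTHY


print(strict_config_enabled({"SILKWEB_STRICT_CONFIG": "yes"}))  # True
print(strict_config_enabled({}))  # False
```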
SilkwebConfig (dataclass)
SilkwebConfig(
    cleaner_model: str = 'ollama/reader-lm-v2',
    schema_model: str = 'ollama/qwen2.5-coder:14b',
    extraction_model: str = 'ollama/qwen2.5:14b',
    selector_model: str = 'ollama/qwen2.5-coder:14b',
    embedding_model: str = 'ollama/nomic-embed-text',
    vision_model: str | None = None,
    default_tier: str | int = 'auto',
    max_tier: int = 3,
    auto_escalate: bool = True,
    timeout: int = 30000,
    llm_timeout_ms: int = 120000,
    user_agent: str = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
    impersonate: str = 'chrome_124',
    headers: dict[str, str] = dict(),
    chunk_strategy: str = 'bm25',
    max_tokens_per_chunk: int = 8000,
    representation: str = 'auto',
    extraction_html_max_chars: int = 400000,
    extraction_prompt_body_max_chars: int = 100000,
    extraction_max_tokens: int = 8192,
    extraction_markdown_max_chars: int = 400000,
    include_provenance: bool = True,
    force_llm: bool = False,
    hydration_first: bool = True,
    hydration_subset: bool = True,
    hydration_max_chars: int = 80000,
    cache_enabled: bool = True,
    cache_backend: str = 'sqlite',
    cache_path: str = '~/.silkweb/cache',
    redis_url: str | None = None,
    http_cache_ttl: int = 3600,
    page_cache_ttl: int = 1800,
    selector_cache_ttl: int | None = None,
    proxies: list[str] = list(),
    proxy_rotation: str = 'on_failure',
    rate_limit_global: int | None = None,
    rate_limit_per_domain: int = 2,
    respect_robots: bool = True,
    max_retries: int = 3,
    retry_backoff: str = 'exponential',
    retry_backoff_base: int = 2,
    prefer_nodriver: bool = False,
    human_mouse: bool = False,
    human_typing: bool = False,
    captcha_solver: str | None = None,
    default_output_format: str = 'python',
    auto_detect_dataframe: bool = True,
    log_level: str = 'WARNING',
    log_format: str = 'text',
    metrics_port: int | None = None,
    telemetry_enabled: bool = False,
    replay_dir: str | None = None,
    extra: dict[str, Any] = dict(),
)
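The dict() and list() defaults shown for headers, proxies, and extra most likely render field(default_factory=...) declarations, since a dataclass rejects a plain mutable default such as {} or []. The trimmed MiniConfig below is a hypothetical stand-in that copies only a few fields from the signature; it illustrates the practical consequence that each instance gets its own container.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical subset of SilkwebConfig. Writing `headers: dict[str, str] = {}`
# here would raise ValueError at class creation, so default_factory is used.
@dataclass
class MiniConfig:
    timeout: int = 30000
    max_tier: int = 3
    headers: dict[str, str] = field(default_factory=dict)
    proxies: list[str] = field(default_factory=list)
    extra: dict[str, Any] = field(default_factory=dict)


a = MiniConfig()
b = MiniConfig()
a.headers["Accept-Language"] = "en-US"  # mutate only a's dict
print(b.headers)  # {}
```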
get_config
configure
Update the global Silkweb configuration.
Known fields are set on SilkwebConfig; unknown keys go into extra.
When the environment variable SILKWEB_STRICT_CONFIG is 1, true, or yes, unknown top-level keys raise SilkwebConfigError instead of being stored in extra. This helps catch typos such as configure(timeouts=30).
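The routing rule described above can be sketched as follows. This is a hypothetical stand-in, not Silkweb's implementation: MiniConfig mirrors only three SilkwebConfig fields, configure_sketch receives the config object and a strict flag explicitly instead of updating global state and reading SILKWEB_STRICT_CONFIG, and SilkwebConfigError is a local placeholder class whose real base class is not documented here.

```python
from dataclasses import dataclass, field, fields
from typing import Any


class SilkwebConfigError(ValueError):
    """Local placeholder for Silkweb's error type."""


@dataclass
class MiniConfig:
    # Hypothetical subset of SilkwebConfig fields.
    timeout: int = 30000
    max_retries: int = 3
    extra: dict[str, Any] = field(default_factory=dict)


def configure_sketch(cfg: MiniConfig, strict: bool = False, **kwargs: Any) -> MiniConfig:
    """Set known fields on cfg; route unknown keys to cfg.extra (or reject them)."""
    known = {f.name for f in fields(cfg)}
    unknown = sorted(k for k in kwargs if k not in known)
    if strict and unknown:
        # Sketch choice: fail before mutating anything, so a typo leaves cfg intact.
        raise SilkwebConfigError(f"unknown config keys: {unknown}")
    for key, value in kwargs.items():
        if key in known:
            setattr(cfg, key, value)
        else:
            cfg.extra[key] = value
    return cfg


cfg = configure_sketch(MiniConfig(), timeout=10000, timeouts=30)
print(cfg.timeout)  # 10000
print(cfg.extra)    # {'timeouts': 30}

try:
    configure_sketch(MiniConfig(), strict=True, timeouts=30)
except SilkwebConfigError as exc:
    print(exc)  # unknown config keys: ['timeouts']
```

In non-strict mode the misspelled timeouts silently lands in extra, which is exactly the kind of mistake strict mode is meant to surface. Whether the real configure validates all keys before applying any of them is not stated in the docs; the all-or-nothing check here is only one reasonable design.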