Observability

Silkweb provides structured logging, optional Prometheus metrics, and a replay mode for deterministic debugging.

Structured logging

Silkweb uses structlog for structured, machine-readable logs.

Configuration

import silkweb

silkweb.configure(
    log_level="INFO",       # "DEBUG" | "INFO" | "WARNING" | "ERROR"
    log_format="json",      # "json" | "text"
)

Log events

All major operations emit structured log events:

Event                  When
fetch_start            A fetch request begins
fetch_complete         A fetch request completes
fetch_escalated        Tier escalation triggered
cache_hit              Cache lookup succeeded
cache_miss             Cache lookup missed
llm_call_start         An LLM call begins
llm_call_complete      An LLM call completes
extraction_complete    Data extraction finished
selector_cached        Selectors saved to cache
self_heal_triggered    Self-healer activated

Log fields

Each event includes contextual fields:

{
  "timestamp": "2025-04-30T12:00:00Z",
  "event": "fetch_complete",
  "url": "https://example.com",
  "tier": 1,
  "status_code": 200,
  "duration_ms": 234,
  "cache_hit": false
}
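
Because each event is a flat JSON object, logs written with log_format="json" can be summarized with standard-library Python. The sketch below assumes one JSON object per line and relies only on the event, tier, and duration_ms fields shown above. The helper name summarize_fetches and the sample lines are illustrative and not part of Silkweb.

```python
import json
from collections import defaultdict


def summarize_fetches(lines):
    """Return {tier: (count, mean_duration_ms)} for fetch_complete events."""
    totals = defaultdict(lambda: [0, 0])  # tier -> [count, total duration_ms]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("event") != "fetch_complete":
            continue  # ignore every other event type
        bucket = totals[event["tier"]]
        bucket[0] += 1
        bucket[1] += event["duration_ms"]
    return {tier: (count, total / count) for tier, (count, total) in totals.items()}


# Hypothetical sample lines in the shape shown above.
sample = [
    '{"event": "fetch_complete", "url": "https://example.com", "tier": 1, "duration_ms": 200}',
    '{"event": "fetch_complete", "url": "https://example.com/a", "tier": 1, "duration_ms": 300}',
    '{"event": "cache_hit", "url": "https://example.com"}',
]
summary = summarize_fetches(sample)
```

The same loop works on a log file by passing the open file object instead of the sample list.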

Prometheus metrics

When config.metrics_port is set, Silkweb exposes Prometheus-compatible metrics:

silkweb.configure(metrics_port=9090)

Available metrics

Metric                              Labels                    Description
silkweb_requests_total              tier, status, domain      Total fetch requests
silkweb_request_duration_seconds    tier, domain              Fetch request duration
silkweb_llm_calls_total             model, task               Total LLM calls
silkweb_llm_duration_seconds        model, task               LLM call duration
silkweb_cache_hits_total            layer                     Cache hits per layer
silkweb_blocks_total                domain, challenge_type    Bot detection blocks
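
For a quick check without a Prometheus server, the exported text can be parsed directly. The following is a minimal sketch of a parser for the Prometheus text exposition format that sums silkweb_requests_total by tier. The sample text and its label values are made up for illustration, the helper names are hypothetical, and the parser skips comment lines and ignores edge cases such as escaped quotes inside label values.

```python
import re
from collections import defaultdict

# name{label="value",...} value   (the label block is optional)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def requests_by_tier(text):
    """Sum silkweb_requests_total across all other labels, keyed by tier."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "silkweb_requests_total":
            totals[labels.get("tier", "")] += value
    return dict(totals)


# Hypothetical scrape output; real label values may differ.
sample_text = """\
# TYPE silkweb_requests_total counter
silkweb_requests_total{tier="1",status="200",domain="example.com"} 40
silkweb_requests_total{tier="2",status="200",domain="example.com"} 8
silkweb_requests_total{tier="1",status="403",domain="example.org"} 2
silkweb_cache_hits_total{layer="memory"} 17
"""
by_tier = requests_by_tier(sample_text)
```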

Note

Requires the prometheus_client package. Silkweb imports it lazily, so there is no overhead when metrics are disabled.
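
The lazy-import pattern mentioned in the note can be sketched as follows. This is an illustration of the general technique and not Silkweb's actual implementation. The function name start_metrics_server is hypothetical, while prometheus_client.start_http_server is the real call that package provides for serving metrics on a port.

```python
import importlib


def start_metrics_server(metrics_port=None):
    """Start a Prometheus exporter only when a port is configured."""
    if metrics_port is None:
        # Metrics disabled: prometheus_client is never imported.
        return None
    try:
        # Imported here, not at module top level, so the dependency stays optional.
        prometheus_client = importlib.import_module("prometheus_client")
    except ImportError as exc:
        raise RuntimeError(
            "metrics_port is set but the prometheus_client package is not installed"
        ) from exc
    prometheus_client.start_http_server(metrics_port)
    return prometheus_client


disabled = start_metrics_server()  # no port configured, so nothing is imported
```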

Replay mode

Save raw HTML and metadata on every fetch, then replay extraction without network calls. This makes extraction runs reproducible for debugging and CI testing.

Recording

silkweb.configure(replay_dir="./silkweb_replays")

# All fetches are now saved to disk alongside normal operation
data = silkweb.ask("https://example.com", "products")

Replaying

# Replay extraction from saved HTML — no network calls
result = silkweb.replay("./silkweb_replays/session_2025-04-30.silkweb")

The session file is JSON and must include a non-empty html_path: the basename of the saved HTML file, which must sit next to the .silkweb file. Missing keys or a missing HTML file raise SilkwebError. Note that silkweb.replay is not replay_session, which replays Playwright actions from ~/.silkweb/sessions/ (see Sessions).
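
The html_path rule can be checked before replaying, for example as a pre-flight step in CI. The sketch below is a hypothetical helper, not part of Silkweb. It checks only the html_path key described above, since other required keys are not listed here, and it raises ValueError rather than SilkwebError so that it runs without Silkweb installed.

```python
import json
from pathlib import Path


def check_session_file(session_path):
    """Return the path of the saved HTML referenced by a .silkweb session file."""
    session_path = Path(session_path)
    session = json.loads(session_path.read_text(encoding="utf-8"))
    html_name = session.get("html_path")
    if not html_name:
        raise ValueError(f"{session_path.name}: html_path is missing or empty")
    # html_path is a basename, so resolve it next to the session file.
    html_file = session_path.parent / Path(html_name).name
    if not html_file.is_file():
        raise ValueError(f"{session_path.name}: saved HTML not found: {html_file.name}")
    return html_file
```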

Use cases

  • Debugging: reproduce extraction issues without hitting the live site
  • CI testing: run extraction tests against saved HTML fixtures
  • Development: iterate on extraction prompts/schemas offline
  • Auditing: keep a record of what HTML was seen and what was extracted
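
For the CI use case, one option is a small driver that replays every saved session in a directory. The sketch below assumes session files sit directly in the replay directory with a .silkweb extension, as in the example path above. The helper replay_all is hypothetical, and the replay function is passed in as an argument so the driver can be exercised with a stub.

```python
from pathlib import Path


def replay_all(replay_dir, replay_fn):
    """Run replay_fn on every .silkweb file in replay_dir, sorted by name."""
    results = {}
    for session_file in sorted(Path(replay_dir).glob("*.silkweb")):
        results[session_file.name] = replay_fn(str(session_file))
    return results
```

In a test suite this would be called as replay_all("./silkweb_replays", silkweb.replay), with assertions on each returned result.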