SilkPage
The central object returned by every fetch operation.
SilkPage
SilkPage(html: str, *, url: str = '', status: int = 200, headers: dict[str, str] | None = None, metadata: dict[str, Any] | None = None, fetch_tier: int = 0)
Source code in silkweb/parse/page.py
xpath
Run an XPath expression against the page root.
- kind="elements" returns SilkElement wrappers (default). Use for node paths.
- kind="values" returns raw values (e.g. //@href, /text()), not elements.
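The split between the two `kind` modes can be illustrated with a minimal standard-library sketch. It is not the silkweb implementation: `xpath_sketch` and `ElementWrapper` are hypothetical names, and `xml.etree.ElementTree` supports only a small XPath subset, so the trailing `/@attr` and `/text()` steps are handled by hand here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ElementWrapper:
    """Hypothetical stand-in for SilkElement: wraps one parsed node."""

    node: ET.Element

    @property
    def text(self) -> str:
        # Concatenate all descendant text and trim surrounding whitespace.
        return "".join(self.node.itertext()).strip()


def xpath_sketch(root: ET.Element, expr: str, kind: str = "elements") -> list:
    # ElementTree rejects absolute paths, so "//a" is rewritten to ".//a".
    path = "." + expr if expr.startswith("/") else expr
    if kind == "elements":
        return [ElementWrapper(node) for node in root.findall(path)]
    if kind == "values":
        # Split off the final step, e.g. ".//a/@href" -> (".//a", "@href").
        head, _, tail = path.rpartition("/")
        nodes = root.findall(head)
        if tail.startswith("@"):
            name = tail[1:]
            return [n.get(name) for n in nodes if n.get(name) is not None]
        if tail == "text()":
            return [n.text or "" for n in nodes]
        raise ValueError("values mode expects a trailing /@attr or /text() step")
    raise ValueError(f"unknown kind: {kind!r}")


root = ET.fromstring(
    '<html><body><a href="/a">A</a><a href="/b">B</a></body></html>'
)
wrapped = xpath_sketch(root, "//a")                   # element wrappers
hrefs = xpath_sketch(root, "//a/@href", kind="values")  # raw attribute strings
```

In this sketch, node paths yield wrapper objects that can be queried further, while attribute and text paths yield plain strings.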
links
Return all links as absolute URLs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| external | bool \| None | None returns all links, True returns only external, False returns only internal (same-domain). | None |
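The three-way behavior of `external` can be sketched with the standard library alone. This illustrates the filtering rule and is not silkweb's code: `links_sketch` and `_HrefCollector` are hypothetical names, and "same-domain" is assumed here to mean an exact host match, which the real method may define differently.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _HrefCollector(HTMLParser):
    """Collects the raw href value of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_sketch(html: str, base_url: str, external: bool | None = None) -> list[str]:
    collector = _HrefCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    result = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        # Assumption: a link is external when its host differs from the page's.
        is_external = urlparse(absolute).netloc != base_host
        if external is None or external == is_external:
            result.append(absolute)
    return result


page_html = (
    '<a href="/about">About</a>'
    '<a href="https://other.example/x">X</a>'
    '<a href="docs/">Docs</a>'
)
base = "https://example.com/blog/"
all_links = links_sketch(page_html, base)
external_only = links_sketch(page_html, base, external=True)
internal_only = links_sketch(page_html, base, external=False)
```

Using `None` as the default keeps "no filter" distinct from both boolean cases, which matches the parameter description in the table above.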
network_requests
Return captured network events (browser tiers only, when enabled).
This is populated by tier 2/3 fetchers when capture_network=True.
hydration_source
Which hydration script produced JSON, if any (before JSON parse).
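As a rough illustration of identifying a hydration source without parsing the JSON, the sketch below checks the raw HTML against two widely used framework conventions. The patterns, the labels, and the function name `hydration_source_sketch` are assumptions for this example; the sources silkweb actually recognizes are not listed on this page.

```python
from __future__ import annotations

import re

# Assumed patterns based on common framework conventions (illustrative only).
_PATTERNS = [
    (
        "next_data",
        re.compile(r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>', re.S),
    ),
    (
        "initial_state",
        re.compile(
            r"window\.__INITIAL_STATE__\s*=\s*(\{.*?\})\s*;?\s*</script>", re.S
        ),
    ),
]


def hydration_source_sketch(html: str) -> str | None:
    """Return a label for the first matching hydration script, else None.

    The JSON payload is only located, never parsed, at this stage.
    """
    for label, pattern in _PATTERNS:
        if pattern.search(html):
            return label
    return None


next_page = '<script id="__NEXT_DATA__" type="application/json">{"props":{}}</script>'
state_page = '<script>window.__INITIAL_STATE__ = {"a": 1};</script>'
plain_page = "<p>No hydration data here.</p>"
```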
detect_records
Heuristic repeated-record detection (no LLM).
For now: find the most repeated (tag, class) among elements under
, and turn each into a small record dict.
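The heuristic described above can be sketched as follows. This is a simplified stand-in, not the library's implementation: `detect_records_sketch` is a hypothetical name, the search covers every element below the parsed root, only elements with a class attribute are counted, and the record fields (`tag`, `class`, `text`) are assumptions chosen for the example.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter


def detect_records_sketch(markup: str) -> list[dict[str, str]]:
    root = ET.fromstring(markup)
    # Consider every element below the root that carries a class attribute.
    candidates = [el for el in root.iter() if el is not root and el.get("class")]
    if not candidates:
        return []
    counts = Counter((el.tag, el.get("class")) for el in candidates)
    (tag, cls), count = counts.most_common(1)[0]
    if count < 2:
        return []  # nothing repeats, so there is no record pattern
    records = []
    for el in candidates:
        if (el.tag, el.get("class")) == (tag, cls):
            # Collapse all descendant text into one whitespace-normalized string.
            text = " ".join("".join(el.itertext()).split())
            records.append({"tag": tag, "class": cls, "text": text})
    return records


sample = (
    "<html><body>"
    '<div class="nav">Home</div>'
    "<ul>"
    '<li class="item"><b>Apple</b> $1</li>'
    '<li class="item"><b>Pear</b> $2</li>'
    '<li class="item"><b>Plum</b> $3</li>'
    "</ul>"
    "</body></html>"
)
records = detect_records_sketch(sample)
```

Counting (tag, class) pairs is cheap and needs no model, at the cost of missing records that share no class name.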
SilkElement
SilkMeta
dataclass
SilkMeta(url: str, fetched_at: datetime, fetch_tier: int, xpath: str, llm_model: str | None = None, selector_from_cache: bool | None = None, confidence: float | None = None)
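To show which fields are required and which default to `None`, here is a stand-in dataclass that mirrors the signature above. `MetaSketch` is a hypothetical name used so the example runs without silkweb installed, and the field values are made up for illustration; what each field means is not documented on this page.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MetaSketch:
    """Hypothetical stand-in with the same fields as the SilkMeta signature."""

    url: str
    fetched_at: datetime
    fetch_tier: int
    xpath: str
    llm_model: str | None = None
    selector_from_cache: bool | None = None
    confidence: float | None = None


# Only the first four fields are required; the rest fall back to None.
meta = MetaSketch(
    url="https://example.com/",
    fetched_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    fetch_tier=0,
    xpath="//h1",
)
```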