Cache
Silkweb uses a three-layer caching system to minimize network requests and LLM calls.
Selector cache key (how reuse works)
The selector cache stores synthesized selector sets under a key derived from:
- domain: the URL hostname (e.g. books.toscrape.com)
- DOM skeleton hash: a stable fingerprint of the page’s tag nesting (ignores text/attributes)
- schema signature: a signature of the schema’s field names/types, so selectors compiled for one schema aren’t reused for a different schema
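As a rough illustration of how these three components could combine into one key, here is a minimal sketch. The function names `schema_signature` and `selector_cache_key`, the dict-of-type-names schema representation, and the truncated SHA-256 digest are all assumptions made for this example, not Silkweb's actual implementation; the skeleton hash is taken as an already-computed string.

```python
import hashlib
import json
from urllib.parse import urlsplit


def schema_signature(fields: dict[str, str]) -> str:
    """Hypothetical signature: hash of the sorted (field name, type name) pairs."""
    # Sorting makes the signature independent of field declaration order.
    canonical = json.dumps(sorted(fields.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def selector_cache_key(
    url: str, skeleton_hash: str, fields: dict[str, str]
) -> tuple[str, str, str]:
    """Hypothetical key: (domain, DOM skeleton hash, schema signature)."""
    domain = urlsplit(url).hostname or ""
    return (domain, skeleton_hash, schema_signature(fields))


# Two pages on the same host with the same skeleton and schema share a key.
key_a = selector_cache_key(
    "https://books.toscrape.com/catalogue/page-1.html",
    "abc123",
    {"title": "str", "price": "float"},
)
key_b = selector_cache_key(
    "https://books.toscrape.com/catalogue/page-2.html",
    "abc123",
    {"price": "float", "title": "str"},
)
```

Under these assumptions, changing a field name or type produces a different signature, so selectors synthesized for one schema are never looked up for another.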
Cache Manager
CacheManager (dataclass)
HTTP Cache (Layer 1)
HttpCache (dataclass)
HttpCache(enabled: bool = True, backend: HttpBackend = 'sqlite', ttl_s: float | None = None, max_size_bytes: int | None = None, redis_url: str | None = None, sqlite_path: str | None = None)
HTTP cache via hishel for httpx.
Notes:
- Conditional GET (ETag/Last-Modified) is handled by hishel.
- TTL is implemented through AsyncSqliteStorage(default_ttl=...).
- max_size_bytes is best-effort; the hishel storage does not currently enforce it directly.
Rendered Page Cache (Layer 2)
RenderedPageCache (dataclass)
RenderedPageCache(backend: PageBackend = 'sqlite', sqlite_path: str | None = None, ttl_seconds: int | None = None, redis_url: str | None = None, _mem_pages: dict[tuple[str, str], dict[str, Any]] | None = None, _mem_last: dict[str, str] | None = None, _mem_timestamps: dict[tuple[str, str], datetime] | None = None)
Selector Cache (Layer 3)
SelectorCache
dom_skeleton_hash
Hash of DOM "skeleton": tag names + nesting only (no attrs, no text).
This is designed to be stable across content changes for the same template.
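To make the idea concrete, here is a minimal sketch of a skeleton hash built with the standard library's `html.parser`. The name `sketch_skeleton_hash`, the token format, and the choice of SHA-256 are assumptions for illustration; the actual `dom_skeleton_hash` may parse and normalize the page differently.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class _SkeletonBuilder(HTMLParser):
    """Collects tag open/close tokens; attributes and text are discarded."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is ignored on purpose: only the tag name contributes.
        self.tokens.append(f"<{tag}/>" if tag in VOID_TAGS else f"<{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> is recorded like a void tag.
        self.tokens.append(f"<{tag}/>")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.tokens.append(f"</{tag}>")


def sketch_skeleton_hash(html: str) -> str:
    """Hypothetical skeleton hash: SHA-256 over the sequence of tag tokens."""
    builder = _SkeletonBuilder()
    builder.feed(html)
    builder.close()
    return hashlib.sha256("".join(builder.tokens).encode("utf-8")).hexdigest()


page_a = '<div class="book"><p>A Light in the Attic</p><br></div>'
page_b = '<div id="item-2"><p>Tipping the Velvet</p><br/></div>'
same_template = sketch_skeleton_hash(page_a) == sketch_skeleton_hash(page_b)
```

In this naive version, two pages hash identically only when their tag sequences match exactly, so a listing with a different number of repeated items would produce a different hash. Whether the real implementation collapses such repetition is not stated here.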