Silkweb

The LLM-native Python web scraping library. Fetch anything. Extract everything. No selectors required.

Three lines. Any website. Structured data.

Ask a question, get a table

import silkweb

stories = silkweb.ask("https://news.ycombinator.com", "top 10 stories with title, score, author")
# [{'title': 'Show HN: ...', 'score': 312, 'author': 'pg'}, ...]

Typed extraction with Pydantic

from pydantic import BaseModel
from silkweb import extract

class Product(BaseModel):
    name: str
    price: float
    rating: float

products = extract("https://books.toscrape.com", schema=Product, prompt="all books")
# [Product(name='A Light in the Attic', price=51.77, rating=3.0), ...]

SilkQL: a query language for the web

import silkweb

results = silkweb.query("https://github.com/trending", """
{
    repos[] {
        name
        author
        stars(int)
        language
        description(optional)
    }
}
""")

Why Silkweb?

Capability	Traditional approach	Silkweb
Fetch a page	`requests.get(url)`	`silkweb.fetch(url)` — auto-selects HTTP, stealth HTTP, or browser
Parse data	Write CSS/XPath selectors	Describe what you want in plain English
Handle JS	Manually configure Playwright	Automatic, transparent escalation
Bypass Cloudflare	Multiple plugins, trial and error	Built-in auto-escalating tiers
LLM extraction	No support	First-class, runs locally with Ollama
Output typing	Manual Pydantic boilerplate	Schema inferred or user-provided
Cache LLM calls	Not applicable	Synthesized selectors persist; repeat visits can reuse cached selectors when the layout still matches
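The "auto-escalating tiers" rows in the table can be pictured as a loop that tries the cheapest fetcher first and moves up only when a response looks blocked. This is a minimal sketch of that general idea, not Silkweb's implementation: `FetchResult`, `looks_blocked`, `fetch_with_escalation`, the blocking heuristic, and the stub fetchers are all hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FetchResult:
    tier: str
    status: int
    body: str


Fetcher = Callable[[str], FetchResult]


def looks_blocked(result: FetchResult) -> bool:
    # Assumed heuristic: common block/challenge status codes or an empty body.
    return result.status in (403, 429, 503) or not result.body.strip()


def fetch_with_escalation(url: str, tiers: List[Tuple[str, Fetcher]]) -> FetchResult:
    """Try each tier in order and return the first response that is not blocked.

    If every tier looks blocked, the last tier's result is returned.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    result = None
    for _name, fetcher in tiers:
        result = fetcher(url)
        if not looks_blocked(result):
            break
    return result


# Stub fetchers standing in for real network tiers (no requests are made).
def plain_http(url: str) -> FetchResult:
    return FetchResult("http", 403, "")


def stealth_http(url: str) -> FetchResult:
    return FetchResult("stealth_http", 200, "<html><body>ok</body></html>")


def browser(url: str) -> FetchResult:
    return FetchResult("browser", 200, "<html><body>ok</body></html>")


result = fetch_with_escalation(
    "https://example.com",
    [("http", plain_http), ("stealth_http", stealth_http), ("browser", browser)],
)
print(result.tier)
```

In this stubbed run the plain HTTP tier is treated as blocked, so the second tier answers and the browser tier is never started. That ordering is the point of escalation: the expensive tier runs only when the cheaper ones fail.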

The key insight

When Silkweb first encounters a page template, it uses an LLM to understand the structure and synthesize robust CSS/XPath selectors. Those selectors are cached (keyed by domain, a structural skeleton hash, and your schema fields). When a later page matches that cache entry, extraction can skip LLM work and run selector-based extraction instead. If the layout drifts or the cache misses, the pipeline may call an LLM again.
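The caching scheme described above can be sketched with the standard library alone. This is an assumption-laden illustration rather than Silkweb's code: the "structural skeleton hash" is approximated here by hashing the sequence of tag names, and `skeleton_hash`, `cache_key`, `selectors_for`, and the `synthesize` callback (a stand-in for the LLM call) are hypothetical names.

```python
import hashlib
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple


class _TagCollector(HTMLParser):
    """Records start-tag names only, ignoring text and attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def skeleton_hash(html: str) -> str:
    # Assumed approximation: two pages share a skeleton if their tag sequences match.
    collector = _TagCollector()
    collector.feed(html)
    return hashlib.sha256(">".join(collector.tags).encode("utf-8")).hexdigest()[:16]


CacheKey = Tuple[str, str, Tuple[str, ...]]


def cache_key(domain: str, html: str, fields: List[str]) -> CacheKey:
    # Keyed by domain, skeleton hash, and schema fields (sorted so order is irrelevant).
    return (domain, skeleton_hash(html), tuple(sorted(fields)))


selector_cache: Dict[CacheKey, Dict[str, str]] = {}


def selectors_for(
    domain: str,
    html: str,
    fields: List[str],
    synthesize: Callable[[str, List[str]], Dict[str, str]],
) -> Dict[str, str]:
    """Return cached selectors, calling `synthesize` only on a cache miss."""
    key = cache_key(domain, html, fields)
    if key not in selector_cache:
        selector_cache[key] = synthesize(html, fields)
    return selector_cache[key]
```

Because text content is excluded from the hash, two listing pages with different items but the same markup map to the same key, which is what lets a repeat visit skip the LLM. A layout change alters the tag sequence, produces a new key, and triggers synthesis again.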

Installation

Minimal:

pip install silkweb

With browser:

pip install "silkweb[browser]"
playwright install chromium

Full install:

pip install "silkweb[all]"

What's next?

Quick Start

Go from pip install to your first extraction in 5 minutes.

Fetcher Tiers

Learn how Silkweb auto-escalates from HTTP to stealth browser.

LLM Extraction

Understand the clean → schema → extract → cache pipeline.

SilkQL

Write structured queries for the web.
