Hidden API Discovery
Hidden API discovery is one of Silkweb's most powerful features: instead of scraping the DOM of a JavaScript-heavy page, Silkweb discovers the underlying JSON API the page calls so that you can use that API directly.
How it works
- Silkweb loads the page with Playwright and listens for responses
- Filters to XHR/fetch requests whose Content-Type includes application/json
- For each JSON endpoint, it records: URL, method, headers, response schema
- Deduplicates identical (method, URL, request body) captures before codegen
- Detects pagination patterns (page/offset/cursor parameters)
- Detects auth-related headers (Authorization, x-api-key, x-auth-token); values are redacted in the scaffold
- Generates a standalone Python scraper using httpx
The generated scraper makes direct HTTP calls, which is typically 10-100x faster than DOM scraping.
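The capture, filter, and deduplication steps above can be sketched in a few lines. This is a minimal illustration, not Silkweb's actual implementation: `Recorder`, `is_json_api_response`, `capture_key`, and `fake_response` are hypothetical names. The attributes read from each response (`url`, `headers`, `request.resource_type`, `request.method`, `request.post_data`) are the ones Playwright's Python `Response` and `Request` objects expose, and the demo uses stand-in objects so it runs without a browser.

```python
from types import SimpleNamespace  # only used to build the demo objects below


def is_json_api_response(response) -> bool:
    """True for XHR/fetch responses whose Content-Type includes application/json."""
    if response.request.resource_type not in ("xhr", "fetch"):
        return False
    # Playwright exposes response header names in lower case.
    content_type = response.headers.get("content-type", "")
    return "application/json" in content_type.lower()


def capture_key(response) -> tuple:
    """Identity used for deduplication: (method, URL, request body)."""
    request = response.request
    return (request.method, response.url, request.post_data)


class Recorder:
    """Collects unique JSON API responses (hypothetical helper)."""

    def __init__(self):
        self.captures = {}

    def on_response(self, response) -> None:
        if is_json_api_response(response):
            # setdefault keeps the first capture for each key and drops repeats.
            self.captures.setdefault(capture_key(response), response)


def fake_response(url, resource_type, content_type, method="GET", body=None):
    """Build a stand-in with the attributes the recorder reads."""
    request = SimpleNamespace(resource_type=resource_type, method=method, post_data=body)
    return SimpleNamespace(url=url, headers={"content-type": content_type}, request=request)


recorder = Recorder()
for resp in [
    fake_response("https://api.example.com/v2/products?page=1", "fetch", "application/json; charset=utf-8"),
    fake_response("https://api.example.com/v2/products?page=1", "fetch", "application/json; charset=utf-8"),
    fake_response("https://example.com/store", "document", "text/html"),
    fake_response("https://example.com/app.js", "script", "application/javascript"),
]:
    recorder.on_response(resp)

print(len(recorder.captures))  # 1
```

With a real Playwright page, the same handler would be registered with `page.on("response", recorder.on_response)` before calling `page.goto(url)`.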
Basic usage
import silkweb
api_info = silkweb.discover_api("https://example.com/store")
print(api_info.endpoints)
# [
#     {
#         'url': 'https://api.example.com/v2/products?page=1&limit=24',
#         'method': 'GET',
#         'headers': {'x-api-token': '...'},
#         'response_schema': {'items': [...], 'total': 1234},
#         'pagination': 'cursor',
#     }
# ]
print(api_info.generated_scraper)
# Python code using httpx to call each discovered endpoint
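To give a rough idea of what the core of such a scraper might look like, here is a hand-written sketch, not actual Silkweb output. It assumes the endpoint accepts the `page` and `limit` query parameters seen in the sample URL and returns the `items` and `total` keys from the sample schema. `fetch_all_products` and `max_pages` are hypothetical names.

```python
def fetch_all_products(client, url, limit=24, max_pages=100):
    """Walk a page-numbered JSON endpoint until every item has been read.

    `client` is an httpx.Client, or any object with a compatible .get().
    """
    items = []
    for page in range(1, max_pages + 1):
        response = client.get(url, params={"page": page, "limit": limit})
        response.raise_for_status()
        payload = response.json()
        batch = payload.get("items", [])
        items.extend(batch)
        total = payload.get("total")
        # Stop on an empty page, or once the reported total has been reached.
        if not batch or (total is not None and len(items) >= total):
            break
    return items


# Usage with httpx (third-party; the header value is a placeholder):
#   import httpx
#   with httpx.Client(headers={"x-api-token": "<your-token>"}, timeout=10.0) as client:
#       products = fetch_all_products(client, "https://api.example.com/v2/products")
```

Taking the client as a parameter keeps the paging logic separate from connection setup, so the function can be exercised with a stub client before pointing it at a live endpoint.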
Generating a scraper file
Saving the generated code to a file produces a self-contained Python script that replicates the API calls without needing a browser.
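One way to do this by hand is to write the generated source to disk yourself. The sketch below assumes `api_info.generated_scraper` holds the code as a string; `save_scraper` is a hypothetical helper, not part of Silkweb.

```python
from pathlib import Path


def save_scraper(code: str, path: str = "scraper.py") -> Path:
    """Write generated scraper source to `path` and return the resolved path."""
    target = Path(path)
    target.write_text(code, encoding="utf-8")
    return target.resolve()


# Usage, assuming `generated_scraper` is a string of Python source:
#   import silkweb
#   api_info = silkweb.discover_api("https://example.com/store")
#   save_scraper(api_info.generated_scraper, "scraper.py")
```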
Using with a session
For authenticated pages, pass a session so the browser has the right cookies:
import silkweb

session = silkweb.SilkSession.load("my_login")
api_info = silkweb.discover_api(
    "https://example.com/dashboard",
    session=session,
)
(discover_api is synchronous; it runs the async capture internally.)
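A synchronous function that runs asynchronous work internally usually follows the pattern below. This is a generic illustration of the idea, not Silkweb's source: `discover` and `_capture` are hypothetical names, and the coroutine body is a placeholder for the real browser capture.

```python
import asyncio


async def _capture(url: str) -> dict:
    """Stand-in for the asynchronous browser capture (hypothetical)."""
    await asyncio.sleep(0)  # placeholder for awaiting browser and network events
    return {"url": url, "endpoints": []}


def discover(url: str) -> dict:
    """Synchronous entry point: run the coroutine to completion and return its result."""
    # asyncio.run() creates an event loop, runs the coroutine, then closes the loop.
    # It raises RuntimeError when the calling thread already has a running loop.
    return asyncio.run(_capture(url))


result = discover("https://example.com/dashboard")
print(result["url"])  # https://example.com/dashboard
```

A wrapper built this way cannot be called from code that is already running inside an event loop, such as an async web handler. Whether Silkweb handles that case is not stated here, so it is worth testing before relying on it.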
CLI
silkweb discover-api https://example.com/store
silkweb discover-api https://example.com/store --output scraper.py
What gets detected
| Feature | Detection method |
|---|---|
| JSON endpoints | Content-Type: application/json responses from XHR/fetch |
| Response schema | Inferred via genson library |
| Pagination | page, offset, cursor params in URL or body |
| Auth headers | Authorization, x-api-key, x-auth-token (redacted in generated code) |
| Request method | GET, POST, PUT, etc. |
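Two rows of the table, pagination and auth headers, can be illustrated with standard-library code. This is a simplified sketch under the assumption that detection is a plain name match; Silkweb's real heuristics may be broader. `detect_pagination`, `redact_auth_headers`, and the `<REDACTED>` placeholder are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Names taken from the table above; real detection may consider more.
PAGINATION_PARAMS = ("page", "offset", "cursor")
AUTH_HEADERS = {"authorization", "x-api-key", "x-auth-token"}


def detect_pagination(url, body=None):
    """Return the first pagination parameter found in the query string or body."""
    names = set(parse_qs(urlsplit(url).query, keep_blank_values=True))
    names.update(body or {})  # a dict body contributes its top-level keys
    for name in PAGINATION_PARAMS:
        if name in names:
            return name
    return None


def redact_auth_headers(headers):
    """Return a copy of `headers` with auth-related values replaced."""
    return {
        name: "<REDACTED>" if name.lower() in AUTH_HEADERS else value
        for name, value in headers.items()
    }


print(detect_pagination("https://api.example.com/v2/products?page=1&limit=24"))  # page
print(redact_auth_headers({"Authorization": "Bearer abc", "Accept": "application/json"}))
# {'Authorization': '<REDACTED>', 'Accept': 'application/json'}
```

Header names are compared in lower case because HTTP header names are case-insensitive.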
When to use API discovery
API discovery is most useful when:
- The site is a single-page application (SPA) that loads data via AJAX
- DOM scraping is slow or fragile due to complex JavaScript rendering
- You want the fastest possible data retrieval (direct HTTP, no browser overhead)
- The site has an undocumented internal API that returns structured JSON