Olostep Crawl endpoint
Trusted by teams worldwide




Orchestrate multi-page extraction without building your own crawler.
Step 1
POST to /v1/crawls with start_url, include_urls, exclude_urls, max_pages, and an optional max_depth.
Step 2
Poll GET /v1/crawls/{id} until status is completed, or pass webhook_url for a push notification.
Step 3
Paginate GET /v1/crawls/{id}/pages, then fetch each retrieve_id via /v1/retrieve for markdown or HTML.
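The three steps above can be sketched in Python. This is a minimal sketch, not the official client: the HTTP transport is injected as plain callables so the control flow is visible, and the base URL and any field beyond those named in the steps (start_url, max_pages, include_urls, exclude_urls, max_depth, status, id) are assumptions.

```python
import time

BASE = "https://api.olostep.com/v1"  # assumed base URL

def start_crawl(post, start_url, max_pages=100,
                include_urls=("/**",), exclude_urls=(), max_depth=None):
    """Step 1: POST the crawl parameters and return the new crawl's id."""
    body = {
        "start_url": start_url,
        "max_pages": max_pages,
        "include_urls": list(include_urls),
        "exclude_urls": list(exclude_urls),
    }
    if max_depth is not None:
        body["max_depth"] = max_depth
    return post(f"{BASE}/crawls", body)["id"]

def wait_until_completed(get, crawl_id, interval=2.0):
    """Step 2: poll GET /v1/crawls/{id} until status is 'completed'."""
    while True:
        crawl = get(f"{BASE}/crawls/{crawl_id}")
        if crawl["status"] == "completed":
            return crawl
        time.sleep(interval)

# Demo against canned responses instead of the live API:
replies = iter([{"status": "in_progress"},
                {"status": "completed", "pages_count": 3}])
crawl_id = start_crawl(lambda url, body: {"id": "crawl_123"},
                       "https://example.com")
done = wait_until_completed(lambda url: next(replies), crawl_id, interval=0)
```

In production the two lambdas would wrap an HTTP library plus your API key; passing webhook_url instead replaces the polling loop entirely.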
POST /v1/crawls
{
  "start_url": "https://example.com",
  "max_pages": 100,
  "include_urls": ["/**"],
  "exclude_urls": ["/admin/**"]
}

The /v1/crawls API fits between one-off scrapes and huge batch URL lists.
Control depth and max_pages so crawls stay predictable.
Pass an optional webhook_url to be notified when the crawl completes.
Stream pages while the crawl is in progress or after completion.
Pair crawls with /v1/retrieve for markdown, html, or json formats.
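Each listed page carries a retrieve_id that is exchanged for content via /v1/retrieve. A small helper can build that request URL; note the exact query-parameter names here are assumptions, only the endpoint path and the format names come from the text above.

```python
from urllib.parse import urlencode

def retrieve_url(retrieve_id, formats=("markdown",)):
    """Build a GET URL for /v1/retrieve (parameter names are hypothetical)."""
    qs = urlencode({"retrieve_id": retrieve_id, "formats": ",".join(formats)})
    return f"https://api.olostep.com/v1/retrieve?{qs}"

url = retrieve_url("abc", formats=("markdown", "html"))
```

Requesting both markdown and html in one call avoids fetching the same page twice when you need both representations.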
Mirror sites, build datasets, and power research agents.
Pull every docs page for offline search or RAG.
Crawl product sections with include patterns.
Snapshot many pages under a domain with audit metadata.
Multi-page extraction
Use webhooks for completion signals; paginate pages with cursor and limit.
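Cursor pagination over /v1/crawls/{id}/pages can be sketched as a generator. The cursor and limit parameters are named in the text above; the response field names (data, next_cursor) and the stubbed responses are assumptions for illustration.

```python
def iter_pages(get, crawl_id, limit=50):
    """Yield page records from GET /v1/crawls/{id}/pages, following the cursor."""
    cursor = None
    while True:
        params = {"limit": limit}
        if cursor:
            params["cursor"] = cursor
        page = get(f"/v1/crawls/{crawl_id}/pages", params)
        yield from page["data"]            # list field name is an assumption
        cursor = page.get("next_cursor")   # cursor field name is an assumption
        if cursor is None:
            break

# Demo: two canned result batches stand in for the live endpoint
batches = [
    {"data": [{"retrieve_id": "r1"}, {"retrieve_id": "r2"}], "next_cursor": "c2"},
    {"data": [{"retrieve_id": "r3"}], "next_cursor": None},
]
fake_get = lambda url, params: batches[1] if params.get("cursor") else batches[0]
ids = [p["retrieve_id"] for p in iter_pages(fake_get, "crawl_123", limit=2)]
```

Because it is a generator, pages can be consumed while the crawl is still in progress; combined with a webhook for the completion signal, no polling loop is needed.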
{
  "start_url": "https://sugarbooandco.com",
  "max_pages": 100,
  "include_urls": ["/**"],
  "exclude_urls": ["/collections/**"],
  "include_external": false
}

{
  "id": "crawl_…",
  "object": "crawl",
  "status": "in_progress",
  "start_url": "https://sugarbooandco.com",
  "pages_count": 0
}

Everything you need to know about the Crawl endpoint.