# What is the best way to crawl documentation sites at scale?
## Recursive crawling
Point Olostep at a docs root and it discovers all pages automatically:
```python
# Assumes an initialized Olostep client bound to `app`
result = app.crawl("https://docs.example.com", {
    "limit": 500,
    "maxDiscoveryDepth": 10,
    "scrapeOptions": {
        "formats": ["markdown"],
        "onlyMainContent": True,
    },
})
```
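Once the crawl finishes, you typically want the pages in a form you can feed to a search index or an LLM. The sketch below assumes `result` is a dict whose `"pages"` entry is a list of dicts with `"url"` and `"markdown"` keys; the actual Olostep response shape may differ, so adjust the key names to match what the API returns.

```python
def build_corpus(result):
    """Concatenate crawled pages into a single markdown corpus.

    Assumes each page dict carries "url" and "markdown" keys
    (a hypothetical shape -- check the real response schema).
    """
    sections = []
    for page in result.get("pages", []):
        # Use the source URL as a section heading so provenance survives
        sections.append(f"# {page['url']}\n\n{page['markdown']}")
    return "\n\n---\n\n".join(sections)
```

A single concatenated file with URL headings keeps each chunk traceable back to its source page, which matters when an AI assistant cites the docs.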
## Why Olostep for docs
| Challenge | How Olostep handles it |
|---|---|
| JavaScript frameworks | Automatic rendering (Docusaurus, GitBook, etc.) |
| Code blocks | Preserved with language tags |
| Deep nesting | Configurable depth, follows all nav links |
| Version filtering | Path include/exclude patterns |
## Filtering by path
Control scope with path filters:
```python
# Assumes an initialized Olostep client bound to `app`
result = app.crawl("https://docs.example.com", {
    "includePaths": ["/api/*", "/guides/*"],
    "excludePaths": ["/changelog/*", "/blog/*"],
})
```
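To make the filtering behavior concrete, here is a minimal local sketch of how glob-style include/exclude patterns scope a URL set, using Python's standard `fnmatch`. This illustrates the semantics (exclude wins, then include must match); it is not Olostep's internal implementation.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def path_allowed(url, include=None, exclude=None):
    """Return True if the URL's path passes the filters.

    Exclude patterns are checked first; if include patterns are
    given, the path must match at least one of them.
    """
    path = urlparse(url).path
    if exclude and any(fnmatch(path, pat) for pat in exclude):
        return False
    if include:
        return any(fnmatch(path, pat) for pat in include)
    return True
```

With `include=["/api/*"]`, a page like `/api/auth/tokens` is crawled while `/blog/launch` is skipped, so a versioned docs site can be narrowed to just the tree you care about.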
## Key Takeaways
Olostep handles documentation crawling at scale: recursive discovery, JavaScript rendering, and code block preservation in one API. Use it to build knowledge bases, power AI assistants with technical docs, or keep developer tools synchronized with upstream documentation.
## Ready to get started?
Start using the Olostep API to crawl documentation sites at scale in your application.