RAG data ingestion tool

Turn a single URL into a dataset of thousands of Markdown pages.

Perfect for RAG, machine learning training sets, and migrations.

Respects robots.txt · Handles JavaScript · Webhook notifications

Trusted by teams worldwide

Merchkit
Podqi
Khoj
Finny AI
Contents
Athena HQ
CivilGrid
GumLoop
Plots
Uman
Verisave
Relay
OpenMart
Profound
Centralize
Use Bear

From URL to Dataset.

We handle queue management, concurrency, and rate limiting so you don't have to.

Initiate Crawl

Send a POST request with the start URL and a `max_pages` limit. Receive a Job ID immediately.

Recursive Spidering

Our engine visits pages, discovers links, and recursively processes the domain in parallel.

Retrieve Pages

Use the Job ID to fetch the list of processed pages containing full Markdown or HTML content.

```
POST /v1/crawls
{
  "start_url": "stripe.com",
  "max_pages": 100,
  "webhook": "https://..."
}

// Response
{ "id": "crawl_abc123" }
```

Built for bulk.

Designed to ingest knowledge bases and massive documentation sites.

RAG Knowledge Bases

Ingest entire help centers and documentation sites to ground your AI models in truth.

SEO Audits

Crawl every page of a domain to analyze meta tags, broken links, and content structure.

Content Migration

Download all legacy content as clean Markdown to migrate to a new CMS effortlessly.

Offline Datasets

Create offline mirrors of websites for training or analysis without an active internet connection.

Market Intelligence

Monitor entire competitor websites for new pages, products, or pricing changes.

Link Graph Analysis

Map internal linking structures across thousands of pages to optimize site architecture.

Usage-based pricing

Pricing that Makes Sense

Crawls are billed per successful page. No extra fees for bandwidth or proxies.

Free

$0 ($0 per 500 requests)

No credit card required.

  • 500 successful requests
  • JS rendering + Residential IPs
  • LLM Extraction available

Starter

$9 per month ($1.80 per 1K requests)

  • 5,000 successful requests/month
  • Everything in Free Plan
  • 150 concurrent requests

Standard

$99 USD per month ($0.495 per 1K requests)

  • 200K successful requests/month
  • Everything in Starter Plan
  • 500 concurrent requests

Scale

$399 USD per month ($0.399 per 1K requests)

  • 1 Million successful requests/month
  • Everything in Standard Plan
  • AI-powered Browser Automations
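
The per-1K rates follow directly from each plan's price and included volume: $9 / 5,000 = $1.80 per 1K, $99 / 200,000 = $0.495 per 1K, and $399 / 1,000,000 = $0.399 per 1K.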

Frequently asked questions

Everything you need to know about AI-powered scraping.

General

What is a Crawl?

A Crawl is a process that starts at a specific URL and recursively visits all links found on that page (and subsequent pages) up to a defined depth or limit. It's used to download the content of entire websites.

What is the difference between Scrape and Crawl?

Scraping extracts data from a single URL. Crawling explores a website to find and process multiple URLs automatically.

Does Olostep handle JavaScript rendering?

Yes. Our crawlers use headless browsers to render JavaScript, ensuring we capture content on SPAs (Single Page Applications) like React or Next.js sites.

Technical

Does the crawler respect robots.txt?

Yes, by default our crawlers respect `robots.txt` rules. You can override this behavior by setting `follow_robots_txt: false` in your request if you have permission to crawl the site.

How do I get notified when a crawl finishes?

You can provide a `webhook` URL in your POST request. We will send a POST request to your webhook with the results when the crawl completes.
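
A minimal receiver sketch using Python's standard library. The docs above only state that a POST with the results is sent on completion, so the payload field names here (`id`, `status`) are assumptions; log a real delivery to confirm the actual shape.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class CrawlWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body sent when the crawl completes.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Field names are assumptions; inspect a real payload to confirm.
        print("crawl finished:", payload.get("id"), payload.get("status"))
        self.send_response(200)  # acknowledge quickly
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), CrawlWebhook).serve_forever()
```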

Can I limit the crawl?

Yes, you can set `max_pages` to limit the number of pages processed, and `max_depth` to limit how many clicks away from the start URL the crawler should go.

How do I filter URLs?

You can use `include_urls` and `exclude_urls` parameters with glob patterns (e.g., `/blog/**`) to control exactly which parts of the site are crawled.
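
Putting these parameters together, a request might look like the sketch below. The parameter names (`max_pages`, `max_depth`, `include_urls`, `exclude_urls`, `follow_robots_txt`) come straight from the answers above; the list-of-glob value format and the base URL are assumptions worth checking against the docs.

```python
import requests

# Crawl only the blog, skip tag archives, stay within 3 clicks of the
# start URL, and stop after 500 pages.
payload = {
    "start_url": "example.com",
    "max_pages": 500,
    "max_depth": 3,
    "include_urls": ["/blog/**"],     # glob patterns, per the FAQ
    "exclude_urls": ["/blog/tags/**"],
    "follow_robots_txt": True,        # default; set False only with permission
}

resp = requests.post(
    "https://api.olostep.com/v1/crawls",  # assumption: base URL, as in the earlier sketch
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
)
print(resp.json()["id"])
```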

Billing

How much does a crawl cost?

Billing is based on the number of successful requests (pages crawled). If a crawl processes 100 pages, it counts as 100 requests against your plan.

Do failed pages count towards my limit?

No, we only charge for successfully processed pages.

Is there a free tier?

Yes, you get 500 free credits upon signup, which is enough to crawl small websites or test the API.

Start crawling today.

500 credits to try it for free — no credit card required.