RAG data ingestion tool

Turn a single URL into a dataset of thousands of Markdown pages.

Perfect for RAG, machine learning training sets, and migrations.

Respects robots.txt · Handles JavaScript · Webhook notifications

Trusted by teams worldwide

Merchkit
Podqi
Khoj
Finny AI
Contents
Athena HQ
CivilGrid
GumLoop
Plots
Uman
Verisave
Relay
OpenMart
Profound
Centralize
Use Bear

From URL to Dataset.

We handle queue management, concurrency, and rate limiting so you don't have to.

Initiate Crawl

Send a POST request with the start URL and a `max_pages` limit. Receive a Job ID immediately.

Recursive Spidering

Our engine visits pages, discovers links, and recursively processes the domain in parallel.

Retrieve Pages

Use the Job ID to fetch the list of processed pages containing full Markdown or HTML content.

```
POST /v1/crawls
{
  "start_url": "stripe.com",
  "max_pages": 100,
  "webhook": "https://..."
}

// Response
{ "id": "crawl_abc123" }
```

Built for bulk.

Designed to ingest knowledge bases and massive documentation sites.

RAG Knowledge Bases

Ingest entire help centers and documentation sites to ground your AI models in truth.

SEO Audits

Crawl every page of a domain to analyze meta tags, broken links, and content structure.

Content Migration

Download all legacy content as clean Markdown to migrate to a new CMS effortlessly.

Offline Datasets

Create offline mirrors of websites for training or analysis without an active internet connection.

Market Intelligence

Monitor entire competitor websites for new pages, products, or pricing changes.

Link Graph Analysis

Map internal linking structures across thousands of pages to optimize site architecture.

Usage-based pricing

Pricing that Makes Sense

Crawls are billed per successful page. No extra fees for bandwidth or proxies.

Free

$0 ($0 per 500 requests)

No credit card required.

  • 500 successful requests
  • JS rendering + Residential IPs
  • LLM Extraction available

Starter

$9 per month ($1.80 per 1K requests)

  • 5,000 successful requests/month
  • Everything in Free Plan
  • 150 concurrent requests

Standard

$99 USD per month ($0.495 per 1K requests)

  • 200K successful requests/month
  • Everything in Starter Plan
  • 500 concurrent requests

Scale

$399 USD per month ($0.399 per 1K requests)

  • 1 Million successful requests/month
  • Everything in Standard Plan
  • AI-powered Browser Automations
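
The per-1K rates follow directly from each plan's price and included volume: $9 / 5,000 = $1.80 per 1K, $99 / 200,000 = $0.495 per 1K, and $399 / 1,000,000 = $0.399 per 1K.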

Frequently asked questions

Everything you need to know about AI-powered scraping.

General

What is a Crawl?

A Crawl is a process that starts at a specific URL and recursively visits all links found on that page (and subsequent pages) up to a defined depth or limit. It's used to download the content of entire websites.

What is the difference between Scrape and Crawl?

Scraping extracts data from a single URL. Crawling explores a website to find and process multiple URLs automatically.

Does Olostep handle JavaScript rendering?

Yes. Our crawlers use headless browsers to render JavaScript, ensuring we capture content on SPAs (Single Page Applications) like React or Next.js sites.

Technical

Does the crawler respect robots.txt?

Yes, by default our crawlers respect `robots.txt` rules. You can override this behavior by setting `follow_robots_txt: false` in your request if you have permission to crawl the site.

How do I get notified when a crawl finishes?

You can provide a `webhook` URL in your POST request. We will send a POST request to your webhook with the results when the crawl completes.
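
A minimal receiver sketch using Python's standard library. The docs above only state that a POST with the results is sent on completion, so the payload field names here (`id`, `status`) are assumptions; log a real delivery to confirm the actual shape.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class CrawlWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body sent when the crawl completes.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Field names are assumptions; inspect a real payload to confirm.
        print("crawl finished:", payload.get("id"), payload.get("status"))
        self.send_response(200)  # acknowledge quickly
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), CrawlWebhook).serve_forever()
```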

Can I limit the crawl?

Yes, you can set `max_pages` to limit the number of pages processed, and `max_depth` to limit how many clicks away from the start URL the crawler should go.

How do I filter URLs?

You can use `include_urls` and `exclude_urls` parameters with glob patterns (e.g., `/blog/**`) to control exactly which parts of the site are crawled.
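
Putting these parameters together, a request might look like the sketch below. The parameter names (`max_pages`, `max_depth`, `include_urls`, `exclude_urls`, `follow_robots_txt`) come straight from the answers above; the list-of-glob value format and the base URL are assumptions worth checking against the docs.

```python
import requests

# Crawl only the blog, skip tag archives, stay within 3 clicks of the
# start URL, and stop after 500 pages.
payload = {
    "start_url": "example.com",
    "max_pages": 500,
    "max_depth": 3,
    "include_urls": ["/blog/**"],     # glob patterns, per the FAQ
    "exclude_urls": ["/blog/tags/**"],
    "follow_robots_txt": True,        # default; set False only with permission
}

resp = requests.post(
    "https://api.olostep.com/v1/crawls",  # assumption: base URL, as in the earlier sketch
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
)
print(resp.json()["id"])
```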

Billing

How much does a crawl cost?

Billing is based on the number of successful requests (pages crawled). If a crawl processes 100 pages, it counts as 100 requests against your plan.

Do failed pages count towards my limit?

No, we only charge for successfully processed pages.

Is there a free tier?

Yes, you get 500 free credits upon signup, which is enough to crawl small websites or test the API.

Start crawling today.

500 credits to try it for free — no credit card required.