Olostep Scrape endpoint

Turn any URL into clean, LLM-ready data.

The Scrape endpoint extracts markdown, HTML, text, JSON, screenshots, and more — in real time.

How it works

POST url_to_scrape and formats.

Olostep loads the page (with optional actions or parsers).

Read structured fields from the scrape object in the response.
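Those three steps can be sketched in Python. The host name and Bearer-token auth header below are assumptions; confirm both against the Olostep docs before use:

```python
import json
import urllib.request

# Assumed endpoint and auth scheme -- confirm both in the Olostep docs.
API_URL = "https://api.olostep.com/v1/scrapes"

def build_scrape_payload(url_to_scrape: str, formats: list[str]) -> dict:
    """Step 1: url_to_scrape and formats are the two required fields."""
    return {"url_to_scrape": url_to_scrape, "formats": formats}

def scrape(api_key: str, url_to_scrape: str, formats: list[str]) -> dict:
    """Steps 2-3: POST the payload and return the parsed scrape object."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_scrape_payload(url_to_scrape, formats)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Usage:
#   result = scrape("YOUR_API_KEY",
#                   "https://en.wikipedia.org/wiki/Alexander_the_Great",
#                   ["markdown", "html"])
#   print(result["result"]["markdown_content"])
```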

Trusted by teams worldwide

Merchkit
Podqi
Khoj
Finny AI
Contents
Athena HQ
CivilGrid
GumLoop
Plots
Uman
Verisave
Relay
OpenMart
Profound
Centralize
Use Bear

From URL to extracted content

One HTTP call — no headless browser fleet to run yourself.

Step 1

Send a URL and formats

POST the page you want plus formats: markdown, html, text, json, screenshot, and more.

Step 2

Olostep renders the page

Handles dynamic sites, actions, PDFs, and optional parsers or LLM extraction for structured JSON.

Step 3

Get clean content back

Receive markdown_content, html_content, json_content, hosted URLs, links_on_page, and metadata.

POST /v1/scrapes

{
  "url_to_scrape": "https://en.wikipedia.org/wiki/Alexander_the_Great",
  "formats": ["markdown", "html"]
}

Built for production scraping

The /v1/scrapes API is designed for apps that need reliable page extraction at scale.

Multiple formats

Request markdown, html, text, json, raw_pdf, or screenshot in one call.

Parsers & LLM extract

Use pre-built parsers for popular sites or define a schema for LLM-powered extraction.

Hosted assets

Get hosted URLs for large payloads alongside inline content fields.

Developer-friendly

One POST with url_to_scrape and formats — integrate in minutes.

What you can build

Power enrichment, monitoring, and AI workflows from live web pages.

RAG & knowledge bases

Chunk clean markdown or JSON into embeddings for search and Q&A.

Price & catalog monitoring

Scrape product pages on a schedule and diff structured fields.

Lead generation & enrichment

Pull structured contact or company data from landing pages.
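For the RAG case, the returned markdown_content can be chunked before embedding. A rough sketch; the chunk size and overlap here are arbitrary choices on the client side, not Olostep parameters:

```python
def chunk_markdown(markdown_content: str,
                   max_chars: int = 1000,
                   overlap: int = 100) -> list[str]:
    """Split scraped markdown into overlapping chunks for embedding."""
    chunks = []
    start = 0
    while start < len(markdown_content):
        end = start + max_chars
        chunks.append(markdown_content[start:end])
        if end >= len(markdown_content):
            break
        # Overlap keeps context that straddles a chunk boundary.
        start = end - overlap
    return chunks
```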

One API call. Real content.

Scrape a page. Choose your formats.

Required parameters: url_to_scrape and formats. Add parsers or llm_extract when you need structured JSON.

Request
POST /v1/scrapes
{
  "url_to_scrape": "https://en.wikipedia.org/wiki/Alexander_the_Great",
  "formats": ["markdown", "html"]
}
Response
200 OK
{
  "id": "scrape_…",
  "object": "scrape",
  "url_to_scrape": "https://en.wikipedia.org/wiki/…",
  "result": {
    "markdown_content": "## Alexander the Great…",
    "html_content": "<html …>",
    "json_content": null,
    "page_metadata": { "status_code": 200, "title": "…" }
  }
}

Frequently asked questions

Everything you need to know about the Scrape endpoint.

What is the /v1/scrapes endpoint?

It turns any public URL into data you can feed to LLMs or pipelines: markdown, HTML, text, screenshots, or structured JSON via parsers or llm_extract.

How do I get structured JSON?

Use formats: ["json"] with a parser id, or llm_extract with a JSON schema and/or prompt. See the Scrape docs for examples.
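Both variants only change the request body. A sketch of what the two payloads might look like; the parser id and schema fields below are placeholders, and the exact llm_extract shape should be checked against the Scrape docs:

```python
# Variant 1: a pre-built parser for a known site (parser id is a placeholder).
parser_payload = {
    "url_to_scrape": "https://example.com/product/123",
    "formats": ["json"],
    "parser": {"id": "@olostep/example-parser"},
}

# Variant 2: schema-driven LLM extraction (schema fields are illustrative).
llm_extract_payload = {
    "url_to_scrape": "https://example.com/product/123",
    "formats": ["json"],
    "llm_extract": {
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "string"},
            },
        },
    },
}
```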

Does it support JavaScript-heavy sites?

Yes. Olostep can render dynamic pages and supports actions (wait, click, fill_input, scroll) before extraction.
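For example, a request that waits for the page to settle, clicks a button, and scrolls before extraction might look like this. The action types are from the list above, but the exact action object shape (field names beyond type) is an assumption; see the docs for the supported fields:

```python
payload = {
    "url_to_scrape": "https://example.com/dashboard",
    "formats": ["markdown"],
    # Actions run in order before content is extracted; field names
    # beyond "type" are illustrative.
    "actions": [
        {"type": "wait", "milliseconds": 2000},
        {"type": "click", "selector": "#load-more"},
        {"type": "scroll"},
    ],
}
```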

What does a scrape cost?

A standard scrape is 1 credit. Parsers vary (often 1–5 credits). LLM extraction is 20 credits per request. New accounts include free credits to try.

Can I exclude parts of the page?

Yes. Use parameters like remove_css_selectors and other scrape options to focus on the content you need.
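For instance, to drop navigation and footer chrome from the extracted markdown. This assumes remove_css_selectors takes a list of selectors, and the selector values are illustrative:

```python
payload = {
    "url_to_scrape": "https://example.com/article",
    "formats": ["markdown"],
    # Strip boilerplate elements before extraction; selectors are examples,
    # and the list-of-strings value shape is an assumption.
    "remove_css_selectors": ["nav", "footer", ".cookie-banner"],
}
```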

Start scraping with Olostep

500 free credits to try — no credit card required.