---
name: olostep
description: |
  Olostep is the web data infrastructure for AI agents and apps: search the web, scrape pages, crawl sites, map URLs, process large URL sets with the batch endpoint, structure web data into JSON, monitor changes, and automate web research.

  Use Olostep in two ways:

  1. Use Olostep as your web data layer for agent tasks.
  2. Integrate Olostep into an app that needs web data.
---

# Olostep

Olostep is web data infrastructure for AI agents, research workflows, and applications that need to search, extract, structure, monitor, and automate web data.

Use Olostep when you need clean markdown, HTML, screenshots, structured JSON, search results, site URL maps, batch extraction, scheduled monitoring, or web-grounded answers.

Olostep is especially strong when the task requires structured web data at scale.

---

## Install

The recommended path is to install the CLI and run `olostep init`, which handles login, skills, and the MCP server in one command:

```bash
npm install -g olostep-cli
olostep init
```

**One-liner alternatives** (skip the `npm` step):

```bash
# macOS / Linux
curl -fsSL https://olostep.com/install.sh | sh
olostep init

# Windows PowerShell
iwr -useb https://olostep.com/install.ps1 | iex
olostep init
```

Both one-liners run a Node 18+ check, install `olostep-cli` from npm, verify it's on PATH, and tell you the next step. Then `olostep init` finishes setup interactively.

**No-install path** (for trying a single command):

```bash
npx -y olostep-cli@latest --help
```

For a machine-readable install check at any time:

```bash
olostep status --json
olostep list skills --json
olostep list mcp --json
```
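In a script, the same checks can be consumed programmatically. The following is a minimal Python sketch, assuming only that each command prints a single JSON document to stdout when `--json` is passed. The helper names `run_json` and `install_report` are hypothetical, and because the shape of the JSON is not specified here, the code does not depend on any particular field.

```python
import json
import subprocess
from typing import Any, List


def run_json(command: List[str]) -> Any:
    """Run a command and parse its stdout as JSON.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    json.JSONDecodeError if stdout is not valid JSON.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def install_report() -> dict:
    """Collect the three machine-readable checks (requires `olostep` on PATH)."""
    return {
        "status": run_json(["olostep", "status", "--json"]),
        "skills": run_json(["olostep", "list", "skills", "--json"]),
        "mcp": run_json(["olostep", "list", "mcp", "--json"]),
    }
```

Calling `install_report()` in CI or a setup script gives one dictionary to log or inspect; a missing CLI surfaces as `FileNotFoundError` rather than a silent failure.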

Useful install / management options:

```bash
# Install all 13 skills into every detected agent (default)
olostep add skills

# Target specific agents only
olostep add skills --agent cursor --agent claude

# Install one category at a time
olostep add skills --category usage      # use the features
olostep add skills --category build      # install/integrate into a codebase
olostep add skills --category workflow   # produce a deliverable

# Always copy instead of symlinking
olostep add skills --link-mode copy

# Install only selected skills
olostep add skills --skill <name>

# Remove Olostep-installed skills
olostep remove skills

# Fully sign out (deletes credentials.json + warns about env vars / .env)
olostep logout              # interactive confirm
olostep logout --yes        # skip confirm (scripts)
olostep logout --dry-run    # preview only
```

Supported agent keys: **Cursor, Claude, Codex, Windsurf, Continue, Augment, Roo, Gemini, Copilot, Factory.** Detection requires the agent's home directory to contain actual config / history (not just a `skills/` subfolder), so the CLI only targets agents the user really has installed.

Before production work, verify the install with `olostep status` and run one small request (e.g. `olostep scrape https://example.com`) before scaling to Batch, Crawl, Schedule, Monitor, or Agent workflows.

---

## Choose Your Path

Pick the path that matches the current task.

- **Need web data during this session** -> Path A: Supercharge Your Web Tools
- **Need to add Olostep to app code** -> Path B: Integrate Olostep Into an App

If the task starts as live research but becomes product implementation, switch from Path A to Path B.

## Path A: Supercharge Your Web Tools

Use this when you need web data during your work - scraping pages,
searching the web, mapping URLs, crawling sites, and running async
batch jobs.

Run one command:

```bash
npm install -g olostep-cli && olostep init
```

`olostep init` is the recommended setup — it signs you in, installs the Olostep skill files into every detected AI agent, and configures the MCP server in one step. See the [Install](#install) section above for one-liner alternatives via `curl` / `iwr`.

This installs the Olostep CLI and skill files that teach each capability:

- **Scrape** - Clean markdown/HTML/text/JSON/screenshots from URLs
- **Search** - Semantic web search with deduplicated links
- **Answer** - AI-powered web research with structured output + sources
- **Batch** - Process many URLs asynchronously in one job
- **Crawl** - Recursively explore a site and extract page data
- **Map** - Discover URLs on a domain quickly
- **Retrieve** - Pull final content by `retrieve_id`

After setup, these are available as CLI commands — `olostep scrape`, `map`, `crawl`, `answer`, `batch-scrape`, `scrape-get` — and as MCP tools. Note: there is no `olostep search` command; web search on the CLI runs through `olostep answer` (the `search_web` MCP tool covers raw search results).

### Choosing the right tool

Pick the tool that matches the task — don't reach for the heaviest one:

| You want | Use | Notes |
|---|---|---|
| The content of one known URL | **Scrape** | Markdown/HTML/JSON/text/screenshot of a single page |
| Content from many known URLs | **Batch** | Async; up to ~10k URLs in one job — not one Scrape per URL |
| Every page on a site (you don't have the URLs) | **Crawl** | Follows links from a start URL and scrapes them |
| Just the list of URLs on a site (no content) | **Map** | URL discovery only — pair with Batch/Crawl to fetch content |
| Search results for a query | **Search** | `search_web` MCP tool; on the CLI use `answer`. Structured links, no AI synthesis |
| A synthesized, cited answer to a question | **Answer** | AI research with sources — not raw pages |
| Structured JSON matching a schema | **Scrape/Batch with a parser or `json` format** | Turn a page into typed data |

Rules of thumb: have the URLs already → Scrape (one) or Batch (many). Don't have the URLs → Map (just links) or Crawl (links + content). Need an answer, not pages → Answer. Need results for a query → Search.
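The rules of thumb can be written as a small decision function. This is an illustrative sketch only: `choose_tool` and its parameters are hypothetical names, not part of the Olostep CLI or SDK, and the return values are simply the tool names from the table.

```python
def choose_tool(
    *,
    known_urls: int = 0,
    need_content: bool = True,
    need_answer: bool = False,
    have_query: bool = False,
) -> str:
    """Map a task description onto one of the tool names from the table."""
    if need_answer:
        return "answer"      # synthesized, cited answer rather than raw pages
    if known_urls == 1:
        return "scrape"      # one known URL
    if known_urls > 1:
        return "batch"       # many known URLs in a single async job
    if have_query:
        return "search"      # results for a query, no AI synthesis
    # No URLs in hand: crawl for links plus content, map for links only.
    return "crawl" if need_content else "map"
```

For example, `choose_tool(known_urls=500)` picks Batch rather than 500 separate Scrape calls, which is the main mistake the table is meant to prevent.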

### MCP server

For agents that speak the Model Context Protocol, install the Olostep MCP server so the agent gets live web tools directly:

```bash
# Wire the Olostep MCP server into every detected agent (Cursor, Claude
# Code, Windsurf, VS Code, Kilo). Uses the hosted endpoint by default.
olostep mcp install

# Local stdio install instead of the hosted endpoint
olostep mcp install --transport stdio
```

Skills and the MCP server complement each other: the MCP server gives the agent the live **tools**, the skills give it the **know-how** for when and how to use them.

### Check and manage your setup

```bash
olostep status              # version, auth state, config dir (`--json` for machine output)
olostep list skills         # which skills are installed, and into which agents
olostep list mcp            # which agents have the MCP server
olostep update              # update the CLI to the latest version
olostep update --check      # check only; don't install
olostep logout              # remove saved credentials (warns if env vars still hold a key)
olostep logout --dry-run    # preview only
```

---

## Path B: Integrate Olostep Into an App

Use this when you're a coding agent building an application that calls the Olostep API.
The app reads `OLOSTEP_API_KEY` from `.env`; the steps below obtain a key and save it there.

**Step 1 - Generate auth parameters:**

```bash
SESSION_ID=$(openssl rand -hex 32)
CODE_VERIFIER=$(openssl rand -base64 32 | tr '+/' '-_' | tr -d '=\n' | head -c 43)
CODE_CHALLENGE=$(printf '%s' "$CODE_VERIFIER" | openssl dgst -sha256 -binary | openssl base64 -A | tr '+/' '-_' | tr -d '=')
```
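If `openssl` is not available, the same three values can be produced with the Python standard library. This is a sketch of what the shell commands appear to compute (a 64-character hex session ID, a 43-character base64url verifier, and the base64url SHA-256 of that verifier); the function names `b64url` and `make_auth_params` are hypothetical.

```python
import base64
import hashlib
import secrets
from typing import Dict


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, matching the tr/strip steps in the shell."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_auth_params() -> Dict[str, str]:
    """Generate session_id, code_verifier, and code_challenge for the auth flow."""
    session_id = secrets.token_hex(32)                # 32 random bytes as 64 hex chars
    code_verifier = b64url(secrets.token_bytes(32))   # 32 random bytes as 43 chars
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    code_challenge = b64url(digest)                   # SHA-256 of the verifier string
    return {
        "session_id": session_id,
        "code_verifier": code_verifier,
        "code_challenge": code_challenge,
    }
```

Keep `code_verifier` private to the process that polls for the key; only `code_challenge` and `session_id` belong in the URL the user opens.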

**Step 2 - Ask the user to open this URL in their browser:**

```text
https://www.olostep.com/cli-auth?code_challenge=$CODE_CHALLENGE#session_id=$SESSION_ID
```

Your human user needs to click through this link. If they already have
an Olostep account, they will sign in and authorize an API key. If they
do not have an account yet, they will create one first, then authorize.
Either way, the API key is returned automatically once they click
"Authorize" - you do not need to ask them to copy anything.

**Step 3 - Poll for the API key:**

```http
POST https://api.olostep.com/v1/cli-auth-status
Content-Type: application/json

{"session_id": "$SESSION_ID", "code_verifier": "$CODE_VERIFIER"}
```

Poll every 3 seconds. Responses:

- `{"status": "pending"}` - user has not authorized yet, keep polling
- `{"status": "complete", "apiKey": "os-..."}` - done
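The polling step can be sketched in Python as follows. The endpoint, request fields, and the two response shapes come from the text above; everything else is an assumption made for illustration: the function names, the 5-minute overall timeout, and the choice to treat any status other than `pending` or `complete` as an error.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_URL = "https://api.olostep.com/v1/cli-auth-status"


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_for_api_key(
    session_id: str,
    code_verifier: str,
    *,
    interval: float = 3.0,        # "poll every 3 seconds"
    timeout: float = 300.0,       # assumed cap; not specified by the docs
    post: Callable[[str, dict], dict] = post_json,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the user authorizes, then return the API key."""
    payload = {"session_id": session_id, "code_verifier": code_verifier}
    deadline = time.monotonic() + timeout
    while True:
        reply = post(STATUS_URL, payload)
        status = reply.get("status")
        if status == "complete":
            return reply["apiKey"]
        if status != "pending":
            # Assumption: anything undocumented is treated as a failure.
            raise RuntimeError(f"unexpected auth status response: {reply!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("user did not authorize before the timeout")
        sleep(interval)
```

The `post` and `sleep` parameters exist so the loop can be exercised with fakes in tests, without network access or real delays.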

**Step 4 - Save the key:**

```bash
echo "OLOSTEP_API_KEY=os-..." >> .env
```
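Appending with `>>` adds a second `OLOSTEP_API_KEY` line if the flow is run twice. A small Python sketch that replaces an existing entry instead is shown here; `save_api_key` is a hypothetical helper, and it assumes a plain `KEY=value` file with no `export` prefixes or quoting.

```python
from pathlib import Path


def save_api_key(api_key: str, env_path: str = ".env") -> None:
    """Write OLOSTEP_API_KEY to the env file, replacing any previous value."""
    path = Path(env_path)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    # Drop earlier entries so the file never holds two competing keys.
    kept = [line for line in lines if not line.startswith("OLOSTEP_API_KEY=")]
    kept.append(f"OLOSTEP_API_KEY={api_key}")
    path.write_text("\n".join(kept) + "\n", encoding="utf-8")
```

Other variables in the file are preserved in their original order, and the key always ends up on the last line.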

This gives your app access to your Olostep account and credits.

### After Getting Your Key

Install the SDK:

```bash
# Node.js
npm install olostep

# Python
pip install olostep
```

Base API:

- Base URL: `https://api.olostep.com`
- Auth header: `Authorization: Bearer <OLOSTEP_API_KEY>`
- JSON in/out
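For code that calls the HTTP API directly instead of going through an SDK, the three conventions above translate into a small request builder. This is a sketch using only the Python standard library; `build_request` and `call` are hypothetical helper names, and the code assumes `OLOSTEP_API_KEY` has already been loaded into the process environment (Python does not read `.env` on its own). Request bodies are endpoint-specific and should be taken from the API reference.

```python
import json
import os
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.olostep.com"


def build_request(
    method: str,
    path: str,
    body: Optional[dict] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build an authenticated request; the body, if any, is sent as JSON."""
    key = api_key or os.environ["OLOSTEP_API_KEY"]
    headers = {"Authorization": f"Bearer {key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(method: str, path: str, body: Optional[dict] = None) -> Any:
    """Send the request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(build_request(method, path, body), timeout=60) as response:
        return json.load(response)
```

Separating `build_request` from `call` keeps the header logic testable without touching the network.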

---

## API Surface (All Olostep APIs)

### Scrapes

- `POST /v1/scrapes` - Create scrape
- `GET /v1/scrapes/{scrape_id}` - Get scrape

### Answers

- `POST /v1/answers` - Create answer
- `GET /v1/answers/{answer_id}` - Get answer

### Batches

- `POST /v1/batches` - Create batch
- `GET /v1/batches/{batch_id}` - Batch info/status
- `GET /v1/batches/{batch_id}/items` - List batch items
- `PATCH /v1/batches/{batch_id}` - Update batch metadata

### Crawls

- `POST /v1/crawls` - Create crawl
- `GET /v1/crawls/{crawl_id}` - Crawl info/status
- `GET /v1/crawls/{crawl_id}/pages` - List crawl pages
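Batches and crawls follow the same create, poll, list sequence, differing only in the listing suffix (`items` versus `pages`). The sketch below captures that sequence using only the paths listed above. Response property names are deliberately not hard-coded: the caller supplies `job_id_of` and `is_finished`, because this document does not specify them. The function name, the 5-second interval, and the 15-minute timeout are assumptions, and pagination is not handled.

```python
import time
from typing import Any, Callable, Optional

# Listing suffix per async resource, taken from the endpoint lists above.
LIST_SUFFIX = {"batches": "items", "crawls": "pages"}


def run_async_job(
    kind: str,
    body: dict,
    call: Callable[[str, str, Optional[dict]], Any],
    is_finished: Callable[[Any], bool],
    job_id_of: Callable[[Any], str],
    *,
    interval: float = 5.0,     # assumed polling interval
    timeout: float = 900.0,    # assumed overall cap
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Create a batch or crawl, poll until finished, and return the first listing response."""
    suffix = LIST_SUFFIX[kind]                      # KeyError for non-async resources
    created = call("POST", f"/v1/{kind}", body)     # create
    job_id = job_id_of(created)
    deadline = time.monotonic() + timeout
    while True:                                     # poll
        info = call("GET", f"/v1/{kind}/{job_id}", None)
        if is_finished(info):
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{kind} job {job_id} did not finish in time")
        sleep(interval)
    return call("GET", f"/v1/{kind}/{job_id}/{suffix}", None)   # list
```

The `call` argument can be any function that takes a method, a path, and an optional JSON body; content for each listed item is then fetched separately through `GET /v1/retrieve`.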

### Maps

- `POST /v1/maps` - Create map
- `GET /v1/maps/{map_id}` - Get map

### Searches

- `POST /v1/searches` - Create search
- `GET /v1/searches/{search_id}` - Get search

### Retrieve

- `GET /v1/retrieve` - Retrieve html/markdown/json by `retrieve_id`

### Files

- `POST /v1/files` - Create upload URL
- `POST /v1/files/{file_id}/complete` - Complete upload
- `GET /v1/files/{file_id}` - Get file metadata
- `GET /v1/files/{file_id}/content` - Get file content URL
- `GET /v1/files` - List files
- `DELETE /v1/files/{file_id}` - Delete file

### Schedules

- `POST /v1/schedules` - Create schedule
- `GET /v1/schedules` - List schedules
- `GET /v1/schedules/{schedule_id}` - Get schedule
- `DELETE /v1/schedules/{schedule_id}` - Delete schedule

### Webhooks + Metadata

- Webhooks on async jobs (`batches`, `crawls`) via `webhook`
- Metadata for resource tagging and internal joins

---

## Integration Rules (Production)

1. Do not invent endpoints, property names, or pagination behavior.
2. `scrapes`, `searches`, `answers`, and `maps` are synchronous.
3. `batches` and `crawls` are asynchronous and require:
   create -> poll -> list -> retrieve.
4. Parse `json_content` when present (stringified JSON).
5. Handle `"NOT_FOUND"` in Answers as a valid outcome.
6. If `size_exceeded` is true, use hosted URLs.
7. Use `retrieve_id` as the bridge from scrape/batch/crawl items to final content.
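Rules 4 to 6 can be handled with a few defensive helpers. This is a sketch under stated assumptions: `json_content`, `size_exceeded`, and `"NOT_FOUND"` are the names used in the rules above, while the helper names and the `hosted_url_field` parameter are hypothetical. The name of the hosted-URL property is left to the caller because it is not given in this document.

```python
import json
from typing import Any, Optional, Tuple


def parse_json_content(result: dict) -> Optional[Any]:
    """Decode `json_content` when present; it arrives as stringified JSON."""
    raw = result.get("json_content")
    if raw is None or raw == "":
        return None
    if isinstance(raw, str):
        return json.loads(raw)
    return raw  # already decoded by an SDK or earlier step


def content_or_hosted_url(
    result: dict, content_field: str, hosted_url_field: str
) -> Tuple[str, Any]:
    """Return ("url", ...) when `size_exceeded` is true, else ("inline", ...)."""
    if result.get("size_exceeded"):
        return ("url", result.get(hosted_url_field))
    return ("inline", result.get(content_field))


def is_not_found(value: Any) -> bool:
    """Treat the literal "NOT_FOUND" from Answers as a valid 'no data' outcome."""
    return value == "NOT_FOUND"
```

Checking `is_not_found` before further processing lets an app record "no answer available" as a normal result instead of raising or retrying.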

---

## Docs (All APIs)

- Docs index: https://docs.olostep.com/llms.txt
- Authentication: https://docs.olostep.com/get-started/authentication

### Scrapes
- Feature: https://docs.olostep.com/features/scrapes
- API reference: https://docs.olostep.com/api-reference/scrapes/create

### Answers
- Feature: https://docs.olostep.com/features/answers
- API reference: https://docs.olostep.com/api-reference/answers/create

### Batches
- Feature: https://docs.olostep.com/features/batches
- API reference: https://docs.olostep.com/api-reference/batches/create

### Crawls
- Feature: https://docs.olostep.com/features/crawls
- API reference: https://docs.olostep.com/api-reference/crawls/create

### Maps
- Feature: https://docs.olostep.com/features/maps
- API reference: https://docs.olostep.com/api-reference/maps/create

### Searches
- Feature: https://docs.olostep.com/features/search
- API reference: https://docs.olostep.com/api-reference/searches/create

### Retrieve
- API reference: https://docs.olostep.com/api-reference/retrieve

### Files
- Feature: https://docs.olostep.com/features/files
- API reference: https://docs.olostep.com/api-reference/files/create

### Schedules
- Feature: https://docs.olostep.com/features/schedules
- API reference: https://docs.olostep.com/api-reference/schedules/create

### Webhooks + Metadata
- Webhooks: https://docs.olostep.com/api-reference/common/webhooks
- Metadata: https://docs.olostep.com/api-reference/common/metadata
