Olostep Batch endpoint

Process thousands of URLs in one job.

The Batch endpoint is built for scale: submit many URLs, wait for completion, then retrieve structured content per item.

A batch processes up to 10,000 URLs in one job, typically in about 5–8 minutes. Start a batch, poll until it reports completed, list items with cursor pagination, then fetch each item's content via /v1/retrieve. On create you can also pass a webhook URL for a push notification when the job finishes, and attach metadata to the batch or to individual items; see Create batch in the API reference.

How it works

1. POST items (and an optional parser).

2. Poll until completed, or receive a webhook when the job finishes.

3. List items and retrieve each URL's content.
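The polling step above can be sketched in Python. The HTTP layer is abstracted behind a callable so the logic stands on its own; in production that callable would wrap GET /v1/batches/{id} with your API key. The "status"/"completed" field names follow the response example below, but the exact response shape is an assumption; check the API reference.

```python
import time

def poll_until_complete(get_batch, batch_id, interval=5.0, timeout=900.0):
    """Poll get_batch(batch_id) until the batch reports 'completed'.

    get_batch is any callable returning a dict shaped like the
    GET /v1/batches/{id} response (field names assumed from the docs).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        batch = get_batch(batch_id)
        if batch["status"] == "completed":
            return batch
        time.sleep(interval)
    raise TimeoutError(f"batch {batch_id} did not complete within {timeout}s")
```

For large batches, prefer the webhook option over a tight polling loop; polling is fine as a fallback or for short-lived jobs.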

Trusted by teams worldwide

Merchkit
Podqi
Khoj
Finny AI
Contents
Athena HQ
CivilGrid
GumLoop
Plots
Uman
Verisave
Relay
OpenMart
Profound
Centralize
Use Bear

From URL list to retrieved content

One batch job instead of hand-rolling thousands of scrape calls.

Step 1

Submit items

POST an array of items, each with a custom_id and url; optionally include a batch-level parser for structured JSON.

Step 2

Wait for completion

Poll GET /v1/batches/{id} until completed — or pass a webhook URL on create to get an HTTP POST when the batch finishes (with retries).

Step 3

List & retrieve

GET /v1/batches/{id}/items with cursor pagination, then call /v1/retrieve with each item's retrieve_id for markdown or JSON.

POST /v1/batches

{
  "items": [
    { "custom_id": "a1", "url": "https://…",
      "metadata": { "sku": "SKU-1" } },
    { "custom_id": "a2", "url": "https://…" }
  ],
  "parser": { "id": "@olostep/google-search" },
  "metadata": { "job": "nightly-prices" },
  "webhook": "https://api.you.com/olostep"
}

Built for large lists

The /v1/batches API complements single /v1/scrapes calls for high-volume workflows.

High throughput

Process large URL lists in one job instead of thousands of separate requests.

Predictable runtime

Jobs complete in roughly 5–8 minutes regardless of batch size (within limits).

Parser-ready

Attach parsers to return normalized JSON across all items.

Cursor pagination

Walk items with cursor and limit while the batch runs or after it finishes.
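A minimal sketch of walking a batch with cursor pagination. The page-fetching call is abstracted behind a callable standing in for GET /v1/batches/{id}/items?cursor=…&limit=…; the "items" and "cursor" field names, and a null cursor meaning exhaustion, are assumptions to verify against the API reference.

```python
def iter_batch_items(list_page, batch_id, limit=100):
    """Yield every item in a batch by following the cursor.

    list_page(batch_id, cursor, limit) stands in for the HTTP call to
    GET /v1/batches/{id}/items and is assumed to return a dict with
    'items' plus a 'cursor' that is None/absent on the last page.
    """
    cursor = None
    while True:
        page = list_page(batch_id, cursor, limit)
        yield from page["items"]
        cursor = page.get("cursor")
        if not cursor:
            break
```

Because pagination works while the batch is still running, you can start retrieving early items before the job finishes.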

Completion webhooks

Skip tight polling: pass webhook on create for a POST when the batch completes, with automatic retries on failed delivery.

Batch & item metadata

Attach Stripe-style key-value metadata on the whole batch or on each item for tracing, filtering, and joins in your stack.

Webhooks & metadata

First-class options on POST /v1/batches so production pipelines don't rely on polling alone and can carry your own context through every job.

Webhooks

Pass webhook with a public HTTPS URL (not localhost). When the batch completes, Olostep sends an HTTP POST with a structured event payload. Failed deliveries are retried with backoff (multiple attempts over about 30 minutes). Respond with 2xx within 30 seconds and use the event id to handle duplicates safely.

Webhooks API reference
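Since failed deliveries are retried, your receiver should be idempotent. A minimal sketch of the dedupe-and-dispatch logic, independent of any web framework: the event's "id" and "type" fields and the "batch.completed" event name follow the docs above, but the full payload shape is an assumption.

```python
_seen_events = set()  # use durable storage (e.g. a database) in production

def handle_webhook(event):
    """Process a batch-completion event exactly once.

    event is the parsed JSON body of Olostep's POST. Return fast so the
    HTTP handler can answer 2xx within 30 seconds; do heavy retrieval
    work asynchronously.
    """
    event_id = event["id"]
    if event_id in _seen_events:
        return "duplicate"  # retried delivery; already handled
    _seen_events.add(event_id)
    if event.get("type") == "batch.completed":
        # enqueue listing/retrieval for the finished batch here
        return "processed"
    return "ignored"
```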

Metadata

Add metadata at batch level (same request as items) and/or on each item for row-level tags — ideal for project IDs, pipeline stage, or correlating results with your warehouse. Keys and values follow documented limits (Stripe-style). Metadata is returned on subsequent GET responses; you can also merge-update batch metadata via PATCH.

Metadata API reference
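Merge-update semantics can be illustrated with a small helper: new keys are added and existing keys overwritten. The convention that a null value removes a key mirrors Stripe's metadata behavior and is an assumption here; confirm Olostep's exact rules in the metadata API reference.

```python
def merge_metadata(current, patch):
    """Stripe-style merge-update of a metadata dict.

    Keys in patch overwrite or add to current; a None value removes
    the key (assumed convention, borrowed from Stripe).
    """
    merged = dict(current)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged
```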

What you can build

Monitoring, enrichment, and training-data collection at scale.

SERP & monitoring

Run many search or listing URLs through a parser in one batch.

Catalog ingestion

Feed product or directory URLs at scale.

Data pipelines

Hand off nightly jobs from your warehouse or orchestrator.

Scale extraction

Start a batch. Poll or webhook. Retrieve items.

Parser for JSON, cursor on items, plus optional webhook and batch/item metadata — see Create batch in the API reference.

Request: POST /v1/batches
{
  "items": [
    {
      "custom_id": "item-1",
      "url": "https://www.google.com/search?q=stripe&gl=us&hl=en",
      "metadata": { "campaign": "brand-track" }
    },
    {
      "custom_id": "item-2",
      "url": "https://www.google.com/search?q=paddle&gl=us&hl=en"
    }
  ],
  "parser": { "id": "@olostep/google-search" },
  "metadata": { "batch_name": "weekly-serp" },
  "webhook": "https://your-server.com/webhooks/olostep"
}
Response: 200 OK
{
  "id": "batch_z7n7hwh45x",
  "object": "batch",
  "status": "in_progress",
  "total_urls": 2,
  "completed_urls": 0,
  "parser": "@olostep/google-search",
  "metadata": { "batch_name": "weekly-serp" },
  "webhook": "https://your-server.com/webhooks/olostep"
}

Frequently asked questions

Everything you need to know about the Batch endpoint.

When should I use Batch vs parallel scrapes?

Batch is optimized for hundreds to tens of thousands of URLs with roughly constant wall-clock time. For small sets of URLs, parallel /v1/scrapes calls are often faster to complete.

Can I use Batch for e-commerce price tracking at huge scale?

Yes. Teams track millions of product pages by running many parallel batches (each within your account limits). Attach a batch-level parser so you only pull the JSON you need—price, reviews, seller, shipping time, availability—instead of full HTML, then join everything back using custom_id.

What about building an AI vertical search engine over millions of URLs and PDFs?

Batch fits large ingestion pipelines: submit huge URL lists (including pages that link to PDFs or docs), wait for the job to complete in predictable time, then list items with cursor and retrieve structured or markdown content per URL via /v1/retrieve. Scale out by splitting domains or cohorts across batches.

How does Batch help with SEO or GEO (tracking prompts on LLMs)?

For generative-engine optimization, you can batch the URLs or search results you care about—branded queries, competitor SERPs, citation pages—and use parsers to normalize titles, snippets, and structured fields. That makes it practical to monitor how brands and topics show up across many prompts and pages, not just one-off scrapes.

What is custom_id?

Your stable identifier per row so you can join batch results back to your internal systems.
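Joining results back by custom_id is a one-line dict lookup. In this sketch the item field names (custom_id, retrieve_id) follow the examples above; the exact item shape returned by the list endpoint is an assumption to check in the API reference.

```python
def join_by_custom_id(rows, batch_items):
    """Attach batch results to your internal rows via custom_id.

    rows: your records, each carrying the custom_id you submitted.
    batch_items: items listed from the finished batch.
    """
    by_id = {item["custom_id"]: item for item in batch_items}
    return [{**row, "result": by_id.get(row["custom_id"])} for row in rows]
```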

How do I get structured JSON?

Pass a parser on the batch (e.g. google-search parser) so items return JSON you can retrieve via /v1/retrieve.

How do batch webhooks work?

Include a public HTTPS webhook URL when you create the batch. Olostep POSTs a completion event (e.g. batch.completed) when processing finishes, with item counts and batch id in the payload. Failed deliveries retry with exponential backoff (up to 5 attempts over ~30 minutes). Return 2xx quickly and dedupe using the event id — see the webhooks docs for the envelope shape and best practices.

What is batch and item metadata for?

Metadata is optional key-value data (batch-level on the request body, or per item inside each object in items). Use it to tag jobs with project names, internal IDs, pipeline stage, or anything you need to correlate batches in your own systems. Values follow Stripe-style limits (e.g. max 50 keys). You can also merge-update metadata later via PATCH on the batch. See the metadata docs for validation rules.

Are there account limits?

New accounts may have a per-batch item limit; contact Olostep to raise it. See the docs warning for details.

Start batching URLs

500 free credits to try — no credit card required.