Changelog

Extract Now, Analyze Later

Most web data APIs treat results as ephemeral. You make a request, get your response, and the data disappears within hours. That works for real-time lookups — but it falls apart the moment you need that data again.

We kept hearing the same thing from teams: "We want to analyze long-horizon web changes without having to think of storage as separate infrastructure."

Today we're introducing the storage parameter — a simple way to persist your extraction results for as long as you need them, and to keep full control over your data lifecycle.

Infrastructure for the Web's second user

How it works

Pass a storage object in your request to control retention:

{
  "url": "https://example.com/product/reviews",
  "formats": ["markdown"],
  "storage": {
    "expires_in": "90d"
  }
}

Supported values are "10d", "30d", "60d", "90d", "180d", "365d", or "never" for indefinite retention. If you omit the parameter, the default retention of roughly 7 days applies.
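As a client-side sketch, the helper below estimates when a stored result will expire, given its extraction time and the expires_in value. The function and constant names are hypothetical, and treating the ~7-day default as exactly 7 days is an assumption; any expiry information the API itself reports should take precedence.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention values listed in this changelog; "never" means indefinite retention.
SUPPORTED_EXPIRES_IN = {"10d", "30d", "60d", "90d", "180d", "365d", "never"}

# Assumption: the "~7 day" default is modeled as exactly 7 days.
DEFAULT_RETENTION = timedelta(days=7)


def expected_expiry(extracted_at: datetime, expires_in: Optional[str] = None) -> Optional[datetime]:
    """Estimate when a stored result expires; returns None for "never".

    This is a local estimate, not a value returned by the API.
    """
    if expires_in is None:
        return extracted_at + DEFAULT_RETENTION
    if expires_in not in SUPPORTED_EXPIRES_IN:
        raise ValueError(f"unsupported expires_in: {expires_in!r}")
    if expires_in == "never":
        return None
    days = int(expires_in.rstrip("d"))  # "90d" -> 90
    return extracted_at + timedelta(days=days)


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(expected_expiry(t0, "90d"))  # 2025-04-01 00:00:00+00:00
```

Validating against the fixed set locally catches typos like "45d" before a request is sent.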

No extra setup. No separate storage service. Your data stays accessible through the same API you already use.
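Because the storage parameter rides on the existing request body, adding it is a one-field change. The sketch below builds (but does not send) a POST request to /scrapes using only Python's standard library. The base URL https://api.olostep.com/v1 and the Bearer-token Authorization header are assumptions for illustration; confirm both in the API reference before relying on them.

```python
import json
import urllib.request
from typing import List, Optional

# Assumption: base URL and auth scheme are placeholders; check the API reference.
API_BASE = "https://api.olostep.com/v1"

SUPPORTED_EXPIRES_IN = {"10d", "30d", "60d", "90d", "180d", "365d", "never"}


def build_scrape_request(api_key: str, url: str, formats: List[str],
                         expires_in: Optional[str] = None) -> urllib.request.Request:
    """Build a /scrapes request; adds the storage object only when expires_in is set."""
    payload = {"url": url, "formats": formats}
    if expires_in is not None:
        if expires_in not in SUPPORTED_EXPIRES_IN:
            raise ValueError(f"unsupported expires_in: {expires_in!r}")
        payload["storage"] = {"expires_in": expires_in}
    return urllib.request.Request(
        f"{API_BASE}/scrapes",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_scrape_request("YOUR_API_KEY", "https://example.com/product/reviews", ["markdown"], "90d")
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Leaving expires_in as None produces the same body you send today, so existing calls keep the default retention.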

Your data as an asset

The real value of web data rarely comes from a single extraction. It comes from accumulation — having weeks, months, or quarters of structured data that you can query, compare, and analyze.

Data analytics over time. When you persist your results, you're not just storing pages — you're building a dataset. Track how product reviews shift after a launch, how pricing evolves across a market, how job listings signal a competitor's strategy. Persistent storage turns isolated extractions into time-series data that compounds in value.
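Once two stored markdown results for the same URL are available, a plain text diff is often enough to see what changed between them. This sketch assumes you have already retrieved both snapshots (retrieval is not shown), and the sample page content is hypothetical.

```python
import difflib


def diff_snapshots(old_md: str, new_md: str,
                   old_label: str = "before", new_label: str = "after") -> list[str]:
    """Return unified-diff lines between two markdown snapshots of the same page."""
    return list(
        difflib.unified_diff(
            old_md.splitlines(),
            new_md.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # splitlines() already removed the newlines
        )
    )


# Hypothetical snapshots of one product page, extracted 30 days apart.
january = "# Widget\nPrice: $49\nRating: 4.6"
february = "# Widget\nPrice: $59\nRating: 4.2"

for line in diff_snapshots(january, february, "2025-01-01", "2025-01-31"):
    print(line)
```

For structured fields such as prices, parsing each snapshot into values and comparing those is usually more robust than a line diff; the diff is a quick first look.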

Full control over your data lifecycle. You decide what stays, for how long, and when it goes. No more worrying about data vanishing before your team has processed it. Whether you need 30-day windows for operational workflows or indefinite retention for long-term analysis — you set the rules. Your data, your timeline.
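If you know how long a workflow needs its data, a small helper can pick the shortest supported retention that still covers it. This is a minimal sketch; the name retention_for is hypothetical, and it falls back to "never" when no fixed window is long enough.

```python
# Fixed retention windows from this changelog, in days, shortest first.
SUPPORTED_DAYS = [10, 30, 60, 90, 180, 365]


def retention_for(days_needed: int) -> str:
    """Return the shortest supported expires_in value that covers days_needed.

    Falls back to "never" when the requirement exceeds the longest fixed window.
    """
    if days_needed <= 0:
        raise ValueError("days_needed must be positive")
    for days in SUPPORTED_DAYS:
        if days >= days_needed:
            return f"{days}d"
    return "never"


print(retention_for(45))  # a 45-day review window fits in "60d"
```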

Compliance and auditability. Certain industries — fintech, healthcare, insurance — require you to retain evidence of what data you collected and when. Whether it's for GDPR right-of-access requests, financial auditing, or legal discovery, "expires_in": "never" gives you a timestamped record of exactly what was on the page at the time of extraction.
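For audit trails, some teams also keep a small record of their own alongside each stored result: the URL, the extraction time, and a content digest that lets them show later that a retrieved copy is unchanged. The changelog does not say the API returns such a digest; this is a hypothetical client-side record using SHA-256, with illustrative function names.

```python
import hashlib
from datetime import datetime, timezone


def audit_record(url: str, content: str, extracted_at: datetime) -> dict:
    """Build an audit entry: source URL, UTC extraction time, SHA-256 of the content."""
    return {
        "url": url,
        "extracted_at": extracted_at.astimezone(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
    }


def matches_record(record: dict, content: str) -> bool:
    """Check that content retrieved later is identical to what was recorded."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest() == record["sha256"]


rec = audit_record("https://example.com/terms", "Terms v1",
                   datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc))
print(rec["extracted_at"])  # 2025-03-01T12:00:00+00:00
```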

Reproducible datasets for AI/ML. If you're building AI models such as classifiers, summarizers, or extraction pipelines, you need reproducible training data. Losing the raw data a week later means you can't re-run your pipeline, debug regressions, or compare outputs across iterations. Persistent storage turns your extractions into a reusable corpus that stays available across every iteration.
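Keeping the raw data is half of reproducibility; the other half is assigning the same records to the same splits on every run. One common technique, sketched here and not part of the API, is to hash a stable key such as the source URL and bucket on the hash. The function name assign_split is hypothetical.

```python
import hashlib


def assign_split(key: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a record to "train" or "test" from a hash of its key."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first 6 bytes (48 bits) to a float in [0, 1); 48 bits keeps the
    # division exact in double precision, so the value never rounds up to 1.0.
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return "test" if bucket < test_fraction else "train"


print(assign_split("https://example.com/product/reviews"))
```

Because the split depends only on the key, adding new extractions later never moves existing records between train and test.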

Debugging and pipeline reliability. When a downstream pipeline breaks, the first question is always: "What did the source data look like?" If your results are already gone, you're guessing. Persistent storage gives your team a concrete artifact to debug against — the exact content that your pipeline consumed.

What we're building next

Cold storage tiers. We're working on an option to move older data to Amazon S3 Glacier (or similar cold storage systems). This will let you retain data for much longer periods (years, not months) at a fraction of the cost. Ideal for compliance archives, historical datasets, and long-term trend analysis.

A query language for your data. We're building a SQL-like query interface that will let you run analytical queries directly on your stored extractions — or on specific subsets of them. Filter, aggregate, and analyze without exporting anything. Your stored data becomes a queryable database.

Availability

The storage parameter is available today on the /scrapes endpoint across all plans. We're actively working on bringing it to /answers, /maps, /crawls, and /batches — so you'll soon have the same retention controls across every way you extract data with Olostep.

Check out the API reference for full details, or just add "storage": { "expires_in": "30d" } to your next request and try it out.

  • New storage parameter with expires_in values: 10d, 30d, 60d, 90d, 180d, 365d, or never for indefinite retention
  • Available today on the /scrapes endpoint across all plans; default ~7-day retention when omitted
  • Coming soon to /answers, /maps, /crawls, and /batches
  • Cold storage tiers for long-term, low-cost archival (in development)
  • SQL-like query interface for running analytics directly on stored extractions (in development)

Real-time Web Search API

The Web Search API is now generally available. Send a query and get back clean, structured search results with citations in a single call, designed for grounding agents and RAG pipelines with live web context.

Let your agents search, answer and explore

Results come back as normalized JSON with titles, URLs, snippets, and ranked positions, so you can feed them straight into a model or a downstream parser without any scraping glue code.

  • New /v1/searches endpoint returning normalized, ranked results with snippets and source URLs
  • Optional mode to retrieve content from results
  • Region and language targeting for localized queries
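Since results arrive as normalized JSON with titles, URLs, snippets, and ranked positions, turning them into a citation-numbered context block for a model takes only a few lines. In this sketch the field names (title, url, snippet, position) are assumptions, the sample items are hypothetical, and the /v1/searches call itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str
    position: int


def build_context(results: list[dict], max_results: int = 5) -> str:
    """Turn ranked search results into a numbered context block for a prompt."""
    parsed = [
        SearchResult(r["title"], r["url"], r["snippet"], int(r["position"]))
        for r in results
    ]
    parsed.sort(key=lambda r: r.position)  # lower position means higher rank
    blocks = [
        f"[{i}] {r.title} ({r.url})\n{r.snippet}"
        for i, r in enumerate(parsed[:max_results], start=1)
    ]
    return "\n\n".join(blocks)


# Hypothetical result items using the assumed field names.
sample = [
    {"title": "Second", "url": "https://example.com/b", "snippet": "B snippet", "position": 2},
    {"title": "First", "url": "https://example.com/a", "snippet": "A snippet", "position": 1},
]
print(build_context(sample))
```

The bracketed numbers give the model stable handles to cite, and each one maps back to a source URL.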