Olostep × Merchkit Case Study
Merchkit uses Olostep to transform raw, heterogeneous product data on retailers' websites into clean, channel-ready listings at scale. By combining Olostep's Parsers, Scrape, and Batch endpoints, plus Context for authenticated sessions, Merchkit automates enrichment and standardization across millions of SKUs and dozens of retail partners.
“Olostep lets us turn any website into an API. We define the data schema we want once using Olostep's parsers, and then run it at scale to get deterministic JSON from the website.”
About Merchkit
Merchkit is a Premium Partner of the B2B eCommerce Association (B2BEA), the global community advancing digital transformation across manufacturing and distribution. They work with companies like Walmart to help sellers optimize for agentic shopping as AI becomes a shopping channel. Their platform delivers:
- Complete attribute enrichment (contextual feature bullets, material specs, and dimensions, all standardized and auto-populated)
- Variant mapping that actually works
- Structured Q&A content so products show up when people ask real questions
- Channel-specific optimization for marketplaces, storefronts, and AI shopping agents
Challenge
- Product data is fragmented across PDPs, manuals, spec sheets, and retailer portals
- Attributes vary by channel and require strict mapping/validation
- Manual spreadsheet work and vendor back-and-forth slow down time-to-list
- Partner retailer portals require authentication and session context
Solution
Merchkit uses Olostep to navigate the web, authenticate on partner retail websites, and extract structured data from hundreds of retailers. The core workflow:
- Parsers - Merchkit creates custom parsers from the Olostep dashboard to extract and structure data in a deterministic, reliable way aligned to their catalog schema. Parsers can be updated at any time or even self-healed by an LLM when sites change.
- Context - Lets Merchkit authenticate into partner retailer portals, maintaining session state and cookies to access catalog pages.
- Crawl - Retrieves the content of all product pages on a retailer's site from a single category URL, automatically discovering and extracting all product details (see the sketch after this list).
- Scrape - Real-time extraction with parser execution for on-demand enrichment.
- Batch - Parallel processing of tens of thousands of URLs at once, all referencing the same parser for consistent, schema-aligned output.
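A minimal sketch of that Crawl discovery step, assuming a crawl endpoint that mirrors the Scrape payloads shown below; the /v1/crawls path, parameter names, and limits here are illustrative assumptions, not confirmed API details:
import requests, json

# Assumed endpoint and parameters; check the Olostep Crawl docs for the exact schema
endpoint = "https://api.olostep.com/v1/crawls"
payload = {
    "start_url": "https://www.wayfair.com/furniture/cat/sofas",  # illustrative category URL
    "max_pages": 500  # illustrative cap on discovered product pages
}
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json"
}
response = requests.post(endpoint, json=payload, headers=headers)
print(json.dumps(response.json(), indent=2))
Discovered product URLs can then be fed to Scrape or Batch with a retailer-specific parser, as in the examples that follow.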
Integrating Olostep
1. Create Parsers in the Dashboard
Merchkit builds custom parsers directly from the Olostep dashboard. Each parser extracts and structures data from a website in a deterministic, reliable way aligned to Merchkit's catalog schema. Parsers encode business logic once and can be updated at any time, or even self-healed by an LLM when retailer sites change.
- Define normalized product attributes: title, brand, model, dimensions, materials, images, bullets, category path, price, variants, Q&A content.
- Validate with live examples and iterate without code changes.
- Version parsers to maintain schema stability across channels.
Docs: Parsers
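For illustration, a PDP parser aligned to that attribute list might return JSON shaped like the dictionary below; the field names and values are hypothetical, chosen to mirror Merchkit's schema rather than copied from a real parser:
# Hypothetical, schema-aligned output of a PDP parser (all values are invented)
parsed_product = {
    "title": "Mercury Row 84-Inch Upholstered Sofa",
    "brand": "Mercury Row",
    "model": "W001234567",
    "dimensions": {"width_in": 84, "depth_in": 36, "height_in": 33},
    "materials": ["polyester upholstery", "solid pine frame"],
    "images": ["https://example.com/img/front.jpg"],
    "bullets": ["Seats three comfortably", "Stain-resistant fabric"],
    "category_path": ["Furniture", "Living Room", "Sofas"],
    "price": {"amount": 899.99, "currency": "USD"},
    "variants": [{"color": "Navy", "sku": "W001234567-NVY"}],
    "qa": [{"question": "Does it ship assembled?", "answer": "Legs attach with four bolts."}]
}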
2. Run Parsers with Scrape (real-time)
Fetch any PDP or spec page and execute the parser to return structured JSON.
import requests, json
endpoint = "https://api.olostep.com/v1/scrapes"
payload = {
    "url_to_scrape": "https://www.wayfair.com/furniture/pdp/mercury-row-sofa-w001234567.html",
    "formats": ["json"],
    "parser": {
        "id": "@merchkit/wayfair-pdp-v1"
    }
}
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json"
}
response = requests.post(endpoint, json=payload, headers=headers)
print(json.dumps(response.json(), indent=2))
3. Use Context for Authenticated Retailer Portals
Olostep's Context feature lets Merchkit authenticate into partner retailer websites to automate data enrichment and catalog workflows. Context securely attaches session cookies and headers to access protected partner pages, maintaining brand-safe, compliant access.
import requests, json
endpoint = "https://api.olostep.com/v1/scrapes"
payload = {
    "url_to_scrape": "https://partners.homedepot.com/catalog/product/sku-42",
    "formats": ["json"],
    "parser": {
        "id": "@merchkit/homedepot-portal-v2"
    },
    "context": {
        "id": "ctx_merchkit_homedepot"
    }
}
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json"
}
response = requests.post(endpoint, json=payload, headers=headers)
print(json.dumps(response.json(), indent=2))
This enables Merchkit to navigate authenticated retailer portals, extract structured, deterministic data from hundreds of retailers, and keep catalogs synchronized while respecting partner access controls.
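The same parser can also drive bulk enrichment through Batch. A minimal sketch, assuming a /v1/batches endpoint that takes a list of items plus a shared parser id; the field names follow the pattern of the Scrape payloads above and should be checked against the Batch docs:
import requests, json

# Assumed endpoint and payload shape for batch processing; verify against the Olostep Batch docs
endpoint = "https://api.olostep.com/v1/batches"
payload = {
    "items": [  # illustrative PDP URLs, one entry per SKU to enrich
        {"custom_id": "sku-001", "url": "https://www.wayfair.com/furniture/pdp/mercury-row-sofa-w001234567.html"},
        {"custom_id": "sku-002", "url": "https://www.wayfair.com/furniture/pdp/mercury-row-loveseat-w001234568.html"}
    ],
    "parser": {
        "id": "@merchkit/wayfair-pdp-v1"  # same parser as the Scrape example, for schema-consistent output
    }
}
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json"
}
response = requests.post(endpoint, json=payload, headers=headers)
print(json.dumps(response.json(), indent=2))  # batch id and status; results are retrieved once items complete
Tens of thousands of URLs can be submitted this way, all returning JSON in the same schema because every item references the same parser.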
Results
- 94% faster enrichment vs. manual workflows
- 5× more SKUs optimized per month with the same team
- 10× cost reduction by eliminating spreadsheet churn and brittle scripts
- Channel-specific, compliant feeds generated automatically
With Olostep, Merchkit treats the web as a dependable, structured data source. Parsers encode business logic once; Crawl discovers all products in a category; Scrape and Batch execute that logic reliably and at scale; Context unlocks authenticated partner experiences, powering always up-to-date, channel-ready catalogs.
Ready to Get Started?
Test Olostep with 500 free credits, no credit card required.
References:
- Merchkit - AI catalog automation
- Olostep - Web Data API for AI
- Olostep Docs: Parsers
- Olostep Docs: Scrape
- Olostep Docs: Batch
- Olostep Docs: Crawl
- Olostep Docs: Context
Frequently Asked Questions
How do you automate catalog enrichment from retailer websites?
Catalog enrichment can be automated using web scraping APIs that extract structured product data from retailer websites. Olostep's Parsers let you define the exact data schema you need once, then apply it at scale across thousands of product pages. Combined with the Batch API for parallel processing and Context for authenticated portal access, you can automatically enrich product catalogs with attributes like dimensions, materials, pricing, and variants without manual data entry.
What's the best way to scrape authenticated retailer portals?
To scrape authenticated retailer portals, you need a solution that maintains session state and cookies across requests. Olostep's Context feature handles authentication by securely storing login sessions, allowing you to access partner portals and protected catalog pages while maintaining compliance. This is essential for B2B retailers and marketplace sellers who need to access vendor portals or partner platforms.
How can I standardize product data across multiple retailers?
Standardizing product data requires defining a unified schema and mapping diverse retailer formats to that schema. Use Olostep's Parsers to create custom extractors for each retailer that output to your standardized format. The parsers handle site-specific variations while ensuring consistent JSON output across all sources. This approach scales better than spreadsheets or manual mapping, especially when dealing with millions of SKUs.
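As a sketch of that mapping approach, a thin dispatch layer can pick a retailer-specific parser by hostname and call the same /v1/scrapes endpoint shown in the case study; the parser ids and hostname table below are illustrative:
import requests
from urllib.parse import urlparse

# Illustrative mapping from retailer hostname to a retailer-specific parser id
PARSERS = {
    "www.wayfair.com": "@merchkit/wayfair-pdp-v1",
    "partners.homedepot.com": "@merchkit/homedepot-portal-v2",
}

def scrape_to_schema(url: str, api_key: str) -> dict:
    """Scrape a PDP with the parser registered for its retailer, returning schema-aligned JSON."""
    parser_id = PARSERS[urlparse(url).hostname]
    response = requests.post(
        "https://api.olostep.com/v1/scrapes",
        json={"url_to_scrape": url, "formats": ["json"], "parser": {"id": parser_id}},
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
    response.raise_for_status()
    return response.json()
Because every parser outputs the same schema, downstream catalog code never has to branch on which retailer a record came from.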
What tools help with ecommerce catalog automation?
Ecommerce catalog automation tools should provide web scraping, data extraction, batch processing, and structured output capabilities. Olostep offers a complete solution with Parsers for structured extraction, Scrape for real-time data, Batch for processing thousands of URLs in parallel, Crawl for discovering all products in a category, and Context for accessing authenticated portals. This combination eliminates manual catalog work and keeps product data synchronized across channels.
How do you extract product attributes from PDPs at scale?
Extract product attributes from product detail pages (PDPs) at scale by creating a parser that identifies and extracts specific fields like title, price, dimensions, materials, and images. Use Olostep's Batch endpoint to process tens of thousands of PDPs in parallel, all referencing the same parser for consistent output. The parser can be updated anytime without code changes, and even self-heals when websites update their layout.



