
How to Extract All URLs from a Website Using Olostep Maps API and Streamlit

Introduction
When building web crawlers, running competitive analysis, performing SEO audits, or powering AI agents, one of the first critical tasks is discovering all the URLs on a website.
Traditional methods like Google search operators, sitemap exploration, and SEO tools work, but there's a faster, modern way: the Olostep Maps API.
In this guide, we'll:
- Introduce the challenge of URL discovery
- Show how to build a live Streamlit app to scrape all URLs
- Compare it with traditional techniques (like sitemap.xml and robots.txt)
- Provide complete runnable Python code
Target Audience: Developers, Growth Engineers, Data Scientists, SEO specialists, and Founders who need structured, scalable scraping.
Why Extract All URLs?
Finding every page on a website can help you:
- Analyze site structure (for SEO)
- Scrape website content efficiently
- Find hidden gems like orphan pages
- Monitor website changes
- Prepare data for AI agents and automation
Traditional Methods (Before Olostep)
1. Sitemaps (XML Files)
Webmasters often create XML sitemaps to help Google index their sites. Here's an example:
```xml
<urlset>
  <url>
    <loc>https://example.com</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
```
To find sitemaps:
- Visit `/sitemap.xml` (e.g., https://example.com/sitemap.xml)
- Check `/robots.txt` (it usually links to the sitemap)
Other possible sitemap locations:
- `/sitemap.xml.gz`
- `/sitemap_index.xml`
- `/sitemap.php`
You can also Google: `site:example.com filetype:xml`
If the sitemap points to other sitemap files (e.g., English, French versions), you need to crawl each of them manually.
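If a sitemap is available, you can pull its URLs yourself with the Python standard library. Here's a minimal sketch; the XML string mirrors the example above (real-world sitemaps also declare the namespace shown, which `ElementTree` needs for lookups):

```python
import xml.etree.ElementTree as ET

SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

def extract_sitemap_urls(xml_text):
    # Sitemaps live in the namespace below; ElementTree requires it in findall()
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall("sm:url/sm:loc", ns)]

print(extract_sitemap_urls(SITEMAP_XML))
# ['https://example.com', 'https://example.com/about']
```

In practice you would fetch the XML with `requests.get(...)` first, and recurse into any nested `<sitemap>` entries of a sitemap index.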
The problems:
- Many websites don't keep their sitemaps up to date.
- Not every page is necessarily listed.
- Dynamic, JavaScript-heavy websites often leave pages out of the sitemap entirely.
2. Robots.txt
Example:
```
User-agent: *
Sitemap: https://example.com/sitemap.xml
Disallow: /admin
```
Good for finding disallowed URLs and sitemap links, but again not comprehensive.
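Pulling the sitemap references out of a robots.txt file takes only a few lines of Python. A sketch against the example file above (the `Sitemap:` field name is matched case-insensitively, as crawlers conventionally treat it):

```python
ROBOTS_TXT = """User-agent: *
Sitemap: https://example.com/sitemap.xml
Disallow: /admin
"""

def find_sitemaps(robots_text):
    # Collect the URL from every "Sitemap:" directive line
    return [line.split(":", 1)[1].strip()
            for line in robots_text.splitlines()
            if line.lower().startswith("sitemap:")]

print(find_sitemaps(ROBOTS_TXT))
# ['https://example.com/sitemap.xml']
```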
The Modern Solution: Olostep Maps API
✅ Find up to 100,000 URLs in seconds.
✅ Bypass the need to manually hunt for sitemaps and robots.txt files.
✅ Simple API call.
✅ No server maintenance or IP bans.
Let's build a full Streamlit app to demo this!
Full Python Project: Website URL Extractor with Olostep Maps API + Streamlit
1. Install Requirements
```shell
pip install streamlit requests
```
2. Python Code
```python
import streamlit as st
import requests

def fetch_urls(target_url, api_key):
    """Call the Olostep Maps API and return the parsed JSON response."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"url": target_url}
    response = requests.post(
        "https://api.olostep.com/v1/maps", headers=headers, json=payload
    )
    if response.status_code == 200:
        return response.json()
    st.error(f"Failed to fetch URLs. Status code: {response.status_code}")
    return None

st.title("🔎 Website URL Scraper")
st.markdown(
    "Use Olostep Maps API to instantly extract all discovered URLs from any "
    "website. Great for SEO, scraping, site analysis, and more!"
)

api_key = st.text_input("Enter your Olostep API Key", type="password")
url_to_scrape = st.text_input("Enter Website URL (e.g., https://example.com)")

if st.button("Find URLs"):
    if api_key and url_to_scrape:
        with st.spinner("Fetching URLs..."):
            data = fetch_urls(url_to_scrape, api_key)
        if data:
            urls = data.get("urls", [])
            st.success(f"✅ Found {len(urls)} URLs!")
            # Render each URL as a clickable, numbered link
            for idx, u in enumerate(urls, start=1):
                st.markdown(f"{idx}. [{u}]({u})")
            st.download_button(
                "📄 Download URLs as Text File",
                data="\n".join(urls),
                file_name="discovered_urls.txt",
                mime="text/plain",
            )
    else:
        st.warning("Please enter both an API key and a website URL.")
```

Save the file (e.g., as app.py) and launch it with `streamlit run app.py`.
📸 Example Output
✅ Found 35 URLs from https://docs.olostep.com
📥 Saved as discovered_urls.txt
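Once you have the list of discovered URLs, a few lines of standard-library Python turn it into a quick site-structure summary, e.g. counting pages per top-level section (`count_by_section` is a hypothetical helper for illustration, not part of the API):

```python
from urllib.parse import urlparse
from collections import Counter

def count_by_section(urls):
    # Group URLs by their first path segment, e.g. /blog/post-1 -> "blog"
    sections = Counter()
    for u in urls:
        path = urlparse(u).path.strip("/")
        sections[path.split("/")[0] if path else "(root)"] += 1
    return sections

urls = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/about",
]
print(count_by_section(urls).most_common())
# [('blog', 2), ('(root)', 1), ('about', 1)]
```

This kind of breakdown is handy for the SEO and site-analysis use cases mentioned earlier.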
⚡ Why Olostep Maps API Beats Traditional Methods
| Feature | Sitemap/Robots.txt | SEO Spider | Olostep Maps |
|---|---|---|---|
| Instant Response | ❌ | ❌ | ✅ |
| Handles JS-heavy Sites | ❌ | ⚠️ (Partial) | ✅ |
| Handles Big Sites | ❌ | ❌ (Limit) | ✅ |
| No Setup Needed | ❌ | ❌ | ✅ |
| Easy Pagination | ❌ | ❌ | ✅ |
📈 Conclusion
Using Olostep Maps API + a few lines of Streamlit code, you can build powerful website discovery tools in minutes.
No more worrying about sitemaps, robots.txt, or getting blocked by firewalls.
✅ Super fast
✅ Reliable
✅ Perfect for Growth Engineering, SEO, Scraping, and Automation.
🚀 Ready to try?
Register at Olostep.com and start building your own data pipelines today!
Written by:
Mohammad Ehsan Ansari
Growth Engineer @ Olostep
Happy scraping! 🚀