How to Extract All URLs from a Website Using Olostep Maps API and Streamlit

Ehsan

Introduction

When building web crawlers, competitive analysis, SEO audits, or AI agents, one of the first critical tasks is finding all the URLs on a website.

While traditional methods like Google search operators, sitemap exploration, and SEO tools work, there's a faster, modern way: the Olostep Maps API.

In this guide, we'll:

  • Introduce the challenge of URL discovery
  • Show how to build a live Streamlit app to scrape all URLs
  • Compare it with traditional techniques (like sitemap.xml and robots.txt)
  • Provide complete runnable Python code

Target Audience: Developers, Growth Engineers, Data Scientists, SEO specialists, and Founders who need structured, scalable scraping.

Why Extract All URLs?

Finding every page on a website can help you:

  • Analyze site structure (for SEO)
  • Scrape website content efficiently
  • Find hidden gems like orphan pages
  • Monitor website changes
  • Prepare data for AI agents and automation

Traditional Methods (Before Olostep)

1. Sitemaps (XML Files)

Webmasters often create XML sitemaps to help Google index their sites. Here's an example:

<urlset>
  <url>
    <loc>https://example.com</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>

To find sitemaps, start with the standard location: https://example.com/sitemap.xml.

Other possible sitemap locations:

  • /sitemap.xml.gz
  • /sitemap_index.xml
  • /sitemap.php

You can also Google:

site:example.com filetype:xml

If the sitemap points to other sitemap files (e.g., English, French versions), you need to crawl each of them manually.
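The manual crawl described above can be sketched with two small helpers (the function names are my own): one that pulls every `<loc>` entry out of a sitemap document, and one that detects whether the document is a sitemap index pointing to further sitemaps.

```python
import xml.etree.ElementTree as ET

def extract_locs(xml_text):
    """Return every <loc> value from a sitemap or sitemap-index document.

    Matching on the tag suffix handles both namespaced
    (http://www.sitemaps.org/schemas/sitemap/0.9) and bare sitemaps.
    """
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter()
            if el.tag.endswith("loc") and el.text]

def is_sitemap_index(xml_text):
    """A sitemap index's root element is <sitemapindex>, not <urlset>."""
    root = ET.fromstring(xml_text)
    return root.tag.endswith("sitemapindex")
```

In a real crawler you would fetch `/sitemap.xml`, and whenever `is_sitemap_index` is true, fetch each child sitemap returned by `extract_locs` and repeat.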

The problems:

  • Some websites don't maintain updated sitemaps.
  • Not all pages may be listed.
  • Dynamic, JavaScript-heavy websites often leave pages out of their sitemaps entirely.

2. Robots.txt

Example:

User-agent: *
Sitemap: https://example.com/sitemap.xml
Disallow: /admin

Good for finding disallowed URLs and sitemap links, but again not comprehensive.
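Pulling the `Sitemap:` directives out of robots.txt is simple enough to do by hand; a minimal sketch (the function name is my own):

```python
def sitemap_links_from_robots(robots_text):
    """Collect every 'Sitemap:' directive from a robots.txt body."""
    links = []
    for line in robots_text.splitlines():
        # Split only on the first colon, so URLs like https:// stay intact.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap":
            links.append(value.strip())
    return links
```

Feed it the text of `https://example.com/robots.txt` and chain the result into the sitemap crawl from the previous section.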

The Modern Solution: Olostep Maps API

✅ Find up to 100,000 URLs in seconds.
✅ Bypass the need to manually find sitemap or robots.txt.
✅ Simple API call.
✅ No server maintenance or IP bans.
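Before the full app, here is what "simple API call" means in isolation. This sketch assumes the `https://api.olostep.com/v1/maps` endpoint and the `urls` key in the response, mirroring what the Streamlit code later in this guide uses; check the official Olostep docs for the current contract.

```python
import requests

API_ENDPOINT = "https://api.olostep.com/v1/maps"  # assumed; see Olostep docs

def build_maps_request(target_url, api_key):
    """Headers and JSON body for a single Maps API call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"url": target_url}
    return headers, payload

def map_website(target_url, api_key):
    """One POST, then return the discovered URL list."""
    headers, payload = build_maps_request(target_url, api_key)
    resp = requests.post(API_ENDPOINT, headers=headers, json=payload, timeout=60)
    resp.raise_for_status()  # surface HTTP errors instead of failing silently
    return resp.json().get("urls", [])
```

That single call replaces the whole fetch-sitemap / parse-robots / crawl-children loop from the traditional approach.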

Full code Link

Let's build a full Streamlit app to demo this!

Full Python Project: Website URL Extractor with Olostep Maps API + Streamlit

1. Install Requirements

pip install streamlit requests

2. Python Code

import streamlit as st
import requests

def fetch_urls(target_url, api_key):
    """Call the Olostep Maps endpoint and return the parsed JSON response."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {"url": target_url}
    response = requests.post(
        "https://api.olostep.com/v1/maps",
        headers=headers, json=payload, timeout=60
    )
    if response.status_code == 200:
        return response.json()
    st.error(f"Failed to fetch URLs. Status code: {response.status_code}")
    return None

st.title("🔎 Website URL Scraper")

st.markdown("Use Olostep Maps API to instantly extract all discovered URLs from any website. Great for SEO, scraping, site analysis, and more!")

api_key = st.text_input("Enter your Olostep API Key", type="password")
url_to_scrape = st.text_input("Enter Website URL (e.g., https://example.com)")

if st.button("Find URLs"):
    if api_key and url_to_scrape:
        with st.spinner("Fetching URLs..."):
            data = fetch_urls(url_to_scrape, api_key)
        if data:
            urls = data.get("urls", [])
            st.success(f"✅ Found {len(urls)} URLs!")
            # Render each URL as a clickable numbered link
            for idx, u in enumerate(urls, start=1):
                st.markdown(f"{idx}. [{u}]({u})")

            st.download_button(
                "📄 Download URLs as Text File",
                data="\n".join(urls),
                file_name="discovered_urls.txt",
                mime="text/plain"
            )
    else:
        st.warning("Please enter both an API key and a website URL.")

📸 Example Output

✅ Found 35 URLs from https://docs.olostep.com

📥 Saved as discovered_urls.txt

⚡ Why Olostep Maps API Beats Traditional Methods

| Feature | Sitemap/Robots.txt | SEO Spider | Olostep Maps |
|---|---|---|---|
| Instant Response | ❌ | ❌ | ✅ |
| Handles JS-heavy Sites | ❌ | ⚠️ (Partial) | ✅ |
| Handles Big Sites | ⚠️ | ❌ (Limit) | ✅ |
| No Setup Needed | ✅ | ❌ | ✅ |
| Easy Pagination | ❌ | ❌ | ✅ |

📈 Conclusion

With the Olostep Maps API and a few lines of Streamlit code, you can build powerful website-discovery tools in minutes.

No more worrying about sitemaps, robots.txt, or getting blocked by firewalls.

✅ Super fast
✅ Reliable
✅ Perfect for Growth Engineering, SEO, Scraping, and Automation.

🚀 Ready to try?

Register at Olostep.com and start building your own data pipelines today!

Written by:
Mohammad Ehsan Ansari
Growth Engineer @ Olostep

Happy scraping! 🚀