Web Scraping for Price Monitoring: A Complete Guide

Price Monitoring Dashboard - Web Data Extraction
📅 March 24, 2026 ⏱ 9 min read ✍️ Leo Voss

Pricing is one of the highest-leverage levers in e-commerce — yet most businesses set prices manually, revisit them quarterly, and have no idea what competitors are charging right now. That's a massive blind spot in a market where prices can shift daily.

A Python-based web scraping price monitoring system changes the equation completely. You can track competitor prices in real time, get instant alerts when prices drop below yours, and build the data foundation for dynamic pricing strategies. This guide shows you exactly how to build one.

What You'll Build

By the end of this guide, you'll have a working automated price comparison system that:

  - Scrapes competitor product pages on a respectful schedule
  - Stores every price reading in a local SQLite history
  - Sends email and Slack alerts when a competitor undercuts your threshold
  - Builds the historical data you need for trend analysis and dynamic pricing

Legal and Ethical Considerations First

⚠️ Before scraping any website, check its robots.txt file and Terms of Service. Many sites explicitly prohibit scraping. Respect rate limits. Never scrape personal data. In the EU, GDPR applies to any data collection — publicly available prices are generally fine, personal data is not.

The safest approach for e-commerce price monitoring: focus on publicly visible product pages, space your requests (minimum 2–5 seconds between requests), and identify your scraper in the User-Agent string. Many retailers also provide official price feeds or APIs — check for those first.
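Checking robots.txt doesn't have to be manual. Here's a minimal sketch using Python's built-in urllib.robotparser; the bot name PriceMonitorBot matches the User-Agent string used later in this guide, and example.com stands in for any real site:

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def fetch_robots(url: str) -> RobotFileParser:
    """Download and parse the site's robots.txt."""
    parser = RobotFileParser()
    parser.set_url(urljoin(url, "/robots.txt"))
    parser.read()
    return parser

def is_allowed(parser: RobotFileParser, url: str,
               user_agent: str = "PriceMonitorBot") -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

Fetch and cache one parser per domain, then check every URL against it before scraping.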

The Right Tool for the Job

Tool                       | Best For                       | Handles JS? | Speed
---------------------------|--------------------------------|-------------|----------
requests + BeautifulSoup   | Static HTML pages              | No          | Fast
httpx + parsel             | Static pages, async            | No          | Very fast
Playwright                 | JS-heavy sites (Amazon, etc.)  | Yes         | Slower
Scrapy                     | Large-scale crawls             | With plugin | Very fast

For most price monitoring use cases, start with requests + BeautifulSoup. If you hit JavaScript-rendered content (prices loaded via API after page load), upgrade to Playwright.

Building the Price Scraper

1 Set Up the Project

pip install requests beautifulsoup4 playwright
pip install schedule python-dotenv
playwright install chromium   # Playwright needs a browser binary
# sqlite3 ships with Python's standard library -- no install needed

Create a simple data model first. SQLite is perfect for this — no server setup, stores everything locally:

import sqlite3
from datetime import datetime

def init_database():
    conn = sqlite3.connect("prices.db")
    cursor = conn.cursor()
    
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            url TEXT NOT NULL UNIQUE,  -- UNIQUE lets INSERT OR IGNORE skip duplicates
            selector TEXT NOT NULL,  -- CSS selector for price element
            our_price REAL,
            alert_threshold REAL     -- alert if competitor goes below this
        )
    """)
    
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY,
            product_id INTEGER,
            price REAL,
            currency TEXT DEFAULT 'EUR',
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (product_id) REFERENCES products (id)
        )
    """)
    
    conn.commit()
    return conn

2 The Core Scraper

import requests
from bs4 import BeautifulSoup
import re
import time
import random
import logging

logger = logging.getLogger(__name__)

HEADERS = {
    "User-Agent": "PriceMonitorBot/1.0 (contact@yourdomain.com)",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}

def scrape_price(url: str, selector: str) -> float | None:
    """Scrape a price from a URL using a CSS selector."""
    try:
        # Respectful delay
        time.sleep(random.uniform(2, 5))
        
        response = requests.get(url, headers=HEADERS, timeout=15)
        response.raise_for_status()
        
        soup = BeautifulSoup(response.content, "html.parser")
        price_element = soup.select_one(selector)
        
        if not price_element:
            logger.warning(f"Price element not found at {url} with selector '{selector}'")
            return None
        
        # Extract numeric price from text (handles "€29,99", "$19.99", "29.99 EUR", etc.)
        price_text = price_element.get_text(strip=True)
        price_match = re.search(r'\d+(?:\.\d+)?', price_text.replace(',', '.'))
        
        if price_match:
            return float(price_match.group())
        
        return None
        
    except requests.RequestException as e:
        logger.error(f"Failed to scrape {url}: {e}")
        return None


def scrape_price_js(url: str, selector: str) -> float | None:
    """Use Playwright for JavaScript-rendered prices."""
    from playwright.sync_api import sync_playwright
    
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.set_extra_http_headers(HEADERS)
        
        page.goto(url, wait_until="networkidle")
        
        try:
            element = page.wait_for_selector(selector, timeout=10000)
            price_text = element.inner_text()
            price_match = re.search(r'\d+(?:\.\d+)?', price_text.replace(',', '.'))
            if price_match:
                return float(price_match.group())
        except Exception as e:
            logger.error(f"Playwright error at {url}: {e}")
        finally:
            browser.close()
    
    return None

3 Store and Compare Prices

def record_price(conn, product_id: int, price: float):
    """Store a new price reading in the database."""
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO price_history (product_id, price) VALUES (?, ?)",
        (product_id, price)
    )
    conn.commit()

def get_last_price(conn, product_id: int) -> float | None:
    """Get the most recent price for a product."""
    cursor = conn.cursor()
    cursor.execute(
        """SELECT price FROM price_history 
           WHERE product_id = ? 
           ORDER BY scraped_at DESC LIMIT 1""",
        (product_id,)
    )
    result = cursor.fetchone()
    return result[0] if result else None

def get_price_history(conn, product_id: int, days: int = 30) -> list:
    """Get price history for trend analysis."""
    cursor = conn.cursor()
    cursor.execute(
        """SELECT price, scraped_at FROM price_history 
           WHERE product_id = ? 
           AND scraped_at >= datetime('now', ?)
           ORDER BY scraped_at""",
        (product_id, f'-{days} days')
    )
    return cursor.fetchall()

4 Alerting System

import os
import smtplib
from email.mime.text import MIMEText
import requests as http_requests

def send_email_alert(product_name: str, competitor_price: float, 
                     our_price: float, url: str):
    """Send email alert when competitor price drops below threshold."""
    subject = f"⚠️ Price Alert: {product_name} — Competitor at €{competitor_price:.2f}"
    diff = our_price - competitor_price
    diff_pct = (diff / our_price * 100) if our_price else 0.0  # guard: our_price may be unset
    body = f"""
Price Alert for: {product_name}

Competitor price: €{competitor_price:.2f}
Your current price: €{our_price:.2f}
Difference: €{diff:.2f} ({diff_pct:.1f}%)

Competitor URL: {url}

Review your pricing strategy to stay competitive.
"""
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = "alerts@yourdomain.com"
    msg["To"] = "you@yourdomain.com"
    
    # Read credentials from the environment (load them with python-dotenv)
    # rather than hardcoding them in source.
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(os.environ["SMTP_USER"], os.environ["SMTP_PASSWORD"])
        server.send_message(msg)

def send_slack_alert(product_name: str, competitor_price: float, 
                     our_price: float, url: str, webhook_url: str):
    """Send Slack notification for price changes."""
    diff_pct = (our_price - competitor_price) / our_price * 100 if our_price else 0.0
    
    http_requests.post(webhook_url, json={
        "text": f"🔴 *Price Alert*: {product_name}",
        "attachments": [{
            "color": "danger",
            "fields": [
                {"title": "Competitor Price", "value": f"€{competitor_price:.2f}", "short": True},
                {"title": "Your Price", "value": f"€{our_price:.2f}", "short": True},
                {"title": "Difference", "value": f"-{diff_pct:.1f}%", "short": True},
                {"title": "URL", "value": url, "short": False}
            ]
        }]
    })

The Main Monitoring Loop

import schedule

def run_price_check(conn):
    cursor = conn.cursor()
    cursor.execute("SELECT id, name, url, selector, our_price, alert_threshold FROM products")
    products = cursor.fetchall()
    
    logger.info(f"Checking prices for {len(products)} products...")
    
    for product_id, name, url, selector, our_price, alert_threshold in products:
        current_price = scrape_price(url, selector)
        
        if current_price is None:
            logger.warning(f"Could not scrape price for {name}")
            continue
        
        last_price = get_last_price(conn, product_id)
        record_price(conn, product_id, current_price)
        
        logger.info(f"{name}: €{current_price:.2f}" + 
                   (f" (was €{last_price:.2f})" if last_price else ""))
        
        # Alert if price dropped below threshold
        if alert_threshold and current_price < alert_threshold:
            logger.warning(f"ALERT: {name} competitor at €{current_price:.2f}!")
            send_email_alert(name, current_price, our_price or 0, url)

def main():
    conn = init_database()
    
    # Add products to monitor
    cursor = conn.cursor()
    cursor.execute("""
        INSERT OR IGNORE INTO products (name, url, selector, our_price, alert_threshold)
        VALUES (?, ?, ?, ?, ?)
    """, (
        "Competitor Product A",
        "https://competitor-site.com/product-a",
        ".price-current",   # CSS selector for the price element
        49.99,              # Your price
        44.99               # Alert if competitor goes below this
    ))
    conn.commit()
    
    # Run immediately, then every 6 hours
    run_price_check(conn)
    schedule.every(6).hours.do(run_price_check, conn)
    
    while True:
        schedule.run_pending()
        time.sleep(60)

if __name__ == "__main__":
    main()

Finding the Right CSS Selector

The trickiest part of any scraping project is identifying the correct CSS selector for the price element. Here's the fastest method:

  1. Open the competitor's product page in Chrome
  2. Right-click on the price → "Inspect Element"
  3. In the Elements panel, right-click the price element → Copy → Copy selector
  4. Test in Python: soup.select_one("your.copied.selector")
  5. If the auto-generated selector is fragile (contains nth-child), find a more stable class name

💡 Prefer class-based selectors (.price-current) over position-based ones (div:nth-child(3) > span). Class-based selectors survive minor layout changes; position-based ones break whenever the page structure changes.
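Before wiring a selector into the scraper, verify it against a saved copy of the page. The HTML below is a made-up stand-in for a real product page:

```python
from bs4 import BeautifulSoup

# Stand-in for HTML saved from a real product page
html = """
<div class="product">
  <div class="shipping">Free shipping from €50</div>
  <span class="price-current">€49,99</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
element = soup.select_one(".price-current")
print(element.get_text(strip=True))  # €49,99
```

If select_one returns None here, the selector is wrong; fix it against the saved HTML before pointing it at the live site.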

Scaling Up: Monitoring Hundreds of Products

For monitoring more than 50 products, move to an async architecture with httpx and asyncio. This lets you run multiple requests concurrently while still respecting rate limits per domain:

This is the architecture behind our PriceWatch system — a production price monitoring platform that tracks thousands of SKUs across dozens of e-commerce sites, with automated competitor intelligence reports delivered weekly. It handles dynamic pricing changes, detects flash sales, and integrates directly with the client's pricing management system.

Turning Data Into Decisions

Raw price data is only useful if you act on it. With the historical data you're collecting, you can:

  - Spot long-term price trends and seasonal patterns per competitor
  - Detect flash sales and temporary promotions as they happen
  - Feed dynamic pricing rules that keep you competitive automatically
  - Generate regular competitor intelligence reports for your team
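As one concrete illustration, a first dynamic pricing rule might match the competitor minus a small undercut while never dropping below your margin floor. The function below is purely a sketch; real rules depend on your costs and strategy:

```python
def suggest_price(competitor_price: float, floor: float,
                  undercut: float = 0.01) -> float:
    """Undercut the competitor slightly, but never go below the margin floor."""
    return max(round(competitor_price - undercut, 2), floor)
```

For example, with a competitor at €44.99 and a floor of €42.00 this suggests €44.98; if the competitor crashes to €40.00, it holds at the €42.00 floor instead of chasing them down.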

For a complete e-commerce automation stack that puts all these pieces together, see our guide on E-Commerce process automation.

Need a Production-Ready Price Monitoring System?

We build custom price tracking and competitive intelligence platforms — with dashboards, automated reports, and dynamic pricing integrations for your specific market.

Visit Leo Voss Automation →