Pricing is one of the highest-leverage levers in e-commerce — yet most businesses set prices manually, revisit them quarterly, and have no idea what competitors are charging right now. That's a massive blind spot in a market where prices can shift daily.
A Python-based web scraping price monitoring system changes the equation completely. You can track competitor prices in real time, get instant alerts when prices drop below yours, and build the data foundation for dynamic pricing strategies. This guide shows you exactly how to build one.
What You'll Build
By the end of this guide, you'll have a working automated price comparison system that:
- Scrapes prices from multiple competitor websites on a schedule
- Stores historical price data in a database for trend analysis
- Sends email or Slack alerts when prices change significantly
- Handles JavaScript-rendered pages, pagination, and rate limiting
- Runs automatically in the background without manual intervention
Legal and Ethical Considerations First
⚠️ Before scraping any website, check its robots.txt file and Terms of Service. Many sites explicitly prohibit scraping. Respect rate limits. Never scrape personal data. In the EU, GDPR applies to any data collection — publicly available prices are generally fine, personal data is not.
The safest approach for e-commerce price monitoring: focus on publicly visible product pages, space your requests (minimum 2–5 seconds between requests), and identify your scraper in the User-Agent string. Many retailers also provide official price feeds or APIs — check for those first.
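Python's standard library can automate the robots.txt check before you ever send a request. A minimal sketch — the policy text and bot name below are invented for illustration; in production you would point `RobotFileParser` at the live site's `/robots.txt` via `set_url()` and `read()`:

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a robots.txt policy (passed as text) for a given URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

# Hypothetical policy for a shop that blocks checkout pages but allows product pages
policy = """
User-agent: *
Disallow: /checkout/
Allow: /products/
"""

print(is_allowed(policy, "PriceMonitorBot", "https://shop.example/products/widget"))  # True
print(is_allowed(policy, "PriceMonitorBot", "https://shop.example/checkout/cart"))    # False
```

Running this check once per domain at startup (and caching the result) keeps the scraper polite without adding per-request overhead.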
The Right Tool for the Job
| Tool | Best For | Handles JS? | Speed |
|---|---|---|---|
| requests + BeautifulSoup | Static HTML pages | No | Fast |
| httpx + parsel | Static pages, async | No | Very fast |
| Playwright | JS-heavy sites (Amazon, etc.) | Yes | Slower |
| Scrapy | Large-scale crawls | With plugin | Very fast |
For most price monitoring use cases, start with requests + BeautifulSoup. If you hit JavaScript-rendered content (prices loaded via API after page load), upgrade to Playwright.
Building the Price Scraper
1 Set Up the Project
pip install requests beautifulsoup4 playwright
pip install schedule python-dotenv
playwright install chromium
Note: sqlite3 ships with Python's standard library, so there's nothing to install for it. The playwright install command downloads the browser binary Playwright drives.
Create a simple data model first. SQLite is perfect for this — no server setup, stores everything locally:
import sqlite3

def init_database():
    conn = sqlite3.connect("prices.db")
    cursor = conn.cursor()
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            url TEXT NOT NULL UNIQUE,  -- UNIQUE so re-inserting the same product is a no-op
            selector TEXT NOT NULL,    -- CSS selector for price element
            our_price REAL,
            alert_threshold REAL       -- alert if competitor goes below this
        )
    """)
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY,
            product_id INTEGER,
            price REAL,
            currency TEXT DEFAULT 'EUR',
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (product_id) REFERENCES products (id)
        )
    """)
    conn.commit()
    return conn
2 The Core Scraper
import requests
from bs4 import BeautifulSoup
import re
import time
import random
import logging

logger = logging.getLogger(__name__)

HEADERS = {
    "User-Agent": "PriceMonitorBot/1.0 (contact@yourdomain.com)",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}

def scrape_price(url: str, selector: str) -> float | None:
    """Scrape a price from a URL using a CSS selector."""
    try:
        # Respectful delay
        time.sleep(random.uniform(2, 5))
        response = requests.get(url, headers=HEADERS, timeout=15)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "html.parser")
        price_element = soup.select_one(selector)
        if not price_element:
            logger.warning(f"Price element not found at {url} with selector '{selector}'")
            return None
        # Extract numeric price from text (handles "€29,99", "$19.99", "29.99 EUR").
        # Note: prices with thousands separators ("1.299,99") need extra handling.
        price_text = price_element.get_text(strip=True)
        price_match = re.search(r'\d+[.,]?\d*', price_text.replace(',', '.'))
        if price_match:
            return float(price_match.group())
        return None
    except requests.RequestException as e:
        logger.error(f"Failed to scrape {url}: {e}")
        return None
def scrape_price_js(url: str, selector: str) -> float | None:
    """Use Playwright for JavaScript-rendered prices."""
    # Imported lazily so the requests-only path works without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.set_extra_http_headers(HEADERS)
        page.goto(url, wait_until="networkidle")
        try:
            element = page.wait_for_selector(selector, timeout=10000)
            price_text = element.inner_text()
            price_match = re.search(r'\d+[.,]?\d*', price_text.replace(',', '.'))
            if price_match:
                return float(price_match.group())
        except Exception as e:
            logger.error(f"Playwright error at {url}: {e}")
        finally:
            browser.close()
    return None
3 Store and Compare Prices
def record_price(conn, product_id: int, price: float):
    """Store a new price reading in the database."""
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO price_history (product_id, price) VALUES (?, ?)",
        (product_id, price)
    )
    conn.commit()

def get_last_price(conn, product_id: int) -> float | None:
    """Get the most recent price for a product."""
    cursor = conn.cursor()
    cursor.execute(
        """SELECT price FROM price_history
           WHERE product_id = ?
           ORDER BY scraped_at DESC LIMIT 1""",
        (product_id,)
    )
    result = cursor.fetchone()
    return result[0] if result else None

def get_price_history(conn, product_id: int, days: int = 30) -> list:
    """Get price history for trend analysis."""
    cursor = conn.cursor()
    cursor.execute(
        """SELECT price, scraped_at FROM price_history
           WHERE product_id = ?
           AND scraped_at >= datetime('now', ?)
           ORDER BY scraped_at""",
        (product_id, f'-{days} days')
    )
    return cursor.fetchall()
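As a quick illustration of what the history table enables, here's a self-contained sketch against an in-memory database with made-up readings, using SQLite aggregates for a trend summary:

```python
import sqlite3

# Minimal stand-in for the price_history table, just for this demo
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE price_history (
    id INTEGER PRIMARY KEY, product_id INTEGER, price REAL,
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)""")

# Invented readings for product 1
for p in (49.99, 47.50, 44.99, 46.00):
    conn.execute("INSERT INTO price_history (product_id, price) VALUES (1, ?)", (p,))

row = conn.execute(
    "SELECT MIN(price), MAX(price), ROUND(AVG(price), 2) "
    "FROM price_history WHERE product_id = 1"
).fetchone()
print(row)  # (44.99, 49.99, 47.12)
```

The same aggregate query, filtered with `datetime('now', '-30 days')` as in get_price_history, gives you a rolling 30-day floor, ceiling, and average per competitor.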
4 Alerting System
import os
import smtplib
from email.mime.text import MIMEText
import requests as http_requests

def send_email_alert(product_name: str, competitor_price: float,
                     our_price: float, url: str):
    """Send email alert when competitor price drops below threshold."""
    diff = our_price - competitor_price
    diff_pct = (diff / our_price * 100) if our_price else 0.0  # guard against unset price
    subject = f"⚠️ Price Alert: {product_name} — Competitor at €{competitor_price:.2f}"
    body = f"""
Price Alert for: {product_name}

Competitor price: €{competitor_price:.2f}
Your current price: €{our_price:.2f}
Difference: €{diff:.2f} ({diff_pct:.1f}%)

Competitor URL: {url}

Review your pricing strategy to stay competitive.
"""
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = "alerts@yourdomain.com"
    msg["To"] = "you@yourdomain.com"

    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        # Keep credentials out of source code: load from the environment
        # (python-dotenv can populate os.environ from a .env file)
        server.login("alerts@yourdomain.com", os.environ["SMTP_APP_PASSWORD"])
        server.send_message(msg)

def send_slack_alert(product_name: str, competitor_price: float,
                     our_price: float, url: str, webhook_url: str):
    """Send Slack notification for price changes."""
    diff_pct = ((our_price - competitor_price) / our_price * 100) if our_price else 0.0
    http_requests.post(webhook_url, json={
        "text": f"🔴 *Price Alert*: {product_name}",
        "attachments": [{
            "color": "danger",
            "fields": [
                {"title": "Competitor Price", "value": f"€{competitor_price:.2f}", "short": True},
                {"title": "Your Price", "value": f"€{our_price:.2f}", "short": True},
                {"title": "Difference", "value": f"-{diff_pct:.1f}%", "short": True},
                {"title": "URL", "value": url, "short": False}
            ]
        }]
    })
The Main Monitoring Loop
import schedule

def run_price_check(conn):
    cursor = conn.cursor()
    cursor.execute("SELECT id, name, url, selector, our_price, alert_threshold FROM products")
    products = cursor.fetchall()
    logger.info(f"Checking prices for {len(products)} products...")

    for product_id, name, url, selector, our_price, alert_threshold in products:
        current_price = scrape_price(url, selector)
        if current_price is None:
            logger.warning(f"Could not scrape price for {name}")
            continue

        last_price = get_last_price(conn, product_id)
        record_price(conn, product_id, current_price)
        logger.info(f"{name}: €{current_price:.2f}" +
                    (f" (was €{last_price:.2f})" if last_price else ""))

        # Alert if price dropped below threshold
        if alert_threshold and current_price < alert_threshold:
            logger.warning(f"ALERT: {name} competitor at €{current_price:.2f}!")
            send_email_alert(name, current_price, our_price or 0, url)

def main():
    conn = init_database()

    # Add products to monitor. INSERT OR IGNORE only skips duplicates when the
    # table has a UNIQUE constraint (e.g. on url) for the insert to conflict with.
    cursor = conn.cursor()
    cursor.execute("""
        INSERT OR IGNORE INTO products (name, url, selector, our_price, alert_threshold)
        VALUES (?, ?, ?, ?, ?)
    """, (
        "Competitor Product A",
        "https://competitor-site.com/product-a",
        ".price-current",  # CSS selector for the price element
        49.99,             # Your price
        44.99              # Alert if competitor goes below this
    ))
    conn.commit()

    # Run immediately, then every 6 hours
    run_price_check(conn)
    schedule.every(6).hours.do(run_price_check, conn)
    while True:
        schedule.run_pending()
        time.sleep(60)

if __name__ == "__main__":
    main()
Finding the Right CSS Selector
The trickiest part of any scraping project is identifying the correct CSS selector for the price element. Here's the fastest method:
- Open the competitor's product page in Chrome
- Right-click on the price → "Inspect Element"
- In the Elements panel, right-click the price element → Copy → Copy selector
- Test it in Python: soup.select_one("your.copied.selector")
- If the auto-generated selector is fragile (contains nth-child), find a more stable class name
💡 Prefer class-based selectors (.price-current) over position-based ones (div:nth-child(3) > span). Class-based selectors survive minor layout changes; position-based ones break whenever the page structure changes.
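To see the difference concretely, here's a small experiment on invented markup: both selectors find the price at first, but only the class-based one survives a layout tweak.

```python
from bs4 import BeautifulSoup

html = """
<div class="product">
  <span class="price-current">€29,99</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Both selectors work on the original markup
print(soup.select_one(".price-current").get_text(strip=True))                     # €29,99
print(soup.select_one("div.product > span:nth-child(1)").get_text(strip=True))    # €29,99

# A layout tweak adds a sale badge before the price...
html2 = html.replace('<span class="price',
                     '<span class="badge">Sale</span><span class="price')
soup2 = BeautifulSoup(html2, "html.parser")

# ...the class-based selector still finds the price:
print(soup2.select_one(".price-current").get_text(strip=True))                    # €29,99
# ...but the position-based one now grabs the badge instead:
print(soup2.select_one("div.product > span:nth-child(1)").get_text(strip=True))   # Sale
```

In a monitoring system that runs unattended, this kind of silent misparse is worse than a hard failure, which is another reason to prefer the stable class name.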
Scaling Up: Monitoring Hundreds of Products
For monitoring more than 50 products, move to an async architecture with httpx and asyncio. This lets you run multiple requests concurrently while still respecting rate limits per domain:
- Group URLs by domain and apply per-domain rate limits
- Use a semaphore to limit concurrent requests (3–5 per domain max)
- Add retry logic with exponential backoff for failed requests
- Consider a proxy rotation service for very large-scale monitoring
This is the architecture behind our PriceWatch system — a production price monitoring platform that tracks thousands of SKUs across dozens of e-commerce sites, with automated competitor intelligence reports delivered weekly. It handles dynamic pricing changes, detects flash sales, and integrates directly with the client's pricing management system.
Turning Data Into Decisions
Raw price data is only useful if you act on it. With the historical data you're collecting, you can:
- Identify pricing patterns: Does a competitor always drop prices on Fridays? Before holidays?
- Calculate price elasticity: How did your sales change when a competitor raised prices?
- Set dynamic pricing rules: Automatically adjust your prices to stay within a defined range of the market
- Detect stock-outs: When a competitor's price drops dramatically, they may be clearing stock
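A dynamic pricing rule can start as a pure function over the scraped data. The undercut percentage and floor below are invented placeholders for illustration, not recommendations:

```python
def suggest_price(our_price: float, competitor_prices: list[float],
                  floor: float, max_undercut: float = 0.02) -> float:
    """Suggest a price: match the cheapest competitor minus a small
    undercut, but never drop below a cost-based floor."""
    if not competitor_prices:
        return our_price  # no market data, keep current price
    target = min(competitor_prices) * (1 - max_undercut)
    return round(max(target, floor), 2)

print(suggest_price(49.99, [47.50, 52.00, 48.99], floor=40.00))  # 46.55
print(suggest_price(49.99, [39.00], floor=40.00))                # 40.0 -- floor wins
```

Keeping the rule as a pure function makes it easy to backtest against the price_history table before letting it touch live prices.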
For a complete e-commerce automation stack that puts all these pieces together, see our guide on E-Commerce process automation.
Need a Production-Ready Price Monitoring System?
We build custom price tracking and competitive intelligence platforms — with dashboards, automated reports, and dynamic pricing integrations for your specific market.
Visit Leo Voss Automation →