- Stealth mode: playwright-stealth, random fingerprints, human delays
- Retry logic: exponential backoff (3 attempts)
- Logging: rotating logs to /root/.hermes/logs/gmb/
- Validation: phone/website/rating validation + dedup
- Pain detection: 12 signals, scoring, service matching
- Review scraper: extract reviews + pain keyword detection
- Website health: SSL, speed, mobile, contact form checks
- Pitch generator: Apex pitches (SMS, email, call, Gumtree)
- Docker containerization
- .env for secrets (no hardcoded API keys)
- Integration with Pipecat voice dialer (gmb_to_voice.py)
GMB Scraper v4 — Pain-Aware Lead Generation
Your own tool. Zero API cost. Production-grade.
Extracts business data from Google Maps, detects pain signals, checks website health, and generates personalized apex pitches for Darwisyah Digital Media.
Quick Start
# Activate environment
source /root/tools/gmb-scraper/venv/bin/activate
# Basic scrape
./scrape.sh "lawyers Perth CBD"
# Pain-aware scrape (recommended)
./scrape.sh "dentists Joondalup" --full
# Quick pain detection (no reviews, no website checks)
./scrape.sh "accountants Perth" --quick
# Lead generation focused
./scrape.sh "electricians Perth" --leads
Presets
| Preset | Flags | Best For |
|---|---|---|
| --full | --detect-pain --scrape-reviews --check-websites --pitch-report --json | Complete analysis |
| --quick | --detect-pain --json | Fast pain screening |
| --leads | --detect-pain --check-websites --pitch-report --json | Lead gen focus |
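Each preset is shorthand for a fixed set of flags. A minimal Python sketch of that expansion, assuming the table above is the complete mapping. `PRESETS` and `expand_preset` are hypothetical names used only for illustration; the actual `scrape.sh` wrapper may do this differently.

```python
# Hypothetical mapping, copied from the Presets table above.
PRESETS = {
    "--full": ["--detect-pain", "--scrape-reviews", "--check-websites",
               "--pitch-report", "--json"],
    "--quick": ["--detect-pain", "--json"],
    "--leads": ["--detect-pain", "--check-websites", "--pitch-report", "--json"],
}


def expand_preset(args):
    """Replace any preset flag in args with the flags it stands for."""
    expanded = []
    for arg in args:
        # Non-preset arguments pass through unchanged.
        expanded.extend(PRESETS.get(arg, [arg]))
    return expanded


if __name__ == "__main__":
    print(expand_preset(["-q", "dentists Joondalup", "--full"]))
```

The expanded list can be passed straight to `gmb_scraper.py`, which is useful when a preset needs one extra flag such as `--min-pain`.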
Advanced Usage
# Custom query with filters
python3 gmb_scraper.py -q "lawyers Perth" \
--detect-pain \
--scrape-reviews \
--check-websites \
--pitch-report \
--channel email \
--min-pain 25 \
--max-results 50
# All options (--channel takes one of: sms, email, call, gumtree)
python3 gmb_scraper.py -q "QUERY" \
--min-rating 0.0 \
--min-reviews 0 \
--max-results 100 \
--detect-pain \
--min-pain 0 \
--scrape-reviews \
--max-reviews 30 \
--check-websites \
--pitch-report \
--channel sms \
--output /path/to/output.csv \
--json \
--slow \
--headful \
--no-stealth \
--proxy http://user:pass@host:port
Pain Signals Detected
| Signal | Weight | Service |
|---|---|---|
| Missed calls in reviews | 30 | Lead Gen + Call Tracking |
| No website | 25 | Website Development |
| Broken website | 20 | Website Maintenance |
| Recent 1-star reviews | 20 | Review Response Service |
| No contact form | 15 | Lead Capture Optimization |
| Low rating (<3.5★) | 15 | Reputation Management |
| Not mobile-friendly | 12 | Mobile Optimization |
| Unclaimed GMB | 12 | GMB Optimization |
| Slow website (>3s) | 10 | Website Performance |
| Missing phone | 10 | GMB Cleanup |
| Few reviews (<10) | 8 | Review Generation |
| No hours listed | 5 | GMB Optimization |
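To show how these weights could combine into `pain_score` and `primary_service`, here is a rough Python sketch. It assumes the score is the plain sum of the weights of the detected signals and that the primary service is the one attached to the heaviest signal. Both rules are assumptions; `pain_detector.py` may cap, normalize, or combine signals differently. The dictionary keys and `score_business` are hypothetical names.

```python
# Weights and services copied from the table above. The keys are
# hypothetical identifiers, not names taken from pain_detector.py.
SIGNALS = {
    "missed_calls_in_reviews": (30, "Lead Gen + Call Tracking"),
    "no_website": (25, "Website Development"),
    "broken_website": (20, "Website Maintenance"),
    "recent_one_star_reviews": (20, "Review Response Service"),
    "no_contact_form": (15, "Lead Capture Optimization"),
    "low_rating": (15, "Reputation Management"),
    "not_mobile_friendly": (12, "Mobile Optimization"),
    "unclaimed_gmb": (12, "GMB Optimization"),
    "slow_website": (10, "Website Performance"),
    "missing_phone": (10, "GMB Cleanup"),
    "few_reviews": (8, "Review Generation"),
    "no_hours_listed": (5, "GMB Optimization"),
}


def score_business(detected):
    """Return (pain_score, primary_service) for a list of signal keys.

    Assumption: the score is the sum of weights, and the primary service
    belongs to the heaviest detected signal (first one wins on a tie).
    """
    known = [s for s in detected if s in SIGNALS]
    if not known:
        return 0, None
    score = sum(SIGNALS[s][0] for s in known)
    top = max(known, key=lambda s: SIGNALS[s][0])
    return score, SIGNALS[top][1]
```

Under these assumptions, a business with no website and fewer than 10 reviews scores 25 + 8 = 33 and maps to Website Development, which would pass a `--min-pain 25` filter.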
Output Fields
Basic Mode
name, address, phone, website, rating, review_count, category, hours, maps_url
Pain Detection Mode (+fields)
pain_score, pain_signals, primary_service, confidence
Website Health Mode (+fields)
website_reachable, website_ssl, website_load_time, website_mobile, website_form
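As an illustration of what the `website_mobile` and `website_form` fields could be based on, the sketch below scans a page's HTML for a viewport meta tag and a `<form>` element. This is a heuristic chosen for the example, not a description of `health_checker.py`, which may use other checks. `scan_html` is a hypothetical helper built on the standard library only.

```python
from html.parser import HTMLParser


class _PageScan(HTMLParser):
    """Records whether a viewport meta tag and a <form> element appear."""

    def __init__(self):
        super().__init__()
        self.has_viewport = False
        self.has_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Attribute values can be None for bare attributes, hence the `or ""`.
        if tag == "meta" and (attrs.get("name") or "").lower() == "viewport":
            self.has_viewport = True
        elif tag == "form":
            self.has_form = True


def scan_html(html):
    """Return heuristic website_mobile / website_form values for a page."""
    scan = _PageScan()
    scan.feed(html)
    return {"website_mobile": scan.has_viewport, "website_form": scan.has_form}
```

A viewport tag does not guarantee a usable mobile layout, and a contact form injected by JavaScript would be missed, so treat both values as hints.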
Pitch Mode (+fields)
pitch (personalized outreach message)
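Once a scrape has written a CSV, the fields above can be post-processed with a few lines of Python. The sketch below assumes the CSV has a header row using the field names listed in this section and that `pain_score` is numeric; `top_leads` is a hypothetical helper, not part of the tool.

```python
import csv


def top_leads(csv_path, min_pain=25):
    """Return rows with pain_score >= min_pain, highest score first."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

    scored = []
    for row in rows:
        try:
            score = float(row.get("pain_score") or "")
        except (TypeError, ValueError):
            # Rows with a missing or non-numeric pain_score are skipped.
            continue
        if score >= min_pain:
            scored.append((score, row))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored]


if __name__ == "__main__":
    for lead in top_leads("results.csv"):
        print(lead.get("name"), lead.get("pain_score"), lead.get("primary_service"))
```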
Voice Dialer Integration
# Import leads into Pipecat voice agent
python3 gmb_to_voice.py --csv results.csv --campaign CAMPAIGN_ID
# Full pipeline: scrape + create campaign + import
python3 gmb_to_voice.py \
--query "dentists Perth" \
--campaign-name "Dentist Outreach June" \
--topic "Dental Lead Generation" \
--start-dialer
Docker
# Build
docker build -t gmb-scraper .
# Run
docker run --rm -v "$(pwd)/output:/root/.hermes/cache/gmb" \
gmb-scraper -q "lawyers Perth" --detect-pain --json
Architecture
gmb_scraper.py ← Main entry point
├── lib/
│ ├── logger.py ← Logging + stats
│ ├── retry.py ← Exponential backoff
│ ├── stealth.py ← Anti-detection
│ ├── validator.py ← Data validation
│ ├── pain_detector.py ← Pain signal detection
│ ├── review_scraper.py ← Review extraction
│ ├── health_checker.py ← Website health checks
│ └── pitch_generator.py ← Apex pitch generation
├── gmb_to_voice.py ← Pipecat voice agent bridge
├── scrape.sh ← CLI wrapper
├── .env ← API keys (not in git)
└── Dockerfile ← Container build
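For readers unfamiliar with the pattern behind `lib/retry.py`, here is a generic sketch of exponential backoff with three attempts. The base delay, the injectable `sleep` parameter, and the choice to catch every `Exception` are assumptions made for the example; the module's real signature and behaviour may differ.

```python
import time


def retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n, then try again.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Waits 1s after the first failure, 2s after the second.
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.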
Logs
- Console: Human-readable progress
- File: /root/.hermes/logs/gmb/gmb_scraper.log (rotating, 10MB)
- Errors: /root/.hermes/logs/gmb/gmb_scraper_errors.log (rotating, 5MB)
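A comparable setup can be reproduced with Python's standard `logging` module. The sketch below is one way to get a console handler plus two rotating files with the size limits listed above. The logger name, message format, backup count, and the `build_logger` function are assumptions, not details taken from `lib/logger.py`.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(log_dir="/root/.hermes/logs/gmb", backups=3):
    """Console + rotating main log (10MB) + rotating error log (5MB)."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("gmb_scraper")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)

    main_file = RotatingFileHandler(
        log_dir / "gmb_scraper.log", maxBytes=10 * 1024 * 1024, backupCount=backups
    )
    error_file = RotatingFileHandler(
        log_dir / "gmb_scraper_errors.log", maxBytes=5 * 1024 * 1024, backupCount=backups
    )
    error_file.setLevel(logging.ERROR)  # only ERROR and above go here

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (console, main_file, error_file):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```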
Tips
- Start with --quick to screen a niche before committing to a full scrape
- Use --min-pain 25 to filter out low-value leads
- --channel sms is best for cold outreach (short, punchy)
- --channel call generates full call scripts with objection handling
- --slow adds longer delays; use it when you're hitting CAPTCHAs
- --headful shows the browser, which is good for debugging