GMB-Scraper/README.md
Zulkifli 5e893db025 feat: GMB Scraper v4 — production-grade pain-aware lead gen engine
- Stealth mode: playwright-stealth, random fingerprints, human delays
- Retry logic: exponential backoff (3 attempts)
- Logging: rotating logs to /root/.hermes/logs/gmb/
- Validation: phone/website/rating validation + dedup
- Pain detection: 12 signals, scoring, service matching
- Review scraper: extract reviews + pain keyword detection
- Website health: SSL, speed, mobile, contact form checks
- Pitch generator: Apex pitches (SMS, email, call, Gumtree)
- Docker containerization
- .env for secrets (no hardcoded API keys)
- Integration with Pipecat voice dialer (gmb_to_voice.py)
2026-06-06 19:45:44 +08:00


GMB Scraper v4 — Pain-Aware Lead Generation

Your own tool. Zero API cost. Production-grade.

Extracts business data from Google Maps, detects pain signals, checks website health, and generates personalized Apex pitches for Darwisyah Digital Media.


Quick Start

# Activate environment
source /root/tools/gmb-scraper/venv/bin/activate

# Basic scrape
./scrape.sh "lawyers Perth CBD"

# Pain-aware scrape (recommended)
./scrape.sh "dentists Joondalup" --full

# Quick pain detection (no reviews, no website checks)
./scrape.sh "accountants Perth" --quick

# Lead generation focused
./scrape.sh "electricians Perth" --leads

Presets

| Preset | Flags | Best For |
|--------|-------|----------|
| --full | --detect-pain --scrape-reviews --check-websites --pitch-report --json | Complete analysis |
| --quick | --detect-pain --json | Fast pain screening |
| --leads | --detect-pain --check-websites --pitch-report --json | Lead gen focus |
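
The preset table above can be read as a simple lookup from a preset name to a flag list. The sketch below is illustrative only: `PRESETS` and `expand_preset` are hypothetical names, and the real `scrape.sh` may build its command line differently.

```python
# Hypothetical mirror of the preset table; not taken from scrape.sh itself.
PRESETS = {
    "--full": ["--detect-pain", "--scrape-reviews", "--check-websites",
               "--pitch-report", "--json"],
    "--quick": ["--detect-pain", "--json"],
    "--leads": ["--detect-pain", "--check-websites", "--pitch-report", "--json"],
}


def expand_preset(query, preset=None):
    """Return the gmb_scraper.py argument list for a query and optional preset."""
    args = ["python3", "gmb_scraper.py", "-q", query]
    if preset is not None:
        if preset not in PRESETS:
            raise ValueError(f"unknown preset: {preset}")
        # A preset is just shorthand for a fixed group of flags.
        args.extend(PRESETS[preset])
    return args


if __name__ == "__main__":
    print(" ".join(expand_preset("dentists Joondalup", "--quick")))
```

Under these assumptions, a preset is pure shorthand: running the expanded flags by hand should behave the same as passing the preset to the wrapper.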

Advanced Usage

# Custom query with filters
python3 gmb_scraper.py -q "lawyers Perth" \
  --detect-pain \
  --scrape-reviews \
  --check-websites \
  --pitch-report \
  --channel email \
  --min-pain 25 \
  --max-results 50

# All options (--channel accepts one of: sms, email, call, gumtree)
python3 gmb_scraper.py -q "QUERY" \
  --min-rating 0.0 \
  --min-reviews 0 \
  --max-results 100 \
  --detect-pain \
  --min-pain 0 \
  --scrape-reviews \
  --max-reviews 30 \
  --check-websites \
  --pitch-report \
  --channel sms \
  --output /path/to/output.csv \
  --json \
  --slow \
  --headful \
  --no-stealth \
  --proxy http://user:pass@host:port

Pain Signals Detected

| Signal | Weight | Service |
|--------|--------|---------|
| Missed calls in reviews | 30 | Lead Gen + Call Tracking |
| No website | 25 | Website Development |
| Broken website | 20 | Website Maintenance |
| Recent 1-star reviews | 20 | Review Response Service |
| No contact form | 15 | Lead Capture Optimization |
| Low rating (<3.5★) | 15 | Reputation Management |
| Not mobile-friendly | 12 | Mobile Optimization |
| Unclaimed GMB | 12 | GMB Optimization |
| Slow website (>3s) | 10 | Website Performance |
| Missing phone | 10 | GMB Cleanup |
| Few reviews (<10) | 8 | Review Generation |
| No hours listed | 5 | GMB Optimization |
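
To make the weights concrete, here is a minimal scoring sketch. It assumes the pain score is the plain sum of the weights of all detected signals and that the primary service belongs to the heaviest signal; the README does not confirm either rule, and the signal keys and `score_lead` are hypothetical names, not the identifiers used in `pain_detector.py`.

```python
# Weights and services copied from the table above; the keys are illustrative.
PAIN_SIGNALS = {
    "missed_calls_in_reviews": (30, "Lead Gen + Call Tracking"),
    "no_website": (25, "Website Development"),
    "broken_website": (20, "Website Maintenance"),
    "recent_one_star_reviews": (20, "Review Response Service"),
    "no_contact_form": (15, "Lead Capture Optimization"),
    "low_rating": (15, "Reputation Management"),
    "not_mobile_friendly": (12, "Mobile Optimization"),
    "unclaimed_gmb": (12, "GMB Optimization"),
    "slow_website": (10, "Website Performance"),
    "missing_phone": (10, "GMB Cleanup"),
    "few_reviews": (8, "Review Generation"),
    "no_hours_listed": (5, "GMB Optimization"),
}


def score_lead(detected):
    """Sum the weights of detected signals and pick the heaviest one's service.

    Assumption: additive scoring with no cap. Unknown signal names are ignored.
    """
    hits = [name for name in detected if name in PAIN_SIGNALS]
    if not hits:
        return {"pain_score": 0, "pain_signals": [], "primary_service": None}
    score = sum(PAIN_SIGNALS[name][0] for name in hits)
    # max() returns the first of equally weighted signals, so input order breaks ties.
    top = max(hits, key=lambda name: PAIN_SIGNALS[name][0])
    return {
        "pain_score": score,
        "pain_signals": hits,
        "primary_service": PAIN_SIGNALS[top][1],
    }


if __name__ == "__main__":
    lead = score_lead(["no_contact_form", "few_reviews", "slow_website"])
    print(lead["pain_score"], lead["primary_service"])
```

Under this additive assumption, a business with no contact form, few reviews, and a slow website scores 15 + 8 + 10 = 33, which would pass a `--min-pain 25` filter.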

Output Fields

Basic Mode

name, address, phone, website, rating, review_count, category, hours, maps_url

Pain Detection Mode (+fields)

pain_score, pain_signals, primary_service, confidence

Website Health Mode (+fields)

website_reachable, website_ssl, website_load_time, website_mobile, website_form

Pitch Mode (+fields)

pitch (personalized outreach message)
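
For post-processing, the fields above can be consumed with the standard library. This sketch assumes the CSV output uses the listed field names as column headers and stores `pain_score` as a number; `top_leads` is a hypothetical helper and not part of the tool.

```python
import csv
import io


def top_leads(csv_file, min_pain=25):
    """Return rows with pain_score >= min_pain, highest score first."""
    scored = []
    for row in csv.DictReader(csv_file):
        try:
            score = float(row.get("pain_score") or 0)
        except ValueError:
            continue  # skip rows whose score is not numeric
        if score >= min_pain:
            scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored]


if __name__ == "__main__":
    # Made-up sample data standing in for a real results file.
    sample = io.StringIO(
        "name,phone,pain_score,primary_service\n"
        "Example Dental,0400000001,40,Website Development\n"
        "Sample Law,0400000002,10,Review Generation\n"
        "Demo Electrical,0400000003,55,Lead Gen + Call Tracking\n"
    )
    for lead in top_leads(sample):
        print(lead["name"], lead["pain_score"], lead["primary_service"])
```

With a real export, pass an open file instead of the in-memory sample, for example `open("results.csv", newline="")`.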


Voice Dialer Integration

# Import leads into Pipecat voice agent
python3 gmb_to_voice.py --csv results.csv --campaign CAMPAIGN_ID

# Full pipeline: scrape + create campaign + import
python3 gmb_to_voice.py \
  --query "dentists Perth" \
  --campaign-name "Dentist Outreach June" \
  --topic "Dental Lead Generation" \
  --start-dialer

Docker

# Build
docker build -t gmb-scraper .

# Run
docker run --rm -v "$(pwd)/output:/root/.hermes/cache/gmb" \
  gmb-scraper -q "lawyers Perth" --detect-pain --json

Architecture

gmb_scraper.py          ← Main entry point
├── lib/
│   ├── logger.py       ← Logging + stats
│   ├── retry.py        ← Exponential backoff
│   ├── stealth.py      ← Anti-detection
│   ├── validator.py    ← Data validation
│   ├── pain_detector.py ← Pain signal detection
│   ├── review_scraper.py ← Review extraction
│   ├── health_checker.py ← Website health checks
│   └── pitch_generator.py ← Apex pitch generation
├── gmb_to_voice.py     ← Pipecat voice agent bridge
├── scrape.sh           ← CLI wrapper
├── .env                ← API keys (not in git)
└── Dockerfile          ← Container build
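
As an illustration of the exponential backoff handled by `lib/retry.py`, here is a minimal sketch. The three-attempt limit matches the commit notes; the function name, the 1-second base delay, and the doubling factor are assumptions, not the module's actual interface.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the wait after each failure.

    `sleep` is injectable so the wait can be stubbed out in tests.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits are base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_fetch():
        # Stand-in for a page load that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary failure")
        return "page loaded"

    print(retry_with_backoff(flaky_fetch, base_delay=0.1))
```

A production version would likely catch only network or timeout errors and add random jitter to each delay so that retries do not arrive in lockstep.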

Logs

  • Console: Human-readable progress
  • File: /root/.hermes/logs/gmb/gmb_scraper.log (rotating, 10MB)
  • Errors: /root/.hermes/logs/gmb/gmb_scraper_errors.log (rotating, 5MB)
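
A logging setup matching the three outputs above could look like the following sketch, built on the standard library's `RotatingFileHandler`. The file names and size limits come from the list; `build_logger`, the backup count, the message format, and reading "10MB" as 10 × 1024 × 1024 bytes are assumptions about what `lib/logger.py` does.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(log_dir, name="gmb_scraper"):
    """Log to the console, a main rotating file, and an errors-only rotating file."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    console = logging.StreamHandler()
    main_file = RotatingFileHandler(
        log_dir / "gmb_scraper.log", maxBytes=10 * 1024 * 1024, backupCount=3
    )
    error_file = RotatingFileHandler(
        log_dir / "gmb_scraper_errors.log", maxBytes=5 * 1024 * 1024, backupCount=3
    )
    error_file.setLevel(logging.ERROR)  # only ERROR and above reach this file

    for handler in (console, main_file, error_file):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    import tempfile

    log = build_logger(tempfile.mkdtemp())
    log.info("scrape started")
    log.error("listing failed to load")
```

Pointing `log_dir` at `/root/.hermes/logs/gmb/` would reproduce the paths listed above.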

Tips

  • Start with --quick to screen a niche before committing to a full scrape
  • Use --min-pain 25 to filter out low-value leads
  • --channel sms is best for cold outreach (short, punchy)
  • --channel call generates full call scripts with objection handling
  • --slow adds longer delays — use when you're hitting CAPTCHAs
  • --headful shows the browser — good for debugging