# GMB Scraper v4: Pain-Aware Lead Generation

**Your own tool. Zero API cost. Production-grade.**

Extracts business data from Google Maps, detects pain signals, checks website health, and generates personalized apex pitches for Darwisyah Digital Media.

---

## Quick Start

```bash
# Activate environment
source /root/tools/gmb-scraper/venv/bin/activate

# Basic scrape
./scrape.sh "lawyers Perth CBD"

# Pain-aware scrape (recommended)
./scrape.sh "dentists Joondalup" --full

# Quick pain detection (no reviews, no website checks)
./scrape.sh "accountants Perth" --quick

# Lead generation focused
./scrape.sh "electricians Perth" --leads
```

---

## Presets

| Preset | Flags | Best For |
|--------|-------|----------|
| `--full` | `--detect-pain --scrape-reviews --check-websites --pitch-report --json` | Complete analysis |
| `--quick` | `--detect-pain --json` | Fast pain screening |
| `--leads` | `--detect-pain --check-websites --pitch-report --json` | Lead gen focus |

---

## Advanced Usage

```bash
# Custom query with filters
python3 gmb_scraper.py -q "lawyers Perth" \
  --detect-pain \
  --scrape-reviews \
  --check-websites \
  --pitch-report \
  --channel email \
  --min-pain 25 \
  --max-results 50

# All options (QUERY, CHANNEL, the output path and the proxy URL are placeholders;
# CHANNEL is one of: sms, email, call, gumtree)
python3 gmb_scraper.py -q "QUERY" \
  --min-rating 0.0 \
  --min-reviews 0 \
  --max-results 100 \
  --detect-pain \
  --min-pain 0 \
  --scrape-reviews \
  --max-reviews 30 \
  --check-websites \
  --pitch-report \
  --channel CHANNEL \
  --output /path/to/output.csv \
  --json \
  --slow \
  --headful \
  --no-stealth \
  --proxy http://user:pass@host:port
```

---

## Pain Signals Detected

| Signal | Weight | Service |
|--------|--------|---------|
| Missed calls in reviews | 30 | Lead Gen + Call Tracking |
| No website | 25 | Website Development |
| Broken website | 20 | Website Maintenance |
| Recent 1-star reviews | 20 | Review Response Service |
| No contact form | 15 | Lead Capture Optimization |
| Low rating (<3.5★) | 15 | Reputation Management |
| Not mobile-friendly | 12 | Mobile Optimization |
| Unclaimed GMB | 12 | GMB Optimization |
| Slow website (>3s) | 10 | Website Performance |
| Missing phone | 10 | GMB Cleanup |
| Few reviews (<10) | 8 | Review Generation |
| No hours listed | 5 | GMB Optimization |

---

## Output Fields

### Basic Mode

`name, address, phone, website, rating, review_count, category, hours, maps_url`

### Pain Detection Mode (+fields)

`pain_score, pain_signals, primary_service, confidence`

### Website Health Mode (+fields)

`website_reachable, website_ssl, website_load_time, website_mobile, website_form`

### Pitch Mode (+fields)

`pitch` (personalized outreach message)

---

## Voice Dialer Integration

```bash
# Import leads into Pipecat voice agent
python3 gmb_to_voice.py --csv results.csv --campaign CAMPAIGN_ID

# Full pipeline: scrape + create campaign + import
python3 gmb_to_voice.py \
  --query "dentists Perth" \
  --campaign-name "Dentist Outreach June" \
  --topic "Dental Lead Generation" \
  --start-dialer
```

---

## Docker

```bash
# Build
docker build -t gmb-scraper .

# Run
docker run --rm -v "$(pwd)/output:/root/.hermes/cache/gmb" \
  gmb-scraper -q "lawyers Perth" --detect-pain --json
```

---

## Architecture

```
gmb_scraper.py              ← Main entry point
├── lib/
│   ├── logger.py           ← Logging + stats
│   ├── retry.py            ← Exponential backoff
│   ├── stealth.py          ← Anti-detection
│   ├── validator.py        ← Data validation
│   ├── pain_detector.py    ← Pain signal detection
│   ├── review_scraper.py   ← Review extraction
│   ├── health_checker.py   ← Website health checks
│   └── pitch_generator.py  ← Apex pitch generation
├── gmb_to_voice.py         ← Pipecat voice agent bridge
├── scrape.sh               ← CLI wrapper
├── .env                    ← API keys (not in git)
└── Dockerfile              ← Container build
```

---

## Logs

- Console: Human-readable progress
- File: `/root/.hermes/logs/gmb/gmb_scraper.log` (rotating, 10MB)
- Errors: `/root/.hermes/logs/gmb/gmb_scraper_errors.log` (rotating, 5MB)

---

## Tips

- **Start with `--quick`** to screen a niche before committing to a full scrape
- **Use `--min-pain 25`** to filter out low-value leads
- **`--channel sms`** is best for cold outreach (short, punchy)
- **`--channel call`** generates full call scripts with objection handling
- **`--slow`** adds longer delays; use it when you're hitting CAPTCHAs
- **`--headful`** shows the browser, which is useful for debugging
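The Pain Signals table gives a weight for each signal, but this README does not spell out how those weights become `pain_score` or exactly how `--min-pain` filters. The sketch below assumes the simplest reading: `pain_score` is the sum of the weights of the distinct signals detected for a business, and `--min-pain N` keeps only leads scoring at least `N`. The snake_case signal keys, `SIGNAL_WEIGHTS`, `Lead`, `pain_score`, and `filter_leads` are hypothetical names, not the API of `lib/pain_detector.py`, and the sample businesses are made up.

```python
from dataclasses import dataclass, field

# Weights copied from the Pain Signals table. The snake_case keys are
# HYPOTHETICAL identifiers, not necessarily those used by lib/pain_detector.py.
SIGNAL_WEIGHTS: dict[str, int] = {
    "missed_calls_in_reviews": 30,
    "no_website": 25,
    "broken_website": 20,
    "recent_one_star_reviews": 20,
    "no_contact_form": 15,
    "low_rating": 15,
    "not_mobile_friendly": 12,
    "unclaimed_gmb": 12,
    "slow_website": 10,
    "missing_phone": 10,
    "few_reviews": 8,
    "no_hours_listed": 5,
}


@dataclass
class Lead:
    name: str
    signals: list[str] = field(default_factory=list)


def pain_score(signals: list[str]) -> int:
    """ASSUMPTION: plain sum of the weights of the distinct detected signals.

    Duplicate signals count once; unknown signal names contribute 0.
    """
    return sum(SIGNAL_WEIGHTS.get(s, 0) for s in set(signals))


def filter_leads(leads: list[Lead], min_pain: int) -> list[Lead]:
    """Keep leads scoring at least min_pain, highest score first.

    ASSUMPTION: this mirrors what --min-pain does in the real tool.
    """
    kept = [lead for lead in leads if pain_score(lead.signals) >= min_pain]
    return sorted(kept, key=lambda lead: pain_score(lead.signals), reverse=True)


if __name__ == "__main__":
    # Hypothetical sample leads for illustration only.
    leads = [
        Lead("Firm A", ["no_website", "few_reviews"]),               # 25 + 8 = 33
        Lead("Firm B", ["no_hours_listed"]),                         # 5
        Lead("Firm C", ["missed_calls_in_reviews", "low_rating"]),   # 30 + 15 = 45
    ]
    for lead in filter_leads(leads, min_pain=25):
        print(lead.name, pain_score(lead.signals))
    # prints:
    # Firm C 45
    # Firm A 33
```

Under this assumption a business that triggered every signal would score 182, so the real detector may normalize or cap the total; confirm against `lib/pain_detector.py` before relying on absolute thresholds.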
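The Logs section describes a console stream plus two size-capped rotating files. Assuming `lib/logger.py` is built on Python's standard `logging` module (this README does not say), that layout maps directly onto `logging.handlers.RotatingFileHandler`. The helper name `setup_logging`, the format string, `backupCount=3`, and treating 1 MB as 1024 × 1024 bytes are assumptions; only the directory, file names, and the 10MB / 5MB limits come from the Logs section.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path

MB = 1024 * 1024  # ASSUMPTION: "MB" in the Logs section means 1024 * 1024 bytes


def setup_logging(log_dir: str = "/root/.hermes/logs/gmb") -> logging.Logger:
    """HYPOTHETICAL helper: console + rotating main log + rotating error log."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("gmb_scraper")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # don't double-print via the root logger

    # Remove and close handlers from a previous call so records aren't duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    # Console: human-readable progress (INFO and above).
    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    console.setFormatter(fmt)

    # Main file: every record, rotated at 10 MB (backupCount=3 is an assumption).
    main_file = RotatingFileHandler(
        Path(log_dir) / "gmb_scraper.log",
        maxBytes=10 * MB, backupCount=3, encoding="utf-8",
    )
    main_file.setLevel(logging.DEBUG)
    main_file.setFormatter(fmt)

    # Error file: ERROR and above only, rotated at 5 MB.
    error_file = RotatingFileHandler(
        Path(log_dir) / "gmb_scraper_errors.log",
        maxBytes=5 * MB, backupCount=3, encoding="utf-8",
    )
    error_file.setLevel(logging.ERROR)
    error_file.setFormatter(fmt)

    for handler in (console, main_file, error_file):
        logger.addHandler(handler)
    return logger
```

Giving the error file its own `ERROR` level means failures land in both files, so the small error log can be scanned quickly without losing context from the main log.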
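The Architecture tree describes `lib/retry.py` only as "Exponential backoff". A minimal sketch of that pattern as a decorator, assuming retries wrap flaky operations such as page loads: each failure waits `base_delay * 2**(attempt - 1)` seconds (capped at `max_delay`) plus up to 10% random jitter, and the last exception is re-raised once attempts run out. `retry_with_backoff` and its defaults are hypothetical, not the module's real interface.

```python
import functools
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """HYPOTHETICAL decorator: retry on `exceptions` with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Delay doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    # Small random jitter so parallel workers don't retry in lockstep.
                    time.sleep(delay + random.uniform(0, delay * 0.1))
            raise AssertionError("unreachable")  # loop always returns or raises

        return wrapper

    return decorator


if __name__ == "__main__":
    calls = {"n": 0}

    @retry_with_backoff(max_attempts=3, base_delay=0.01)
    def flaky() -> str:
        # Simulated operation that fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("simulated page-load timeout")
        return "ok"

    print(flaky(), calls["n"])  # prints: ok 3
```

Catching only the listed `exceptions` keeps genuine bugs (for example a `TypeError` from bad arguments) from being silently retried when a narrower tuple such as `(TimeoutError, ConnectionError)` is passed.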