# GMB Scraper v4: Pain-Aware Lead Generation

**Your own tool. Zero API cost. Production-grade.**

Extracts business data from Google Maps, detects pain signals, checks website health, and generates personalized Apex pitches for Darwisyah Digital Media.

---
## Quick Start

```bash
# Activate environment
source /root/tools/gmb-scraper/venv/bin/activate

# Basic scrape
./scrape.sh "lawyers Perth CBD"

# Pain-aware scrape (recommended)
./scrape.sh "dentists Joondalup" --full

# Quick pain detection (no reviews, no website checks)
./scrape.sh "accountants Perth" --quick

# Lead generation focused
./scrape.sh "electricians Perth" --leads
```

---
## Presets

| Preset | Flags | Best For |
|--------|-------|----------|
| `--full` | `--detect-pain --scrape-reviews --check-websites --pitch-report --json` | Complete analysis |
| `--quick` | `--detect-pain --json` | Fast pain screening |
| `--leads` | `--detect-pain --check-websites --pitch-report --json` | Lead gen focus |

---
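Each preset in the table above is shorthand for a bundle of flags. The following is a minimal sketch of how such an expansion could work; `PRESETS` and `expand_presets` are hypothetical names used for illustration and are not taken from `scrape.sh`, whose actual logic may differ.

```python
# Hypothetical mapping that mirrors the presets table; not the real scrape.sh code.
PRESETS = {
    "--full": ["--detect-pain", "--scrape-reviews", "--check-websites",
               "--pitch-report", "--json"],
    "--quick": ["--detect-pain", "--json"],
    "--leads": ["--detect-pain", "--check-websites", "--pitch-report", "--json"],
}


def expand_presets(args):
    """Replace preset flags with their underlying flags, skipping repeated flags."""
    expanded = []
    seen_flags = set()
    for arg in args:
        for item in PRESETS.get(arg, [arg]):
            if item.startswith("--"):
                if item in seen_flags:
                    continue  # flag already present, do not repeat it
                seen_flags.add(item)
            expanded.append(item)  # queries and flag values pass through unchanged
    return expanded


print(expand_presets(["dentists Joondalup", "--full"]))
```

Only flags are de-duplicated, so a value such as `25` can still appear more than once when it belongs to different options.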
## Advanced Usage

```bash
# Custom query with filters
python3 gmb_scraper.py -q "lawyers Perth" \
  --detect-pain \
  --scrape-reviews \
  --check-websites \
  --pitch-report \
  --channel email \
  --min-pain 25 \
  --max-results 50

# All options (--channel accepts one of: sms, email, call, gumtree)
python3 gmb_scraper.py -q "QUERY" \
  --min-rating 0.0 \
  --min-reviews 0 \
  --max-results 100 \
  --detect-pain \
  --min-pain 0 \
  --scrape-reviews \
  --max-reviews 30 \
  --check-websites \
  --pitch-report \
  --channel sms \
  --output /path/to/output.csv \
  --json \
  --slow \
  --headful \
  --no-stealth \
  --proxy http://user:pass@host:port
```

---
## Pain Signals Detected

| Signal | Weight | Service |
|--------|--------|---------|
| Missed calls in reviews | 30 | Lead Gen + Call Tracking |
| No website | 25 | Website Development |
| Broken website | 20 | Website Maintenance |
| Recent 1-star reviews | 20 | Review Response Service |
| No contact form | 15 | Lead Capture Optimization |
| Low rating (<3.5★) | 15 | Reputation Management |
| Not mobile-friendly | 12 | Mobile Optimization |
| Unclaimed GMB | 12 | GMB Optimization |
| Slow website (>3s) | 10 | Website Performance |
| Missing phone | 10 | GMB Cleanup |
| Few reviews (<10) | 8 | Review Generation |
| No hours listed | 5 | GMB Optimization |

---
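To make the weights in the table above concrete, here is a minimal scoring sketch. It assumes the pain score is the plain sum of the weights of detected signals and that the heaviest detected signal decides the primary service. Both rules are assumptions for illustration, as are the signal keys and the `score_lead` name; the real `pain_detector.py` may combine signals differently.

```python
# Weights and services are copied from the table; the dictionary keys are hypothetical.
SIGNALS = {
    "missed_calls": (30, "Lead Gen + Call Tracking"),
    "no_website": (25, "Website Development"),
    "broken_website": (20, "Website Maintenance"),
    "recent_one_star": (20, "Review Response Service"),
    "no_contact_form": (15, "Lead Capture Optimization"),
    "low_rating": (15, "Reputation Management"),
    "not_mobile_friendly": (12, "Mobile Optimization"),
    "unclaimed_gmb": (12, "GMB Optimization"),
    "slow_website": (10, "Website Performance"),
    "missing_phone": (10, "GMB Cleanup"),
    "few_reviews": (8, "Review Generation"),
    "no_hours": (5, "GMB Optimization"),
}


def score_lead(detected):
    """Return (pain_score, primary_service) for an iterable of detected signal keys."""
    # dict.fromkeys drops repeated keys while keeping their order.
    hits = [SIGNALS[key] for key in dict.fromkeys(detected) if key in SIGNALS]
    if not hits:
        return 0, None
    pain_score = sum(weight for weight, _ in hits)
    # Assumption: the heaviest detected signal decides the primary service.
    _, primary_service = max(hits, key=lambda hit: hit[0])
    return pain_score, primary_service


print(score_lead(["no_contact_form", "slow_website", "few_reviews"]))
```

Under these assumptions, a business with no contact form, a slow website, and few reviews scores 15 + 10 + 8 = 33, which would pass a `--min-pain 25` filter.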
## Output Fields

### Basic Mode

`name, address, phone, website, rating, review_count, category, hours, maps_url`

### Pain Detection Mode (+fields)

`pain_score, pain_signals, primary_service, confidence`

### Website Health Mode (+fields)

`website_reachable, website_ssl, website_load_time, website_mobile, website_form`

### Pitch Mode (+fields)

`pitch` (personalized outreach message)

---
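The fields listed above can be consumed with the standard `csv` module. The sketch below assumes the CSV has a header row using exactly those field names and that `pain_score` is numeric; the sample rows are invented placeholders and `top_leads` is a hypothetical helper, not part of the scraper.

```python
import csv
import io

# Invented placeholder rows; a real file would come from --output /path/to/output.csv.
SAMPLE_CSV = """name,phone,rating,review_count,pain_score,primary_service
Example Dental,08 0000 0000,3.2,7,48,Reputation Management
Sample Legal,08 0000 0001,4.8,120,5,GMB Optimization
"""


def top_leads(csv_file, min_pain=25):
    """Return rows with pain_score >= min_pain, highest score first."""
    rows = [row for row in csv.DictReader(csv_file) if row.get("pain_score")]
    rows = [row for row in rows if float(row["pain_score"]) >= min_pain]
    return sorted(rows, key=lambda row: float(row["pain_score"]), reverse=True)


for lead in top_leads(io.StringIO(SAMPLE_CSV)):
    print(lead["name"], lead["pain_score"], lead["primary_service"])
```

For a file on disk, passing `open(path, newline="")` in place of the `io.StringIO` object should behave the same way.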
## Voice Dialer Integration

```bash
# Import leads into Pipecat voice agent
python3 gmb_to_voice.py --csv results.csv --campaign CAMPAIGN_ID

# Full pipeline: scrape + create campaign + import
python3 gmb_to_voice.py \
  --query "dentists Perth" \
  --campaign-name "Dentist Outreach June" \
  --topic "Dental Lead Generation" \
  --start-dialer
```

---
## Docker

```bash
# Build
docker build -t gmb-scraper .

# Run (the volume path is quoted so directories containing spaces still work)
docker run --rm -v "$(pwd)/output:/root/.hermes/cache/gmb" \
  gmb-scraper -q "lawyers Perth" --detect-pain --json
```

---
## Architecture

```
gmb_scraper.py              ← Main entry point
├── lib/
│   ├── logger.py           ← Logging + stats
│   ├── retry.py            ← Exponential backoff
│   ├── stealth.py          ← Anti-detection
│   ├── validator.py        ← Data validation
│   ├── pain_detector.py    ← Pain signal detection
│   ├── review_scraper.py   ← Review extraction
│   ├── health_checker.py   ← Website health checks
│   └── pitch_generator.py  ← Apex pitch generation
├── gmb_to_voice.py         ← Pipecat voice agent bridge
├── scrape.sh               ← CLI wrapper
├── .env                    ← API keys (not in git)
└── Dockerfile              ← Container build
```

---
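The tree above lists `lib/retry.py` as the exponential backoff module. As an illustration of that technique only, here is a minimal decorator sketch. The `retry` name, its parameters, the default of three attempts, and the small random jitter are all assumptions made for this example rather than the module's actual interface.

```python
import random
import time
from functools import wraps


def retry(attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Retry the wrapped function, doubling the delay after each failure."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
                    time.sleep(delay + random.uniform(0, delay * 0.1))  # small jitter
        return wrapper
    return decorator


# Usage: a function that fails twice, then succeeds on the third attempt.
calls = {"count": 0}


@retry(attempts=3, base_delay=0.01)
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(flaky())
```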
## Logs

- Console: Human-readable progress
- File: `/root/.hermes/logs/gmb/gmb_scraper.log` (rotating, 10 MB)
- Errors: `/root/.hermes/logs/gmb/gmb_scraper_errors.log` (rotating, 5 MB)

---
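The three log destinations above map onto Python's standard `logging` handlers. The sketch below shows one way such a setup could look. The `build_logger` name, the message format, the console level, and the `backup_count` value are assumptions, and the demo writes to a temporary directory instead of `/root/.hermes/logs/gmb/`.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def build_logger(log_dir, backup_count=3):
    """Console output plus a rotating main log (10 MB) and error log (5 MB)."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger("gmb_scraper")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)

    main_file = RotatingFileHandler(
        os.path.join(log_dir, "gmb_scraper.log"),
        maxBytes=10 * 1024 * 1024, backupCount=backup_count)

    error_file = RotatingFileHandler(
        os.path.join(log_dir, "gmb_scraper_errors.log"),
        maxBytes=5 * 1024 * 1024, backupCount=backup_count)
    error_file.setLevel(logging.ERROR)  # only errors reach this file

    for handler in (console, main_file, error_file):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


# Demo in a temporary directory so nothing is written under /root.
demo_dir = tempfile.mkdtemp()
logger = build_logger(demo_dir)
logger.info("scrape started")
logger.error("example failure")
```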
## Tips

- **Start with `--quick`** to screen a niche before committing to a full scrape
- **Use `--min-pain 25`** to filter out low-value leads
- **`--channel sms`** is best for cold outreach (short, punchy)
- **`--channel call`** generates full call scripts with objection handling
- **`--slow`** adds longer delays; use it when you're hitting CAPTCHAs
- **`--headful`** shows the browser, which is useful for debugging