GMB-Scraper/README.md
Zulkifli 5e893db025 feat: GMB Scraper v4 — production-grade pain-aware lead gen engine
- Stealth mode: playwright-stealth, random fingerprints, human delays
- Retry logic: exponential backoff (3 attempts)
- Logging: rotating logs to /root/.hermes/logs/gmb/
- Validation: phone/website/rating validation + dedup
- Pain detection: 12 signals, scoring, service matching
- Review scraper: extract reviews + pain keyword detection
- Website health: SSL, speed, mobile, contact form checks
- Pitch generator: Apex pitches (SMS, email, call, Gumtree)
- Docker containerization
- .env for secrets (no hardcoded API keys)
- Integration with Pipecat voice dialer (gmb_to_voice.py)
2026-06-06 19:45:44 +08:00


# GMB Scraper v4 — Pain-Aware Lead Generation
**Your own tool. Zero API cost. Production-grade.**
Extracts business data from Google Maps, detects pain signals, checks website health, and generates personalized apex pitches for Darwisyah Digital Media.
---
## Quick Start
```bash
# Activate environment
source /root/tools/gmb-scraper/venv/bin/activate
# Basic scrape
./scrape.sh "lawyers Perth CBD"
# Pain-aware scrape (recommended)
./scrape.sh "dentists Joondalup" --full
# Quick pain detection (no reviews, no website checks)
./scrape.sh "accountants Perth" --quick
# Lead generation focused
./scrape.sh "electricians Perth" --leads
```
---
## Presets
| Preset | Flags | Best For |
|--------|-------|----------|
| `--full` | `--detect-pain --scrape-reviews --check-websites --pitch-report --json` | Complete analysis |
| `--quick` | `--detect-pain --json` | Fast pain screening |
| `--leads` | `--detect-pain --check-websites --pitch-report --json` | Lead gen focus |
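
The preset table can be read as a simple lookup from preset name to flag list. The sketch below is one way a wrapper could expand a preset into a full `gmb_scraper.py` command line. It is not the actual contents of `scrape.sh`, and the function name `expand_preset` is hypothetical.

```python
import shlex
from typing import List, Optional

# Preset-to-flag mapping copied from the table above.
PRESETS = {
    "--full": "--detect-pain --scrape-reviews --check-websites --pitch-report --json",
    "--quick": "--detect-pain --json",
    "--leads": "--detect-pain --check-websites --pitch-report --json",
}


def expand_preset(query: str, preset: Optional[str] = None) -> List[str]:
    """Build an argument list for gmb_scraper.py (hypothetical helper)."""
    args = ["python3", "gmb_scraper.py", "-q", query]
    if preset is not None:
        # Unknown presets raise KeyError instead of being silently ignored.
        args += shlex.split(PRESETS[preset])
    return args


if __name__ == "__main__":
    # Print the command that the --quick preset would correspond to.
    print(shlex.join(expand_preset("accountants Perth", "--quick")))
```

Returning a list rather than a string keeps the query safe to pass to `subprocess.run` without shell quoting problems.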
---
## Advanced Usage
```bash
# Custom query with filters
python3 gmb_scraper.py -q "lawyers Perth" \
--detect-pain \
--scrape-reviews \
--check-websites \
--pitch-report \
--channel email \
--min-pain 25 \
--max-results 50
# All options (--channel takes one of: sms, email, call, gumtree)
python3 gmb_scraper.py -q "QUERY" \
--min-rating 0.0 \
--min-reviews 0 \
--max-results 100 \
--detect-pain \
--min-pain 0 \
--scrape-reviews \
--max-reviews 30 \
--check-websites \
--pitch-report \
--channel sms \
--output /path/to/output.csv \
--json \
--slow \
--headful \
--no-stealth \
--proxy http://user:pass@host:port
```
---
## Pain Signals Detected
| Signal | Weight | Service |
|--------|--------|---------|
| Missed calls in reviews | 30 | Lead Gen + Call Tracking |
| No website | 25 | Website Development |
| Broken website | 20 | Website Maintenance |
| Recent 1-star reviews | 20 | Review Response Service |
| No contact form | 15 | Lead Capture Optimization |
| Low rating (<3.5★) | 15 | Reputation Management |
| Not mobile-friendly | 12 | Mobile Optimization |
| Unclaimed GMB | 12 | GMB Optimization |
| Slow website (>3s) | 10 | Website Performance |
| Missing phone | 10 | GMB Cleanup |
| Few reviews (<10) | 8 | Review Generation |
| No hours listed | 5 | GMB Optimization |
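
To make the weights concrete, here is a minimal scoring sketch. The README does not say how `pain_detector.py` combines signals, so two assumptions are made here: the `pain_score` is the sum of the detected weights capped at 100, and `primary_service` is the service of the heaviest detected signal. The signal keys and the `score_pain` function are hypothetical names.

```python
# Weights and services copied from the table above; the keys are hypothetical.
PAIN_SIGNALS = {
    "missed_calls_in_reviews": (30, "Lead Gen + Call Tracking"),
    "no_website": (25, "Website Development"),
    "broken_website": (20, "Website Maintenance"),
    "recent_one_star_reviews": (20, "Review Response Service"),
    "no_contact_form": (15, "Lead Capture Optimization"),
    "low_rating": (15, "Reputation Management"),
    "not_mobile_friendly": (12, "Mobile Optimization"),
    "unclaimed_gmb": (12, "GMB Optimization"),
    "slow_website": (10, "Website Performance"),
    "missing_phone": (10, "GMB Cleanup"),
    "few_reviews": (8, "Review Generation"),
    "no_hours_listed": (5, "GMB Optimization"),
}


def score_pain(signals):
    """Return (pain_score, primary_service) for a list of detected signal keys.

    ASSUMPTION: score is the sum of weights capped at 100, and the primary
    service belongs to the heaviest signal. The real tool may differ.
    """
    detected = [PAIN_SIGNALS[key] for key in signals]
    if not detected:
        return 0, None
    score = min(100, sum(weight for weight, _ in detected))
    primary = max(detected, key=lambda pair: pair[0])[1]
    return score, primary


if __name__ == "__main__":
    print(score_pain(["no_website", "few_reviews"]))
```

Under these assumptions a business with no website and fewer than 10 reviews scores 25 + 8 = 33, which would pass a `--min-pain 25` filter.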
---
## Output Fields
### Basic Mode
`name, address, phone, website, rating, review_count, category, hours, maps_url`
### Pain Detection Mode (+fields)
`pain_score, pain_signals, primary_service, confidence`
### Website Health Mode (+fields)
`website_reachable, website_ssl, website_load_time, website_mobile, website_form`
### Pitch Mode (+fields)
`pitch` (personalized outreach message)
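
Once a CSV has been written, the fields above can be post-processed with the standard library. The sketch below assumes the CSV header uses exactly the field names listed above and that `pain_score` is numeric. The sample rows are made up for illustration and `top_leads` is a hypothetical helper, not part of the tool.

```python
import csv
import io


def top_leads(csv_file, min_pain=25):
    """Return rows with pain_score >= min_pain, highest score first."""
    rows = [row for row in csv.DictReader(csv_file) if row.get("pain_score")]
    hot = [row for row in rows if float(row["pain_score"]) >= min_pain]
    return sorted(hot, key=lambda row: float(row["pain_score"]), reverse=True)


# Made-up sample data, for illustration only.
SAMPLE = """name,phone,website,rating,review_count,pain_score,primary_service
Example Dental,0400 000 000,,3.2,6,48,Website Development
Sample Legal,0400 000 001,https://example.com,4.8,120,0,
Demo Electrical,,https://example.org,3.4,9,33,Reputation Management
"""

if __name__ == "__main__":
    for lead in top_leads(io.StringIO(SAMPLE)):
        print(lead["name"], lead["pain_score"], lead["primary_service"])
```

For a real run, pass an open file instead of the in-memory sample, for example `top_leads(open("results.csv", newline=""))`.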
---
## Voice Dialer Integration
```bash
# Import leads into Pipecat voice agent
python3 gmb_to_voice.py --csv results.csv --campaign CAMPAIGN_ID
# Full pipeline: scrape + create campaign + import
python3 gmb_to_voice.py \
--query "dentists Perth" \
--campaign-name "Dentist Outreach June" \
--topic "Dental Lead Generation" \
--start-dialer
```
---
## Docker
```bash
# Build
docker build -t gmb-scraper .
# Run
docker run --rm -v "$(pwd)/output:/root/.hermes/cache/gmb" \
gmb-scraper -q "lawyers Perth" --detect-pain --json
```
---
## Architecture
```
gmb_scraper.py                ← Main entry point
├── lib/
│   ├── logger.py             ← Logging + stats
│   ├── retry.py              ← Exponential backoff
│   ├── stealth.py            ← Anti-detection
│   ├── validator.py          ← Data validation
│   ├── pain_detector.py      ← Pain signal detection
│   ├── review_scraper.py     ← Review extraction
│   ├── health_checker.py     ← Website health checks
│   └── pitch_generator.py    ← Apex pitch generation
├── gmb_to_voice.py           ← Pipecat voice agent bridge
├── scrape.sh                 ← CLI wrapper
├── .env                      ← API keys (not in git)
└── Dockerfile                ← Container build
```
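
As an illustration of what a module like `lib/retry.py` might contain, here is a minimal exponential backoff decorator. Only "exponential backoff" and "3 attempts" come from this project's description. The decorator name, the 1-second base delay, and the doubling factor are assumptions, and the real module may be structured differently.

```python
import time
from functools import wraps


def retry(attempts=3, base_delay=1.0, factor=2.0, sleep=time.sleep):
    """Retry a function on any exception with exponentially growing delays.

    ASSUMPTION: base_delay and factor are illustrative defaults. The `sleep`
    parameter is injectable so the delay schedule can be tested without waiting.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # Out of attempts: surface the last error.
                    sleep(delay)
                    delay *= factor
        return wrapper
    return decorator
```

With the defaults, a call that keeps failing waits 1 s, then 2 s, and re-raises the error from the third attempt.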
---
## Logs
- Console: Human-readable progress
- File: `/root/.hermes/logs/gmb/gmb_scraper.log` (rotating, 10MB)
- Errors: `/root/.hermes/logs/gmb/gmb_scraper_errors.log` (rotating, 5MB)
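
A logging setup like the one described above can be built with the standard library's `RotatingFileHandler`. This is a sketch, not the contents of `lib/logger.py`: the function name `build_logger`, the log format, the backup count of 3, and reading "10MB" as 10 × 1024 × 1024 bytes are all assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(log_dir, name="gmb_scraper", backups=3):
    """Console plus two rotating files: everything, and errors only."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # Avoid duplicate handlers on repeated calls.

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)

    # ASSUMPTION: sizes interpreted as MiB; backupCount is illustrative.
    main = RotatingFileHandler(
        log_dir / "gmb_scraper.log", maxBytes=10 * 1024 * 1024, backupCount=backups
    )
    errors = RotatingFileHandler(
        log_dir / "gmb_scraper_errors.log", maxBytes=5 * 1024 * 1024, backupCount=backups
    )
    errors.setLevel(logging.ERROR)

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (console, main, errors):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling `build_logger("/root/.hermes/logs/gmb")` would write to the two paths listed above.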
---
## Tips
- **Start with `--quick`** to screen a niche before committing to a full scrape
- **Use `--min-pain 25`** to filter out low-value leads
- **`--channel sms`** is best for cold outreach (short, punchy)
- **`--channel call`** generates full call scripts with objection handling
- **`--slow`** adds longer delays; use it when you're hitting CAPTCHAs
- **`--headful`** shows the browser window, which is useful for debugging