GMB-Scraper/Dockerfile
Zulkifli 5e893db025 feat: GMB Scraper v4 — production-grade pain-aware lead gen engine
- Stealth mode: playwright-stealth, random fingerprints, human delays
- Retry logic: exponential backoff (3 attempts)
- Logging: rotating logs to /root/.hermes/logs/gmb/
- Validation: phone/website/rating validation + dedup
- Pain detection: 12 signals, scoring, service matching
- Review scraper: extract reviews + pain keyword detection
- Website health: SSL, speed, mobile, contact form checks
- Pitch generator: Apex pitches (SMS, email, call, Gumtree)
- Docker containerization
- .env for secrets (no hardcoded API keys)
- Integration with Pipecat voice dialer (gmb_to_voice.py)
2026-06-06 19:45:44 +08:00

# Pin the Debian release so the apt package names below stay stable across rebuilds
FROM python:3.11-slim-bookworm
# Install the shared libraries Chromium needs
RUN apt-get update && apt-get install -y --no-install-recommends \
    wget \
    gnupg \
    libnss3 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libcups2 \
    libdrm2 \
    libxkbcommon0 \
    libxcomposite1 \
    libxdamage1 \
    libxrandr2 \
    libgbm1 \
    libpango-1.0-0 \
    libcairo2 \
    libasound2 \
    libxshmfence1 \
    && rm -rf /var/lib/apt/lists/*
# Install Playwright and its Chromium build (--with-deps also installs any missing system libraries)
RUN pip install --no-cache-dir playwright && playwright install chromium --with-deps
# Copy app
WORKDIR /app
COPY lib/ lib/
COPY gmb_scraper.py .
COPY gmb_to_voice.py .
COPY scrape.sh .
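# Do not bake .env into the image, since secrets would persist in its layers.
# Supply them at runtime instead, e.g. `docker run --env-file .env ...`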
# Install Python deps
RUN pip install --no-cache-dir playwright-stealth python-dotenv requests beautifulsoup4 lxml
# Create cache and log dirs
RUN mkdir -p /root/.hermes/cache/gmb /root/.hermes/logs/gmb
# Default command
ENTRYPOINT ["python", "gmb_scraper.py"]
CMD ["--help"]
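
# Usage sketch (not part of the original file). The image tag "gmb-scraper" and
# the host directories ./gmb-cache and ./gmb-logs are hypothetical names chosen
# for illustration; adjust them to your setup.
#
#   docker build -t gmb-scraper .
#
# Run with secrets read from a local .env file at runtime, and with the cache
# and log directories created above mounted from the host so results survive
# the container:
#
#   docker run --rm --env-file .env \
#     -v "$PWD/gmb-cache:/root/.hermes/cache/gmb" \
#     -v "$PWD/gmb-logs:/root/.hermes/logs/gmb" \
#     gmb-scraper --help
#
# Arguments placed after the image name replace the default CMD ("--help") and
# are passed to gmb_scraper.py through the ENTRYPOINT.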