
🧰 Operations & Runbook


✅ Startup Checklist

  1. Ensure AWS SQS is configured.
  2. Run docker-compose up.
  3. Verify that:
     • DB tables are created
     • The ML model is loaded
     • Prometheus is scraping metrics

⚠️ Common Issues

  • SQS Connection Failure → Check credentials and queue URL.
  • Scraper Timeout → Increase timeout/delay in SCRAPER_CONFIG.
  • DB IntegrityError → Ensure schema matches ORM definitions.
  • Model Not Loading → Verify that MODEL_PATH points to the correct model file.
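
Several of the issues above come down to missing or wrong configuration, so they can be caught before the stack starts. The following is a minimal preflight sketch, not part of the project itself. It assumes the settings are exposed as environment variables: `SQS_QUEUE_URL` is a hypothetical name, `MODEL_PATH` is assumed to be readable from the environment, and the two `AWS_*` names are the standard static-credential variables (deployments using IAM roles or profiles will not set them).

```python
import os
from urllib.parse import urlparse


def preflight(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []

    # SQS_QUEUE_URL is an assumed variable name, not taken from the runbook.
    queue_url = env.get("SQS_QUEUE_URL", "")
    parsed = urlparse(queue_url)
    if parsed.scheme != "https" or not parsed.netloc or not parsed.path.strip("/"):
        problems.append(f"SQS_QUEUE_URL looks wrong: {queue_url!r}")

    # Static credentials are only one option; with IAM roles or named
    # profiles these variables are legitimately absent.
    for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    # Assumes MODEL_PATH is provided through the environment.
    model_path = env.get("MODEL_PATH", "")
    if not model_path or not os.path.isfile(model_path):
        problems.append(f"MODEL_PATH does not point to a file: {model_path!r}")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("PREFLIGHT:", problem)
```

Running the script before `docker-compose up` prints one line per problem and nothing when every check passes.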

🛡️ On-Call Guide

  • Check the Grafana dashboards first.
  • Look for spikes in SCRAPER_FAILURES or API latency.
  • Confirm that all Prometheus targets are healthy.
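
Target health can also be checked from a terminal through the Prometheus HTTP API, which lists scrape targets at `/api/v1/targets`. The sketch below is one possible helper, not an existing tool in this project. The base URL `http://localhost:9090` is an assumption; substitute the address of your Prometheus instance.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(payload):
    """Return (job, scrape_url, last_error) for each active target not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", "?"),
            t.get("scrapeUrl", "?"),
            t.get("lastError", ""),
        )
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090", timeout=5):
    """Fetch the targets listing; base_url is an assumed default."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for job, url, error in unhealthy_targets(fetch_targets()):
        print(f"DOWN job={job} url={url} error={error}")
```

An empty result means every active target reported `up` on its last scrape.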

🔄 Recovery Procedures

  • Restart the consumer if scraping keeps failing.
  • Replay failed messages from the SQS dead-letter queue (DLQ).
  • For DB issues, roll back the transaction and re-run the failed operation.
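
Replaying the DLQ by hand amounts to a receive, re-send, delete loop. The sketch below shows that loop under a few assumptions: `sqs` is an object that behaves like `boto3.client("sqs")`, and the two queue URLs are placeholders you supply. It copies only the message body, so message attributes are not preserved. SQS also has a built-in DLQ redrive feature that may be simpler if it is available in your account.

```python
def replay_dlq(sqs, dlq_url, main_url, max_messages=100):
    """Move up to max_messages from the DLQ back to the main queue.

    `sqs` is expected to behave like boto3.client("sqs").
    Returns the number of messages replayed.
    """
    moved = 0
    while moved < max_messages:
        resp = sqs.receive_message(
            QueueUrl=dlq_url,
            # SQS returns at most 10 messages per receive call.
            MaxNumberOfMessages=min(10, max_messages - moved),
            WaitTimeSeconds=1,
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # DLQ drained, or nothing visible right now
        for msg in messages:
            # Re-send first, delete second: a crash in between duplicates
            # a message instead of losing it.
            sqs.send_message(QueueUrl=main_url, MessageBody=msg["Body"])
            sqs.delete_message(
                QueueUrl=dlq_url, ReceiptHandle=msg["ReceiptHandle"]
            )
            moved += 1
    return moved
```

Because a crash between the send and the delete can deliver a message twice, the consumer should tolerate duplicates. Fix the root cause before replaying, or the same messages will land in the DLQ again.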