Property Intelligence Pipeline

Lewingtonnn/Property-Intelligence-Pipeline

🧩 Core Components

This section breaks down each component of the pipeline.

⚙️ config.py

Centralizes configuration for the pipeline, including PROMETHEUS_PORT, SCRAPER_CONFIG, USER_AGENTS, and ML_CONFIG.
Keeping these settings in one module avoids magic numbers scattered through the code.
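
As a rough illustration of the idea, a configuration module of this kind might look like the sketch below. Only the four top-level names come from this page; every key and value inside them is a placeholder assumed for the example, not the project's actual settings.

```python
# Illustrative sketch of a central config module.
# The four names are from the docs; all keys and values are assumed placeholders.

# Port on which Prometheus metrics are exposed.
PROMETHEUS_PORT = 8000

# Scraper tuning knobs (hypothetical keys).
SCRAPER_CONFIG = {
    "max_concurrency": 5,
    "max_retries": 3,
    "page_timeout_ms": 30_000,
}

# Pool of user-agent strings to rotate through (placeholder values).
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0",
]

# Settings for the prediction model (hypothetical keys).
ML_CONFIG = {
    "model_path": "models/example_model.pkl",
    "features": ["bedrooms", "bathrooms", "sqft"],
}
```

Other modules would then import these names instead of hard-coding the values.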

🚀 producer.py

Discovers property listing URLs.
Enqueues each URL into AWS SQS.

Key Features:
- Async Playwright scraping
- Pagination handling
- URL filtering/limiting
- Concurrent enqueueing
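
The filtering and concurrent enqueueing steps can be sketched with the standard library alone. In the sketch below, `filter_urls`, `enqueue_all`, and the `send` callback are hypothetical names. In the real producer, `send` would presumably wrap an SQS send call, but here it is just any async function so the example runs without AWS.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


def filter_urls(urls: Iterable[str], required_fragment: str, limit: int) -> List[str]:
    """Drop duplicates and off-pattern links, keeping at most `limit` URLs."""
    seen = set()
    kept: List[str] = []
    for url in urls:
        if len(kept) >= limit:
            break
        if required_fragment in url and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


async def enqueue_all(
    urls: List[str],
    send: Callable[[str], Awaitable[None]],
    max_in_flight: int = 10,
) -> int:
    """Send every URL concurrently, with at most `max_in_flight` sends at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def enqueue_one(url: str) -> None:
        async with semaphore:
            await send(url)

    await asyncio.gather(*(enqueue_one(url) for url in urls))
    return len(urls)
```

The semaphore bounds the number of simultaneous sends, so a large discovery run does not open an unbounded number of requests at once.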

📦 consumer.py

Polls SQS for messages.
Scrapes property details.
Persists data into PostgreSQL.

Features:
- Long-lived Playwright browser context
- Async scraping with concurrency limits
- Data validation
- Upsert persistence logic
- Prometheus metrics
- Graceful shutdown (SIGINT, SIGTERM)
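
One way to combine a concurrency limit with graceful shutdown is sketched below. This is a minimal sketch, not the project's consumer: `consume`, `poll`, and `handle` are hypothetical names, with `poll` standing in for an SQS long-poll and `handle` for scrape-and-persist.

```python
import asyncio
import signal
from typing import Awaitable, Callable, List, Set


def install_signal_handlers(stop: asyncio.Event) -> None:
    """Set `stop` on SIGINT or SIGTERM. Call from inside a running loop (Unix only)."""
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop.set)


async def consume(
    poll: Callable[[], Awaitable[List[str]]],
    handle: Callable[[str], Awaitable[None]],
    stop: asyncio.Event,
    max_concurrency: int = 5,
) -> None:
    """Poll until `stop` is set, then wait for in-flight work before returning."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight: Set[asyncio.Task] = set()

    async def run(message: str) -> None:
        try:
            await handle(message)
        finally:
            semaphore.release()

    while not stop.is_set():
        for message in await poll():
            # Acquire before spawning, so polling slows down when workers are busy.
            await semaphore.acquire()
            task = asyncio.create_task(run(message))
            in_flight.add(task)
            task.add_done_callback(in_flight.discard)

    # Graceful shutdown: let messages that were already picked up finish.
    if in_flight:
        await asyncio.gather(*in_flight, return_exceptions=True)
```

Because the signal handler only sets an event, the loop stops taking new messages while work already in progress is allowed to complete.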

🕷️ scraper.py

Encapsulates Playwright scraping logic.

Features:
- Async context manager for browser lifecycle
- User-agent rotation
- Resilient retry logic
- Single property page scraping with validation
- Defensive scraping (closing pages after use)
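
Retry logic and user-agent rotation can be combined as in the sketch below. It assumes a new user agent is chosen on every attempt and that delays grow exponentially. Neither detail is confirmed by this page, and `scrape_with_retries` and `fetch` are hypothetical names (`fetch` stands in for the Playwright page scrape).

```python
import asyncio
import itertools
from typing import Awaitable, Callable, Optional, Sequence


async def scrape_with_retries(
    url: str,
    fetch: Callable[[str, str], Awaitable[dict]],
    user_agents: Sequence[str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> dict:
    """Call `fetch(url, user_agent)` until it succeeds or attempts run out."""
    if not user_agents:
        raise ValueError("user_agents must not be empty")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    agents = itertools.cycle(user_agents)  # rotate through the pool
    last_error: Optional[Exception] = None

    for attempt in range(max_attempts):
        try:
            return await fetch(url, next(agents))
        except Exception as exc:  # assumed: any failure is worth retrying
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, 2x, 4x, ...
                await asyncio.sleep(base_delay * 2 ** attempt)

    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts") from last_error
```

A production version would likely retry only on specific errors (timeouts, navigation failures) rather than on every exception.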

📝 data_extractor.py

Parses and cleans the scraped data.
Helper methods (safe_inner_text, safe_get_attribute) keep extraction fault-tolerant.
Extracts multiple floor plans per listing and normalizes the data.
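
The helper names above come from the project, but their signatures are not shown on this page. The sketch below assumes they take a Playwright page or element handle (`scope`), a selector, and a default value, and that they return the default instead of raising when something is missing.

```python
from typing import Any, Optional


async def safe_inner_text(
    scope: Any, selector: str, default: Optional[str] = None
) -> Optional[str]:
    """Return the stripped text of the first match, or `default` on any problem."""
    try:
        element = await scope.query_selector(selector)
        if element is None:
            return default
        text = (await element.inner_text()).strip()
        return text or default
    except Exception:
        # Assumed policy: a missing or broken field must not abort the whole scrape.
        return default


async def safe_get_attribute(
    scope: Any, selector: str, name: str, default: Optional[str] = None
) -> Optional[str]:
    """Return attribute `name` of the first match, or `default` on any problem."""
    try:
        element = await scope.query_selector(selector)
        if element is None:
            return default
        value = await element.get_attribute(name)
        return value if value is not None else default
    except Exception:
        return default
```

With helpers like these, each field can be extracted independently, so one missing element yields an empty field rather than a failed listing.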

💾 Database Layer

dbmodels.py

SQLModel ORM definitions:
- Property table (listing info, metadata)
- Pricing_and_floor_plans table (unit-level details)

db_ops.py

Handles database sessions, inserts, updates.

Features:
- Async PostgreSQL engine
- Upsert logic with rollback on error
- Numeric parsing & type conversion
- Timezone-aware timestamps
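
The numeric parsing and timestamp features can be illustrated with a small standard-library sketch. `parse_number` and `utc_now` are hypothetical names, and the input formats ("$1,250 /mo", "850 sq ft") are assumed examples of scraped text, not samples from the project.

```python
import re
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional

# First run of digits, allowing thousands separators and a decimal part.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_number(raw: Optional[str]) -> Optional[Decimal]:
    """Return the first number found in `raw`, or None if there is none."""
    if not raw:
        return None
    match = _NUMBER.search(raw)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)
```

Returning None for unparseable text (for example "Call for pricing") lets the database column stay NULL instead of storing a misleading zero.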

📈 FastAPI Layer

Routers: /properties, /analytics, /predict
Security: token-based authentication

Features:
- Pydantic models for response validation
- Analytics queries
- Real-time ML predictions
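
This page does not say how tokens are transmitted or checked. As one possible shape for the check, the sketch below assumes a single shared token sent in an `Authorization: Bearer <token>` header. `is_authorized` is a hypothetical helper that a route dependency could call.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Validate a Bearer token header against the expected token."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

A request that fails this check would typically be rejected with a 401 response before reaching any of the routers.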