WEB SCRAPER & DATA PIPELINE
A data ingestion and automation pipeline that scrapes both
JavaScript-rendered websites (with Playwright) and static HTML sources
(with BeautifulSoup). The system cleans and normalizes the scraped data,
stores it in PostgreSQL, and visualizes insights through a Streamlit
dashboard. It is designed to simulate real-world industry use cases for
automation and data engineering tasks.
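
A minimal sketch of how the two scraping paths could be selected per source. The `Source` class, its `needs_js` flag, and all function names are hypothetical and are not taken from the project. The static path uses the standard library here for brevity; the Playwright and BeautifulSoup imports are deferred so the routing logic loads even where those packages are not installed.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.request import Request, urlopen


@dataclass(frozen=True)
class Source:
    """Hypothetical description of one scrape target."""
    name: str
    url: str
    needs_js: bool  # True when content only appears after JavaScript runs


def fetch_static(url: str) -> str:
    """Download raw HTML over plain HTTP (no JavaScript execution)."""
    request = Request(url, headers={"User-Agent": "pipeline-sketch/0.1"})
    with urlopen(request, timeout=15) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_rendered(url: str) -> str:
    """Load the page in headless Chromium and return the rendered DOM."""
    from playwright.sync_api import sync_playwright  # deferred import

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def pick_fetcher(source: Source) -> Callable[[str], str]:
    """Route JS-heavy sources to Playwright, everything else to plain HTTP."""
    return fetch_rendered if source.needs_js else fetch_static


def extract_text(html: str, selector: str) -> List[str]:
    """Parse HTML from either path and return the text of matching elements."""
    from bs4 import BeautifulSoup  # deferred import

    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(selector)]
```

Both fetchers return an HTML string, so the same BeautifulSoup parsing step can follow either one, for example `extract_text(pick_fetcher(source)(source.url), "h2")`.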


Web Scraper & Data Pipeline — Playwright + BeautifulSoup + PostgreSQL
Features
Multi-source scraping engine
Support for JS-rendered pages (Playwright)
Support for lightweight HTML pages (BS4)
Data normalization service
Automated scheduler for recurring scrapes
PostgreSQL storage (JSON fields & structured data)
Streamlit reporting dashboard
Fully dockerized infrastructure
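
As an illustration of the normalization and storage features, the sketch below cleans a raw scraped item and prepares it for a table that mixes structured columns with a JSON field. The table name `scraped_items`, its columns, and the helper names are assumptions made for this example; the project's actual schema is not shown in this document.

```python
import json
import re
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple

# Hypothetical table: structured columns plus a JSONB column for the raw item.
INSERT_SQL = (
    "INSERT INTO scraped_items (source, title, price, scraped_at, payload) "
    "VALUES (%s, %s, %s, %s, %s::jsonb)"
)


def parse_price(raw: Optional[Any]) -> Optional[float]:
    """Turn strings like ' $1,299.50 ' into 1299.5; None when no number is found."""
    if raw is None:
        return None
    match = re.search(r"\d+(?:\.\d+)?", str(raw).replace(",", ""))
    return float(match.group()) if match else None


def normalize_record(source: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Collapse whitespace, parse the price, and keep the raw item as payload."""
    return {
        "source": source,
        "title": " ".join(str(raw.get("title", "")).split()),
        "price": parse_price(raw.get("price")),
        "scraped_at": datetime.now(timezone.utc),
        "payload": raw,
    }


def to_insert_params(record: Dict[str, Any]) -> Tuple[Any, ...]:
    """Order the fields to match INSERT_SQL; the payload is serialized to JSON."""
    return (
        record["source"],
        record["title"],
        record["price"],
        record["scraped_at"],
        json.dumps(record["payload"], sort_keys=True),
    )
```

With a DB-API driver such as psycopg2, the row could then be written with `cursor.execute(INSERT_SQL, to_insert_params(record))`. Keeping the untouched item in the JSON column means fields that the normalizer does not yet understand are not lost.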
Data Pipeline Flow
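
A minimal sketch of the flow described in the overview (scrape, normalize, store), with a simple loop standing in for the recurring scheduler. The function names and the callable-based wiring are assumptions for illustration only; the project may use a different scheduler or structure.

```python
import time
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def run_pipeline(
    sources: Iterable[str],
    scrape: Callable[[str], List[Record]],
    normalize: Callable[[str, Record], Record],
    store: Callable[[List[Record]], Any],
) -> int:
    """Run scrape -> normalize -> store per source; return the rows stored."""
    stored = 0
    for source in sources:
        try:
            raw_items = scrape(source)
        except Exception as exc:
            # One failing source should not stop the remaining sources.
            print(f"[skip] {source}: {exc}")
            continue
        records = [normalize(source, item) for item in raw_items]
        store(records)
        stored += len(records)
    return stored


def run_recurring(
    job: Callable[[], int], interval_seconds: float, max_runs: int
) -> List[int]:
    """Call job max_runs times, sleeping interval_seconds between runs."""
    results = []
    for run in range(max_runs):
        results.append(job())
        if run < max_runs - 1:
            time.sleep(interval_seconds)
    return results
```

Passing the stages in as callables keeps each one replaceable: the store step can be an in-memory list during testing and a PostgreSQL writer in deployment, and the Streamlit dashboard only needs to read what the store step has written.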


