WEB SCRAPER & DATA PIPELINE

A complete data ingestion and automation pipeline that scrapes JavaScript-rendered websites (Playwright) and static HTML sources (BeautifulSoup), cleans and normalizes the data, stores it in PostgreSQL, and visualizes insights through a Streamlit dashboard. Designed to simulate real-world industry use cases for automation and data engineering tasks.
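The split between the two fetch paths might look roughly like the sketch below; the function names (`fetch_rendered`, `fetch_static`, `extract_titles`), the use of `requests` for static pages, and the CSS selector are illustrative assumptions, not the repository's actual API.

```python
# Sketch of the two scraping paths (assumed function names, not the repo's API).
import requests
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright


def fetch_rendered(url: str) -> str:
    """Fetch a JavaScript-rendered page with Playwright (headless Chromium)."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


def fetch_static(url: str) -> str:
    """Fetch a plain HTML page over HTTP; no JS execution needed."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def extract_titles(html: str) -> list[str]:
    """Parse the HTML with BeautifulSoup and pull out item titles (example selector)."""
    soup = BeautifulSoup(html, "html.parser")
    return [node.get_text(strip=True) for node in soup.select("h2.title")]
```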


Features

- Multi-source scraping engine
- Support for JS-rendered pages (Playwright)
- Support for lightweight HTML pages (BeautifulSoup)
- Data normalization service
- Automated scheduler for recurring scrapes
- PostgreSQL storage (JSON fields and structured data; see the storage sketch after this list)
- Streamlit reporting dashboard
- Fully dockerized infrastructure
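A rough sketch of the storage pattern under an assumed schema: structured columns for queryable fields plus a JSONB column holding the raw payload. The `scraped_items` table, the psycopg2 driver, and the connection string are placeholders, not the project's actual schema.

```python
# Sketch of mixed structured/JSON storage (assumed table name, columns, and DSN).
import psycopg2
from psycopg2.extras import Json

DDL = """
CREATE TABLE IF NOT EXISTS scraped_items (
    id         SERIAL PRIMARY KEY,
    source     TEXT NOT NULL,
    title      TEXT,
    price      NUMERIC,
    raw        JSONB,               -- full scraped payload kept for reprocessing
    scraped_at TIMESTAMPTZ DEFAULT now()
);
"""


def store_item(conn, source: str, title: str, price: float, raw: dict) -> None:
    """Insert one normalized record; the raw payload goes into the JSONB column."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO scraped_items (source, title, price, raw) VALUES (%s, %s, %s, %s)",
            (source, title, price, Json(raw)),
        )
    conn.commit()


if __name__ == "__main__":
    conn = psycopg2.connect("postgresql://scraper:scraper@localhost:5432/scraper")  # assumed DSN
    with conn.cursor() as cur:
        cur.execute(DDL)
    conn.commit()
    store_item(conn, "example.com", "Sample product", 19.99, {"title": "Sample product"})
```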

Data Pipeline Flow

Scraper → Normalizer → Database → Dashboard → Insights
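The Normalizer step in that flow might look something like the sketch below; the field names and cleaning rules are illustrative, not the project's actual normalization service.

```python
# Sketch of the normalization step (illustrative field names and rules).
import re
from datetime import datetime, timezone


def normalize(raw: dict, source: str) -> dict:
    """Clean a raw scraped record into the shape the database layer expects."""
    title = (raw.get("title") or "").strip()

    # Prices often arrive as strings like "$1,299.00"; keep only digits and the dot.
    price_text = re.sub(r"[^\d.]", "", str(raw.get("price", "")))
    price = float(price_text) if price_text else None

    return {
        "source": source,
        "title": title,
        "price": price,
        "raw": raw,
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }
```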

Technologies Used

Playwright, BeautifulSoup, FastAPI, PostgreSQL, Streamlit, Docker
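For the dashboard end of the pipeline, a minimal Streamlit page could read the same assumed `scraped_items` table and chart it; the DSN, query, and chart choices below are placeholders rather than the project's actual report.

```python
# Sketch of a Streamlit reporting page (assumed table, columns, and DSN).
import pandas as pd
import sqlalchemy
import streamlit as st

# SQLAlchemy engine over the psycopg2 driver (connection string assumed).
engine = sqlalchemy.create_engine("postgresql://scraper:scraper@localhost:5432/scraper")

st.title("Scraper Insights")

# Load recent rows into a DataFrame for display and charting.
df = pd.read_sql(
    "SELECT source, title, price, scraped_at FROM scraped_items "
    "ORDER BY scraped_at DESC LIMIT 500",
    engine,
)

st.metric("Items scraped (last 500)", len(df))
st.bar_chart(df.groupby("source")["price"].mean())
st.dataframe(df)
```

Such a page would be launched with `streamlit run dashboard.py` (filename assumed).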