An end-to-end Data Science & Machine Learning project that predicts customer churn, segments customers, and provides actionable business insights using real-world data.
Customer churn is a critical problem for telecom and SaaS companies.
This project builds a production-style data science pipeline that:
- Analyzes customer behavior
- Predicts customer churn using Machine Learning
- Segments customers using clustering
- Visualizes insights in an interactive dashboard
- ✅ Data cleaning & preprocessing
- ✅ Exploratory Data Analysis (EDA)
- ✅ Churn prediction using XGBoost
- ✅ Customer segmentation using KMeans
- ✅ Interactive dashboard using Streamlit
- ✅ Industry-standard project structure
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- XGBoost
- Jupyter Notebook
- Streamlit
customer-intelligence-ai/ │ ├── data/ │ ├── raw/ # Raw dataset │ └── processed/ # Cleaned data │ ├── notebooks/ │ ├── 01_eda.ipynb │ └── 03_segmentation.ipynb │ ├── src/ │ ├── preprocessing.py │ └── train_model.py │ ├── models/ │ └── churn_model.pkl │ ├── app/ │ └── dashboard.py │ ├── requirements.txt └── README.md
- Telco Customer Churn Dataset
- Source: Kaggle
- File used:
customers.csv
pip install -r requirements.txt
2️⃣ Run data preprocessing
python src/preprocessing.py
3️⃣ Train churn prediction model
python src/train_model.py
4️⃣ Run EDA & Segmentation
Open Jupyter Notebook:
jupyter notebook
Then run:
notebooks/01_eda.ipynb
notebooks/03_segmentation.ipynb
5️⃣ Launch dashboard
streamlit run app/dashboard.py