This program applies machine learning techniques to predict heart disease based on patient data. It analyzes key health metrics, balances the dataset for fair model training, and selects the most accurate prediction model. The system also provides interactive visualizations to highlight important factors influencing heart disease risk, making it valuable for both medical professionals and data scientists.
Key Applications:
✅ Medical Diagnosis Assistance – Helps doctors assess a patient's heart disease risk from their data; it can also serve as an educational tool for medical students.
✅ Healthcare Research – Identifies trends and important health indicators.
✅ Class Imbalance Handling
Handles imbalanced datasets (e.g., far more healthy patients than patients with heart disease) using oversampling (SMOTE) or undersampling, so the model does not simply learn to favor the majority class.
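The oversampling idea can be illustrated with a from-scratch NumPy sketch of SMOTE's core step: each synthetic minority sample is placed at a random point on the line between a real minority sample and one of its nearest minority-class neighbors. This is an illustration only; in the project itself, imbalanced-learn's `SMOTE` class (via `fit_resample`) would be the usual choice. The function name `smote_oversample` and the toy data are hypothetical. Resampling should be applied to the training split only, so the test set keeps the real class distribution.

```python
import numpy as np


def smote_oversample(X, y, minority_label=1, k_neighbors=5, random_state=0):
    """Minimal SMOTE sketch: add synthetic minority samples until both classes are equal in size.

    Each synthetic point lies on the line segment between a random minority sample
    and one of its k nearest minority-class neighbors (Euclidean distance).
    """
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label)) - len(X_min)
    if n_needed <= 0:
        return X, y  # nothing to do: the "minority" class is not smaller
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")

    # Pairwise distances within the minority class only.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    k = min(k_neighbors, len(X_min) - 1)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_needed)          # random minority samples
    pick = neighbors[base, rng.integers(0, k, size=n_needed)]  # one random neighbor each
    gap = rng.random((n_needed, 1))                            # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[pick] - X_min[base])

    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_res, y_res


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 4))
    y = np.array([0] * 90 + [1] * 10)  # 90 "healthy", 10 "diseased"
    X_bal, y_bal = smote_oversample(X, y)
    print(np.bincount(y_bal))  # [90 90]
```

Undersampling takes the opposite approach: it randomly drops majority-class rows instead of synthesizing new minority rows, which is simpler but discards data.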
✅ Machine Learning Models
Trains multiple models, including:
Logistic Regression – A simple yet effective statistical model for classification.
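Random Forest – Combines many decision trees trained on random subsets of the data and features, which reduces overfitting compared with a single tree.
Support Vector Machines (SVM) – Separates patients with and without heart disease using a maximum-margin decision boundary, which kernels can make non-linear.
XGBoost – A gradient-boosted decision tree algorithm that often performs strongly on tabular data such as medical records.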
Compares model performance using accuracy, precision, recall, F1-score, and ROC AUC (area under the ROC curve) to select the best one.
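A minimal scikit-learn sketch of this comparison is shown here, assuming a synthetic, imbalanced dataset from `make_classification` as a stand-in for the real patient data. XGBoost is left out to keep dependencies minimal; `xgboost.XGBClassifier` follows the same `fit`/`predict_proba` interface and could be added to the `models` dictionary. Selecting the winner by ROC AUC is an assumption; a screening tool might prioritize recall instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for patient data (about 70% class 0, 30% class 1).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale-sensitive models (logistic regression, SVM) get a StandardScaler in front.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
}


def evaluate(model):
    """Fit on the training split and score on the held-out test split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1 (disease)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }


results = {name: evaluate(model) for name, model in models.items()}
best_name = max(results, key=lambda name: results[name]["roc_auc"])

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
print("best by ROC AUC:", best_name)
```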
✅ Hyperparameter Tuning & Model Optimization
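Uses GridSearchCV or RandomizedSearchCV to search for the best hyperparameters with cross-validation.
Scoring each candidate across several cross-validation folds helps the chosen model generalize instead of overfitting a single train/test split.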
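A minimal GridSearchCV sketch for the Random Forest model follows, again on hypothetical synthetic data. The parameter grid is illustrative, not the project's actual grid. Tuning runs on the training split only, and the untouched test split then gives an estimate that the search itself has not seen. RandomizedSearchCV has the same interface but takes `param_distributions` and `n_iter`, sampling a fixed number of candidates instead of trying every combination.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic, imbalanced stand-in for patient data.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Small, illustrative grid: 2 x 2 x 2 = 8 candidate settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    # Stratified folds keep the class ratio similar in every fold.
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)  # tuning sees only the training split

print("best params:", search.best_params_)
print("mean CV ROC AUC:", round(search.best_score_, 3))
# score() applies the same "roc_auc" scorer to the refit best model.
print("test ROC AUC:", round(search.score(X_test, y_test), 3))
```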
✅ Feature Importance Analysis
Identifies key factors (e.g., cholesterol, blood pressure, age) that contribute most to heart disease risk.
Uses techniques like SHAP values to explain model decisions.
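SHAP itself requires the separate `shap` package, so this dependency-light sketch swaps in a different technique: scikit-learn's permutation importance. It shuffles one feature at a time on held-out data and measures how much ROC AUC drops, giving a global ranking rather than SHAP's per-prediction explanations. The data is synthetic and the column names are hypothetical placeholders for clinical features such as age, cholesterol, or resting blood pressure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical generic column names; with shuffle=False the first three
# columns of the synthetic data are the informative ones.
feature_names = [f"feature_{i}" for i in range(8)]
X, y = make_classification(n_samples=600, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each column 10 times on the test split and record the mean drop in ROC AUC.
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=10, random_state=0)

ranking = sorted(zip(feature_names, result.importances_mean, result.importances_std),
                 key=lambda item: item[1], reverse=True)
for name, mean, std in ranking:
    print(f"{name}: {mean:.3f} ± {std:.3f}")
```

Computing importance on held-out data rather than the training set avoids rewarding features the model merely memorized.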
✅ Data Visualization & Insights
Generates heatmaps, histograms, box plots, and correlation matrices to explore relationships between features.
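ROC & Precision-Recall curves show how well each model separates the classes across all decision thresholds; the Precision-Recall curve is especially informative when the classes are imbalanced.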
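Both curves can be drawn with scikit-learn and Matplotlib, as in the minimal sketch below. It assumes a logistic regression model on hypothetical synthetic data and writes the figure to a PNG in the system temp directory; the file name is illustrative. The dashed baselines mark chance performance: the diagonal for ROC, and the positive-class rate for Precision-Recall.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, average_precision_score, precision_recall_curve, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for patient data.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
y_score = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

fpr, tpr, _ = roc_curve(y_test, y_score)
precision, recall, _ = precision_recall_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)
ap = average_precision_score(y_test, y_score)  # summary of the PR curve

fig, (ax_roc, ax_pr) = plt.subplots(1, 2, figsize=(10, 4))
ax_roc.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
ax_roc.plot([0, 1], [0, 1], linestyle="--", color="grey", label="chance")
ax_roc.set(xlabel="False positive rate", ylabel="True positive rate (recall)", title="ROC curve")
ax_roc.legend()

ax_pr.plot(recall, precision, label=f"AP = {ap:.3f}")
ax_pr.axhline(y_test.mean(), linestyle="--", color="grey", label="chance (positive rate)")
ax_pr.set(xlabel="Recall", ylabel="Precision", title="Precision-Recall curve")
ax_pr.legend()

out_path = os.path.join(tempfile.gettempdir(), "roc_pr_curves.png")
fig.tight_layout()
fig.savefig(out_path)
plt.close(fig)
print("saved", out_path)
```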
✅ Deployment & User Interaction
Can be integrated into a Flask or FastAPI web app for real-time predictions.
Supports interactive dashboards (using Dash or Streamlit) to visualize patient data and prediction results.
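A Flask prediction endpoint could look roughly like the minimal sketch below. Everything specific here is hypothetical: the `/predict` route, the JSON payload shape (`{"features": [...]}`), the 10-feature input, and the 0.5 decision threshold. The model is a placeholder trained on synthetic data at startup; a real app would instead load the tuned model saved after training. A FastAPI version would follow the same pattern with a typed request model.

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

N_FEATURES = 10  # hypothetical number of input features

# Placeholder model trained on synthetic data; a real app would load the saved, tuned model.
X, y = make_classification(n_samples=600, n_features=N_FEATURES, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Return the predicted probability of heart disease for one patient record."""
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or len(features) != N_FEATURES:
        return jsonify(error=f"expected 'features': a list of {N_FEATURES} numbers"), 400
    try:
        row = np.asarray(features, dtype=float).reshape(1, -1)
    except (TypeError, ValueError):
        return jsonify(error="features must be numeric"), 400
    probability = float(model.predict_proba(row)[0, 1])
    # 0.5 is an illustrative threshold; a screening tool might lower it to favor recall.
    return jsonify(probability=probability, prediction=int(probability >= 0.5))
```

The app can be served with Flask's development server (for example `flask run`) and called with a JSON `POST` to `/predict`; malformed input returns HTTP 400 instead of an unhandled server error.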
Languages & Libraries Used
Python – Core programming language.
Pandas & NumPy – Data cleaning, manipulation, and numerical analysis.
Scikit-Learn – Machine learning model training and evaluation.
XGBoost – Gradient-boosted decision trees, often among the strongest models on tabular data.
Matplotlib & Seaborn – Visualization of trends and patterns.
SHAP – Explains model predictions by showing feature importance.
Imbalanced-learn – Provides SMOTE oversampling and undersampling methods to handle class imbalance for fair model training.
Flask / FastAPI – Enables model deployment as a web application.
Dash / Streamlit – Creates interactive dashboards for real-time predictions.