This project builds predictive models to analyze customer churn using Python, pandas, scikit-learn, and XGBoost. It covers data preprocessing, feature engineering, model training, and evaluation.
## Project Structure

- `/data`: Contains raw and processed datasets. Raw data should not be committed.
- `/notebooks`: Jupyter notebooks for exploratory data analysis and prototyping.
- `/src`: Python scripts for data processing, modeling, and evaluation.
- `/models`: Saved model artifacts.
- `/reports`: Generated reports and visualizations.
## Key Libraries

- pandas and NumPy for data manipulation.
- scikit-learn and XGBoost for modeling.
- matplotlib and seaborn for visualization.
- Jupyter for interactive development.
## Coding Standards

- Use consistent naming conventions for variables (e.g., snake_case).
- Avoid hardcoding file paths; use config files or environment variables (see the path-handling sketch after this list).
- Include docstrings for all functions and classes.
- Use logging instead of print statements for debugging (see the logging sketch after this list).
- Set random seeds for reproducibility in modeling scripts (see the seeding sketch after this list).
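For the path rule, a minimal sketch of one common approach: resolve every path in a single module from an environment variable with a repo-relative default. The variable names and file names here are hypothetical, not project fixtures.

```python
import os
from pathlib import Path

# Hypothetical convention: one module (e.g., src/paths.py) owns every path.
# CHURN_DATA_DIR and the file names below are illustrative placeholders.
DATA_DIR = Path(os.environ.get("CHURN_DATA_DIR", "data"))
MODEL_DIR = Path(os.environ.get("CHURN_MODEL_DIR", "models"))

RAW_PATH = DATA_DIR / "raw" / "customers.csv"
PROCESSED_PATH = DATA_DIR / "processed" / "train.parquet"
```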
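For the logging rule, a minimal standard-library setup; the format string, level, and log messages are illustrative.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)
logger = logging.getLogger(__name__)

logger.info("Loaded %d rows from the training set", 10_000)     # shown at INFO
logger.debug("Class balance: %s", {"churn": 0.2, "stay": 0.8})  # hidden unless level=DEBUG
```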
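For the seeding rule, one pattern is a single helper that seeds the global RNGs in use; the helper name is hypothetical. Note that scikit-learn and XGBoost are seeded per estimator via `random_state` rather than globally.

```python
import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    """Hypothetical helper: seed the global RNGs this project relies on."""
    random.seed(seed)     # Python's built-in RNG
    np.random.seed(seed)  # NumPy's global RNG (also used by pandas sampling)

set_seed(42)
# scikit-learn and XGBoost take explicit per-estimator seeds, e.g.:
# RandomForestClassifier(random_state=42), xgb.XGBClassifier(random_state=42)
```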
## Notebook Guidelines

- Clear outputs before committing notebooks (see the sketch after this list).
- Use markdown cells to explain each step of the analysis.
- Avoid committing large datasets or outputs.
- Keep notebooks modular and focused on a single task.
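One way to automate output clearing, sketched with the `nbformat` package; many teams instead use `jupyter nbconvert --clear-output --inplace` or the nbstripout git filter.

```python
import nbformat

def clear_outputs(path: str) -> None:
    """Strip cell outputs and execution counts in place before committing."""
    nb = nbformat.read(path, as_version=4)
    for cell in nb.cells:
        if cell.cell_type == "code":
            cell.outputs = []
            cell.execution_count = None
    nbformat.write(nb, path)

clear_outputs("notebooks/eda.ipynb")  # hypothetical notebook path
```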
## Model Evaluation

- Include metrics such as accuracy, precision, recall, F1-score, and ROC-AUC.
- Visualize confusion matrices and ROC curves (see the sketch after this list).
- Compare multiple models and justify selection.
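A sketch of computing these metrics and plots, assuming scikit-learn ≥ 1.0; a synthetic dataset and a logistic regression stand in for the real churn data and model so the snippet runs on its own.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    RocCurveDisplay,
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the churn data, just to make this runnable.
X, y = make_classification(n_samples=1_000, weights=[0.8], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # positive-class probabilities

print(f"Accuracy:  {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred):.3f}")
print(f"Recall:    {recall_score(y_test, y_pred):.3f}")
print(f"F1-score:  {f1_score(y_test, y_pred):.3f}")
print(f"ROC-AUC:   {roc_auc_score(y_test, y_proba):.3f}")

ConfusionMatrixDisplay.from_predictions(y_test, y_pred)  # confusion matrix plot
RocCurveDisplay.from_predictions(y_test, y_proba)        # ROC curve plot
plt.show()
```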
## Data Privacy

- Do not commit raw data files containing sensitive information.
- Use `.gitignore` to exclude large or private files.
- Mask or anonymize any personally identifiable information (PII); a masking sketch follows this list.
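One masking approach, sketched below: replacing PII columns with truncated SHA-256 hashes. Hashing is pseudonymization rather than full anonymization, and the column names here are hypothetical.

```python
import hashlib

import pandas as pd

def mask_pii(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Replace the given columns with truncated one-way hashes."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype(str).map(
            lambda v: hashlib.sha256(v.encode()).hexdigest()[:16]
        )
    return out

# Hypothetical usage with an illustrative column name:
df = pd.DataFrame({"email": ["a@example.com"], "tenure": [12]})
df = mask_pii(df, ["email"])
```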
## Code Review Checklist

- Are functions well-documented and modular?
- Is the code reproducible (e.g., random seeds, environment setup)?
- Are notebooks clean and readable?
- Are evaluation metrics appropriate and clearly presented?
- Are data privacy practices followed?