dvc stage add -n feature_engineering_analysis_oct_2025 \
    -d data/raw/analysis_oct_2025/analysis.csv \
    -d src/feature_engineering.py \
    -o data/features/engineered_features_analysis_oct_2025.parquet \
    python src/feature_engineering.py \
        --input data/raw/analysis_oct_2025/analysis.csv \
        --output data/features/engineered_features_analysis_oct_2025.parquet
dvc add data/raw/analysis_oct_2025/analysis.csv
dvc commit
git add dvc.yaml dvc.lock src/feature_engineering.py data/features/.
git add data/raw/analysis_oct_2025/analysis.csv.dvc data/raw/analysis_oct_2025/.gitignore
git commit -m "October 2025 monitoring dataset"
git tag -a v-oct2025-monitoring -m "Monitoring dataset for October 2025"
dvc push
dvc stage add -n monitoring_oct_2025 \
    -d data/features/engineered_features_analysis_oct_2025.parquet \
    -d data/features/filtered_features.parquet \
    -d models/ensemble_model.joblib \
    -d src/monitoring_nannyml.py \
    -o reports/monitoring/oct_2025 \
    python src/monitoring_nannyml.py \
        --input data/features/engineered_features_analysis_oct_2025.parquet \
        --output reports/monitoring/oct_2025/performance_estimation.html
# ----------------------------------------------------------------------------
# UNIVARIATE DRIFT + RCA
# ----------------------------------------------------------------------------
def run_univariate_drift(reference_df, analysis_df, selected_features, perf_results, output_dir: Path):
    logger.info("Running univariate drift + RCA...")
    uv = nml.UnivariateDriftCalculator(
        column_names=selected_features,
        continuous_methods=["kolmogorov_smirnov"],  # continuous features: sensitive to shifts in distribution shape and location
        categorical_methods=["jensen_shannon"],  # categorical features: symmetric, information-theoretic distance
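To make the two drift methods named above concrete, here is a minimal standard-library sketch of what each one measures. It is not NannyML's implementation, which may differ in details such as chunking and thresholds; the helper names and the sample values are hypothetical and chosen only for illustration.

```python
import math
from bisect import bisect_right
from collections import Counter


def ks_statistic(reference, analysis):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    ref_sorted, ana_sorted = sorted(reference), sorted(analysis)
    gap = 0.0
    for x in ref_sorted + ana_sorted:
        cdf_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        cdf_ana = bisect_right(ana_sorted, x) / len(ana_sorted)
        gap = max(gap, abs(cdf_ref - cdf_ana))
    return gap


def js_distance(reference, analysis):
    """Jensen-Shannon distance (log base 2) between two categorical samples, in [0, 1]."""
    ref_counts, ana_counts = Counter(reference), Counter(analysis)
    categories = set(ref_counts) | set(ana_counts)
    p = {c: ref_counts[c] / len(reference) for c in categories}
    q = {c: ana_counts[c] / len(analysis) for c in categories}
    m = {c: 0.5 * (p[c] + q[c]) for c in categories}  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence, skipping categories with zero mass in `a`.
        return sum(a[c] * math.log2(a[c] / b[c]) for c in categories if a[c] > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


# Hypothetical continuous feature: the analysis sample is shifted upward by 5.
reference_amounts = [10, 12, 11, 13, 12, 11, 10, 12]
shifted_amounts = [x + 5 for x in reference_amounts]
ks_same = ks_statistic(reference_amounts, reference_amounts)
ks_shift = ks_statistic(reference_amounts, shifted_amounts)

# Hypothetical categorical feature: the analysis sample uses an unseen category.
reference_types = ["cash", "card", "card", "cash"]
new_types = ["chip", "chip", "chip", "chip"]
js_same = js_distance(reference_types, reference_types)
js_new = js_distance(reference_types, new_types)
```

Identical samples give 0 for both measures, while fully separated samples reach the maximum of 1, which is why both are convenient to threshold per feature.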
# ----------------------------------------------------------------------------
# MULTIVARIATE DRIFT (PCA)
# ----------------------------------------------------------------------------
def run_multivariate_drift(reference_df, analysis_df, selected_features, output_dir: Path):
    logger.info("Running PCA multivariate drift...")
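The idea behind PCA-based multivariate drift is reconstruction error: fit scaling and principal components on reference data, compress and decompress new rows, and watch how far the reconstruction lands from the original. The sketch below illustrates this for just two features with one retained component, using only the standard library. It is an assumption-laden toy, not the code this function goes on to call; all names and data points are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_pca_1d(reference):
    """Fit scaling and the first principal axis on two-feature reference rows."""
    xs, ys = zip(*reference)
    mu = (mean(xs), mean(ys))
    sd = (pstdev(xs), pstdev(ys))
    z = [((x - mu[0]) / sd[0], (y - mu[1]) / sd[1]) for x, y in reference]
    var_x = mean([a * a for a, _ in z])
    var_y = mean([b * b for _, b in z])
    cov_xy = mean([a * b for a, b in z])
    # Closed-form angle of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return mu, sd, (math.cos(theta), math.sin(theta))


def reconstruction_error(rows, mu, sd, axis):
    """Mean Euclidean distance between standardized rows and their 1-component reconstruction."""
    errors = []
    for x, y in rows:
        zx, zy = (x - mu[0]) / sd[0], (y - mu[1]) / sd[1]
        score = zx * axis[0] + zy * axis[1]        # compress to one component
        rx, ry = score * axis[0], score * axis[1]  # decompress back to two features
        errors.append(math.hypot(zx - rx, zy - ry))
    return mean(errors)


# Hypothetical data: in reference the second feature is roughly twice the first.
reference_rows = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8)]
# In analysis each feature keeps its range, but the relationship is reversed.
analysis_rows = [(1, 10.0), (2, 8.0), (3, 6.0), (4, 4.0), (5, 2.0)]

mu, sd, axis = fit_pca_1d(reference_rows)
ref_err = reconstruction_error(reference_rows, mu, sd, axis)
ana_err = reconstruction_error(analysis_rows, mu, sd, axis)
```

Here each feature on its own looks unchanged, so a univariate check would stay quiet, yet the reconstruction error rises sharply because the correlation structure learned from reference no longer holds.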
# ----------------------------------------------------------------------------
# PERFORMANCE ESTIMATION (CBPE ONLY ON ANALYSIS DATA)
# ----------------------------------------------------------------------------
def run_performance_estimation(reference_df, analysis_df, output_dir: Path):
    import time
    logger.info("Running CBPE performance estimation...")
    logger.info(f"Analysis dataset: {len(analysis_df)} records")
    logger.info(f"Distinct y_true classes in reference: {reference_df['y_true'].nunique()}")
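CBPE (confidence-based performance estimation) works without labels on the analysis data because a calibrated probability already says how likely each prediction is to be correct. The sketch below shows that core arithmetic for accuracy only, assuming the probabilities are well calibrated and a 0.5 decision threshold; the function name and inputs are hypothetical, and the real estimator also handles calibration, chunking, and other metrics.

```python
def estimate_accuracy(calibrated_probas, threshold=0.5):
    """Expected accuracy implied by calibrated positive-class probabilities."""
    expected_correct = 0.0
    for p in calibrated_probas:
        predicted_positive = p >= threshold
        # A positive prediction is right with probability p, a negative one with 1 - p.
        expected_correct += p if predicted_positive else 1.0 - p
    return expected_correct / len(calibrated_probas)


# Hypothetical scores: confident predictions versus predictions near the threshold.
confident_estimate = estimate_accuracy([0.9, 0.8, 0.2, 0.1])
uncertain_estimate = estimate_accuracy([0.55, 0.45, 0.55, 0.45])
```

The confident batch yields an estimate of about 0.85 and the uncertain batch about 0.55, which is the behavior one would expect: as scores drift toward the threshold, estimated accuracy drops even though no labels were observed.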
# ----------------------------------------------------------------------------
# MODEL SCORING
# ----------------------------------------------------------------------------
def score_model(df: pd.DataFrame, selected_features, model_path: str, is_reference: bool = False):
    logger.info(f"Scoring model: {model_path}")
    model = load(model_path)
    # Missing feature values are filled with 0 before scoring
    X = df[selected_features].fillna(0)
    # Keep the second probability column (index 1) as the model score
    df["y_pred_proba"] = model.predict_proba(X)[:, 1]
# ----------------------------------------------------------------------------
# LOAD ENGINEERED MONITORING DATA AND FILTER BY selected_features
# ----------------------------------------------------------------------------
def load_and_filter_engineered_monitoring(engineered_path: str, selected_features):
    logger.info(f"Loading engineered monitoring dataset: {engineered_path}")
    df = pd.read_parquet(engineered_path)
    # Parse the timestamp column as datetime when it is present
    if "timestamp" in df.columns:
        df["timestamp"] = pd.to_datetime(df["timestamp"])
# ----------------------------------------------------------------------------
# LOAD TRAINING FILTERED FEATURES → DETERMINE SELECTED FEATURE LIST
# ----------------------------------------------------------------------------
def load_selected_features(filtered_training_path: str):
    df = pd.read_parquet(filtered_training_path)
    # all model features EXCEPT identifiers, typology, and transaction_type
    selected = [
        col for col in df.columns
        if col not in ["session_id", "table_id", "patron_id", "timestamp", "typology", "transaction_type"]
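The exclusion filter above can be tried in isolation on a plain list of column names, without reading a Parquet file. In this sketch the excluded names are the ones listed in the snippet, while `select_features`, `EXCLUDED_COLUMNS`, and the feature names `bet_amount` and `win_rate` are hypothetical stand-ins.

```python
# Identifier and metadata columns that are never passed to the model.
EXCLUDED_COLUMNS = {
    "session_id", "table_id", "patron_id",
    "timestamp", "typology", "transaction_type",
}


def select_features(columns):
    """Keep every column that is not an identifier or metadata field, preserving order."""
    return [col for col in columns if col not in EXCLUDED_COLUMNS]


# Hypothetical training columns; only the last two are model features.
training_columns = ["session_id", "timestamp", "typology", "bet_amount", "win_rate"]
selected = select_features(training_columns)
```

Deriving the list from the training file, rather than hard-coding feature names, keeps the monitoring script aligned with whatever features the model was actually trained on.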