@Geremie
Geremie / sample_dissimilar_predictors.py
Created November 15, 2020 22:07
Easily improve your company machine learning system with cost-efficient automated model retraining
instance = {'pickuplon': random.uniform(0, 1),
            'pickuplat': random.uniform(1, 2),
            'dropofflon': random.uniform(3, 4),
            'dropofflat': random.uniform(4, -5),
            'passengers': random.randint(1000, 2000)}
@Geremie
Geremie / sample_similar_predictors.py
Created November 15, 2020 22:00
Easily improve your company machine learning system with cost-efficient automated model retraining
instance = {'pickuplon': random.uniform(-71, -75),
            'pickuplat': random.uniform(38, 42),
            'dropofflon': random.uniform(-71, -75),
            'dropofflat': random.uniform(38, 42),
            'passengers': random.randint(1, 10)}
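Judging by the file names, the two snippets above draw prediction instances that either resemble or deliberately differ from the data the model saw. A minimal sketch of that idea follows, with the "similar" ranges copied from sample_similar_predictors.py. The helper names `sample_similar_instance` and `is_in_similar_range` are hypothetical and do not appear in the gists.

```python
import random

# Ranges copied from sample_similar_predictors.py, written as (low, high).
SIMILAR_RANGES = {
    'pickuplon': (-75, -71),
    'pickuplat': (38, 42),
    'dropofflon': (-75, -71),
    'dropofflat': (38, 42),
}
PASSENGER_RANGE = (1, 10)


def sample_similar_instance(rng=random):
    """Draw one instance from the 'similar' ranges (hypothetical helper)."""
    instance = {name: rng.uniform(low, high)
                for name, (low, high) in SIMILAR_RANGES.items()}
    instance['passengers'] = rng.randint(*PASSENGER_RANGE)
    return instance


def is_in_similar_range(instance):
    """Return True if every predictor lies inside the 'similar' ranges."""
    coordinates_ok = all(low <= instance[name] <= high
                         for name, (low, high) in SIMILAR_RANGES.items())
    low, high = PASSENGER_RANGE
    return coordinates_ok and low <= instance['passengers'] <= high
```

An instance built from the dissimilar snippet (for example, 1500 passengers) fails the range check, which is one simple way to confirm that a test request really falls outside the expected input space.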
@Geremie
Geremie / is_retraining_needed.py
Last active November 16, 2020 22:58
Easily improve your company machine learning system with cost-efficient automated model retraining
def is_retraining_needed(context, dag_run_obj):
    current_version = utils.get_model_current_version(MODEL_NAME)
    label_mean_query = "SELECT labelMean FROM training_jobs WHERE versionName = '" + current_version + "'"
    predictions_query = "SELECT prediction FROM predictions WHERE versionName = '" + current_version + "'"
    hook = PostgresHook('cloud_sql_proxy_conn')
    try:
        label_mean = float(hook.get_records(label_mean_query)[0][0])
    except IndexError:  # probably caused by manual training job not persisting metrics
        label_mean = np.inf
    print('label mean: {} '.format(label_mean))
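The preview above builds its SQL by concatenating the version name into the query string. A minimal sketch of the same lookup with a bound parameter is shown below. It assumes the standard-library `sqlite3` module as a stand-in for the Postgres connection so that it runs on its own; the function name `fetch_label_mean` and the inserted row are made up for illustration. Note that `sqlite3` uses `?` as its placeholder, whereas psycopg2 (the usual Postgres driver) uses `%s`.

```python
import math
import sqlite3


def fetch_label_mean(conn, version_name):
    """Return labelMean for a model version, or infinity if no row exists."""
    # The driver binds version_name to the "?" placeholder, so the value is
    # never spliced into the SQL text itself.
    row = conn.execute(
        "SELECT labelMean FROM training_jobs WHERE versionName = ?",
        (version_name,),
    ).fetchone()
    # Mirrors the fallback in the preview: no persisted metrics -> infinity.
    return float(row[0]) if row is not None else math.inf


# In-memory stand-in for the training_jobs table, with one illustrative row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE training_jobs ("
    "jobName TEXT NOT NULL, versionName TEXT NOT NULL, "
    "evalLoss REAL NOT NULL, labelMean REAL NOT NULL)"
)
conn.execute("INSERT INTO training_jobs VALUES ('job_1', 'v1', 4.2, 11.3)")
```

With bound parameters, a version name containing quote characters is treated as an ordinary string rather than as part of the query.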
@Geremie
Geremie / check_if_version_exists.sh
Created November 15, 2020 19:03
Easily improve your company machine learning system with cost-efficient automated model retraining
gcloud ai-platform versions list --model=taxi_fare_predictor
@Geremie
Geremie / create_cloud_sql_proxy_service.sh
Last active November 16, 2020 20:27
Easily improve your company machine learning system with cost-efficient automated model retraining
# Clone the gitlab repository
git clone https://gitlab.com/marcdjoh/tensorflow-taxi-fare-predictor.git
# Navigate to the manifest folder
cd tensorflow-taxi-fare-predictor/manifests
# Create the namespace
kubectl apply -f namespace.yaml
# Create the deployment
@Geremie
Geremie / create_table_predictions.sql
Created November 15, 2020 18:11
Easily improve your company machine learning system with cost-efficient automated model retraining
CREATE TABLE predictions (
    versionName varchar(200) NOT NULL,
    pickuplon numeric NOT NULL,
    pickuplat numeric NOT NULL,
    dropofflat numeric NOT NULL,
    dropofflon numeric NOT NULL,
    passengers integer NOT NULL,
    prediction numeric NOT NULL);
@Geremie
Geremie / create_table_training_jobs.sql
Last active November 15, 2020 18:13
Easily improve your company machine learning system with cost-efficient automated model retraining
CREATE TABLE training_jobs (
    jobName varchar(200) NOT NULL,
    versionName varchar(200) NOT NULL,
    evalLoss numeric NOT NULL,
    labelMean numeric NOT NULL);
@Geremie
Geremie / create_composer.sh
Created November 15, 2020 03:24
Easily improve your company machine learning system with automatic model retraining
gcloud composer environments create ml-model-retraining-prod --location=europe-west1 --zone=europe-west1-b
@Geremie
Geremie / copy_dag_file.sh
Last active October 31, 2020 20:01
Automate your Cloud SQL data synchronization to BigQuery with Airflow
gsutil cp path/to/cloud_sql_to_bq.py <dag_folder_name>/cloud_sql_to_bq.py
@Geremie
Geremie / retrieve_dag_folder_name.sh
Created October 31, 2020 19:52
Automate your Cloud SQL data synchronization to BigQuery with Airflow
gcloud composer environments describe data-synchronization-env --location=europe-west1