ricklentz / debunk_competitive_advantage_loss.txt
Created July 21, 2017 15:42
countering criticism on loss of competitive advantage by companies exposed to new, open and parallel regulatory bodies
I've debated your question about the loss of competitive advantage for companies exposed to this new regulatory risk myself. Here is why I think my recommendation of a parallel governance/regulatory model will hold:
- The trend of moving model/algorithm improvement work to external parties continues and appears persistent.
- Rapid cycle times, new and relevant data sources (proxies), and barriers to entry keep the current ROI balance tilted toward using externals.
- Quicker improvements lead to greater near-term cash flow for the line of business, amplifying the near-term ROI factor.
- Open competition repeatedly shows that innovative improvements come from new participants.
- Sourcing people skilled in this domain isn't going to get easier.
- The meta-analysis work itself is a target for automation.
We use a process called 'Roll Back, Replay' (RBRP) to counter the unavoidable effects of data decay. RBRP is a parallel set of data flows that monitors every source-system table whose data can change and is significant to the business unit. We run two types of RBRP processes: historical and daily.
The RBRP historical process is a full restatement of all values for a prior period (e.g. start through end time, the month of Jan 2017), based on the current version of the truth. Consuming systems are set up to do an 'initial load' based on these new sets of data. The RBRP daily process is a delta restatement of all values based on the current version of the truth, accounting for changes going back over a fixed window (e.g. two years) but excluding values for the current day. Consuming systems are set up to apply these daily deltas as incremental updates.
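As a rough sketch only, and not the production implementation, the two RBRP passes could be expressed in pandas along these lines; the column names (effective_date, last_modified) and function names are illustrative assumptions, not taken from the source systems described above.

import pandas as pd

def rbrp_historical(source: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    # Full restatement of all values for a prior period (e.g. the month of Jan 2017),
    # based on the current version of the truth
    mask = (source['effective_date'] >= start) & (source['effective_date'] <= end)
    return source.loc[mask].copy()

def rbrp_daily(source: pd.DataFrame, lookback_years: int = 2) -> pd.DataFrame:
    # Delta restatement: rows whose values changed within the lookback window,
    # excluding the current day
    today = pd.Timestamp.today().normalize()
    window_start = today - pd.DateOffset(years=lookback_years)
    changed = (source['last_modified'] >= window_start) & (source['last_modified'] < today)
    return source.loc[changed].copy()

The key point either way is that both passes restate from the current version of the truth rather than patching previously published values.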
import numpy as np
from pandas import DataFrame, Series
def points():
    '''
    Imagine a point system in which each country is awarded 4 points for each
    gold medal, 2 points for each silver medal, and one point for each
    bronze medal.
    '''
    # Placeholder medal counts; the original gist does not show its data
    medals = DataFrame({'gold': [13, 11], 'silver': [10, 8], 'bronze': [9, 12]},
                       index=['Country A', 'Country B'])
    # Weighted sum per country: 4*gold + 2*silver + 1*bronze
    return Series(np.dot(medals[['gold', 'silver', 'bronze']], [4, 2, 1]),
                  index=medals.index)
from prep_terrain_data import makeTerrainData
from class_vis import prettyPicture, output_image
from ClassifyNB import classify
### import the sklearn module for GaussianNB
from sklearn.naive_bayes import GaussianNB
import numpy as np
import pylab as pl
import warnings
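The block above only sets up imports from the Udacity starter code. As a minimal sketch, assuming makeTerrainData returns training and test features and labels in the usual order, the pieces fit together roughly like this:

# Build the toy terrain dataset (helper from the course's prep_terrain_data module)
features_train, labels_train, features_test, labels_test = makeTerrainData()

# Fit a Gaussian Naive Bayes classifier on the training split
clf = GaussianNB()
clf.fit(features_train, labels_train)

# Report accuracy on the held-out test split
print(clf.score(features_test, labels_test))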
# Load the Titanic data, then one-hot encode its categorical features
import numpy as np
import pandas as pd

# Load the dataset
X = pd.read_csv('titanic_data.csv')

# Limit to categorical (object-dtype) columns
X = X.select_dtypes(include=[object])

# One-hot encode the categorical columns (pd.get_dummies is one straightforward way)
X = pd.get_dummies(X)
import pandas
import numpy
# Read the data
data = pandas.read_csv('data.csv')
# Split the data into X and y
X = numpy.array(data[['x1', 'x2']])
y = numpy.array(data['y'])
# Reading the csv file
import pandas as pd
data = pd.read_csv("data.csv")
# Splitting the data into X and y
import numpy as np
X = np.array(data[['x1', 'x2']])
y = np.array(data['y'])
# Import, read, and split data
import numpy as np
import pandas as pd

# Import statement for train_test_split (alongside learning_curve)
from sklearn.model_selection import learning_curve, train_test_split

# Read the data
data = pd.read_csv('data.csv')
X = np.array(data[['x1', 'x2']])
y = np.array(data['y'])

# Fix random seed
np.random.seed(55)
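learning_curve is imported above but never called in this fragment. A minimal sketch of how it might be used follows; the LogisticRegression estimator, test_size, cv, and train_sizes values are all assumptions rather than anything from the original gist.

from sklearn.linear_model import LogisticRegression

# Hold out a test set; test_size is an assumed value
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=55)

# Hypothetical estimator; the original gist does not show which model was intended
estimator = LogisticRegression()

# Compute training and validation scores over increasing training-set sizes
train_sizes, train_scores, test_scores = learning_curve(
    estimator, X_train, y_train, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))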
ricklentz / beginner_sql_examples.txt
Created July 26, 2017 01:16
yawn SQL beginner statements
SQL Statements
Purpose: Practice running SQL statements.
Prepares you for: Assignment 4
Get Started
Log in to the virtual desktop
Start SQL Server 2014 Management Studio