Created
February 8, 2021 03:05
-
-
Save DeflateAwning/128b0d95dd15e3b4cce1d00f698d8869 to your computer and use it in GitHub Desktop.
Have a table with a ton of columns? Know what you want your primary key to be? Use this method to find which columns have different values per row within each target primary key.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import pandas as pd | |
import BOE_SQLAccess as boesql | |
sqlEngine = boesql.createSQLAlchemyEngine() # or equivalent SQLEngine object | |
tableName = 'petrinex_ab_well_licence' | |
pk = 'Licence Number' # target column to check for duplicates as the primary key | |
df_in = pd.read_sql(tableName, con=sqlEngine) | |
df = df_in.copy() # make a copy for safety, because downloads can be expensive | |
df = df.groupby(pk, as_index=True).nunique() | |
df = df.transpose() | |
# give list of columns with multiple distinct values | |
df[df.gt(1).any(axis=1)] |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment