Skip to content

Instantly share code, notes, and snippets.

View wbhinton's full-sized avatar

Weston Hinton wbhinton

View GitHub Profile
@wbhinton
wbhinton / pdf-summary-pdfminer.py
Created December 3, 2019 17:05
Summarize All PDF's in a directory using pdfMiner
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from io import StringIO
path = "pdf/*.pdf"
def convert_pdf_to_txt(path):
rsrcmgr = PDFResourceManager()
@wbhinton
wbhinton / pdf-summary-tika.py
Created December 3, 2019 17:00
Summarize All PDF's in a directory using Tika. Update the directory path as needed.
import PyPDF2
import textract
import nltk
import re
from tika import parser
import heapq
import glob
sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout, 'xmlcharrefreplace')
@wbhinton
wbhinton / outlier-pyod.py
Created December 3, 2019 16:48
Outlier detection using PyOD. You need to narrow down the data to a two column DF with X,Y as the column names.
from sklearn.preprocessing import MinMaxScaler
# Import models
from pyod.models.cblof import CBLOF
from pyod.models.feature_bagging import FeatureBagging
from pyod.models.hbos import HBOS
from pyod.models.iforest import IForest
from pyod.models.knn import KNN
from pyod.models.lof import LOF
import matplotlib
df['X1'] = df['X']
@wbhinton
wbhinton / get-schema.sql
Created December 3, 2019 16:35
Get the schema for an existing DB.
SELECT OBJECT_SCHEMA_NAME(c.object_id) SchemaName
,o.Name AS Table_Name
,c.Name AS Field_Name
,t.Name AS Data_Type
,t.max_length AS Length_Size
,t.precision AS Precision
FROM sys.columns c
INNER JOIN sys.objects o ON o.object_id = c.object_id
LEFT JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE o.type = 'U'
@wbhinton
wbhinton / table-indexes.sql
Created December 3, 2019 16:33
Get a list of tables and their indices
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
o.Name AS TableName ,
i.Name AS IndexName
FROM sys.objects o
INNER JOIN sys.indexes i ON o.object_id = i.object_id
WHERE o.Type = 'U' -- User table
AND LEFT(i.Name, 1) <> '_'
-- Remove hypothetical indexes
@wbhinton
wbhinton / hierarchical-deps.sql
Created December 3, 2019 16:32
Get the table hierarchical dependencies in a database.
-- How to find the hierarchical dependencies
-- Solve recursive queries using Common Table Expressions (CTE)
WITH
TableHierarchy ( ParentTable, DependentTable, Level )
AS
(
-- Anchor member definition (First level group to start the process)
SELECT DISTINCT
CAST(NULL AS INT) AS ParentTable ,
@wbhinton
wbhinton / foreignkeys.sql
Created December 3, 2019 16:30
List tables and foreign keys
-- Foreign Keys
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
parent.name AS 'TableName' ,
o.name AS 'ForeignKey' ,
o.[Type] ,
o.Create_date
FROM sys.objects o
INNER JOIN sys.objects parent ON o.parent_object_id = parent.object_id
@wbhinton
wbhinton / finding-heaps.sql
Created December 3, 2019 16:29
Three options to find heaps in MSSQL
-- Heap tables (Method 1)
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
t.Name AS HeapTable ,
t.Create_Date
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
AND i.type_desc = 'HEAP'
ORDER BY t.Name
@wbhinton
wbhinton / col-desc.sql
Created December 3, 2019 16:28
A simple script to get all the tables and columns in a DB with descriptions
-- This works for T-SQL
SELECT u.name + '.' + t.name AS [table],
td.value AS [table_desc],
c.name AS [column],
cd.value AS [column_desc]
FROM sysobjects t
INNER JOIN sysusers u
ON u.uid = t.uid
LEFT OUTER JOIN sys.extended_properties td
ON td.major_id = t.id
@wbhinton
wbhinton / active-users.sql
Created December 3, 2019 16:26
Get all active users accessing a SQL DB
-- Similar information can be derived from sp_who
SELECT @@Servername AS Server ,
DB_NAME(database_id) AS DatabaseName ,
COUNT(database_id) AS Connections ,
Login_name AS LoginName ,
MIN(Login_Time) AS Login_Time ,
MIN(COALESCE(last_request_end_time, last_request_start_time))
AS Last_Batch
FROM sys.dm_exec_sessions
WHERE database_id > 0