| param( | |
| [string]$RootPath = "C:\", | |
| [string]$OutputPath = "C:\projects\laptop-cleanup\folder_analysis\", | |
| [string[]]$ExcludeFolders = @("laptop-cleanup"), | |
| [ValidateRange(0, [int64]::MaxValue)] | |
| [int64]$MinimumDirectorySizeToRecurseBytes = 50MB, | |
| [switch]$ShowAccessDeniedWarnings | |
| ) | |
| # Create output directory if it doesn't exist |
| #requires -version 2 | |
| <# | |
| .SYNOPSIS | |
| Gets folder sizes using COM, falling back by default to robocopy.exe run with its | |
| logging-only option, which lists files without copying or moving them; the summary | |
| at the end of its output is parsed to extract the relevant data. | |
| There is a -ComOnly parameter for using only COM, and a -RoboOnly parameter for using only |
| <# | |
| .SYNOPSIS | |
| .PARAMETER Ratio | |
| A real number within (0, 1): the largest compressed-to-original size ratio that is | |
| accepted for archiving. Files that do not compress to this ratio or better are not | |
| archived (left as is). | |
| #> | |
| param( |
| import numpy as np | |
| import pandas as pd | |
| # load dataset | |
| df = pd.read_csv("data.csv") | |
| # axis 0 -> row -> i | |
| # axis 1 -> col -> j | |
| # get cols |
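A minimal sketch of the axis convention noted above. It uses a hypothetical toy frame in place of `data.csv`, and the column names `a` and `b` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy frame standing in for data.csv
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 runs down the rows (index i): one result per column
col_sums = df.sum(axis=0)

# axis=1 runs across the columns (index j): one result per row
row_sums = df.sum(axis=1)

# Column labels as a plain Python list
cols = df.columns.tolist()
```

Reading `axis` as "the direction that gets collapsed" tends to be easier to remember than "rows" versus "columns".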
| import glob | |
| import pandas as pd | |
| # Read data from csv | |
| data = pd.read_csv('data.csv', sep=',', index_col='Number') | |
| # Write data to csv | |
| data.to_csv("data_wo_sensitive_lemmatized.csv", index=False, encoding='utf-8', sep=';') | |
| # Read and concat several files in one dataframe | |
| files = glob.glob('*.csv') | |
| small_dfs = [pd.read_csv(fp, names=columns) for fp in files] | |
| df = pd.concat(small_dfs) |
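The read-and-concat pattern above can be tried end to end with a sketch like the following. The temporary directory, the file names, and the `columns` list are all assumptions made for illustration. `ignore_index=True` is an optional addition that renumbers the combined rows.

```python
import glob
import os
import tempfile

import pandas as pd

columns = ["id", "value"]  # hypothetical column names for headerless files

with tempfile.TemporaryDirectory() as tmp:
    # Write two small headerless CSV files to read back
    for i, rows in enumerate([[(1, "a"), (2, "b")], [(3, "c")]]):
        path = os.path.join(tmp, f"part{i}.csv")
        pd.DataFrame(rows).to_csv(path, header=False, index=False)

    # glob returns files in arbitrary order, so sort for a stable result
    files = sorted(glob.glob(os.path.join(tmp, "*.csv")))
    small_dfs = [pd.read_csv(fp, names=columns) for fp in files]

# Without ignore_index=True, each file's 0..n-1 index would repeat in the result
df = pd.concat(small_dfs, ignore_index=True)
```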
| # List unique values in a DataFrame column | |
| df['Column Name'].unique() | |
| # To extract a single column (subset the dataframe), use brackets or attribute notation. | |
| df.height | |
| df['height'] | |
| # Both return the same Series. Attribute notation only works when the column name is a | |
| # valid Python identifier that does not clash with an existing DataFrame attribute. | |
| # Sources: http://www.stephaniehicks.com/learnPython/pages/pandas.html | |
| # and http://www.datacarpentry.org/python-ecology-lesson/02-index-slice-subset/ | |
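A short sketch of when the two notations agree and when only brackets work. The frame and its column names (`height`, `Column Name`, `count`) are assumptions chosen to show the edge cases.

```python
import pandas as pd

df = pd.DataFrame(
    {"height": [1.6, 1.8], "Column Name": ["x", "y"], "count": [3, 4]}
)

# For a simple name, both notations return the same Series
same = df.height.equals(df["height"])

# A name containing a space can only be reached with brackets
values = df["Column Name"].unique()

# "count" collides with the DataFrame.count method, so attribute access
# returns the method rather than the column; brackets still work
attr_is_method = callable(df.count)
col = df["count"]
```

Because of these collisions, bracket notation is usually the safer default in reusable code.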
| import pandas as pd | |
| from typing import Dict, List | |
| class DataQualityValidator: | |
|     def __init__(self, df: pd.DataFrame): | |
|         self.df = df | |
|         self.issues = [] | |
|     def check_nulls(self, columns: List[str], threshold: float = 0.05): | |
|         """Check if null percentage exceeds threshold""" |
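The body of `check_nulls` is cut off in the listing above. As a rough illustration of the idea in its docstring, here is a standalone sketch. The function name `null_fraction_issues` and the message format are hypothetical, and this is not the original method's implementation.

```python
from typing import List

import pandas as pd


def null_fraction_issues(
    df: pd.DataFrame, columns: List[str], threshold: float = 0.05
) -> List[str]:
    """Return one message per column whose null fraction exceeds threshold."""
    issues = []
    for col in columns:
        # isna() gives booleans; their mean is the fraction of missing values
        frac = df[col].isna().mean()
        if frac > threshold:
            issues.append(f"{col}: {frac:.1%} nulls exceeds {threshold:.1%}")
    return issues


df = pd.DataFrame({"a": [1, None, 3, 4], "b": [1, 2, 3, 4]})
issues = null_fraction_issues(df, ["a", "b"])
```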
| from functools import wraps | |
| import datetime as dt | |
| import pandas as pd | |
| def log_start(func): | |
|     @wraps(func) | |
|     def wrapper(*args, **kwargs): | |
|         tic = dt.datetime.now() | |
|         result = func(*args, **kwargs) |
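The decorator above is truncated after the wrapped call. One plausible way such a timing decorator could finish is sketched below. The name `log_step`, the printed message, and the `drop_missing` step are assumptions, not the original code.

```python
import datetime as dt
from functools import wraps

import pandas as pd


def log_step(func):
    """Hypothetical decorator: print the step's name, output shape, and run time."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        tic = dt.datetime.now()
        result = func(*args, **kwargs)
        elapsed = dt.datetime.now() - tic
        print(f"{func.__name__}: shape={result.shape}, took {elapsed}")
        return result  # pass the DataFrame through so steps can be chained

    return wrapper


@log_step
def drop_missing(df):
    return df.dropna()


out = pd.DataFrame({"a": [1.0, None, 3.0]}).pipe(drop_missing)
```

Returning `result` unchanged is what lets decorated steps be chained with `DataFrame.pipe`.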
| def remove_non_printing_chars(df): | |
|     """Clean a dataframe's string columns to remove any non-printing characters. | |
|     We've encountered values like tabs in some of the data. | |
|     :param df: Pandas dataframe | |
|     :return: Pandas dataframe | |
|     """ | |
|     clean_df = df.copy(deep=True) | |
|     clean_df = clean_df.apply(lambda x: x.str.strip() if x.dtype == "object" else x) | |
|     for col in list(clean_df.columns): |
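The loop body is cut off in the listing above. A minimal sketch of one way to finish the job follows, assuming that "non-printing" means ASCII control characters. The function name `strip_non_printing` and the regular expression are assumptions, not the original implementation.

```python
import pandas as pd


def strip_non_printing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with ASCII control characters removed from string columns."""
    clean_df = df.copy(deep=True)
    for col in clean_df.columns:
        series = clean_df[col]
        # Covers both object-dtype text and the dedicated string dtypes
        if pd.api.types.is_string_dtype(series) or series.dtype == object:
            clean_df[col] = (
                series.str.replace(r"[\x00-\x1f\x7f]", "", regex=True).str.strip()
            )
    return clean_df


df = pd.DataFrame({"name": ["a\tb", " c\n"], "n": [1, 2]})
clean = strip_non_printing(df)
```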
| # Apply lambda functions to a pandas DataFrame | |
| # When the new column is computed from other columns | |
| df = df.assign(Product=lambda x: (x['Field_1'] * x['Field_2'] * x['Field_3'])) | |
| # When selected rows are transformed using only their own values: | |
| # axis=1 passes each row, so x.name is the row label. apply returns a new | |
| # DataFrame; the assignment replaces df rather than updating it in place. | |
| df = df.apply(lambda x: np.square(x) if x.name in ['a', 'e', 'g'] else x, axis=1) | |
| # To compare each element with the previous element of a column, use shift |
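A short sketch of the `shift` comparison mentioned in the last comment. The `price` column and the derived column names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 12, 11, 15]})

# shift(1) moves values down one row, so each row sees the previous row's value
df["prev_price"] = df["price"].shift(1)

# The first row has no predecessor, so its difference is NaN
df["change"] = df["price"] - df["prev_price"]

# Comparisons against NaN evaluate to False
df["went_up"] = df["price"] > df["prev_price"]
```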