| ## Import the regular expression module - used to strip every non-alphanumeric character | |
| import re | |
| ## Input | |
| giventext = "This is Medium article presented by ramstkp in the month of October. On the day of writing it was cold, and autumn started early this in the october month. October month is relatively less cold compared to winter months" | |
| ## Remove every character except letters, digits, spaces, and newlines | |
| giventext = re.sub('[^a-zA-Z0-9 \n]', '', giventext) | |
| ## Convert to lowercase and split the text into a list of words (split on spaces) |
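The preview stops at the lowercase-and-split step. A minimal sketch of how the count might continue, assuming a plain dictionary does the tallying. The shortened input string and the names `cleaned`, `words`, and `word_counts` are illustrative and not taken from the original gist.

```python
import re

# Shortened sample input (an assumption, not the gist's full text)
giventext = "October was cold. The october month is cold, and October is early."

# Keep only letters, digits, spaces, and newlines
cleaned = re.sub('[^a-zA-Z0-9 \n]', '', giventext)

# Lowercase, then split on whitespace to get a list of words
words = cleaned.lower().split()

# Tally each word in a plain dictionary (assumed approach)
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts["october"])
```

Lowercasing before the split is what lets "October" and "october" land in the same bucket.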
| ## Import the regular expression module - used to strip every non-alphanumeric character | |
| import re | |
| ## Input | |
| giventext = "This is Medium article presented by ramstkp in the month of October. On the day of writing it was cold, and autumn started early this in the october month. October month is relatively less cold compared to winter months" | |
| ## Remove every character except letters, digits, spaces, and newlines | |
| giventext = re.sub('[^a-zA-Z0-9 \n]', '', giventext) | |
| ## Convert to lowercase and split the text into a list of words (split on spaces) |
| ## Import the regular expression module - used to strip every non-alphanumeric character | |
| import re | |
| ## Import Counter from collections | |
| from collections import Counter | |
| ## Input | |
| giventext = "This is Medium article presented by ramstkp in the month of October. On the day of writing it was cold, and autumn started early this in the october month. October month is relatively less cold compared to winter months" | |
| ## Remove every character except letters, digits, spaces, and newlines | |
| giventext = re.sub('[^a-zA-Z0-9 \n]', '', giventext) |
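The preview ends right after the cleanup step. Since `Counter` is imported, the tally presumably goes through it. The following is only a sketch under that assumption, with a shortened input string and illustrative names (`cleaned`, `word_counts`, `top_two`).

```python
import re
from collections import Counter

# Shortened sample input (an assumption, not the gist's full text)
giventext = "October was cold. The october month is cold, and October is early."

# Keep only letters, digits, spaces, and newlines
cleaned = re.sub('[^a-zA-Z0-9 \n]', '', giventext)

# Counter tallies the lowercase words in a single call
word_counts = Counter(cleaned.lower().split())

# most_common(n) returns the n highest counts, largest first
top_two = word_counts.most_common(2)
print(top_two)
```

Compared with a hand-written dictionary loop, `Counter` also returns 0 for words that never appeared instead of raising `KeyError`.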
| ## Import pandas to build a DataFrame | |
| import pandas as pd | |
| ## Import the regular expression module - used to strip every non-alphanumeric character | |
| import re | |
| ## Import datetime from the datetime module | |
| from datetime import datetime | |
| ## Input |
| #Don't change configs | |
| configs = { | |
| "fs.azure.account.auth.type": "CustomAccessToken", | |
| "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName") | |
| } | |
| """ | |
| You need the following details from ADLS: | |
| 1. Your container name (optionally, the corresponding directory name) | |
| 2. Your storage account name |
| ## Import lit from pyspark.sql.functions - used with withColumn to add a constant-valued column | |
| from pyspark.sql.functions import lit | |
| ## Provide the mount path and directory where the files exist | |
| mount_path = '/mnt/<Your mount name>/<directory>' | |
| ## Loop through the files | |
| for file in dbutils.fs.ls(mount_path): | |
|     ## Defining an explicit schema would be better here | |
|     if 'flights1.csv' in file.name: |
| ## Provide the mount path and directory where the files exist | |
| mount_path = '/mnt/<mount name>/<directory>' | |
| spark.sql(f"create table flights_data_2 using csv location '{mount_path}/*.csv' options(header 'true', inferSchema 'true', sep ',')") | |
| ## Run a GROUP BY query on the registered table | |
| resultdf = spark.sql("select input_file_name() as filename, count(*) from flights_data_2 group by filename") | |
| resultdf.display() |
| import pandas as pd | |
| import random | |
| ## Provide the file name with its absolute path, for example: "C:\Users\xxxxx\flights.csv" | |
| split_source_file = input("File Name with absolute Path? : ") | |
| ## Find the number of lines using pandas | |
| pd_dataframe = pd.read_csv(split_source_file, header=0) | |
| ## Add 1 so the count includes the header row | |
| number_of_rows = len(pd_dataframe.index) + 1 |
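The preview ends after the row count, so the actual splitting logic is not shown. If the goal is to split the file into parts of random size, as the `random` import and the `split_source_file` name suggest, one possible way to choose the part sizes is sketched below. The helper `random_chunk_sizes` and its parameters are hypothetical and are not taken from the gist.

```python
import random

def random_chunk_sizes(total_rows, parts):
    # Hypothetical helper: pick parts-1 distinct cut points inside the row
    # range, then use the gaps between neighbouring cut points as sizes.
    cuts = sorted(random.sample(range(1, total_rows), parts - 1))
    edges = [0] + cuts + [total_rows]
    return [edges[i + 1] - edges[i] for i in range(parts)]

# Example: divide 100 data rows into 4 non-empty parts
sizes = random_chunk_sizes(100, 4)
print(sizes, sum(sizes))
```

Because the cut points are distinct and lie strictly inside the range, every part has at least one row and the sizes always add up to `total_rows`.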
| import pandas as pd | |
| import timeit | |
| def count_lines_enumrate_list(file_name): | |
|     ## Returns the zero-based index of the last line, i.e. total lines minus one | |
|     with open(file_name, 'r') as fp: | |
|         line_count = list(enumerate(fp))[-1][0] | |
|     return line_count |
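Building a full list just to read its last index keeps every line in memory at once. As a point of comparison, here is a sketch of a streaming alternative timed with `timeit`. The function `count_lines_generator` and the temporary sample file are assumptions made for illustration, not part of the gist.

```python
import os
import tempfile
import timeit

def count_lines_generator(file_name):
    # Stream the file and count lines without building a list
    with open(file_name, 'r') as fp:
        return sum(1 for _ in fp)

# Build a small throwaway CSV to measure against (header + 1000 rows)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write("id,value\n")
    for i in range(1000):
        tmp.write(f"{i},{i * 2}\n")
    sample_path = tmp.name

total_lines = count_lines_generator(sample_path)

# Time 10 repeated calls; the result is in seconds
elapsed = timeit.timeit(lambda: count_lines_generator(sample_path), number=10)
print(total_lines, f"{elapsed:.4f}s for 10 runs")

os.remove(sample_path)
```

Note that this version returns the total number of lines including the header, whereas the last index from `enumerate` is one less.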
| import random | |
| ## Define the list of supported operators | |
| operators = ['+','-','*','/'] | |
| ## Generate two random numbers between 1 and the given complexity limit | |
| def get_random_numbers(random_complexity): | |
|     num1 = random.randint(1, random_complexity) | |
|     num2 = random.randint(1, random_complexity) |
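The rest of this snippet is cut off. A minimal sketch of how the operator list and the random operands could be combined into a question, assuming `get_random_numbers` returns both operands. The `operations` mapping and the `make_question` helper are hypothetical names introduced here for illustration.

```python
import operator
import random

# Map each operator symbol to the function that applies it
operations = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

def get_random_numbers(random_complexity):
    # Both operands start at 1, so division by zero cannot occur
    num1 = random.randint(1, random_complexity)
    num2 = random.randint(1, random_complexity)
    return num1, num2

def make_question(random_complexity):
    # Hypothetical helper: build a question string and compute its answer
    num1, num2 = get_random_numbers(random_complexity)
    symbol = random.choice(list(operations))
    answer = operations[symbol](num1, num2)
    return f"{num1} {symbol} {num2}", answer

question, answer = make_question(10)
print(question, "=", answer)
```

Looking the function up in a dictionary avoids calling `eval` on a generated string.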