This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| import sys | |
| from timeit import timeit | |
| n = int(sys.argv[1]) | |
| test1 = f""" | |
| a_list = [] | |
| for i in range({n}): | |
|     a_list.append(i) | |
| """ | |
| print(timeit(test1)) |
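A minimal sketch of the same measurement with an explicit repetition count. `timeit` runs the statement one million times by default, which can take very long for a large `n`. The helper name `time_append` and the value `number=100` are assumptions chosen for illustration, not part of the snippet above.

```python
from timeit import timeit


def time_append(n, number=100):
    """Time building a list of n items with append, repeated `number` times."""
    stmt = f"""
a_list = []
for i in range({n}):
    a_list.append(i)
"""
    # timeit returns the total elapsed seconds for all `number` runs
    return timeit(stmt, number=number)


elapsed = time_append(1000, number=100)
print(f"{elapsed:.6f} s for 100 runs")
```

Dividing the result by `number` gives an approximate per-run time.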
| lines = sc.textFile('data.txt')  # read a text file | |
| lines_filtered = lines.filter(lambda line: 'word1' in line)  # keep only lines containing "word1" | |
| lines_filtered.first()  # took 1s to run | |
| lines_filtered.collect()  # took 100s to run |
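The gap between `first()` and `collect()` is most likely explained by lazy evaluation: the filter does no work until an action asks for results, and `first()` can stop as soon as one match is found. The sketch below is a plain-Python analogy using generators, not Spark code; the helper names `read_lines` and `make_pipeline` and the sample data are hypothetical.

```python
def read_lines(lines, counter):
    """Yield lines one at a time and count how many were actually read."""
    for line in lines:
        counter["read"] += 1
        yield line


def make_pipeline(lines, counter):
    """Build a lazy filter; nothing is read until the result is consumed."""
    return (line for line in read_lines(lines, counter) if "word1" in line)


# Hypothetical input: every even row contains "word1"
data = [f"row {i} word1" if i % 2 == 0 else f"row {i}" for i in range(1000)]

# Analogue of first(): stops after the first matching line
first_counter = {"read": 0}
first_match = next(make_pipeline(data, first_counter))

# Analogue of collect(): must scan the whole input
collect_counter = {"read": 0}
all_matches = list(make_pipeline(data, collect_counter))

print(first_counter["read"], collect_counter["read"])
```

Here the `first()` analogue reads a single line while the `collect()` analogue reads all of them, which mirrors the timing difference noted in the comments.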
| Repartition: | |
| 1. Creates a roughly even number of records in the resulting partitions, so resources are used evenly | |
| 2. Performs a full shuffle, so it is more expensive | |
| 3. Can increase or decrease the number of partitions | |
| Coalesce: | |
| 1. Can create an uneven number of records in the resulting partitions, so the load may be unbalanced | |
| 2. Avoids a full shuffle, so it is faster | |
| 3. Used to decrease the number of partitions | |
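The difference in record balance can be sketched with a toy model in plain Python. This is not Spark's actual implementation: it assumes coalesce merges whole neighbouring partitions without moving individual records, and that repartition redistributes every record round-robin. The function names and sample partitions are illustrative only.

```python
def coalesce(partitions, n):
    """Toy coalesce: merge whole neighbouring partitions, no record-level shuffle."""
    merged = [[] for _ in range(n)]
    for idx, part in enumerate(partitions):
        # Map each source partition to one target partition as a unit
        merged[idx * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Toy repartition: redistribute every record round-robin (a full shuffle)."""
    result = [[] for _ in range(n)]
    records = [record for part in partitions for record in part]
    for idx, record in enumerate(records):
        result[idx % n].append(record)
    return result


# Four skewed partitions holding 6, 1, 2 and 1 records
partitions = [[1, 2, 3, 4, 5, 6], [7], [8, 9], [10]]

coalesced_sizes = [len(p) for p in coalesce(partitions, 2)]
repartitioned_sizes = [len(p) for p in repartition(partitions, 2)]
print(coalesced_sizes, repartitioned_sizes)
```

In this model the coalesced partitions stay skewed (7 and 3 records) because records never leave their original group, while the repartitioned ones come out even (5 and 5) at the cost of touching every record.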
| #https://www.hackerrank.com/challenges/missing-numbers/problem?isFullScreen=false | |
| a = "11 4 11 7 13 4 12 11 10 14".split(" ") | |
| b = "11 4 11 7 3 7 10 13 4 8 12 11 10 14 12".split(" ") | |
| result = [] | |
| arr = list(map(int,a)) | |
| brr = list(map(int,b)) | |
| a_dict = {} | |
| b_dict = {} |
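A minimal sketch of one way to solve the linked challenge, assuming the goal is to list, in ascending order, every value that occurs more often in `b` than in `a`. It uses `collections.Counter` rather than the two hand-built dictionaries declared above, so it is an alternative approach and not necessarily the one the snippet goes on to use.

```python
from collections import Counter

a = "11 4 11 7 13 4 12 11 10 14".split(" ")
b = "11 4 11 7 3 7 10 13 4 8 12 11 10 14 12".split(" ")

arr = list(map(int, a))
brr = list(map(int, b))

# Counter subtraction keeps only values whose count in brr exceeds that in arr
missing_counts = Counter(brr) - Counter(arr)
result = sorted(missing_counts)

print(result)  # [3, 7, 8, 10, 12]
```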
| import pandas as pd | |
| import time | |
| import numpy as np | |
| #http://eforexcel.com/wp/wp-content/uploads/2020/09/5m-Sales-Records.zip | |
| df = pd.read_csv("5m Sales Records.csv") | |
| def filter1(df): | |
|     start_time = time.time() | |
|     for i in df.Country.unique(): |
| select date_parse('2021-12-31 00:00:00','%Y-%m-%d %H:%i:%s') |
| multi_date_format | |
| 07/01/2020 13:01 | |
| 03/01/2020 | |
| 02/01/2020 13:01 | |
| 01/01/2020 13:01 | |
| 05/01/2020 13:01 | |
| 04-Jan-20 | |
| 06/01/2020 13:01 |
| SELECT * | |
| FROM | |
| ( | |
| SELECT '2021-01-15 13:01:01' AS multi_date_format | |
| UNION ALL | |
| SELECT '2021/01/15 13:01:02' | |
| UNION ALL | |
| SELECT '2021/01/03' | |
| UNION ALL | |
| SELECT '04 JAN 2021' |
| SELECT | |
| Coalesce( | |
| try(date_parse(multi_date_format, '%Y-%m-%d %H:%i:%s')), | |
| try(date_parse(multi_date_format, '%Y/%m/%d %H:%i:%s')), | |
| try(date_parse(multi_date_format, '%Y/%m/%d')), | |
| try(date_parse(multi_date_format, '%d %M %Y')), | |
| try(date_parse(multi_date_format, '%d %M %Y %H:%i:%s')), | |
| try(date_parse(multi_date_format, '%d/%m/%Y %H:%i:%s')), | |
| try(date_parse(multi_date_format, '%d-%m-%Y %H:%i:%s')) | |
| ) as DateConvertedToTimestamp, |
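The same "try each format, keep the first that parses" idea can be sketched in Python for comparison. This is an analogy to the `Coalesce(try(date_parse(...)), ...)` pattern, not a translation of the query: the `FORMATS` list covers only some of the patterns above, the function name `parse_multi_format` is hypothetical, and Python's `strptime` directives differ from the SQL ones (for example `%M` means minutes here and `%b` is the abbreviated month name).

```python
from datetime import datetime

# Candidate formats, tried in order (an illustrative subset)
FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y/%m/%d %H:%M:%S",
    "%Y/%m/%d",
    "%d %b %Y",
]


def parse_multi_format(text, formats=FORMATS):
    """Return the first successful parse, or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # same role as try(...) returning NULL
    return None


for sample in ["2021-01-15 13:01:01", "2021/01/15 13:01:02", "2021/01/03", "04 JAN 2021"]:
    print(sample, "->", parse_multi_format(sample))
```

As in the query, the order of the formats matters when more than one could match the same string.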
| import json | |
| import javalang as jl | |
| tree = jl.parse.parse(content) | |
| def json_ast_encoder(o): | |
|     if type(o) is set and len(o) == 0: | |
|         return [] | |
|     if hasattr(o, "__dict__"): | |
|         return o.__dict__ | |
|     return "" | |
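A minimal sketch of how an encoder like this is typically passed to `json.dumps` through its `default` parameter. To stay self-contained it uses a hypothetical `Node` class in place of a real `javalang` tree, so the field names here are assumptions and do not reflect javalang's node layout.

```python
import json


class Node:
    """Hypothetical stand-in for an AST node (not a javalang class)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.modifiers = set()  # empty set, which json cannot serialize directly


def json_ast_encoder(o):
    # Called by json.dumps only for objects it cannot serialize itself
    if type(o) is set and len(o) == 0:
        return []
    if hasattr(o, "__dict__"):
        return o.__dict__
    return ""


tree = Node("Outer", children=[Node("inner")])
encoded = json.dumps(tree, default=json_ast_encoder, sort_keys=True)
decoded = json.loads(encoded)
print(encoded)
```

Note that with this encoder a non-empty set falls through to the final `return ""`, so its contents are dropped from the output.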