itsthanga thangarajan8


Learning How to Learn

Analytics, Data Engineer, Big Data, Python

thangarajan8 / list_append_fast.py

Created March 25, 2021 16:15

thangarajan8 / lazy_eval_2.py

Created August 11, 2021 08:39

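	from pyspark import SparkContext

	sc = SparkContext.getOrCreate()  # reuse the active SparkContext, or start one
	lines = sc.textFile('data.txt')  # transformation: lazy, nothing is read yet

	lines_filtered = lines.filter(lambda line: 'word1' in line)  # keep lines containing "word1" (still lazy)

	lines_filtered.first()    # action: can stop once one matching line is found; took 1s to run
	lines_filtered.collect()  # action: scans the whole file and returns every match to the driver; took 100s to run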
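The timing gap comes from lazy evaluation. `textFile` and `filter` only record a plan, and the work happens when an action runs. `first()` can stop early once it has one matching line, while `collect()` has to read all of `data.txt` and ship every match to the driver. The plain-Python sketch below imitates this with generators. It is an analogy, not Spark code, and `read_lines`, `make_pipeline`, and the read counter are hypothetical stand-ins for `sc.textFile` and `filter`.

```python
from typing import Dict, Iterator, List


def read_lines(n_lines: int, stats: Dict[str, int]) -> Iterator[str]:
    # Hypothetical stand-in for sc.textFile: yields lines lazily and counts each line read
    for i in range(n_lines):
        stats["read"] += 1
        yield f"line {i} word1" if i % 2 == 0 else f"line {i}"


def make_pipeline(n_lines: int, stats: Dict[str, int]) -> Iterator[str]:
    lines = read_lines(n_lines, stats)                  # like textFile: nothing is read yet
    return (line for line in lines if "word1" in line)  # like filter: still lazy


# Like first(): consumption stops at the first matching line
stats_first = {"read": 0}
first_match = next(make_pipeline(1000, stats_first))

# Like collect(): every line must be read to gather all matches
stats_collect = {"read": 0}
all_matches: List[str] = list(make_pipeline(1000, stats_collect))

print(first_match, stats_first["read"])         # line 0 word1 1
print(len(all_matches), stats_collect["read"])  # 500 1000
```

Only one line is read to answer the `first()`-style question, versus all 1000 for the `collect()`-style one. The ratio in Spark depends on where the first match sits in the file.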

thangarajan8 / Apache Spark Repartition vs coalesce.txt

Created August 24, 2021 12:51

Apache Spark Repartition vs coalesce

	Repartition:
	1. Creates roughly equal-sized partitions, so the work (and resources) is spread evenly across tasks
	2. Performs a full shuffle, so it is expensive (records are written out and moved across the network)
	3. Used to increase or decrease the number of partitions

	Coalesce:
	1. Can create partitions with uneven numbers of records, leaving the load unbalanced
	2. Avoids a full shuffle (it merges existing partitions), so it is faster
	3. Used to decrease the number of partitions
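A toy model of the difference, in plain Python with lists standing in for partitions, is sketched below. As a simplification, `repartition` here redistributes every record round-robin, and `coalesce` only glues whole input partitions together. Spark's real algorithms differ in detail, and the function names are hypothetical and not the PySpark API.

```python
from typing import List

Partitions = List[List[int]]


def repartition(parts: Partitions, n: int) -> Partitions:
    # Toy "full shuffle": every record is redistributed round-robin,
    # so the n output partitions differ in size by at most one record
    out: Partitions = [[] for _ in range(n)]
    records = [r for p in parts for r in p]
    for i, r in enumerate(records):
        out[i % n].append(r)
    return out


def coalesce(parts: Partitions, n: int) -> Partitions:
    # Toy "no shuffle": whole input partitions are merged, never split,
    # so an oversized input partition stays oversized
    if n >= len(parts):
        return [list(p) for p in parts]  # this toy coalesce never increases the count
    out: Partitions = [[] for _ in range(n)]
    for i, p in enumerate(parts):
        out[i * n // len(parts)].extend(p)
    return out


# One skewed partition of 10 records plus three tiny ones
parts = [list(range(10)), [10], [11], [12]]
print([len(p) for p in repartition(parts, 2)])  # [7, 6]
print([len(p) for p in coalesce(parts, 2)])     # [11, 2]
```

The skewed input shows the trade-off from the lists above. Repartition pays to move records and ends balanced. Coalesce moves nothing within a partition and inherits the skew.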

thangarajan8 / Missing Numbers.py

Created September 14, 2021 10:56

	#https://www.hackerrank.com/challenges/missing-numbers/problem?isFullScreen=false
	a = "11 4 11 7 13 4 12 11 10 14".split(" ")
	b = "11 4 11 7 3 7 10 13 4 8 12 11 10 14 12".split(" ")

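	result = []
	arr = list(map(int, a))  # values left in the incomplete list
	brr = list(map(int, b))  # values in the original, complete list
	a_dict = {}  # value -> frequency in arr
	b_dict = {}  # value -> frequency in brr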
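The preview stops before the counting logic. One way to finish it is sketched below with `collections.Counter` in place of the hand-built `a_dict`/`b_dict`. This is a substitution and not necessarily what the gist does. Subtract the frequency counts of `arr` from those of `brr`, and report each value that is left over once, in ascending order. `missing_numbers` is a hypothetical name.

```python
from collections import Counter
from typing import List


def missing_numbers(arr: List[int], brr: List[int]) -> List[int]:
    # Counter subtraction keeps only values whose count in brr exceeds their count in arr
    diff = Counter(brr) - Counter(arr)
    # Each missing value is reported once, in ascending order
    return sorted(diff)


a = "11 4 11 7 13 4 12 11 10 14".split(" ")
b = "11 4 11 7 3 7 10 13 4 8 12 11 10 14 12".split(" ")
print(missing_numbers(list(map(int, a)), list(map(int, b))))  # [3, 7, 8, 10, 12]
```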

thangarajan8 / pandas_filter.py

Created September 15, 2021 10:19

	import pandas as pd
	import time
	import numpy as np
	#http://eforexcel.com/wp/wp-content/uploads/2020/09/5m-Sales-Records.zip
	df = pd.read_csv("5m Sales Records.csv")

	def filter1(df):
	    start_time = time.time()

	    for i in df.Country.unique():
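The preview ends inside `filter1`, which times a loop over `df.Country.unique()`. The loop body is not shown, so this sketch only assumes it filters the frame once per country. It compares that approach with a single `groupby` on a small synthetic frame instead of the 5m-row CSV. `filter_mask`, `filter_groupby`, and the generated data are hypothetical.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in for "5m Sales Records.csv": only a Country column
rng = np.random.default_rng(0)
df = pd.DataFrame({"Country": rng.choice(["India", "Japan", "Chad", "Peru"], size=100_000)})


def filter_mask(df: pd.DataFrame) -> dict:
    # One boolean mask per country: the whole column is compared once per unique value
    return {c: len(df[df.Country == c]) for c in df.Country.unique()}


def filter_groupby(df: pd.DataFrame) -> dict:
    # groupby assigns every row to its country group in one grouping step
    return {c: len(g) for c, g in df.groupby("Country")}


for fn in (filter_mask, filter_groupby):
    start_time = time.time()
    counts = fn(df)
    print(fn.__name__, f"{time.time() - start_time:.4f}s", sum(counts.values()))
```

Both functions return the same row count per country. The mask version's cost grows with the number of unique countries, so the gap is worth measuring on the full file rather than assumed.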

thangarajan8 / date_parse_athena.sql

Created September 15, 2021 11:10

	select date_parse('2021-12-31 00:00:00', '%Y-%m-%d %H:%i:%s')
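Athena's `date_parse` uses MySQL-style format specifiers, which differ from Python's. `%i` is minutes, `%s` is seconds, and `%M` is the full month name rather than minutes. The sketch below translates the specifiers used in these gists into Python `strptime` directives so the same format strings can be checked locally. `to_strptime`, `date_parse`, and the mapping table are hypothetical helpers that cover only those specifiers, not the full Presto format language.

```python
from datetime import datetime

# Partial mapping from MySQL-style specifiers (Athena/Presto date_parse) to Python strptime
MYSQL_TO_PY = {
    "%Y": "%Y",  # four-digit year
    "%m": "%m",  # month number
    "%d": "%d",  # day of month
    "%H": "%H",  # hour (00..23)
    "%i": "%M",  # minutes
    "%s": "%S",  # seconds
    "%M": "%B",  # full month name
}


def to_strptime(fmt: str) -> str:
    # Single left-to-right pass, so a translated "%M" is never re-translated to "%B"
    out, i = [], 0
    while i < len(fmt):
        token = fmt[i:i + 2]
        if token in MYSQL_TO_PY:
            out.append(MYSQL_TO_PY[token])
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


def date_parse(value: str, fmt: str) -> datetime:
    return datetime.strptime(value, to_strptime(fmt))


print(to_strptime('%Y-%m-%d %H:%i:%s'))                      # %Y-%m-%d %H:%M:%S
print(date_parse('2021-12-31 00:00:00', '%Y-%m-%d %H:%i:%s'))  # 2021-12-31 00:00:00
```

A chain of `str.replace` calls would be simpler but wrong here. Replacing `%i` with `%M` first and then `%M` with `%B` would turn minutes into month names.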

thangarajan8 / multiple_date_format.csv

Created September 15, 2021 11:13


thangarajan8 / multiple_date_format.sql

Last active September 15, 2021 11:16

thangarajan8 / multiple_date_format_answer.sql

Created September 15, 2021 11:20

	-- try() turns a failed date_parse into NULL; COALESCE returns the first non-NULL result,
	-- so the formats are tried in order until one of them matches.
	SELECT
	    COALESCE(
	        try(date_parse(multi_date_format, '%Y-%m-%d %H:%i:%s')),
	        try(date_parse(multi_date_format, '%Y/%m/%d %H:%i:%s')),
	        try(date_parse(multi_date_format, '%Y/%m/%d')),
	        try(date_parse(multi_date_format, '%d %M %Y')),  -- %M = full month name
	        try(date_parse(multi_date_format, '%d %M %Y %H:%i:%s')),
	        try(date_parse(multi_date_format, '%d/%m/%Y %H:%i:%s')),
	        try(date_parse(multi_date_format, '%d-%m-%Y %H:%i:%s'))
	    ) AS DateConvertedToTimestamp,
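The first-match-wins pattern can be mirrored in Python to test sample values before running the query. In this sketch, each Athena format is hand-translated to a `strptime` directive (`%i` → `%M`, `%s` → `%S`, month-name `%M` → `%B`). A `ValueError` plays the role of `try()` returning NULL, and `None` stands in for the NULL produced when no format matches. `parse_multi_format` and `FORMATS` are hypothetical names.

```python
from datetime import datetime
from typing import Optional

# Same order as the COALESCE list, translated to Python strptime directives
FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y/%m/%d %H:%M:%S",
    "%Y/%m/%d",
    "%d %B %Y",           # %B (month name) is locale-dependent; English names assumed
    "%d %B %Y %H:%M:%S",
    "%d/%m/%Y %H:%M:%S",
    "%d-%m-%Y %H:%M:%S",
]


def parse_multi_format(value: str) -> Optional[datetime]:
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)  # first format that parses the whole string wins
        except ValueError:
            continue  # like try(...) yielding NULL: move on to the next format
    return None  # no format matched, like COALESCE over all NULLs


print(parse_multi_format("2021/12/31"))                 # 2021-12-31 00:00:00
print(parse_multi_format("31 December 2021 10:30:00"))  # 2021-12-31 10:30:00
print(parse_multi_format("not a date"))                 # None
```

`strptime` rejects leftover characters, so `"%d %B %Y"` fails on a value that also carries a time. Order still matters for ambiguous inputs, just as in the SQL.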

thangarajan8 / javalang_parser.py

Created October 25, 2021 05:51