AbrahamJP tecmaverick

@tecmaverick
tecmaverick / S3ListFileExtensions
Created September 30, 2024 01:07
AWS CLI - S3 List File Extensions
# Retrieves the list of unique file extensions, including compound ones such as gz.txt
# (cut -s skips file names that contain no dot, instead of printing the whole name)
aws s3 ls s3://the_bucket_name/prefix_name --recursive | rev | cut -d "/" -f 1 | rev | cut -s -d "." -f 2- | sort | uniq
# Counts the files per extension under a given S3 prefix (uniq -c only merges adjacent lines, so sort first)
aws s3 ls s3://the_bucket_name/prefix_name --recursive | rev | cut -d "/" -f 1 | rev | cut -s -d "." -f 2- | sort | uniq -c
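The extension counting can be tried without AWS access by feeding it text shaped like `aws s3 ls --recursive` output (date, time, size, key). The sketch below is a hypothetical `count_extensions` helper that does the same job in awk; as in the one-liners, everything after the first dot of the file name is treated as the extension, and names with no dot are skipped. The sample keys used to exercise it are made up.

```shell
#!/usr/bin/env bash
# count_extensions (hypothetical helper): reads `aws s3 ls --recursive`-style
# lines on stdin and prints "<count> <extension>" pairs, sorted by extension.
# "b.gz.txt" is counted as "gz.txt" (everything after the first dot).
count_extensions() {
  awk '{
    n = split($0, parts, "/")    # the file name is the last "/"-separated piece
    name = parts[n]
    dot = index(name, ".")
    if (dot > 0)                 # skip names without any extension
      print substr(name, dot + 1)
  }' | LC_ALL=C sort | uniq -c   # sort first: uniq only merges adjacent lines
}

# Usage against a real bucket (requires configured AWS CLI credentials):
#   aws s3 ls s3://the_bucket_name/prefix_name --recursive | count_extensions
```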
tecmaverick / NetworkUtils.sh
Last active February 19, 2023 03:11
Network Utilities
# List all network connections
lsof -i
#List all open IPv4 connections
lsof -i 4
#List all open IPv6 connections
lsof -i 6
# List processes running on a specific port
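Listing the processes on a given port is commonly done with lsof's `-i :PORT` selector. A minimal sketch, assuming lsof is installed; `port_owner` and `port_pids` are hypothetical helper names and 8080 is only an example port. The flags are standard lsof options: `-n` and `-P` skip host-name and service-name lookups, and `-t` prints bare PIDs.

```shell
#!/usr/bin/env bash
# port_owner (hypothetical helper): show processes holding a socket on a port.
#   -n        do not resolve host names (faster)
#   -P        show numeric ports instead of service names
#   -i :PORT  select internet sockets using that port
port_owner() {
  local port=$1
  lsof -nP -i ":${port}"
}

# port_pids (hypothetical helper): print only the PIDs (-t), one per line,
# which is convenient for piping into kill or renice.
port_pids() {
  lsof -t -i ":$1"
}

# Example:
#   port_owner 8080
```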
tecmaverick / ProcessUtils.sh
Last active February 19, 2023 03:04
Process Utilities
# Lower the process priority as far as it goes. Nice values run from -20 (highest priority) to 19 on Linux (larger values are clamped) or 20 on macOS/BSD (lowest priority); 0 is the default for all processes
renice 20 -p $(pgrep "ProcessName")
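# List all open files (disk and network) for a specific process
# Note: lsof truncates the COMMAND column to 9 characters by default, so for long names use lsof -c ProcessName (prefix match) or lsof -p PID instead
lsof | awk '$1 == "ProcessName"'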
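To confirm that a renice took effect, the nice value can be read back with `ps -o ni=`. The sketch below uses a throwaway `sleep` process in place of ProcessName, so it is safe to run; raising the nice value of your own process needs no special privileges, while lowering it below its current value normally requires root.

```shell
#!/usr/bin/env bash
# Start a throwaway background process to experiment on (a stand-in for ProcessName).
sleep 30 &
pid=$!

# Set its nice value to 19, the lowest priority Linux allows
# (macOS/BSD also accept 20). Raising the nice value of your own process needs no root.
renice 19 -p "$pid" >/dev/null

# Read the value back; the trailing "=" in "ni=" suppresses the column header.
current_nice=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "PID $pid now has nice value $current_nice"

# Clean up the throwaway process.
kill "$pid" 2>/dev/null
```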
tecmaverick / AddHeaderToCSV.sh
Last active February 13, 2023 03:46
Insert a header into a CSV file (AWK and Bash versions)
# abc.csv contents
# Alpha, USA, 12
# Beta, USA, 13
# abc.csv contents after adding header
# Name, Country, Age
# Alpha, USA, 12
# Beta, USA, 13
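Two minimal ways to produce that result, one with awk and one in plain Bash, are sketched below, assuming the header text is fixed and the input file is `abc.csv`; the helper names and output file names are hypothetical.

```shell
#!/usr/bin/env bash
header="Name, Country, Age"

# AWK version: print the header once in BEGIN (so it is written even for an
# empty file), then copy every input line unchanged.
add_header_awk() {
  awk -v h="$header" 'BEGIN { print h } { print }' "$1"
}

# Bash version: group echo and cat so both write to the same stdout.
add_header_bash() {
  { echo "$header"; cat "$1"; }
}

# Usage:
#   add_header_awk  abc.csv > abc_awk.csv
#   add_header_bash abc.csv > abc_bash.csv
```

Writing to a new file and renaming it afterwards avoids the classic mistake of redirecting output onto the input file, which truncates it before it is read.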
tecmaverick / DetectInCognito.js
Created January 23, 2023 23:28
Detect if browser window is in Incognito mode
const detectIncognito = function() {
  return new Promise(function(resolve, reject) {
    var browserName = "Unknown";
    function __callback(isPrivate) {
      resolve({
        isPrivate: isPrivate,
        browserName: browserName,
      });
    }
    function identifyChromium() {
tecmaverick / CSVHeaderInsert_FieldShuffle.sh
Created January 20, 2023 04:36
Insert a new header and value into a CSV file, and shuffle fields
# To execute the script:
# ./script.sh MyHeader MyCellValue input.csv output.csv
# Inserts a new second column: MyHeader on the header row, MyCellValue on every data row.
# Fields are split on plain commas, so quoted fields containing commas are not supported.
HEADER_NAME=$1
FIELD_VAL=$2
INPUT_FILENAME=$3
OUTPUT_FILENAME=$4
awk -v header="$HEADER_NAME" -v val="$FIELD_VAL" 'BEGIN { FS = OFS = "," }
{
  new_cell = (NR == 1) ? header : val
  $1 = $1 OFS new_cell   # rebuilds the record as: field1,new_cell,field2,...
  print
}' "$INPUT_FILENAME" > "$OUTPUT_FILENAME"
// Set a local checkpoint directory. This is unreliable in case of driver restarts, and on a cluster the checkpoint directory should be on shared storage such as HDFS
sc.setCheckpointDir("file:///tmp/sparkcheckpoints")
//View the checkpoint dir
sc.getCheckpointDir.get
val rdd = sc.parallelize(Seq.range(0,100))
val filteredRdd = rdd.filter(x=> x>50).map(x=> x * 2)
//Input Data
// studentid,coursename_with_attendance
// 01,CHEM:12|PHY:33|MATH:22
// 02,CHEM:34|PHY:3
// 03,MATH:12|COMP:45|CHEM:12
// 04,MATH:67|PHY:76
// 05,HIST:88|MARKT:33|BIOL:55
// 06,BIOL:88|PHY:77
// 07,BOTONY:34|ZOOL:77
// 08,BOTONY:34|COMP:99
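Records in this format usually need flattening into one (student, course, attendance) row per course before they can be aggregated; in Spark that step would typically be a flatMap. A minimal awk sketch of the flattening, assuming the first line is the header shown and that each entry is a `COURSE:COUNT` pair separated by `|`; `flatten_attendance` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# flatten_attendance (hypothetical helper): turns
#   01,CHEM:12|PHY:33
# into
#   01,CHEM,12
#   01,PHY,33
flatten_attendance() {
  awk -F',' 'NR > 1 {                # skip the header line
    n = split($2, pairs, "|")        # one entry per course
    for (i = 1; i <= n; i++) {
      split(pairs[i], kv, ":")       # kv[1] = course, kv[2] = attendance
      print $1 "," kv[1] "," kv[2]
    }
  }'
}
```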
tecmaverick / RDDSample001.scala
Created December 19, 2022 04:36
Get the number of students per course
// Input
// studentid,coursename
// 01,CHEM|PHY|MATH
// 02,CHEM|PHY
// 03,MATH|COMP|CHEM
// 04,MATH|PHY
// 05,HIST|MARKT|BIOL
// 06,BIOL|PHY
// 07,BOTONY|ZOOL
// 08,BOTONY|COMP
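For a sample this small, the expected answer can be cross-checked outside Spark. The awk sketch below is not the Scala RDD job; it is a hypothetical `students_per_course` helper that splits each course list on `|` and counts one student per course entry, assuming a course appears at most once per student.

```shell
#!/usr/bin/env bash
# students_per_course (hypothetical helper): reads "studentid,COURSE|COURSE|..."
# lines (the first line is a header) and prints "COURSE count", sorted by course.
students_per_course() {
  awk -F',' 'NR > 1 {
    n = split($2, courses, "|")
    for (i = 1; i <= n; i++) count[courses[i]]++
  }
  END { for (c in count) print c, count[c] }' | LC_ALL=C sort
}

# Usage:
#   students_per_course < students.csv
```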
tecmaverick / SparkRDDScratchPad.scala
Last active December 20, 2022 00:50
Spark RDD ScratchPad
// ============================================================
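// Generate test key-value pairs
// Note: spark.sql.shuffle.partitions only affects DataFrame/SQL shuffles; RDD operations
// such as reduceByKey use spark.default.parallelism or an explicit partition count
spark.conf.set("spark.sql.shuffle.partitions",2)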
val num = Seq((2000,10),(2001,20),(2000,20),(2002,30),(2003,30),(2004,50),(2004,100),(2004,250),(2005,250),(2005,25),
(2006,150),(2006,225),(2007,250),(2007,125),(2008,250),(2009,25),(2010,250),(2010,125))
val rdd = sc.parallelize(num)
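// Sum the values per key, shuffling straight into 2 partitions (no separate repartition step)
val prdd = rdd.reduceByKey(_ + _, 2)
// Sort by key into 2 partitions; calling repartition after sortByKey would shuffle again and lose the ordering
val srdd = rdd.sortByKey(ascending = true, numPartitions = 2)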
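On the sample pairs, `reduceByKey(_ + _)` yields one sum per year. A plain awk equivalent is sketched below to make the expected per-key totals easy to check; it is a stand-in for illustration, not Spark code, and `sum_by_key` and `sample_pairs` are hypothetical names.

```shell
#!/usr/bin/env bash
# sum_by_key (hypothetical helper): reads "key value" lines and prints
# "key total" sorted by key, i.e. what reduceByKey(_ + _) followed by
# sortByKey() would contain, without Spark's partitioning.
sum_by_key() {
  awk '{ total[$1] += $2 } END { for (k in total) print k, total[k] }' |
    LC_ALL=C sort -n
}

# The same (year, value) pairs as the Scala Seq, one "key value" per line.
sample_pairs() {
  printf '%s\n' \
    "2000 10" "2001 20" "2000 20" "2002 30" "2003 30" "2004 50" \
    "2004 100" "2004 250" "2005 250" "2005 25" "2006 150" "2006 225" \
    "2007 250" "2007 125" "2008 250" "2009 25" "2010 250" "2010 125"
}

sample_pairs | sum_by_key
```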