Akhil akhld

@akhld
akhld / zlib.scala
Created July 2, 2015 10:54
Zlib compression
import java.util.zip.{Inflater, Deflater} // Zlib library
import java.nio.file.{Files, Paths}
import java.io.{File, FileOutputStream}
object Inf {
  def compress(inData: Array[Byte]): Array[Byte] = {
    val deflater = new Deflater()
    deflater.setInput(inData)
    deflater.finish()
    // Compressed data can be larger than the original. For inputs of only a few
    // bytes even 2x may be too small, because zlib adds a header and a checksum.
    val compressedData = new Array[Byte](inData.size * 2)
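The preview stops before the actual deflate call. As a rough illustration of a complete zlib round trip with the same `java.util.zip` classes, here is a minimal Java sketch. The class and method names (`ZlibRoundTrip`, `compress`, `decompress`) are assumptions made for this example and are not taken from the gist. It drains the deflater in fixed-size chunks instead of guessing one output buffer size up front.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical class name; not part of the gist.
final class ZlibRoundTrip {

    // Compress with zlib, collecting output chunk by chunk so the result
    // can grow as needed regardless of the input size.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    // Inverse of compress. Malformed or truncated input is reported as
    // IllegalArgumentException (a choice made for this sketch).
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                out.write(chunk, 0, n);
                if (n == 0 && !inflater.finished()) {
                    // No progress and not done: input is truncated or needs a dictionary.
                    throw new IllegalArgumentException("incomplete zlib stream");
                }
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid zlib stream", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(original));
        System.out.println(Arrays.equals(original, restored)); // prints "true"
    }
}
```

Writing into a `ByteArrayOutputStream` avoids the sizing question entirely, at the cost of one extra copy when the final array is produced.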
@akhld
akhld / ReadFileWithColon.java
Created August 27, 2015 05:31
Reading files with colon in the name
// Register the custom file system for the "custom" URI scheme.
final Configuration hadoopConf = sparkContext.hadoopConfiguration();
hadoopConf.set("fs." + CustomS3FileSystem.SCHEMA + ".impl",
        CustomS3FileSystem.class.getName());

public class CustomS3FileSystem extends NativeS3FileSystem {
    public static final String SCHEMA = "custom";

    @Override
    public FileStatus[] globStatus(final Path pathPattern, final PathFilter filter)
            throws IOException {
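The preview does not say why a custom file system is needed. One plausible reason, stated here as an assumption, is that Hadoop's `Path` is built on `java.net.URI`, where a colon in a bare name can be read as a scheme separator. The small Java sketch below only demonstrates that `java.net.URI` behavior; the class name and file names are made up for illustration.

```java
import java.net.URI;

// Hypothetical demo class; the file names below are invented examples.
final class ColonInName {

    // Returns the scheme java.net.URI infers from a bare string, or null if none.
    static String inferredScheme(String name) {
        return URI.create(name).getScheme();
    }

    public static void main(String[] args) {
        // Text before the colon is taken as a URI scheme, not as part of the name.
        System.out.println(inferredScheme("events-10:30.log")); // prints "events-10"
        // Without a colon the string is treated as a plain relative path.
        System.out.println(inferredScheme("events-1030.log"));  // prints "null"
    }
}
```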
@akhld
akhld / wa.txt
Created January 14, 2016 08:10
WhatsApp crasher
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
@akhld
akhld / AppendPartionedBy.scala
Created October 28, 2016 04:09
Reading multiple parquets, partitioning by columns and appending to table
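import org.apache.hadoop.fs.Path // assumed: Hadoop's Path
import org.joda.time.DateTime    // assumed: Joda-Time, given toString(pattern)

val storage = "hdfs://nameservice1/user/plutus/data/kmeans_prediction_par_"
// (date, path) pairs for the last 30 days, keeping only paths that exist.
// HdfsTools is not defined in this preview (presumably a project helper).
val penInputs = (1 to 30).map { x =>
  val date = DateTime.now().minusDays(x).toString("yyyy-MM-dd")
  (date, storage + date)
}.filter { case (_, path) =>
  HdfsTools.checkIfFolderExists(new Path(path))
}
penInputs.foreach(println)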
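The date-stamped path selection can also be sketched in plain Java with `java.time`, which makes the logic testable without a cluster. Everything here is an assumption for illustration: the class `DatedInputs`, the `Input` holder, the `/data/pred_` prefix, and the `exists` predicate that stands in for a real HDFS folder check.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical class; the existence check is injected so no HDFS is required.
final class DatedInputs {

    // One (date, path) pair.
    static final class Input {
        final String date;
        final String path;
        Input(String date, String path) { this.date = date; this.path = path; }
    }

    // Builds prefix + yyyy-MM-dd for each of the previous `days` days
    // (yesterday first) and keeps only the paths the predicate accepts.
    static List<Input> recentInputs(String prefix, LocalDate today, int days,
                                    Predicate<String> exists) {
        List<Input> result = new ArrayList<>();
        for (int back = 1; back <= days; back++) {
            String date = today.minusDays(back).toString(); // ISO-8601: yyyy-MM-dd
            String path = prefix + date;
            if (exists.test(path)) {
                result.add(new Input(date, path));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2016, 10, 28);
        // Stand-in check: pretend only the October folders exist.
        List<Input> inputs =
                recentInputs("/data/pred_", today, 30, p -> p.contains("2016-10-"));
        inputs.forEach(i -> System.out.println(i.date + " -> " + i.path));
    }
}
```

Passing `today` in explicitly, rather than calling `LocalDate.now()` inside the method, keeps the result reproducible.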
@akhld
akhld / scrapper.sh
Created December 1, 2018 12:37
Email scraper
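# Read one URL per line from the file "urls" (bash is required for [[ ]]).
while read -r url; do
  # Fetch the page quietly and keep every e-mail-like match, one per line.
  curl -s "$url" | grep -oi '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' > emails
  count=$(wc -l < emails)
  email_found=$([[ $count -ge 1 ]] && echo "yes" || echo "no")
  # First three matches, comma-separated.
  emails=$(head -n3 emails | paste -sd, -)
  # Host part of scheme://host/path.
  domain=$(echo "$url" | awk -F'[/:]' '{print $4}')
  # "yes" only when there are matches beyond the three shown.
  more_emails=$([[ $count -gt 3 ]] && echo "yes" || echo "no")
  echo "$domain, $email_found, $emails, $more_emails, $url"
done < urls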
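The extraction step can be tried without any network access by running the same kind of pattern over a string. The Java sketch below is an illustration under assumptions: the class `EmailScan` and its methods are invented names, the page body is passed in as text instead of being fetched, and "more" is reported only when a fourth address exists.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; operates on text that has already been downloaded.
final class EmailScan {

    // Same shape as the grep pattern in the script, in Java regex syntax.
    static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);

    // Returns every e-mail-like match in order of appearance.
    static List<String> extractEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Summary line: domain, found?, first three addresses, more?, url
    static String summarize(String url, String body) {
        List<String> emails = extractEmails(body);
        String domain = URI.create(url).getHost();
        String firstThree = String.join(",", emails.subList(0, Math.min(3, emails.size())));
        return domain + ", " + (emails.isEmpty() ? "no" : "yes") + ", " + firstThree
                + ", " + (emails.size() > 3 ? "yes" : "no") + ", " + url;
    }

    public static void main(String[] args) {
        String body = "Contact bob@example.com or ALICE@Mail.Example.org.";
        System.out.println(summarize("http://example.com/contact", body));
    }
}
```

Like the grep pattern, this caps the top-level domain at four letters, so an address ending in a longer domain is either missed or cut short.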