Akhil akhld

@akhld
akhld / scrapper.sh
Created December 1, 2018 12:37
Email scraper
cat urls | while read -r url;
do
# Pull each page and keep anything that looks like an email address
curl -s -o- "$url" | grep -oh -i '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' > emails;
email_found=$([[ $(wc -l < emails) -ge 1 ]] && echo "yes" || echo "no");
# Report at most the first three addresses, comma-separated
first_emails=$(head -n3 emails | perl -00 -lpe 's/\n/,/g');
domain=$(echo "$url" | awk -F[/:] '{print $4}');
# "yes" only when the page yielded more than the three addresses reported
more_emails=$([[ $(wc -l < emails) -gt 3 ]] && echo "yes" || echo "no");
echo "$domain, $email_found, $first_emails, $more_emails, $url";
done
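The same extraction can be sketched self-contained in Java, using the regex from the grep above with `java.util.regex` (the class name and sample input are illustrative, not from the gist):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtract {
    public static void main(String[] args) {
        // Hypothetical page content standing in for the curl output
        String html = "contact: alice@example.com, press: bob@test.org";
        // Same pattern as the grep, with CASE_INSENSITIVE replacing grep -i
        Pattern p = Pattern.compile("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        List<String> emails = new ArrayList<>();
        while (m.find()) emails.add(m.group());
        System.out.println(emails); // [alice@example.com, bob@test.org]
    }
}
```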
@akhld
akhld / AppendPartionedBy.scala
Created October 28, 2016 04:09
Reading multiple parquets, partitioning by columns, and appending to a table
// Build (date, path) pairs for the last 30 days and keep only those that exist on HDFS
val storage = "hdfs://nameservice1/user/plutus/data/kmeans_prediction_par_"
val penInputs = (1 to 30).map(x => {
  val date = DateTime.now().minusDays(x).toString("yyyy-MM-dd")
  (date, storage + date)
}).filter { case (_, path) =>
  HdfsTools.checkIfFolderExists(new Path(path))
}
penInputs.foreach(println)
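The dated-path generation above can be sketched with only the JDK, assuming `java.time` in place of Joda's `DateTime`; the HDFS existence filter is omitted here since it needs a cluster:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DatedPaths {
    public static void main(String[] args) {
        // Path prefix taken from the gist; same "yyyy-MM-dd" format as DateTime.toString
        String storage = "hdfs://nameservice1/user/plutus/data/kmeans_prediction_par_";
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        List<String> paths = IntStream.rangeClosed(1, 30)
                .mapToObj(x -> storage + LocalDate.now().minusDays(x).format(fmt))
                .collect(Collectors.toList());
        paths.forEach(System.out::println); // 30 candidate partition paths
    }
}
```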
@akhld
akhld / wa.txt
Created January 14, 2016 08:10
WhatsApp crasher
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘😘
@akhld
akhld / ReadFileWithColon.java
Created August 27, 2015 05:31
Reading files with a colon in the name
// Register the custom scheme so Hadoop resolves "custom://" paths through this class
final Configuration hadoopConf = sparkContext.hadoopConfiguration();
hadoopConf.set("fs." + CustomS3FileSystem.SCHEMA + ".impl",
    CustomS3FileSystem.class.getName());

public class CustomS3FileSystem extends NativeS3FileSystem {
  public static final String SCHEMA = "custom";

  // Overridden so that keys containing ':' are not mangled by glob expansion
  @Override
  public FileStatus[] globStatus(final Path pathPattern, final PathFilter filter)
      throws IOException {
@akhld
akhld / zlib.scala
Created July 2, 2015 10:54
Zlib compression
import java.util.zip.{Inflater, Deflater} // JDK bindings for zlib
import java.nio.file.{Files, Paths}
import java.io.{File, FileOutputStream}

object Inf {
  def compress(inData: Array[Byte]): Array[Byte] = {
    val deflater: Deflater = new Deflater()
    deflater.setInput(inData)
    deflater.finish()
    // Compressed data can be larger than the original, so over-allocate
    val compressedData = new Array[Byte](inData.size * 2)
    val count = deflater.deflate(compressedData)
    compressedData.take(count)
  }
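Assuming the intent is a compress/decompress round trip, the same `java.util.zip` classes can be sketched self-contained in Java (class name and sample input are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibRoundTrip {
    public static void main(String[] args) throws Exception {
        byte[] input = "the quick brown fox jumps over the lazy dog"
                .getBytes(StandardCharsets.UTF_8);

        // Compress: over-allocate because short inputs can grow under zlib framing
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length * 2];
        int compressedLen = deflater.deflate(buf);
        deflater.end();

        // Decompress back into a buffer of the original size
        Inflater inflater = new Inflater();
        inflater.setInput(buf, 0, compressedLen);
        byte[] out = new byte[input.length];
        int restoredLen = inflater.inflate(out);
        inflater.end();

        System.out.println(new String(out, 0, restoredLen, StandardCharsets.UTF_8));
    }
}
```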
@akhld
akhld / StreamingHBase.scala
Created June 4, 2015 10:35
Spark Streaming with HBase
val push_hbase = aggregatedStream.transform(rdd => {
val hbaseTableName = "global_aggregate"
val hbaseColumnName = "aggregate"
// Create the HBase configuration
val hconf = HBaseConfiguration.create()
hconf.set("hbase.zookeeper.quorum", "sigmoid-machine1,sigmoid-machine2,sigmoid-machine3,sigmoid-machine4")
hconf.set("hbase.zookeeper.property.clientPort", "2181")
@akhld
akhld / TestMain.java
Created June 4, 2015 07:31
Spark Streaming Listener Example
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.scheduler.*;
@akhld
akhld / checkpointed-data-not-found.log
Created April 17, 2015 09:37
checkpointed-data-not-found
INFO : WriteAheadLogManager for ReceivedBlockHandlerMaster - Attempting to clear 0 old log files in hdfs://spark-akhil-master:9000/checkpointed/receivedBlockMetadata older than 1429262834000:
INFO : WriteAheadLogManager for ReceivedBlockHandlerMaster - Cleared log files in hdfs://spark-akhil-master:9000/checkpointed/receivedBlockMetadata older than 1429262834000
[Stage 10:> (0 + 2) / 2]
INFO : WriteAheadLogManager for ReceivedBlockHandlerMaster - Attempting to clear 0 old log files in hdfs://spark-akhil-master:9000/checkpointed/receivedBlockMetadata older than 1429262974000:
INFO : WriteAheadLogManager for ReceivedBlockHandlerMaster - Cleared log files in hdfs://spark-akhil-master:9000/checkpointed/receivedBlockMetadata older than 1429262974000
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 981.0 failed 4 times, most recent failure: Lost task 0.3 in stage 981.0 (TID 1330, spark-akhil-slave1.c.
@akhld
akhld / JacksonParser
Created February 14, 2015 14:46
Jackson Parser
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
/**
* Created by akhld on 14/2/15.
*/
object Parser {
  def main(args: Array[String]): Unit = {
    // Mix in ScalaObjectMapper so readValue can take a Scala type parameter
    val mapper = new ObjectMapper() with ScalaObjectMapper
    mapper.registerModule(DefaultScalaModule)
@akhld
akhld / SocketBenchmark
Created December 12, 2014 13:12
Java ServerSocket that listens on a port and sends the contents of a given file
package com.sigmoidanlytics;
import java.io.*;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
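The listen-and-send idea can be sketched self-contained with the blocking `ServerSocket` API; the class name, temp file, and payload are illustrative, and the real gist presumably streams a large benchmark file instead:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileServerSketch {
    public static void main(String[] args) throws Exception {
        // Stand-in for the file the benchmark would serve
        Path payload = Files.createTempFile("payload", ".txt");
        Files.write(payload, "hello from the socket server".getBytes(StandardCharsets.UTF_8));

        try (ServerSocket server = new ServerSocket(0)) { // port 0 = pick a free port
            int port = server.getLocalPort();

            // Server side: accept one connection and stream the file to it
            Thread serverThread = new Thread(() -> {
                try (Socket client = server.accept();
                     OutputStream out = client.getOutputStream()) {
                    Files.copy(payload, out);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            // Client side: connect and read back what the server sent
            try (Socket s = new Socket("localhost", port);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
                System.out.println(in.readLine());
            }
            serverThread.join();
        }
    }
}
```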