Umberto Griffo umbertogriffo

🫵

How dare you nitpicking my PR

I'm a Software Engineer in continuous learning about Machine Learning, Data Engineering, and Software Design.


@promaton
Lisbon, Portugal


umbertogriffo / TwitterSentimentAnalysisAndN-gramWithHadoopAndHiveSQL.md

Last active May 11, 2021 13:22

Step by step Tutorial on Twitter Sentiment Analysis and n-gram with Hadoop and Hive SQL

PREREQUISITES

* Download the JSON SerDe from:
* http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar
* and rename it to hive-serdes-1.0.jar

Add the JAR to the HIVE_AUX_JARS_PATH of HiveServer2:
1. Copy the JAR files to the host on which HiveServer2 is running. Save the JARs to any directory you choose, and make a note of the path (create directory in /usr/share/).

umbertogriffo / HBaseBackup.rb

Last active March 24, 2023 15:01

This code takes a snapshot of all HBase tables using the snapshot command (no file copies are performed). Tested on CDH-5.4.4-1

	# Check that the hbase.snapshot.enabled property in hbase-site.xml is set to true
	# To run the script, launch this command from the shell: hbase shell HBaseBackup.rb

	@clusterToSave = "hdfs:///srv2:8082/hbase"
	# CHECK THE PATH OF HBase lib
	@libjars = `ls /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hbase/*.jar | tr "\n" ","`
	@ignore = [ /zipkin\../i, /._temp/i, /.tmp/i, /test_./i, /._test/i, /._old/i ]
	@mappers = "2"

	include Java

umbertogriffo / HBaseRestore.rb

Created February 19, 2016 15:04

This code restores the snapshots of all HBase tables saved with the script HBaseBackup.rb (https://gist.github.com/umbertogriffo/fe1bce24f8e9ee68c75f). Tested on CDH-5.4.4-1

	# To run the script, launch this command from the shell: hbase shell HBaseRestore.rb

	include Java

	java_import org.apache.hadoop.hbase.HBaseConfiguration
	java_import org.apache.hadoop.hbase.client.HBaseAdmin
	java_import org.apache.hadoop.hbase.snapshot.ExportSnapshot
	java_import org.apache.hadoop.hbase.TableExistsException
	java_import org.apache.hadoop.util.ToolRunner

umbertogriffo / Kmeans Readme.md

Last active March 8, 2024 13:40

Step-by-step code tutorial on implementing a basic k-means in Spark in order to cluster geo-located devices

DATASET

Download dataset here

CODE

* Follow the well-commented code kmeans.scala

umbertogriffo / ObjectPool.java

Created June 28, 2016 08:01

Generic Java object pool with minimalistic code

	import java.util.Queue;
	import java.util.concurrent.ConcurrentLinkedQueue;
	import java.util.concurrent.Executors;
	import java.util.concurrent.ScheduledExecutorService;
	import java.util.concurrent.TimeUnit;
	import java.util.concurrent.atomic.AtomicInteger;
	/**
	* @param <T>
	*/
	public abstract class ObjectPool<T> {
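
The preview above stops at the class declaration, so the following is only a minimal sketch of the general idea, not the gist's actual implementation. The names `SimplePool`, `borrow`, `release`, `idleCount`, and `StringBuilderPool` are hypothetical, and the scheduled validation hinted at by the `ScheduledExecutorService` import is deliberately left out.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical minimal pool: idle objects wait in a lock-free queue.
abstract class SimplePool<T> {
    private final Queue<T> idle = new ConcurrentLinkedQueue<>();

    /** Subclasses decide how a new pooled object is built. */
    protected abstract T create();

    /** Hands out an idle object if one exists, otherwise creates a new one. */
    public T borrow() {
        T obj = idle.poll();
        return obj != null ? obj : create();
    }

    /** Puts an object back so that a later borrow() can reuse it. */
    public void release(T obj) {
        if (obj != null) {
            idle.offer(obj);
        }
    }

    /** Number of objects currently waiting in the pool. */
    public int idleCount() {
        return idle.size();
    }
}

// Hypothetical usage: a pool of StringBuilder instances.
class StringBuilderPool extends SimplePool<StringBuilder> {
    @Override
    protected StringBuilder create() {
        return new StringBuilder();
    }
}
```

This sketch never caps the pool size and never validates returned objects; a production pool would usually need both.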

umbertogriffo / Transpose.scala

Created October 26, 2016 08:05

Utility methods to transpose an org.apache.spark.mllib.linalg.distributed.RowMatrix

	def transposeRowMatrix(m: RowMatrix): RowMatrix = {
	  val transposedRowsRDD = m.rows.zipWithIndex.map{case (row, rowIndex) => rowToTransposedTriplet(row, rowIndex)}
	    .flatMap(x => x) // now we have triplets (newRowIndex, (newColIndex, value))
	    .groupByKey
	    .sortByKey().map(_._2) // sort rows and remove row indexes
	    .map(buildRow) // restore order of elements in each row and remove column indexes
	  new RowMatrix(transposedRowsRDD)
	}

	def rowToTransposedTriplet(row: Vector, rowIndex: Long): Array[(Long, (Long, Double))] = {

umbertogriffo / UniqueId.java

Last active March 6, 2023 08:16

Generate Long ID from UUID

	/**
	* Generate a unique ID from a UUID in the positive space
	* Reference: http://www.gregbugaj.com/?p=587
	* @return long value representing UUID
	*/
	private Long generateUniqueId()
	{
	long val = -1;
	do
	{
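
The preview is cut off inside the `do` loop, so the body is not visible here. As a rough sketch of one way such a loop could work (an assumption, not necessarily what the gist or the referenced post does), the version below keeps the most significant 64 bits of a random UUID and retries until the value is non-negative. The class name `UniqueIdSketch` is hypothetical.

```java
import java.util.UUID;

// Hypothetical sketch: derive a non-negative long from a random UUID.
final class UniqueIdSketch {

    static long generateUniqueId() {
        long val;
        do {
            // Keep only the upper 64 of the UUID's 128 bits.
            val = UUID.randomUUID().getMostSignificantBits();
        } while (val < 0); // retry until the value lands in the positive space
        return val;
    }

    public static void main(String[] args) {
        System.out.println(generateUniqueId());
    }
}
```

Because half of the UUID's bits are discarded, the result is only very likely to be unique, not guaranteed to be.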

umbertogriffo / Method1.java

Last active January 22, 2017 14:21

How to make the method run() of class NoThreadSafe thread-safe in Java

	public class Method1 {
	/*
	Adding synchronized to this method makes it thread-safe.
	When synchronized is added to a static method, the Class object is the object which is locked.
	*/
	public static void main(String[] args) throws InterruptedException {

	ProcessingThreadS pt = new ProcessingThreadS();

	Thread t1 = new Thread(pt, "t1");
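
The class `ProcessingThreadS` is not shown in the preview, so here is a small self-contained sketch of the same idea under stated assumptions: a shared `Runnable` whose `run()` is declared `synchronized`. The name `SynchronizedCounter` and the loop count are hypothetical. Note that `run()` is an instance method here, so the lock is the instance itself (`this`), not the `Class` object.

```java
// Hypothetical sketch: two threads share one Runnable with a synchronized run().
class SynchronizedCounter implements Runnable {
    private int count;

    // Only one thread at a time can execute this method on a given instance.
    @Override
    public synchronized void run() {
        for (int i = 0; i < 1000; i++) {
            count++;
        }
    }

    public synchronized int getCount() {
        return count;
    }

    /** Runs the same task on two threads and returns the final count. */
    static int runWithTwoThreads() {
        SynchronizedCounter task = new SynchronizedCounter();
        Thread t1 = new Thread(task, "t1");
        Thread t2 = new Thread(task, "t2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return task.getCount();
    }

    public static void main(String[] args) {
        System.out.println(runWithTwoThreads());
    }
}
```

Without `synchronized`, the two threads could interleave their `count++` updates and lose increments; with it, every increment from both threads is counted.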

umbertogriffo / DataFrameSuite.scala

Last active February 12, 2020 06:13

DataFrameSuite allows you to check whether two DataFrames are equal. You can assert DataFrame equality using the method assertDataFrameEquals. When the DataFrames contain doubles or Spark MLlib Vectors, you can assert that they are approximately equal using the method assertDataFrameApproximateEquals

	package test.com.idlike.junit.df

	import breeze.numerics.abs
	import org.apache.spark.rdd.RDD
	import org.apache.spark.sql.functions.col
	import org.apache.spark.sql.{Column, DataFrame, Row}

	/**
	* Created by Umberto on 06/02/2017.
	*/

umbertogriffo / Winner.java

Created February 15, 2017 09:02

Java 8 Streams Cookbook

	package knowledgebase.java.stream;

	import java.time.Duration;
	import java.util.*;

	import static java.util.stream.Collectors.*;

	/**
	* Created by Umberto on 15/02/2017.
	* https://dzone.com/articles/a-java-8-streams-cookbook
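
The body of `Winner.java` is not visible in this preview. To give a flavour of the kind of recipe a Streams cookbook covers, here is a short sketch that uses only standard `java.util.stream` calls. The class `StreamSketch`, its `Entry` type, and the sample values are placeholders invented for illustration and are not taken from the gist or the linked article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

// Hypothetical sketch of two common stream recipes: grouping and filtering.
final class StreamSketch {

    // Placeholder data type, invented for this example.
    static final class Entry {
        final String name;
        final String team;
        final int km;

        Entry(String name, String team, int km) {
            this.name = name;
            this.team = team;
            this.km = km;
        }
    }

    /** Counts how many entries belong to each team. */
    static Map<String, Long> countByTeam(List<Entry> entries) {
        return entries.stream().collect(groupingBy(e -> e.team, counting()));
    }

    /** Returns the sorted names of entries with more than minKm kilometres. */
    static List<String> namesOver(List<Entry> entries, int minKm) {
        return entries.stream()
                .filter(e -> e.km > minKm)
                .map(e -> e.name)
                .sorted()
                .collect(toList());
    }

    public static void main(String[] args) {
        List<Entry> data = Arrays.asList(
                new Entry("a", "x", 10),
                new Entry("b", "x", 30),
                new Entry("c", "y", 20));
        System.out.println(countByTeam(data));
        System.out.println(namesOver(data, 15));
    }
}
```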
