Elie A. (eliasah)
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.StatCounter;
import scala.Tuple2;
import scala.Tuple3;
import java.util.Arrays;
import java.util.List;
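Only the imports survive in this preview; they point at per-key statistics with Spark's StatCounter over a pair RDD. A minimal Scala sketch of that pattern, with made-up data and names rather than the gist's actual code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.StatCounter

object PerKeyStatsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("per-key-stats").setMaster("local[*]"))
    // Hypothetical (key, value) pairs; the original gist's input is not shown.
    val pairs = sc.parallelize(Seq(("a", 1.0), ("a", 3.0), ("b", 2.0)))
    // Fold each key's values into a StatCounter: count, mean, and stdev in one pass.
    val stats = pairs.aggregateByKey(new StatCounter())(
      (acc, v) => acc.merge(v),
      (s1, s2) => s1.merge(s2))
    stats.collect().foreach { case (k, s) => println(s"$k: mean=${s.mean}, stdev=${s.stdev}") }
    sc.stop()
  }
}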
eliasah / Update remote repo
Created May 27, 2016 20:29 — forked from mandiwise/Update remote repo
Transfer a repo from Bitbucket to GitHub
// Reference: http://www.blackdogfoundry.com/blog/moving-repository-from-bitbucket-to-github/
// See also: http://www.paulund.co.uk/change-url-of-git-repository
$ cd $HOME/Code/repo-directory
$ git remote rename origin bitbucket
$ git remote add origin https://github.com/mandiwise/awesome-new-repo.git
$ git push origin master
$ git remote rm bitbucket
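Renaming origin to bitbucket first keeps the old Bitbucket remote reachable as a fallback while the new GitHub remote takes the origin name; the final command removes it once the push has succeeded.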
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.document.Document;
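These imports point at tokenizing English text with Lucene's analyzers. A minimal Scala sketch of that token-stream loop; the field name and sample text are assumptions:

import org.apache.lucene.analysis.en.EnglishAnalyzer
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute

object LuceneTokenizeSketch {
  def main(args: Array[String]): Unit = {
    val analyzer = new EnglishAnalyzer() // stems terms and drops English stop words
    val stream = analyzer.tokenStream("body", "The quick brown foxes jumped over the lazy dogs")
    val term = stream.addAttribute(classOf[CharTermAttribute])
    stream.reset()
    while (stream.incrementToken()) println(term.toString) // quick, brown, fox, jump, ...
    stream.end()
    stream.close()
    analyzer.close()
  }
}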
package com.combient.sparkjob.tedsds
/**
* Created by olu on 09/03/16.
*/
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
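The HiveContext, Window, and functions._ imports point at window aggregations (Spark 1.x needed HiveContext for window functions). A minimal, self-contained Scala sketch of that pattern; the (id, cycle, s1) schema is a made-up stand-in for the gist's real columns:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object WindowSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("window-sketch").setMaster("local[*]"))
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._
    // Hypothetical readings per unit id; the gist's real schema is not shown.
    val df = Seq((1, 1, 10.0), (1, 2, 12.0), (2, 1, 9.5)).toDF("id", "cycle", "s1")
    // One window per id, ordered by cycle: compare each reading with the previous one.
    val w = Window.partitionBy("id").orderBy("cycle")
    df.withColumn("delta_s1", col("s1") - lag(col("s1"), 1).over(w)).show()
    sc.stop()
  }
}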
/* Licensed to the Apache Software Foundation (ASF) under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0 */
#!/usr/bin/env python
# encoding: utf-8
# This file lives in tests/project_test.py in the usual distutils structure
# Remember to set the SPARK_HOME environment variable to the path of your Spark installation
import logging
import sys
import unittest
eliasah / gist:adeacd2537640d733fb1
Created October 17, 2015 15:41 — forked from rezazadeh/gist:5a3bb88d9fdf423dd861
CosineSimilarity DIMSUM Example
/* Licensed to the Apache Software Foundation (ASF) under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0 */
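Only the license header made it into this preview, but the gist's title names the technique: DIMSUM. A minimal sketch of MLlib's entry point for it, RowMatrix.columnSimilarities, with made-up data and an assumed threshold:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object DimsumSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dimsum").setMaster("local[*]"))
    // Hypothetical rows; each column of the matrix is one item to compare.
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 0.0, 2.0),
      Vectors.dense(0.0, 3.0, 4.0),
      Vectors.dense(5.0, 6.0, 0.0)))
    val mat = new RowMatrix(rows)
    // DIMSUM sampling: approximate all-pairs cosine similarity between columns,
    // skipping pairs whose similarity is likely below the threshold.
    val approx = mat.columnSimilarities(0.1)
    approx.entries.collect().foreach(println)
    sc.stop()
  }
}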
/* Licensed to Elastic Search and Shay Banon under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0 */
eliasah / Setup.md
Last active August 29, 2015 14:20 — forked from xrstf/setup.md
Nutch 2.3 crawler + HBase 0.94 + Elasticsearch 1.4.2

Info

This guide sets up a non-clustered Nutch crawler that stores its data in HBase. We will not cover setting up Hadoop et al., just the bare minimum needed to crawl and index websites on a single machine.

Terms

  • Nutch - the crawler (fetches and parses websites)
  • HBase - storage backend for Nutch (basically a Hadoop-ecosystem database)
/*
This example uses Scala. Please see the MLlib documentation for a Java example.
Try running this code in the Spark shell. It may produce different topics each time (since LDA includes some randomization), but it should give topics similar to those listed above.
This example is paired with a blog post on LDA in Spark: http://databricks.com/blog
Spark: http://spark.apache.org/
*/
import scala.collection.mutable
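The preview stops after the first import. A rough, self-contained sketch of running MLlib's LDA as the comment describes; the tiny term-count corpus is an assumption standing in for the blog post's real data:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

object LdaSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lda").setMaster("local[*]"))
    // Hypothetical documents as term-count vectors over a 4-word vocabulary.
    val corpus = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 0.0, 5.0),
      Vectors.dense(0.0, 1.0, 4.0, 0.0),
      Vectors.dense(3.0, 0.0, 1.0, 2.0)))
      .zipWithIndex.map { case (v, id) => (id, v) }
    // Fit 2 topics; results vary run to run since LDA is randomized.
    val model = new LDA().setK(2).run(corpus)
    // topicsMatrix is vocabSize x k: each column holds one topic's term weights.
    println(model.topicsMatrix)
    sc.stop()
  }
}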