Bruce Campbell (wavescholar)

  • These are projects that I have developed out of academic interest or as part of various consulting roles.
  • Columbus, Ohio
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
# local mode runs everything inside the driver JVM, so executor sizing and YARN-only options (--num-executors, --queue) are dropped
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master "local[2]" --driver-memory 4g lib/spark-examples*.jar 10
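The trailing `10` in both commands is, as we read the SparkPi example source, the number of slices (partitions); each slice contributes about 100,000 random samples to a Monte Carlo estimate of pi. A plain-Java sketch of the same estimate, where a parallel stream stands in for Spark's partitions (our substitution, not Spark code):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class MonteCarloPi {
    // Throw points into the square [-1,1] x [-1,1] and count how many land
    // inside the unit circle; that fraction approximates pi/4.
    static double estimatePi(int slices) {
        // Assumption: 100,000 samples per slice, mirroring SparkPi's sizing.
        int n = (int) Math.min(100_000L * slices, Integer.MAX_VALUE);
        long inside = IntStream.range(1, n)
                .parallel() // stands in for Spark partitions; not split into exactly `slices` pieces
                .filter(i -> {
                    ThreadLocalRandom r = ThreadLocalRandom.current();
                    double x = r.nextDouble() * 2 - 1;
                    double y = r.nextDouble() * 2 - 1;
                    return x * x + y * y <= 1;
                })
                .count();
        return 4.0 * inside / (n - 1); // n - 1 points were sampled
    }

    public static void main(String[] args) {
        int slices = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        System.out.println("Pi is roughly " + estimatePi(slices));
    }
}
```

Raising the slice count raises the sample count, which tightens the estimate; on a cluster it also spreads the sampling across more tasks.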
val lines = sc.textFile("README.md")
val rdd2 = sc.textFile("hdfs:///some/path.txt")
Spark UI (served by the driver, on port 4040 by default):
http://10.22.7.183:4040/jobs/
wavescholar / gist:39e0fa6da94d953a9cda
Last active August 29, 2015 14:16
BytesWritable snippet
Writable key = (Writable) ReflectionUtils.newInstance(
        sequenceFileReader.getKeyClass(), conf);
Writable value = (Writable) ReflectionUtils.newInstance(
        sequenceFileReader.getValueClass(), conf);
boolean next = true;
while (next) {
    try {
        if (sequenceFileReader.next(key, value)) {
            next = true;
        } else {
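The loop above keeps calling `next(key, value)` until it returns false. As we understand it, a `BytesWritable` serializes its payload as a 4-byte length followed by the raw bytes. The sketch below imitates that length-prefixed layout with plain `java.io` streams (no Hadoop classes; `LengthPrefixedRecords` is a hypothetical name) and reads records back with the same "read until exhausted" loop shape:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LengthPrefixedRecords {
    // Write each record as a 4-byte big-endian length followed by the raw bytes.
    static byte[] writeAll(List<byte[]> records) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (byte[] r : records) {
                out.writeInt(r.length);
                out.write(r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Read records until the input is exhausted; the loop condition plays the
    // role of sequenceFileReader.next(key, value) returning false at the end.
    static List<byte[]> readAll(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            // available() is exact here only because the source is an in-memory array.
            while (in.available() > 0) {
                byte[] r = new byte[in.readInt()];
                in.readFully(r);
                records.add(r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        List<byte[]> recs = Arrays.asList(
                "key-1".getBytes(StandardCharsets.UTF_8),
                new byte[] {0x0A, 0x0B});
        byte[] encoded = writeAll(recs);
        System.out.println(encoded.length + " bytes, " + readAll(encoded).size() + " records");
    }
}
```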
wavescholar / gist:bd185d798e94a025446a
Last active August 29, 2015 14:15
HDFS Compression Code Snippet
// Other values for the type are RECORD and BLOCK. I'm not sure that BLOCK will work with any codec other than bzip2.
CompressionType compressionType = CompressionType.NONE;
// A local enum inside a method needs Java 16+; on Java 7/8 declare this at class level.
enum compressionCodecEnum { gzip, bzip2, none }
compressionCodecEnum compressionCodecType = compressionCodecEnum.bzip2;
if (compressionCodecType == compressionCodecEnum.bzip2)
{
    // The codec is passed to compression(); SequenceFile.Writer.valueClass() only sets the record value type.
    CompressionCodec Codec = new BZip2Codec();
    org.apache.hadoop.io.SequenceFile.Writer.Option optCom = SequenceFile.Writer.compression(CompressionType.BLOCK, Codec);
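The comment above distinguishes RECORD from BLOCK compression. As a rough, Hadoop-free illustration (our simplification: `java.util.zip`'s DEFLATE, the algorithm behind Hadoop's DefaultCodec, stands in for bzip2, and real SequenceFile blocks also keep keys and values in separate buffers), compressing many small, similar values one at a time versus all together shows why BLOCK usually produces smaller files:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class RecordVsBlock {
    // Compress the input with DEFLATE and return only the compressed size.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[input.length + 64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(out); // buffer is reused; only the byte count matters
        }
        deflater.end();
        return total;
    }

    // RECORD-style: every value is compressed on its own.
    static int recordCompressed(List<byte[]> values) {
        int sum = 0;
        for (byte[] v : values) sum += deflatedSize(v);
        return sum;
    }

    // BLOCK-style: many values are buffered and compressed together.
    static int blockCompressed(List<byte[]> values) {
        int len = 0;
        for (byte[] v : values) len += v.length;
        byte[] all = new byte[len];
        int pos = 0;
        for (byte[] v : values) {
            System.arraycopy(v, 0, all, pos, v.length);
            pos += v.length;
        }
        return deflatedSize(all);
    }

    public static void main(String[] args) {
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            values.add(("user=" + (i % 50) + ",status=OK,region=us-east")
                    .getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("record: " + recordCompressed(values)
                + " bytes, block: " + blockCompressed(values) + " bytes");
    }
}
```

Small values give the compressor little repetition to exploit and each one pays its own header cost, which is the usual argument for BLOCK.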
Maven Eclipse Setup
Run this in the directory where the project should be created:
mvn archetype:generate \
-DarchetypeGroupId=org.apache.maven.archetypes \
-DarchetypeArtifactId=maven-archetype-quickstart \
-DgroupId=com.bcampbell.hadoopproject \
-DartifactId=wordcount
Many artifacts will be downloaded from Cloudera.
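The quickstart archetype only generates a Hello World `App` class; the `wordcount` artifact name points at the classic word count job. A Hadoop-free sketch of that map/reduce logic (hypothetical `LocalWordCount` class, simple `\W+` tokenizer, not the Hadoop API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {
    // "Map": split each line into lower-cased words.
    // "Reduce": sum a count of 1 for every occurrence of each word.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // keys come out sorted, like single-reducer output
        for (String line : lines) {
            for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!word.isEmpty()) { // split() yields "" when a line starts with a non-word character
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Of Man's first disobedience, and the fruit",
                "Of that forbidden tree");
        System.out.println(wordCount(lines));
    }
}
```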
wavescholar / gist:1d7527840f386f567a40
Last active August 29, 2015 14:06
Maven POM file for generating an Eclipse workspace (CDH5, Fedora 20)
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.bcampbell.hadoopproject</groupId>
  <artifactId>wordcount</artifactId>
  <version>0.0.1</version>
  <packaging>jar</packaging>
  <name>wordcount</name>
wavescholar / gist:088fac6a275a3e44fb80
Created September 7, 2014 20:18
Running the Hadoop grep example
hadoop fs -put /etc/hadoop/conf/*.xml input
[bcampbell@localhost ~]$ hadoop fs -ls input
Found 7 items
-rw-r--r-- 1 bcampbell supergroup 507105 2014-09-07 15:55 input/Milton_ParadiseLost.txt
-rw-r--r-- 1 bcampbell supergroup 246679 2014-09-07 15:55 input/WilliamYeats.txt
-rw-r--r-- 1 bcampbell supergroup 2133 2014-09-07 15:58 input/core-site.xml
-rw-r--r-- 1 bcampbell supergroup 2324 2014-09-07 15:58 input/hdfs-site.xml
-rw-r--r-- 1 bcampbell supergroup 246679 2014-09-07 15:56 input/inputWC
-rw-r--r-- 1 bcampbell supergroup 1549 2014-09-07 15:58 input/mapred-site.xml
-rw-r--r-- 1 bcampbell supergroup 2375 2014-09-07 15:58 input/yarn-site.xml
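The listing does not show the job command itself. Hadoop's bundled grep example counts the matches of a regular expression across the input (the Hadoop setup docs use `'dfs[a-z.]+'` against the copied config files) and then sorts the matches by count in a second job. A local Java sketch of that two-step logic, not the Hadoop code; the sample lines are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocalGrep {
    // Step 1 (counting job): count every regex match across all lines.
    // Step 2 (sorting job): order matches by descending count, ties by text.
    static List<Map.Entry<String, Long>> grep(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1L, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<name>dfs.replication</name>",
                "<name>dfs.namenode.name.dir</name> <name>dfs.replication</name>");
        for (Map.Entry<String, Long> e : grep(lines, "dfs[a-z.]+")) {
            System.out.println(e.getValue() + "\t" + e.getKey()); // count, tab, match
        }
    }
}
```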
wavescholar / gist:5e1b9f98baae2c95278c
Last active August 29, 2015 14:06
CDH5 on Fedora 20
First, do the Fedora setup:
https://gist.github.com/wavescholar/6cc708de5f9bea623c86
Get the RPM (CDH5 quick start, YARN in pseudo-distributed mode):
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Quick-Start/cdh5qs_yarn_pseudo.html
sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
wavescholar / gist:6cc708de5f9bea623c86
Created September 6, 2014 23:53
Fedora setup on a clean install
yum update
yum groupinstall "Books and Guides" "C Development Tools and Libraries" "Development Tools" "Fedora Eclipse" "System Tools" "Editors"
rpm -ivh jdk-7u67-linux-x64.rpm
# as root (su):
alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_67/jre/bin/java 2
alternatives --install /usr/bin/javaws javaws /usr/java/jdk1.7.0_67/jre/bin/javaws 2
alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_67/bin/javac 2
alternatives --install /usr/bin/jar jar /usr/java/jdk1.7.0_67/bin/jar 2
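The last argument to `alternatives --install` is a priority, and in auto mode each link group points at its highest-priority candidate. With priority 2, the Oracle entries above will probably not win automatically if the distribution's OpenJDK registered a larger priority (an assumption worth checking with `alternatives --display java`), which is why a manual `alternatives --config` step follows. A toy model of the auto-mode rule (hypothetical class, paths and priorities):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class AlternativesAuto {
    // Hypothetical model of one link group: candidate path -> priority.
    // Auto mode picks the candidate with the highest priority.
    static Optional<String> autoSelect(Map<String, Integer> candidates) {
        return candidates.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, Integer> group = new LinkedHashMap<>();
        // Made-up path and priority for a distribution OpenJDK entry.
        group.put("/usr/lib/jvm/openjdk/jre/bin/java", 1000);
        // The Oracle JDK entry registered above with priority 2.
        group.put("/usr/java/jdk1.7.0_67/jre/bin/java", 2);
        System.out.println("auto mode picks: " + autoSelect(group).orElse("(none)"));
    }
}
```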
wavescholar / gist:5f1cb490f9439ede7bc1
Created September 6, 2014 22:21
Java alternatives: setting up Oracle Java on top of OpenJDK
--------------------------
Swap between OpenJDK and Sun/Oracle Java JDK/JRE
alternatives --config java
alternatives --config javaws
alternatives --config libjavaplugin.so
alternatives --config libjavaplugin.so.x86_64
alternatives --config javac
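After switching with `alternatives --config java`, one way to confirm which JVM actually runs is to ask the JVM itself. A minimal sketch (hypothetical `WhichJava` class) using standard system properties:

```java
public class WhichJava {
    // These properties describe the JVM that is actually running, i.e. whatever
    // the `java` on your PATH currently resolves to through alternatives.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + "\njava.vendor=" + System.getProperty("java.vendor")
                + "\njava.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Compile and run it with `javac WhichJava.java && java WhichJava`. With the Oracle JDK selected, `java.home` should point under `/usr/java/jdk1.7.0_67`. Since `java` and `javac` are separate alternatives groups, check both selections.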
Post-Installation Setup
wavescholar / gist:3352c66c08dda3f1722c
Created September 6, 2014 22:21
Linux Directories
http://www.tecmint.com/linux-directory-structure-and-important-files-paths-explained/
/bin : Essential executable programs needed for booting, repair, and single-user mode, plus basic commands such as cat, du, df, tar, rpm, and wc.
/boot : Files needed during the boot process, including the Linux kernel.
/dev : Device files for the hardware devices on the machine, e.g. cdrom, cpu.
/etc : System and application configuration files, plus startup and shutdown scripts for individual programs.
/home : Users' home directories. Each new user gets a directory named after the user under /home, which holds subdirectories such as Desktop, Downloads, and Documents.
/lib : Kernel modules and shared library images needed to boot the system and run the commands in the root file system.
/lost+found : This directory is created during installation of Linux, useful fo