Borhan Kazimipour borhan-kazimipour

Local Instructions

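1. Copy the files to a directory: git clone https://gist.github.com/cc7c8cec1188fd387cc2e3ec0f4fed7a.git wordcount, then cd wordcount.
2. See the input files: cat *.txt
3. Make sure the mapper and reducer are executable: chmod +x *.scala
4. See how the mapper works: cat baa.txt | ./mapper.scala
5. See how the reducer works: cat baa.txt | ./mapper.scala | ./reducer.scala (on Hadoop, the mapper output is sorted by key before it reaches the reducer; if the reducer depends on that ordering, mimic it locally with cat baa.txt | ./mapper.scala | sort | ./reducer.scala)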
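The Scala sources are not shown here, so the following is only a minimal Python sketch of what a streaming word-count mapper and reducer typically do. It assumes (not confirmed by the gist) that the mapper emits one tab-separated `word<TAB>1` line per token and that the reducer sums counts over consecutive lines with the same word, which is why it needs key-sorted input. The function names are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(pairs: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive identical words.

    Assumes the pairs arrive sorted by word (Hadoop's shuffle/sort
    guarantees this; locally you have to sort them yourself).
    """
    parsed = (pair.rstrip("\n").split("\t", 1) for pair in pairs)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"
```

For example, `list(reducer(sorted(mapper(["baa baa black sheep", "black sheep"]))))` gives one total per word. Without the `sorted(...)` step, a word that reappears later in the stream would be reported in several separate groups.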

Hadoop Instructions

1. Copy the files to a directory: git clone https://gist.github.com/cc7c8cec1188fd387cc2e3ec0f4fed7a.git wordcount, then cd wordcount.
2. Create a directory on HDFS: hadoop fs -mkdir -p /wc/in
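The instructions stop after creating the input directory. A typical continuation is to upload the text files and launch a Hadoop Streaming job. The Python sketch below only builds those shell commands as strings and runs nothing. The streaming jar path, the output directory /wc/out and the helper name are assumptions: the jar's file name usually carries the Hadoop version, Hadoop refuses to write into an output directory that already exists, and the .scala scripts can only run if Scala is installed on every node.

```python
from typing import List


def wordcount_commands(
    jar: str = "$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar",  # assumed location
    input_dir: str = "/wc/in",
    output_dir: str = "/wc/out",  # hypothetical; must not exist before the job runs
    mapper: str = "mapper.scala",
    reducer: str = "reducer.scala",
) -> List[str]:
    """Return (but do not execute) the shell commands for a streaming word count."""
    steps = [
        # Upload the local input files into the HDFS input directory
        ["hadoop", "fs", "-put", "*.txt", input_dir],
        # -files is a generic option, so it must come before the streaming options;
        # it ships both scripts to the working directory of every task
        ["hadoop", "jar", jar,
         "-files", f"{mapper},{reducer}",
         "-mapper", mapper, "-reducer", reducer,
         "-input", input_dir, "-output", output_dir],
        # Print the reducer output files
        ["hadoop", "fs", "-cat", f"{output_dir}/part-*"],
    ]
    # Plain joining (no quoting) keeps the globs and $HADOOP_HOME expandable by the shell
    return [" ".join(step) for step in steps]
```

Printing each element of `wordcount_commands()` gives lines you could paste into the BigVM terminal after adjusting the jar path.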

Setting up Your BigVM Instance

You need to do the following only once, when you connect to your BigVM instance for the first time.

Log in to MoVE using your Authcate username and password.
Go to the Desktops tab and select the Linux template. This takes you to an Ubuntu desktop, which we call the MoVE Desktop from here on.

This desktop is the jump host from which you access the BigVM server: you cannot connect to BigVM directly, only through MoVE.

File Transfer

CSV files (example: Melbourne water use by postcode, https://www.data.vic.gov.au/data/dataset/38f43f13-d988-419a-8e99-39ff6c41e5f2/resource/00378584-3157-42f7-9370-a8f0e71ecfc1/download/.fileswateruse.csv):
- wget csv_link -O file_name
Zip files (example: Taxation Statistics 1994-95 to 2008-09, https://data.gov.au/dataset/67265383-0ecc-4523-8ffd-02790297a65a/resource/2f236165-69a4-4d6a-9da6-5b39bfa70737/download/taxstats-1994-95-to-2008-09.zip):
1. wget zip_link -O file_name
2. unzip file_name -d folder_name
Gist snippets (https://gist.github.com/borhan-kazimipour):
- A single file: wget raw_link -O file_name
- All files: git clone https_link folder_name
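If wget or unzip is not available, the same two steps can be done from Python's standard library. This is a minimal sketch: `download` and `unzip` are hypothetical helper names that mirror `wget link -O file_name` and `unzip file_name -d folder_name`.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List


def download(url: str, file_name: str) -> Path:
    """Rough equivalent of `wget url -O file_name`."""
    path, _headers = urllib.request.urlretrieve(url, file_name)
    return Path(path)


def unzip(file_name: str, folder_name: str) -> List[str]:
    """Rough equivalent of `unzip file_name -d folder_name`; returns the member names."""
    with zipfile.ZipFile(file_name) as archive:
        archive.extractall(folder_name)  # creates folder_name if it does not exist
        return archive.namelist()
```

For the taxation statistics archive, you would call `download(zip_link, "taxstats.zip")` followed by `unzip("taxstats.zip", "taxstats")`. Both file names here are only examples.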

Jupyter with Scala Support on BigVM

Installation

On the BigVM terminal:

After a successful SSH connection to your BigVM instance from a MoVE terminal, do the following only once (or just download and run setup-jupyter.sh):

1- Go to the Python 3 environment:

cd ~/env3
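Assuming ~/env3 is a virtual environment created with Python's standard venv module (the snippet does not show how it is activated; for a venv this would normally be `source ~/env3/bin/activate`), you can confirm from Python that the environment is active. This sketch relies on the fact that inside a venv `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (its prefix differs from the base install)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# Report the interpreter version, whether a venv is active, and where it lives
print("Python", sys.version.split()[0], "| venv active:", in_virtualenv(), "|", sys.prefix)
```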

	# ref: https://pydash.readthedocs.io/en/latest/
	import pydash

	# Arrays
	pydash.flatten([1, 2, [3, [4, 5, [6, 7]]]])
	# [1, 2, 3, [4, 5, [6, 7]]]

	pydash.flatten_deep([1, 2, [3, [4, 5, [6, 7]]]])
	# [1, 2, 3, 4, 5, 6, 7]
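To make the difference between the two pydash calls concrete, here is a plain-Python sketch with the same results on the example above. These are hypothetical re-implementations, not pydash's code: only `list` is treated as nested here, and pydash's rules for other iterables may differ.

```python
from typing import Any, List


def flatten(items: List[Any]) -> List[Any]:
    """Remove exactly one level of nesting, like pydash.flatten."""
    result: List[Any] = []
    for item in items:
        if isinstance(item, list):
            result.extend(item)  # splice the sub-list in, but do not recurse
        else:
            result.append(item)
    return result


def flatten_deep(items: List[Any]) -> List[Any]:
    """Recursively remove all nesting, like pydash.flatten_deep."""
    result: List[Any] = []
    for item in items:
        if isinstance(item, list):
            result.extend(flatten_deep(item))  # recurse into every sub-list
        else:
            result.append(item)
    return result


print(flatten([1, 2, [3, [4, 5, [6, 7]]]]))       # [1, 2, 3, [4, 5, [6, 7]]]
print(flatten_deep([1, 2, [3, [4, 5, [6, 7]]]]))  # [1, 2, 3, 4, 5, 6, 7]
```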

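	import json

	json_file = "file.json"
	# Load the whole JSON document (expected to be a top-level array)
	with open(json_file, 'r') as read_file:
		data = json.load(read_file)
	# Write one JSON object per line to file.jsonl (JSON Lines format)
	with open(json_file + 'l', 'w') as write_file:
		for entry in data:
			json.dump(entry, write_file)
			write_file.write('\n')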
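To check the conversion, you can read the .jsonl file back one line at a time. This minimal sketch assumes file.json held a top-level JSON array, which the loop above already requires. `read_jsonl` is a hypothetical helper name.

```python
import json
from typing import Any, List


def read_jsonl(path: str) -> List[Any]:
    """Load a JSON Lines file: one JSON document per non-empty line."""
    with open(path, "r") as read_file:
        return [json.loads(line) for line in read_file if line.strip()]
```

If the round trip worked, `read_jsonl("file.jsonl")` should equal the list originally loaded from file.json.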

	The Project Gutenberg EBook of The Adventures of Sherlock Holmes
	by Sir Arthur Conan Doyle
	(#15 in our series by Sir Arthur Conan Doyle)

	Copyright laws are changing all over the world. Be sure to check the
	copyright laws for your country before downloading or redistributing
	this or any other Project Gutenberg eBook.

	This header should be the first thing seen when viewing this Project
	Gutenberg file. Please do not remove it. Do not change or edit the

	#!/bin/bash
	# First IP address of this BigVM instance and the current user name
	IP="$(hostname -I | cut -d' ' -f1)"
	THIS_USER="$(whoami)"
	# Start the Zeppelin server (web UI on port 8080)
	zeppelin-daemon.sh start
	echo
	echo
	echo Please do the following:
	echo
	echo 1- On MoVE Terminal issue:
	echo " ssh -N -f -L localhost:8080:localhost:8080 ${THIS_USER}@${IP}"
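The two pieces of the script that do real work are picking the first address from `hostname -I` and building the SSH tunnel. Here they are unpacked in a Python sketch that only rebuilds the strings and runs nothing. The helper names are hypothetical, and the user name and IP in the test values are made up.

```python
from typing import List


def first_ip(hostname_i_output: str) -> str:
    """Mimic `hostname -I | cut -d' ' -f1`: take the first space-separated address."""
    return hostname_i_output.split()[0]


def tunnel_command(user: str, ip: str, port: int = 8080) -> List[str]:
    """Build the ssh port-forwarding command the script prints.

    -N  do not run a remote command (forwarding only)
    -f  go to the background once connected
    -L  forward localhost:port on the MoVE Desktop to localhost:port on BigVM
    """
    return ["ssh", "-N", "-f", "-L", f"localhost:{port}:localhost:{port}", f"{user}@{ip}"]
```

Once such a tunnel is up, a browser on the MoVE Desktop pointed at http://localhost:8080 reaches whatever is listening on port 8080 on BigVM, which in this script is Zeppelin.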
