-- Dataset available from http://multimedia-commons.s3-website-us-west-2.amazonaws.com/?prefix=tools/etc/ in Sqlite3 database format ('yfcc100m_dataset.sql' file)
SET NAMES utf8;
SET time_zone = '+00:00';
SET foreign_key_checks = 0;
SET sql_mode = 'NO_AUTO_VALUE_ON_ZERO';

DROP TABLE IF EXISTS `yfcc100m_dataset`;
CREATE TABLE `yfcc100m_dataset` (
  `photoid` int NOT NULL,
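Once the dump has been imported into a local MySQL server, the table can be pulled into Spark for analysis. The following is only a minimal PySpark sketch over JDBC: the database name, credentials, and the availability of the MySQL Connector/J jar on Spark's classpath are all assumptions, not part of the original setup.

# Minimal sketch: read the imported yfcc100m_dataset table into a Spark DataFrame over JDBC.
# The connection URL, database name, and credentials below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("yfcc100m-jdbc").getOrCreate()
yfcc_df = (spark.read
           .format("jdbc")
           .option("url", "jdbc:mysql://localhost:3306/yfcc100m")
           .option("dbtable", "yfcc100m_dataset")
           .option("user", "spark_user")
           .option("password", "spark_password")
           .load())
yfcc_df.printSchema()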
# Loading PySpark modules
from pyspark.sql import DataFrame
from pyspark.sql.types import *

# Only needed when running outside a pyspark shell/notebook that already provides `spark`:
# from pyspark.context import SparkContext
# from pyspark.sql.session import SparkSession
# sc = SparkContext('local')
# spark = SparkSession(sc)
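If these imports are used outside an environment that already provides a session (e.g. a plain python3 script rather than the pyspark shell), the session has to be created explicitly. A minimal sketch using the standard builder API; the application name is arbitrary, and `local[*]` can be swapped for the standalone master URL from the cluster setup further down.

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession; replace local[*] with spark://<hostname>:7077
# once the standalone master and worker are running.
spark = (SparkSession.builder
         .appName("yfcc100m-stackexchange")
         .master("local[*]")
         .getOrCreate())
sc = spark.sparkContext  # the underlying SparkContext, if the RDD API is needed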
-- 'xxx.stackexchange.com' is the name of the database
-- `xxx.stackexchange.com`.badges definition
CREATE TABLE `badges` (
  `Id` int(11) NOT NULL,
  `UserId` int(11) NOT NULL,
  `Name` varchar(30) NOT NULL,
  `Date` datetime NOT NULL,
  `Class` int(11) NOT NULL,
// Loading the posts
// First pass: just count the rows to make sure the file is readable.
// The export is assumed to be tab-separated with no header row, hence the positional
// indexing below (if your file does have a header line, skip it with `WITH row SKIP 1`).
LOAD CSV FROM 'file:///posts_all_csv.csv' AS row FIELDTERMINATOR '\t'
RETURN count(row);

// Second pass: create one :Post node per row.
LOAD CSV FROM 'file:///posts_all_csv.csv' AS row FIELDTERMINATOR '\t'
WITH toInteger(row[0]) AS postId, toInteger(row[3]) AS postScore, row[5] AS postBody
MERGE (p:Post {postId: postId})
SET p.postBody = postBody, p.postScore = postScore
RETURN count(p);
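To double-check the import from the Python side, the official Neo4j Python driver can be used to count the loaded posts. A minimal sketch, assuming the `neo4j` package is installed (`pip install neo4j`); the Bolt URI and credentials below are placeholders, not values from the original setup.

from neo4j import GraphDatabase

# Connect to the local Neo4j instance and count the :Post nodes created above.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    posts = session.run("MATCH (p:Post) RETURN count(p) AS n").single()["n"]
    print("Posts loaded:", posts)
driver.close()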
a
about
above
after
again
against
ain
all
am
an
## run the following commands in BASH
start-master.sh
# go to http://localhost:8080 and check that Spark's master service has started
start-slave.sh spark://$(hostname):7077
# if the worker's service started successfully, you should see the worker listed at http://localhost:8080, in the "Workers" section
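To confirm that PySpark can actually reach the standalone cluster started above, point a session at the master URL and run a trivial job. A minimal sketch (the application name is arbitrary):

import socket
from pyspark.sql import SparkSession

# Connect to the standalone master started above on port 7077.
spark = (SparkSession.builder
         .appName("standalone-smoke-test")
         .master("spark://{}:7077".format(socket.gethostname()))
         .getOrCreate())

# A trivial job; if this prints 4950, the worker executed the tasks.
print(spark.sparkContext.parallelize(range(100)).sum())
spark.stop()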
## run the following commands in BASH
cd # let's get back to your user's home directory
wget -c https://www-us.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz # this will download Spark
tar xvfz spark-2.4.5-bin-hadoop2.7.tgz # this will extract the downloaded archive into the current directory
mv spark-2.4.5-bin-hadoop2.7 spark # renaming the extracted folder to "spark"
# appending the JAVA_HOME and SPARK_HOME environment variables to the end of your BASH startup script
# we are assuming that JRE 8 is installed in "/usr/lib/jvm/java-1.8.0-openjdk-amd64"
cat >> .bashrc <<'EOF'
## run the following commands in BASH
sudo apt-get update -y && sudo apt-get upgrade -y
# you may need to enter your password when you use "sudo" before another command
# if you are asked a yes/no question during the execution of the previous command, choose "Yes"
sudo apt-get install -y htop nload netcat emacs nano openjdk-8-jdk-headless python-pip python3-pip wget \
    curl python-mode scala-mode-el
# be patient, it can take a while for all the packages to be downloaded and installed
sudo pip install pyspark
sudo pip3 install pyspark
# again, it may take a while for the PySpark packages to be downloaded for both Python 2 (now end-of-life) and Python 3
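A quick way to confirm that the pip-installed PySpark is importable, and which version was picked up, is to run a couple of lines in python3:

# Sanity check for the pip-installed PySpark (run with python3).
import pyspark
print(pyspark.__version__)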