habedi’s gists

habedi / tikz_or_pgf_to_pdf.tex

Created June 24, 2022 06:47

	\RequirePackage{tikz}


	\documentclass[varwidth]{standalone}

	\usepackage{import}
	\usepackage{pgfplots}

	\usepgfplotslibrary{groupplots}
	\pgfplotsset{compat=newest}

habedi / yfcc100m_dataset_schema.sql

Last active June 24, 2022 06:54

Schema for the table that can contain the data from YFCC100m dataset. The schema is compatile with a MySQL or a MariaDB table.

	-- Dataset available from http://multimedia-commons.s3-website-us-west-2.amazonaws.com/?prefix=tools/etc/ in Sqlite3 database format ('yfcc100m_dataset.sql' file)

	SET NAMES utf8;
	SET time_zone = '+00:00';
	SET foreign_key_checks = 0;
	SET sql_mode = 'NO_AUTO_VALUE_ON_ZERO';

	DROP TABLE IF EXISTS `yfcc100m_dataset`;
	CREATE TABLE `yfcc100m_dataset` (
	`photoid` int NOT NULL,

habedi / MyNotebook.py

Created January 17, 2022 09:38

Example code for loading a CSV file as a DF in Databricks Community Edition and saving it as a table

	# Loading PySpark modules

	from pyspark.sql import DataFrame
	from pyspark.sql.types import *

	#from pyspark.context import SparkContext
	#from pyspark.sql.session import SparkSession

	# sc = SparkContext('local')
	# spark = SparkSession(sc)

habedi / StackExchange DB schema.sql

Last active January 1, 2022 15:28

Schemas for some of the tables in the StackExchnages database dumps (available here: https://archive.org/download/stackexchange); schemas work with MySQL and MariaDB

	-- 'xxx.stackexchange.com' is the name of the database

	-- `xxx.stackexchange.com`.badges definition

	CREATE TABLE `badges` (
	`Id` int(11) NOT NULL,
	`UserId` int(11) NOT NULL,
	`Name` varchar(30) NOT NULL,
	`Date` datetime NOT NULL,
	`Class` int(11) NOT NULL,

habedi / load_data_to_neo4j.cyp

Last active September 2, 2024 16:01

A script with the commands to load CSV files into a Neo4j graph database. The data is from StackExchange data dump (https://archive.org/details/stackexchange) #cypher #neo4j #stackoverflow_data #csv #graphdb

	// Loading the posts
	LOAD CSV WITH HEADERS FROM 'file:///posts_all_csv.csv' AS row
	WITH toInteger(row[0]) AS postId, row[5] AS postBody, toInteger(row[3]) AS postScore
	RETURN count(row);

	LOAD CSV WITH HEADERS FROM 'file:///posts_all_csv.csv' AS row FIELDTERMINATOR '\t'
	WITH row[0] AS postId, row[3] AS postScore, row[5] AS postBody
	MERGE (p:Post {postId: postId})
	SET p.postBody = postBody, p.postScore = postScore
	RETURN p;

habedi / pyspark-helloworld-app-graphframes.ipynb

Last active June 17, 2022 07:40

PySpark HelloWorld App + GraphFrames

Sorry, something went wrong. Reload?

Sorry, we cannot display this file.

Sorry, this file is invalid so it cannot be displayed.

habedi / pyspark_helloworld-app.ipynb

Last active February 24, 2021 12:25

PySpark HelloWorld App

Sorry, something went wrong. Reload?

Sorry, we cannot display this file.

Sorry, this file is invalid so it cannot be displayed.

habedi / big_list_of_english_stopwords

Created January 31, 2021 10:47

A large list of English stopwords (original source: https://gist.github.com/sebleier/554280)

	a
	about
	above
	after
	again
	against
	ain
	all
	am
	an

habedi / start_single_worker_spark_cluster.sh

Created February 12, 2020 21:50

Starting Spark cluster with minimum requirements

	## run the following commands in BASH

	start-master.sh
	# go to http://localhost:8080 and check if the Spark's master service is started

	start-slave.sh spark://$(hostname):7077
	# if the worker's service is started successfully you should be able to see the worker in http://localhost:8080, at the connected worker's section

habedi / get_spark.sh

Last active February 12, 2020 21:18

Simple commands to download and extract Apache Spark's pre-built binaries from its websites

	## run the following commands in BASH
	cd # let's get back to your user's home directory
	wget -c https://www-us.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz # this will download spark
	tar xvfz spark-2.4.5-bin-hadoop2.7.tgz # this will extract the downloaded file to current directory
	mv spark-2.4.5-bin-hadoop2.7 spark # renaming the extarcted folder to "spark"

	# appending the JAVA_HOME and SPARK_HOME environement variables to end of your BASH startup script
	# we are assuming that our JRE 8 is installed in "/usr/lib/jvm/java-1.8.0-openjdk-amd64"
	cat >> .bashrc <<'EOF'

Hassan Abedi habedi