from pyspark.sql import SparkSession


def init_spark():
    spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
    sc = spark.sparkContext
    return spark, sc


def main():
    spark, sc = init_spark()
    nums = sc.parallelize([1, 2, 3, 4])
    graph = {
        1: {2: 1, 5: 1},
        2: {1: 1, 3: 1, 5: 1},
        3: {2: 1, 4: 1},
        4: {3: 1, 5: 1, 6: 1},
        5: {1: 1, 2: 1, 4: 1},
        6: {4: 1}
    }
    time = 0
@ijokarumawak
ijokarumawak / 0.README.md
Last active December 13, 2023 17:16
NiFi Example: Load CSV File into Table, the traditional and the new way using Record.

Example Data

Ingested to the example flows by GenerateFlowFile:

ID, CITY_NAME, ZIP_CD, STATE_CD
001, CITY_A, 1111, AA
002, CITY_B, 2222, BB
003, CITY_C, 3333, CC
@jhenaoz
jhenaoz / docker-ubuntu-setup.sh
Created August 16, 2017 01:00
docker set up in ubuntu
#!/bin/sh
# This script follows the tutorial for setting up Docker on an Ubuntu server; for further info see https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/#next-steps
# This step usually fails on AWS Ubuntu instances because the extra kernel packages are not available there
sudo apt-get install \
    linux-image-extra-$(uname -r) \
    linux-image-extra-virtual
sudo apt-get update
# Allow apt to use repositories over HTTPS
@jnovack
jnovack / README.md
Last active November 1, 2023 23:07
Proxy SSL Client Certificate through NGINX Load-Balancer

The frontend stream proxy_pass can be used for load-balancing without SSL off-loading. All SSL connections are terminated on the backend, so client certificate information can be authenticated there directly.

Use this approach when:

  • you have enough CPU to decrypt SSL on the backend servers
  • you require direct client AUTHENTICATION on the backend servers

Backend

@giuseppebonaccorso
giuseppebonaccorso / twitter_sentiment_analysis_convnet.py
Last active March 16, 2020 19:26
Twitter Sentiment Analysis with Gensim Word2Vec and Keras Convolutional Networks
import keras.backend as K
import multiprocessing
import tensorflow as tf
from gensim.models.word2vec import Word2Vec
from keras.callbacks import EarlyStopping
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten
from keras.layers.convolutional import Conv1D
@codspire
codspire / making-zeppelin-work-on-windows.md
Last active December 2, 2021 03:54
Making Zeppelin, Spark, pyspark work on Windows

Zeppelin, Spark, PySpark Setup on Windows (10)

I wish running Zeppelin on Windows wasn't as hard as it is. Things go haywire if you already have Spark installed on your computer. Zeppelin's embedded Spark interpreter does not work nicely with an existing Spark installation, and you may need to perform the steps below (hacks!) to make it work. I am hoping that these will be fixed in newer Zeppelin versions.

If you try to run Zeppelin after extracting the package, you might encounter "The filename, directory name, or volume label syntax is incorrect."

A Google search led me to https://issues.apache.org/jira/browse/ZEPPELIN-1584. This link was helpful but wasn't enough to get Zeppelin working.

Below is what I had to do to make it work on my Windows 10 computer.

@luoq
luoq / vw_hash.md
Last active August 20, 2020 11:34
how feature hash is calculated in vw

feature hashing in vw

base hash function

The Murmur32 hash is implemented in uniform_hash in hash.cc. It takes a string and a seed and returns a uint64_t.
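
For orientation, here is a minimal pure-Python sketch of the standard 32-bit MurmurHash3 (x86 variant), which takes bytes and a seed as described above. It is not VW's uniform_hash: the function name murmur3_32 is made up for this illustration, and the sketch returns a plain Python int in the 32-bit range rather than a uint64_t. Whether VW's implementation matches it bit for bit is an assumption that should be checked against hash.cc.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86 32-bit (illustrative, not VW's code)."""
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def rotl(x: int, r: int) -> int:
        # 32-bit left rotation
        return ((x << r) | (x >> (32 - r))) & mask

    h = seed & mask
    n_blocks = len(data) // 4

    # Body: mix each full 4-byte little-endian block into the state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: mix the remaining 1 to 3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


if __name__ == "__main__":
    # Hash a feature name (encoded as UTF-8) with seed 0.
    print(murmur3_32("height".encode("utf-8"), seed=0))
```

The seed lets the same string map to different hash values in different contexts; the example feature name "height" is arbitrary.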

There are two hash modes, selected by the --hash option.

Strings mode (hashstring in parse_primitives.cc)

@smijar
smijar / TestSSLClientMutualAuth.java
Last active December 22, 2021 06:04
Test SSL Client in java using mutual authentication.
import java.io.File;
import java.io.FileInputStream;
import java.io.StringWriter;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import org.apache.commons.io.IOUtils;
import org.apache.http.HttpEntity;