while( !(succeed=try())){}

Vaquar Khan vaquarkhan

vaquarkhan / 00.graphx.scala
Created May 22, 2017 03:42 — forked from ceteri/00.graphx.scala
Spark GraphX demo
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

case class Peep(name: String, age: Int)

val vertexArray = Array(
  (1L, Peep("Kim", 23)),
  (2L, Peep("Pat", 31)),
  (3L, Peep("Chris", 52)),
  (4L, Peep("Kelly", 39)),
vaquarkhan / log.scala
Created May 22, 2017 03:43 — forked from ceteri/log.scala
Intro to Apache Spark: code example for RDD animation
// load error messages from a log into memory
// then interactively search for various patterns
// base RDD
val lines = sc.textFile("log.txt")
// transformed RDDs
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split("\t")).map(r => r(1))
messages.cache()
vaquarkhan / boto3_emr_create_cluster_with_wordcount_step.py
Created June 9, 2019 15:41 — forked from ruanbekker/boto3_emr_create_cluster_with_wordcount_step.py
Create EMR Cluster with a Wordcount Job as a Step in Boto3
import boto3

client = boto3.client(
    'emr',
    region_name='eu-west-1'
)

cmd = "hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount file:///etc/services /output"

emrcluster = client.run_job_flow(
vaquarkhan / introrx.md
Created July 3, 2019 13:23 — forked from staltz/introrx.md
The introduction to Reactive Programming you've been missing
vaquarkhan / aws_glue_boto3_example.md
Created August 26, 2019 05:50 — forked from ejlp12/aws_glue_boto3_example.md
AWS Glue Create Crawler, Run Crawler and update Table to use "org.apache.hadoop.hive.serde2.OpenCSVSerde"
import boto3

client = boto3.client('glue')

response = client.create_crawler(
    Name='SalesCSVCrawler',
    Role='AWSGlueServiceRoleDefault',
    DatabaseName='sales-cvs',
    Description='Crawler for generated Sales schema',
vaquarkhan / DemoApplication.java
Created July 4, 2020 20:59 — forked from MrBW/DemoApplication.java
chaos-monkey-pivotal-test
package com.issue.chaos.monkey.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class DemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
vaquarkhan / MNIST_Keras2DML.py
Created July 12, 2020 02:18 — forked from NiloyPurkait/MNIST_Keras2DML.py
An example of using Apache SparkML to train a convolutional neural network in parallel on the MNIST dataset, on IBM Watson Studio. Written for the Medium article: https://medium.com/@niloypurkait/how-to-train-your-neural-networks-in-parallel-with-keras-and-apache-spark-ea8a3f48cae6
################################### Keras2DML: Training a neural network in parallel with SystemML #######################################
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Input, Dense, Conv1D, Conv2D, MaxPooling2D, Dropout,Flatten
from keras import backend as K
from keras.models import Model
import numpy as np
import matplotlib.pyplot as plt
vaquarkhan / cassandra-notes.md
Created July 23, 2020 20:52 — forked from gavinmh/cassandra-notes.md
Cassandra Notes

Introduction

Apache Cassandra is an open-source, distributed database management system designed to handle large amounts of data across many commodity servers. Its query language is CQL (Cassandra Query Language).

Cassandra's data model is a partitioned row store that combines elements of key-value stores and tabular/columnar databases. Like a relational database, Cassandra stores data in tables, called column families, that have defined columns and associated data types. Each row in a column family is uniquely identified by a key and holds multiple columns, each of which has a name, a value, and a timestamp. Unlike in a relational database, the rows of a column family do not all need to have the same set of columns, and a column may be added to one or more rows at any time. If this explanation is unclear, you might instead think of a column family as a set of key-value pairs in which each value is itself a nested set of key-value pairs.
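
The nested key-value view of a column family can be sketched in a few lines of Python. This is only an in-memory illustration of the data model, not Cassandra's API or storage format: the names `users`, `put`, and `get`, and the sample rows, are hypothetical and chosen for this example.

```python
import time

# Hypothetical in-memory stand-in for a column family:
# row key -> {column name -> (value, timestamp)}
users = {}


def put(column_family, row_key, column, value, timestamp=None):
    """Write one column into a row, creating the row if it does not exist."""
    if timestamp is None:
        timestamp = time.time()
    row = column_family.setdefault(row_key, {})
    row[column] = (value, timestamp)


def get(column_family, row_key, column):
    """Return the value stored for a column, or None if the row lacks it."""
    cell = column_family.get(row_key, {}).get(column)
    return None if cell is None else cell[0]


# Two rows in the same column family with different sets of columns.
put(users, "kim", "email", "kim@example.com", timestamp=1)
put(users, "kim", "age", 23, timestamp=1)
put(users, "pat", "email", "pat@example.com", timestamp=2)
put(users, "pat", "city", "Austin", timestamp=3)
```

Here the row `"kim"` has an `age` column but no `city`, while `"pat"` has the reverse, which mirrors the point that rows in one column family need not share a schema.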

The following depicts two rows of a column-family fro