
Sam Zeitlin szeitlin

@nguyendv
nguyendv / boto3_tutorial.py
Created June 30, 2017 20:36
Boto3 tutorial: create a vpc, a security group, a subnet, an instance on that subnet, then make that instance 'pingable' from Internet
import boto3
# http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#service-resource
ec2 = boto3.resource('ec2', aws_access_key_id='AWS_ACCESS_KEY_ID',
                     aws_secret_access_key='AWS_SECRET_ACCESS_KEY',
                     region_name='us-west-2')
# create VPC
vpc = ec2.create_vpc(CidrBlock='192.168.0.0/16')
@graphadvantage
graphadvantage / neo4j-kakfa-demo.md
Last active February 25, 2025 14:54
Neo4j GraphGist: Enterprise Architectures - Real-time Neo4j Graph Updates using Kafka Messaging

## Neo4j GraphGist - Enterprise Architectures: Real-time Graph Updates using Kafka Messaging

Neo4j Use Case: Low Latency Graph Analytics & OLTP - Update 1M Nodes in 90 secs with Kafka and Neo4j Bolt

Introduction

A recent Neo4j whitepaper describes how Monsanto is performing real-time updates on a 600M node Neo4j graph using Kafka to consume data extracted from a large Oracle Exadata instance.

This modern data architecture combines a fast, scalable messaging platform (Kafka) for low latency data provisioning and an enterprise graph database (Neo4j) for high performance, in-memory analytics & OLTP - creating new and powerful real-time graph analytics capabilities for your enterprise applications.
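One common way to keep per-message overhead low in a pipeline like this is to group consumed messages into batches and send each batch to the database as a single parameterized statement. The sketch below shows only that batching step in plain Python and is not taken from the gist: the `Item` label, the `id` and `props` fields, the batch size, and the helper names are all hypothetical, and no Kafka or Neo4j client is involved.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical label and property names, for illustration only.
UPDATE_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (n:Item {id: row.id}) "
    "SET n += row.props"
)


def batches(messages: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of up to `size` messages from any iterable."""
    iterator = iter(messages)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def to_statements(messages: Iterable[Dict], size: int = 1000) -> List[Tuple[str, Dict]]:
    """Pair the update query with one parameter dict per batch."""
    return [(UPDATE_QUERY, {"rows": chunk}) for chunk in batches(messages, size)]


# Stand-in for messages consumed from a topic (assumed shape).
incoming = [{"id": i, "props": {"value": i * 2}} for i in range(2500)]
statements = to_statements(incoming, size=1000)
```

Each `(query, parameters)` pair could then be handed to whatever database driver is in use, one transaction per batch.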

@gene1wood
gene1wood / role_arn_to_session.py
Created December 29, 2016 17:38
Simple python function to assume an AWS IAM Role from a role ARN and return a boto3 session object
import boto3
def role_arn_to_session(**args):
    """
    Usage :
        session = role_arn_to_session(
            RoleArn='arn:aws:iam::012345678901:role/example-role',
            RoleSessionName='ExampleSessionName')
        client = session.client('sqs')
    """
@pratos
pratos / condaenv.txt
Created November 30, 2016 07:01
To package a conda environment (Requirement.txt and virtual environment)
# For Windows users
# Note: <> denotes changes to be made
# Create a conda environment
conda create --name <environment-name> python=<version:2.7/3.5>
# To create a requirements.txt file:
conda list  # Gives you the list of packages used in the environment
conda list -e > requirements.txt  # Saves the package info to the current folder
@roycoding
roycoding / Intro to Neural Networks.ipynb
Created November 16, 2016 21:51
Neural Network in Python 3
@joelgrus
joelgrus / counting.py
Last active November 27, 2016 16:46
choose your collections wisely
"""
how to count as fast as possible
(numbers from Python 3.5.2 on a Macbook Pro)
YMMV, but these results are pretty stable for me, say +/- 0.1s on repeated runs
"""
from collections import Counter, defaultdict
import random
random_numbers = [random.randrange(10000) for _ in range(10000000)]
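The preview above stops before the counting code itself. As a minimal sketch of the two approaches its imports suggest (the function names and the small sample list here are assumptions, not the gist's own code), counting with `Counter` and with `defaultdict(int)` could look like this:

```python
from collections import Counter, defaultdict


def count_with_counter(items):
    """Let Counter consume the whole iterable in one call."""
    return Counter(items)


def count_with_defaultdict(items):
    """Increment a defaultdict(int) entry once per item."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts


# Small hypothetical sample; the gist uses ten million random numbers.
sample = [3, 1, 3, 2, 3, 1]
by_counter = count_with_counter(sample)
by_defaultdict = count_with_defaultdict(sample)
```

Both produce the same tallies, so any timing difference between them comes from how the counting loop runs rather than from the result.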
@smartnose
smartnose / spark-internals-through-code.md
Last active October 29, 2024 06:03
Spark internal notes

Spark internals through code

Nothing gives you more detail about Spark internals than actually reading its source code. In addition, you get to learn many design techniques and improve your Scala coding skills. These are the random notes I make while reading the Spark code. The best way to follow the notes is to load the Spark code into an IDE, e.g. IntelliJ, and navigate the code alongside them.

Genesis - creation of a spark cluster

The scripts for creating a Spark cluster are start-master.sh and start-slave.sh. Read them carefully, and you can see that the two scripts are very similar except for the value of the $CLASS variable. For start-master.sh, the value is CLASS="org.apache.spark.deploy.master.Master", while the value for start-slave.sh is shown below with more context.

# NOTE: This exact class name is matched downstream by SparkSubmit.
[core]
# The home folder for airflow, default is ~/airflow
airflow_home = /Users/p1nox/airflow
# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository
dags_folder = /Users/p1nox/airflow/dags
# The folder where airflow should store its log files. This location
base_log_folder = /Users/p1nox/airflow/logs
@trestletech
trestletech / instance-types.sh
Created June 15, 2016 16:41
Get all EC2 Instance Types in All Availability Zones
#!/bin/bash
echo "Getting list of Availability Zones"
all_regions=$(aws ec2 describe-regions --output text --query 'Regions[*].[RegionName]' | sort)
all_az=()
while read -r region; do
az_per_region=$(aws ec2 describe-availability-zones --region "$region" --query 'AvailabilityZones[*].[ZoneName]' --output text | sort)
while read -r az; do
@DaniSancas
DaniSancas / neo4j_cypher_cheatsheet.md
Created June 14, 2016 23:52
Neo4j's Cypher queries cheatsheet

Neo4j Tutorial

Fundamentals

Store any kind of data using the following graph concepts:

  • Node: Graph data records
  • Relationship: Connect nodes (has direction and a type)
  • Property: Stores data as key-value pairs on nodes and relationships
  • Label: Groups nodes (optional)
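To make these concepts concrete, here is a minimal sketch that models them as plain Python dataclasses. It is only an illustration of the data model, not Neo4j's API or storage format; the class names, field names, and sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    """A graph data record with optional labels and key-value properties."""
    node_id: int
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection from `start` to `end`."""
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


def describe(rel: Relationship) -> str:
    """Render a relationship in a Cypher-like arrow notation."""
    return (
        f"({rel.start.properties['name']})"
        f"-[:{rel.rel_type}]->"
        f"({rel.end.properties['name']})"
    )


# Hypothetical sample data.
alice = Node(1, {"Person"}, {"name": "Alice"})
acme = Node(2, {"Company"}, {"name": "Acme"})
works_at = Relationship(alice, acme, "WORKS_AT", {"since": 2016})
```

Calling `describe(works_at)` gives `(Alice)-[:WORKS_AT]->(Acme)`, which mirrors how a pattern with a direction and a type is written in Cypher.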