@dankohn
dankohn / starbucks_us_locations.csv
Last active October 14, 2023 16:47
8902 locations of US Starbucks with addresses, latitude, and longitude
-149.8935557,61.21759217,Starbucks - AK - Anchorage 00001,"601 West Street_601 West 5th Avenue_Anchorage, Alaska 99501_907-277-2477"
-149.9054948,61.19533942,Starbucks - AK - Anchorage 00002,"Carrs-Anchorage #1805_1650 W Northern Lights Blvd_Anchorage, Alaska 99503_907-339-0500"
-149.7522,61.2297,Starbucks - AK - Anchorage 00003,"Elmendorf AFB_Bldg 5800 Westover Avenue_Anchorage, Alaska 99506"
-149.8643361,61.19525062,Starbucks - AK - Anchorage 00004,"Fred Meyer - Anchorage #11_1000 E Northern Lights Blvd_Anchorage, Alaska 995084283_907-264-9600"
-149.8379726,61.13751355,Starbucks - AK - Anchorage 00005,"Fred Meyer - Anchorage #656_2300 Abbott Road_Anchorage, Alaska 99507_907-365-2000"
-149.9092788,61.13994658,Starbucks - AK - Anchorage 00006,"Fred Meyer - Anchorage (Dimond) #71_2000 W Dimond Blvd_Anchorage, Alaska 995151400_907-267-6700"
-149.7364877,61.19533265,Starbucks - AK - Anchorage 00007,"Safeway-Anchorage #1817_7731 E Northern Lights Blvd_Anchorage, Alaska 99504_907-331-1700"
-149.8211,61.2156
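A minimal sketch of reading rows like the ones above, assuming the columns are longitude, latitude, store name, and an underscore-delimited address field, with no header row. The function name `parse_locations` and the output keys are hypothetical; rows with fewer than four columns (such as the truncated last row in this preview) are skipped rather than guessed at.

```python
import csv
import io


def parse_locations(text):
    """Parse CSV text into dicts; skip rows that lack all four columns."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) < 4:
            continue  # truncated or malformed row
        lon, lat, name, address = record[:4]
        rows.append({
            "longitude": float(lon),
            "latitude": float(lat),
            "name": name,
            # The address column packs several fields separated by "_".
            "address_parts": address.split("_"),
        })
    return rows


# Sample rows copied from the preview above, including the truncated one.
SAMPLE = '''-149.8935557,61.21759217,Starbucks - AK - Anchorage 00001,"601 West Street_601 West 5th Avenue_Anchorage, Alaska 99501_907-277-2477"
-149.7522,61.2297,Starbucks - AK - Anchorage 00003,"Elmendorf AFB_Bldg 5800 Westover Avenue_Anchorage, Alaska 99506"
-149.8211,61.2156
'''

locations = parse_locations(SAMPLE)
```

Not every row ends with a phone number (the Elmendorf AFB row does not), so the address parts are kept as a list instead of being unpacked into a fixed number of fields.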
@jasonrdsouza
jasonrdsouza / combineS3Files.py
Last active June 3, 2023 17:22
Python script to efficiently concatenate S3 files
'''
This script performs efficient concatenation of files stored in S3. Given a
folder, output location, and optional suffix, all files with the given suffix
will be concatenated into one file stored in the output location.
Concatenation is performed within S3 when possible, falling back to local
operations when necessary.
Run `python combineS3Files.py -h` for more info.
'''
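The docstring above does not say when in-S3 concatenation is "possible". One plausible reading, sketched here as an assumption and not as the script's actual logic, is that it depends on S3's multipart upload rule that every part except the last must be at least 5 MiB. The function `plan_concatenation` and its input shape are hypothetical, and the sketch makes no S3 calls.

```python
# Assumption: S3 multipart uploads require parts of at least 5 MiB,
# except for the final part.
MIN_PART_SIZE = 5 * 1024 * 1024


def plan_concatenation(files):
    """Split (key, size_in_bytes) pairs into two lists of keys.

    The first list holds objects large enough to be copied server-side as
    multipart parts; the second holds objects that would have to be
    downloaded and combined locally first.
    """
    server_side, local = [], []
    for key, size in files:
        if size >= MIN_PART_SIZE:
            server_side.append(key)
        else:
            local.append(key)
    return server_side, local


# Hypothetical object listing for illustration only.
in_s3, needs_local = plan_concatenation([
    ("logs/part-0001", 10 * 1024 * 1024),
    ("logs/part-0002", 1024),
    ("logs/part-0003", 5 * 1024 * 1024),
])
```

This split ignores the original ordering of the objects; a real implementation would have to decide how to preserve it.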
@arturmkrtchyan
arturmkrtchyan / get_job_status.sh
Last active October 22, 2024 05:45
Apache Spark Hidden REST API
curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000
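The same status request can be sketched in Python with only the standard library. The helper names `status_url` and `get_job_status` are hypothetical, and the sketch assumes the endpoint answers with a JSON body; the host and submission ID are the placeholders from the curl command above.

```python
import json
import urllib.request


def status_url(host, submission_id, port=6066):
    """Build the status URL used by the curl command above."""
    return f"http://{host}:{port}/v1/submissions/status/{submission_id}"


def get_job_status(host, submission_id, port=6066, timeout=10):
    """Fetch the status document (assumed to be JSON) for one submission."""
    url = status_url(host, submission_id, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Only the URL is built here; calling get_job_status needs a reachable cluster.
print(status_url("spark-cluster-ip", "driver-20151008145126-0000"))
```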
@shyamsalimkumar
shyamsalimkumar / sstable-format-version-numbers.md
Last active March 14, 2023 21:59
Cassandra SSTable Format Version Numbers

Original Source

Finding all sstables not matching version “ib”

find /var/lib/cassandra/data/ -type f | grep -v -- -ib- | grep -v "/snapshots"
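A rough Python equivalent of the pipeline above, for cases where the result is needed inside a script. It applies the same two filters to each full path: drop anything containing `-ib-` and anything containing `/snapshots`. The function names are hypothetical.

```python
import os


def filter_sstable_paths(paths, version="ib"):
    """Keep paths that do not contain -<version>- and are not under /snapshots."""
    marker = f"-{version}-"
    return [p for p in paths if marker not in p and "/snapshots" not in p]


def find_sstables_not_matching(root, version="ib"):
    """Walk root and return every regular file that passes the filter."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirpath, filename))
    return filter_sstable_paths(paths, version)


# Hypothetical file names for illustration only.
example = filter_sstable_paths([
    "/var/lib/cassandra/data/ks/tbl/ks-tbl-ib-1-Data.db",
    "/var/lib/cassandra/data/ks/tbl/ks-tbl-hf-2-Data.db",
    "/var/lib/cassandra/data/ks/tbl/snapshots/123/ks-tbl-hf-2-Data.db",
])
```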

The version numbers to date are:

Version 0

@diegopacheco
diegopacheco / ice.md
Last active April 17, 2019 20:13
How to Install and Run Netflix/ICE on Amazon Linux OS
@maryrosecook
maryrosecook / ...
Last active September 13, 2018 18:17
Reminders to myself to help me get better at programming. I don't always manage to do these things, but I try. Please feel free to add your own reminders to yourself in the comments below!
@mrtns
mrtns / README.md
Last active August 12, 2019 06:31
Reading and Writing Event Streams to S3
@snowindy
snowindy / spark-create-rdd-from-s3-parallel.scala
Last active September 24, 2020 12:25
This code allows parallel loading of data from S3 into a Spark RDD. Supports multiple paths to load from. Based on http://tech.kinja.com/how-not-to-pull-from-s3-using-apache-spark-1704509219
val s3Paths = "s3://yourbucket/path/to/file1.txt,s3://yourbucket/path/to/directory"
val pageLength = 100
val key = "YOURKEY"
val secret = "YOUR_SECRET"
import com.amazonaws.services.s3._, model._
import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.model.ObjectListing
import scala.collection.JavaConverters._
import scala.io.Source
@silasdavis
silasdavis / MultipleOutputs.scala
Last active January 18, 2022 07:07
Wrapping OutputFormat to produce multiple outputs with hadoop MultipleOutputs
/**
* This file contains the core idea of wrapping an underlying OutputFormat with an OutputFormat
* with an augmented key that writes to partitions using MultipleOutputs (or something similar)
*/
package model.hadoop
import model.hadoop.HadoopIO.MultipleOutputer
import model.hadoop.HadoopIO.MultipleOutputer._
import org.apache.hadoop.io.{DataInputBuffer, NullWritable}
@greglu
greglu / check-reserved-instances
Created July 3, 2015 07:06
Checks reserved instance utilization of an EC2 account
#!/bin/bash
# Checks active, non-spot EC2 instances against the list of reserved instances
# in order to determine which availability-zone/instance-type combinations
# are paying on-demand prices (and could benefit from a reserved instance
# purchase), and which current reservations aren't being used.
#
# Dependencies:
# * awscli (http://aws.amazon.com/cli/)
# * jq (http://stedolan.github.io/jq/)
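The comparison described in the header comment can be sketched in Python, independent of how the instance and reservation lists are fetched. This is an illustration of the idea, not the script's own implementation: `compare_reservations` is a hypothetical name, and the input is assumed to be one (availability zone, instance type) pair per running instance and per reserved slot.

```python
from collections import Counter


def compare_reservations(running, reserved):
    """Compare running instances against reservations.

    Both arguments are lists of (availability_zone, instance_type) tuples.
    Returns two dicts of counts: combinations running beyond their
    reservations (paying on-demand prices), and reservations with no
    matching running instance (unused).
    """
    running_counts = Counter(running)
    reserved_counts = Counter(reserved)
    # Counter subtraction keeps only positive counts.
    on_demand = running_counts - reserved_counts
    unused = reserved_counts - running_counts
    return dict(on_demand), dict(unused)


# Hypothetical inventory for illustration only.
on_demand, unused = compare_reservations(
    running=[("us-east-1a", "m3.large")] * 3 + [("us-east-1b", "c3.xlarge")],
    reserved=[("us-east-1a", "m3.large")] * 2 + [("us-east-1c", "r3.large")],
)
```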