Emre Sevinç (emres)

💭 "The purpose of computing is insight, not numbers." -- Richard W. Hamming
@mannharleen
mannharleen / spark all file format types and compression codecs.scala
Created September 9, 2017 14:59
Text file, json, csv, sequence, parquet, ORC, Avro, newHadoopAPI
/*
Assume that the following "rdd" exists
val rdd = sc.parallelize(Array((1,1), (0,2), (1,3), (0,4), (1,5), (0,6), (1,7), (0,8), (1,9), (0,10)))
type of rdd -> org.apache.spark.rdd.RDD[(Int, Int)] = MapPartitionsRDD[1]
rdd.collect -> Array[(Int, Int)] = Array((1,1), (0,2), (1,3), (0,4), (1,5), (0,6), (1,7), (0,8), (1,9), (0,10))
*/
import org.apache.spark.{SparkConf,SparkContext}
import org.apache.spark.sql.SQLContext
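The preview stops at the imports. As a rough illustration of writing the same pairs in several of the listed formats, here is a hedged PySpark sketch (the gist itself is in Scala; the output paths and column names below are placeholders, and Avro additionally requires the external spark-avro package):

```python
from pyspark.sql import SparkSession

# Assumed setup: a local SparkSession; all paths below are illustrative.
spark = SparkSession.builder.appName("format-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([(1, 1), (0, 2), (1, 3), (0, 4), (1, 5),
                      (0, 6), (1, 7), (0, 8), (1, 9), (0, 10)])

# Text and Hadoop SequenceFile can be written straight from the pair RDD.
rdd.saveAsTextFile("/tmp/demo_text")
rdd.saveAsSequenceFile("/tmp/demo_seq")

# JSON, CSV, Parquet and ORC go through the DataFrame writer.
df = rdd.toDF(["key", "value"])
df.write.mode("overwrite").json("/tmp/demo_json")
df.write.mode("overwrite").csv("/tmp/demo_csv")
df.write.mode("overwrite").parquet("/tmp/demo_parquet")
df.write.mode("overwrite").orc("/tmp/demo_orc")
```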
# Bash best practices and style-guide
Just simple methods to keep the code clean.
Inspired by [progrium/bashstyle](https://github.com/progrium/bashstyle) and [Kfir Lavi's post](http://www.kfirlavi.com/blog/2012/11/14/defensive-bash-programming/).
## Quick big rules
* All code goes in a function
* Always double quote variables
@emres
emres / control.sh
Created April 27, 2017 14:17 — forked from randerzander/control.sh
Ambari Service Start/Stop script
USER='admin'
PASS='admin'
CLUSTER='dev'
HOST=$(hostname -f):8080
function start(){
curl -u $USER:$PASS -i -H 'X-Requested-By: ambari' -X PUT -d \
'{"RequestInfo": {"context" :"Start '"$1"' via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' \
http://$HOST/api/v1/clusters/$CLUSTER/services/$1
}
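For readers who prefer not to shell out to curl, the same Ambari REST call can be sketched with Python's requests library. The host, credentials and example service name below are assumptions mirroring the variables above; "INSTALLED" is the state Ambari uses for a stopped service:

```python
import requests

AMBARI_URL = "http://localhost:8080"   # assumed; corresponds to $HOST above
AUTH = ("admin", "admin")              # $USER / $PASS above
CLUSTER = "dev"

def set_service_state(service, state):
    """PUT the desired state (STARTED or INSTALLED) for an Ambari-managed service."""
    url = "%s/api/v1/clusters/%s/services/%s" % (AMBARI_URL, CLUSTER, service)
    body = {
        "RequestInfo": {"context": "%s %s via REST" % (state, service)},
        "Body": {"ServiceInfo": {"state": state}},
    }
    return requests.put(url, json=body, auth=AUTH,
                        headers={"X-Requested-By": "ambari"})

# Example calls with an illustrative service name:
# set_service_state("HDFS", "STARTED")
# set_service_state("HDFS", "INSTALLED")   # INSTALLED == stopped in Ambari
```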
@sebble
sebble / stars.sh
Last active May 12, 2025 16:55
List all starred repositories of a GitHub user.
#!/bin/bash
USER=${1:-sebble}
STARS=$(curl -sI https://api.github.com/users/$USER/starred?per_page=1|egrep '^Link'|egrep -o 'page=[0-9]+'|tail -1|cut -c6-)
PAGES=$((STARS/100+1))  # 100 results per API page
echo You have $STARS starred repositories.
echo
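The preview cuts off before the paging loop. As a hedged Python alternative that walks the same endpoint (unauthenticated calls are rate-limited by GitHub; the user name is just an example):

```python
import requests

def starred_repos(user):
    """Yield the full name of every repository starred by a GitHub user."""
    page = 1
    while True:
        resp = requests.get(
            "https://api.github.com/users/%s/starred" % user,
            params={"per_page": 100, "page": page},
        )
        resp.raise_for_status()
        repos = resp.json()
        if not repos:
            break
        for repo in repos:
            yield repo["full_name"]
        page += 1

for name in starred_repos("sebble"):   # example user, as in the script above
    print(name)
```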
@hxmuller
hxmuller / list reverse dependencies for Debian package
Created October 21, 2016 10:12
list reverse dependencies for Debian package
# apt-cache - query the APT cache
# rdepends shows a listing of each reverse dependency a package has
# --no-suggests omit suggests dependencies
# --no-conflicts omit conflicts dependencies
# --no-breaks omit breaks dependencies
# --no-replaces omit replaces dependencies
# --no-enhances omit enhances dependencies
# --installed limit the output to packages which are currently installed
# --recurse make recursive so all packages mentioned are printed once
# Putting it together (<package> is a placeholder):
apt-cache rdepends --no-suggests --no-conflicts --no-breaks --no-replaces \
    --no-enhances --installed --recurse <package>
@jasonrdsouza
jasonrdsouza / combineS3Files.py
Last active June 3, 2023 17:22
Python script to efficiently concatenate S3 files
'''
This script performs efficient concatenation of files stored in S3. Given a
folder, output location, and optional suffix, all files with the given suffix
will be concatenated into one file stored in the output location.
Concatenation is performed within S3 when possible, falling back to local
operations when necessary.
Run `python combineS3Files.py -h` for more info.
'''
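The preview shows only the module docstring. The server-side part of the idea can be sketched with boto3's multipart-copy API; the bucket and key names below are placeholders, and the real script also handles the 5 MB minimum part size and a local-copy fallback, which this sketch omits:

```python
import boto3

s3 = boto3.client("s3")

def concat_objects(bucket, source_keys, output_key):
    """Concatenate existing S3 objects into one output object, entirely on the S3 side."""
    upload = s3.create_multipart_upload(Bucket=bucket, Key=output_key)
    parts = []
    for number, key in enumerate(source_keys, start=1):
        # Every part except the last must be at least 5 MB for this to succeed.
        resp = s3.upload_part_copy(
            Bucket=bucket,
            Key=output_key,
            UploadId=upload["UploadId"],
            PartNumber=number,
            CopySource={"Bucket": bucket, "Key": key},
        )
        parts.append({"PartNumber": number,
                      "ETag": resp["CopyPartResult"]["ETag"]})
    s3.complete_multipart_upload(
        Bucket=bucket,
        Key=output_key,
        UploadId=upload["UploadId"],
        MultipartUpload={"Parts": parts},
    )

# Example call with placeholder names:
# concat_objects("my-bucket", ["logs/part-0.json", "logs/part-1.json"], "logs/combined.json")
```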
@ThatRendle
ThatRendle / explanation.md
Last active July 3, 2022 07:56
Why I was previously not a fan of Apache Kafka

Update, September 2016

OK, you can pretty much ignore what I wrote below this update, because it doesn't really apply anymore.

I wrote this over a year ago, and at the time I had spent a couple of weeks trying to get Kafka 0.8 working with .NET and then Node.js with much frustration and very little success. I was rather angry. It keeps getting linked, though, and just popped up on Hacker News, so here's sort of an update, although I haven't used Kafka at all this year so I don't really have any new information.

In the end, we managed to get things working with a Node.js client, although we continued to have problems, both with our code and with managing a Kafka/Zookeeper cluster generally. What made it worse was that I did not then, and do not now, believe that Kafka was the correct solution for that particular problem at that particular company. What they were trying to achieve could have been done more simply with any number of other messaging systems, with a subscriber reading messages off and writing

@wesfloyd
wesfloyd / stormTopologyCheck.py
Last active August 5, 2019 12:22
Script runs continuously at an X-second interval, checking whether topologies have stopped processing new tuples
# Wes Floyd April 2015
import sys
import requests
import json
import argparse
import pprint
import time
pp = pprint.PrettyPrinter(indent=4)
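The preview ends at the setup. A minimal sketch of the polling idea, assuming the Storm UI REST API is reachable at the address below (the URL, interval and the "no new tuples" heuristic are illustrative, not the gist's exact logic):

```python
import time
import requests

STORM_UI = "http://localhost:8080"   # assumed Storm UI address
INTERVAL_SECONDS = 60                # assumed polling interval

def emitted_per_topology():
    """Map each running topology id to the total tuples emitted by its spouts."""
    summary = requests.get(STORM_UI + "/api/v1/topology/summary").json()
    totals = {}
    for topology in summary.get("topologies", []):
        detail = requests.get(STORM_UI + "/api/v1/topology/" + topology["id"]).json()
        totals[topology["id"]] = sum((s.get("emitted") or 0)
                                     for s in detail.get("spouts", []))
    return totals

previous = emitted_per_topology()
while True:
    time.sleep(INTERVAL_SECONDS)
    current = emitted_per_topology()
    for topology_id, emitted in current.items():
        if emitted <= previous.get(topology_id, 0):
            print("WARNING: %s emitted no new tuples in the last interval" % topology_id)
    previous = current
```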
@randerzander
randerzander / control.sh
Last active February 26, 2025 11:46
Ambari Service Start/Stop script
USER='admin'
PASS='admin'
CLUSTER='dev'
HOST=$(hostname -f):8080
function start(){
curl -u $USER:$PASS -i -H 'X-Requested-By: ambari' -X PUT -d \
'{"RequestInfo": {"context" :"Start '"$1"' via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' \
http://$HOST/api/v1/clusters/$CLUSTER/services/$1
}
@rcalsaverini
rcalsaverini / strip_accents.py
Created August 30, 2014 15:05
Removing accents from unicode strings in python
import unicodedata

def strip_accents(unicode_string):
    """
    Strip accents (all combining unicode characters) from a unicode string.
    """
    ndf_string = unicodedata.normalize('NFD', unicode_string)
    is_not_accent = lambda char: unicodedata.category(char) != 'Mn'
    return ''.join(
        char for char in ndf_string if is_not_accent(char)
    )
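A quick, illustrative usage check (on Python 2 the arguments must be unicode literals, as below; on Python 3 plain strings work):

```python
print(strip_accents(u"Emre Sevinç"))   # -> Emre Sevinc
print(strip_accents(u"café naïve"))    # -> cafe naive
```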