@praveen-symphony
praveen-symphony / virtualenvwrapper.md
Created February 17, 2017 09:45 — forked from guyhughes/virtualenvwrapper.md
virtualenvwrapper quickstart

virtualenvwrapper has broken dependencies right now on macOS, so install like this:

sudo pip install pbr
sudo pip install --no-deps stevedore
sudo pip install --no-deps virtualenvwrapper

To keep the mkvirtualenv and workon commands available in every new shell session, do the following (if you use zsh or another shell, change the filename accordingly):

#on cluster
thrift /spark/sbin/start-thriftserver.sh --master yarn-client
#ssh tunnel, direct 10000 to unused 8157
ssh -i ~/caserta-1.pem -N -L 8157:ec2-54-221-27-21.compute-1.amazonaws.com:10000 [email protected]
#see this for JDBC config on client http://blogs.aws.amazon.com/bigdata/post/TxT7CJ0E7CRX88/Using-Amazon-EMR-with-SQL-Workbench-and-other-BI-Tools
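Before pointing a JDBC client at the tunnel, it can help to confirm that the local end is actually listening. The following is a minimal sketch, not part of the original snippet: the helper name `port_is_open` is hypothetical, and the port 8157 is taken from the ssh command above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is accepting connections on the given port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the tunnel running, `port_is_open("localhost", 8157)` should return True; if it returns False, the ssh command has probably exited or is bound to a different local port.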
praveen-symphony / hello_analytics_api_v3_10krows_nosampling_ryanpraski_single_csv.py
Export more than 10,000 rows and work around the sampling limitations of Google Analytics using Python and the Google Analytics API. Includes functionality to pull data from multiple Google Analytics profiles. This version writes the data for all profiles into a single CSV file.
#!/usr/bin/python
# -*- coding: utf-8 -*-
#
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
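Only the license header of the script is shown here. As a rough illustration of how more than 10,000 rows can be retrieved, the sketch below pages through results using a 1-based start index and a maximum page size of 10,000, which is how the Core Reporting API v3 `start-index` and `max-results` parameters behave. This is a generic pagination sketch, not the script's own code: `fetch_page` is a hypothetical callable standing in for one API request, and the response is assumed to be a dict with an optional `rows` list.

```python
from typing import Callable, Dict, Iterator, List

MAX_RESULTS = 10000  # largest page size the Core Reporting API v3 accepts


def fetch_all_rows(
    fetch_page: Callable[[int, int], Dict],
    page_size: int = MAX_RESULTS,
) -> Iterator[List[str]]:
    """Yield every row by requesting successive pages.

    fetch_page(start_index, max_results) is a hypothetical callable that
    performs one request and returns the response as a dict. start_index
    is 1-based, matching the API's start-index parameter.
    """
    start_index = 1
    while True:
        response = fetch_page(start_index, page_size)
        rows = response.get("rows", [])  # "rows" is absent when nothing matched
        yield from rows
        if len(rows) < page_size:
            break  # a short or empty page means there is nothing left
        start_index += page_size
```

The generator form keeps memory use flat, so rows can be written to a CSV file as they arrive instead of being collected first.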
praveen-symphony / move_github_repos.sh
Created October 27, 2016 02:01 — forked from subelsky/move_github_repos.sh
How to archive old github private projects on Dropbox
# http://stackoverflow.com/questions/1960799/using-gitdropbox-together-effectively/1961515#1961515
export REPONAME=????
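# "take" is an oh-my-zsh helper (mkdir + cd); this is the portable equivalent
mkdir -p ~/Dropbox/git/$REPONAME.git && cd ~/Dropbox/git/$REPONAME.git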
git init --bare
cd ~/code/$REPONAME
git remote rm origin
git remote add origin ~/Dropbox/git/$REPONAME.git
git push -u origin master
praveen-symphony / aliases.sh
Created October 27, 2016 01:59 — forked from subelsky/aliases.sh
Useful aliases for Ruby and Rails development and git maintenance
alias a='ack'
alias a?='alias | grep -i'
alias adx='rake db:drop && rake db:create && heroku pg:transfer --from black --to postgres://postgres@localhost/staq_development --confirm staqweb --app staqweb && rails r "User.all.each { |u| u.update_attribute(:password,%q(password)) }" && rake db:test:prepare'
alias b='bundle'
alias bb='bundle install --binstubs=.bundle/bin --path=.bundle/gems && bundle package --all && reload ; sd'
alias bc='bin/console'
alias be='bundle exec'
alias bea='bundle exec annotate'
alias bu='bundle update'
alias bus='bundle update staq_extraction'
praveen-symphony / large_redshift_tables.sql
Created October 27, 2016 01:58 — forked from subelsky/large_redshift_tables.sql
Quick SQL command to find large tables in redshift
-- based on http://stackoverflow.com/questions/21767780/how-to-find-size-of-database-schema-table-in-redshift
SELECT name AS table_name, ROUND((COUNT(*) / 1024.0),2) as "Size in Gigabytes"
FROM stv_blocklist
INNER JOIN
(SELECT DISTINCT id, name FROM stv_tbl_perm) names
ON names.id = stv_blocklist.tbl
GROUP BY name
ORDER BY "Size in Gigabytes" DESC
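The division by 1024 in the query works because stv_blocklist holds one row per storage block and Redshift blocks are 1 MB each, so the row count per table is its size in megabytes. The same arithmetic, as a small sketch (the function name is illustrative only):

```python
BLOCK_SIZE_MB = 1  # Redshift stores table data in 1 MB blocks


def blocks_to_gigabytes(block_count: int) -> float:
    """Convert a count of 1 MB blocks to gigabytes, rounded like the query."""
    return round(block_count * BLOCK_SIZE_MB / 1024.0, 2)


print(blocks_to_gigabytes(2048))  # 2.0
print(blocks_to_gigabytes(1536))  # 1.5
```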
praveen-symphony / spark-sql_error.log
Last active October 4, 2016 01:02
Inconsistent Hive versions on EMR 5.0.0 Cluster
chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
16/10/04 00:30:44 WARN AvroSerDe: Encountered exception determining schema. Returning signal schema to indicate problem
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFromFS(AvroSerdeUtils.java:131)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:112)
at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:167)
at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:103)
at org.apache.spark.sql.hive.SparkHiveWriterContainer.newSerializer(hiveWriterContainers.scala:161)
praveen-symphony / hive_error_stack_trace.log
Created October 4, 2016 00:19
Hive on Spark on EMR 5.0.0 is not working?
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1119)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
praveen-symphony / spark_ide.py
Created September 21, 2016 23:54 — forked from bigaidream/spark_ide.py
To enable IDE (PyCharm) syntax support for Apache Spark, adapted from http://www.abisen.com/spark-from-ipython-notebook.html
#!/public/spark-0.9.1/bin/pyspark
import os
import sys
# Set the path for spark installation
# this is the path where you have built spark using sbt/sbt assembly
os.environ['SPARK_HOME'] = "/public/spark-0.9.1"
# os.environ['SPARK_HOME'] = "/home/jie/d2/spark-0.9.1"
# Append to PYTHONPATH so that pyspark could be found
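The preview stops at this comment. One way the step it describes could look is sketched below; this is an assumption about the approach, not the remainder of the original file. It relies on PySpark's Python sources living under `$SPARK_HOME/python`, and falls back to the `/public/spark-0.9.1` path used above when SPARK_HOME is not set.

```python
import os
import sys

# Assumed install location; the fallback mirrors the SPARK_HOME value above.
spark_home = os.environ.get("SPARK_HOME", "/public/spark-0.9.1")

# PySpark's Python package sits under $SPARK_HOME/python.
pyspark_path = os.path.join(spark_home, "python")
if pyspark_path not in sys.path:
    sys.path.append(pyspark_path)

# With the path in place, an IDE or interpreter can resolve the import,
# provided Spark is actually installed at spark_home:
# from pyspark import SparkContext
```

Depending on the Spark version, the bundled py4j archive under `python/lib` may also need to be added to the path before the import succeeds.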