Sankar sankars

  • Dubai
@sankars
sankars / hdfs shell.sh
Last active October 8, 2015 17:42
HDFS common commands
## run hdfs balancer
sudo -u hdfs hdfs balancer
## hdfs health check
sudo -u hdfs hdfs fsck /
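## restore deleted files from the trash (example for the hive user)
## recently deleted files normally sit under .Trash/Current; adjust the path if yours differs
hdfs dfs -mv /user/hive/.Trash/Current/source /target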
sankars / pdsh.sh
Created September 29, 2015 11:18
Install PDSH and configure.
# https://linuxcluster.wordpress.com/2013/07/29/installing-pdsh-to-issue-commands-to-a-group-of-nodes-in-parallel-in-centos/
yum install pdsh
vim /etc/profile.d/pdsh.sh
## inside pdsh.sh
export PDSH_RCMD_TYPE='ssh'
sankars / Nutch Hadoop Integration Notes
Last active August 29, 2015 14:21
Integration Steps
Install Hadoop on all machines
Download the Nutch source and extract it
Modify nutch-site.xml to set the Nutch HTTP agent and robots agent config
Copy all Hadoop XML configs and hadoop-env.sh to Nutch's conf directory
Build Nutch using ant
sankars / jdk_install.sh
Last active October 23, 2015 09:53
shell script for JDK install
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u75-b13/jdk-7u75-linux-x64.tar.gz"
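# unpack into /usr/java so the path matches the alternatives entry below
mkdir -p /usr/java
tar xzf jdk-7u75-linux-x64.tar.gz -C /usr/java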
alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_75/bin/java 2
alternatives --config java
sankars / io.py
Created July 23, 2014 16:00
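File manipulation in Python
import os

def main():
    print(os.getcwd())                                  # current working directory
    print(os.path.abspath('..'))                        # absolute path of the parent directory
    print(os.path.basename(os.getcwd()))                # name of the current directory
    print(os.path.dirname(os.path.realpath(__file__)))  # directory containing this script

if __name__ == '__main__':
    main()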
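The same four lookups can also be written with pathlib. The following is a minimal sketch, assuming Python 3.4 or later; the function name describe_paths and the dictionary keys are hypothetical and are not part of the original gist.

```python
from pathlib import Path


def describe_paths(script_path):
    """Return the four values from the snippet above, computed with pathlib.

    describe_paths and its keys are illustrative names, not from the gist.
    """
    cwd = Path.cwd()
    return {
        "cwd": str(cwd),                  # like os.getcwd()
        "parent_abs": str(cwd.parent),    # like os.path.abspath('..')
        "cwd_name": cwd.name,             # like os.path.basename(os.getcwd())
        # like os.path.dirname(os.path.realpath(script_path))
        "script_dir": str(Path(script_path).resolve().parent),
    }


if __name__ == "__main__":
    for key, value in describe_paths(__file__).items():
        print(key, "=>", value)
```

Returning a dictionary instead of printing directly makes the values easy to reuse or compare against the os.path versions.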
sankars / date.py
Created July 23, 2014 15:56
Date related operations in python
import datetime

def main():
    ## Parse string to date
    some_date = '2012-05-30 23:00:00'
    date1 = datetime.datetime.strptime(some_date, '%Y-%m-%d %H:%M:%S')
    print('date1 => ' + str(date1))

if __name__ == '__main__':
    main()
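Once a string has been parsed, a common next step is to shift the date and format it back to text. This is a small sketch of that round trip, using only the standard datetime module; the helper name shift_date and the FORMAT constant are hypothetical additions, not part of the original gist.

```python
import datetime

# Same format string as the parsing example above.
FORMAT = '%Y-%m-%d %H:%M:%S'


def shift_date(text, days=0, hours=0):
    """Parse text, add the given offset, and return the result as a string.

    shift_date is an illustrative helper, not from the gist.
    """
    parsed = datetime.datetime.strptime(text, FORMAT)
    shifted = parsed + datetime.timedelta(days=days, hours=hours)
    return shifted.strftime(FORMAT)


if __name__ == '__main__':
    print(shift_date('2012-05-30 23:00:00', hours=2))  # 2012-05-31 01:00:00
```

timedelta handles the day and month rollover, so no manual calendar arithmetic is needed.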
sankars / networking.sh
Created June 20, 2014 11:15
Commands used to find network-related information
## From http://stackoverflow.com/questions/657482/how-to-find-host-name-from-ip-with-out-login-to-the-host
nslookup hostname
nslookup ipaddr
## To find the process using a port
import sys

def main():
    line = sys.stdin.readline()
    printFibonacci(line)

def printFibonacci(memberCount):
    ...  # function body is cut off in the listing
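The body of printFibonacci is not shown in the listing. One plausible reading is that memberCount is the number of terms to produce. The sketch below follows that assumption; the function name fibonacci, the choice to start the sequence at 0, and the int() conversion of the line read from stdin are all assumptions, not the author's code.

```python
def fibonacci(member_count):
    """Return the first member_count Fibonacci numbers, starting at 0.

    Accepts an int or a numeric string such as a line read from stdin
    (int() ignores the trailing newline). Hypothetical helper.
    """
    count = int(member_count)
    members = []
    a, b = 0, 1
    for _ in range(count):
        members.append(a)
        a, b = b, a + b
    return members


if __name__ == '__main__':
    print(fibonacci("7\n"))  # [0, 1, 1, 2, 3, 5, 8]
```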
def main():
    print('hello from python')
    ## 2 ways to iterate a list
    movies = ['spi', 23, "pos"]
    ## way 1
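The listing stops before either loop is shown. As a sketch, two common ways to iterate a list are looping over the items directly and looping over the indices. Which two ways the author had in mind is an assumption here, and the function names below are hypothetical.

```python
movies = ['spi', 23, "pos"]


def iterate_by_item(items):
    """Way 1 (assumed): loop over the elements directly."""
    seen = []
    for item in items:
        seen.append(item)
    return seen


def iterate_by_index(items):
    """Way 2 (assumed): loop over the indices and look each element up."""
    seen = []
    for index in range(len(items)):
        seen.append((index, items[index]))
    return seen


if __name__ == '__main__':
    print(iterate_by_item(movies))
    print(iterate_by_index(movies))
```

Looping over the items directly is usually the more idiomatic choice; the index-based form is useful mainly when the position is needed as well.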
sankars / Hive-HDFS.sql
Created May 20, 2014 06:49
Hive commands that use HDFS for storage
-- http://stackoverflow.com/questions/18129581/how-do-i-output-the-results-of-a-hiveql-query-to-csv
INSERT OVERWRITE LOCAL DIRECTORY '/target/directory/' select books from table; -- writes to local file system
INSERT OVERWRITE DIRECTORY '/target/directory/' select books from table; -- writes to HDFS. Column separator -> ^A, row separator -> \n
-- From the shell (writes to the local file system): hive -e 'select books from table' > /target/directory/file.tsv