Dhanasekaran Anbalagan (bugcy013)

🪄
Focusing
View GitHub Profile
I ran into an issue importing from SQL Server with Sqoop: the import/import-all-tables options do not appear to support tables under a custom schema/owner prefix (the default 'dbo' schema works fine, so that case is not a problem), hence the free-form --query import below.
This is using the MS SQL Server - Hadoop Connector (sqoop-sqlserver-1.0.tar.gz) found at http://download.microsoft.com. In addition, as the connector's instructions/user guide points out, you will need the Microsoft JDBC Driver (sqljdbc_3.0), which must be placed in your $SQOOP_HOME/lib directory. It can be downloaded from http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=21599
All of this assumes you are running Cloudera's distribution on Ubuntu 11.10 through VMware Player on Windows 7 64-bit (this is my environment, anyway).
Query:
bin/sqoop import --connect 'jdbc:sqlserver://<ip-address>;instanceName=<instance-name>;username=<user-name>;password=<password>;database=<database-name>' --query 'SELECT * FROM [Owner].[prefix].[table-name] WHERE $CONDITIONS'
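Note that Sqoop requires the literal $CONDITIONS token in a free-form query, and a --query import also needs a --target-dir (plus a --split-by column when running more than one mapper). A minimal sketch with a hypothetical host, schema, and table (none of these names come from the original note):
bin/sqoop import \
  --connect 'jdbc:sqlserver://10.0.0.10;instanceName=SQLEXPRESS;username=hadoop;password=secret;database=DemoDB' \
  --query 'SELECT * FROM [DemoDB].[sales].[orders] WHERE $CONDITIONS' \
  --split-by order_id \
  --target-dir /user/hadoop/orders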
export HADOOP_HOME=/home/hadoop/hadoop
export JAVA_HOME=/usr/java/jdk6-1.6.0
DB_CON="jdbc:mysql://127.0.0.1:9000/hadoop_test?useUnicode=true&characterEncoding=utf8"
DB_USERNAME="scott"
DB_PASSWORD="tiger"
DB_TABLE_NAME="TABLE_NAME"
DB_COLUMNS="COL1, COL2, COL3, COL4"
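These variables would typically be consumed by a Sqoop import call along these lines (the command itself is not shown above, and the target directory is an assumption):
# Hypothetical usage of the variables above in a Sqoop import
sqoop import \
  --connect "$DB_CON" \
  --username "$DB_USERNAME" \
  --password "$DB_PASSWORD" \
  --table "$DB_TABLE_NAME" \
  --columns "$DB_COLUMNS" \
  --target-dir "/user/hadoop/$DB_TABLE_NAME"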
I'm in the midst of trying to wrangle an HBase backup/restore to/from S3 or HDFS,
built around exporting/backing up one table at a time
using org.apache.hadoop.hbase.mapreduce.Export from HBASE-1684.
Just a reminder of its usage:
Usage: Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
In the pseudo code below:
persistent_store is some kind of non-HBase store in the Cloud that you can just
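A minimal bash sketch of the per-table export step described above, assuming backups are written under a dated directory on HDFS or S3 (the table names and paths here are hypothetical):
BACKUP_ROOT=/backups/$(date +%Y%m%d)   # could equally be an s3n:// URI
for table in table_one table_two; do   # hypothetical table list
  # Run the bundled Export MapReduce job once per table
  bin/hbase org.apache.hadoop.hbase.mapreduce.Export "$table" "$BACKUP_ROOT/$table"
done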
#!/bin/bash
# Free unused memory
flush_mem () {
  sudo sync                                   # flush dirty pages to disk first
  echo 3 | sudo tee /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes
}
echo -e "\nMemory usage before purge:\n" && free -m
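A plausible continuation, calling the function and showing the result (not part of the original snippet):
# Assumed continuation: run the purge and print memory usage again
flush_mem
echo -e "\nMemory usage after purge:\n" && free -m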
bugcy013 / HBase Replication Notes
Last active December 23, 2015 11:38 — forked from larsgeorge/gist:825646
HBase Replication Notes
HBase 1:
wget http://apache.easy-webs.de//hbase/hbase-0.90.0/hbase-0.90.0.tar.gz
tar -zxvf hbase-0.90.0.tar.gz
cd hbase-0.90.0
cp -pR conf conf.2
vim conf/hbase-site.xml
cp conf/hbase-site.xml conf.2/
vim conf.2/hbase-site.xml
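The two config directories presumably differ only in the handful of properties that let a second instance run alongside the first; a sketch of what conf.2/hbase-site.xml might contain, matching the /hbase-2 znode used in add_peer below (the rootdir value and the idea of sharing one local ZooKeeper are assumptions):
# Hypothetical contents for the second instance's config; only the znode parent
# is taken from the add_peer command below, the rest is assumed.
cat > conf.2/hbase-site.xml <<'EOF'
<configuration>
  <property><name>hbase.rootdir</name><value>file:///tmp/hbase-2</value></property>
  <property><name>hbase.replication</name><value>true</value></property>
  <property><name>zookeeper.znode.parent</name><value>/hbase-2</value></property>
</configuration>
EOF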
hbase(main):014:0> add_peer '1', 'localhost:2181:/hbase-2'
0 row(s) in 0.0580 seconds
hbase(main):015:0> start_replication
2011-02-11 18:04:58,347 INFO org.apache.hadoop.hbase.replication.ReplicationZookeeper: Replication is now started
0 row(s) in 0.0500 seconds
hbase(main):016:0> put 'test', ...
2011-02-11 18:05:22,003 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [10.0.0.57,60020,1297437319991]
2011-02-11 18:05:22,016 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for 10.0.0.57,60020,1297437319991
2011-02-11 18:05:22,016 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=10.0.0.57,60020,1297437319991 to dead servers, submitted shutdown handler to be executed, root=true, meta=true
#!/bin/bash
set -o nounset
set -o errexit
if [ $# -lt 1 ]; then
  echo "Usage: $0 <User>@<Host>"
  echo ""
  echo "  Copies your id_rsa.pub file to the remote host and adds it to the"
  echo "  authorized keys."
bugcy013 / doit
Created October 2, 2013 17:30 — forked from stantonk/doit
#!/bin/bash
# Source: http://toomuchdata.com/2012/06/25/how-to-install-python-2-7-3-on-centos-6-2/
yum groupinstall "Development tools"
yum install zlib-devel
yum install bzip2-devel openssl-devel ncurses-devel
wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2
tar xf Python-2.7.3.tar.bz2
cd Python-2.7.3
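The snippet ends before the build; the usual continuation for a source install of Python 2.7 (following the linked article's approach, though the exact flags here are assumed) is to configure, build, and use altinstall so the system python is left in place:
# Assumed continuation: build from source and install as python2.7,
# leaving /usr/bin/python (needed by yum) untouched
./configure --prefix=/usr/local
make
make altinstall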