Jeongho Park (jeongho)
jeongho / graphite_collectd.txt
Created February 4, 2016 18:08
graphite+collectd on CentOS
yum list installed | grep -i "graphite\|carbon\|whisper"
graphite-web.noarch 0.9.12-5.el6 @epel
graphite-web-selinux.noarch 0.9.12-5.el6 @epel
python-carbon.noarch 0.9.12-3.el6.1 @epel
python-whisper.noarch 0.9.12-1.el6 @epel
Graphite Install
1. Install dependencies
ansible-playbook -i hosts update_yum.yml
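A minimal sketch of the next step, assuming the EPEL packages from the yum list above (package and service names are assumptions, not from the gist):
# assumption: install the Graphite stack plus collectd from EPEL on CentOS 6
sudo yum install -y graphite-web python-carbon python-whisper collectd
# assumption: start carbon-cache first so collectd's graphite output has a target
sudo service carbon-cache start
sudo service collectd start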
jeongho / run_testdfsio.sh
Created February 4, 2016 18:04
Hadoop benchmark 3. run testdfsio
#!/bin/bash
# TestDFSIO will be performed with a total file size of 1TB, using different dfs.block.size variations.
# Usage: TestDFSIO [genericOptions] -read | -write | -append | -clean [-nrFiles N] [-fileSize Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
#
# The test is designed with two variables:
# 1) file_sizes_mb: file size variation, with 1GB file x 1,000 = 1TB and 100MB file x 10,000 = 1TB;
# this is to test the impact of large and small files on HDFS
# 2) dfs.block.size (MB) variation: 512, 256, 128, 50, 10;
# this is to test the impact of different block sizes.
#
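# A hedged sketch of one write pass of that matrix, following the Usage line
# above (the jar path, block size, and result-file path are assumptions):
test_jar=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-test.jar
# 1,000 files x 1GB = 1TB written with a 256MB dfs.block.size
hadoop jar $test_jar TestDFSIO -D dfs.block.size=268435456 \
  -write -nrFiles 1000 -fileSize 1GB -resFile /tmp/TestDFSIO_write.log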
jeongho / run_terasort.sh
Created February 4, 2016 18:04
Hadoop benchmark 2. run terasort
#!/bin/bash
# terasort benchmark
# Usage: hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>
#
# command to run with nohup
# nohup bash ./run_terasort.sh > terasort.out 2>&1 &
# sudo -u hdfs nohup bash /tmp/run_terasort.sh > /tmp/terasort.out 2>&1 &
hadoop_jar=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
# TeraGen: 1TB = 1,000,000,000,000 = 1e12 BYTE = 100 BYTE * 1e10
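# A minimal sketch of the teragen/terasort pair those numbers imply
# (the HDFS paths are assumptions):
hadoop jar $hadoop_jar teragen 10000000000 /benchmarks/teragen
hadoop jar $hadoop_jar terasort /benchmarks/teragen /benchmarks/terasort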
jeongho / run_pi_job.sh
Created February 4, 2016 18:03
Hadoop benchmark 1. run pi job
#!/bin/bash
# mapreduce pi calculation to validate hadoop cluster setup
#
# command to run with nohup
# nohup bash ./run_pi_job.sh > pi_job.out 2>&1 &
# sudo -u hdfs nohup bash /tmp/run_pi_job.sh > /tmp/pi_job.out 2>&1 &
#parcel
hadoop_jar=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
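# A minimal sketch of the pi invocation this jar sets up; syntax is
# pi <nMaps> <nSamples>, and 10 maps x 100,000 samples is an assumed workload:
hadoop jar $hadoop_jar pi 10 100000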
jeongho / local_ntp_setup.txt
Last active September 5, 2024 11:32
Local NTP server setup
NTP references:
------------------------------
http://serverfault.com/questions/204082/using-ntp-to-sync-a-group-of-linux-servers-to-a-common-time-source/204138#204138
http://www.ntp.org/ntpfaq/NTP-s-config-adv.htm
http://askubuntu.com/questions/14558/how-do-i-setup-a-local-ntp-server
http://www.thegeekstuff.com/2014/06/linux-ntp-server-client/
http://www.linuxsolutions.org/faqs/generic/ntpserver
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s1-Understanding_the_ntpd_Configuration_File.html
------------------------------
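A minimal /etc/ntp.conf sketch for the local-server setup those references describe (the upstream pool hosts and client subnet are assumptions):
# upstream time sources for the local NTP server (assumed pool hosts)
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
# fall back to the local clock if upstream is unreachable
server 127.127.1.0
fudge 127.127.1.0 stratum 10
# let LAN clients sync from this server but not modify it (assumed subnet)
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap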
jeongho / empty_avro_from_schema.sh
Last active August 9, 2021 12:49
Create an empty avro file from avro schema - Pig doesn't like an empty directory
#1. create a sample avro schema
cat > example.avsc << EOF
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
EOF
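#2. a hedged sketch of writing the empty file itself: avro-tools' fromjson with
# empty input emits a valid container file holding the schema header and zero
# records (the avro-tools jar path/version is an assumption)
java -jar avro-tools-1.7.7.jar fromjson --schema-file example.avsc /dev/null > empty.avro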
jeongho / pom.xml
Created June 4, 2015 23:22
maven shade plugin example
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
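With the plugin bound to the package phase as above, the shaded (uber) jar comes out of a normal build; a minimal usage sketch:
# the shade goal runs during package and rewrites target/<artifact>.jar
mvn package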
jeongho / impala_start.sh
Last active February 25, 2016 17:51
impala start - CM API example
#!/usr/bin/env bash
# To enable debugging, change debug to 1; this will not delete the temporary hosts file.
debug=0
## User defined arguments
user="admin"
pass="admin"
## Hostname of the CM instance here:
scm="http://test-1.wonderland.com:7180/api/v6"
## Cluster name here. Replace spaces w/ %20 to comply w/ HTTP rules
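## A hedged sketch of the call this script builds up to (the cluster and
## service names are assumptions; CM API v6 exposes a per-service start command)
cluster="Cluster%201"   # assumed cluster name, spaces encoded per the note above
curl -s -u "$user:$pass" -X POST "$scm/clusters/$cluster/services/impala/commands/start"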
jeongho / kerberos_kadmin_hack.txt
Last active February 25, 2016 17:56
modify kdc db max_renewable_life
-----
for p in `kadmin.local -q listprincs` ; do kadmin.local -q "modprinc -maxrenewlife 1000days $p" ; done
-----
kadmin.local -q "getprincs" > principals.txt
vi principals.txt
Remove the non-Hadoop principals from the principals.txt file, then run this small script to update the existing principals:
for princ in `cat principals.txt`; do kadmin.local -q "modprinc -maxrenewlife 7day $princ"; done;
service krb5kdc restart
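A minimal check that the new limit took effect (the principal name is an assumption):
# "Maximum renewable life" in the output should reflect the new value
kadmin.local -q "getprinc krbtgt/EXAMPLE.COM" | grep -i renew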
jeongho / hadoop-benchmark
Last active July 18, 2016 06:47
Hadoop benchmark
http://answers.oreilly.com/topic/460-how-to-benchmark-a-hadoop-cluster/
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
## MR pi
https://gist.github.com/jeongho/371aaed47ab462d79851
## Terasort
https://gist.github.com/jeongho/3b8c028f5e8409c3a10a
## TestDFSIO