Yuta Imai (yuta-imai)

@yuta-imai
yuta-imai / gist:dba47d581d8637d2b45cc6400f4fa325
Created November 9, 2016 23:23 — forked from sebsto/gist:19b99f1fa1f32cae5d00
Install Maven with Yum on Amazon Linux
# Add the EPEL Apache Maven repository, point it at the EL6 packages, then install.
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo sed -i 's/\$releasever/6/g' /etc/yum.repos.d/epel-apache-maven.repo
sudo yum install -y apache-maven
mvn --version
@yuta-imai
yuta-imai / zeppelin_with_matplotlib.py
Last active October 31, 2017 08:43
Using matplotlib example on top of Apache Zeppelin
%pyspark
import StringIO

import matplotlib
# Select the non-interactive Agg backend before pyplot is imported;
# calling use() after pyplot is loaded does not switch the backend.
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()
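The gist preview stops here; a plausible continuation, going by the title, draws a small chart, renders it into an in-memory buffer, and hands it to Zeppelin as inline HTML. The chart data and the SVG/%html details below are assumptions, not the original code.

# Hypothetical continuation: render a simple bar chart into a StringIO
# buffer as SVG and let Zeppelin display it via %html output.
objects = ('A', 'B', 'C')
y_pos = np.arange(len(objects))
plt.bar(y_pos, [3, 7, 5], align='center')
plt.xticks(y_pos, objects)

img = StringIO.StringIO()
plt.savefig(img, format='svg')
print "%html <div style='width:400px'>" + img.getvalue() + "</div>"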
@yuta-imai
yuta-imai / hive_orc_test.hql
Last active August 3, 2016 00:02
Test script for Hive with ORC. It mounts the data on S3 provided at https://amplab.cs.berkeley.edu/benchmark/ as an external table, then imports it from the external table into an internal table.
CREATE EXTERNAL TABLE rankings_external (
  pageURL VARCHAR(300),
  pageRank INT,
  avgDuration INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS SEQUENCEFILE
LOCATION 's3a://big-data-benchmark/pavlo/sequence/1node/rankings/';
CREATE TABLE rankings (
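  -- Assumed continuation: the gist preview is truncated after the opening line
  -- above. The column list presumably mirrors rankings_external, and per the
  -- description the internal table is stored as ORC.
  pageURL VARCHAR(300),
  pageRank INT,
  avgDuration INT
)
STORED AS ORC;

-- Import from the S3-backed external table into the internal ORC table.
INSERT OVERWRITE TABLE rankings SELECT * FROM rankings_external;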
#!/bin/bash
# Print this instance's public DNS name by matching the local hostname against
# the PrivateDnsName of each instance returned by describe-instances.
aws ec2 describe-instances | jq --arg hostname "$(hostname)" '.Reservations[].Instances[] | select(.PrivateDnsName | contains($hostname)) | .PublicDnsName'
#!/bin/bash
# Cross-reference EC2 instances with the components Ambari reports for each host.
AMBARI="hostname:port"
CLUSTER="clustername"
EC2=".ec2-list"
COMPONENTS=".hdp-components"

# Dump each instance's tag values, public DNS name and private DNS name as TSV.
aws ec2 describe-instances | jq -r '.Reservations[].Instances[] | [.Tags[].Value, .PublicDnsName, .PrivateDnsName] | @tsv' > ${EC2}
curl -u admin:admin ${AMBARI}/api/v1/${CLUSTER}/factory/host_components | jq -r '.items[].HostRoles | [.host_name, .component_name] | @tsv' | perl -e 'my %hosts; for(<>){ chomp $_; my($host, $component) >
require 'aws-sdk-core'
ddb = Aws::DynamoDB::Client.new(region: "ap-northeast-1")
table_name = 'rangetest'
hash_key_str = 'test'
(1...100).each do |i|
  ddb.put_item({
    table_name: table_name,
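    # Assumed continuation -- the gist preview is truncated after table_name.
    # A plausible item body: every record shares the hash key and uses i as the
    # range key (the attribute names below are hypothetical).
    item: { 'hashkey' => hash_key_str, 'rangekey' => i }
  })
end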
var fs = require('fs');

// Drop "node" and the script path so only the file arguments remain.
var files = process.argv;
files.shift();
files.shift();

var operations = [];
files.forEach(function(filePath){
  fs.readFile(filePath, 'utf8', function(err, body){
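    // Assumed continuation -- the gist preview is truncated inside the callback.
    // A plausible body: collect each file's contents and report once every
    // requested file has been read.
    if (err) throw err;
    operations.push({ path: filePath, body: body });
    if (operations.length === files.length) {
      console.log('read ' + operations.length + ' files');
    }
  });
});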
var AWS = require("../aws-sdk-js");

// Issue a GetItem on every interval tick, re-creating the client each time.
var counter = 0;
setInterval(function(){
  counter++;
  var dynamodb = new AWS.DynamoDB({region: "ap-northeast-1"});
  dynamodb.getItem({
    Key: {key: {S: "test"}},
    TableName: "ec2-metadata-test",
    ProjectionExpression: "body"
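  // Assumed continuation -- the gist preview is truncated after the parameters.
  // A plausible ending: a callback that logs each tick's result; the 1000 ms
  // interval is an assumption.
  }, function(err, data){
    if (err) {
      console.error(counter, err);
    } else {
      console.log(counter, JSON.stringify(data.Item));
    }
  });
}, 1000);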
require 'aws-sdk'
require 'json'
stream_name = "stream_handson"
kinesis = Aws::Kinesis::Client.new(region: "ap-northeast-1")
stream = kinesis.describe_stream(stream_name: stream_name)
shard_id_array = stream[:stream_description][:shards].map{|shard|
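  # Assumed continuation -- the gist preview is truncated inside the map block.
  # A plausible completion: collect the shard ids, then read each shard from
  # its oldest record with a TRIM_HORIZON iterator.
  shard[:shard_id]
}

shard_id_array.each do |shard_id|
  shard_iterator = kinesis.get_shard_iterator(
    stream_name: stream_name,
    shard_id: shard_id,
    shard_iterator_type: 'TRIM_HORIZON'
  )[:shard_iterator]
  kinesis.get_records(shard_iterator: shard_iterator)[:records].each do |record|
    puts record[:data]
  end
end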
#!/usr/bin/env python
# [coinlocker]
#
# Copyright (c) 2014 Yuta Imai
#
# This software is released under the MIT License.
#
# http://opensource.org/licenses/mit-license.php