Sho Shimauchi shiumachi

@shiumachi
shiumachi / alternatives-hadoop.sh
Created November 28, 2018 06:20
alternatives like script for hadoop
#!/bin/sh
HOME_LIB_DIR=${HOME}/lib
# symlink list
HADOOP_SYMLINK=${HOME_LIB_DIR}/hadoop
HBASE_SYMLINK=${HOME_LIB_DIR}/hbase
ZOOKEEPER_SYMLINK=${HOME_LIB_DIR}/zookeeper
HIVE_SYMLINK=${HOME_LIB_DIR}/hive
PIG_SYMLINK=${HOME_LIB_DIR}/pig
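The preview stops at the symlink list, so the switching logic itself is not shown. As a rough sketch of what an alternatives-style switch could look like, the hypothetical helper `switch_symlink` below repoints a link such as `~/lib/hadoop` at a chosen version directory. It assumes a POSIX filesystem, and the function name and the `.tmp` suffix are illustrative rather than taken from the gist.

```python
import os


def switch_symlink(link_path, target):
    """Point link_path at target, replacing any existing symlink."""
    tmp = link_path + ".tmp"  # hypothetical temporary name
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    # Renaming over the old link swaps it in a single step on POSIX systems.
    os.replace(tmp, link_path)
```

Building the new link under a temporary name and renaming it means the link never disappears while it is being switched.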
@shiumachi
shiumachi / bloomfilter.py
Created November 26, 2018 07:59
Bloomfilter sample
#!/usr/bin/python
import hashlib
startKey = 2
endKey = 6
inputNum = 1000
testNum = 100000
def check_bl(bloom, a):
    aa = hashlib.md5(a).hexdigest()[startKey:endKey]
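The preview ends inside `check_bl`, so the rest of the sample is not visible. The following is a minimal single-hash sketch of the same idea, assuming Python 3 and reusing the slice bounds from `startKey` and `endKey`. The helper names `slot`, `add` and `might_contain`, the `bytearray` storage and the key strings are assumptions made for illustration.

```python
import hashlib

START_KEY = 2  # same slice bounds as startKey/endKey in the preview
END_KEY = 6
NUM_SLOTS = 16 ** (END_KEY - START_KEY)  # 4 hex digits -> 65536 slots


def slot(value):
    """Map a string to a slot index using a slice of its MD5 hex digest."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest[START_KEY:END_KEY], 16)


def add(bloom, value):
    bloom[slot(value)] = 1


def might_contain(bloom, value):
    return bloom[slot(value)] == 1


bloom = bytearray(NUM_SLOTS)
for i in range(1000):
    add(bloom, "key-%d" % i)

# Inserted keys are always reported as present (no false negatives).
assert all(might_contain(bloom, "key-%d" % i) for i in range(1000))

# Keys that were never inserted can still hit an occupied slot.
hits = sum(might_contain(bloom, "other-%d" % i) for i in range(100000))
print("false positive rate: %.4f" % (hits / 100000.0))
```

With one hash function this is the simplest possible filter. A fuller Bloom filter would set several slots per key, one for each of several independent hashes.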
@shiumachi
shiumachi / myargparse.py
Created November 26, 2018 07:58
argparse sample
#!/usr/bin/python
# -*- coding: utf-8 -*-
import argparse
import sys
class MyArgParse(object):
    def __init__(self):
        pass

    def sum(self):
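The body of `sum` is cut off in the preview. As a guess at the kind of usage an argparse sample like this demonstrates, here is a small self-contained sketch that sums integers passed on the command line. The `numbers` argument and the functions `build_parser` and `main` are hypothetical and do not come from the gist.

```python
import argparse


def build_parser():
    """Create a parser that accepts one or more integers."""
    parser = argparse.ArgumentParser(description="Sum integers given on the command line.")
    parser.add_argument("numbers", type=int, nargs="+", help="integers to add up")
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    return sum(args.numbers)
```

Passing an explicit list, for example `main(["1", "2", "3"])`, lets the parser be exercised without touching the real command line.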
@shiumachi
shiumachi / bootstrap-master.sh
Created April 27, 2018 07:31
Kafka Kudu Demo (WIP)
#!/bin/sh
# logging stdout/stderr
set -x
exec >> /root/bootstrap-master-init.log 2>&1
date
# Master node identifier
touch /root/kafka-kudu-demo_edge-node.flag
@shiumachi
shiumachi / csv_to_parquet.py
Last active December 28, 2018 06:22
Convert multiple daily CSV files into monthly Parquet files
# This script compacts daily CSV files into monthly Parquet files.
# The CSV files must be named in "YYYY-MM-DD.csv" format.
#
import pandas as pd
import numpy as np
import pyarrow as pa
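Only the imports are visible in the preview. One way the daily-to-monthly compaction described in the header comment could be organized is sketched below. The helpers `group_by_month` and `convert` and the output naming `YYYY-MM.parquet` are assumptions. `convert` needs pandas and pyarrow installed, so pandas is imported lazily and the grouping step runs without it.

```python
import os
from collections import defaultdict


def group_by_month(filenames):
    """Group 'YYYY-MM-DD.csv' names by their 'YYYY-MM' prefix."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        stem, ext = os.path.splitext(os.path.basename(name))
        if ext != ".csv" or len(stem) != 10:
            continue  # ignore files that do not follow the naming rule
        groups[stem[:7]].append(name)
    return dict(groups)


def convert(csv_paths, out_dir):
    """Write one Parquet file per month (requires pandas and pyarrow)."""
    import pandas as pd  # lazy import: only needed for the actual conversion

    for month, paths in group_by_month(csv_paths).items():
        frames = [pd.read_csv(p) for p in paths]
        merged = pd.concat(frames, ignore_index=True)
        merged.to_parquet(os.path.join(out_dir, month + ".parquet"), engine="pyarrow")
```

Sorting the file names first keeps the rows of each monthly file in date order.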
@shiumachi
shiumachi / cosine_similarities.py
Last active January 5, 2018 17:20
Cosine similarity using TF-IDF and bigrams
import pandas as pd
import numpy as np
import MeCab
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
m = MeCab.Tagger("-Ochasen")
m2 = MeCab.Tagger("-Owakati")
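The preview shows only the imports and the MeCab taggers. To make the underlying computation concrete, the sketch below implements TF-IDF weighting and cosine similarity in plain Python over character bigrams. This is a swapped-in illustration and not the gist's approach: it uses neither MeCab tokenization nor scikit-learn's vectorizers, and it assumes Python 3. All function names are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the character bigrams of a string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def tfidf_vectors(docs):
    """Build one {bigram: tf * idf} dict per document."""
    counts = [Counter(bigrams(d)) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    n = len(docs)
    vectors = []
    for c in counts:
        # Smoothed idf, so bigrams shared by every document keep a non-zero weight.
        vectors.append({t: tf * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
                        for t, tf in c.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


docs = ["hadoop cluster", "hadoop clusters", "kudu table"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two strings share almost all of their bigrams, so their similarity is much higher than that of the first and third.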
@shiumachi
shiumachi / get_socks_proxy_command.sh
Created August 15, 2017 04:39
Get the command to set up a SOCKS proxy to a specified AWS instance
PROFILE=your_profile
INSTANCE_NAME=your_instance_name
SSH_KEYPATH=your_ssh_key_path
PUBLIC_HOSTNAME=$(aws --profile ${PROFILE} ec2 describe-instances | jq -r ".Reservations[] | select(.Instances[0].Tags[].Value == \"${INSTANCE_NAME}\") | .Instances[0] | .PublicDnsName")
echo "establish SOCKS proxy"
echo "ssh -i ${SSH_KEYPATH} -D 8157 -q ec2-user@${PUBLIC_HOSTNAME}"
@shiumachi
shiumachi / get_instance_hostname_and_ipaddress.sh
Created August 15, 2017 04:30
Get the hostname and IP address of a specific instance on AWS
PROFILE=your_profile
INSTANCE_NAME=your_instance_name
aws --profile ${PROFILE} ec2 describe-instances | jq -r ".Reservations[] | select(.Instances[0].Tags[].Value == \"${INSTANCE_NAME}\") | .Instances[0] | {PrivateDnsName: .PrivateDnsName, PrivateIpAddress: .PrivateIpAddress, PublicDnsName: .PublicDnsName, PublicIpAddress: .PublicIpAddress}"
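The jq filter above walks `Reservations`, matches a tag value and projects four address fields. The same selection can be sketched in Python against the JSON that `aws ec2 describe-instances` prints. The function name `find_instance` and the sample document are made up for illustration. Unlike the jq filter, which inspects only `Instances[0]` of each reservation, this sketch checks every instance.

```python
import json

FIELDS = ("PrivateDnsName", "PrivateIpAddress", "PublicDnsName", "PublicIpAddress")


def find_instance(describe_output, tag_value):
    """Return the address fields of the first instance carrying the given tag value."""
    for reservation in describe_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            values = [tag.get("Value") for tag in instance.get("Tags", [])]
            if tag_value in values:
                return {key: instance.get(key) for key in FIELDS}
    return None


# Made-up sample in the shape of the describe-instances output.
sample = json.loads("""
{"Reservations": [{"Instances": [{
    "Tags": [{"Key": "Name", "Value": "your_instance_name"}],
    "PrivateDnsName": "ip-10-0-0-1.example.internal",
    "PrivateIpAddress": "10.0.0.1",
    "PublicDnsName": "ec2-example.compute.amazonaws.com",
    "PublicIpAddress": "203.0.113.10"}]}]}
""")
print(find_instance(sample, "your_instance_name"))
```

Like the jq version, this matches on any tag value. Restricting the match to the tag whose `Key` is `Name` would avoid accidental hits on other tags.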
@shiumachi
shiumachi / dirs_compressor.py
Created January 28, 2017 12:29
Compress multiple directories into separate archives
# dirs_compressor.py
#
# Usage:
# $ python dirs_compressor.py target_dir
#
import sys
import os
import os.path
import logging
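The preview ends at the imports. A minimal sketch of the behavior the description implies, one archive per immediate subdirectory of `target_dir`, could use `shutil.make_archive` as shown below. The function name `compress_subdirs` and the default `zip` format are assumptions, since the gist preview does not show which archive format it produces.

```python
import os
import shutil


def compress_subdirs(target_dir, fmt="zip"):
    """Create one archive next to each immediate subdirectory of target_dir."""
    archives = []
    for name in sorted(os.listdir(target_dir)):
        path = os.path.join(target_dir, name)
        if not os.path.isdir(path):
            continue  # plain files are left alone
        # make_archive appends the extension for the chosen format itself.
        archives.append(shutil.make_archive(path, fmt, root_dir=target_dir, base_dir=name))
    return archives
```

Setting `root_dir` to the parent and `base_dir` to the directory name keeps the top-level directory inside each archive, so extracting it recreates the directory and does not scatter its files.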
@shiumachi
shiumachi / tips_python_web_crawler.md
Last active January 4, 2017 01:17
Notes on what I learned while building a web crawler in Python

Exceptions

try:
    ...
except (ConnectionError, ChunkedEncodingError, TooManyRedirects, NewConnectionError) as e:
    logging.warning("Skip URL {} Reason: {}".format(url, e))
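The pattern above, log and skip a URL when a network-level exception occurs, can be wrapped in a small loop. The sketch below is an illustration under assumptions: `crawl`, `fetch` and `skip_on` are hypothetical names, and the default uses only Python's built-in `ConnectionError` so the example runs without third-party packages. The exception names in the note resemble classes from `requests` and `urllib3`, but the preview does not show where they are imported from.

```python
import logging


def crawl(urls, fetch, skip_on=(ConnectionError,)):
    """Fetch each URL, logging and skipping those that raise a listed exception."""
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except skip_on as e:
            # One bad URL should not abort the whole crawl.
            logging.warning("Skip URL {} Reason: {}".format(url, e))
    return pages
```

Passing the exception tuple as a parameter keeps the loop independent of any particular HTTP library. Exceptions outside the tuple still propagate, so unexpected failures stay visible.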

SQLite