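# Build word2vec from source and install the resulting tools into /usr/local/bin.
pushd . &> /dev/null
cd /tmp
git clone --depth=1 https://github.com/tmikolov/word2vec
cd word2vec
# malloc.h is missing on some platforms (such as macOS); stdlib.h declares malloc.
sed -i -e 's/malloc.h/stdlib.h/g' *.c
make
# Remove sources and docs (*.c* also matches any sed backup files) before copying.
rm *.c* *.txt makefile LICENSE
cp * /usr/local/bin
popd &> /dev/null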
# Install the mecab-ipadic-NEologd dictionary for MeCab, then remove the checkout.
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git /tmp/mecab-ipadic-neologd
bash /tmp/mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -n -y
rm -rf /tmp/mecab-ipadic-neologd
# -*- coding: utf-8 -*-
import os
import re
from encodings.aliases import aliases
import nkf
import tornado
from tornado import httpclient, gen
{
"401(K)s": ["Finance", "Investing", "Retirement Investments", "401(K)s"],
"Accommodations": ["Travel & Tourism", "Accommodations"],
"Accounting & Auditing": ["Finance", "Accounting & Auditing"],
"Acne": ["Health", "Health Conditions & Concerns", "Skin Conditions & Skin Health", "Acne"],
"Air Travel": ["Travel & Tourism", "Air Travel"],
"Airline Tickets, Fares & Flights": ["Travel & Tourism", "Air Travel", "Airline Tickets, Fares & Flights"],
"Alternative & Natural Medicine": ["Health", "Health Care Services", "Alternative & Natural Medicine"],
"Anti-Aging": ["Beauty & Personal Care", "Anti-Aging"],
"Anti-Virus Software": ["Computers", "Software", "Internet Software & Web Goodies", "Network Security Software", "Anti-Virus Software"],
{
"Airline": "Airline Industry Services",
"American Restaurant": "New American Restaurant",
"Amusement Park Ride": "Roller Coaster",
"Amusement": "Arcade",
"Amusement": "Bingo Hall",
"Amusement": "Go Karting",
"Amusement": "Laser Tag",
"Antiques & Vintage": "Antique Store",
"Antiques & Vintage": "Auction House",
""" | |
Lexical Density | |
http://web.archive.org/web/20110810174351/http://www.unisanet.unisa.edu.au/Resources/la/Readability/Content%20words%20and%20lexical%20density.htm | |
""" | |
from __future__ import division | |
import MeCab | |
CONTENT_WORD_POS = ('名詞', '動詞', '形容詞', '副詞') | |
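The snippet above stops after defining the content-word tags. As a rough sketch of the computation it appears to be heading toward, assuming lexical density is taken as the share of content words among all tokens, the function below works on already tagged `(surface, pos)` pairs. The name `lexical_density` and the pair-based input are assumptions made for illustration; in practice the pairs would come from a MeCab parse, which is left out here so the example runs without MeCab installed.

```python
# Same tags as the snippet: noun, verb, adjective, adverb.
CONTENT_WORD_POS = ('名詞', '動詞', '形容詞', '副詞')


def lexical_density(tagged_tokens, content_pos=CONTENT_WORD_POS):
    """Return content words / all tokens for (surface, pos) pairs (hypothetical helper)."""
    tokens = list(tagged_tokens)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    content = sum(1 for _, pos in tokens if pos in content_pos)
    return content / len(tokens)


# Hand-tagged example sentence: 猫 / が / 魚 / を / 食べる
sample = [('猫', '名詞'), ('が', '助詞'), ('魚', '名詞'), ('を', '助詞'), ('食べる', '動詞')]
density = lexical_density(sample)
print(density)
```

Three of the five tokens carry a content-word tag, so the sample has a density of 3/5.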
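// Requires: import java.util.regex.Pattern;
// Matches any hiragana, katakana, or CJK unified ideograph (kanji).
private static final Pattern PAT_JAPANESE_CHARACTER = Pattern
        .compile("[\\p{IsHiragana}\\p{IsKatakana}\\p{InCJKUnifiedIdeographs}]");

// True if the token contains at least one such character. Ideographs are shared
// with Chinese, so a kanji-only token cannot be told apart from Chinese text.
private static boolean isJapanese(final String token) {
    return PAT_JAPANESE_CHARACTER.matcher(token).find();
}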
# Install Scala 2.11.7 under /usr/local/scala.
wget http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz
sudo mkdir /usr/local/scala
sudo tar xvf scala-*.tgz -C /usr/local/scala
# Single quotes keep $SCALA_HOME and $PATH unexpanded until ~/.bashrc is sourced.
echo 'export SCALA_HOME=/usr/local/scala/scala-2.11.7' >> ~/.bashrc
echo 'export PATH=$SCALA_HOME/bin:$PATH' >> ~/.bashrc
# Install Spark 1.4.0 (prebuilt for Hadoop 2.6) under /usr/local/spark.
wget ftp://ftp.kddilabs.jp/infosystems/apache/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
tar xf spark-1.4.0-bin-hadoop2.6.tgz
sudo mv spark-1.4.0-bin-hadoop2.6 /usr/local/spark
sudo yum install byobu -y --enablerepo=epel-testing
from selenium import webdriver

# Fill in the Google account credentials before running.
mail_address = ''
password = ''

# Present a desktop Firefox user agent instead of PhantomJS's default one.
UA = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0'
PHANTOMJS_ARG = {'phantomjs.page.settings.userAgent': UA}
# Note: webdriver.PhantomJS is only available in Selenium releases before 4.
driver = webdriver.PhantomJS(desired_capabilities=PHANTOMJS_ARG)
url = 'https://www.google.com/accounts/Login?hl=ja&continue=http://www.google.co.jp/'