IKEGAMI Yukino ikegami-yukino

@ikegami-yukino
ikegami-yukino / mac_word2vec_install.sh
Last active May 28, 2019 19:41
Install word2vec on Mac OS X later than 10.9
pushd . &> /dev/null
cd /tmp
git clone --depth=1 https://github.com/tmikolov/word2vec
cd word2vec
sed -i -e 's/malloc.h/stdlib.h/g' *.c
make
rm *.c* *.txt makefile LICENSE
cp * /usr/local/bin
popd &> /dev/null
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git /tmp/mecab-ipadic-neologd
bash /tmp/mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -n -y
rm -rf /tmp/mecab-ipadic-neologd
ikegami-yukino / simple_crawler.py
Last active August 29, 2015 14:26
A simple, low-effort crawler
# -*- coding: utf-8 -*-
import os
import re
from encodings.aliases import aliases
import nkf
import tornado
from tornado import httpclient, gen
{
"401(K)s": ["Finance", "Investing", "Retirement Investments", "401(K)s"],
"Accommodations": ["Travel & Tourism", "Accommodations"],
"Accounting & Auditing": ["Finance", "Accounting & Auditing"],
"Acne": ["Health", "Health Conditions & Concerns", "Skin Conditions & Skin Health", "Acne"],
"Air Travel": ["Travel & Tourism", "Air Travel"],
"Airline Tickets, Fares & Flights": ["Travel & Tourism", "Air Travel", "Airline Tickets, Fares & Flights"],
"Alternative & Natural Medicine": ["Health", "Health Care Services", "Alternative & Natural Medicine"],
"Anti-Aging": ["Beauty & Personal Care", "Anti-Aging"],
"Anti-Virus Software": ["Computers", "Software", "Internet Software & Web Goodies", "Network Security Software", "Anti-Virus Software"],
ikegami-yukino / fb_categories.json
Created August 4, 2015 07:43
Facebook page category list
{
"Airline": "Airline Industry Services",
"American Restaurant": "New American Restaurant",
"Amusement Park Ride": "Roller Coaster",
"Amusement": "Arcade",
"Amusement": "Bingo Hall",
"Amusement": "Go Karting",
"Amusement": "Laser Tag",
"Antiques & Vintage": "Antique Store",
"Antiques & Vintage": "Auction House",
ikegami-yukino / japanese_lexical_density.py
Created July 31, 2015 05:00
Japanese Lexical Density
"""
Lexical Density
http://web.archive.org/web/20110810174351/http://www.unisanet.unisa.edu.au/Resources/la/Readability/Content%20words%20and%20lexical%20density.htm
"""
from __future__ import division
import MeCab
# MeCab part-of-speech tags for content words: noun, verb, adjective, adverb
CONTENT_WORD_POS = ('名詞', '動詞', '形容詞', '副詞')
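The preview stops before the computation itself. As a minimal sketch (not the gist's actual implementation, which is not shown here), lexical density is commonly taken as the share of content words among all tokens. The helper name `lexical_density` and the pre-tagged `(surface, pos)` input are assumptions made for illustration; in practice the pairs would come from a MeCab parse.

```python
# Hypothetical helper; the POS tuple mirrors CONTENT_WORD_POS from the gist.
CONTENT_WORD_POS = ('名詞', '動詞', '形容詞', '副詞')  # noun, verb, adjective, adverb


def lexical_density(tagged_tokens):
    """Return the fraction of tokens whose POS is a content-word class.

    tagged_tokens: iterable of (surface, pos) pairs, assumed to have been
    produced beforehand by a morphological analyzer such as MeCab.
    """
    tokens = list(tagged_tokens)
    if not tokens:
        # Avoid division by zero on empty input.
        return 0.0
    content = sum(1 for _, pos in tokens if pos in CONTENT_WORD_POS)
    return content / len(tokens)


# Hand-tagged tokens (assumed tagging, for illustration only).
sample = [('猫', '名詞'), ('が', '助詞'), ('速く', '形容詞'), ('走る', '動詞')]
print(lexical_density(sample))
```

Taking pre-tagged pairs keeps the arithmetic separate from the tokenizer, so the ratio can be checked without a MeCab installation.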
ikegami-yukino / japanese_matcher.java
Created July 28, 2015 02:27
Japanese character matcher for Java 8
private static final Pattern PAT_JAPANESE_CHARACTER = Pattern
.compile("[\\p{IsHiragana}\\p{IsKatakana}\\p{InCJKUnifiedIdeographs}]");
private static boolean isJapanese(final String token) {
return PAT_JAPANESE_CHARACTER.matcher(token).find();
}
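Because the Java snippet calls `find()`, a token counts as Japanese as soon as it contains one matching character. A rough Python counterpart might look like the sketch below. Python's standard `re` module has no `\p{...}` classes, so this version approximates the Java pattern with explicit Unicode block ranges; the ranges and the function name `is_japanese` are assumptions of this sketch, not part of the gist.

```python
import re

# Approximation by Unicode block ranges (an assumption of this sketch):
# Hiragana U+3040-309F, Katakana U+30A0-30FF, CJK Unified Ideographs U+4E00-9FFF.
PAT_JAPANESE_CHARACTER = re.compile(r'[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]')


def is_japanese(token):
    """Return True if token contains at least one character in the ranges above."""
    return PAT_JAPANESE_CHARACTER.search(token) is not None


for word in ('こんにちは', 'カタカナ', '漢字', 'hello'):
    print(word, is_japanese(word))
```

CJK Unified Ideographs are shared with Chinese text, so a match on that range alone does not prove a token is Japanese.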
ikegami-yukino / install_ubuntu_spark.sh
Created June 25, 2015 10:41
Install Scala 2.11.7 and Spark 1.4.0 with Hadoop 2.6 on Ubuntu 14.04
wget http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz
sudo mkdir /usr/local/scala
sudo tar xvf scala-*.tgz -C /usr/local/scala
# Single quotes keep $SCALA_HOME and $PATH unexpanded until ~/.bashrc is sourced
echo 'export SCALA_HOME=/usr/local/scala/scala-2.11.7' >> ~/.bashrc
echo 'export PATH=$SCALA_HOME/bin:$PATH' >> ~/.bashrc
wget ftp://ftp.kddilabs.jp/infosystems/apache/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
tar xf spark-1.4.0-bin-hadoop2.6.tgz
sudo mv spark-1.4.0-bin-hadoop2.6 /usr/local/spark
ikegami-yukino / install_byobu_yum.sh
Last active February 11, 2020 07:14
Install byobu on CentOS and Amazon Linux
sudo yum install byobu -y --enablerepo=epel-testing
ikegami-yukino / google_login.py
Created June 12, 2015 09:26
Automatic Google login with Selenium
mail_address = ''
password = ''
from selenium import webdriver
UA = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0'
PHANTOMJS_ARG = {'phantomjs.page.settings.userAgent': UA}
driver = webdriver.PhantomJS(desired_capabilities=PHANTOMJS_ARG)
url = 'https://www.google.com/accounts/Login?hl=ja&continue=http://www.google.co.jp/'