Titipat Achakulvisut titipata

titipata / bashscript.md
Last active August 29, 2015 14:13
Notes on some bash commands that I use many times a day and still forget

Notes on bash script

  • wc -l counts the number of lines in its input. Piped from ls -1, it counts the entries in the current directory:
ls -1 | wc -l
  • grep is a command-line utility that searches the input files for lines matching a pattern. See the history of grep for more. The name is easy to remember as g|re|p (global regular expression print) :)
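
The two commands above can be tried together in a throwaway directory. This is only a minimal sketch: the file names and their contents are hypothetical sample data, not part of the original notes.

```shell
# Work in a temporary directory so the counts are predictable.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files.
printf 'alpha\nbeta\ngamma\n' > a.txt
printf 'beta\ndelta\n' > b.txt
printf 'epsilon\n' > c.txt

# ls -1 prints one entry per line, so wc -l gives the number of entries.
# tr strips the padding that some wc implementations add.
file_count=$(ls -1 | wc -l | tr -d ' ')

# grep -c counts the matching lines in a file.
beta_lines=$(grep -c 'beta' a.txt)

# grep -l lists the files that contain the pattern; wc -l counts them.
beta_files=$(grep -l 'beta' *.txt | wc -l | tr -d ' ')

echo "entries: $file_count, 'beta' lines in a.txt: $beta_lines, files with 'beta': $beta_files"
```

The same pipe pattern works with any command that prints one item per line.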
titipata / amazon.md
Created January 20, 2015 18:22
Quick note on amazon

Amazon EC2 Access

To access an Amazon EC2 cluster running Ubuntu, simply type:

ssh -i <path_to_keypair.pem> ubuntu@<ip>

To start ipython notebook

titipata / git_rebase.md
Last active August 29, 2015 14:15
Rebase!

Memo for Rebase

If we are behind the remote repository, first we need to fetch from the remote repository.

git fetch

Then we commit the changes we have made in the local repository and rebase our master onto the fetched origin branch (so that our commits come after the most recently fetched commits), i.e.
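
The fetch, commit, rebase sequence can be rehearsed end to end in temporary repositories. This is a sketch under stated assumptions, not the memo's own commands: the repository layout, file names, commit messages, and the branch name master with a remote called origin are all hypothetical.

```shell
set -e
work=$(mktemp -d)

# Identity used only for these throwaway commits.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Our local repository with one base commit on master.
git init -q "$work/mine"
git -C "$work/mine" symbolic-ref HEAD refs/heads/master
echo base > "$work/mine/base.txt"
git -C "$work/mine" add base.txt
git -C "$work/mine" commit -q -m "base"

# A bare copy stands in for the remote repository.
git clone -q --bare "$work/mine" "$work/remote.git"
git -C "$work/mine" remote add origin "$work/remote.git"

# Someone else pushes a commit, so our repository falls behind.
git clone -q "$work/remote.git" "$work/other"
echo remote > "$work/other/remote.txt"
git -C "$work/other" add remote.txt
git -C "$work/other" commit -q -m "remote change"
git -C "$work/other" push -q origin master

# 1. Fetch from the remote repository.
git -C "$work/mine" fetch -q
# 2. Commit our local changes.
echo local > "$work/mine/local.txt"
git -C "$work/mine" add local.txt
git -C "$work/mine" commit -q -m "local change"
# 3. Replay our commit on top of the fetched branch.
git -C "$work/mine" rebase -q origin/master

# Newest first: our commit now sits after the fetched one.
git -C "$work/mine" log --format=%s
```

After the rebase the history is linear, so a later push to origin does not need a merge commit.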

titipata / gg_cloud.md
Last active August 29, 2015 14:15
access Google Cloud

Google Cloud

First, we need to install the Cloud SDK:

curl https://sdk.cloud.google.com | bash

To access Google Cloud, simply type:

ssh -i ~/.ssh/google_compute_engine <user>@<ip>
titipata / django_ec2.md
Last active August 29, 2015 14:15
how to access django on Amazon cluster

Access Django on Amazon EC2

To run the Django development server on EC2, use:

python manage.py runserver <public_dns>:8000

Then access it from a local browser at:

http://<public_dns>:8000/
titipata / github_hipster.md
Last active January 19, 2016 19:53
Add this line to .bash_profile or .bashrc

Show GitHub Branch in Terminal (Mac OS X)

First, download git-prompt.sh and git-completion.bash:

curl https://raw.githubusercontent.com/git/git/master/contrib/completion/git-prompt.sh -o ~/.git-prompt.sh
curl -OL http://github.com/git/git/raw/master/contrib/completion/git-completion.bash

Then rename git-completion.bash to .git-completion.bash:
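
One common way to wire the downloaded scripts into the shell is sketched below. This is a hypothetical ~/.bashrc (or ~/.bash_profile) fragment, not the author's exact lines: it assumes git-prompt.sh was saved as ~/.git-prompt.sh and that the completion script ended up at ~/.git-completion.bash, and the prompt layout is only an illustration. __git_ps1 is the function that git-prompt.sh defines.

```shell
# Hypothetical ~/.bashrc fragment; both paths are assumptions.

# Load tab completion for git commands, if the renamed file is present.
if [ -f "$HOME/.git-completion.bash" ]; then
    . "$HOME/.git-completion.bash"
fi

# Load __git_ps1 and show the current branch in the prompt.
if [ -f "$HOME/.git-prompt.sh" ]; then
    . "$HOME/.git-prompt.sh"
    # Single quotes delay $(...) so the branch is looked up at every prompt.
    PS1='\u@\h \W$(__git_ps1 " (%s)") \$ '
fi
```

Open a new terminal, or source the file, for the prompt to take effect.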

titipata / happy_pi.ipynb
Last active August 29, 2015 14:17
Pi day!
titipata / df_to_json.md
Last active August 29, 2015 14:17
Convert dataframe to json

Convert Pubmed dataframe to json

After applying the pubmed parser to the Pubmed Open Access dataset, I wanted to create a JSON file with a dictionary structure, i.e. {{col_name1: "", col_name2: ""}, {col_name1: "", col_name2: ""}, ...}

So I select only a subset of my dataframe, remove the unused column, and use orient='index' to convert the dataframe to a JSON file.

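import numpy as np
import pandas as pd

pubmed_map = pd.read_csv('pubmed_city_cartodb.csv')
n_sel = np.random.randint(0, len(pubmed_map), size=100)  # 100 random row positions (with replacement)
pubmed_map = pubmed_map.drop('Unnamed: 0', axis=1)  # drop the leftover index column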

Run PySpark from Amazon EC2

Here is a suggestion on how to run PySpark from Amazon EC2:

IPYTHON_OPTS="notebook --ip=* --no-browser" ~/spark-1.2.0-bin-hadoop1/bin/pyspark --master local[4] --driver-memory 4g --executor-memory 4g

For help, run:

spark-1.2.0-bin-hadoop1/bin/pyspark --help
titipata / gensim_word2vec.md
Last active August 29, 2015 14:17
Gensim note for training word2vec model

Training word2vec using gensim

Train word2vec model

# A list (not a one-shot map iterator) so the sentences can be iterated more than once
full_text = [text.split() for text in preprocess_text]  # each element is a list of words in a sentence

num_features = 500   # Word vector dimensionality
min_word_count = 40  # Minimum word count
num_workers = 4      # Number of threads to run in parallel