wc
Counts the number of lines (with `-l`). For example, to count the files in the current directory:
ls -1 | wc -l
grep
A command-line utility that searches the input files for a pattern. See more in the history of grep. It's intuitive if you remember it as g|re|p (global regular expression print).
:)
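For instance, searching a file for a pattern, case-insensitively (the file name and contents below are made up for illustration):

```shell
# Create a sample file, then search it for "error", ignoring case
printf 'ok\nError: disk full\nok\n' > sample.log
grep -i 'error' sample.log
# prints: Error: disk full
```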
##Access Django on Amazon EC2
To run the development server on EC2, bind it to the instance's public DNS:
python manage.py runserver <public_dns>:8000
Then access it from a local browser at:
http://<public_dns>:8000/
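Depending on the setup, if `DEBUG` is off, the public DNS may also need to be listed in `ALLOWED_HOSTS` in settings.py before the page will load (the hostname below is a placeholder):

```python
# settings.py -- allow requests addressed to the EC2 public DNS
ALLOWED_HOSTS = ['ec2-12-34-56-78.compute-1.amazonaws.com']
```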
First, download git-completion.bash and git-prompt.sh:
curl https://raw.githubusercontent.com/git/git/master/contrib/completion/git-prompt.sh -o ~/.git-prompt.sh
curl -OL http://github.com/git/git/raw/master/contrib/completion/git-completion.bash
Then rename git-completion.bash to .git-completion.bash:
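The rename, plus the usual step of sourcing both scripts at shell startup, can be sketched as follows (assuming the files were downloaded to the current directory; ~/.bash_profile is one common place to source them):

```shell
# Rename the completion script to a dotfile in the home directory
mv git-completion.bash ~/.git-completion.bash

# Load both scripts at shell startup
echo 'source ~/.git-completion.bash' >> ~/.bash_profile
echo 'source ~/.git-prompt.sh' >> ~/.bash_profile
```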
After applying pubmed_parser to the PubMed Open Access dataset, I want to create a JSON file with a nested dictionary structure, i.e. {index: {col_name1: "", col_name2: ""}, ...}
So I select a random subset of my dataframe, remove the unused column, and use orient='index'
to convert the dataframe to a JSON file:
import numpy as np
import pandas as pd

pubmed_map = pd.read_csv('pubmed_city_cartodb.csv')
pubmed_map = pubmed_map.drop('Unnamed: 0', axis=1)  # remove the leftover index column
n_sel = np.random.randint(0, len(pubmed_map), size=100)  # random subset of 100 rows
pubmed_map.iloc[n_sel].to_json('pubmed_subset.json', orient='index')
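As a quick sanity check of what orient='index' produces, here is the conversion on a toy dataframe (the column names are placeholders):

```python
import json

import pandas as pd

# Toy frame standing in for the PubMed subset
df = pd.DataFrame({'col_name1': ['a', 'b'], 'col_name2': ['c', 'd']})

# orient='index' keys the JSON by row label: {"0": {"col_name1": "a", ...}, "1": {...}}
out = json.loads(df.to_json(orient='index'))
print(out['0'])  # {'col_name1': 'a', 'col_name2': 'c'}
```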
Here is a suggestion on how to run a PySpark notebook from Amazon EC2:
IPYTHON_OPTS="notebook --ip=* --no-browser" ~/spark-1.2.0-bin-hadoop1/bin/pyspark --master local[4] --driver-memory 4g --executor-memory 4g
For help, we can do something like:
spark-1.2.0-bin-hadoop1/bin/pyspark --help
##Training word2vec using gensim
###Train word2vec model
full_text = [sentence.split() for sentence in preprocess_text] # each element is a list of words in a sentence
num_features = 500 # Word vector dimensionality
min_word_count = 40 # Minimum word count
num_workers = 4 # Number of threads to run in parallel