Created April 20, 2014 23:00
Gist: sangheestyle/11127428
scratchpad
Pattern
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization. http://www.clips.ua.ac.be/pages/pattern
PyData
- PyData vimeo: http://vimeo.com/pydata
- PyData.org: http://pydata.org
Gensim
Topic modeling package written in Python
- gensim: http://radimrehurek.com/gensim
- experiments on the English Wikipedia: http://radimrehurek.com/gensim/wiki.html
- gensim source code: https://github.com/piskvorky/gensim/
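Gensim's topic models (e.g. LDA, LSI) take documents as sparse bag-of-words vectors: lists of (token_id, count) pairs. In gensim itself this conversion is done with `corpora.Dictionary` and its `doc2bow` method; the pure-Python sketch below only illustrates the format. The helper names and the id assignment (order of first appearance) are assumptions and need not match gensim's ids.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in doc:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_bow(doc, vocab):
    """Convert a token list into sparse (token_id, count) pairs sorted by id.

    Tokens missing from the vocabulary are ignored.
    """
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


docs = [["human", "computer", "interface"],
        ["computer", "system", "computer"]]
vocab = build_vocabulary(docs)
corpus = [to_bow(d, vocab) for d in docs]
```

Only tokens that actually occur in a document appear in its vector, which is why this sparse form scales to large vocabularies such as the Wikipedia experiments above.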
How to Install Accelerated BLAS Into a Python Virtualenv
Summary: Before installing numpy and scipy, do the following so that they link against an accelerated BLAS/LAPACK (here OpenBLAS in /usr/local/lib), which speeds up numpy's linear-algebra operations.
$ sudo apt-get install
$ pip uninstall numpy ## only if numpy is already installed
$ pip uninstall scipy ## only if scipy is already installed
$ export BLAS=/usr/local/lib/libopenblas.a
$ export LAPACK=/usr/local/lib/libopenblas.a
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
$ export ATLAS=
$ pip install numpy
$ pip install scipy
http://williamjohnbert.com/2012/03/how-to-install-accelerated-blas-into-a-python-virtualenv/
Check the numpy setup (shows which BLAS/LAPACK libraries numpy was built against)
$ python -c 'import numpy; numpy.show_config()'
Benchmark (time one 1000 x 1000 matrix product):
>>> import time
>>> import numpy as np
>>> A = np.random.random((1000, 1000))
>>> B = np.random.random((1000, 1000))
>>> t = time.time(); C = np.dot(A, B); print(time.time() - t)
http://www.janeriksolem.net/2009/10/is-your-numpy-using-right-atlas.html
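A single `time.time()` measurement is noisy and can include one-time setup cost. A slightly more robust sketch, assuming numpy is installed, repeats the product with `timeit`, keeps the fastest run, and converts it to GFLOP/s using the usual estimate of about 2n³ floating-point operations for an n x n matrix product. The function name `benchmark_dot` is hypothetical.

```python
import timeit

import numpy as np


def benchmark_dot(n=1000, repeats=5):
    """Time an n x n matrix product; return (best_seconds, gflops)."""
    a = np.random.random((n, n))
    b = np.random.random((n, n))
    np.dot(a, b)  # warm-up run so one-time setup cost is not timed
    # timeit.repeat returns one total time per repetition; with number=1 each
    # repetition runs the product once, so min() is the fastest single product.
    best = min(timeit.repeat(lambda: np.dot(a, b), number=1, repeat=repeats))
    # A dense n x n by n x n product needs about 2 * n**3 floating-point ops.
    gflops = 2.0 * n ** 3 / best / 1e9
    return best, gflops


if __name__ == "__main__":
    seconds, gflops = benchmark_dot()
    print("best of 5: %.4f s (%.1f GFLOP/s)" % (seconds, gflops))
```

Comparing the GFLOP/s figure before and after rebuilding numpy against OpenBLAS gives a clearer picture than one wall-clock reading.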
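Idea for implementation
Suppose we build a single object.
- Say the object is created from one folder path (the folder contains multiple documents; a single file in which each line is a document would also be fine)
- High-level, so that everything can be done in one go
- Machine learning
- Reports, etc.
- Handle everything in gensim
- What can Pattern be used for?
- Make it installable via setup.py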
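The folder-or-file idea above could be sketched as a small streaming corpus class. This is a hypothetical design, not existing code: the class name `DocumentCorpus`, the UTF-8 default, and the naive lowercase whitespace tokenizer are all assumptions. Because it yields one token list per document lazily, it could be handed to gensim (for example `corpora.Dictionary(corpus)`) without loading every document into memory.

```python
import os


class DocumentCorpus:
    """Hypothetical corpus object built from a single path.

    - If `path` is a directory, every regular file in it is one document.
    - If `path` is a file, every non-empty line is one document.
    Documents are yielded lazily as lists of lowercase tokens.
    """

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        self.encoding = encoding

    def _raw_documents(self):
        if os.path.isdir(self.path):
            # One document per file; sorted for a stable order.
            for name in sorted(os.listdir(self.path)):
                full = os.path.join(self.path, name)
                if os.path.isfile(full):
                    with open(full, encoding=self.encoding) as f:
                        yield f.read()
        else:
            # One document per non-empty line.
            with open(self.path, encoding=self.encoding) as f:
                for line in f:
                    if line.strip():
                        yield line

    def __iter__(self):
        for text in self._raw_documents():
            # Naive whitespace tokenizer (assumption); Pattern or gensim
            # preprocessing could replace this step.
            yield text.lower().split()
```

Since `__iter__` reopens the source each time, the corpus can be iterated repeatedly, which multi-pass training in gensim relies on.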
Pythonic Perambulations
Stemming and lemmatization
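What is the difference between stemming and lemmatization? Per the chapter linked below, stemming is a crude heuristic that chops off word endings, while lemmatization uses a vocabulary and morphological analysis to return the base or dictionary form of a word (the lemma). In this case, lemmatization seems to be the more appropriate approach for analyzing texts.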
http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
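A toy comparison of the two ideas, for illustration only: `crude_stem` is a made-up suffix stripper (not the Porter algorithm), and the `LEMMAS` table is a tiny hypothetical lookup standing in for a real vocabulary plus morphological analysis.

```python
# Toy suffix-stripping "stemmer": removes the first matching suffix.
SUFFIXES = ("ational", "ing", "ies", "ed", "es", "s")


def crude_stem(word):
    for suffix in SUFFIXES:
        # Keep at least two characters of stem so short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


# Toy lemmatizer: hypothetical hand-made table of inflected form -> lemma.
LEMMAS = {"saw": "see", "better": "good", "studies": "study", "running": "run"}


def toy_lemmatize(word):
    # Words not in the table are returned unchanged.
    return LEMMAS.get(word, word)


for w in ["studies", "running", "better", "saw"]:
    print(w, crude_stem(w), toy_lemmatize(w))
```

The stemmer produces non-words such as "stud" and "runn" and cannot relate "better" to "good", while the lookup returns dictionary forms; that gap is why lemmatization suits text analysis better here, at the cost of needing real lexical resources.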