@sangheestyle
Created April 20, 2014 23:00
scratchpad
Stemming and lemmatization

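What is the difference between stemming and lemmatization? Stemming chops word endings off with crude heuristics, while lemmatization uses a vocabulary and morphological analysis to return the dictionary form of a word. In this case, lemmatization seems to be the more appropriate approach for analyzing texts.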

http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
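
To make the contrast concrete, here is a toy sketch in plain Python. It is not how any real library works: the suffix list and the lookup table are hypothetical stand-ins, chosen only to show that a stemmer may produce non-words while a lemmatizer maps to dictionary forms.

```python
# Toy illustration only: real stemmers (e.g. Porter) and real lemmatizers
# (lexicon- and part-of-speech-based) are far more elaborate.

# Hypothetical, tiny rule set for the crude stemmer.
SUFFIXES = ("ies", "ing", "ed", "es", "s")

# Hypothetical hand-made lookup table standing in for a real lexicon.
LEMMAS = {
    "studies": "study",
    "running": "run",
    "better": "good",
    "saw": "see",
}


def crude_stem(word):
    """Strip the first matching suffix without checking the result is a word."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def crude_lemmatize(word):
    """Look the word up in the table; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


for w in ["studies", "running", "better", "saw"]:
    print(w, "->", crude_stem(w), "/", crude_lemmatize(w))
```

With these assumed rules, "studies" stems to "stud" but lemmatizes to "study", and irregular forms such as "better" and "saw" are only handled by the lookup.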

Pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization. http://www.clips.ua.ac.be/pages/pattern

https://github.com/clips/pattern

PyData

Gensim

A topic modeling package written in Python.

How to Install Accelerated BLAS Into a Python Virtualenv

Summary: To speed up numpy computations, do the following before installing numpy and scipy so that both are built against OpenBLAS.

$ sudo apt-get install 
$ pip uninstall numpy ## only if numpy is already installed
$ pip uninstall scipy ## only if scipy is already installed
$ export BLAS=/usr/local/lib/libopenblas.a
$ export LAPACK=/usr/local/lib/libopenblas.a
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
$ export ATLAS=
$ pip install numpy
$ pip install scipy

http://williamjohnbert.com/2012/03/how-to-install-accelerated-blas-into-a-python-virtualenv/

Check numpy setup

$ python -c 'import numpy; numpy.show_config()'

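Benchmark

>>> import time
>>> import numpy as np
>>> A = np.random.random((1000, 1000))
>>> B = np.random.random((1000, 1000))
>>> # time one 1000x1000 matrix product; assign the result so the REPL does not print it
>>> t = time.time(); C = np.dot(A, B); print(time.time() - t)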

http://www.janeriksolem.net/2009/10/is-your-numpy-using-right-atlas.html

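Idea for implementation

Suppose we build a single object.

  • The object is created from a folder path (the folder contains multiple documents; a single file in which each line is a document would also work)
  • High-level, so that everything can be done in one shot
  • Machine learning
  • Reports, etc.
  • Handle all the processing in gensim
  • What can pattern be used for?
  • Installable via setup.py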
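
A minimal sketch of what such an object might look like, using only the standard library. The class name `Corpus` and the methods `tokens` and `report` are hypothetical, invented here for illustration. The whitespace tokenizer is a placeholder for real preprocessing, and the machine learning step is left out.

```python
import os


class Corpus:
    """Hypothetical container built from a single path.

    If the path is a folder, every file in it is one document.
    If the path is a file, every non-empty line is one document.
    """

    def __init__(self, path):
        self.path = path
        self.documents = list(self._load(path))

    @staticmethod
    def _load(path):
        if os.path.isdir(path):
            # Sorted so the document order is reproducible.
            for name in sorted(os.listdir(path)):
                full = os.path.join(path, name)
                if os.path.isfile(full):
                    with open(full, encoding="utf-8") as handle:
                        yield handle.read()
        else:
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    line = line.strip()
                    if line:
                        yield line

    def tokens(self):
        """Lowercased whitespace tokens per document (placeholder preprocessing)."""
        return [doc.lower().split() for doc in self.documents]

    def report(self):
        """Return a tiny summary; a real report would cover much more."""
        token_lists = self.tokens()
        return {
            "documents": len(self.documents),
            "tokens": sum(len(t) for t in token_lists),
        }
```

The list of token lists returned by `tokens()` is the shape that gensim's `corpora.Dictionary` accepts, so the gensim step could plausibly start from there.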

Pythonic Perambulations
