@neymarsabin
Last active July 25, 2017 14:09
Some references for finding similarity between documents:
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the natural language toolkit. Sebastopol, CA: O’Reilly Media.
Liang, H. (2014). Coevolution of political discussion and common ground in web discussion forum. Social Science Computer Review, 32, 155-169. doi:10.1177/0894439313506844
Pang, B., & Lee, L. (2004). Sentiment polarity dataset version 2.0 [Data file]. Retrieved from http://www.nltk.org/nltk_data/
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., . . . Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830. http://www.jmlr.org/papers/v12/pedregosa11a.html
Perone, C. S. (September 18, 2011a). Machine learning :: Text feature extraction (tf-idf) – Part I [Blog]. Retrieved from http://blog.christianperone.com/2011/09/machine-learning-text-feature-extraction-tf-idf-part-i/
Perone, C. S. (October 3, 2011b). Machine learning :: Text feature extraction (tf-idf) – Part II [Blog]. Retrieved from http://blog.christianperone.com/2011/10/machine-learning-text-feature-extraction-tf-idf-part-ii/
Perone, C. S. (September 12, 2013). Machine learning :: Cosine similarity for vector space models (Part III) [Blog]. Retrieved from http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/
Taken from: https://sites.temple.edu/tudsc/2017/03/30/measuring-similarity-between-texts-in-python/