@4OH4
4OH4 / tfidf_basic.py
Created March 29, 2020 09:36
Basic TF-IDF model using scikit-learn
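from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

search_terms = 'fruit and vegetables'
documents = ['cars drive on the road', 'tomatoes are actually fruit']

# Fit one vocabulary over the query and the documents; row 0 is the query
doc_vectors = TfidfVectorizer().fit_transform([search_terms] + documents)
# Rows are L2-normalised by default, so the dot product is the cosine similarity
cosine_similarities = linear_kernel(doc_vectors[0:1], doc_vectors).flatten()
# Drop the query's similarity with itself and convert to plain Python floats
document_scores = [item.item() for item in cosine_similarities[1:]]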
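The snippet above uses `linear_kernel` (a plain dot product) to obtain cosine similarities. That works because `TfidfVectorizer` L2-normalises each row by default (`norm='l2'`). A minimal standard-library sketch of the underlying identity follows; the vectors and helper names are made up for illustration and are not real TF-IDF weights.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalise(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalised) vectors."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


# Hypothetical term weights standing in for a query row and a document row.
query = [1.0, 2.0, 0.0]
doc = [2.0, 1.0, 1.0]

dot_of_normalised = dot(l2_normalise(query), l2_normalise(doc))
cosine_of_raw = cosine(query, doc)

# The two values agree up to floating-point rounding.
print(math.isclose(dot_of_normalised, cosine_of_raw))
```

If the vectorizer were created with `norm=None`, the rows would no longer have unit length and the dot product would stop being a cosine similarity.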
4OH4 / database.py
Last active November 15, 2019 22:13
Doctest case embedded in the class documentation for a DAO
import os
import sqlite3

class DAO(object):
    """
    SQLite3 Data Access Object

    Usage:
    >>> dao = DAO('example.db')
    Database connection initialised
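The preview above is cut off before the class body, so the constructor is not shown. As a rough sketch of how a doctest embedded in a class docstring can be run, the example below assumes a constructor that opens the connection and prints the message from the docstring. The `__init__` body, the `conn` attribute, and the use of an in-memory database (`':memory:'` instead of `'example.db'`) are assumptions, not the gist's actual code.

```python
import doctest
import sqlite3


class DAO(object):
    """
    SQLite3 Data Access Object (hypothetical sketch)

    Usage:
    >>> dao = DAO(':memory:')
    Database connection initialised
    """

    def __init__(self, path):
        # Assumption: the constructor connects and reports success.
        self.conn = sqlite3.connect(path)
        print('Database connection initialised')


# Parse the examples out of the class docstring and run them explicitly,
# so the check does not depend on how this module was loaded.
parser = doctest.DocTestParser()
test = parser.get_doctest(DAO.__doc__, {'DAO': DAO}, 'DAO', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results)
```

When the class lives in a regular module file, calling `doctest.testmod()` or running `python -m doctest database.py` is the more common way to run the same check.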
4OH4 / nltk.py
Created February 2, 2019 15:20
Initialise NLTK
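# Note: if this file is saved as nltk.py, `import nltk` resolves to the file
# itself rather than the library; use a different filename when running it.
import nltk

# Fetch the WordNet corpus (a no-op if it is already up to date)
nltk.download('wordnet')
from nltk.corpus import wordnet
print(wordnet.get_version())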