@Abhayparashar31
Last active October 17, 2022 15:27
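import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download('punkt')
nltk.download('stopwords')

# cleaned_data is assumed to be the preprocessed text string from an earlier step.
sent_tokens = sent_tokenize(cleaned_data)
word_tokens = word_tokenize(cleaned_data)

# Count how often each non-stopword token occurs.
word_frequency = {}
stop_words = set(stopwords.words("english"))  # separate name so the module is not shadowed
for word in word_tokens:
    if word not in stop_words:
        if word not in word_frequency:
            word_frequency[word] = 1
        else:
            word_frequency[word] += 1

# Normalize by the highest raw count (assumed definition of maximum_frequency,
# which the original snippet uses without defining).
maximum_frequency = max(word_frequency.values())
for word in word_frequency:
    word_frequency[word] = word_frequency[word] / maximum_frequency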
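
The counting and normalization above can also be tried without NLTK or any downloads. The following is a minimal sketch, not the gist author's code: `normalized_word_frequency` and `SAMPLE_STOPWORDS` are hypothetical names, the regex tokenizer is a rough stand-in for `word_tokenize`, and the text is lowercased before filtering (the snippet above does not do this). It assumes `maximum_frequency` means the highest raw word count.

```python
import re
from collections import Counter

# Hypothetical stand-in for NLTK's English stopword list.
SAMPLE_STOPWORDS = {"the", "a", "is", "on", "and"}


def normalized_word_frequency(text, stop_words=SAMPLE_STOPWORDS):
    """Return {word: count / max_count} for non-stopword tokens in text."""
    # Rough tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    if not counts:
        # Nothing left after filtering; avoid calling max() on an empty sequence.
        return {}
    maximum_frequency = max(counts.values())
    return {word: count / maximum_frequency for word, count in counts.items()}


freqs = normalized_word_frequency("The cat sat on the mat and the cat slept")
print(freqs)
```

Here "cat" occurs twice and every other kept word once, so "cat" maps to 1.0 and the rest to 0.5. The empty-input guard matters because `max()` raises `ValueError` on an empty sequence.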