Accuracy of SGDClassifier (support vector machine - SVM) - 91.27829560585884
Accuracy (after tuning) of SGDClassifier (support vector machine - SVM) - 93.40878828229027
Grid Search best score -
0.974302171023
Grid Search best parameters -
{'clf__alpha': 0.0001,
 'tfidf__smooth_idf': False,
Accuracy of MultinomialNB (naive Bayes) - 83.48868175765645
Accuracy (after tuning) of MultinomialNB (naive Bayes) - 93.27563249001332
Grid Search best score -
0.979618963226
Grid Search best parameters -
{'clf__alpha': 0.001,
 'clf__fit_prior': False,
Accuracy of MultinomialNB (naive Bayes) - 83.48868175765645
Accuracy (after tuning) of MultinomialNB (naive Bayes) - 88.34886817576565
Grid Search best score -
0.945059813912
Grid Search best parameters -
{'clf__alpha': 0.5,
 'clf__fit_prior': False,
#!/usr/bin/env python3
# encoding: utf-8
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn import metrics
import numpy as np
Accuracy of SGDClassifier (support vector machine - SVM) - 91.27829560585884
Accuracy (after tuning) of SGDClassifier (support vector machine - SVM) - 91.27829560585884
Grid Search best score -
0.965440850687
Grid Search best parameters -
{'clf__alpha': 0.001, 'tfidf__use_idf': True, 'vect__ngram_range': (1, 1)}
Metrics classification report
              precision    recall  f1-score   support
 alt.atheism       0.95      0.81      0.87       319
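The parameter names in the output above (`vect__ngram_range`, `tfidf__use_idf`, `clf__alpha`) follow scikit-learn's `<step name>__<parameter>` convention for pipelines. The sketch below shows how a grid search over a three-step pipeline could produce output of that shape. It is only an illustration: the toy corpus, the step names chosen to match the keys, the grid values, and `cv=2` are all assumptions, not the author's actual script or data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the 20 newsgroups data.
docs = [
    "the rocket launch reached orbit",
    "nasa plans a new space mission",
    "the shuttle docked with the station",
    "astronauts trained for the moon landing",
    "the goalie stopped the puck",
    "the hockey team won the playoff game",
    "he scored a goal in overtime",
    "the season opener drew a big crowd on the ice",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Step names are assumed; they are what the "vect__", "tfidf__" and
# "clf__" prefixes in the parameter grid refer to.
pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    # With its default hinge loss, SGDClassifier trains a linear SVM.
    ("clf", SGDClassifier(random_state=42)),
])

# Assumed grid values, chosen only to exercise each prefix once.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": [True, False],
    "clf__alpha": [1e-2, 1e-3],
}

# cv=2 only because the toy corpus is tiny.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(docs, labels)

print("Grid Search best score -")
print(search.best_score_)
print("Grid Search best parameters -")
print(search.best_params_)
```

`best_score_` is the mean cross-validated score of the best parameter combination, reported as a fraction. That would explain why it is on a different scale from the accuracy lines, which appear to be percentages.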
#!/usr/bin/env python3
# encoding: utf-8
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn import metrics
import numpy as np
hilarious_text = [
    "Never interrupt your opponent while he's making a mistake.",
    "Sarcasm helps keep people from understanding you're saying what you really think of them.",
    "I once prayed to God for a bike, but quickly found out He didn't work that way—so I stole a bike and prayed for His forgiveness.",
    'A train station is where the train stops. A bus station is where the bus stops. On my desk, I have a work station...',
    "You can't be late until you show up.",
    "War doesn't determine who's right—it determines who's left.",
    "If you think things can't get worse, it's probably only because you lack sufficient imagination.",
    'Parents spend the first part of our lives teaching us to walk and talk and the rest of it telling us to sit down and shut up.',
    'Expecting the world to treat you fairly because you are good is like expecting the bull not to charge because you are a vegetarian.',
    "Books have knowledge, knowledge is power, power corrupts, corruption is a crime, and crime doesn't pay. So if you keep reading, you'll
def linkify(html):
    check = ['https://', 'http://', 'www.', '.jpg', '.png', '.jpeg', '.gif']
    html_source = html
    replace = {}
    text = []
    for letter in html:
        if letter == '>':
            start = True
            continue
        elif letter == '<':
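The `linkify` preview stops here, so the rest of its character-by-character logic is not shown. The sketch below is a separate illustration and not the author's implementation. It assumes the goal is to wrap bare URLs that appear in text between tags in `<a>` elements, and it swaps the manual scan for regular expressions. The names `linkify_sketch`, `URL_RE` and `TAG_RE` are hypothetical, and the image extensions in `check` are not handled.

```python
import re

# Matches bare URLs that begin with http://, https://, or www.
URL_RE = re.compile(r'(?:https?://|www\.)[^\s<>"]+')
# Splits HTML into tags and the text between them; the capture group
# makes re.split keep the tags in the result.
TAG_RE = re.compile(r'(<[^>]+>)')


def linkify_sketch(html):
    """Wrap bare URLs found between tags in <a> elements (hypothetical helper)."""
    def to_anchor(match):
        url = match.group(0)
        # Assumption: scheme-less "www." links should point to http://.
        href = url if url.startswith('http') else 'http://' + url
        return '<a href="{0}">{1}</a>'.format(href, url)

    parts = TAG_RE.split(html)
    # With one capture group, odd indices hold tags and even indices hold text.
    return ''.join(
        part if index % 2 else URL_RE.sub(to_anchor, part)
        for index, part in enumerate(parts)
    )


print(linkify_sketch('<p>see www.example.com now</p>'))
```

URLs inside tag attributes, such as an existing `href` or `src`, are left untouched because tags are skipped. Two known limitations of this sketch: a URL that is already the visible text of an `<a>` element would be wrapped a second time, and trailing punctuation directly after a URL is treated as part of it.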
# you can write to stdout for debugging purposes, e.g.
# print "this is a debug message"
def solution(A):
    # write your code in Python 2.7
    # input validation
    # empty input
    if not len(A):
        return 1