π¨βπ»
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
using DataFrames | |
using CSV | |
using Gadfly | |
using TextAnalysis | |
using MLJ | |
using Chain | |
using Pipe | |
using StableRNGs |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
df = CSV.read("spam_dataset.csv", DataFrames.DataFrame) | |
first(df, 10) |> pretty |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
df = @chain df begin | |
DataFrames.transform(:Message => ByRow(x -> StringDocument(x)) => :Message2) | |
end |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
remove_case!.(df[:, :Message2]) | |
prepare!.(df[:, :Message2], strip_html_tags| strip_punctuation| strip_numbers) | |
stem!.(df[:, :Message2]) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
crps = Corpus(df[:, :Message2]) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
update_lexicon!(crps) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
m = DocumentTermMatrix(crps) | |
tfidf_mat = tf_idf(m) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
X, y = tfidf_mat, df[:, :Category] |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree | |
tree_model = DecisionTreeClassifier() |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
tree = machine(tree_model, coerce(X, Continuous), coerce(y, Multiclass)) |