Adityam Ghosh lucifermorningstar1305

@lucifermorningstar1305
lucifermorningstar1305 / pkg_import.jl
Created July 29, 2022 07:58
Import packages in Julia
using DataFrames
using CSV
using Gadfly
using TextAnalysis
using MLJ
using Chain
using Pipe
using StableRNGs
lucifermorningstar1305 / read_data.jl
Created July 29, 2022 08:13
Read a CSV file in Julia
df = CSV.read("spam_dataset.csv", DataFrames.DataFrame)
first(df, 10) |> pretty
lucifermorningstar1305 / transform_data.jl
Created July 29, 2022 12:00
Transform a particular column of a Julia DataFrame
df = @chain df begin
    DataFrames.transform(:Message => ByRow(x -> StringDocument(x)) => :Message2)
end
lucifermorningstar1305 / text_preprocess.jl
Created July 29, 2022 12:09
Simple preprocessing of text in Julia
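remove_case!.(df[!, :Message2])  # lowercase every document in place
prepare!.(df[!, :Message2], strip_html_tags | strip_punctuation | strip_numbers)  # drop HTML tags, punctuation, and digits
stem!.(df[!, :Message2])  # reduce words to their stems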
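To make the effect of these steps concrete, here is a minimal plain-Julia sketch of roughly the same cleaning, using only Base. The function name `simple_clean` and the regular expressions are assumptions made for illustration; they are not how TextAnalysis implements `remove_case!` or `prepare!`, and stemming is left out entirely.

```julia
# Hypothetical helper: approximates the cleaning steps with Base Julia only.
function simple_clean(text::AbstractString)
    lowered  = lowercase(text)                            # comparable to remove_case!
    no_tags  = replace(lowered, r"<[^>]*>" => " ")        # comparable to strip_html_tags
    no_punct = replace(no_tags, r"[[:punct:]]" => "")     # comparable to strip_punctuation
    no_nums  = replace(no_punct, r"\d+" => "")            # comparable to strip_numbers
    return join(split(no_nums), " ")                      # collapse leftover whitespace
end

cleaned = simple_clean("<b>WIN</b> 1000 dollars NOW!!!")
println(cleaned)
```

The library versions operate on document objects in place, whereas this sketch returns a new string, which makes it easier to try on a single message.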
lucifermorningstar1305 / gen_corpus.jl
Created July 29, 2022 12:27
Generate a corpus from the text documents
crps = Corpus(df[:, :Message2])
lucifermorningstar1305 / build_vocab.jl
Created July 29, 2022 12:31
Build up the vocabulary
update_lexicon!(crps)
lucifermorningstar1305 / build_tfidf.jl
Created July 29, 2022 12:40
Build the TF-IDF matrix of the text data
m = DocumentTermMatrix(crps)
tfidf_mat = tf_idf(m)
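As a rough illustration of what a TF-IDF matrix contains, the sketch below computes one by hand on a toy corpus using only Base Julia. The toy documents and all variable names are made up for the example, and the weighting shown is the textbook form, idf = log(N / document frequency); the exact formula used by `tf_idf` in TextAnalysis may differ (for instance by smoothing), so treat the numbers as illustrative only.

```julia
# Toy corpus of already-tokenised documents (hypothetical data).
docs = [["free", "prize", "free"], ["meeting", "at", "noon"], ["free", "meeting"]]
vocab = sort(unique(vcat(docs...)))            # column order of the matrix

# counts[i, j] = number of times vocab[j] occurs in docs[i]
counts = [count(==(term), doc) for doc in docs, term in vocab]

tf = counts ./ sum(counts, dims=2)             # term frequency within each document
doc_freq = vec(sum(counts .> 0, dims=1))       # number of documents containing each term
idf = log.(length(docs) ./ doc_freq)           # textbook idf, assumed here for illustration
tfidf = tf .* idf'                             # scale each column by its idf weight

println(vocab)
println(round.(tfidf, digits=3))
```

Each row corresponds to a document and each column to a vocabulary term, which is the same layout the document-term matrix has.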
lucifermorningstar1305 / def_x_y.jl
Created July 29, 2022 13:09
Define X and y and set up a decision tree classifier
X, y = tfidf_mat, df[:, :Category]
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
tree_model = DecisionTreeClassifier()
tree = machine(tree_model, coerce(X, Continuous), coerce(y, Multiclass))