## Gist diamonaj/c64a9c16e0c883cb7af40f7706a3834e
## Created January 30, 2023 20:13
## We're going to run a linear regression of authorship on word frequencies.
## If a predicted value is positive, we treat it as a prediction of Hamilton's authorship.
## If a predicted value is negative, we treat it as a prediction of Madison's authorship.
author <- rep(NA, nrow(dtm1)) # start with one missing value per document in dtm1
author[hamilton] <- 1 # 1 if Hamilton
author[madison] <- -1 # -1 if Madison
## data frame for regression
author.data <- data.frame(author = author[c(hamilton, madison)],
                          tfm[c(hamilton, madison), ])
## Our preliminary analysis selected 4 words for predicting authorship
## ('upon', 'there', 'consequently', and 'whilst'). The data frame created
## above contains the term frequencies of 10 words, including these 4.
## The model below regresses authorship on 8 of those 10 words: it keeps
## 'upon', 'consequently', and 'whilst' (but not 'there') and adds 'enough',
## 'although', 'always', 'commonly', and 'considerable'. We estimate the
## coefficients using the 66 essays with known authorship.
hm.fit <- lm(author ~ enough + upon + although + whilst + always + commonly +
               consequently + considerable, data = author.data)
hm.fit
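## A minimal sketch of the sign rule described at the top of this script:
## a positive predicted value is read as Hamilton, a negative one as Madison.
## `classify.author` is a hypothetical helper, not part of the original
## analysis. A prediction of exactly 0 is labelled NA because the rule
## above does not cover that case.
classify.author <- function(fit, newdata = NULL) {
  ## in-sample fitted values, or predictions for new essays if supplied
  pred <- if (is.null(newdata)) fitted(fit) else predict(fit, newdata = newdata)
  ## positive -> Hamilton, negative -> Madison, exactly 0 -> NA
  ifelse(pred > 0, "Hamilton", ifelse(pred < 0, "Madison", NA))
}
## Usage with the objects created above, e.g. the in-sample share of the
## 66 known essays whose predicted author matches the true one:
##   hm.class <- classify.author(hm.fit)
##   mean(hm.class == ifelse(author.data$author == 1, "Hamilton", "Madison"))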