Last active
November 11, 2018 19:16
removing duplicate entries across rows and columns
ex_data = data.frame(A = c("A", "C", "E", "F", "G", "H", "I"),
                     B = c("B", "D", "A", "E", "I", "J", "K"),
                     C = "C",
                     stringsAsFactors = FALSE)

# columns whose entries must not repeat across the rows we keep
consider_cols = c("A", "B")

# the first row is always kept, so its entries seed the "seen" set
all_entries = unlist(ex_data[1, consider_cols], use.names = FALSE)
irow = 2

# nrow(ex_data) is re-evaluated on every pass, so the loop shrinks with the data
while (irow <= nrow(ex_data)) {
  message(irow, " / ", nrow(ex_data))
  new_entries = unlist(ex_data[irow, consider_cols], use.names = FALSE)
  if (any(new_entries %in% all_entries)) {
    # shares an entry with a kept row: drop it and stay on the same index
    ex_data = ex_data[-irow, ]
  } else {
    all_entries = c(all_entries, new_entries)
    irow = irow + 1
  }
}
print(ex_data)
So we start by saving the entries of the first row in `all_entries`. We unlist it because it is a data.frame and really we just want the entries.

Then, while `irow` is less than or equal to the total number of rows, we get the next row's entries and check whether any of them are among the entries seen so far. The row count is re-checked on every pass, so the loop runs to the end no matter how big the data is or how many rows have been removed. If yes, get rid of that row and don't update the row counter, because the following row has just moved into position `irow`. If no, add the entries to `all_entries`, then increment the counter to go to the next row.
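
The same logic can also be written as a function that marks the rows to keep and subsets once at the end, rather than deleting rows one at a time. This is only a sketch: the names `drop_rows_sharing_entries`, `keep`, and `seen` are made up here and are not part of the gist. It assumes the considered columns can be compared as character values.

```r
# Hypothetical helper, not from the gist: keep a row only if none of its
# entries in `consider_cols` has already appeared in an earlier kept row.
drop_rows_sharing_entries = function(df, consider_cols) {
  keep = logical(nrow(df))   # all FALSE to start with
  seen = character(0)        # entries of the rows kept so far
  for (i in seq_len(nrow(df))) {
    entries = as.character(unlist(df[i, consider_cols], use.names = FALSE))
    if (!any(entries %in% seen)) {
      keep[i] = TRUE
      seen = c(seen, entries)
    }
  }
  # drop = FALSE keeps the result a data.frame even with a single column
  df[keep, , drop = FALSE]
}

ex_data = data.frame(A = c("A", "C", "E", "F", "G", "H", "I"),
                     B = c("B", "D", "A", "E", "I", "J", "K"),
                     C = "C",
                     stringsAsFactors = FALSE)
result = drop_rows_sharing_entries(ex_data, c("A", "B"))
print(result)
```

For the example data this keeps original rows 1, 2, 4, 5 and 6. As in the loop version, entries from a dropped row are never added to the seen set, which is why the row `F, E` survives even though `E` appeared in the dropped row `E, A`.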