Last active
February 4, 2016 15:55
-
-
Save bhive01/6698e012c4388c8d3c14 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
require(dplyr) | |
# input data frame | |
df1 <- data.frame(trait=rep(1:10, 4), Trait.Code = rep(LETTERS[1:4], each = 10), Alpha_Value= rep(c(NA, "1", NA, "Alphastuff"), each = 10), Number_Value = rep(c(3, NA, 6, NA), each = 10), stringsAsFactors= FALSE) | |
df1 %>% | |
mutate(n=row_number()) %>% #add index for later join | |
filter(!is.na(Alpha_Value), Alpha_Value != "NULL") %>% # Alpha_Value != NULL or NA | |
group_by(Trait.Code) %>% #group for mutate | |
mutate(Converted_Value = ifelse(any(is.na(as.numeric(.$Alpha_Value))), NA, as.numeric(Alpha_Value))) %>% | |
# if upon conversion within a group any values are not convertible to numeric, then return NA, else convert number to number | |
ungroup() %>% # ungroup for select to work | |
select(n, Converted_Value)%>% # select key columns for join | |
left_join(mutate(df1, n=row_number()), ., key = "n") %>% # join converted back | |
mutate(Number_Value = ifelse(is.na(Converted_Value), Number_Value, Converted_Value)) %>% # copy numeric values over to numeric column | |
select(-Converted_Value) | |
## Works! Thanks Ben | |
df1 %>% | |
filter(!is.na(Alpha_Value), Alpha_Value != "NULL") %>% # Alpha_Value != NULL or NA | |
group_by(Trait.Code) %>% | |
summarise(Converted_Value = ifelse(any(is.na(as.numeric(Alpha_Value))), NA, TRUE)) %>% | |
ungroup() %>% | |
left_join(df1, ., key = "Trait.Code") %>% | |
## needed to add this afterwards. If value is NA, ifelse returns NA not FALSE in final statement | |
mutate(Converted_Value = ifelse(is.na(Converted_Value), FALSE, Converted_Value)) %>% | |
mutate(Number_Value = ifelse(Converted_Value == TRUE, as.numeric(Alpha_Value), Number_Value)) |
@bpbond, easier but not really what I need. I'm trying to isolate poor data entry practices. In our DB, numeric data can be stored in "text" column. If all of the data under a given Trait.Code is numeric I want to convert that over to numeric. If all of the data is not numeric... I want to leave it as it is. Which is why I think I need the group_by and the conditional mutate.
Behavior still seems correct to me though. For example, this correctly identifies groups by the criterion:
df1 %>%
group_by(Trait.Code) %>%
summarise(something_not_numeric = any(is.na(as.numeric(Alpha_Value))))
Yup... combine that with a join to get that information back and it works. Just wasn't thinking about it in steps... trying to get everything in a single line. Thanks @bpbond!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Wouldn't it be simpler to just do
This seems to work fine.