Created
May 13, 2016 08:18
Selection: 5 | |
| | |
| | 0% | |
| Missing values play an important role in statistics and data analysis. Often, | |
| missing values must not be ignored, but rather they should be carefully | |
| studied to see if there's an underlying pattern or cause for their | |
| missingness. | |
... | |
| | |
|==== | 5% | |
| In R, NA is used to represent any value that is 'not available' or 'missing' | |
| (in the statistical sense). In this lesson, we'll explore missing values | |
| further. | |
... | |
| | |
|======= | 10% | |
| Any operation involving NA generally yields NA as the result. To illustrate, | |
| let's create a vector c(44, NA, 5, NA) and assign it to a variable x. | |
> x <- c(44, NA, 5, NA) | |
| Keep working like that and you'll get there! | |
| | |
|=========== | 15% | |
| Now, let's multiply x by 3. | |
> y <-x*3 | |
| Not quite! Try again. Or, type info() for more options. | |
| Try x * 3. | |
> x*3 | |
[1] 132 NA 15 NA | |
| Perseverance, that's the answer. | |
| | |
|============== | 20% | |
| Notice that the elements of the resulting vector that correspond with the NA | |
| values in x are also NA. | |
... | |
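
As a small sketch of the same idea outside the lesson, the snippet below reuses the vector x from above and shows that summaries such as sum() and mean() also return NA unless the missing values are dropped. The variable names (tripled, total_with_na, and so on) are illustrative and not part of the lesson.

```r
# Same vector as in the lesson
x <- c(44, NA, 5, NA)

# Arithmetic propagates NA element by element
tripled <- x * 3                          # 132 NA 15 NA

# Summaries return NA too, unless na.rm = TRUE drops the NAs first
total_with_na    <- sum(x)                # NA
total_without_na <- sum(x, na.rm = TRUE)  # 49
mean_without_na  <- mean(x, na.rm = TRUE) # 24.5
```

Whether dropping NAs with na.rm = TRUE is appropriate depends on why the values are missing, which is the point the lesson opens with.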
| | |
|================== | 25% | |
| To make things a little more interesting, let's create a vector containing | |
| 1000 draws from a standard normal distribution with y <- rnorm(1000). | |
> y <- rnorm(1000) | |
| You are really on a roll! | |
| | |
|===================== | 30% | |
| Next, let's create a vector containing 1000 NAs with z <- rep(NA, 1000). | |
> z <- rep(NA, 1000) | |
| You nailed it! Good job! | |
| | |
|========================= | 35% | |
| Finally, let's select 100 elements at random from these 2000 values | |
| (combining y and z) such that we don't know how many NAs we'll wind up with | |
| or what positions they'll occupy in our final vector -- my_data <- | |
| sample(c(y, z), 100). | |
> my_data <- | |
+ sample(c(y, z), 100) | |
| That's the answer I was looking for. | |
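
Because sample() draws at random, the NA count and positions differ on every run. A minimal sketch of making the draw repeatable is shown below; the call to set.seed() and the seed value 42 are assumptions added for illustration and are not part of the lesson, and pool is a hypothetical helper name.

```r
# Fix the random number generator state (seed value is arbitrary)
set.seed(42)

y <- rnorm(1000)      # 1000 draws from a standard normal distribution
z <- rep(NA, 1000)    # 1000 missing values

# Combining a numeric and a logical NA vector gives a numeric vector
pool <- c(y, z)

# Draw 100 elements without replacement (the default for sample())
my_data <- sample(pool, 100)

length(pool)     # 2000
length(my_data)  # 100
```

Re-running the whole snippet with the same seed should reproduce the same my_data, which makes results easier to compare between sessions.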
| | |
|============================ | 40% | |
| Let's first ask the question of where our NAs are located in our data. The | |
| is.na() function tells us whether each element of a vector is NA. Call | |
| is.na() on my_data and assign the result to my_na. | |
> | |
> is.na() | |
Error in is.na() : 0 arguments passed to 'is.na' which requires 1 | |
> my_na <- is.na() | |
Error in is.na() : 0 arguments passed to 'is.na' which requires 1 | |
> my_na <- is.na(my_data) | |
| You are really on a roll! | |
| | |
|================================ | 45% | |
| Now, print my_na to see what you came up with. | |
> my_na | |
[1] TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE | |
[13] TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE | |
[25] FALSE TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE | |
[37] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE | |
[49] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE | |
[61] TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE FALSE | |
[73] FALSE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE | |
[85] TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE | |
[97] TRUE TRUE FALSE FALSE | |
| Nice work! | |
| | |
|=================================== | 50% | |
| Everywhere you see a TRUE, you know the corresponding element of my_data is | |
| NA. Likewise, everywhere you see a FALSE, you know the corresponding element | |
| of my_data is one of our random draws from the standard normal distribution. | |
... | |
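
A logical vector like my_na can also be used to locate or filter values. The sketch below uses a small made-up vector v (not from the lesson) so that the results are easy to check by eye.

```r
# Small made-up vector with two missing values
v <- c(NA, 1.5, NA, -0.3)

# TRUE where the element is missing
na_flags <- is.na(v)              # TRUE FALSE TRUE FALSE

# Positions of the missing values
na_positions <- which(na_flags)   # 1 3

# Keep only the observed (non-missing) values
observed <- v[!na_flags]          # 1.5 -0.3
```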
| | |
|====================================== | 55% | |
| In our previous discussion of logical operators, we introduced the `==` | |
| operator as a method of testing for equality between two objects. So, you | |
| might think the expression my_data == NA yields the same results as is.na(). | |
| Give it a try. | |
> | |
> my_data == NA | |
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA | |
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA | |
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA | |
[76] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA | |
| You're the best! | |
| | |
|========================================== | 60% | |
| The reason you got a vector of all NAs is that NA is not really a value, but | |
| just a placeholder for a quantity that is not available. Therefore the | |
| logical expression is incomplete and R has no choice but to return a vector | |
| of the same length as my_data that contains all NAs. | |
...sum(my_na) | |
| | |
|============================================== | 65% | |
| Don't worry if that's a little confusing. The key takeaway is to be cautious | |
| when using logical expressions anytime NAs might creep in, since a single NA | |
| value can derail the entire thing. | |
... | |
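
To make the caution concrete, here is a short sketch with a made-up vector v (an assumption, not lesson data). It shows a comparison against NA, an ordinary comparison that picks up an NA along the way, and two ways of handling that safely.

```r
# Made-up vector with one missing value
v <- c(2, NA, 7)

# Comparing with NA never gives TRUE or FALSE
eq_result <- v == NA                  # NA NA NA

# An ordinary comparison yields NA wherever the input is NA
gt_result <- v > 3                    # FALSE NA TRUE

# anyNA() reports whether at least one element is missing
any_missing <- anyNA(v)               # TRUE

# Count the elements greater than 3, ignoring the NA
count_gt <- sum(v > 3, na.rm = TRUE)  # 1
```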
| | |
|================================================= | 70% | |
| So, back to the task at hand. Now that we have a vector, my_na, that has a | |
| TRUE for every NA and FALSE for every numeric value, we can compute the total | |
| number of NAs in our data. | |
... | |
| | |
|==================================================== | 75% | |
| The trick is to recognize that underneath the surface, R represents TRUE as | |
| the number 1 and FALSE as the number 0. Therefore, if we take the sum of a | |
| bunch of TRUEs and FALSEs, we get the total number of TRUEs. | |
... | |
| | |
|======================================================== | 80% | |
| Let's give that a try here. Call the sum() function on my_na to count the | |
| total number of TRUEs in my_na, and thus the total number of NAs in my_data. | |
| Don't assign the result to a new variable. | |
> sum(my_na) | |
[1] 51 | |
| That's a job well done! | |
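
The same coercion of TRUE to 1 and FALSE to 0 means mean() gives the proportion of TRUEs. The sketch below uses a small made-up logical vector, flags, rather than my_na, so the numbers can be verified by hand.

```r
# Made-up logical vector
flags <- c(TRUE, FALSE, TRUE, TRUE)

# Explicit view of the underlying 1/0 representation
as_numbers <- as.integer(flags)  # 1 0 1 1

# Number of TRUEs
n_true <- sum(flags)             # 3

# Proportion of TRUEs
prop_true <- mean(flags)         # 0.75
```

Applied to the lesson's data, mean(my_na) would give the fraction of my_data that is missing.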
| | |
|============================================================ | 85% | |
| Pretty cool, huh? Finally, let's take a look at the data to convince | |
| ourselves that everything 'adds up'. Print my_data to the console. | |
> | |
> my_data | |
[1] NA NA 1.09462984 0.59778656 -0.90944656 NA | |
[7] -0.35787991 -0.03112132 0.26389767 NA NA NA | |
[13] NA NA NA NA 0.21294024 NA | |
[19] -1.48071872 NA 1.91079078 1.54674727 NA NA | |
[25] -0.59913377 NA NA -0.98306919 NA NA | |
[31] NA -0.27882350 -1.36030614 NA NA 2.64448799 | |
[37] -2.12506858 0.25815328 -0.28095853 0.40540060 -0.39859703 NA | |
[43] 0.22767670 1.17414183 NA 0.20077198 -0.25732459 1.47075231 | |
[49] 0.60883017 1.57914885 NA 0.67170119 0.67769134 0.46886597 | |
[55] 0.01949685 NA NA -0.59312866 0.17993845 -0.07268996 | |
[61] NA NA NA NA NA -1.19344507 | |
[67] NA -0.36172476 0.91197623 NA NA -1.13324675 | |
[73] -0.56856448 NA -1.70945322 -1.11652692 NA NA | |
[79] NA 0.81674190 -1.08081702 NA NA 0.67119044 | |
[85] NA -0.90857825 -0.72647148 NA 1.03472122 NA | |
[91] NA NA NA NA NA NA | |
[97] NA NA 0.17035257 1.05240690 | |
| All that hard work is paying off! | |
| | |
|=============================================================== | 90% | |
| Now that we've got NAs down pat, let's look at a second type of missing value | |
| -- NaN, which stands for 'not a number'. To generate NaN, try dividing (using | |
| a forward slash) 0 by 0 now. | |
> 0/0 | |
[1] NaN | |
| All that hard work is paying off! | |
| | |
|================================================================== | 95% | |
| Let's do one more, just for fun. In R, Inf stands for infinity. What happens | |
| if you subtract Inf from Inf? | |
> Inf-Inf | |
[1] NaN | |
| That's correct! | |
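
NaN and NA are related but not identical. The following sketch, which goes slightly beyond the lesson text, shows how the base R functions is.nan(), is.na() and is.finite() treat them. The variable names are illustrative.

```r
nan_value <- 0 / 0                        # NaN

# is.nan() is TRUE only for NaN
nan_is_nan <- is.nan(nan_value)           # TRUE
na_is_nan  <- is.nan(NA)                  # FALSE

# is.na() is TRUE for both NA and NaN
nan_is_na <- is.na(nan_value)             # TRUE

# is.finite() is FALSE for NA, NaN and Inf
finite_flags <- is.finite(c(1, NA, NaN, Inf))  # TRUE FALSE FALSE FALSE
```

One consequence is that sum(is.na(x)) counts NaN values as well as NAs.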
| | |
|======================================================================| 100% | |
| Would you like to receive credit for completing this course on Coursera.org? | |
1: No | |
2: Yes |
How to proceed further