sibyvt/208b01bd559f9a0b250f07df95d4f267
Created May 13, 2016 17:48
Selection: 6
|
| | 0%
| In this lesson, we'll see how to extract elements from a vector based on some
| conditions that we specify.
...
|
|== | 3%
| For example, we may only be interested in the first 20 elements of a vector,
| or only the elements that are not NA, or only those that are positive or
| correspond to a specific variable of interest. By the end of this lesson,
| you'll know how to handle each of these scenarios.
...
|
|==== | 5%
| I've created for you a vector called x that contains a random ordering of 20
| numbers (from a standard normal distribution) and 20 NAs. Type x now to see
| what it looks like.
> x
 [1] -0.85068248 -1.48563986 -0.91598703 -0.23367937          NA  0.63801302
 [7]          NA          NA          NA -0.02668931          NA          NA
[13]  1.30983997 -3.51235543          NA -0.52339350          NA          NA
[19]          NA -0.71988909          NA          NA          NA          NA
[25]          NA -0.96860318          NA          NA          NA -0.09162274
[31]  1.57282019 -0.46906331          NA  1.52285277  0.22627761  1.32663565
[37]  0.15567008          NA -0.21661389 -0.71730483
| That's the answer I was looking for.
|
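A reader comment at the end of this gist asks how x is generated. swirl creates it behind the scenes when the lesson starts, and that setup code does not appear in this transcript. The sketch below is only an assumption that matches the lesson's own description (20 draws from a standard normal distribution plus 20 NAs, in random order). The seed is arbitrary, so the values will not match the ones printed above.

```r
# Assumed reconstruction, NOT swirl's actual setup code:
# 20 draws from N(0, 1) plus 20 NAs, shuffled into a random order.
set.seed(42)  # arbitrary seed, only so the result is reproducible
x <- sample(c(rnorm(20), rep(NA, 20)))

length(x)      # 40 elements in total
sum(is.na(x))  # 20 of them are NA
```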
|===== | 8%
| The way you tell R that you want to select some particular elements (i.e. a
| 'subset') from a vector is by placing an 'index vector' in square brackets
| immediately following the name of the vector.
...
|
|======= | 10%
| For a simple example, try x[1:10] to view the first ten elements of x.
> x{1:10}
Error: unexpected '{' in "x{"
> x{1:10}
Error: unexpected '{' in "x{"
> x(1:10)
Error: could not find function "x"
> x[1:10]
[1] -0.85068248 -1.48563986 -0.91598703 -0.23367937          NA  0.63801302
[7]          NA          NA          NA -0.02668931
| Excellent job!
|
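In R, square brackets are the subsetting operator. By contrast, x{...} is a syntax error and x(...) is treated as a call to a function named x, which explains the errors above. The sketch below uses a hypothetical stand-in for x (a plain sequence, so the results are predictable). It shows that x[1:10] and head(x, 10) return the same first ten elements. head() is base R but is not part of this lesson.

```r
# Hypothetical stand-in for the lesson's x: 40 predictable values
x <- seq(10, 400, by = 10)

first_ten <- x[1:10]               # positive-integer index vector 1, 2, ..., 10
first_ten                          # 10 20 30 ... 100
identical(first_ten, head(x, 10))  # TRUE: head() returns the same elements
```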
|========= | 13%
| Index vectors come in four different flavors -- logical vectors, vectors of
| positive integers, vectors of negative integers, and vectors of character
| strings -- each of which we'll cover in this lesson.
...
|
|=========== | 15%
| Let's start by indexing with logical vectors. One common scenario when
| working with real-world data is that we want to extract all elements of a
| vector that are not NA (i.e. missing data). Recall that is.na(x) yields a
| vector of logical values the same length as x, with TRUEs corresponding to NA
| values in x and FALSEs corresponding to non-NA values in x.
...
|
|============= | 18%
| What do you think x[is.na(x)] will give you?
1: A vector of TRUEs and FALSEs
2: A vector with no NAs
3: A vector of all NAs
4: A vector of length 0
Selection: 3
| That's correct!
|
|============== | 21%
| Prove it to yourself by typing x[is.na(x)].
> x[is.na(x)]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
| Your dedication is inspiring!
|
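A logical index keeps exactly the positions where it is TRUE. On a small hypothetical vector, which() shows those positions explicitly. which() is base R, but this lesson does not introduce it.

```r
v <- c(1.5, NA, -2, NA, 0.3)  # hypothetical example vector

is.na(v)         # FALSE TRUE FALSE TRUE FALSE: same length as v
which(is.na(v))  # 2 4: the positions where is.na(v) is TRUE
v[is.na(v)]      # NA NA: only the missing elements are kept
```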
|================ | 23%
| Recall that `!` gives us the negation of a logical expression, so !is.na(x)
| can be read as 'is not NA'. Therefore, if we want to create a vector called y
| that contains all of the non-NA values from x, we can use y <- x[!is.na(x)].
| Give it a try.
> y <- x[!is.na(x)]
| You got it right!
|
|================== | 26%
| Print y to the console.
> y
 [1] -0.85068248 -1.48563986 -0.91598703 -0.23367937  0.63801302 -0.02668931
 [7]  1.30983997 -3.51235543 -0.52339350 -0.71988909 -0.96860318 -0.09162274
[13]  1.57282019 -0.46906331  1.52285277  0.22627761  1.32663565  0.15567008
[19] -0.21661389 -0.71730483
| All that hard work is paying off!
|
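Negating the logical index with ! flips which elements are kept. As an aside that the lesson does not cover, complete.cases() from the stats package (attached by default) returns the same logical vector as !is.na() when given a plain vector. The vector below is hypothetical.

```r
v <- c(1.5, NA, -2, NA, 0.3)  # hypothetical example vector

y <- v[!is.na(v)]  # keep only the non-missing values
y                  # 1.5 -2.0 0.3
identical(!is.na(v), complete.cases(v))  # TRUE for a plain vector
```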
|==================== | 28%
| Now that we've isolated the non-missing values of x and put them in y, we can
| subset y as we please.
...
|
|====================== | 31%
| Recall that the expression y > 0 will give us a vector of logical values the
| same length as y, with TRUEs corresponding to values of y that are greater
| than zero and FALSEs corresponding to values of y that are less than or equal
| to zero. What do you think y[y > 0] will give you?
1: A vector of all the positive elements of y
2: A vector of TRUEs and FALSEs
3: A vector of all NAs
4: A vector of length 0
5: A vector of all the negative elements of y
Selection: 1
| You are amazing!
|
|======================= | 33%
| Type y[y > 0] to see that we get all of the positive elements of y, which are
| also the positive elements of our original vector x.
> y[y > 0]
[1] 0.6380130 1.3098400 1.5728202 1.5228528 0.2262776 1.3266356 0.1556701
| Keep up the great work!
|
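Because y > 0 is a logical vector the same length as y, it serves two purposes. As an index, it selects the positive values. Summed, it counts them, since TRUE counts as 1 and FALSE as 0. The values below are hypothetical.

```r
y <- c(1.5, -2, 0.3, 0, 4)  # hypothetical values, no NAs

y > 0       # TRUE FALSE TRUE FALSE TRUE
y[y > 0]    # 1.5 0.3 4.0: zero is not positive, so it is dropped
sum(y > 0)  # 3 positive values
```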
|========================= | 36%
| You might wonder why we didn't just start with x[x > 0] to isolate the
| positive elements of x. Try that now to see why.
> x[x > 0]
 [1]        NA 0.6380130        NA        NA        NA        NA        NA
 [8] 1.3098400        NA        NA        NA        NA        NA        NA
[15]        NA        NA        NA        NA        NA        NA 1.5728202
[22]        NA 1.5228528 0.2262776 1.3266356 0.1556701        NA
| Excellent job!
|
|=========================== | 38%
| Since NA is not a value, but rather a placeholder for an unknown quantity,
| the expression NA > 0 evaluates to NA. Hence we get a bunch of NAs mixed in
| with our positive numbers when we do this.
...
|
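The NA rule is easy to see on a tiny hypothetical vector. Comparing NA with anything gives NA, and an NA in a logical index puts an NA into the result instead of dropping that position.

```r
v <- c(-1, NA, 2)  # hypothetical example vector

NA > 0    # NA: the comparison's answer is unknown
v > 0     # FALSE NA TRUE
v[v > 0]  # NA 2: the NA index position becomes an NA element
```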
|============================= | 41%
| Combining our knowledge of logical operators with our new knowledge of
| subsetting, we could do this -- x[!is.na(x) & x > 0]. Try it out.
> x[!is.na(x) & x > 0]
[1] 0.6380130 1.3098400 1.5728202 1.5228528 0.2262776 1.3266356 0.1556701
| Great job!
|
|=============================== | 44%
| In this case, we request only values of x that are both non-missing AND
| greater than zero.
...
|
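The combined condition works because & returns FALSE whenever either side is FALSE, even if the other side is NA. As a result, !is.na(x) turns every missing position into FALSE. An alternative that the lesson does not cover is which(). It returns only the positions that are TRUE and skips NA. On this hypothetical vector, the two approaches agree.

```r
v <- c(-1, NA, 2, 0.5, NA)  # hypothetical example vector

keep <- !is.na(v) & v > 0  # FALSE & NA is FALSE, so NAs are excluded
keep                       # FALSE FALSE TRUE TRUE FALSE
v[keep]                    # 2.0 0.5
identical(v[keep], v[which(v > 0)])  # TRUE: which() skips NA positions
```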
|================================ | 46%
| I've already shown you how to subset just the first ten values of x using
| x[1:10]. In this case, we're providing a vector of positive integers inside
| of the square brackets, which tells R to return only the elements of x
| numbered 1 through 10.
...
|
|================================== | 49%
| Many programming languages use what's called 'zero-based indexing', which
| means that the first element of a vector is considered element 0. R uses
| 'one-based indexing', which (you guessed it!) means the first element of a
| vector is considered element 1.
...
|
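One-based indexing can be checked with letters, a built-in constant holding the 26 lowercase letters. Position 1 is the first element, and length() gives the position of the last one.

```r
letters[1]                # "a": the first element is at position 1
letters[length(letters)]  # "z": the last element is at position length()
letters[1:3]              # "a" "b" "c"
```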
|==================================== | 51%
| Can you figure out how we'd subset the 3rd, 5th, and 7th elements of x? Hint
| -- Use the c() function to specify the element numbers as a numeric vector.
> c(3, 5, 7)
[1] 3 5 7
| One more time. You can do it! Or, type info() for more options.
| Create a vector of indexes with c(3, 5, 7), then put that inside of the
| square brackets.
> x<- c(3, 5, 7)
| Give it another try. Or, type info() for more options.
| Create a vector of indexes with c(3, 5, 7), then put that inside of the
| square brackets.
> [x]]
Error: unexpected '[' in "["
> [x]
Error: unexpected '[' in "["
> x[c(3, 5, 7)]
[1]  7 NA NA
| You got it!
|
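Note that at the prompt above, x <- c(3, 5, 7) overwrote the lesson's 40-element x. That is why x[c(3, 5, 7)] printed 7 NA NA: the third element of the new three-element x, followed by two positions past its end. It is also why later commands in this session, such as x[c(-2, -10)], return 3 7. The sketch below uses a hypothetical 40-element stand-in and keeps the index vector in its own variable, idx, so x is left intact.

```r
# Hypothetical stand-in for the lesson's 40-element x
x <- seq(10, 400, by = 10)

idx <- c(3, 5, 7)  # index vector in its own variable; x is not overwritten
x[idx]             # 30 50 70
length(x)          # still 40
```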
|====================================== | 54%
| It's important that when using integer vectors to subset our vector x, we
| stick with the set of indexes {1, 2, ..., 40} since x only has 40 elements.
| What happens if we ask for the zeroth element of x (i.e. x[0])? Give it a
| try.
> x[0]
numeric(0)
| You are doing so well!
|
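Zero is not a valid position in R, but it is not an error either. x[0] returns an empty vector of the same type, and a 0 inside a longer index vector is silently dropped. A small hypothetical vector shows both cases.

```r
v <- c(10, 20, 30)  # hypothetical example vector

v[0]        # numeric(0): an empty numeric vector
v[c(0, 2)]  # 20: the 0 is ignored and position 2 is kept
```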
|======================================= | 56%
| As you might expect, we get nothing useful. Unfortunately, R doesn't prevent
| us from doing this. What if we ask for the 3000th element of x? Try it out.
> x[3000]
[1] NA
| You're the best!
|
|========================================= | 59%
| Again, nothing useful, but R doesn't prevent us from asking for it. This
| should be a cautionary tale. You should always make sure that what you are
| asking for is within the bounds of the vector you're working with.
...
|
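Because R answers an out-of-range position with NA instead of an error, one option is to check the indexes before subsetting. The helper checked_subset() below is hypothetical and is not part of base R or swirl. It is a minimal sketch that handles only positive integer positions without NAs.

```r
# Hypothetical helper: subset by positive positions, but fail loudly
# if any requested position lies outside 1..length(v).
checked_subset <- function(v, idx) {
  out_of_range <- idx < 1 | idx > length(v)
  if (any(out_of_range)) {
    stop("index out of bounds: ", paste(idx[out_of_range], collapse = ", "))
  }
  v[idx]
}

v <- c(10, 20, 30)
v[3000]                 # NA: plain indexing does not complain
checked_subset(v, 2:3)  # 20 30
# checked_subset(v, 3000) would stop with "index out of bounds: 3000"
```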
|=========================================== | 62%
| What if we're interested in all elements of x EXCEPT the 2nd and 10th? It
| would be pretty tedious to construct a vector containing all numbers 1
| through 40 EXCEPT 2 and 10.
...[x1:40, -2, -10]
|
|============================================= | 64%
| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us
| ONLY the 2nd and 10th elements of x, x[c(-2, -10)] gives us all elements of x
| EXCEPT for the 2nd and 10th elements. Try x[c(-2, -10)] now to see this.
> x[c(-2, -10)]
[1] 3 7
| You nailed it! Good job!
|
|=============================================== | 67%
| A shorthand way of specifying multiple negative numbers is to put the
| negative sign out in front of the vector of positive numbers. Type
| x[-c(2, 10)] to get the exact same result.
>
> x[-c(2,
+ 10)]
[1] 3 7
| Excellent work!
|
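On a hypothetical 40-element stand-in for x, both spellings drop positions 2 and 10 and keep the remaining 38. One rule the lesson does not mention is that positive and negative positions cannot be mixed in one index vector; R stops with an error. Here tryCatch() captures that error so the snippet keeps running.

```r
x <- seq(10, 400, by = 10)  # hypothetical stand-in with 40 elements

a <- x[c(-2, -10)]
b <- x[-c(2, 10)]  # same result, with the minus applied to the whole vector
identical(a, b)    # TRUE
length(a)          # 38

# Mixing signs is an error; tryCatch() returns the message instead of stopping.
res <- tryCatch(x[c(-2, 5)], error = function(e) conditionMessage(e))
res                # an error message about mixing negative and positive subscripts
```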
|================================================ | 69%
| So far, we've covered three types of index vectors -- logical, positive
| integer, and negative integer. The only remaining type requires us to
| introduce the concept of 'named' elements.
...
|
|================================================== | 72%
| Create a numeric vector with three named elements using
| vect <- c(foo = 11, bar = 2, norf = NA).
> vect <- c(foo = 11,
+ bar = 2, norf = NA)
| Keep working like that and you'll get there!
|
|==================================================== | 74%
| When we print vect to the console, you'll see that each element has a name.
| Try it out.
>
>
> vect <- c(foo = 11, bar = 2, norf = NA)
| You almost had it, but not quite. Try again. Or, type info() for more
| options.
| Type vect to view its contents.
> vect <- c(foo = 11, bar = 2, norf = NA)
| That's not the answer I was looking for, but try again. Or, type info() for
| more options.
| Type vect to view its contents.
> vect <-c(foo = 11, bar = 2, norf = NA)
| That's not the answer I was looking for, but try again. Or, type info() for
| more options.
| Type vect to view its contents.
> vect <- c(foo = 11, bar = 2, norf = NA)')
+ vect <- c(foo = 11, bar = 2, norf = NA)')
Error: unexpected string constant in:
"vect <- c(foo = 11, bar = 2, norf = NA)')
vect <- c(foo = 11, bar = 2, norf = NA)'"
> vect <- c(foo = 11, bar = 2, norf = NA)
| Almost! Try again. Or, type info() for more options.
| Type vect to view its contents.
> vect <- c(foo= 11, bar= 2, norf= NA)
| Not quite right, but keep trying. Or, type info() for more options.
| Type vect to view its contents.
> vect<-c(foo= 11, bar= 2, norf= NA)
| Almost! Try again. Or, type info() for more options.
| Type vect to view its contents.
> vect <-c(foo= 11, bar= 2, norf= NA)
| You're close...I can feel it! Try it again. Or, type info() for more options.
| Type vect to view its contents.
> vect <-c(foo = 11, bar = 2, norf = NA)
| Almost! Try again. Or, type info() for more options.
| Type vect to view its contents.
> vect <- c(foo = 11, bar = 2, norf = NA)
| One more time. You can do it! Or, type info() for more options.
| Type vect to view its contents.
>
> x[!is.na(x) & x > 0]
[1] 3 5 7
> vect[c(foo = 11, bar = 2, norf = NA)]
<NA>  bar <NA>
  NA    2   NA
| You almost had it, but not quite. Try again. Or, type info() for more
| options.
| Type vect to view its contents.
> vect
 foo  bar norf
  11    2   NA
| Your dedication is inspiring!
|
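The lesson wanted vect typed on its own. An assignment such as vect <- c(...) stores the value without printing it, which is why re-entering the assignment was not accepted. Wrapping an assignment in parentheses both assigns and prints. The snippet also accounts for the vect[c(foo = 11, bar = 2, norf = NA)] output above. A numeric index vector is read as positions, and its own names are ignored, so 11 is out of range (NA), 2 selects bar, and an NA index yields NA.

```r
# An assignment alone prints nothing; typing the name prints the vector.
vect <- c(foo = 11, bar = 2, norf = NA)
vect

# Wrapping an assignment in parentheses assigns and prints in one step.
(vect <- c(foo = 11, bar = 2, norf = NA))

# A numeric index vector is read as positions and its names are ignored:
# position 11 is out of range (NA), 2 is "bar", and an NA index gives NA.
vect[c(11, 2, NA)]
```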
|====================================================== | 77%
| We can also get the names of vect by passing vect as an argument to the
| names() function. Give that a try.
> name(vect)
Error: could not find function "name"
> names(vect)
[1] "foo"  "bar"  "norf"
| You're the best!
|
|======================================================== | 79%
| Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do
| that now.
> vect2[c(11,2,NA)]
Error: object 'vect2' not found
> vect2 <- c(11, 2, NA)
| Excellent work!
|
|========================================================= | 82%
| Then, we can add the `names` attribute to vect2 after the fact with
| names(vect2) <- c("foo", "bar", "norf"). Go ahead.
> names(vect2) <- c("foo", "bar", "norf")
| That's correct!
|
|=========================================================== | 85%
| Now, let's check that vect and vect2 are the same by passing them as
| arguments to the identical() function.
> identical(vect, vect2)
[1] TRUE
| You got it right!
|
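Both construction routes from the lesson yield the same object. setNames(), from the stats package attached by default, is a third route that the lesson does not cover. It returns a copy of a vector with the given names, which can be handy inside a larger expression.

```r
vect <- c(foo = 11, bar = 2, norf = NA)   # names given inline

vect2 <- c(11, 2, NA)                     # unnamed first...
names(vect2) <- c("foo", "bar", "norf")   # ...names attached afterwards

vect3 <- setNames(c(11, 2, NA), c("foo", "bar", "norf"))

identical(vect, vect2)  # TRUE
identical(vect, vect3)  # TRUE
```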
|============================================================= | 87%
| Indeed, vect and vect2 are identical named vectors.
...
|
|=============================================================== | 90%
| Now, back to the matter of subsetting a vector by named elements. Which of
| the following commands do you think would give us the second element of vect?
1: vect[bar]
2: vect["2"]
3: vect["bar"]
Selection: 2
| Not quite, but you're learning! Try again.
| If we want the element named "bar" (i.e. the second element of vect), which
| command would get us that?
1: vect["2"]
2: vect["bar"]
3: vect[bar]
Selection: 2
| That's the answer I was looking for.
|
|================================================================= | 92%
| Now, try it out.
> vect["bar"]
bar
  2
| Your dedication is inspiring!
|
|================================================================== | 95%
| Likewise, we can specify a vector of names with vect[c("foo", "bar")]. Try it
| out.
> vect[c("foo", "bar")]
foo bar
 11   2
| You nailed it! Good job!
|
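Character indexing can be checked the same way. Two related points go beyond the lesson. First, a name that does not exist returns NA rather than an error. Second, double brackets, vect[["bar"]], return the bare value without its name. The unquoted vect[bar] from the quiz fails because R looks for a variable named bar.

```r
vect <- c(foo = 11, bar = 2, norf = NA)

vect["bar"]            # named result: bar = 2
vect[c("foo", "bar")]  # foo = 11, bar = 2
vect[["bar"]]          # 2, with the name dropped
vect["baz"]            # <NA> = NA: an unknown name gives NA, not an error
```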
|==================================================================== | 97%
| Now you know all four methods of subsetting data from vectors. Different
| approaches are best in different scenarios, and when in doubt, try it out!
...
|
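As a recap, all four index-vector flavors can select the same elements from one small hypothetical named vector:

```r
v <- c(a = 5, b = -3, c = NA, d = 8)  # hypothetical named vector

v[!is.na(v) & v > 0]  # logical index:      a = 5, d = 8
v[c(1, 4)]            # positive integers:  a = 5, d = 8
v[-c(2, 3)]           # negative integers:  a = 5, d = 8
v[c("a", "d")]        # character (names):  a = 5, d = 8
```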
|======================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?
1: Yes
2: No
Selection: 1
What is your email address? [email protected]
What is your assignment token? BMrMEZf4ICtZTRCq
Grade submission succeeded!
| You got it!
How is x generated?
I have the same question. The vector "x" does not load automatically if you run the swirl library on RStudio :/
Got it! ... it runs automatically.