@alanmarazzi
Last active August 2, 2016 12:17
Basic R
# A quick primer: as you saw earlier, I wrote some text
# in the code that was not evaluated.
# These are comments; you create one by putting
# "#" before the text.
###### You can also add more "#" to draw attention
# to a comment.
# Assign names
# We can do math with R as we would with a simple calculator
5 + 6 # Returns 11
2 * 5 # Returns 10
2 ^ 2 # Returns 2 to the power of 2 = 4
10 / 2 # Returns 5
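# A small sketch, not part of the original notes: base R also has
# operators for integer division and for the remainder of a division.
7 %/% 2 # Integer division: returns 3
7 %% 2 # Remainder (modulo): returns 1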
# But this is an inefficient way to deal with data: imagine a dataset with 1000 rows
# whose values you want to sum.
# Adding up every value by hand would take a while.
# So we can assign names to values
x <- 5
x # This prints the value of x, so we will see 5
x <- 7
x # As you can see, the value of x has changed to 7
x = 5 # Assigns just like "<-" above; use whichever you prefer, "<-" or "="
x
# We can store whatever we want in a name
x <- "hello world"
x
x <- 5 * 5 # R will evaluate the expression "5 * 5" and then store its result in x
x
# We can do operations with variables
x <- 5
y <- 2
x * y # Returns 10
x + y # Returns 7
x # The values of x and y are unchanged
y
z <- x + y * x / y
z # Returns 10
x
y
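# A small illustrative sketch of why z is 10 (the names z1 and z2 are
# made up for this example): R evaluates * and / before +, so the
# expression is read as x + ((y * x) / y). Parentheses change the order.
x <- 5
y <- 2
z1 <- x + ((y * x) / y) # Same as z: 5 + 5 = 10
z2 <- (x + y) * x / y # Addition first: 7 * 5 / 2 = 17.5
z1
z2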
# Getting back to our previous example, to manage many elements we need vectors
x <- c(1, 2, 3, 4, 5) # We create vectors with the c() function
x
# Vectors can contain whole numbers (as above; R stores them as doubles unless you write 1L), floats or strings
x <- c(1.5, 2.3, 7.4) # Floats
y <- c("hello", "world", "bye", "moon") # Strings of text
# We can combine vectors,
# but remember that a single vector can hold only one type, e.g. all floats or all strings
z <- c(x, 3, 5)
z
z <- c(x, y) # Here R converts the floats in x to strings
z
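# One way to check this conversion (an illustrative sketch using base R's
# class() function, which reports the type of a vector):
class(c(1.5, 2.3, 7.4)) # Returns "numeric"
class(c("hello", "world")) # Returns "character"
class(c(1.5, "hello")) # Returns "character": 1.5 became the string "1.5"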
# Operations with vectors
x <- c(1, 2, 3)
y <- c(4, 5, 6)
x + y # R adds each value of x to the corresponding value of y: returns 5 7 9
x * y # Multiplication works element by element too: returns 4 10 18
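# A short sketch, not in the original notes: the same element-wise rule
# applies when one side is a single number, which R reuses for every element.
x <- c(1, 2, 3)
x * 2 # Returns 2 4 6
x + 10 # Returns 11 12 13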
# What if we want to sum only the first value of x with the second value of y?
# We use indexing
x[1] # This prints the first element of x
y[2] # This prints the second element of y
x[1] + y[2] # Returns 6
x[1:2] # Returns the first two elements of x
x[1:2] + y[2:3] # Returns 6, 8
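# A few more indexing forms, as a sketch assuming the same x as above:
# a vector of positions, a negative index, and a logical condition.
x <- c(1, 2, 3)
x[c(1, 3)] # Picks the first and third elements: returns 1 3
x[-1] # A negative index drops that element: returns 2 3
x[x > 1] # Keeps only the elements greater than 1: returns 2 3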
# Above we used ":" between two indexes;
# we can also use it to generate sequences of values
x <- 1:10
x # Returns 1 2 3 4 5 6 7 8 9 10
x <- 10:1
x # Returns the same sequence as before, but reversed
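# ":" always moves in steps of 1. For other steps, base R's seq() is the
# usual tool; a minimal sketch, not part of the original notes:
seq(1, 10, by = 2) # Returns 1 3 5 7 9
seq(0, 1, length.out = 5) # Returns 0.00 0.25 0.50 0.75 1.00
rev(1:10) # rev() reverses a vector, so this is the same as 10:1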
# Returning to our earlier example: 1000 values to sum.
# We will use a couple of functions to demonstrate
# the speed and capability of vectors in R
set.seed(123)
x <- runif(1000)
# runif() is a function: a function takes one or more
# arguments and performs some operation with them.
# When in doubt, open the help page with "?functionName" (e.g. ?runif).
# In this case we generate 1000 random numbers between 0 and 1.
# set.seed() is another function: it fixes the seed
# of the random number generator. Without it you wouldn't get
# the same results as me
# (see previous sections Install R and Install RStudio)
sum(x) # sum() is a function that sums every element in a vector
# We can do the same for larger vectors
x <- runif(100000)
sum(x) # Returns 49931.65
x <- runif(1000000)
sum(x) # Returns 499638.4
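# A small sketch of what set.seed() does (the names a and b are made up
# for this example): resetting the same seed before runif() gives
# exactly the same random numbers again.
set.seed(123)
a <- runif(5)
set.seed(123)
b <- runif(5)
identical(a, b) # Returns TRUE
# By default runif() draws numbers between 0 and 1, so their mean is close
# to 0.5. That is why the sums above are roughly half the vector length.
mean(runif(1000000))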
# Data Frame
# Usually when we deal with data we want them in tabular format:
# organized by columns and rows.
# R has a built-in structure for tabular data: the data frame
x <- 1:10
y <- letters[1:10] # We take the first 10 letters of the alphabet
z <- rep(c("male", "female"), 5) # rep() repeats its first argument the given number of times (here 5)
df <- data.frame(id = x, status = y, sex = z)
df
# We created a data frame from 3 vectors.
# As you can see, it's organized as a table, and every column is a vector.
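# A few base R helpers for inspecting a data frame, as a sketch. The data
# frame is rebuilt here only so that the snippet runs on its own.
df <- data.frame(id = 1:10, status = letters[1:10], sex = rep(c("male", "female"), 5))
nrow(df) # Number of rows: returns 10
ncol(df) # Number of columns: returns 3
names(df) # Column names: returns "id" "status" "sex"
str(df) # Compact summary of each column's type and first values
head(df, 3) # Shows only the first 3 rows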
# We can select single rows and columns, or groups of them, and do operations on them
df[1,1] # Returns the value in the first row of the first column
df[1, ] # Returns the whole first row
df[ ,1] # Returns the first column as a vector
df[ ,"sex"] # Returns the column "sex" as a vector
# Another way to select columns is with the "$" operator
df$sex # Returns column "sex" as well
sum(df$id) # Returns 55, the sum of the ids 1 to 10
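# Combining "$" with a logical test lets us pick rows by their values.
# A minimal sketch; the data frame is rebuilt so the snippet runs on its own.
df <- data.frame(id = 1:10, status = letters[1:10], sex = rep(c("male", "female"), 5))
df[df$sex == "male", ] # Returns only the rows where sex is "male"
sum(df$id[df$sex == "male"]) # 1 + 3 + 5 + 7 + 9: returns 25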