Download R from the Comprehensive R Archive Network (CRAN).
Run R. You will be greeted by the R console. You can interactively type in commands and it will respond, allowing you to experiment quickly.
An introduction to R can be found in the manual "An Introduction to R", which ships with R and is also available on CRAN.
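As a quick aside, extra spaces and tabs are ignored, and a statement can be broken across lines wherever it is clearly unfinished (for example, inside parentheses). Separate statements still need their own lines or a semicolon between them. For now, feel free to use any "whitespace" you need in order to make the code readable for you.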
R has a concept of variables, which are holding areas for simple values like numbers and strings, or complex values like vectors and matrices.
Start by storing a filename in a variable named "file". Type or paste this code into R.
file <- "/Users/groybal/Dropbox/galaxy1014.tabular"
Variables can be printed out like so:
print(file)
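As a small sketch of the kinds of values a variable can hold, here is one variable of each kind mentioned above. The names and values are made up for illustration and are not part of the galaxy data.

```r
# Simple values: a number and a string
count <- 42
label <- "galaxy"

# A vector: an ordered collection of values of one type
scores <- c(1.5, 2.5, 4.0)

# A matrix: values arranged in rows and columns (filled column by column)
grid <- matrix(1:6, nrow = 2, ncol = 3)

print(count)
print(label)
print(length(scores))  # number of elements in the vector
print(dim(grid))       # number of rows, then number of columns
```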
The R manual, found in the Help Menu under "R Help", is actually pretty useful.
You can have a script print something to the user using the print function, like so: print('Hello, world!'). You can ask the user for something using readline, like so: string <- readline().
R lets you load in all kinds of table-based data such as spreadsheets. This data may or may not have headers, be separated by spaces or tabs, and can be of any size.
First, try loading in a table from a file. Later we'll run some calculations on some of the columns and spit out a new table.
table <- read.table(file, header = TRUE, sep = "\t", fill = TRUE, as.is = TRUE)
The code above loads data from the file named by the "file" variable. This particular file has headers (header = TRUE), uses tabs as separators (sep = "\t"), and has some empty cells, so rows that come up short are padded with blank fields (fill = TRUE). Text columns are handed back as plain strings instead of being converted to factors (as.is = TRUE).
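To try these options without the original file, the sketch below writes a tiny tab-separated file to a temporary location and reads it back with the same arguments. The file contents and the names demo_file and demo are invented for this example.

```r
# Write a tiny tab-separated file to a temporary location (made-up data)
demo_file <- tempfile(fileext = ".tabular")
writeLines(c("name\tx\ty",
             "a\t1\t2",
             "b\t3",      # short row: fill = TRUE pads the missing cell
             "c\t5\t6"),
           demo_file)

demo <- read.table(demo_file, header = TRUE, sep = "\t",
                   fill = TRUE, as.is = TRUE)

print(demo)
str(demo)  # shows each column's type; name is kept as character
```

The short row ends up with a missing value (NA) in its last column rather than causing an error.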
The assignment itself prints nothing. To see a summary of some of the data, type head(table) or str(table).
You can also load data from the clipboard. Select some cells in Excel or another spreadsheet app, copy them, and use the following code to load the data into R. The pbpaste command used here is specific to macOS.
table <- read.table( pipe('pbpaste', 'r'), header = TRUE, sep = "\t", fill = TRUE, as.is = TRUE )
You can inspect individual columns in R by picking a column by number. To inspect the first column, type table[1]. This retrieves the first column of the table referenced by the variable "table", returned as a one-column table. Likewise, you can inspect the tenth column by typing table[10].
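Single brackets keep the result as a table, while double brackets or the $ operator give back the bare values. A minimal sketch on a made-up table (demo and its column names are assumptions for illustration):

```r
# A small made-up table standing in for the loaded data
demo <- data.frame(id = c("a", "b", "c"), x = c(1, 3, 5), y = c(2, 4, 6))

first_as_table   <- demo[1]    # still a table, with one column
second_as_vector <- demo[[2]]  # the bare values of column 2
same_by_name     <- demo$x     # column 2 again, picked by its header

print(first_as_table)
print(second_as_vector)
print(class(first_as_table))
print(class(second_as_vector))
```

The distinction matters later when a column is handed to a function that expects plain values.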
You can perform math on whole columns. For example, to add columns 11 and 12 together row by row, do the following:
table[11] + table[12]
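Here is the same kind of column arithmetic on a two-column stand-in table, so the result is easy to check by eye. The table demo is invented for this sketch.

```r
demo <- data.frame(x = c(1, 3, 5), y = c(2, 4, 6))

# Adding two one-column tables gives a one-column table of row-by-row sums
sum_table <- demo[1] + demo[2]

# Adding the bare columns gives the same sums as a plain vector
sum_vector <- demo[[1]] + demo[[2]]

print(sum_table)
print(sum_vector)
```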
You can loop through rows like this:
column <- vector( mode = "numeric", length = nrow(table) )
for( i in 1:nrow(table) ) {
  column[i] <- ( table[i, 11] + table[i, 12] ) * 0.5
}
print(column)
The above code creates a new column in which each value is the average of columns 11 and 12 for that row.
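The loop can usually be replaced by the whole-column arithmetic shown earlier. The sketch below runs both versions on a made-up table (demo is an assumption, standing in for the real data) and checks that they agree.

```r
demo <- data.frame(x = c(1, 3, 5), y = c(2, 4, 6))

# Loop version: fill the new column one row at a time
by_loop <- vector(mode = "numeric", length = nrow(demo))
for (i in seq_len(nrow(demo))) {
  by_loop[i] <- (demo[i, 1] + demo[i, 2]) * 0.5
}

# Vectorized version: the arithmetic is applied to every row at once
by_vector <- (demo[[1]] + demo[[2]]) * 0.5

print(by_loop)
print(by_vector)
print(all.equal(by_loop, by_vector))
```

Using seq_len(nrow(demo)) instead of 1:nrow(demo) also keeps the loop from running when the table has no rows.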
Frames are a way of grabbing a slice of an existing table. Frames can also be used to gather together some pre-existing columns. Let's look at how to gather together two columns from the variable "table" and add in our new column for good measure.
frame <- data.frame(a = table[[11]], b = table[[12]], c = column)
Once you've created a frame, you can use it to output your data to a file or to the clipboard. (You can also run calculations and other things on frames).
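A minimal sketch of the three uses mentioned so far (gathering columns, slicing, and calculating), run on a made-up table. The names demo, avg, framed, and sliced are assumptions for this example.

```r
demo <- data.frame(x = c(1, 3, 5), y = c(2, 4, 6))
avg  <- (demo$x + demo$y) / 2

# Gather two existing columns and a new one under the headings a, b, c
framed <- data.frame(a = demo[[1]], b = demo[[2]], c = avg)
print(names(framed))

# Grab a slice: rows 1 to 2 of columns a and c
sliced <- framed[1:2, c("a", "c")]
print(sliced)

# Run a calculation on the frame: the mean of every column
print(colMeans(framed))
```

The columns are passed as bare values (double brackets) so that the headings given on the left of each = sign are the ones that end up in the frame.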
"a", "b", and "c" are heading names and can be replaced with any name you'd like. You can also have as many headings/columns as you would like.
Output the frame to the clipboard in a form suitable to paste into Excel.
clipboard <- pipe('pbcopy', 'w')
write.csv(frame, clipboard, row.names=FALSE)
close(clipboard)
Output the frame to a file in a tabular form.
write.table(frame, "/Users/groybal/Dropbox/galaxy2.tabular")
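By default, write.table separates values with spaces, puts quotes around strings, and adds a column of row names. If the output should look like the tab-separated input, something along these lines may be closer to what you want. The frame contents and the temporary output path are made up for this sketch.

```r
framed <- data.frame(a = c(1, 3), b = c(2, 4), c = c(1.5, 3.5))

out_file <- tempfile(fileext = ".tabular")  # hypothetical output location

# Tab-separated, no row names, no quotes around strings
write.table(framed, out_file, sep = "\t", row.names = FALSE, quote = FALSE)

print(readLines(out_file))  # the raw lines as written

# Read the file back in to confirm it round-trips
round_trip <- read.table(out_file, header = TRUE, sep = "\t")
print(round_trip)
```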
A table you loaded can also be written directly to a file, without building a new frame first: pass it to write.table in place of the frame.
This example reads a table from the clipboard, then spits out a new table to the clipboard containing 3 columns with headers. The first column is column 11 from the first table; the second column is column 12 from the first table; the third is a sum of the previous two columns.
table <- read.table(pipe('pbpaste', 'r'), header = TRUE, sep = "\t", fill = TRUE)
column <- table[[11]] + table[[12]]
frame <- data.frame(a = table[[11]], b = table[[12]], c = column)
clipboard <- pipe('pbcopy', 'w')
write.csv(frame, clipboard, row.names=FALSE)
close(clipboard)