@rohitfarmer
Last active March 4, 2024 17:18
My R Cookbook

Contents

  1. Package Installation
  2. Create a Formula from a String
  3. Data frame manipulation
  4. Plots
  5. Parallel processing
  6. Files and Folders

Package Installation

install.packages('arrow', repos='http://cran.us.r-project.org', dependencies=TRUE)

Create a Formula from a String

In the most basic case, use as.formula():

This returns a string:

"y ~ x1 + x2"
#> [1] "y ~ x1 + x2"

This returns a formula:

as.formula("y ~ x1 + x2")
#> y ~ x1 + x2
#> <environment: 0x3361710>

Here is an example of how it might be used. These are the variable names:

measurevar <- "y"
groupvars  <- c("x1","x2","x3")

This creates the appropriate string:

paste(measurevar, paste(groupvars, collapse=" + "), sep=" ~ ")
#> [1] "y ~ x1 + x2 + x3"

This returns the formula:

as.formula(paste(measurevar, paste(groupvars, collapse=" + "), sep=" ~ "))
#> y ~ x1 + x2 + x3
#> <environment: 0x3361710>
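As a possible shortcut for the paste() approach above, base R's reformulate() builds the same formula directly from character vectors. The sketch below reuses measurevar and groupvars from the text; the lm() call on the built-in mtcars data is only an illustration and is not part of the original recipe.

```r
measurevar <- "y"
groupvars  <- c("x1", "x2", "x3")

# reformulate() takes the right-hand-side terms and an optional response.
f <- reformulate(groupvars, response = measurevar)
print(f)

# Illustrative use in a model call (mtcars and these column names are
# assumptions chosen for the example, not from the original text).
fit_formula <- reformulate(c("wt", "hp"), response = "mpg")
fit <- lm(fit_formula, data = mtcars)
print(coef(fit))
```

Either route gives an ordinary formula object, so the choice is mostly about readability.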

Data frame manipulation

Gather and Spread

Gather converts a table from wide to long form: the column names go into a single column named by the chosen "key", and the values from those columns go into an adjacent column named by the chosen "value". To keep some columns as they are, exclude them with -column_name or -c(column_name_1, column_name_2).

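Df <- gather(data, key = "Markers", value = "Expression", -CellPopulation)

Spread is the reverse: it converts a long table to wide form by turning the values of the "key" column into column names and filling them from the "value" column.

mate <- dplyr::select(comb_dat, cell_population, stim_type, estimate) %>%
  spread(key = stim_type, value = estimate) %>%
  column_to_rownames(var = "cell_population") %>%
  as.matrix()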
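To make the wide-to-long round trip concrete, here is a minimal sketch on a made-up data frame. The column names and values are invented for illustration. tidyr documents gather() and spread() as superseded by pivot_longer() and pivot_wider(), so the newer equivalent of the gather() call is shown as well.

```r
library(tidyr)

# Toy wide table: one row per cell population, one column per marker.
wide <- data.frame(
  CellPopulation = c("B cells", "T cells"),
  CD3  = c(0.1, 2.5),
  CD19 = c(3.2, 0.2)
)

# Wide -> long: marker names go to "Markers", values to "Expression".
long <- gather(wide, key = "Markers", value = "Expression", -CellPopulation)
print(long)

# Long -> wide again.
wide_again <- spread(long, key = Markers, value = Expression)
print(wide_again)

# Newer tidyr equivalent of the gather() call above.
long_pivot <- pivot_longer(wide, cols = -CellPopulation,
                           names_to = "Markers", values_to = "Expression")
```

The column order after spread() may differ from the original table, so compare by name rather than by position.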

Examples
Select columns from a dataframe and coerce the data to a matrix. Also, group the data and perform min-max scaling per group.

cytof_mat <- cytof_m_ranks$attribute_stats %>%
  mutate(cell_population = paste0(cell_population, " CYTOF")) %>%
  dplyr::select(cell_population, state_marker, meanImp) %>%
  dplyr::group_by(cell_population) %>%
  mutate(min_max = (meanImp - min(meanImp)) /(max(meanImp)-min(meanImp))) %>%
  dplyr::select(cell_population, state_marker, min_max) %>%
  spread(key = state_marker, value = min_max) %>% 
  column_to_rownames(var = "cell_population") %>%
  as.matrix()
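The per-group min-max scaling step can be checked on a small made-up table. This is only a sketch: the values are invented, and the column names simply mirror the example above.

```r
library(dplyr)

# Toy data: importance scores for markers within two cell populations.
toy <- data.frame(
  cell_population = c("A", "A", "A", "B", "B"),
  state_marker    = c("m1", "m2", "m3", "m1", "m2"),
  meanImp         = c(1, 2, 3, 10, 30)
)

# Scale meanImp to [0, 1] separately within each cell population.
scaled <- toy %>%
  group_by(cell_population) %>%
  mutate(min_max = (meanImp - min(meanImp)) / (max(meanImp) - min(meanImp))) %>%
  ungroup()

print(scaled)
```

Note that a group whose values are all identical gives 0 / 0, so min_max will be NaN for that group.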

Mutate

Mutate at

The example below formats and rounds values to six decimal places in all the columns specified by vars(). You do not need to select columns before using mutate_at(); it applies the operation to the specified columns and leaves the rest of the data unchanged.

manual_dat %>%  mutate_at(vars(all_of(markers)), ~ as.numeric(format(round(., 6))))
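In recent dplyr versions the scoped verbs such as mutate_at() are superseded by across(). A rough equivalent of the call above is sketched below on made-up data; manual_dat and markers here are small stand-ins, not the original objects, and the format() step is dropped because round() already returns numeric values.

```r
library(dplyr)

# Stand-in data: an id column plus two numeric marker columns.
manual_dat <- data.frame(
  id   = c("a", "b"),
  CD3  = c(1.23456789, 2.5),
  CD19 = c(0.000000123, 3.14159265)
)
markers <- c("CD3", "CD19")

# Round only the marker columns; id is left untouched.
rounded <- manual_dat %>%
  mutate(across(all_of(markers), ~ round(.x, 6)))

print(rounded)
```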

In a dataframe, conditionally replace values in all numeric columns. This was helpful for creating a p-value matrix with * for values < 0.05 for ComplexHeatmap.

mutate(across(where(is.numeric), ~ifelse(. < 0.05, "*", "")))
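The line above is a pipeline fragment, so here is a small self-contained sketch of how it might be applied. The p_values table is invented for illustration, and the last step uses base R to turn the result into a character matrix.

```r
library(dplyr)

# Made-up p-values for three markers under two conditions.
p_values <- data.frame(
  marker = c("CD3", "CD19", "CD56"),
  stim_a = c(0.01, 0.20, 0.049),
  stim_b = c(0.50, 0.03, 0.05)
)

# Replace every numeric value with "*" if below 0.05, otherwise "".
stars <- p_values %>%
  mutate(across(where(is.numeric), ~ ifelse(.x < 0.05, "*", "")))

# Character matrix with marker names as row names.
star_mat <- as.matrix(stars[, -1])
rownames(star_mat) <- stars$marker
print(star_mat)
```

The comparison is strict, so a value of exactly 0.05 is not starred.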

Coalesce

Replace NA values in the second column with values from the first column.

my_tibble_updated <- my_tibble %>%
  mutate(col2 = coalesce(col2, col1))
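A minimal sketch with made-up values shows the effect; my_tibble here is a stand-in built for the example.

```r
library(dplyr)

# col2 has gaps that should be filled from col1.
my_tibble <- tibble(
  col1 = c(1, 2, 3),
  col2 = c(NA, 20, NA)
)

# coalesce() returns, element by element, the first non-missing value.
my_tibble_updated <- my_tibble %>%
  mutate(col2 = coalesce(col2, col1))

print(my_tibble_updated)
```

Existing values in col2 are kept; only the NA positions take the value from col1.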

Separate

Separate values of a column by a delimiter and add additional columns with the separated values.

df %>% separate(subject_visit, into = c("subject", "visit"), sep = "_")
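For example, on a small invented data frame (the subject and visit labels are assumptions made for this sketch):

```r
library(tidyr)

# One combined identifier column plus a measurement.
df <- data.frame(
  subject_visit = c("S01_V1", "S02_V1", "S02_V2"),
  value         = c(1.2, 3.4, 5.6)
)

# Split subject_visit at the underscore into two new columns.
separated <- separate(df, subject_visit, into = c("subject", "visit"), sep = "_")
print(separated)
```

By default the original column is dropped; passing remove = FALSE keeps it alongside the new columns.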

Plots

Colorblind-friendly palette

# The palette with grey:
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# The palette with black:
cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# To use for fills, add
  scale_fill_manual(values=cbPalette)

# To use for line and point colors, add
  scale_colour_manual(values=cbPalette)

Main article: http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/
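As a quick illustration of how the palette plugs into a plot, the sketch below colours points by a factor. The built-in mtcars dataset and the chosen aesthetics are assumptions made for the example.

```r
library(ggplot2)

# The palette with grey, as defined above.
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Scatter plot with one colour per cylinder count.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  scale_colour_manual(values = cbPalette) +
  labs(colour = "Cylinders")
```

Unnamed palette values are assigned to the factor levels in order, so only the first three colours are used here.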

Complex Heatmaps

Put Text in the Cells

hmap <- ComplexHeatmap::Heatmap(mat, cluster_rows = TRUE, cluster_columns = TRUE,
                                heatmap_legend_param = list(title = "Mean\nImportance"),
                                column_split = c_breaks,
                                cell_fun = function(j, i, x, y, width, height, fill) {
                                  if(mat_decision[i, j] == "Confirmed"){
                                    grid.text("C", x, y, gp = gpar(fontsize = 6))
                                  }else if(mat_decision[i, j] == "Tentative"){
                                    grid.text("T", x, y, gp = gpar(fontsize = 6))
                                  }
                                  
                                },
                                column_names_gp = gpar(fontsize = 10),
                                row_names_gp = gpar(fontsize = 10)
)

# Save the plot
plot_file <- file.path("figures", "heatmap.png")
png(filename = plot_file, width = 12 * 300, height = 14 * 300, res = 300)
ComplexHeatmap::draw(hmap, padding = unit(c(3.5, 1, 1, 1), "cm")) # Bottom, left, top, right
dev.off()

Parallel processing

doMC

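library(doMC)  # also attaches foreach and parallel
# Count physical cores; detectCores() comes from the parallel package
cores <- detectCores(all.tests = FALSE, logical = FALSE)
registerDoMC(cores)

# Run the iterations in parallel and row-bind the results
foreach(i = 1:3, .combine = rbind) %dopar% {
  sqrt(i)
}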
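doMC relies on forking, which is not available on Windows. As one possible alternative (a swapped-in technique, not the approach used above), the base parallel package can run the same computation on a socket cluster. The worker count of 2 is an arbitrary choice for this sketch.

```r
library(parallel)

# Start a small socket cluster; 2 workers is an arbitrary choice.
cl <- makeCluster(2)

# Apply sqrt() to each input on the workers and simplify to a vector.
res <- parSapply(cl, 1:3, sqrt)

# Always release the workers when done.
stopCluster(cl)

print(res)
```

Unlike the foreach() call with .combine = rbind, this returns a plain numeric vector rather than a one-column matrix.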

Files and Folders

List file names in a directory

filenames <- Sys.glob(file.path(selected_data_path, "*.rds"))
files <- list.files(file.path(dir), pattern = "\\.feather$", full.names = TRUE)

# pattern is a regular expression, not a glob, so the extension is matched with "\\.feather$".
# full.names is a logical value. If TRUE, the directory path is prepended to the file names to give a relative file path.
# If FALSE, the file names (rather than paths) are returned.
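The difference between the two full.names settings, and between a glob and a regular expression, can be seen on a throwaway directory. The directory and file names below are made up for the sketch.

```r
# Create a temporary directory with a few empty files.
demo_dir <- file.path(tempdir(), "cookbook_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.feather", "b.feather", "notes.txt")))

# list.files() takes a regular expression in pattern.
paths      <- list.files(demo_dir, pattern = "\\.feather$", full.names = TRUE)
names_only <- list.files(demo_dir, pattern = "\\.feather$", full.names = FALSE)

# Sys.glob() takes a shell-style wildcard instead.
globbed <- Sys.glob(file.path(demo_dir, "*.feather"))

print(names_only)
print(paths)
```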

Remove file extension

tools::file_path_sans_ext("file.tsv")

Extract the terminal file name and the path leading up to it

basename(path)
dirname(path)

# basename removes all of the path up to and including the last path separator (if any).
# dirname returns the part of the path up to but excluding the last path separator, or "." if there is no path separator.
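Putting these helpers together on a single made-up path (the path is hypothetical and does not need to exist on disk):

```r
# A hypothetical relative path; file.path() joins the parts with "/".
path <- file.path("data", "raw", "sample_01.tsv")

file_name <- basename(path)                        # "sample_01.tsv"
folder    <- dirname(path)                         # "data/raw"
stem      <- tools::file_path_sans_ext(file_name)  # "sample_01"
extension <- tools::file_ext(path)                 # "tsv"

print(c(file_name, folder, stem, extension))
```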