
@pizofreude
Created June 26, 2025 16:55
Cheatsheet for R, RStudio Cloud, RStudio Desktop

📚 R & RStudio: Working Directory Cheatsheet

This cheatsheet covers how to control and troubleshoot the working directory in R, RStudio Desktop, and RStudio Cloud. A correct working directory makes data import, script sourcing, and project management much smoother.


1️⃣ RStudio Desktop: Setting the Working Directory

A. Launch from Terminal with Correct Directory

Instead of just:

rstudio .

Use:

rstudio --cwd /path/to/your/directory

Example:

rstudio --cwd /c/workspace/My_Projects/alarm-projects

This ensures RStudio starts in the specified directory.


B. Change Directory Inside RStudio

  • Menu: Session → Set Working Directory → Choose Directory...
  • Shortcut: Ctrl + Shift + H
  • R Console Command:
    setwd("C:/workspace/My_Projects/alarm-projects")

C. Set a Default Working Directory (for all new sessions)

  1. Go to Tools → Global Options → General
  2. Under Default working directory, set your path (e.g., C:/workspace/My_Projects/alarm-projects)
  3. Click Apply and restart RStudio

D. Use RStudio Projects for Best Practice

RStudio Projects automatically set the working directory to the project folder.

  1. File → New Project → Existing Directory
  2. Select your folder (e.g., C:/workspace/My_Projects/alarm-projects)
  3. RStudio creates a .Rproj file—always open this file to launch the project with the right directory!

2️⃣ RStudio Cloud: Working Directory Tips

  • RStudio Cloud always starts in the project’s root directory.
  • For reproducibility, always use RStudio Projects in the cloud too.
  • To check your current directory:
    getwd()
  • To change it:
    setwd("/cloud/project/subfolder")
  • Upload files to /cloud/project for easy access.

3️⃣ R Console (Base R): Set or Check Working Directory

  • Check current directory:
    getwd()
  • Set working directory:
    setwd("/path/to/your/directory")

4️⃣ Common Troubleshooting

  • Paths on Windows: use either / or double backslashes \\ (never single \).
  • Always check your current directory with getwd() if file loading fails.
  • Use Projects whenever possible—they save a ton of headaches!
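
The Windows path rules above can be sketched in a few lines (the `data/trips.csv` file name is hypothetical):

```r
# Both forms are valid on Windows; a single backslash is not, because
# sequences like "\U" or "\M" are parsed as string escapes.
path_forward <- "C:/workspace/My_Projects/alarm-projects"
path_escaped <- "C:\\workspace\\My_Projects\\alarm-projects"

# The two spellings refer to the same location:
identical(gsub("\\\\", "/", path_escaped), path_forward)  # TRUE

# Quick sanity checks before reading a file:
getwd()                        # where am I?
file.exists("data/trips.csv")  # is the file visible from here?
```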

Pro Tip:
Always use RStudio Projects for each analysis or codebase. They save window layouts, history, and—most importantly—set your working directory automatically!


Last updated: 2025-06-26


pizofreude commented Jul 9, 2025

VS Code Settings for R

To implement the specified R terminal options in VS Code settings, follow these steps:

  1. Open VS Code settings: Press Ctrl + Shift + P (Windows/Linux) or Cmd + Shift + P (Mac) to open the Command Palette. Type "Open User Settings (JSON)" and select the option.
  2. Add R terminal options: In the settings.json file, add the following code:
{
    "r.rterm.option": [
        "--r-binary=/c/Program Files/R/R-4.5.1/bin/R", // Replace with the R path set in your PATH
        "--no-save",    // Optional: don't save the workspace at the end of the session
        "--no-restore"  // Optional: don't restore a saved workspace at startup
    ],
    "r.rterm.windows": "C:\\Program Files\\R\\R-4.5.1\\bin\\R.exe", // Replace with your R path on Windows
    "r.rterm.linux": "/usr/bin/R",    // Replace with your R path on Linux
    "r.rterm.mac": "/usr/local/bin/R" // Replace with your R path on macOS
}
  3. Save the settings: Save the settings.json file to apply the changes.

Note: Make sure to replace "/c/Program Files/R/R-4.5.1/bin/R" with the actual path to your R executable as set in your system's PATH environment variable.

By adding these options, you can control the behavior of the R terminal in VS Code, such as preventing workspace saving and restoring.

Quick R Test Run in VS Code

# File: demo.R
example <- 123

example_data <- data.frame(
    ID = 1:10,
    Age = sample(18:50, 10, replace = TRUE),
    Score = round(runif(10, 50, 100), 1)
)

print(example_data)

hist(example_data$Age,
    main = "Histogram of Ages",
    xlab = "Age",
    ylab = "Frequency",
    col = "lightblue", 
    border = "black"
)


pizofreude commented Jul 11, 2025

R Packages for Data Analytics & Engineering

This list of R packages is an excellent starting point for a professional data analyst. It covers a wide range of essential tasks, from data manipulation and visualization to reporting and project management. Here's a detailed breakdown of the list and some additional recommendations.

High-Quality Package Selection

The packages listed are widely recognized and frequently used in the data analysis community. Here's a look at their primary functions:

Core Data Science Workflow:

  • tidyverse: This is a powerful collection of R packages designed for data science that share an underlying design philosophy, grammar, and data structures. It's an essential installation, and it's worth noting that it already includes several of the other packages on the list: ggplot2, dplyr, readr, stringr, forcats, and tibble.
  • lubridate: This package simplifies working with dates and times, a common and often tricky task in data analysis.
  • janitor: Excellent for cleaning and examining data, with functions for tidying up column names and finding duplicate records.
  • skimr: Provides a quick and comprehensive overview of your dataset, offering more detailed summary statistics than the base R summary() function.
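
As a small sketch of how janitor and lubridate combine on a messy import (the column names and values are invented; `skimr::skim()` is left as a comment because it prints a long report):

```r
library(janitor)
library(lubridate)

# Hypothetical messy data frame, as it might arrive from a CSV export
raw <- data.frame(
  `Started At`   = c("2025-06-01 08:15:00", "2025-06-01 09:30:00"),
  `Trip Minutes` = c(12.5, 8.0),
  check.names = FALSE
)

clean <- clean_names(raw)                     # "Started At" -> "started_at", etc.
clean$started_at <- ymd_hms(clean$started_at) # character -> POSIXct timestamp

names(clean)  # "started_at" "trip_minutes"
# skimr::skim(clean) would print per-column summary statistics
```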

Data Manipulation and Visualization:

  • dplyr: A core component of the tidyverse, it provides a consistent set of verbs to solve the most common data manipulation challenges.
  • ggplot2: The go-to package for creating a wide variety of static and interactive data visualizations in R.
  • readr: Part of the tidyverse, it provides a fast and friendly way to read rectangular data like CSV files.
  • stringr: This tidyverse package offers a cohesive set of functions for working with strings, which is crucial for handling text data.
  • forcats: A tidyverse package that provides tools for working with categorical variables (factors).
  • tibble: A modern take on data frames, also part of the tidyverse. They offer a more user-friendly printing method and are stricter in their behavior, which helps to catch errors earlier.
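
To illustrate the core dplyr verbs on a toy tibble (the data is made up):

```r
library(dplyr)

trips <- tibble::tibble(
  member  = c("casual", "member", "casual", "member"),
  minutes = c(25, 10, 40, 8)
)

# group_by() + summarise() is the bread-and-butter aggregation pattern
summary_tbl <- trips |>
  group_by(member) |>
  summarise(avg_minutes = mean(minutes), n = n())

summary_tbl  # one row per rider type, with mean duration and count
```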

Reporting and Project Management:

  • knitr and rmarkdown: These packages are fundamental for creating dynamic and reproducible reports, presentations, and dashboards that combine code, output, and narrative text.
  • testthat: A package for unit testing your code. While not directly for data analysis, it's a critical tool for ensuring the quality and reliability of your work, especially in professional settings.
  • arrow: This package is particularly useful for working with large datasets, enabling efficient data reading and writing.
  • here: Simplifies the process of managing file paths in your projects, making your work more portable and reproducible.
  • packrat: A tool for managing project-specific package dependencies, ensuring that your projects are isolated and reproducible. However, it is worth noting that renv is now the recommended successor.
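
A quick sketch of why here helps (the `data/trips.csv` path is hypothetical):

```r
library(here)

# here() anchors paths at the project root (where the .Rproj file lives),
# so scripts resolve the same path no matter which subfolder they run from.
here("data", "trips.csv")
# e.g. "C:/workspace/My_Projects/alarm-projects/data/trips.csv"
```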

Specialized Analysis:

  • SimDesign: A package designed for conducting Monte Carlo simulations. While powerful, its necessity depends on the specific requirements of your data analysis role.

Additional Essential R Packages to Consider

The above list is very thorough, but here are a few more packages that are highly recommended for a professional data analyst:

  • tidyr: This is a core member of the tidyverse. It provides functions to help you create "tidy" data, where each variable is a column, each observation is a row, and each type of observational unit is a table. It's indispensable for data cleaning and reshaping.
  • data.table: An alternative to dplyr for data manipulation. It is renowned for its high performance and memory efficiency, making it a great choice for working with very large datasets.
  • shiny: If you need to create interactive web applications and dashboards directly from your R code, shiny is the go-to package.
  • quarto: A next-generation open-source scientific and technical publishing system from Posit (formerly RStudio). It can be considered a successor to R Markdown and offers broader language support and more features.
  • renv: As mentioned, renv is the modern successor to packrat for project dependency management. It is now the recommended tool for creating reproducible R environments.
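
A short tidyr sketch of the reshaping it enables, using invented wide-format data:

```r
library(tidyr)

# One column per month: fine for display, awkward for analysis
wide <- data.frame(
  station = c("A", "B"),
  jan = c(100, 80),
  feb = c(120, 90)
)

# pivot_longer() makes the data tidy: one row per station-month observation
long <- pivot_longer(wide, cols = c(jan, feb),
                     names_to = "month", values_to = "rides")

long  # 4 rows, columns: station, month, rides
```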

Packages for Machine Learning

If your role as a data analyst extends to predictive modeling and machine learning, you should also consider:

  • tidymodels: A collection of packages for modeling and machine learning that follows the principles of the tidyverse. It provides a consistent and flexible framework for the entire modeling process.
  • caret: An older but still very popular package that provides a unified interface to a vast number of machine learning algorithms.
  • mlr3: Another powerful and extensible framework for machine learning in R.

Presentation Tools

For modern, professional business slides in R, use Quarto with the reveal.js format, which is arguably the best overall option.


pizofreude commented Jul 11, 2025

Example Usage of R Packages for Data Analytics & Engineering

Based on the specific use cases for our Divvy bike data engineering project, I've created a curated list of 16 essential R packages optimized for our needs. Here's why this selection works:

Key Highlights:

Core Advantages:

  • Minimal Package Count: A focused set of 16 packages rather than everything in the full listings above, avoiding bloat
  • Project-Specific: Focused on bike share data analysis, revenue calculations, and Redshift connectivity
  • Professional Presentation: Quarto + reveal.js for modern business slides
  • ELT Integration: Database connectivity packages for our Redshift-based architecture

What I Excluded and Why:

  • arrow - We're using Redshift, not direct Parquet manipulation
  • data.table - Tidyverse is sufficient for our analysis scale
  • shiny - We're using Tableau Public for final dashboards
  • testthat - dbt handles our data testing needs
  • Machine Learning packages - Not required for our business analysis focus

Project-Specific Inclusions:

  • DBI, RPostgres, dbplyr - Essential for connecting R to our Redshift data warehouse
  • scales - Perfect for formatting revenue calculations ($0.19/minute, percentages)
  • plotly - Interactive exploration of station utilization patterns
  • lubridate - Critical for analyzing trip timestamps and duration calculations

Installation Strategy:

The artifact provides a phased installation approach so we can install packages as needed, plus the complete renv workflow for reproducible environments.


Essential R Packages for Divvy Data Engineering Project

Core Installation Command for renv

# Initialize renv environment
renv::init()

# Install essential packages
renv::install(c(
  # Core tidyverse (includes ggplot2, dplyr, readr, stringr, forcats, tibble)
  "tidyverse",
  "tidyr",
  
  # Data manipulation and analysis
  "lubridate",
  "janitor",
  "skimr",
  
  # Database connectivity (for Redshift)
  "DBI",
  "RPostgres",
  "dbplyr",
  
  # Visualization and rapid prototyping
  "plotly",
  "scales",
  "viridis",
  "patchwork",
  
  # Presentation and reporting
  "quarto",
  "knitr",
  "rmarkdown",
  
  # Project management
  "here",
  "renv"
))

# Snapshot the environment
renv::snapshot()

Package Categories and Justifications

1. Core Data Science Workflow (Essential)

  • tidyverse - Comprehensive suite including ggplot2, dplyr, readr, stringr, forcats, tibble
  • tidyr - Data reshaping and cleaning (not automatically included in tidyverse)
  • lubridate - Date/time manipulation (crucial for bike trip timestamps)
  • janitor - Data cleaning and column name standardization
  • skimr - Quick dataset overviews and summary statistics

2. Database Connectivity (Project-Specific)

  • DBI - Database interface foundation
  • RPostgres - PostgreSQL/Redshift connectivity
  • dbplyr - dplyr syntax for database queries (essential for Redshift integration)
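
A sketch of how these three fit together. The cluster hostname, credentials, and `trips` table are hypothetical, and `dbplyr::simulate_postgres()` lets you preview the generated SQL without a live connection:

```r
library(DBI)     # dbConnect()/dbDisconnect() for a real connection
library(dplyr)
library(dbplyr)

# With a live cluster you would connect roughly like this (details hypothetical):
# con <- dbConnect(RPostgres::Postgres(),
#                  host = "your-cluster.redshift.amazonaws.com", port = 5439,
#                  dbname = "dev", user = Sys.getenv("REDSHIFT_USER"),
#                  password = Sys.getenv("REDSHIFT_PASSWORD"))
# trips <- tbl(con, "trips")

# Offline, a simulated backend shows the SQL dbplyr would send:
trips <- lazy_frame(
  start_station_name = character(), rideable_type = character(),
  con = simulate_postgres()
)

query <- trips |>
  filter(rideable_type == "electric_bike") |>
  count(start_station_name, sort = TRUE)

show_query(query)  # prints the translated SELECT ... GROUP BY ... ORDER BY
```

With a real connection, `collect(query)` pulls the results into R only at the end, so the heavy lifting stays on Redshift.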

3. Visualization and Rapid Prototyping (Core Need)

  • plotly - Interactive visualizations for exploration
  • scales - Scale functions for ggplot2 (revenue formatting, percentages)
  • viridis - Color scales that are colorblind-friendly
  • patchwork - Combining multiple ggplot2 plots
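
A few scales helpers, as a sketch of the revenue-formatting use case:

```r
library(scales)

# Consistent label formatting for revenue figures and rates
dollar(0.19, accuracy = 0.01)   # "$0.19"  (per-minute rate)
percent(0.235, accuracy = 0.1)  # "23.5%"
comma(1234567)                  # "1,234,567"
```

The same functions plug into ggplot2 axes via `scale_y_continuous(labels = dollar)`.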

4. Presentation Tools (Your Preference)

  • quarto - Modern publishing system with reveal.js integration
  • knitr - Code chunk processing (required by quarto)
  • rmarkdown - Markdown processing (quarto dependency)

5. Project Management (Professional Standards)

  • here - Robust file path management
  • renv - Package dependency management (already chosen)

Packages NOT Recommended for Your Use Case

Skip These (Not Needed):

  • arrow - You're using Redshift, not Parquet files directly in R
  • data.table - tidyverse approach is sufficient for your analysis scale
  • testthat - dbt handles data testing; R code will be exploratory
  • SimDesign - Monte Carlo simulations not relevant to bike share analysis
  • shiny - Using Tableau Public for final dashboards
  • tidymodels/caret/mlr3 - No machine learning requirements mentioned
  • packrat - Superseded by renv

Installation Strategy

Phase 1: Core Setup

# Essential packages for immediate work
core_packages <- c("tidyverse", "tidyr", "lubridate", "janitor", "skimr", "here")
renv::install(core_packages)
# OR
install.packages(core_packages)

Phase 2: Database Integration

# Database connectivity for Redshift
db_packages <- c("DBI", "RPostgres", "dbplyr")
renv::install(db_packages)
# OR
install.packages(db_packages)

Phase 3: Visualization Enhancement

# Advanced visualization capabilities
viz_packages <- c("plotly", "scales", "viridis", "patchwork")
renv::install(viz_packages)
# OR
install.packages(viz_packages)

Phase 4: Presentation Tools

# Modern presentation system
presentation_packages <- c("quarto", "knitr", "rmarkdown")
renv::install(presentation_packages)
# OR
install.packages(presentation_packages)

Project-Specific Considerations

For Divvy Data Analysis:

  • lubridate - Essential for trip start/end time analysis
  • scales - Format revenue calculations (dollar signs, percentages)
  • dbplyr - Write dplyr code that translates to SQL for Redshift
  • plotly - Interactive exploration of station utilization patterns

For Business Presentations:

  • quarto + reveal.js - Professional slide presentations
  • viridis - Accessible color palettes for executive presentations
  • patchwork - Combine multiple revenue/usage charts
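
A minimal Quarto front-matter sketch for a reveal.js deck (the title, author, and theme are placeholders):

```yaml
---
title: "Divvy Ridership & Revenue Review"
author: "Analytics Team"
format:
  revealjs:
    theme: simple
    slide-number: true
    incremental: true
execute:
  echo: false    # show results, hide R code on business slides
---
```

Render with `quarto render slides.qmd` to produce the HTML slide deck.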

renv Workflow

# Initialize project
renv::init()

# Install packages as needed
renv::install("package_name")
# OR
install.packages("package_name")

# Snapshot current state
renv::snapshot()

# Share project (others can restore with)
renv::restore()

Total Package Count: 16 Essential Packages

This curated list focuses on your specific needs while avoiding bloat. The selection prioritizes:

  1. Redshift connectivity for ELT pipeline integration
  2. Rapid visualization prototyping with ggplot2 ecosystem
  3. Professional presentation capabilities with Quarto
  4. Business-focused analysis tools for revenue and operational metrics

This streamlined approach ensures fast installation, minimal dependency conflicts, and focused functionality for your Divvy bike data engineering project.
