@pizofreude
Created June 26, 2025 16:55
Cheatsheet for R, RStudio Cloud, RStudio Desktop

📚 R & RStudio: Working Directory Cheatsheet

This cheatsheet covers how to control and troubleshoot the working directory in R, RStudio Desktop, and RStudio Cloud. A correct working directory makes data import, script sourcing, and project management much smoother.


1️⃣ RStudio Desktop: Setting the Working Directory

A. Launch from Terminal with Correct Directory

Instead of just:

rstudio .

Use:

rstudio --cwd /path/to/your/directory

Example:

rstudio --cwd /c/workspace/My_Projects/alarm-projects

This ensures RStudio starts in the specified directory.


B. Change Directory Inside RStudio

  • Menu: Session → Set Working Directory → Choose Directory...
  • Shortcut: Ctrl + Shift + H
  • R Console Command:
    setwd("C:/workspace/My_Projects/alarm-projects")
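
A minimal sketch of a safer way to call setwd() from the console: check that the folder exists first and keep the previous directory so you can go back. The helper name set_wd_checked is hypothetical, and tempdir() stands in for your real project folder.

```r
# Hypothetical helper: change the working directory only if the target exists.
set_wd_checked <- function(path) {
  if (!dir.exists(path)) {
    stop("Directory not found: ", path)
  }
  old <- setwd(path)  # setwd() invisibly returns the previous directory
  invisible(old)
}

target <- tempdir()                 # stand-in for your project folder
previous <- set_wd_checked(target)  # switch, remembering where we were
print(getwd())                      # now inside the target folder
setwd(previous)                     # restore the original directory
```

Failing early with a clear message is usually easier to debug than the generic "cannot change working directory" error that setwd() raises on a bad path.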

C. Set a Default Working Directory (for all new sessions)

  1. Go to Tools → Global Options → General
  2. Under Default working directory, set your path (e.g., C:/workspace/My_Projects/alarm-projects)
  3. Click Apply and restart RStudio

D. Use RStudio Projects for Best Practice

RStudio Projects automatically set the working directory to the project folder.

  1. File → New Project → Existing Directory
  2. Select your folder (e.g., C:/workspace/My_Projects/alarm-projects)
  3. RStudio creates a .Rproj file—always open this file to launch the project with the right directory!
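
To illustrate why a project-rooted working directory helps, here is a rough sketch that fakes a project folder in a temporary location and reads a file with a relative path. The folder name alarm-projects-demo and the file data/alarms.csv are made up for the example; the setwd() call only imitates what opening the .Rproj file does for you.

```r
# Build a throwaway "project" folder to stand in for a real one.
project_dir <- file.path(tempdir(), "alarm-projects-demo")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(id = 1:3),
          file.path(project_dir, "data", "alarms.csv"),
          row.names = FALSE)

old <- setwd(project_dir)              # imitates opening the .Rproj file
alarms <- read.csv("data/alarms.csv")  # relative path, no drive letter needed
setwd(old)                             # put the session back where it was

print(nrow(alarms))
```

Because the script only uses paths relative to the project root, it keeps working when the folder is moved or cloned to another machine.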

2️⃣ RStudio Cloud: Working Directory Tips

  • RStudio Cloud (now Posit Cloud) always starts in the project’s root directory, /cloud/project.
  • For reproducibility, always use RStudio Projects in the cloud too.
  • To check your current directory:
    getwd()
  • To change it:
    setwd("/cloud/project/subfolder")
  • Upload files to /cloud/project for easy access.

3️⃣ R Console (Base R): Set or Check Working Directory

  • Check current directory:
    getwd()
  • Set working directory:
    setwd("/path/to/your/directory")
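
If you only need a different directory for one step, a small wrapper can switch and then switch back automatically. This is a sketch in base R; the helper name with_wd is hypothetical, and tempdir() is just a convenient folder that always exists.

```r
# Hypothetical helper: evaluate `code` with `dir` as the working directory.
with_wd <- function(dir, code) {
  old <- setwd(dir)
  on.exit(setwd(old), add = TRUE)  # restore even if `code` throws an error
  force(code)                      # `code` is evaluated lazily, i.e. after setwd()
}

before <- getwd()
inside <- with_wd(tempdir(), getwd())  # getwd() runs inside tempdir()
after  <- getwd()

print(identical(before, after))
```

The on.exit() call is the important part: the session returns to its original directory even when the wrapped code fails.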

4️⃣ Common Troubleshooting

  • Paths on Windows: use either / or double backslashes \\ (never single \).
  • Always check your current directory with getwd() if file loading fails.
  • Use Projects whenever possible—they save a ton of headaches!
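
The path rules above can be checked directly in the console. The sketch below assumes R 4.0.0 or later for the raw-string form; the path itself is illustrative, and diagnose_path is a hypothetical helper, not a base R function.

```r
# Three ways to write the same Windows path.
p_forward <- "C:/workspace/My_Projects/alarm-projects"
p_escaped <- "C:\\workspace\\My_Projects\\alarm-projects"
p_raw     <- r"(C:\workspace\My_Projects\alarm-projects)"  # raw string, R >= 4.0.0

# The escaped and raw forms produce the same string.
same_string <- identical(p_escaped, p_raw)

# Swapping backslashes for forward slashes gives the first form.
converted <- gsub("\\", "/", p_escaped, fixed = TRUE)

# Hypothetical helper: report where R looked when a file is missing.
diagnose_path <- function(path) {
  if (file.exists(path)) {
    "found"
  } else {
    paste0("not found relative to ", getwd())
  }
}

print(same_string)
print(identical(converted, p_forward))
print(diagnose_path("no_such_file_12345.csv"))
```

Raw strings are handy when pasting a path copied from Windows Explorer, since no backslash needs to be doubled.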


Pro Tip:
Always use RStudio Projects for each analysis or codebase. They save window layouts, history, and—most importantly—set your working directory automatically!


Last updated: 2025-06-26

pizofreude commented Jul 11, 2025

Example Usage of R Packages for Data Analytics & Engineering

For the Divvy bike data engineering project, I've put together a curated list of 16 essential R packages tailored to our specific use cases. Here's why this selection is optimized for our needs:

Key Highlights:

Core Advantages:

  • Minimal Package Count: Only 16 packages, trimmed down from the full listings, avoiding bloat
  • Project-Specific: Focused on bike share data analysis, revenue calculations, and Redshift connectivity
  • Professional Presentation: Quarto + reveal.js for modern business slides
  • ELT Integration: Database connectivity packages for our Redshift-based architecture

What I Excluded and Why:

  • arrow - We're using Redshift, not direct Parquet manipulation
  • data.table - Tidyverse is sufficient for our analysis scale
  • shiny - We're using Tableau Public for final dashboards
  • testthat - dbt handles our data testing needs
  • Machine Learning packages - Not required for our business analysis focus

Project-Specific Inclusions:

  • DBI, RPostgres, dbplyr - Essential for connecting R to our Redshift data warehouse
  • scales - Perfect for formatting revenue calculations ($0.19/minute, percentages)
  • plotly - Interactive exploration of station utilization patterns
  • lubridate - Critical for analyzing trip timestamps and duration calculations

Installation Strategy:

The list below provides a phased installation approach so we can install packages as needed, plus the complete renv workflow for reproducible environments.


Essential R Packages for Divvy Data Engineering Project

Core Installation Command for renv

# Initialize renv environment
renv::init()

# Install essential packages
renv::install(c(
  # Core tidyverse (attaches ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats)
  "tidyverse",
  "tidyr",  # already part of the core tidyverse; listed explicitly
  
  # Data manipulation and analysis
  "lubridate",
  "janitor",
  "skimr",
  
  # Database connectivity (for Redshift)
  "DBI",
  "RPostgres",
  "dbplyr",
  
  # Visualization and rapid prototyping
  "plotly",
  "scales",
  "viridis",
  "patchwork",
  
  # Presentation and reporting
  "quarto",
  "knitr",
  "rmarkdown",
  
  # Project management
  "here",
  "renv"
))

# Snapshot the environment
renv::snapshot()

Package Categories and Justifications

1. Core Data Science Workflow (Essential)

  • tidyverse - Comprehensive suite including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats
  • tidyr - Data reshaping and cleaning (already part of the core tidyverse, so the separate entry is optional)
  • lubridate - Date/time manipulation (crucial for bike trip timestamps; part of the core tidyverse since tidyverse 2.0.0)
  • janitor - Data cleaning and column name standardization
  • skimr - Quick dataset overviews and summary statistics

2. Database Connectivity (Project-Specific)

  • DBI - Database interface foundation
  • RPostgres - PostgreSQL/Redshift connectivity
  • dbplyr - dplyr syntax for database queries (essential for Redshift integration)
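
To see what "dplyr syntax for database queries" means without a live cluster, dbplyr can translate a pipeline against a simulated Redshift backend. This is only a sketch: the table columns ride_id, member_casual and duration_min are assumptions made for the example, and no connection is opened.

```r
library(dplyr)
library(dbplyr)

# lazy_frame() builds a stand-in remote table, so no Redshift connection is needed.
trips <- lazy_frame(
  ride_id = "a1",
  member_casual = "member",
  duration_min = 12.5,
  con = simulate_redshift()
)

# Ordinary dplyr verbs; nothing is executed, only translated.
query <- trips %>%
  filter(duration_min > 1) %>%
  group_by(member_casual) %>%
  summarise(
    rides = n(),
    avg_minutes = mean(duration_min, na.rm = TRUE)
  )

show_query(query)  # prints the generated SQL
```

Against a real warehouse the same pipeline would start from tbl(con, "trips"), where con comes from DBI::dbConnect(RPostgres::Redshift(), ...) with your own host and credentials, and collect() would pull the result into R.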

3. Visualization and Rapid Prototyping (Core Need)

  • plotly - Interactive visualizations for exploration
  • scales - Scale functions for ggplot2 (revenue formatting, percentages)
  • viridis - Color scales that are colorblind-friendly
  • patchwork - Combining multiple ggplot2 plots

4. Presentation Tools (Your Preference)

  • quarto - R interface to the Quarto publishing system, with reveal.js integration (the Quarto CLI itself is installed separately)
  • knitr - Code chunk processing (used by Quarto to run R chunks)
  • rmarkdown - Markdown processing (quarto dependency)

5. Project Management (Professional Standards)

  • here - Robust file path management
  • renv - Package dependency management (already chosen)
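
As a small sketch of what here adds on top of the working directory: here::here() builds paths from the project root it detects (for example the folder containing the .Rproj file), regardless of where the script is run from. The data/raw/trips.csv location is a hypothetical layout, not something this project defines.

```r
library(here)

# Path to a hypothetical data file, built from the detected project root.
trips_path <- here("data", "raw", "trips.csv")

# The project root that here() resolved for this session.
project_root <- here()

print(project_root)
print(trips_path)
```

Scripts and Quarto documents in subfolders can then refer to the same file with the same call, instead of chains of "../".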

Packages NOT Recommended for Your Use Case

Skip These (Not Needed):

  • arrow - You're using Redshift, not Parquet files directly in R
  • data.table - tidyverse approach is sufficient for your analysis scale
  • testthat - dbt handles data testing; R code will be exploratory
  • SimDesign - Monte Carlo simulations not relevant to bike share analysis
  • shiny - Using Tableau Public for final dashboards
  • tidymodels/caret/mlr3 - No machine learning requirements mentioned
  • packrat - Superseded by renv

Installation Strategy

Phase 1: Core Setup

# Essential packages for immediate work
core_packages <- c("tidyverse", "tidyr", "lubridate", "janitor", "skimr", "here")
renv::install(core_packages)
# OR (equivalent; pick one):
# install.packages(core_packages)

Phase 2: Database Integration

# Database connectivity for Redshift
db_packages <- c("DBI", "RPostgres", "dbplyr")
renv::install(db_packages)
# OR (equivalent; pick one):
# install.packages(db_packages)

Phase 3: Visualization Enhancement

# Advanced visualization capabilities
viz_packages <- c("plotly", "scales", "viridis", "patchwork")
renv::install(viz_packages)
# OR (equivalent; pick one):
# install.packages(viz_packages)

Phase 4: Presentation Tools

# Modern presentation system
presentation_packages <- c("quarto", "knitr", "rmarkdown")
renv::install(presentation_packages)
# OR (equivalent; pick one):
# install.packages(presentation_packages)
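
One way to make the phased approach repeatable is to install only what is missing in each phase. The sketch below uses base R only; missing_packages is a hypothetical helper, and the install call is left commented out so the snippet has no side effects beyond loading namespaces.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Phase 2 from the list above.
db_packages <- c("DBI", "RPostgres", "dbplyr")
to_install <- missing_packages(db_packages)

if (length(to_install) > 0) {
  message("Still missing: ", paste(to_install, collapse = ", "))
  # renv::install(to_install)   # uncomment to install inside the renv project
} else {
  message("Phase complete: all packages are available.")
}
```

Running the check before each phase keeps renv::snapshot() diffs small, because nothing is reinstalled unnecessarily.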

Project-Specific Considerations

For Divvy Data Analysis:

  • lubridate - Essential for trip start/end time analysis
  • scales - Format revenue calculations (dollar signs, percentages)
  • dbplyr - Write dplyr code that translates to SQL for Redshift
  • plotly - Interactive exploration of station utilization patterns

For Business Presentations:

  • quarto + reveal.js - Professional slide presentations
  • viridis - Accessible color palettes for executive presentations
  • patchwork - Combine multiple revenue/usage charts

renv Workflow

# Initialize project
renv::init()

# Install packages as needed
renv::install("package_name")
# OR (equivalent; pick one):
# install.packages("package_name")

# Snapshot current state
renv::snapshot()

# Share project (others can restore with)
renv::restore()

Total Package Count: 16 Essential Packages (plus renv itself)

This curated list focuses on your specific needs while avoiding bloat. The selection prioritizes:

  1. Redshift connectivity for ELT pipeline integration
  2. Rapid visualization prototyping with ggplot2 ecosystem
  3. Professional presentation capabilities with Quarto
  4. Business-focused analysis tools for revenue and operational metrics

This streamlined approach ensures fast installation, minimal dependency conflicts, and focused functionality for your Divvy bike data engineering project.
