
@pizofreude
Created June 26, 2025 16:55
Cheatsheet for R, RStudio Cloud, RStudio Desktop

📚 R & RStudio: Working Directory Cheatsheet

This cheatsheet covers how to control and troubleshoot the working directory in R, RStudio Desktop, and RStudio Cloud. A correct working directory makes data import, script sourcing, and project management much smoother.


1️⃣ RStudio Desktop: Setting the Working Directory

A. Launch from Terminal with Correct Directory

Instead of just:

rstudio .

Use:

rstudio --cwd /path/to/your/directory

Example:

rstudio --cwd /c/workspace/My_Projects/alarm-projects

This ensures RStudio starts in the specified directory.


B. Change Directory Inside RStudio

  • Menu: Session → Set Working Directory → Choose Directory...
  • Shortcut: Ctrl + Shift + H
  • R Console Command:
    setwd("C:/workspace/My_Projects/alarm-projects")

C. Set a Default Working Directory (for all new sessions)

  1. Go to Tools → Global Options → General
  2. Under Default working directory, set your path (e.g., C:/workspace/My_Projects/alarm-projects)
  3. Click Apply and restart RStudio

D. Use RStudio Projects for Best Practice

RStudio Projects automatically set the working directory to the project folder.

  1. File → New Project → Existing Directory
  2. Select your folder (e.g., C:/workspace/My_Projects/alarm-projects)
  3. RStudio creates a .Rproj file—always open this file to launch the project with the right directory!

2️⃣ RStudio Cloud: Working Directory Tips

  • RStudio Cloud always starts in the project’s root directory.
  • For reproducibility, always use RStudio Projects in the cloud too.
  • To check your current directory:
    getwd()
  • To change it:
    setwd("/cloud/project/subfolder")
  • Upload files to /cloud/project for easy access.

3️⃣ R Console (Base R): Set or Check Working Directory

  • Check current directory:
    getwd()
  • Set working directory:
    setwd("/path/to/your/directory")

4️⃣ Common Troubleshooting

  • Paths on Windows: use either / or double backslashes \\ (never single \).
  • Always check your current directory with getwd() if file loading fails.
  • Use Projects whenever possible—they save a ton of headaches!
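
The Windows path rules above can be sketched in a few lines (the `data/trips.csv` file name is hypothetical):

```r
# Both forms are valid on Windows; a single backslash is not, because
# sequences like "\U" or "\M" are parsed as string escapes.
path_forward <- "C:/workspace/My_Projects/alarm-projects"
path_escaped <- "C:\\workspace\\My_Projects\\alarm-projects"

# The two spellings refer to the same location:
identical(gsub("\\\\", "/", path_escaped), path_forward)  # TRUE

# Quick sanity checks before reading a file:
getwd()                        # where am I?
file.exists("data/trips.csv")  # is the file visible from here?
```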

Pro Tip:
Always use RStudio Projects for each analysis or codebase. They save window layouts, history, and—most importantly—set your working directory automatically!


Last updated: 2025-06-26


pizofreude commented Jul 9, 2025

VS Code Settings for R

To implement the specified R terminal options in VS Code settings, follow these steps:

  1. Open VS Code settings: Press Ctrl + Shift + P (Windows/Linux) or Cmd + Shift + P (Mac) to open the Command Palette. Type "Open User Settings (JSON)" and select the option.
  2. Add R terminal options: In the settings.json file, add the following code:
{
    "r.rterm.option": [
        "--r-binary=/c/Program Files/R/R-4.5.1/bin/R", // Replace with the R path set in your PATH
        "--no-save",    // Optional: don't save the workspace at the end of the session
        "--no-restore"  // Optional: don't restore a saved workspace at startup
    ],
    "r.rterm.windows": "C:\\Program Files\\R\\R-4.5.1\\bin\\R.exe", // Replace with your R path on Windows
    "r.rterm.linux": "/usr/bin/R",    // Replace with your R path on Linux
    "r.rterm.mac": "/usr/local/bin/R" // Replace with your R path on macOS
}
  3. Save the settings: Save the settings.json file to apply the changes.

Note: Make sure to replace "/c/Program Files/R/R-4.5.1/bin/R" with the actual path to your R executable as set in your system's PATH environment variable.

By adding these options, you can control the behavior of the R terminal in VS Code, such as preventing workspace saving and restoring.

Quick R Test Run in VS Code

# File: demo.R
example <- 123

example_data <- data.frame(
    ID = 1:10,
    Age = sample(18:50, 10, replace = TRUE),
    Score = round(runif(10, 50, 100), 1)
)

print(example_data)

hist(example_data$Age,
    main = "Histogram of Ages",
    xlab = "Age",
    ylab = "Frequency",
    col = "lightblue", 
    border = "black"
)


pizofreude commented Jul 11, 2025

R Packages for Data Analytics & Engineering

This list of R packages is an excellent starting point for a professional data analyst. It covers a wide range of essential tasks, from data manipulation and visualization to reporting and project management. Here's a detailed breakdown of the list and some additional recommendations.

High-Quality Package Selection

The packages listed are widely recognized and frequently used in the data analysis community. Here's a look at their primary functions:

Core Data Science Workflow:

  • tidyverse: This is a powerful collection of R packages designed for data science that share an underlying design philosophy, grammar, and data structures. It's an essential installation, and it's worth noting that it already includes several of the other packages on the list: ggplot2, dplyr, readr, stringr, forcats, and tibble.
  • lubridate: This package simplifies working with dates and times, a common and often tricky task in data analysis.
  • janitor: Excellent for cleaning and examining data, with functions for tidying up column names and finding duplicate records.
  • skimr: Provides a quick and comprehensive overview of your dataset, offering more detailed summary statistics than the base R summary() function.
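
As a small sketch of how janitor and lubridate combine on a messy import (the column names and values are invented; `skimr::skim()` is left as a comment because it prints a long report):

```r
library(janitor)
library(lubridate)

# Hypothetical messy data frame, as it might arrive from a CSV export
raw <- data.frame(
  `Started At`   = c("2025-06-01 08:15:00", "2025-06-01 09:30:00"),
  `Trip Minutes` = c(12.5, 8.0),
  check.names = FALSE
)

clean <- clean_names(raw)                     # "Started At" -> "started_at", etc.
clean$started_at <- ymd_hms(clean$started_at) # character -> POSIXct timestamp

names(clean)  # "started_at" "trip_minutes"
# skimr::skim(clean) would print per-column summary statistics
```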

Data Manipulation and Visualization:

  • dplyr: A core component of the tidyverse, it provides a consistent set of verbs to solve the most common data manipulation challenges.
  • ggplot2: The go-to package for creating a wide variety of static and interactive data visualizations in R.
  • readr: Part of the tidyverse, it provides a fast and friendly way to read rectangular data like CSV files.
  • stringr: This tidyverse package offers a cohesive set of functions for working with strings, which is crucial for handling text data.
  • forcats: A tidyverse package that provides tools for working with categorical variables (factors).
  • tibble: A modern take on data frames, also part of the tidyverse. They offer a more user-friendly printing method and are stricter in their behavior, which helps to catch errors earlier.
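
To illustrate the core dplyr verbs on a toy tibble (the data is made up):

```r
library(dplyr)

trips <- tibble::tibble(
  member  = c("casual", "member", "casual", "member"),
  minutes = c(25, 10, 40, 8)
)

# group_by() + summarise() is the bread-and-butter aggregation pattern
summary_tbl <- trips |>
  group_by(member) |>
  summarise(avg_minutes = mean(minutes), n = n())

summary_tbl  # one row per rider type, with mean duration and count
```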

Reporting and Project Management:

  • knitr and rmarkdown: These packages are fundamental for creating dynamic and reproducible reports, presentations, and dashboards that combine code, output, and narrative text.
  • testthat: A package for unit testing your code. While not directly for data analysis, it's a critical tool for ensuring the quality and reliability of your work, especially in professional settings.
  • arrow: This package is particularly useful for working with large datasets, enabling efficient data reading and writing.
  • here: Simplifies the process of managing file paths in your projects, making your work more portable and reproducible.
  • packrat: A tool for managing project-specific package dependencies, ensuring that your projects are isolated and reproducible. However, it is worth noting that renv is now the recommended successor.
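
A quick sketch of why here helps (the `data/trips.csv` path is hypothetical):

```r
library(here)

# here() anchors paths at the project root (where the .Rproj file lives),
# so scripts resolve the same path no matter which subfolder they run from.
here("data", "trips.csv")
# e.g. "C:/workspace/My_Projects/alarm-projects/data/trips.csv"
```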

Specialized Analysis:

  • SimDesign: A package designed for conducting Monte Carlo simulations. While powerful, its necessity depends on the specific requirements of your data analysis role.

Additional Essential R Packages to Consider

The above list is very thorough, but here are a few more packages that are highly recommended for a professional data analyst:

  • tidyr: This is a core member of the tidyverse. It provides functions to help you create "tidy" data, where each variable is a column, each observation is a row, and each type of observational unit is a table. It's indispensable for data cleaning and reshaping.
  • data.table: An alternative to dplyr for data manipulation. It is renowned for its high performance and memory efficiency, making it a great choice for working with very large datasets.
  • shiny: If you need to create interactive web applications and dashboards directly from your R code, shiny is the go-to package.
  • quarto: A next-generation open-source scientific and technical publishing system from Posit (formerly RStudio). It can be considered a successor to R Markdown and offers broader language support and more features.
  • renv: As mentioned, renv is the modern successor to packrat for project dependency management. It is now the recommended tool for creating reproducible R environments.
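
A short tidyr sketch of the reshaping it enables, using invented wide-format data:

```r
library(tidyr)

# One column per month: fine for display, awkward for analysis
wide <- data.frame(
  station = c("A", "B"),
  jan = c(100, 80),
  feb = c(120, 90)
)

# pivot_longer() makes the data tidy: one row per station-month observation
long <- pivot_longer(wide, cols = c(jan, feb),
                     names_to = "month", values_to = "rides")

long  # 4 rows, columns: station, month, rides
```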

Packages for Machine Learning

If your role as a data analyst extends to predictive modeling and machine learning, you should also consider:

  • tidymodels: A collection of packages for modeling and machine learning that follows the principles of the tidyverse. It provides a consistent and flexible framework for the entire modeling process.
  • caret: An older but still very popular package that provides a unified interface to a vast number of machine learning algorithms.
  • mlr3: Another powerful and extensible framework for machine learning in R.

Presentation Tools

For modern, professional business slides in R, use Quarto with the reveal.js format, which is arguably the best overall option.


pizofreude commented Jul 11, 2025

Example Usage of R Packages for Data Analytics & Engineering

Based on the specific use cases for our Divvy bike data engineering project, I've created a curated list of 16 essential R packages optimized for our needs. Here's why this selection works:

Key Highlights:

Core Advantages:

  • Minimal Package Count: A focused set of 16 packages rather than everything in the full listings above, avoiding bloat
  • Project-Specific: Focused on bike share data analysis, revenue calculations, and Redshift connectivity
  • Professional Presentation: Quarto + reveal.js for modern business slides
  • ELT Integration: Database connectivity packages for our Redshift-based architecture

What I Excluded and Why:

  • arrow - We're using Redshift, not direct Parquet manipulation
  • data.table - Tidyverse is sufficient for our analysis scale
  • shiny - We're using Tableau Public for final dashboards
  • testthat - dbt handles our data testing needs
  • Machine Learning packages - Not required for our business analysis focus

Project-Specific Inclusions:

  • DBI, RPostgres, dbplyr - Essential for connecting R to our Redshift data warehouse
  • scales - Perfect for formatting revenue calculations ($0.19/minute, percentages)
  • plotly - Interactive exploration of station utilization patterns
  • lubridate - Critical for analyzing trip timestamps and duration calculations

Installation Strategy:

The artifact provides a phased installation approach so we can install packages as needed, plus the complete renv workflow for reproducible environments.


Essential R Packages for Divvy Data Engineering Project

Core Installation Command for renv

# Initialize renv environment
renv::init()

# Install essential packages
renv::install(c(
  # Core tidyverse (includes ggplot2, dplyr, readr, stringr, forcats, tibble)
  "tidyverse",
  "tidyr",
  
  # Data manipulation and analysis
  "lubridate",
  "janitor",
  "skimr",
  
  # Database connectivity (for Redshift)
  "DBI",
  "RPostgres",
  "dbplyr",
  
  # Visualization and rapid prototyping
  "plotly",
  "scales",
  "viridis",
  "patchwork",
  
  # Presentation and reporting
  "quarto",
  "knitr",
  "rmarkdown",
  
  # Project management
  "here",
  "renv"
))

# Snapshot the environment
renv::snapshot()

Package Categories and Justifications

1. Core Data Science Workflow (Essential)

  • tidyverse - Comprehensive suite including ggplot2, dplyr, readr, stringr, forcats, tibble
  • tidyr - Data reshaping and cleaning (not automatically included in tidyverse)
  • lubridate - Date/time manipulation (crucial for bike trip timestamps)
  • janitor - Data cleaning and column name standardization
  • skimr - Quick dataset overviews and summary statistics

2. Database Connectivity (Project-Specific)

  • DBI - Database interface foundation
  • RPostgres - PostgreSQL/Redshift connectivity
  • dbplyr - dplyr syntax for database queries (essential for Redshift integration)
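
A sketch of how these three fit together. The cluster hostname, credentials, and `trips` table are hypothetical, and `dbplyr::simulate_postgres()` lets you preview the generated SQL without a live connection:

```r
library(DBI)     # dbConnect()/dbDisconnect() for a real connection
library(dplyr)
library(dbplyr)

# With a live cluster you would connect roughly like this (details hypothetical):
# con <- dbConnect(RPostgres::Postgres(),
#                  host = "your-cluster.redshift.amazonaws.com", port = 5439,
#                  dbname = "dev", user = Sys.getenv("REDSHIFT_USER"),
#                  password = Sys.getenv("REDSHIFT_PASSWORD"))
# trips <- tbl(con, "trips")

# Offline, a simulated backend shows the SQL dbplyr would send:
trips <- lazy_frame(
  start_station_name = character(), rideable_type = character(),
  con = simulate_postgres()
)

query <- trips |>
  filter(rideable_type == "electric_bike") |>
  count(start_station_name, sort = TRUE)

show_query(query)  # prints the translated SELECT ... GROUP BY ... ORDER BY
```

With a real connection, `collect(query)` pulls the results into R only at the end, so the heavy lifting stays on Redshift.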

3. Visualization and Rapid Prototyping (Core Need)

  • plotly - Interactive visualizations for exploration
  • scales - Scale functions for ggplot2 (revenue formatting, percentages)
  • viridis - Color scales that are colorblind-friendly
  • patchwork - Combining multiple ggplot2 plots
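
A few scales helpers, as a sketch of the revenue-formatting use case:

```r
library(scales)

# Consistent label formatting for revenue figures and rates
dollar(0.19, accuracy = 0.01)   # "$0.19"  (per-minute rate)
percent(0.235, accuracy = 0.1)  # "23.5%"
comma(1234567)                  # "1,234,567"
```

The same functions plug into ggplot2 axes via `scale_y_continuous(labels = dollar)`.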

4. Presentation Tools (Your Preference)

  • quarto - Modern publishing system with reveal.js integration
  • knitr - Code chunk processing (required by quarto)
  • rmarkdown - Markdown processing (quarto dependency)

5. Project Management (Professional Standards)

  • here - Robust file path management
  • renv - Package dependency management (already chosen)

Packages NOT Recommended for Your Use Case

Skip These (Not Needed):

  • arrow - You're using Redshift, not Parquet files directly in R
  • data.table - tidyverse approach is sufficient for your analysis scale
  • testthat - dbt handles data testing; R code will be exploratory
  • SimDesign - Monte Carlo simulations not relevant to bike share analysis
  • shiny - Using Tableau Public for final dashboards
  • tidymodels/caret/mlr3 - No machine learning requirements mentioned
  • packrat - Superseded by renv

Installation Strategy

Phase 1: Core Setup

# Essential packages for immediate work
core_packages <- c("tidyverse", "tidyr", "lubridate", "janitor", "skimr", "here")
renv::install(core_packages)
# OR
install.packages(core_packages)

Phase 2: Database Integration

# Database connectivity for Redshift
db_packages <- c("DBI", "RPostgres", "dbplyr")
renv::install(db_packages)
# OR
install.packages(db_packages)

Phase 3: Visualization Enhancement

# Advanced visualization capabilities
viz_packages <- c("plotly", "scales", "viridis", "patchwork")
renv::install(viz_packages)
# OR
install.packages(viz_packages)

Phase 4: Presentation Tools

# Modern presentation system
presentation_packages <- c("quarto", "knitr", "rmarkdown")
renv::install(presentation_packages)
# OR
install.packages(presentation_packages)

Project-Specific Considerations

For Divvy Data Analysis:

  • lubridate - Essential for trip start/end time analysis
  • scales - Format revenue calculations (dollar signs, percentages)
  • dbplyr - Write dplyr code that translates to SQL for Redshift
  • plotly - Interactive exploration of station utilization patterns

For Business Presentations:

  • quarto + reveal.js - Professional slide presentations
  • viridis - Accessible color palettes for executive presentations
  • patchwork - Combine multiple revenue/usage charts
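
A minimal Quarto front-matter sketch for a reveal.js deck (the title, author, and theme are placeholders):

```yaml
---
title: "Divvy Ridership & Revenue Review"
author: "Analytics Team"
format:
  revealjs:
    theme: simple
    slide-number: true
    incremental: true
execute:
  echo: false    # show results, hide R code on business slides
---
```

Render with `quarto render slides.qmd` to produce the HTML slide deck.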

renv Workflow

# Initialize project
renv::init()

# Install packages as needed
renv::install("package_name")
# OR
install.packages("package_name")

# Snapshot current state
renv::snapshot()

# Share project (others can restore with)
renv::restore()

Total Package Count: 16 Essential Packages

This curated list focuses on your specific needs while avoiding bloat. The selection prioritizes:

  1. Redshift connectivity for ELT pipeline integration
  2. Rapid visualization prototyping with ggplot2 ecosystem
  3. Professional presentation capabilities with Quarto
  4. Business-focused analysis tools for revenue and operational metrics

This streamlined approach ensures fast installation, minimal dependency conflicts, and focused functionality for your Divvy bike data engineering project.
