@jonshaffer
Last active December 3, 2025 00:37

Pull Request Review Time Analysis

  • Dataset: 76,227 merged PRs from 25 open-source repositories
  • Analysis Date: December 2025
  • Target Variable: review_time_hours (time from PR creation to merge)
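
As a minimal sketch of how the target could be derived, assume each PR record carries the ISO 8601 `created_at` and `merged_at` timestamps that the GitHub REST API returns for pull requests. The helper names below are illustrative and are not taken from the analysis code.

```python
from datetime import datetime


def _parse_github_timestamp(value: str) -> datetime:
    """Parse a GitHub-style UTC timestamp such as '2025-01-01T00:00:00Z'."""
    # Older Python versions reject a trailing 'Z' in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def review_time_hours(created_at: str, merged_at: str) -> float:
    """Hours elapsed between PR creation and merge (the target variable)."""
    delta = _parse_github_timestamp(merged_at) - _parse_github_timestamp(created_at)
    return delta.total_seconds() / 3600.0
```

A PR opened at midnight and merged at noon the next day would get a target value of 36 hours under this definition.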


Executive Summary

This analysis investigates which factors predict how long a pull request takes to be reviewed and merged. Using XGBoost models trained on 70 features (after excluding 30 features that leak the target), we found:

  1. Author experience is the strongest predictor - Prior PR count and historical merge time account for ~21% of feature importance
  2. NLP features provide real predictive value - Sentiment and readability together contribute ~26% of importance
  3. Feature importance varies dramatically by repository - NLP readability alone ranges from 11.4% (scikit-learn) to 39.5% (flutter)

Overall Model Performance

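| Model | Task | R² | MAE | Accuracy | F1 |
| --- | --- | --- | --- | --- | --- |
| XGBoost | Regression | 0.377 | 101.3h | - | - |
| XGBoost | Classification | - | - | 69.1% | 0.676 |
| Random Forest | Regression | 0.361 | 103.0h | - | - |
| Random Forest | Classification | - | - | 68.5% | 0.660 |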

Classification Target: fast (<24h), medium (24h-168h), slow (>168h)
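
The three classes can be sketched as a simple binning function. The boundary handling at exactly 24h and 168h follows the inequalities as written above (both fall into "medium"); the function and constant names are hypothetical.

```python
FAST_LIMIT_HOURS = 24.0    # "fast" is strictly below one day
SLOW_LIMIT_HOURS = 168.0   # "slow" is strictly above seven days


def review_speed_class(hours: float) -> str:
    """Map review_time_hours onto the fast / medium / slow target classes."""
    if hours < FAST_LIMIT_HOURS:
        return "fast"
    if hours <= SLOW_LIMIT_HOURS:
        return "medium"
    return "slow"
```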


Research Hypothesis Results

H1: Do NLP sentiment features provide predictive value?

Result: SUPPORTED

| Metric | Value |
| --- | --- |
| NLP Sentiment Importance | 12.74% |
| Code Metrics Importance | 17.57% |
| Ratio (NLP/Code) | 72.5% |

NLP sentiment features contribute meaningful predictive power, achieving 72.5% of the importance of traditional code metrics.

H2: Do readability features correlate with review speed?

Result: SUPPORTED

| Metric | Value |
| --- | --- |
| NLP Readability Importance | 12.89% |

Readability metrics (Flesch, Gunning Fog, word count) are about as important as sentiment features (12.89% vs. 12.74%).


Feature Category Rankings (Overall)

| Rank | Category | Importance | Key Features |
| --- | --- | --- | --- |
| 1 | Author Metrics | 21.02% | prior_pr_count, avg_time_to_merge |
| 2 | Code Metrics | 17.57% | commits_count, lines_added |
| 3 | PR Structure | 12.92% | description_sections, has_tests |
| 4 | NLP Readability | 12.89% | flesch_reading_ease, word_count |
| 5 | NLP Sentiment | 12.74% | body_vader_neg, title_polarity |
| 6 | Temporal | 3.67% | day_of_week, hour_utc |
| 7 | Review Events | 2.31% | reviewers_requested_count |

Top 15 Individual Features

| Rank | Feature | Importance | Category |
| --- | --- | --- | --- |
| 1 | author_prior_pr_count | 13.5% | Author |
| 2 | author_avg_time_to_merge_days | 7.5% | Author |
| 3 | commits_count | 7.2% | Code |
| 4 | lines_added | 5.4% | Code |
| 5 | description_sections_count | 2.5% | Structure |
| 6 | reviewers_requested_count | 2.3% | Review |
| 7 | test_to_code_ratio | 2.2% | Quality |
| 8 | title_has_docs_keywords | 2.2% | Classification |
| 9 | body_vader_neg | 2.1% | NLP Sentiment |
| 10 | has_tests_added | 2.0% | Structure |
| 11 | body_sentence_count | 2.0% | NLP Readability |
| 12 | ci_passed | 2.0% | Quality |
| 13 | touches_core_code | 2.0% | Intent |
| 14 | body_word_count | 1.7% | NLP Readability |
| 15 | has_only_docs_files | 1.7% | Classification |
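
Category totals are sums of per-feature importances. The sketch below rolls up the fifteen features listed above; it is an illustration of the aggregation, not the report's own code. Because it sees only the top 15 of the 70 features, most totals come out below the category rankings. The Author total (21.0%) matches the 21.02% reported for Author Metrics up to rounding.

```python
from collections import defaultdict

# (feature, importance in percent, category), copied from the top-15 list.
TOP_FEATURES = [
    ("author_prior_pr_count", 13.5, "Author"),
    ("author_avg_time_to_merge_days", 7.5, "Author"),
    ("commits_count", 7.2, "Code"),
    ("lines_added", 5.4, "Code"),
    ("description_sections_count", 2.5, "Structure"),
    ("reviewers_requested_count", 2.3, "Review"),
    ("test_to_code_ratio", 2.2, "Quality"),
    ("title_has_docs_keywords", 2.2, "Classification"),
    ("body_vader_neg", 2.1, "NLP Sentiment"),
    ("has_tests_added", 2.0, "Structure"),
    ("body_sentence_count", 2.0, "NLP Readability"),
    ("ci_passed", 2.0, "Quality"),
    ("touches_core_code", 2.0, "Intent"),
    ("body_word_count", 1.7, "NLP Readability"),
    ("has_only_docs_files", 1.7, "Classification"),
]


def importance_by_category(features):
    """Sum per-feature importances into one total per category."""
    totals = defaultdict(float)
    for _name, importance, category in features:
        totals[category] += importance
    # Round to one decimal to hide floating-point noise in the sums.
    return {category: round(total, 1) for category, total in totals.items()}


category_totals = importance_by_category(TOP_FEATURES)
```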

Per-Repository Analysis

Repository Overview

| Repository | PRs | Median Review | R² | Top Feature |
| --- | --- | --- | --- | --- |
| microsoft/vscode | 8,703 | 0.7h | -10.72 | author_prior_pr_count |
| home-assistant/core | 8,447 | 5.9h | -0.10 | commits_count |
| elastic/elasticsearch | 8,286 | 8.6h | 0.23 | commits_count |
| rust-lang/rust | 6,845 | 47.9h | -0.47 | body_sentence_count |
| flutter/flutter | 5,310 | 18.0h | 0.44 | description_sections_count |
| vercel/next.js | 5,302 | 17.5h | 0.05 | commits_count |
| grafana/grafana | 4,630 | 20.9h | 0.18 | commits_count |
| n8n-io/n8n | 4,286 | 21.7h | 0.17 | author_prior_pr_count |
| huggingface/transformers | 3,939 | 48.4h | -0.43 | commits_count |
| kubernetes/kubernetes | 3,182 | 128.6h | -0.27 | author_prior_pr_count |
| langchain-ai/langchain | 3,124 | 1.9h | -0.45 | author_prior_pr_count |
| neovim/neovim | 2,656 | 5.2h | 0.04 | has_only_test_files |
| freeCodeCamp/freeCodeCamp | 2,403 | 17.9h | 0.27 | commits_count |
| facebook/react | 1,600 | 13.4h | -0.46 | author_prior_pr_count |
| numpy/numpy | 1,422 | 8.4h | -0.08 | commits_count |
| pandas-dev/pandas | 1,288 | 23.8h | 0.00 | commits_count |
| scikit-learn/scikit-learn | 1,124 | 48.1h | -0.22 | commits_count |
| microsoft/PowerToys | 1,107 | 50.2h | 0.10 | has_issue_reference |
| ansible/ansible | 983 | 19.9h | 0.02 | commits_count |
| django/django | 843 | 73.4h | -0.12 | has_checklist |
| fastapi/fastapi | 515 | 141.3h | 0.02 | description_length |

Note: A negative R² means the model predicts worse than always guessing the mean review time. For these repositories, the features used here explain review time poorly.
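
To make the note concrete, here is a small sketch of the standard R² definition, 1 - SS_res / SS_tot. It assumes the true values are not all identical (otherwise SS_tot is zero). Predicting the mean everywhere yields exactly 0, and predictions further from the truth than the mean push the score below 0.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of a baseline that always predicts the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


hours = [1.0, 2.0, 3.0]                          # toy review times
baseline_score = r_squared(hours, [2.0, 2.0, 2.0])   # always predict the mean
worse_score = r_squared(hours, [3.0, 3.0, 3.0])      # systematically too high
```

With heavy-tailed targets such as review times, a few very slow PRs can dominate both sums, which is one way a score as low as -10.72 can arise.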


Category Importance by Repository

| Repository | Author | Code | NLP Sentiment | NLP Readability | PR Structure |
| --- | --- | --- | --- | --- | --- |
| flutter | 10.2% | 9.1% | 12.2% | 39.5% 🏆 | 14.1% |
| freeCodeCamp | 3.7% | 30.3% 🏆 | 20.5% | 15.6% | 9.2% |
| react | 8.7% | 12.4% | 24.5% 🏆 | 17.5% | 7.3% |
| django | 10.8% | 12.1% | 18.5% | 16.8% | 24.4% 🏆 |
| vscode | 20.8% 🏆 | 10.0% | 21.6% | 15.3% | 10.9% |
| home-assistant | 11.6% | 28.2% | 18.3% | 12.1% | 10.9% |
| elasticsearch | 5.9% | 25.6% | 21.0% | 13.3% | 14.9% |
| rust | 11.9% | 15.0% | 14.8% | 23.0% | 12.0% |
| next.js | 6.9% | 12.9% | 17.0% | 15.4% | 17.9% |
| grafana | 9.3% | 15.4% | 19.3% | 12.0% | 17.2% |
| n8n | 15.2% | 14.7% | 17.2% | 15.1% | 9.6% |
| transformers | 10.2% | 21.8% | 16.6% | 13.1% | 11.9% |
| kubernetes | 10.1% | 10.6% | 19.8% | 13.7% | 15.4% |
| langchain | 12.5% | 14.6% | 16.6% | 20.1% | 8.6% |
| neovim | 9.7% | 13.0% | 22.7% | 15.1% | 14.2% |
| numpy | 9.0% | 19.6% | 15.8% | 18.5% | 12.8% |
| pandas | 5.0% | 19.5% | 19.4% | 14.8% | 10.4% |
| scikit-learn | 6.4% | 22.4% | 15.8% | 11.4% | 17.5% |
| PowerToys | 7.2% | 12.3% | 17.1% | 13.7% | 22.2% |
| ansible | 10.3% | 22.0% | 20.5% | 18.2% | 8.4% |
| fastapi | 18.8% | 11.7% | 18.3% | 21.8% | 8.9% |

Fastest Day of Week by Repository

| Repository | Fastest Day | Fastest (hrs) | Slowest Day | Slowest (hrs) | Penalty |
| --- | --- | --- | --- | --- | --- |
| microsoft/vscode | Wed | 0.5 | Sat | 3.5 | 7x |
| elastic/elasticsearch | Fri | 5.7 | Sat | 54.0 | 9.5x |
| flutter/flutter | Mon | 7.9 | Sat | 52.9 | 6.7x |
| vercel/next.js | Tue | 9.7 | Sat | 57.2 | 5.9x |
| kubernetes/kubernetes | Tue | 101.1 | Sun | 193.4 | 1.9x |
| langchain-ai/langchain | Thu | 0.7 | Sat | 38.4 | 55x |
| ansible/ansible | Mon | 1.4 | Sun | 277.1 | 198x |
| fastapi/fastapi | Sat | 28.7 | Mon | 288.7 | 10x |

fastapi is the exception among the repositories shown - Saturday is its fastest day!
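
The Penalty column is the slowest day's figure divided by the fastest day's. A minimal sketch, assuming a mapping from weekday to that repository's typical review hours (the report does not state which per-day statistic was used, and the function name is hypothetical):

```python
def day_of_week_summary(hours_by_day):
    """Return (fastest_day, slowest_day, penalty) for one repository.

    hours_by_day maps a weekday label to its typical review time in hours.
    """
    fastest_day = min(hours_by_day, key=hours_by_day.get)
    slowest_day = max(hours_by_day, key=hours_by_day.get)
    penalty = hours_by_day[slowest_day] / hours_by_day[fastest_day]
    return fastest_day, slowest_day, penalty


# Only the two days reported in the table are known for each repository.
vscode = day_of_week_summary({"Wed": 0.5, "Sat": 3.5})
ansible = day_of_week_summary({"Mon": 1.4, "Sun": 277.1})
fastapi = day_of_week_summary({"Sat": 28.7, "Mon": 288.7})
```

Very large ratios such as ansible's come from a small denominator (1.4h), so the penalty is most informative when read alongside the absolute hours.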


Key Insights by Project Type

UI/Frontend Projects (flutter, react, vscode)

  • NLP features matter more (37-52% for sentiment and readability combined)
  • Description quality and readability are critical
  • Visual/UX projects need clear communication

Data Science Projects (pandas, numpy, scikit-learn)

  • Code metrics dominate (19-22%)
  • Commit count and lines changed are key
  • Technical correctness over communication

Infrastructure Projects (kubernetes, elasticsearch)

  • Slower overall (median 8-128h)
  • More even distribution across categories
  • Process and structure matter more

Fast-Moving Projects (langchain, vscode)

  • Author experience critical (12-21%)
  • Median review times under 2h
  • Trust in experienced contributors

Recommendations for Faster PR Reviews

Based on feature importance analysis:

For All Projects

  1. Build reputation - Author prior PR count is #1 predictor
  2. Keep commits focused - Fewer commits = faster review
  3. Limit scope - Smaller PRs (fewer lines) review faster
  4. Pass CI first - ci_passed matters across projects

For Documentation-Heavy Projects

  1. Structure your description - Use headers/sections
  2. Mind readability - Simpler language reviews faster
  3. Avoid negative tone - body_vader_neg correlates with delays

For Code-Heavy Projects

  1. Include tests - has_tests_added speeds reviews
  2. Link issues - has_issue_reference helps (especially PowerToys, scikit-learn)
  3. Use checklists - Critical for Django-style projects

Timing Recommendations

  • Avoid weekends - Saturday is slowest for 20/24 repos
  • Early to mid-week is best - Wednesday and Monday are the most common fastest days
  • Project-specific: Check your repo's pattern (fastapi is the opposite!)

Methodology Notes

Data Collection

  • Source: GitHub API v3
  • Time period: Last 13 months
  • Filter: Merged PRs only (no abandoned/closed)
  • Bot exclusion: dependabot, renovate, etc. removed
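
The two filters above could look roughly like this. It is a sketch, not the analysis code: it assumes PR records shaped like GitHub REST API pull request objects (`merged_at` is null for unmerged PRs, `user.type` is "Bot" for app accounts, and app logins end in "[bot]"). The explicit login set is illustrative, since the report's full bot list is not given.

```python
# Assumption: illustrative names only; the report says "dependabot, renovate, etc."
KNOWN_BOT_LOGINS = {"dependabot", "renovate"}


def is_bot_author(login: str, account_type: str = "") -> bool:
    """Heuristically decide whether a PR author is an automation account."""
    normalized = login.lower()
    if account_type == "Bot" or normalized.endswith("[bot]"):
        return True
    return normalized in KNOWN_BOT_LOGINS


def keep_pull_request(pr: dict) -> bool:
    """Keep only merged PRs that were opened by a human author."""
    if pr.get("merged_at") is None:
        return False  # closed without merge, or still open
    user = pr.get("user") or {}
    return not is_bot_author(user.get("login", ""), user.get("type", ""))
```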

Feature Engineering

  • 100 total features extracted
  • 70 used for modeling (30 excluded for data leakage)
  • NLP: VADER + TextBlob sentiment, textstat readability
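
For orientation, the Flesch reading ease score behind a feature like flesch_reading_ease follows the standard formula 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The sketch below takes the three counts as inputs. textstat does its own tokenizing and syllable estimation, so its values for real PR text may differ from a hand count.

```python
def flesch_reading_ease(word_count: int, sentence_count: int, syllable_count: int) -> float:
    """Standard Flesch reading ease; higher scores mean easier text."""
    words_per_sentence = word_count / sentence_count
    syllables_per_word = syllable_count / word_count
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical PR description: 100 words, 5 sentences, 150 syllables.
example_score = flesch_reading_ease(100, 5, 150)
```

Shorter sentences and shorter words both raise the score.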

Data Leakage Exclusions

Features occurring during/after review were excluded:

  • pr_age_at_merge_days (target in different units)
  • time_to_first_review_hours
  • review_cycles_count, approvals_count
  • review_* sentiment features
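
One way to apply these exclusions is a name-based filter, sketched below under the assumption that leaky columns can be identified by exact name or by the `review_` prefix. The constant and function names are hypothetical. Note that `reviewers_requested_count` does not match the prefix and stays in, while the target `review_time_hours` does match and therefore has to be kept separately as the label.

```python
LEAKY_EXACT = {
    "pr_age_at_merge_days",        # the target expressed in days
    "time_to_first_review_hours",
    "review_cycles_count",
    "approvals_count",
}
LEAKY_PREFIXES = ("review_",)      # covers the review_* sentiment features


def is_leaky(column: str) -> bool:
    """True if the column is only known during or after review."""
    return column in LEAKY_EXACT or column.startswith(LEAKY_PREFIXES)


def modeling_columns(columns):
    """Drop leaky columns, preserving the original column order."""
    return [column for column in columns if not is_leaky(column)]
```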

Limitations

  1. Proxy validity: Merge time ≠ quality
  2. Domain mismatch: VADER trained on social media
  3. Confounding: Reviewer availability not captured
  4. Selection bias: Only merged PRs analyzed

Files Generated

| File | Description |
| --- | --- |
| data/processed/pr_features.csv | Full dataset (76,227 rows, 100 columns) |
| data/processed/modeling_report.json | Complete analysis results |
| data/processed/kaggle/ | Kaggle-ready dataset package |
| ANALYSIS_REPORT.md | This report |