Pull Request Review Time Analysis

Dataset: 76,227 merged PRs from 25 open-source repositories
Analysis Date: December 2025
Target Variable: review_time_hours (time from PR creation to merge)

Executive Summary

This analysis investigates which factors predict how long a pull request takes to be reviewed and merged. Using XGBoost models trained on 70 features (after excluding 30 features that leak the outcome), we found:

Author experience is the strongest predictor - Prior PR count and historical merge time account for ~21% of feature importance
NLP features provide real predictive value - Sentiment and readability together contribute ~26% of importance
Feature importance varies widely by repository - NLP readability alone ranges from 11.4% (scikit-learn) to 39.5% (flutter)

Overall Model Performance

Model	Task	R²	MAE	Accuracy	F1
XGBoost	Regression	0.377	101.3h	-	-
XGBoost	Classification	-	-	69.1%	0.676
Random Forest	Regression	0.361	103.0h	-	-
Random Forest	Classification	-	-	68.5%	0.660

Classification Target: fast (<24h), medium (24h-168h), slow (>168h)
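
The three classes above can be derived from review_time_hours with a simple threshold function. This is a minimal sketch: the function name is hypothetical, and placing exactly 24h and exactly 168h in "medium" follows the ranges as written above, not a confirmed detail of the original pipeline.

```python
def review_speed_class(review_time_hours: float) -> str:
    """Map a review time in hours to fast / medium / slow.

    Thresholds follow the report: fast (<24h), medium (24h-168h), slow (>168h).
    """
    if review_time_hours < 24:
        return "fast"
    if review_time_hours <= 168:
        return "medium"
    return "slow"


# Example: label a few review times (in hours)
labels = [review_speed_class(h) for h in (0.7, 48.0, 200.0)]
```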

Research Hypothesis Results

H1: Do NLP sentiment features provide predictive value?

Result: SUPPORTED ✅

Metric	Value
NLP Sentiment Importance	12.74%
Code Metrics Importance	17.57%
Ratio (NLP/Code)	72.5%

NLP sentiment features contribute meaningful predictive power, achieving 72.5% of the importance of traditional code metrics.

H2: Do readability features correlate with review speed?

Result: SUPPORTED ✅

Metric	Value
NLP Readability Importance	12.89%

Readability metrics (Flesch, Gunning Fog, word count) are about as important as sentiment features (12.89% vs. 12.74%).
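
To make the Flesch metric concrete, the sketch below computes the standard Flesch Reading Ease formula with a crude vowel-group syllable counter. It is an approximation for illustration only: the report used textstat, whose syllable counting differs, so scores will not match exactly. The function names are assumptions.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (at least 1 per word)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text.

    score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


simple = flesch_reading_ease("Fix the bug. Add a test.")
dense = flesch_reading_ease(
    "This modification comprehensively restructures asynchronous initialization procedures."
)
```

Short sentences with short words score higher, so `simple` comes out well above `dense`.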

Feature Category Rankings (Overall)

Rank	Category	Importance	Key Features
1	Author Metrics	21.02%	prior_pr_count, avg_time_to_merge
2	Code Metrics	17.57%	commits_count, lines_added
3	PR Structure	12.92%	description_sections, has_tests
4	NLP Readability	12.89%	flesch_reading_ease, word_count
5	NLP Sentiment	12.74%	body_vader_neg, title_polarity
6	Temporal	3.67%	day_of_week, hour_utc
7	Review Events	2.31%	reviewers_requested_count

Top 15 Individual Features

Rank	Feature	Importance	Category
1	`author_prior_pr_count`	13.5%	Author
2	`author_avg_time_to_merge_days`	7.5%	Author
3	`commits_count`	7.2%	Code
4	`lines_added`	5.4%	Code
5	`description_sections_count`	2.5%	Structure
6	`reviewers_requested_count`	2.3%	Review
7	`test_to_code_ratio`	2.2%	Quality
8	`title_has_docs_keywords`	2.2%	Classification
9	`body_vader_neg`	2.1%	NLP Sentiment
10	`has_tests_added`	2.0%	Structure
11	`body_sentence_count`	2.0%	NLP Readability
12	`ci_passed`	2.0%	Quality
13	`touches_core_code`	2.0%	Intent
14	`body_word_count`	1.7%	NLP Readability
15	`has_only_docs_files`	1.7%	Classification
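
Category-level importance is presumably the sum of the per-feature importances within each category; the report does not state the aggregation, so treat this as an assumption. A minimal sketch using the top four features from the table above (the function name and the "Other" fallback are hypothetical):

```python
from collections import defaultdict


def category_importance(feature_importance, feature_category):
    """Sum per-feature importance (in %) into per-category totals."""
    totals = defaultdict(float)
    for feature, importance in feature_importance.items():
        # Features without a known category fall into "Other" (assumption)
        totals[feature_category.get(feature, "Other")] += importance
    return dict(totals)


# Values copied from the Top 15 table
importance = {
    "author_prior_pr_count": 13.5,
    "author_avg_time_to_merge_days": 7.5,
    "commits_count": 7.2,
    "lines_added": 5.4,
}
category = {
    "author_prior_pr_count": "Author",
    "author_avg_time_to_merge_days": "Author",
    "commits_count": "Code",
    "lines_added": "Code",
}

totals = category_importance(importance, category)
```

Under this assumption the two author features sum to about 21.0%, in line with the 21.02% reported for Author Metrics, while the two code features account for about 12.6% of the 17.57% reported for Code Metrics.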

Per-Repository Analysis

Repository Overview

Repository	PRs	Median Review	R²	Top Feature
microsoft/vscode	8,703	0.7h	-10.72	author_prior_pr_count
home-assistant/core	8,447	5.9h	-0.10	commits_count
elastic/elasticsearch	8,286	8.6h	0.23	commits_count
rust-lang/rust	6,845	47.9h	-0.47	body_sentence_count
flutter/flutter	5,310	18.0h	0.44	description_sections_count
vercel/next.js	5,302	17.5h	0.05	commits_count
grafana/grafana	4,630	20.9h	0.18	commits_count
n8n-io/n8n	4,286	21.7h	0.17	author_prior_pr_count
huggingface/transformers	3,939	48.4h	-0.43	commits_count
kubernetes/kubernetes	3,182	128.6h	-0.27	author_prior_pr_count
langchain-ai/langchain	3,124	1.9h	-0.45	author_prior_pr_count
neovim/neovim	2,656	5.2h	0.04	has_only_test_files
freeCodeCamp/freeCodeCamp	2,403	17.9h	0.27	commits_count
facebook/react	1,600	13.4h	-0.46	author_prior_pr_count
numpy/numpy	1,422	8.4h	-0.08	commits_count
pandas-dev/pandas	1,288	23.8h	0.00	commits_count
scikit-learn/scikit-learn	1,124	48.1h	-0.22	commits_count
microsoft/PowerToys	1,107	50.2h	0.10	has_issue_reference
ansible/ansible	983	19.9h	0.02	commits_count
django/django	843	73.4h	-0.12	has_checklist
fastapi/fastapi	515	141.3h	0.02	description_length

Note: A negative R² means the model predicts worse than always guessing the mean review time, which suggests these repositories follow patterns the model does not capture.
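
For reference, R² compares the model's squared error against that of a constant mean predictor, which is why it can go below zero. A small self-contained sketch (the function name is an assumption; libraries such as scikit-learn provide an equivalent metric):

```python
def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, where SS_tot uses the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0]
mean_baseline = r_squared(actual, [2.0, 2.0, 2.0])  # 0.0: same as guessing the mean
worse_model = r_squared(actual, [3.0, 2.0, 1.0])    # -3.0: worse than the mean
```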

Category Importance by Repository

Repository	Author	Code	NLP Sentiment	NLP Readability	PR Structure
flutter	10.2%	9.1%	12.2%	39.5% 🏆	14.1%
freeCodeCamp	3.7%	30.3% 🏆	20.5%	15.6%	9.2%
react	8.7%	12.4%	24.5% 🏆	17.5%	7.3%
django	10.8%	12.1%	18.5%	16.8%	24.4% 🏆
vscode	20.8% 🏆	10.0%	21.6%	15.3%	10.9%
home-assistant	11.6%	28.2%	18.3%	12.1%	10.9%
elasticsearch	5.9%	25.6%	21.0%	13.3%	14.9%
rust	11.9%	15.0%	14.8%	23.0%	12.0%
next.js	6.9%	12.9%	17.0%	15.4%	17.9%
grafana	9.3%	15.4%	19.3%	12.0%	17.2%
n8n	15.2%	14.7%	17.2%	15.1%	9.6%
transformers	10.2%	21.8%	16.6%	13.1%	11.9%
kubernetes	10.1%	10.6%	19.8%	13.7%	15.4%
langchain	12.5%	14.6%	16.6%	20.1%	8.6%
neovim	9.7%	13.0%	22.7%	15.1%	14.2%
numpy	9.0%	19.6%	15.8%	18.5%	12.8%
pandas	5.0%	19.5%	19.4%	14.8%	10.4%
scikit-learn	6.4%	22.4%	15.8%	11.4%	17.5%
PowerToys	7.2%	12.3%	17.1%	13.7%	22.2%
ansible	10.3%	22.0%	20.5%	18.2%	8.4%
fastapi	18.8%	11.7%	18.3%	21.8%	8.9%

Fastest Day of Week by Repository

Repository	Fastest Day	Fastest (hrs)	Slowest Day	Slowest (hrs)	Penalty (slowest ÷ fastest)
microsoft/vscode	Wed	0.5	Sat	3.5	7x
elastic/elasticsearch	Fri	5.7	Sat	54.0	9.5x
flutter/flutter	Mon	7.9	Sat	52.9	6.7x
vercel/next.js	Tue	9.7	Sat	57.2	5.9x
kubernetes/kubernetes	Tue	101.1	Sun	193.4	1.9x
langchain-ai/langchain	Thu	0.7	Sat	38.4	55x
ansible/ansible	Mon	1.4	Sun	277.1	198x
fastapi/fastapi	Sat ⭐	28.7	Mon	288.7	10x

fastapi is the exception among these repositories: Saturday is its fastest day.
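
The Penalty column is consistent with dividing the slowest day's review time by the fastest day's (for example, 3.5 / 0.5 = 7x for vscode). The sketch below reproduces that calculation, assuming the per-day figure is a median and that each PR is assigned to a single weekday; the report does not confirm either detail, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def weekday_penalty(records):
    """Return (fastest_day, slowest_day, penalty) from (day, hours) pairs.

    penalty = median hours on the slowest day / median hours on the fastest day.
    """
    by_day = defaultdict(list)
    for day, hours in records:
        by_day[day].append(hours)
    medians = {day: median(values) for day, values in by_day.items()}
    fastest = min(medians, key=medians.get)
    slowest = max(medians, key=medians.get)
    return fastest, slowest, medians[slowest] / medians[fastest]


# Toy data shaped like the vscode row above
records = [("Wed", 0.4), ("Wed", 0.5), ("Wed", 0.6), ("Sat", 3.5), ("Mon", 1.0)]
fastest, slowest, penalty = weekday_penalty(records)
```

Very small denominators inflate the ratio, which is worth keeping in mind when reading rows such as ansible (1.4h fastest, 198x penalty).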

Key Insights by Project Type

UI/Frontend Projects (flutter, react, vscode)

NLP features matter more (37-52% for sentiment and readability combined)
Description quality and readability are critical
Visual/UX projects need clear communication

Data Science Projects (pandas, numpy, scikit-learn)

Code metrics dominate (19-22%)
Commit count and lines changed are key
Technical correctness over communication

Infrastructure Projects (kubernetes, elasticsearch)

Slower overall (median review time 8.6h to 128.6h)
More even distribution across categories
Process and structure matter more

Fast-Moving Projects (langchain, vscode)

Author experience critical (12-21%)
Median review times under 2h
Trust in experienced contributors

Recommendations for Faster PR Reviews

Based on feature importance analysis:

For All Projects

Build reputation - Author prior PR count is the #1 predictor
Keep commits focused - Fewer commits = faster review
Limit scope - Smaller PRs (fewer lines) review faster
Pass CI first - ci_passed matters across projects

For Documentation-Heavy Projects

Structure your description - Use headers/sections
Mind readability - Simpler language reviews faster
Avoid negative tone - body_vader_neg correlates with delays

For Code-Heavy Projects

Include tests - has_tests_added speeds reviews
Link issues - has_issue_reference helps (especially PowerToys, scikit-learn)
Use checklists - Critical for Django-style projects

Timing Recommendations

Avoid weekends - Saturday is slowest for 20/24 repos
Early to mid-week is best - Wednesday and Monday are the most common fastest days
Check your repo's pattern - It can differ from the norm (fastapi is fastest on Saturday)

Methodology Notes

Data Collection

Source: GitHub API v3
Time period: Last 13 months
Filter: Merged PRs only (no abandoned/closed)
Bot exclusion: dependabot, renovate, etc. removed
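
The filters above and the target variable can be sketched as follows. The fields `created_at`, `merged_at`, and `user.login` are those of a GitHub REST API pull request object; the bot list, the `[bot]` suffix check, and the function names are illustrative assumptions, not the report's actual exclusion rules.

```python
from datetime import datetime

# Illustrative only; the report says "dependabot, renovate, etc."
BOT_LOGINS = {"dependabot", "renovate"}


def is_bot(login: str) -> bool:
    """Treat GitHub App accounts (login ends in '[bot]') and known names as bots."""
    name = login.lower()
    return name.endswith("[bot]") or name in BOT_LOGINS


def parse_github_time(timestamp: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2025-01-01T00:00:00Z'."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def review_time_hours(pr: dict):
    """Hours from PR creation to merge, or None if the PR is filtered out."""
    if pr.get("merged_at") is None or is_bot(pr["user"]["login"]):
        return None
    delta = parse_github_time(pr["merged_at"]) - parse_github_time(pr["created_at"])
    return delta.total_seconds() / 3600.0


example_pr = {
    "created_at": "2025-01-01T00:00:00Z",
    "merged_at": "2025-01-02T12:00:00Z",
    "user": {"login": "octocat"},
}
hours = review_time_hours(example_pr)  # 36.0
```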

Feature Engineering

100 total features extracted
70 used for modeling (30 excluded for data leakage)
NLP: VADER + TextBlob sentiment, textstat readability

Data Leakage Exclusions

Features occurring during/after review were excluded:

pr_age_at_merge_days (target in different units)
time_to_first_review_hours
review_cycles_count, approvals_count
review_* sentiment features
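
One way to apply these exclusions is to filter column names before training. The sketch below covers only the features named above (the full list of 30 is not given here); the constant names, the function name, and the example column `review_body_vader_neg` are hypothetical.

```python
TARGET = "review_time_hours"

# Only the exclusions named in the report; the full list has 30 features
LEAKY_COLUMNS = {
    "pr_age_at_merge_days",
    "time_to_first_review_hours",
    "review_cycles_count",
    "approvals_count",
}
LEAKY_PREFIXES = ("review_",)  # covers the review_* sentiment features


def modeling_features(columns):
    """Keep columns that are neither the target nor known leakage."""
    return [
        c
        for c in columns
        if c != TARGET
        and c not in LEAKY_COLUMNS
        and not c.startswith(LEAKY_PREFIXES)
    ]


columns = [
    "commits_count",
    "review_body_vader_neg",      # hypothetical review_* sentiment column
    "reviewers_requested_count",  # known at PR creation, so it stays
    "approvals_count",
    "review_time_hours",
]
kept = modeling_features(columns)
```

Note that the `review_` prefix does not match `reviewers_requested_count`, which the report keeps as a modeling feature.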

Limitations

Proxy validity: Merge time ≠ quality
Domain mismatch: VADER's lexicon is tuned for social media text, not technical writing
Confounding: Reviewer availability not captured
Selection bias: Only merged PRs analyzed

Files Generated

File	Description
`data/processed/pr_features.csv`	Full dataset (76,227 rows, 100 columns)
`data/processed/modeling_report.json`	Complete analysis results
`data/processed/kaggle/`	Kaggle-ready dataset package
`ANALYSIS_REPORT.md`	This report

jonshaffer/PRTimeAnalysis.md
