Dataset: 76,227 merged PRs from 25 open-source repositories
Analysis Date: December 2025
Target Variable: review_time_hours (time from PR creation to merge)
This analysis investigates which factors predict how long a pull request takes to be reviewed and merged. Using XGBoost models trained on 70 features (after excluding features that leak the target), we found:
- Author experience is the strongest predictor - Prior PR count and historical merge time account for ~21% of feature importance
- NLP features provide real predictive value - Sentiment and readability together contribute ~26% of importance
- Feature importance varies dramatically by repository - In some per-repository models a single NLP category reaches about 40% of importance; in others it is closer to 12%
| Model | Task | R² | MAE | Accuracy | F1 |
|---|---|---|---|---|---|
| XGBoost | Regression | 0.377 | 101.3h | - | - |
| XGBoost | Classification | - | - | 69.1% | 0.676 |
| Random Forest | Regression | 0.361 | 103.0h | - | - |
| Random Forest | Classification | - | - | 68.5% | 0.660 |
Classification Target: fast (<24h), medium (24h-168h), slow (>168h)
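
The three-way target can be derived from `review_time_hours` with a small helper. This is a minimal sketch, not the report's actual preprocessing code; the function name `review_speed_class` is hypothetical, and the thresholds are taken from the class definition above.

```python
def review_speed_class(review_time_hours: float) -> str:
    """Map a review time in hours to one of the three classes.

    Thresholds follow the definition above:
    fast (<24h), medium (24h-168h), slow (>168h).
    """
    if review_time_hours < 24:
        return "fast"
    if review_time_hours <= 168:
        return "medium"
    return "slow"


# Example: median review times taken from the per-repository table.
for hours in (0.7, 47.9, 141.3):
    print(hours, review_speed_class(hours))
```

Under this reading of the boundaries, a PR merged after exactly 24h or exactly 168h falls into the medium class.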
Result: SUPPORTED ✅
| Metric | Value |
|---|---|
| NLP Sentiment Importance | 12.74% |
| Code Metrics Importance | 17.57% |
| Ratio (NLP/Code) | 72.5% |
NLP sentiment features contribute meaningful predictive power, achieving 72.5% of the importance of traditional code metrics.
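
The ratio is the sentiment share divided by the code-metrics share. A quick check of the arithmetic, using the two values from the table (the variable names are illustrative only):

```python
# Category importances (in percent) as reported in the table above.
nlp_sentiment_importance = 12.74
code_metrics_importance = 17.57

# Ratio of NLP sentiment importance to code metrics importance.
ratio = nlp_sentiment_importance / code_metrics_importance
print(f"NLP/Code ratio: {ratio:.1%}")  # NLP/Code ratio: 72.5%
```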
Result: SUPPORTED ✅
| Metric | Value |
|---|---|
| NLP Readability Importance | 12.89% |
Readability metrics (Flesch, Gunning Fog, word count) are about as important as sentiment features (12.89% vs. 12.74%).
| Rank | Category | Importance | Key Features |
|---|---|---|---|
| 1 | Author Metrics | 21.02% | prior_pr_count, avg_time_to_merge |
| 2 | Code Metrics | 17.57% | commits_count, lines_added |
| 3 | PR Structure | 12.92% | description_sections, has_tests |
| 4 | NLP Readability | 12.89% | flesch_reading_ease, word_count |
| 5 | NLP Sentiment | 12.74% | body_vader_neg, title_polarity |
| 6 | Temporal | 3.67% | day_of_week, hour_utc |
| 7 | Review Events | 2.31% | reviewers_requested_count |
| Rank | Feature | Importance | Category |
|---|---|---|---|
| 1 | `author_prior_pr_count` | 13.5% | Author |
| 2 | `author_avg_time_to_merge_days` | 7.5% | Author |
| 3 | `commits_count` | 7.2% | Code |
| 4 | `lines_added` | 5.4% | Code |
| 5 | `description_sections_count` | 2.5% | Structure |
| 6 | `reviewers_requested_count` | 2.3% | Review |
| 7 | `test_to_code_ratio` | 2.2% | Quality |
| 8 | `title_has_docs_keywords` | 2.2% | Classification |
| 9 | `body_vader_neg` | 2.1% | NLP Sentiment |
| 10 | `has_tests_added` | 2.0% | Structure |
| 11 | `body_sentence_count` | 2.0% | NLP Readability |
| 12 | `ci_passed` | 2.0% | Quality |
| 13 | `touches_core_code` | 2.0% | Intent |
| 14 | `body_word_count` | 1.7% | NLP Readability |
| 15 | `has_only_docs_files` | 1.7% | Classification |
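
Category totals are sums of per-feature importances. As a rough illustration, the sketch below sums only the top-15 rows listed above by their Category label, so each total is a lower bound on the full category share (the complete 70-feature importances are not reproduced here). The helper name `sum_by_category` is hypothetical.

```python
from collections import defaultdict

# (feature, importance in percent, category) for the top-15 rows above.
TOP_FEATURES = [
    ("author_prior_pr_count", 13.5, "Author"),
    ("author_avg_time_to_merge_days", 7.5, "Author"),
    ("commits_count", 7.2, "Code"),
    ("lines_added", 5.4, "Code"),
    ("description_sections_count", 2.5, "Structure"),
    ("reviewers_requested_count", 2.3, "Review"),
    ("test_to_code_ratio", 2.2, "Quality"),
    ("title_has_docs_keywords", 2.2, "Classification"),
    ("body_vader_neg", 2.1, "NLP Sentiment"),
    ("has_tests_added", 2.0, "Structure"),
    ("body_sentence_count", 2.0, "NLP Readability"),
    ("ci_passed", 2.0, "Quality"),
    ("touches_core_code", 2.0, "Intent"),
    ("body_word_count", 1.7, "NLP Readability"),
    ("has_only_docs_files", 1.7, "Classification"),
]


def sum_by_category(rows):
    """Sum importance values per category label."""
    totals = defaultdict(float)
    for _feature, importance, category in rows:
        totals[category] += importance
    return dict(totals)


totals = sum_by_category(TOP_FEATURES)
print(round(totals["Author"], 1))  # 21.0
print(round(totals["Code"], 1))    # 12.6
```

The two author features alone account for the roughly 21% attributed to author experience, while the Code total here (12.6%) is below the reported 17.57% because the remaining code features fall outside the top 15.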
| Repository | PRs | Median Review | R² | Top Feature |
|---|---|---|---|---|
| microsoft/vscode | 8,703 | 0.7h | -10.72 | author_prior_pr_count |
| home-assistant/core | 8,447 | 5.9h | -0.10 | commits_count |
| elastic/elasticsearch | 8,286 | 8.6h | 0.23 | commits_count |
| rust-lang/rust | 6,845 | 47.9h | -0.47 | body_sentence_count |
| flutter/flutter | 5,310 | 18.0h | 0.44 | description_sections_count |
| vercel/next.js | 5,302 | 17.5h | 0.05 | commits_count |
| grafana/grafana | 4,630 | 20.9h | 0.18 | commits_count |
| n8n-io/n8n | 4,286 | 21.7h | 0.17 | author_prior_pr_count |
| huggingface/transformers | 3,939 | 48.4h | -0.43 | commits_count |
| kubernetes/kubernetes | 3,182 | 128.6h | -0.27 | author_prior_pr_count |
| langchain-ai/langchain | 3,124 | 1.9h | -0.45 | author_prior_pr_count |
| neovim/neovim | 2,656 | 5.2h | 0.04 | has_only_test_files |
| freeCodeCamp/freeCodeCamp | 2,403 | 17.9h | 0.27 | commits_count |
| facebook/react | 1,600 | 13.4h | -0.46 | author_prior_pr_count |
| numpy/numpy | 1,422 | 8.4h | -0.08 | commits_count |
| pandas-dev/pandas | 1,288 | 23.8h | 0.00 | commits_count |
| scikit-learn/scikit-learn | 1,124 | 48.1h | -0.22 | commits_count |
| microsoft/PowerToys | 1,107 | 50.2h | 0.10 | has_issue_reference |
| ansible/ansible | 983 | 19.9h | 0.02 | commits_count |
| django/django | 843 | 73.4h | -0.12 | has_checklist |
| fastapi/fastapi | 515 | 141.3h | 0.02 | description_length |
Note: A negative R² means the per-repository model predicts worse than always guessing the mean review time, suggesting these repositories follow patterns the features do not capture.
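
To see how R² can go negative, recall that R² = 1 - SS_res / SS_tot, where SS_tot is the squared error of always predicting the mean. The sketch below uses made-up toy numbers, not data from this report, and a hypothetical helper `r_squared`.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy targets (illustrative values only).
y_true = [1.0, 2.0, 3.0]

# Always predicting the mean gives exactly zero.
print(r_squared(y_true, [2.0, 2.0, 2.0]))  # 0.0

# Predictions that are worse than the mean give a negative score.
print(r_squared(y_true, [3.0, 2.0, 1.0]))  # -3.0
```

A strongly negative value such as -10.72 therefore means the model's squared error is many times larger than that of the mean baseline.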
| Repository | Author | Code | NLP Sentiment | NLP Readability | PR Structure |
|---|---|---|---|---|---|
| flutter | 10.2% | 9.1% | 12.2% | 39.5% 🏆 | 14.1% |
| freeCodeCamp | 3.7% | 30.3% 🏆 | 20.5% | 15.6% | 9.2% |
| react | 8.7% | 12.4% | 24.5% 🏆 | 17.5% | 7.3% |
| django | 10.8% | 12.1% | 18.5% | 16.8% | 24.4% 🏆 |
| vscode | 20.8% 🏆 | 10.0% | 21.6% | 15.3% | 10.9% |
| home-assistant | 11.6% | 28.2% | 18.3% | 12.1% | 10.9% |
| elasticsearch | 5.9% | 25.6% | 21.0% | 13.3% | 14.9% |
| rust | 11.9% | 15.0% | 14.8% | 23.0% | 12.0% |
| next.js | 6.9% | 12.9% | 17.0% | 15.4% | 17.9% |
| grafana | 9.3% | 15.4% | 19.3% | 12.0% | 17.2% |
| n8n | 15.2% | 14.7% | 17.2% | 15.1% | 9.6% |
| transformers | 10.2% | 21.8% | 16.6% | 13.1% | 11.9% |
| kubernetes | 10.1% | 10.6% | 19.8% | 13.7% | 15.4% |
| langchain | 12.5% | 14.6% | 16.6% | 20.1% | 8.6% |
| neovim | 9.7% | 13.0% | 22.7% | 15.1% | 14.2% |
| numpy | 9.0% | 19.6% | 15.8% | 18.5% | 12.8% |
| pandas | 5.0% | 19.5% | 19.4% | 14.8% | 10.4% |
| scikit-learn | 6.4% | 22.4% | 15.8% | 11.4% | 17.5% |
| PowerToys | 7.2% | 12.3% | 17.1% | 13.7% | 22.2% |
| ansible | 10.3% | 22.0% | 20.5% | 18.2% | 8.4% |
| fastapi | 18.8% | 11.7% | 18.3% | 21.8% | 8.9% |
| Repository | Fastest Day | Fastest (hrs) | Slowest Day | Slowest (hrs) | Penalty |
|---|---|---|---|---|---|
| microsoft/vscode | Wed | 0.5 | Sat | 3.5 | 7x |
| elastic/elasticsearch | Fri | 5.7 | Sat | 54.0 | 9.5x |
| flutter/flutter | Mon | 7.9 | Sat | 52.9 | 6.7x |
| vercel/next.js | Tue | 9.7 | Sat | 57.2 | 5.9x |
| kubernetes/kubernetes | Tue | 101.1 | Sun | 193.4 | 1.9x |
| langchain-ai/langchain | Thu | 0.7 | Sat | 38.4 | 55x |
| ansible/ansible | Mon | 1.4 | Sun | 277.1 | 198x |
| fastapi/fastapi | Sat ⭐ | 28.7 | Mon | 288.7 | 10x |
fastapi is the exception in this table: Saturday is its fastest day.
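
The Penalty column is consistent with dividing the slowest day's review time by the fastest day's. The sketch below recomputes it for a few rows of the table under that assumption; `slowdown_factor` is a hypothetical name, and the report's own rounding may differ slightly.

```python
def slowdown_factor(fastest_hours: float, slowest_hours: float) -> float:
    """Ratio of the slowest day's review time to the fastest day's."""
    return slowest_hours / fastest_hours


# (fastest hours, slowest hours) copied from the table above.
ROWS = {
    "microsoft/vscode": (0.5, 3.5),
    "elastic/elasticsearch": (5.7, 54.0),
    "ansible/ansible": (1.4, 277.1),
}

for repo, (fastest, slowest) in ROWS.items():
    print(f"{repo}: {slowdown_factor(fastest, slowest):.1f}x")
```

Because the factor is a ratio, repositories with very short fastest-day times (such as ansible at 1.4h) produce very large penalties even when the absolute delay is comparable to other projects.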
- NLP features matter more (21-40% combined)
- Description quality and readability are critical
- Visual/UX projects need clear communication
- Code metrics dominate (19-22%)
- Commit count and lines changed are key
- Technical correctness over communication
- Slower overall (median 8-128h)
- More even distribution across categories
- Process and structure matter more
- Author experience critical (12-21%)
- Median review times under 2h
- Trust in experienced contributors
Based on feature importance analysis:
- Build reputation - Author prior PR count is the #1 predictor
- Keep commits focused - Fewer commits = faster review
- Limit scope - Smaller PRs (fewer lines) review faster
- Pass CI first - `ci_passed` matters across projects
- Structure your description - Use headers/sections
- Mind readability - Simpler language reviews faster
- Avoid negative tone - `body_vader_neg` correlates with delays
- Include tests - `has_tests_added` speeds reviews
- Link issues - `has_issue_reference` helps (especially PowerToys, scikit-learn)
- Use checklists - Critical for Django-style projects
- Avoid weekends - Saturday is slowest for 20/24 repos
- Early to mid-week is best - Monday and Wednesday are the most common fastest days
- Project-specific: Check your repo's pattern (fastapi is opposite!)
- Source: GitHub API v3
- Time period: Last 13 months
- Filter: Merged PRs only (no abandoned/closed)
- Bot exclusion: dependabot, renovate, etc. removed
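
One plausible way to apply the bot exclusion is to filter on the author login. This is a sketch under assumptions, not the report's actual filter: the helper `is_bot_author`, the `author` key, and the `octocat` login are hypothetical, and the rule relies on GitHub App accounts having logins that end in `[bot]` (for example `dependabot[bot]`).

```python
# Bot names mentioned in the report; the real list is likely longer.
KNOWN_BOT_NAMES = {"dependabot", "renovate"}


def is_bot_author(login: str) -> bool:
    """Return True if the login looks like an automation account."""
    name = login.lower()
    if name.endswith("[bot]"):
        return True
    return name in KNOWN_BOT_NAMES


# Hypothetical PR records with only the field needed for the filter.
prs = [
    {"number": 1, "author": "dependabot[bot]"},
    {"number": 2, "author": "octocat"},
    {"number": 3, "author": "renovate"},
]

human_prs = [pr for pr in prs if not is_bot_author(pr["author"])]
print([pr["number"] for pr in human_prs])  # [2]
```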
- 100 total features extracted
- 70 used for modeling (30 excluded for data leakage)
- NLP: VADER + TextBlob sentiment, textstat readability
Features occurring during/after review were excluded:
- `pr_age_at_merge_days` (target in different units)
- `time_to_first_review_hours`
- `review_cycles_count`, `approvals_count`
- `review_*` sentiment features
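
The exclusion rule can be sketched as a column filter. This is an illustration under assumptions: the exact names of the `review_*` sentiment columns are not listed in this report, so `review_vader_neg` below is a hypothetical example, and matching on the `review_` prefix is an assumed convention. The helper `select_feature_columns` is also hypothetical.

```python
TARGET = "review_time_hours"

# Columns named explicitly in the exclusion list above.
LEAKY_COLUMNS = {
    "pr_age_at_merge_days",
    "time_to_first_review_hours",
    "review_cycles_count",
    "approvals_count",
}

# Assumed prefix for features computed from review comments.
LEAKY_PREFIXES = ("review_",)


def select_feature_columns(columns):
    """Keep only columns that are known before the review starts."""
    kept = []
    for name in columns:
        if name == TARGET or name in LEAKY_COLUMNS:
            continue
        if name.startswith(LEAKY_PREFIXES):
            continue
        kept.append(name)
    return kept


columns = [
    "author_prior_pr_count",
    "commits_count",
    "reviewers_requested_count",
    "time_to_first_review_hours",
    "review_vader_neg",  # hypothetical review sentiment column
    "review_time_hours",
]
print(select_feature_columns(columns))
```

Note that `reviewers_requested_count` survives the prefix check because it starts with `reviewers_`, not `review_`, which matches its presence in the importance tables.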
- Proxy validity: Merge time ≠ quality
- Domain mismatch: VADER trained on social media
- Confounding: Reviewer availability not captured
- Selection bias: Only merged PRs analyzed
| File | Description |
|---|---|
| `data/processed/pr_features.csv` | Full dataset (76,227 rows, 100 columns) |
| `data/processed/modeling_report.json` | Complete analysis results |
| `data/processed/kaggle/` | Kaggle-ready dataset package |
| `ANALYSIS_REPORT.md` | This report |