An advanced command-line tool for a deep-dive analysis of any GitHub user's public activity. This version moves beyond basic reporting to provide sophisticated, state-of-the-art (SOTA) metrics and a persona-based final verdict, offering a genuinely analytical perspective on a developer's profile.
It uses a rich, color-coded terminal interface for a beautiful and highly readable user experience.
- Rich & Beautiful Terminal UI: Presents data in elegant tables, panels, and color-coded text using `rich`.
- Persona-Based Final Verdict: Interprets metrics in combination to assign a developer "persona" (e.g., Seasoned Architect, Curious Explorer), providing a holistic and nuanced summary.
- Advanced SOTA Metrics: Calculates insightful metrics you won't find elsewhere:
- Consistency Score: Measures the regularity of contributions.
- Commit Discipline Score: Grades the quality of commit messages based on length, format, and clarity.
- Learning Trajectory: Quantifies continuous learning by tracking the adoption of new languages over time.
- Impact Factor (H-Index): A robust measure of a developer's influence in the open-source community.
- Intelligent Caching: Saves results locally to make subsequent analyses on the same user instantaneous.
- Robust Error & Rate Limit Handling: Gracefully waits and retries on API rate limits, ensuring completion.
- Visualizations: Generates an optional Matplotlib chart for month-over-month activity.
Here's a preview of the new, more analytical output in your terminal:
╭──────────────────────────── High-Level Overview for userX ─────────────────────────────╮
│                                                                                        │
│  Total Commits Analyzed: 8,451                                                         │
│  Contributed to: 42 unique public repos                                                │
│  Commit History Spans: From 2018-03-15 to 2023-11-21                                   │
│                                                                                        │
╰────────────────────────────────────────────────────────────────────────────────────────╯
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Advanced Developer DNA ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric                   ┃ Value       ┃ Interpretation                                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Consistency Score        │ 82.5 / 100  │ Measures regularity of commits (lower variance…  │
│ Commit Discipline Score  │ 88.1 / 100  │ Quality of commit messages (conventional format… │
│ Learning Trajectory      │ 4 new langs │ Languages adopted after the first year of acti…  │
│ Impact Factor (H-Index)  │ 12          │ Has 12 repos with at least 12 stars each.        │
└──────────────────────────┴─────────────┴──────────────────────────────────────────────────┘
╭──────────────────────────────────── Final Verdict ─────────────────────────────────────╮
│                                                                                        │
│  🏛️ The Seasoned Architect                                                             │
│                                                                                        │
│  This profile exhibits strong signs of a lead developer or architect. A high H-Index  │
│  shows significant community impact, while a top-tier discipline score indicates a    │
│  focus on code quality, maintainability, and clear communication. Their work is both  │
│  influential and professionally crafted.                                               │
│                                                                                        │
╰────────────────────────────────────────────────────────────────────────────────────────╯
Analysis Complete.
- Python 3.7+
- The `pip` package manager
This version requires numpy for statistical calculations. Open your terminal and run:
pip install PyGithub pandas matplotlib rich tqdm numpy
A GitHub Personal Access Token (PAT) is essential for this script to work.
- Go to github.com/settings/tokens and click "Generate new token (classic)".
- Give it a name (e.g., "GitHub Analyzer Script").
- Check the box for the `public_repo` scope.
- Click "Generate token" and copy the token immediately.
Never hardcode your token. The script reads it from an environment variable named GITHUB_TOKEN.
- macOS / Linux:
# Add this line to your ~/.zshrc or ~/.bashrc
export GITHUB_TOKEN="your_pasted_token_here"
# Restart your terminal or run `source ~/.zshrc`
- Windows:
# This command sets the variable permanently
setx GITHUB_TOKEN "your_pasted_token_here"
# Restart your terminal for the change to take effect
Save the code below as github_analyzer_pro_v2.py.
python github_analyzer_pro_v2.py <github_username>
usage: github_analyzer_pro_v2.py [-h] [--limit-repos LIMIT_REPOS] [--no-plots] [--no-cache] username
# ... (same options as before)
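For example, a quick scan of the first 20 repositories, with charts disabled and the cache bypassed (userX is a placeholder username):
python github_analyzer_pro_v2.py userX --limit-repos 20 --no-plots --no-cache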
This script's power comes from its unique metrics. Here's what they mean:
| Metric | How It's Calculated | What It Tells You |
|---|---|---|
| Consistency Score | Based on the standard deviation of monthly commits relative to the mean. A lower deviation results in a higher score. | Is this developer a steady contributor or someone who works in bursts? High scores indicate consistent, reliable engagement. |
| Commit Discipline | A weighted score (0-100) from: • Avg. message length • % of Conventional Commits • % of non-lazy messages | How professional and communicative is their development process? High scores reflect a mature developer who writes clean, useful commit histories. |
| Learning Trajectory | Counts the number of new programming languages used in commits after their first year of activity on GitHub. | Does this developer actively learn and apply new technologies, or do they stick to a core set of skills? A high number shows adaptability. |
| Impact Factor (H-Index) | A developer has an h-index of h if they have h repositories with at least h stars each. | A robust measure of influence. It rewards a portfolio of consistently valuable projects over a single viral hit or many unpopular ones. |
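To make the arithmetic concrete, here is a minimal, self-contained sketch of the Consistency Score and H-Index calculations. It mirrors the logic used in the full script below; the sample numbers are purely illustrative.

```python
import pandas as pd

# Consistency Score: 100 * (1 - std/mean) of monthly commit counts, floored at 0.
monthly_commits = pd.Series([30, 42, 25, 38, 33, 41])  # illustrative month-by-month counts
ratio = monthly_commits.std() / monthly_commits.mean()
consistency_score = max(0, 1 - ratio) * 100

# Impact Factor (H-Index): the largest h such that h repos have at least h stars each.
stars = sorted([120, 45, 30, 14, 12, 9, 3, 1], reverse=True)  # illustrative stargazer counts
h_index = 0
for i, s in enumerate(stars):
    if s >= i + 1:
        h_index = i + 1
    else:
        break

print(f"Consistency Score: {consistency_score:.1f} / 100, H-Index: {h_index}")
```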
The script analyzes the advanced metrics to assign one of the following personas:
| Persona | Key Indicators |
|---|---|
| 🏛️ Seasoned Architect | High H-Index and high Commit Discipline. |
| 🧭 Curious Explorer | High Learning Trajectory, often with lower consistency (bursts of activity). |
| 🔬 Specialist Craftsman | Very high Consistency and Discipline, but low language diversity. |
| 🌟 The Rising Star | A high percentage of total commits and impact occurred in the last year. |
| 🚀 Productivity Powerhouse | Exceptionally high commit volume (mean) and high consistency. |
| 💡 Hobbyist Contributor | Lower scores across the board, indicating sporadic but passionate involvement. |
Filename: github_analyzer_pro_v2.py
"""
GitHub Profile Analyzer Pro (v2)
A SOTA script to analyze a GitHub user's public activity, providing deep insights,
advanced metrics, and a persona-based final verdict on their development profile.
"""
import os
import sys
import re
import pickle
import time
import argparse
from datetime import datetime, timedelta
from collections import Counter
from typing import List, Dict, Any, Tuple
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from github import Github, GithubException, RateLimitExceededException
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from tqdm import tqdm
# --- CONFIGURATION & CONSTANTS ---
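# NOTE: SECURITY_PATTERNS below is retained from v1 for reference only; the security-scan
# feature was removed in v2 (see the notes at the end), so these patterns are not used here.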
SECURITY_PATTERNS = {
"generic_api_key": re.compile(r'["\']?([a-zA-Z]{3,10}_)?(api|access|secret)_?(key|token)["\']?\s*[:=]\s*["\']?([a-zA-Z0-9\-_]{20,})["\']?', re.IGNORECASE),
"aws_access_key": re.compile(r'AKIA[0-9A-Z]{16}', re.IGNORECASE),
"private_key_header": re.compile(r'-----BEGIN ((RSA|OPENSSH|EC|PGP) )?PRIVATE KEY-----', re.IGNORECASE),
}
CACHE_DIR = "cache"
os.makedirs(CACHE_DIR, exist_ok=True)
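# Conventional Commit subjects look like "feat(parser): add caching", "fix!: handle 404s", or "docs: update README".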
CONVENTIONAL_COMMIT_REGEX = re.compile(r'^\w+(\(\w+\))?(!)?:\s.*')
console = Console()
# --- DATA FETCHING & CACHING ---
def fetch_user_data(g: Github, username: str, repo_limit: int) -> Tuple[List[Dict[str, Any]], List[Any]]:
"""Fetches all commits from all public repositories for a given GitHub user."""
try:
user = g.get_user(username)
except GithubException:
console.print(f"[bold red]Error: User '{username}' not found.[/bold red]")
sys.exit(1)
all_commits_data, repos_list = [], list(user.get_repos())
total_repos = len(repos_list)
if repo_limit:
total_repos = min(total_repos, repo_limit)
console.print(f"[yellow]Limiting analysis to the first {total_repos} repositories.[/yellow]")
repo_iterator = repos_list[:total_repos]
with tqdm(total=total_repos, desc="[cyan]Processing Repos[/cyan]", unit="repo") as pbar:
for repo in repo_iterator:
pbar.set_postfix_str(repo.name)
try:
# Limit commits per repo to avoid extremely long waits on monolithic repos
commits = repo.get_commits(author=user.login)
                for commit in commits[:1000]:  # get_commits returns newest first, so this analyzes the latest 1000 commits per repo
full_commit = get_commit_with_retry(repo, commit.sha)
if full_commit and full_commit.stats:
stats = full_commit.stats
commit_data = {
"repo_name": repo.name, "sha": commit.sha,
"message": commit.commit.message, "date": commit.commit.author.date,
"additions": stats.additions, "deletions": stats.deletions,
"total_changes": stats.total,
}
all_commits_data.append(commit_data)
except RateLimitExceededException:
console.print("\n[bold yellow]Rate limit exceeded. Waiting for reset...[/bold yellow]")
                reset_time = g.get_rate_limit().core.reset
                # Handle both timezone-aware and naive reset timestamps across PyGithub versions
                now = datetime.now(reset_time.tzinfo) if reset_time.tzinfo else datetime.utcnow()
                sleep_duration = (reset_time - now).total_seconds() + 10
if sleep_duration > 0:
time.sleep(sleep_duration)
console.print("[green]Resuming...[/green]")
            except GithubException:
                # Skip repositories that cannot be read (e.g., empty repos return a 409)
                pass
pbar.update(1)
return all_commits_data, repo_iterator
def get_commit_with_retry(repo: Any, sha: str, max_retries: int = 3, delay: int = 5) -> Any:
"""Fetches a single commit with retry logic for transient network errors."""
for attempt in range(max_retries):
try:
return repo.get_commit(sha)
except GithubException as e:
if e.status == 404 or attempt >= max_retries - 1: return None
time.sleep(delay * (attempt + 1))
def load_or_fetch_data(g: Github, username: str, repo_limit: int, use_cache: bool) -> Tuple[pd.DataFrame, List[Any]]:
"""Loads data from cache if available, otherwise fetches from API."""
cache_file = os.path.join(CACHE_DIR, f"{username}_data.pkl")
if use_cache and os.path.exists(cache_file):
console.print(f"[green]Loading data from cache file: {cache_file}[/green]")
with open(cache_file, "rb") as f: cached_data = pickle.load(f)
return cached_data['df'], cached_data['repos']
console.print(f"[bold cyan]Fetching fresh data for user: {username}...[/bold cyan]")
commit_data, repos = fetch_user_data(g, username, repo_limit)
if not commit_data:
console.print(f"[bold red]No public commits found for user '{username}'. Exiting.[/bold red]")
sys.exit(0)
df = pd.DataFrame(commit_data)
df['date'] = pd.to_datetime(df['date'])
if use_cache:
with open(cache_file, "wb") as f: pickle.dump({'df': df, 'repos': repos}, f)
console.print(f"[green]Data saved to cache: {cache_file}[/green]")
return df, repos
# --- ANALYSIS & REPORTING (v2) ---
def calculate_advanced_metrics(df: pd.DataFrame, repos: List[Any]) -> Dict[str, Any]:
"""Calculates sophisticated metrics for a more accurate profile assessment."""
metrics = {}
total_commits = len(df)
if total_commits == 0: return {}
# 1. Consistency Metrics
df['month'] = df['date'].dt.to_period('M')
monthly_commits = df.groupby('month').size()
metrics['monthly_commit_std_dev'] = monthly_commits.std()
metrics['monthly_commit_mean'] = monthly_commits.mean()
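    # std/mean is the coefficient of variation: steadier month-to-month activity yields a lower ratio and a higher score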
consistency_ratio = metrics['monthly_commit_std_dev'] / metrics['monthly_commit_mean'] if metrics['monthly_commit_mean'] > 0 else 1
metrics['consistency_score'] = max(0, 1 - consistency_ratio) * 100
# 2. Commit Discipline Score
df['msg_len'] = df['message'].str.len()
df['is_conventional'] = df['message'].apply(lambda x: bool(CONVENTIONAL_COMMIT_REGEX.match(x)))
df['is_lazy'] = df['message'].str.strip().str.split().str.len() <= 2
avg_len_score = min(df['msg_len'].mean() / 70, 1) * 100
conventional_pct = df['is_conventional'].mean() * 100
non_lazy_pct = (1 - df['is_lazy'].mean()) * 100
metrics['commit_discipline_score'] = (avg_len_score * 0.25) + (conventional_pct * 0.5) + (non_lazy_pct * 0.25)
# 3. Learning Trajectory
repo_langs = {repo.name: repo.language for repo in repos if repo.language}
df['language'] = df['repo_name'].map(repo_langs)
df_sorted = df.dropna(subset=['language']).sort_values('date')
first_commit_date = df['date'].min()
one_year_marker = first_commit_date + timedelta(days=365)
seen_languages_first_year = set(df_sorted[df_sorted['date'] <= one_year_marker]['language'])
seen_languages_all_time = set(df_sorted['language'])
metrics['new_langs_after_first_year'] = len(seen_languages_all_time - seen_languages_first_year)
metrics['total_languages'] = len(seen_languages_all_time)
# 4. Impact Factor (H-Index)
stars = sorted([r.stargazers_count for r in repos], reverse=True)
h_index = 0
for i, s in enumerate(stars):
if s >= i + 1: h_index = i + 1
else: break
metrics['h_index'] = h_index
metrics['total_stars'] = sum(stars)
# 5. Recency
metrics['commits_last_year'] = df[df['date'] > (datetime.now(df['date'].dt.tz) - timedelta(days=365))].shape[0]
metrics['recency_ratio'] = metrics['commits_last_year'] / total_commits if total_commits > 0 else 0
return metrics
def report_advanced_metrics(metrics: Dict[str, Any]):
    table = Table(title="[bold yellow]Advanced Developer DNA[/bold yellow]", show_header=True, padding=(0, 1))
table.add_column("Metric", style="cyan")
table.add_column("Value", style="magenta")
table.add_column("Interpretation", style="default")
table.add_row("Consistency Score", f"{metrics['consistency_score']:.1f} / 100", "Measures regularity of commits (lower variance is better).")
table.add_row("Commit Discipline Score", f"{metrics['commit_discipline_score']:.1f} / 100", "Quality of commit messages (format, length, detail).")
table.add_row("Learning Trajectory", f"{metrics['new_langs_after_first_year']} new langs", "Languages adopted after the first year of activity.")
table.add_row("Impact Factor (H-Index)", f"{metrics['h_index']}", f"Has {metrics['h_index']} repos with at least {metrics['h_index']} stars each.")
console.print(table)
def generate_final_verdict(metrics: Dict[str, Any]):
    persona, description = "💡 The Hobbyist Contributor", "This developer contributes to open source with passion, though perhaps not with the high frequency or wide impact of a full-time professional. Their work shows dedication and a love for coding."
if metrics['h_index'] >= 10 and metrics['commit_discipline_score'] >= 75:
        persona, description = "🏛️ The Seasoned Architect", "This profile shows strong signs of a lead developer. A high H-Index indicates significant community impact, while a top-tier discipline score suggests a focus on code quality, maintainability, and clear communication. Their work is influential and professionally crafted."
elif metrics['new_langs_after_first_year'] >= 4 and metrics['consistency_score'] < 70:
        persona, description = "🧭 The Curious Explorer", "This developer is a quintessential learner, constantly picking up new technologies. The high number of languages adopted, combined with bursty activity, suggests a passion for experimentation, prototyping, and exploring new frontiers in tech."
elif metrics['consistency_score'] >= 80 and metrics['commit_discipline_score'] >= 80 and metrics['total_languages'] <= 3:
        persona, description = "🔬 The Specialist Craftsman", "Deeply focused and highly professional, this developer has mastered their chosen tools. Their exceptional consistency and commit discipline in a narrow set of technologies point to an expert who builds robust, high-quality software within their domain."
elif metrics['recency_ratio'] >= 0.5 and metrics['h_index'] > 5:
        persona, description = "🌟 The Rising Star", "This developer's profile shows a dramatic acceleration. With a significant portion of their impact and activity occurring recently, they are on a steep upward trajectory, gaining recognition and contributing valuable work to the community at an increasing rate."
elif metrics['monthly_commit_mean'] >= 100 and metrics['consistency_score'] >= 75:
        persona, description = "🚀 The Productivity Powerhouse", "With an exceptionally high and consistent volume of contributions, this developer is an engine of productivity. This profile is typical of core maintainers of large projects or individuals with an incredible work ethic and passion for open source."
panel = Panel(
f"[bold]{persona}[/bold]\n\n{description}",
title="[bold magenta]Final Verdict[/bold magenta]",
border_style="magenta",
padding=(1, 2)
)
console.print(panel)
def report_overview(df: pd.DataFrame, username: str):
total_commits = len(df)
total_repos_analyzed = df['repo_name'].nunique()
first_commit_date = df['date'].min().strftime('%Y-%m-%d')
last_commit_date = df['date'].max().strftime('%Y-%m-%d')
panel = Panel(f"""
[bold]Total Commits Analyzed[/bold]: [cyan]{total_commits:,}[/cyan]
[bold]Contributed to[/bold]: [cyan]{total_repos_analyzed}[/cyan] unique public repos
[bold]Commit History Spans[/bold]: From [cyan]{first_commit_date}[/cyan] to [cyan]{last_commit_date}[/cyan]""",
title=f"[bold yellow]High-Level Overview for {username}[/bold yellow]", border_style="yellow")
console.print(panel)
# --- MAIN EXECUTION ---
def main():
parser = argparse.ArgumentParser(description="Analyze a GitHub user's public contributions.")
parser.add_argument("username", type=str, help="GitHub username to analyze.")
parser.add_argument("--limit-repos", type=int, default=0, help="Limit analysis to the first N repositories for a quick scan.")
parser.add_argument("--no-plots", action="store_true", help="Disable displaying matplotlib plots.")
parser.add_argument("--no-cache", action="store_true", help="Force a fresh fetch of data from the GitHub API.")
args = parser.parse_args()
github_token = os.getenv("GITHUB_TOKEN")
if not github_token:
console.print("[bold red]Error: GITHUB_TOKEN environment variable not set.[/bold red]")
sys.exit(1)
g = Github(github_token)
df, repos = load_or_fetch_data(g, args.username, args.limit_repos, not args.no_cache)
df.attrs['username'] = args.username
report_overview(df, args.username)
advanced_metrics = calculate_advanced_metrics(df, repos)
if not advanced_metrics:
console.print("[yellow]Could not generate advanced metrics due to lack of commit data.[/yellow]")
sys.exit(0)
report_advanced_metrics(advanced_metrics)
generate_final_verdict(advanced_metrics)
console.print("\n[bold green]Analysis Complete.[/bold green]")
if __name__ == "__main__":
    main()
- The analysis is performed on public repositories only and is based on the commit history authored by the specified user.
- The Security Scan feature has been removed in V2 to focus on developer profile metrics. For security analysis, always use dedicated tools like Gitleaks or TruffleHog.
- For users with thousands of commits, the initial data fetch can be slow. The caching mechanism is designed to mitigate this on subsequent runs.