An advanced command-line tool for a deep-dive analysis of any GitHub user's public activity. This version moves beyond basic reporting to provide sophisticated, state-of-the-art (SOTA) metrics and a persona-based final verdict, offering a genuinely analytical perspective on a developer's profile.
It uses a rich, color-coded terminal interface for a beautiful and highly readable user experience.
- Rich & Beautiful Terminal UI: Presents data in elegant tables, panels, and color-coded text using `rich`.
- Persona-Based Final Verdict: Interprets metrics in combination to assign a developer "persona" (e.g., Seasoned Architect, Curious Explorer), providing a holistic and nuanced summary.
- Advanced SOTA Metrics: Calculates insightful metrics you won't find elsewhere:
- Consistency Score: Measures the regularity of contributions.
- Commit Discipline Score: Grades the quality of commit messages based on length, format, and clarity.
- Learning Trajectory: Quantifies continuous learning by tracking the adoption of new languages over time.
- Impact Factor (H-Index): A robust measure of a developer's influence in the open-source community.
- Intelligent Caching: Saves results locally to make subsequent analyses on the same user instantaneous.
- Robust Error & Rate Limit Handling: Gracefully waits and retries on API rate limits, ensuring completion.
- Visualizations: Generates an optional Matplotlib chart for month-over-month activity.
Here's a preview of the new, more analytical output in your terminal:
╭──────────────────────────── High-Level Overview for userX ─────────────────────────────╮
│                                                                                        │
│  Total Commits Analyzed: 8,451                                                         │
│  Contributed to: 42 unique public repos                                                │
│  Commit History Spans: From 2018-03-15 to 2023-11-21                                   │
│                                                                                        │
╰────────────────────────────────────────────────────────────────────────────────────────╯
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Advanced Developer DNA ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric                   ┃ Value       ┃ Interpretation                                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Consistency Score        │ 82.5 / 100  │ Measures regularity of commits (lower variance…  │
│ Commit Discipline Score  │ 88.1 / 100  │ Quality of commit messages (conventional format… │
│ Learning Trajectory      │ 4 new langs │ Languages adopted after the first year of acti…  │
│ Impact Factor (H-Index)  │ 12          │ Has 12 repos with at least 12 stars each.        │
└──────────────────────────┴─────────────┴──────────────────────────────────────────────────┘
╭──────────────────────────────────── Final Verdict ─────────────────────────────────────╮
│                                                                                        │
│  🏛️ The Seasoned Architect                                                             │
│                                                                                        │
│  This profile exhibits strong signs of a lead developer or architect. A high H-Index  │
│  shows significant community impact, while a top-tier discipline score indicates a    │
│  focus on code quality, maintainability, and clear communication. Their work is both  │
│  influential and professionally crafted.                                               │
│                                                                                        │
╰────────────────────────────────────────────────────────────────────────────────────────╯
Analysis Complete.
- Python 3.7+
- The `pip` package manager
This version requires numpy for statistical calculations. Open your terminal and run:
pip install PyGithub pandas matplotlib rich tqdm numpy
A GitHub Personal Access Token (PAT) is essential for this script to work.
- Go to github.com/settings/tokens and click "Generate new token (classic)".
- Give it a name (e.g., "GitHub Analyzer Script").
- Check the box for the `public_repo` scope.
- Click "Generate token" and copy the token immediately.
Never hardcode your token. The script reads it from an environment variable named GITHUB_TOKEN.
- macOS / Linux:
# Add this line to your ~/.zshrc or ~/.bashrc
export GITHUB_TOKEN="your_pasted_token_here"
# Restart your terminal or run `source ~/.zshrc`
- Windows:
# This command sets the variable permanently
setx GITHUB_TOKEN "your_pasted_token_here"
# Restart your terminal for the change to take effect
Save the code below as github_analyzer_pro_v2.py.
python github_analyzer_pro_v2.py <github_username>
usage: github_analyzer_pro_v2.py [-h] [--limit-repos LIMIT_REPOS] [--no-plots] [--no-cache] username
# ... (same options as before)
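For example, a quick scan of the first 20 repositories, with charts disabled and the cache bypassed (userX is a placeholder username):
python github_analyzer_pro_v2.py userX --limit-repos 20 --no-plots --no-cache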
This script's power comes from its unique metrics. Here's what they mean:
| Metric | How It's Calculated | What It Tells You |
|---|---|---|
| Consistency Score | Based on the standard deviation of monthly commits relative to the mean. A lower deviation results in a higher score. | Is this developer a steady contributor or someone who works in bursts? High scores indicate consistent, reliable engagement. |
| Commit Discipline | A weighted score (0-100) from: • Avg. message length • % of Conventional Commits • % of non-lazy messages | How professional and communicative is their development process? High scores reflect a mature developer who writes clean, useful commit histories. |
| Learning Trajectory | Counts the number of new programming languages used in commits after their first year of activity on GitHub. | Does this developer actively learn and apply new technologies, or do they stick to a core set of skills? A high number shows adaptability. |
| Impact Factor (H-Index) | A developer has an h-index of h if they have h repositories with at least h stars each. | A robust measure of influence. It rewards a portfolio of consistently valuable projects over a single viral hit or many unpopular ones. |
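To make the arithmetic concrete, here is a minimal, self-contained sketch of the Consistency Score and H-Index calculations. It mirrors the logic used in the full script below; the sample numbers are purely illustrative.

```python
import pandas as pd

# Consistency Score: 100 * (1 - std/mean) of monthly commit counts, floored at 0.
monthly_commits = pd.Series([30, 42, 25, 38, 33, 41])  # illustrative month-by-month counts
ratio = monthly_commits.std() / monthly_commits.mean()
consistency_score = max(0, 1 - ratio) * 100

# Impact Factor (H-Index): the largest h such that h repos have at least h stars each.
stars = sorted([120, 45, 30, 14, 12, 9, 3, 1], reverse=True)  # illustrative stargazer counts
h_index = 0
for i, s in enumerate(stars):
    if s >= i + 1:
        h_index = i + 1
    else:
        break

print(f"Consistency Score: {consistency_score:.1f} / 100, H-Index: {h_index}")
```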
The script analyzes the advanced metrics to assign one of the following personas:
| Persona | Key Indicators |
|---|---|
| 🏛️ Seasoned Architect | High H-Index and high Commit Discipline. |
| 🧭 Curious Explorer | High Learning Trajectory, often with lower consistency (bursts of activity). |
| 🔬 Specialist Craftsman | Very high Consistency and Discipline, but low language diversity. |
| 🌟 The Rising Star | A high percentage of total commits and impact occurred in the last year. |
| 🚀 Productivity Powerhouse | Exceptionally high commit volume (mean) and high consistency. |
| 💡 Hobbyist Contributor | Lower scores across the board, indicating sporadic but passionate involvement. |
Filename: github_analyzer_pro_v2.py
"""
GitHub Profile Analyzer Pro (v2)
A SOTA script to analyze a GitHub user's public activity, providing deep insights,
advanced metrics, and a persona-based final verdict on their development profile.
"""
import os
import sys
import re
import pickle
import time
import argparse
from datetime import datetime, timedelta
from collections import Counter
from typing import List, Dict, Any, Tuple
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from github import Github, GithubException, RateLimitExceededException
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from tqdm import tqdm
# --- CONFIGURATION & CONSTANTS ---
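# NOTE: SECURITY_PATTERNS below is retained from v1 for reference only; the security-scan
# feature was removed in v2 (see the notes at the end), so these patterns are not used here.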
SECURITY_PATTERNS = {
"generic_api_key": re.compile(r'["\']?([a-zA-Z]{3,10}_)?(api|access|secret)_?(key|token)["\']?\s*[:=]\s*["\']?([a-zA-Z0-9\-_]{20,})["\']?', re.IGNORECASE),
"aws_access_key": re.compile(r'AKIA[0-9A-Z]{16}', re.IGNORECASE),
"private_key_header": re.compile(r'-----BEGIN ((RSA|OPENSSH|EC|PGP) )?PRIVATE KEY-----', re.IGNORECASE),
}
CACHE_DIR = "cache"
os.makedirs(CACHE_DIR, exist_ok=True)
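# Conventional Commit subjects look like "feat(parser): add caching", "fix!: handle 404s", or "docs: update README".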
CONVENTIONAL_COMMIT_REGEX = re.compile(r'^\w+(\(\w+\))?(!)?:\s.*')
console = Console()
# --- DATA FETCHING & CACHING ---
def fetch_user_data(g: Github, username: str, repo_limit: int) -> Tuple[List[Dict[str, Any]], List[Any]]:
"""Fetches all commits from all public repositories for a given GitHub user."""
try:
user = g.get_user(username)
except GithubException:
console.print(f"[bold red]Error: User '{username}' not found.[/bold red]")
sys.exit(1)
all_commits_data, repos_list = [], list(user.get_repos())
total_repos = len(repos_list)
if repo_limit:
total_repos = min(total_repos, repo_limit)
console.print(f"[yellow]Limiting analysis to the first {total_repos} repositories.[/yellow]")
repo_iterator = repos_list[:total_repos]
with tqdm(total=total_repos, desc="[cyan]Processing Repos[/cyan]", unit="repo") as pbar:
for repo in repo_iterator:
pbar.set_postfix_str(repo.name)
try:
# Limit commits per repo to avoid extremely long waits on monolithic repos
commits = repo.get_commits(author=user.login)
                for commit in commits[:1000]:  # get_commits returns newest first, so this analyzes the latest 1000 commits per repo
full_commit = get_commit_with_retry(repo, commit.sha)
if full_commit and full_commit.stats:
stats = full_commit.stats
commit_data = {
"repo_name": repo.name, "sha": commit.sha,
"message": commit.commit.message, "date": commit.commit.author.date,
"additions": stats.additions, "deletions": stats.deletions,
"total_changes": stats.total,
}
all_commits_data.append(commit_data)
except RateLimitExceededException:
console.print("\n[bold yellow]Rate limit exceeded. Waiting for reset...[/bold yellow]")
                reset_time = g.get_rate_limit().core.reset
                # Handle both timezone-aware and naive reset timestamps across PyGithub versions
                now = datetime.now(reset_time.tzinfo) if reset_time.tzinfo else datetime.utcnow()
                sleep_duration = (reset_time - now).total_seconds() + 10
if sleep_duration > 0:
time.sleep(sleep_duration)
console.print("[green]Resuming...[/green]")
            except GithubException:
                # Skip repositories that cannot be read (e.g., empty repos return a 409)
                pass
pbar.update(1)
return all_commits_data, repo_iterator
def get_commit_with_retry(repo: Any, sha: str, max_retries: int = 3, delay: int = 5) -> Any:
"""Fetches a single commit with retry logic for transient network errors."""
for attempt in range(max_retries):
try:
return repo.get_commit(sha)
except GithubException as e:
if e.status == 404 or attempt >= max_retries - 1: return None
time.sleep(delay * (attempt + 1))
def load_or_fetch_data(g: Github, username: str, repo_limit: int, use_cache: bool) -> Tuple[pd.DataFrame, List[Any]]:
"""Loads data from cache if available, otherwise fetches from API."""
cache_file = os.path.join(CACHE_DIR, f"{username}_data.pkl")
if use_cache and os.path.exists(cache_file):
console.print(f"[green]Loading data from cache file: {cache_file}[/green]")
with open(cache_file, "rb") as f: cached_data = pickle.load(f)
return cached_data['df'], cached_data['repos']
console.print(f"[bold cyan]Fetching fresh data for user: {username}...[/bold cyan]")
commit_data, repos = fetch_user_data(g, username, repo_limit)
if not commit_data:
console.print(f"[bold red]No public commits found for user '{username}'. Exiting.[/bold red]")
sys.exit(0)
df = pd.DataFrame(commit_data)
df['date'] = pd.to_datetime(df['date'])
if use_cache:
with open(cache_file, "wb") as f: pickle.dump({'df': df, 'repos': repos}, f)
console.print(f"[green]Data saved to cache: {cache_file}[/green]")
return df, repos
# --- ANALYSIS & REPORTING (v2) ---
def calculate_advanced_metrics(df: pd.DataFrame, repos: List[Any]) -> Dict[str, Any]:
"""Calculates sophisticated metrics for a more accurate profile assessment."""
metrics = {}
total_commits = len(df)
if total_commits == 0: return {}
# 1. Consistency Metrics
df['month'] = df['date'].dt.to_period('M')
monthly_commits = df.groupby('month').size()
metrics['monthly_commit_std_dev'] = monthly_commits.std()
metrics['monthly_commit_mean'] = monthly_commits.mean()
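    # std/mean is the coefficient of variation: steadier month-to-month activity yields a lower ratio and a higher score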
consistency_ratio = metrics['monthly_commit_std_dev'] / metrics['monthly_commit_mean'] if metrics['monthly_commit_mean'] > 0 else 1
metrics['consistency_score'] = max(0, 1 - consistency_ratio) * 100
# 2. Commit Discipline Score
df['msg_len'] = df['message'].str.len()
df['is_conventional'] = df['message'].apply(lambda x: bool(CONVENTIONAL_COMMIT_REGEX.match(x)))
df['is_lazy'] = df['message'].str.strip().str.split().str.len() <= 2
avg_len_score = min(df['msg_len'].mean() / 70, 1) * 100
conventional_pct = df['is_conventional'].mean() * 100
non_lazy_pct = (1 - df['is_lazy'].mean()) * 100
metrics['commit_discipline_score'] = (avg_len_score * 0.25) + (conventional_pct * 0.5) + (non_lazy_pct * 0.25)
# 3. Learning Trajectory
repo_langs = {repo.name: repo.language for repo in repos if repo.language}
df['language'] = df['repo_name'].map(repo_langs)
df_sorted = df.dropna(subset=['language']).sort_values('date')
first_commit_date = df['date'].min()
one_year_marker = first_commit_date + timedelta(days=365)
seen_languages_first_year = set(df_sorted[df_sorted['date'] <= one_year_marker]['language'])
seen_languages_all_time = set(df_sorted['language'])
metrics['new_langs_after_first_year'] = len(seen_languages_all_time - seen_languages_first_year)
metrics['total_languages'] = len(seen_languages_all_time)
# 4. Impact Factor (H-Index)
stars = sorted([r.stargazers_count for r in repos], reverse=True)
h_index = 0
for i, s in enumerate(stars):
if s >= i + 1: h_index = i + 1
else: break
metrics['h_index'] = h_index
metrics['total_stars'] = sum(stars)
# 5. Recency
metrics['commits_last_year'] = df[df['date'] > (datetime.now(df['date'].dt.tz) - timedelta(days=365))].shape[0]
metrics['recency_ratio'] = metrics['commits_last_year'] / total_commits if total_commits > 0 else 0
return metrics
def report_advanced_metrics(metrics: Dict[str, Any]):
    table = Table(title="[bold yellow]Advanced Developer DNA[/bold yellow]", show_header=True, padding=(0, 1))
table.add_column("Metric", style="cyan")
table.add_column("Value", style="magenta")
table.add_column("Interpretation", style="default")
table.add_row("Consistency Score", f"{metrics['consistency_score']:.1f} / 100", "Measures regularity of commits (lower variance is better).")
table.add_row("Commit Discipline Score", f"{metrics['commit_discipline_score']:.1f} / 100", "Quality of commit messages (format, length, detail).")
table.add_row("Learning Trajectory", f"{metrics['new_langs_after_first_year']} new langs", "Languages adopted after the first year of activity.")
table.add_row("Impact Factor (H-Index)", f"{metrics['h_index']}", f"Has {metrics['h_index']} repos with at least {metrics['h_index']} stars each.")
console.print(table)
def generate_final_verdict(metrics: Dict[str, Any]):
    persona, description = "💡 The Hobbyist Contributor", "This developer contributes to open source with passion, though perhaps not with the high frequency or wide impact of a full-time professional. Their work shows dedication and a love for coding."
if metrics['h_index'] >= 10 and metrics['commit_discipline_score'] >= 75:
        persona, description = "🏛️ The Seasoned Architect", "This profile shows strong signs of a lead developer. A high H-Index indicates significant community impact, while a top-tier discipline score suggests a focus on code quality, maintainability, and clear communication. Their work is influential and professionally crafted."
elif metrics['new_langs_after_first_year'] >= 4 and metrics['consistency_score'] < 70:
        persona, description = "🧭 The Curious Explorer", "This developer is a quintessential learner, constantly picking up new technologies. The high number of languages adopted, combined with bursty activity, suggests a passion for experimentation, prototyping, and exploring new frontiers in tech."
elif metrics['consistency_score'] >= 80 and metrics['commit_discipline_score'] >= 80 and metrics['total_languages'] <= 3:
        persona, description = "🔬 The Specialist Craftsman", "Deeply focused and highly professional, this developer has mastered their chosen tools. Their exceptional consistency and commit discipline in a narrow set of technologies point to an expert who builds robust, high-quality software within their domain."
elif metrics['recency_ratio'] >= 0.5 and metrics['h_index'] > 5:
        persona, description = "🌟 The Rising Star", "This developer's profile shows a dramatic acceleration. With a significant portion of their impact and activity occurring recently, they are on a steep upward trajectory, gaining recognition and contributing valuable work to the community at an increasing rate."
elif metrics['monthly_commit_mean'] >= 100 and metrics['consistency_score'] >= 75:
        persona, description = "🚀 The Productivity Powerhouse", "With an exceptionally high and consistent volume of contributions, this developer is an engine of productivity. This profile is typical of core maintainers of large projects or individuals with an incredible work ethic and passion for open source."
panel = Panel(
f"[bold]{persona}[/bold]\n\n{description}",
title="[bold magenta]Final Verdict[/bold magenta]",
border_style="magenta",
padding=(1, 2)
)
console.print(panel)
def report_overview(df: pd.DataFrame, username: str):
total_commits = len(df)
total_repos_analyzed = df['repo_name'].nunique()
first_commit_date = df['date'].min().strftime('%Y-%m-%d')
last_commit_date = df['date'].max().strftime('%Y-%m-%d')
panel = Panel(f"""
[bold]Total Commits Analyzed[/bold]: [cyan]{total_commits:,}[/cyan]
[bold]Contributed to[/bold]: [cyan]{total_repos_analyzed}[/cyan] unique public repos
[bold]Commit History Spans[/bold]: From [cyan]{first_commit_date}[/cyan] to [cyan]{last_commit_date}[/cyan]""",
title=f"[bold yellow]High-Level Overview for {username}[/bold yellow]", border_style="yellow")
console.print(panel)
# --- MAIN EXECUTION ---
def main():
parser = argparse.ArgumentParser(description="Analyze a GitHub user's public contributions.")
parser.add_argument("username", type=str, help="GitHub username to analyze.")
parser.add_argument("--limit-repos", type=int, default=0, help="Limit analysis to the first N repositories for a quick scan.")
parser.add_argument("--no-plots", action="store_true", help="Disable displaying matplotlib plots.")
parser.add_argument("--no-cache", action="store_true", help="Force a fresh fetch of data from the GitHub API.")
args = parser.parse_args()
github_token = os.getenv("GITHUB_TOKEN")
if not github_token:
console.print("[bold red]Error: GITHUB_TOKEN environment variable not set.[/bold red]")
sys.exit(1)
g = Github(github_token)
df, repos = load_or_fetch_data(g, args.username, args.limit_repos, not args.no_cache)
df.attrs['username'] = args.username
report_overview(df, args.username)
advanced_metrics = calculate_advanced_metrics(df, repos)
if not advanced_metrics:
console.print("[yellow]Could not generate advanced metrics due to lack of commit data.[/yellow]")
sys.exit(0)
report_advanced_metrics(advanced_metrics)
generate_final_verdict(advanced_metrics)
console.print("\n[bold green]Analysis Complete.[/bold green]")
if __name__ == "__main__":
    main()
- The analysis is performed on public repositories only and is based on the commit history authored by the specified user.
- The Security Scan feature has been removed in V2 to focus on developer profile metrics. For security analysis, always use dedicated tools like Gitleaks or TruffleHog.
- For users with thousands of commits, the initial data fetch can be slow. The caching mechanism is designed to mitigate this on subsequent runs.