@schickling
Created August 28, 2025 10:42
Git Repository Lines of Code History Analyzer - Monthly LOC tracking with cloc and terminal visualization

Git Repository Lines of Code History Analyzer

A TypeScript script that analyzes the lines of code (LOC) growth of a Git repository month by month using cloc's git-aware functionality.

Features

  • 📈 Monthly LOC tracking over a configurable time period
  • 🎯 Git-aware analysis - no need to check out commits
  • 🚀 Language breakdown - see which languages grow over time
  • 📊 Terminal visualization with text bar charts
  • 💾 CSV export for further analysis
  • ⚑ Flexible exclusions - exclude directories, file types, or languages

Prerequisites

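  • Nix with the nix-command and flakes features enabled (used to provide cloc); a cloc already on your PATH also works
  • Bun (the shebang and Quick Start use bun run); Node.js may need extra setup to run TypeScript and to support import.meta.main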
  • Git repository to analyze

Quick Start

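  1. Save the script as analyze-loc-history.ts
  2. Make it executable: chmod +x analyze-loc-history.ts
  3. From the root of the repository you want to analyze, run it: nix shell nixpkgs#cloc -c bun run analyze-loc-history.ts (or invoke the file directly; the shebang runs the same command)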

Script

#!/usr/bin/env -S nix shell nixpkgs#cloc -c bun run

import { execSync } from 'child_process'
import { writeFileSync } from 'fs'

interface LocData {
  month: string
  commit: string
  totalLines: number
  languages: Record<string, number>
}

// Get the last commit on or before the final day of each month from Oct 2023 to Aug 2025
function getMonthlyCommits(): { month: string; commit: string }[] {
  const months = []
  
  // Generate month list from Oct 2023 to Aug 2025
  for (let year = 2023; year <= 2025; year++) {
    const startMonth = year === 2023 ? 10 : 1 // Start from October 2023
    const endMonth = year === 2025 ? 8 : 12   // End at August 2025
    
    for (let month = startMonth; month <= endMonth; month++) {
      const monthStr = `${year}-${month.toString().padStart(2, '0')}`
      
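      // Last commit on or before the end of the month. An explicit end-of-day
      // time is passed because git may fill a bare date with the current time of day.
      try {
        const lastDayOfMonth = new Date(year, month, 0).getDate() // day 0 of the next month = last day of this one
        const untilDate = `${monthStr}-${lastDayOfMonth} 23:59:59`
        
        const commitCmd = `git log --until="${untilDate}" --format="%H" -1`
        const commit = execSync(commitCmd, { encoding: 'utf8' }).trim()
        
        if (commit) {
          months.push({ month: monthStr, commit })
        } else {
          // git log prints nothing (rather than failing) when no commit is that old
          console.warn(`No commits found on or before ${monthStr}`)
        }
      } catch (error) {
        console.warn(`git log failed for ${monthStr}: ${(error as Error).message}`)
      }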
    }
  }
  
  return months
}

// Run cloc on a specific git commit
function runClocForCommit(commit: string): LocData['languages'] {
  try {
    const clocCmd = `cloc --git ${commit} --exclude-dir=node_modules,dist,.direnv,.wrangler,.vercel,.netlify,test-results,playwright-report --exclude-lang=SQL --json --quiet`
    const output = execSync(clocCmd, { encoding: 'utf8', stdio: ['pipe', 'pipe', 'ignore'] })
    
    const data = JSON.parse(output)
    const languages: Record<string, number> = {}
    
    // Parse cloc JSON output
    Object.entries(data).forEach(([key, value]: [string, any]) => {
      if (key !== 'header' && key !== 'SUM' && typeof value === 'object' && value.code) {
        languages[key] = value.code
      }
    })
    
    return languages
  } catch (error) {
    console.warn(`Failed to run cloc for commit ${commit.slice(0, 7)}`)
    return {}
  }
}

// Generate terminal chart
function generateTerminalChart(data: LocData[]): string {
  const maxLines = Math.max(...data.map(d => d.totalLines))
  const maxBarLength = 50
  
  let chart = '\nLines of Code Over Time:\n'
  chart += '=' .repeat(70) + '\n'
  
  data.forEach(({ month, totalLines }) => {
    const barLength = Math.round((totalLines / maxLines) * maxBarLength)
    const bar = '█'.repeat(barLength)
    const spaces = ' '.repeat(Math.max(0, 15 - bar.length))
    chart += `${month}: ${bar}${spaces} ${totalLines.toLocaleString()} lines\n`
  })
  
  chart += '=' .repeat(70) + '\n'
  chart += `Peak: ${maxLines.toLocaleString()} lines\n`
  
  return chart
}

// Generate CSV content
function generateCSV(data: LocData[]): string {
  const allLanguages = new Set<string>()
  data.forEach(d => Object.keys(d.languages).forEach(lang => allLanguages.add(lang)))
  
  const languageColumns = Array.from(allLanguages).sort()
  const headers = ['Month', 'Commit', 'Total_Lines', ...languageColumns.map(lang => `${lang}_Lines`)]
  
  let csv = headers.join(',') + '\n'
  
  data.forEach(({ month, commit, totalLines, languages }) => {
    const row = [
      month,
      commit.slice(0, 7),
      totalLines.toString(),
      ...languageColumns.map(lang => (languages[lang] || 0).toString())
    ]
    csv += row.join(',') + '\n'
  })
  
  return csv
}

// Main execution
function main() {
  console.log('πŸ” Analyzing repository history (excluding SQL files)...')
  
  const monthlyCommits = getMonthlyCommits()
  console.log(`πŸ“… Found ${monthlyCommits.length} monthly snapshots`)
  
  const locData: LocData[] = []
  
  monthlyCommits.forEach(({ month, commit }, index) => {
    process.stdout.write(`\r📊 Processing ${month} (${index + 1}/${monthlyCommits.length})...`)
    
    const languages = runClocForCommit(commit)
    const totalLines = Object.values(languages).reduce((sum, lines) => sum + lines, 0)
    
    locData.push({
      month,
      commit,
      totalLines,
      languages
    })
  })
  
  console.log('\n✅ Analysis complete!')
  
  // Nothing to report if no month resolved to a commit (e.g. not run inside a Git repo)
  if (locData.length === 0) {
    console.error('No snapshots found; run the script inside the repository to analyze.')
    return
  }
  
  // Generate outputs
  const csv = generateCSV(locData)
  const chart = generateTerminalChart(locData)
  
  // Write CSV file
  writeFileSync('loc-history-analysis.csv', csv)
  console.log('💾 CSV saved to loc-history-analysis.csv')
  
  // Print terminal chart
  console.log(chart)
  
  // Print summary (sign-aware, so a shrinking codebase does not print "+-")
  const firstMonth = locData[0]
  const lastMonth = locData[locData.length - 1]
  const growth = lastMonth.totalLines - firstMonth.totalLines
  const sign = growth >= 0 ? '+' : ''
  const growthPercent = firstMonth.totalLines > 0
    ? `${sign}${((growth / firstMonth.totalLines) * 100).toFixed(1)}%`
    : 'n/a'
  
  console.log('\n📈 Summary:')
  console.log(`• Start (${firstMonth.month}): ${firstMonth.totalLines.toLocaleString()} lines`)
  console.log(`• End (${lastMonth.month}): ${lastMonth.totalLines.toLocaleString()} lines`)
  console.log(`• Growth: ${sign}${growth.toLocaleString()} lines (${growthPercent})`)
  
  // Top languages in final snapshot
  const topLanguages = Object.entries(lastMonth.languages)
    .sort(([,a], [,b]) => b - a)
    .slice(0, 5)
  
  console.log('\nπŸ† Top Languages (current):')
  topLanguages.forEach(([lang, lines]) => {
    const percent = ((lines / lastMonth.totalLines) * 100).toFixed(1)
    console.log(`β€’ ${lang}: ${lines.toLocaleString()} lines (${percent}%)`)
  })
}

if (import.meta.main) {
  main()
}

Customization Options

1. Date Range

Modify the getMonthlyCommits() function to change the analysis period. The loop's year bounds and the start/end months must agree:

// Change these values to analyze different periods
for (let year = 2023; year <= 2025; year++) {
  const startMonth = year === 2023 ? 10 : 1 // Start from October 2023
  const endMonth = year === 2025 ? 8 : 12   // End at August 2025
  // ...
}
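Instead of keeping three hardcoded values in sync, the loop could be driven by a start and an end month. The sketch below is a minimal version under that assumption; monthRange is a hypothetical helper name, not part of the script. Each entry carries the numeric year and month that the existing new Date(year, month, 0) last-day calculation expects.

```typescript
// Hypothetical helper: every month from start to end inclusive, both 'YYYY-MM'.
interface MonthEntry {
  year: number
  month: number // 1-12, as used by new Date(year, month, 0)
  monthStr: string
}

function monthRange(start: string, end: string): MonthEntry[] {
  const [startYear, startMonth] = start.split('-').map(Number)
  const [endYear, endMonth] = end.split('-').map(Number)
  const result: MonthEntry[] = []

  // Advance a (year, month) cursor one month at a time until it passes the end
  let year = startYear
  let month = startMonth
  while (year < endYear || (year === endYear && month <= endMonth)) {
    result.push({ year, month, monthStr: `${year}-${month.toString().padStart(2, '0')}` })
    month++
    if (month > 12) {
      month = 1
      year++
    }
  }
  return result
}

// The script's default period: Oct 2023 through Aug 2025
const range = monthRange('2023-10', '2025-08')
console.log(range.length, range[0].monthStr, range[range.length - 1].monthStr)
// prints: 23 2023-10 2025-08
```

Inside getMonthlyCommits(), the nested year/month loops would then become a single for (const { year, month, monthStr } of monthRange('2023-10', '2025-08')) loop with the same body.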

2. Exclusions

Modify the clocCmd in runClocForCommit():

// Add more exclusions as needed
const clocCmd = `cloc --git ${commit} \\
  --exclude-dir=node_modules,dist,.direnv,build,coverage \\
  --exclude-lang=SQL,JSON \\
  --json --quiet`
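Editing a long shell string gets error-prone as the exclusion lists grow. One alternative, sketched below, keeps the lists as arrays and passes an argument vector to Node's execFileSync, so no shell quoting or line continuations are involved. buildClocArgs and runCloc are hypothetical helpers; the cloc flags are the same ones the script already uses.

```typescript
import { execFileSync } from 'node:child_process'

// Hypothetical helper: assemble cloc's argv from plain arrays. The flags are the
// ones the script already uses; the commit goes last as cloc's input.
function buildClocArgs(commit: string, excludeDirs: string[], excludeLangs: string[]): string[] {
  const args = ['--git', '--json', '--quiet']
  if (excludeDirs.length > 0) args.push(`--exclude-dir=${excludeDirs.join(',')}`)
  if (excludeLangs.length > 0) args.push(`--exclude-lang=${excludeLangs.join(',')}`)
  args.push(commit)
  return args
}

// Hypothetical runner: execFileSync passes argv directly, with no shell in between,
// and stderr is discarded just like in runClocForCommit()
function runCloc(args: string[]): Record<string, unknown> {
  const output = execFileSync('cloc', args, { encoding: 'utf8', stdio: ['pipe', 'pipe', 'ignore'] })
  return JSON.parse(output)
}

const clocArgs = buildClocArgs('HEAD', ['node_modules', 'dist', '.direnv', 'build', 'coverage'], ['SQL', 'JSON'])
console.log(['cloc', ...clocArgs].join(' '))
// prints: cloc --git --json --quiet --exclude-dir=node_modules,dist,.direnv,build,coverage --exclude-lang=SQL,JSON HEAD
```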

3. Output File

Change the CSV filename:

writeFileSync('my-custom-analysis.csv', csv)

Common Exclusions

Directories to Exclude

  • node_modules - Dependencies
  • dist, build - Build outputs
  • .direnv, .git - Tool directories
  • test-results, playwright-report - Test artifacts
  • coverage - Coverage reports

Languages to Exclude

  • SQL - Database schema files (often auto-generated)
  • JSON - Configuration files (if too verbose)
  • YAML - CI/CD configs (if not relevant to code analysis)

Example Output

Terminal Chart (abridged: the script prints one row per month)

Lines of Code Over Time:
======================================================================
2023-10: ███████         12,384 lines
2023-11: ██████████      18,759 lines
2024-01: ███████████     20,157 lines
2024-06: █████████████████ 31,112 lines
2025-08: ██████████████████████████████████████████████████ 91,630 lines
======================================================================
Peak: 91,630 lines
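Each bar is the month's total divided by the peak total, multiplied by the 50-character maxBarLength from generateTerminalChart(), and rounded. The standalone function below repeats that computation (barLength is a name introduced here) and checks it against rows of the example chart.

```typescript
// Same scaling as generateTerminalChart(): share of the peak, times the bar
// width (maxBarLength = 50 in the script), rounded to whole blocks.
function barLength(totalLines: number, peakLines: number, maxBarLength = 50): number {
  return Math.round((totalLines / peakLines) * maxBarLength)
}

console.log(barLength(12_384, 91_630)) // 7  (12,384 / 91,630 × 50 ≈ 6.76)
console.log(barLength(31_112, 91_630)) // 17 (31,112 / 91,630 × 50 ≈ 16.98)
console.log(barLength(91_630, 91_630)) // 50 (the peak month fills the whole width)
```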

Summary

📈 Summary:
• Start (2023-10): 12,384 lines
• End (2025-08): 91,630 lines
• Growth: +79,246 lines (+639.9%)

🏆 Top Languages (current):
• TypeScript: 45,372 lines (49.5%)
• YAML: 23,399 lines (25.5%)
• JavaScript: 11,143 lines (12.2%)
• Markdown: 4,949 lines (5.4%)
• JSON: 3,028 lines (3.3%)

Tips & Tricks

1. Handle Large Repositories

For very large repos, you might want to:

  • Reduce the frequency (quarterly instead of monthly)
  • Focus on specific directories: --include-dir=src,lib (if your cloc version does not recognize this option, --match-d takes a directory regex instead)
  • Run during off-peak hours
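For the first point, one option is to thin out the snapshot list before running cloc. The sketch below assumes the { month, commit } shape returned by getMonthlyCommits(); toQuarterly is a hypothetical helper that keeps quarter-end months plus the final month, so the latest state is always included.

```typescript
// Same shape as the entries returned by getMonthlyCommits()
interface Snapshot {
  month: string // 'YYYY-MM'
  commit: string
}

// Hypothetical helper: keep March, June, September and December, plus the last
// snapshot so the most recent state is never dropped.
function toQuarterly(snapshots: Snapshot[]): Snapshot[] {
  return snapshots.filter((s, i) => {
    const monthNumber = Number(s.month.slice(5, 7))
    return monthNumber % 3 === 0 || i === snapshots.length - 1
  })
}

// 'abc1234' is a placeholder hash
const sample: Snapshot[] = ['2024-01', '2024-02', '2024-03', '2024-04', '2024-05', '2024-06', '2024-07']
  .map(month => ({ month, commit: 'abc1234' }))
console.log(toQuarterly(sample).map(s => s.month).join(', '))
// prints: 2024-03, 2024-06, 2024-07
```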

2. Analyze Specific Components

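# Only analyze source code directories (HEAD stands in for any commit hash)
cloc --git HEAD --include-dir=src,lib,packages
# If --include-dir is not recognized, --match-d restricts counting by directory regex:
cloc --git HEAD --match-d='/(src|lib|packages)/'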

3. Compare Branches

Modify the script to analyze different branches:

git log branch-name --until="${untilDate}" --format="%H" -1
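To compare branches in one run, the ref can become a parameter. This is a sketch, not part of the script: lastCommitArgs and lastCommitOn are hypothetical names, the trailing -- tells git that the ref is a revision rather than a file path, and the explicit 23:59:59 keeps the whole last day in range. runClocForCommit needs no change, since it already takes a commit hash.

```typescript
import { execFileSync } from 'node:child_process'

// Hypothetical helper: git log arguments for "last commit on ref up to the end of untilDate".
// untilDate is 'YYYY-MM-DD'; the trailing '--' marks the ref as a revision, not a path.
function lastCommitArgs(ref: string, untilDate: string): string[] {
  return ['log', ref, `--until=${untilDate} 23:59:59`, '--format=%H', '-1', '--']
}

// Hypothetical helper: resolve that commit, or undefined if ref has no commit that early
function lastCommitOn(ref: string, untilDate: string): string | undefined {
  const out = execFileSync('git', lastCommitArgs(ref, untilDate), { encoding: 'utf8' }).trim()
  return out === '' ? undefined : out
}

console.log(['git', ...lastCommitArgs('main', '2024-06-30')].join(' '))
// prints: git log main --until=2024-06-30 23:59:59 --format=%H -1 --
```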

4. Export to Spreadsheet Tools

The CSV can be imported into:

  • Excel/Google Sheets for advanced charting
  • Grafana for time-series dashboards
  • Python/pandas for statistical analysis
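Simple derived series can also be computed without leaving TypeScript. A minimal sketch, assuming the column layout generateCSV() writes (Month, Commit, Total_Lines, then one column per language): monthlyDeltas is a hypothetical helper that returns the month-over-month change in Total_Lines. The sample rows reuse the first two totals from the example output above; the commit hashes and TypeScript_Lines values are placeholders.

```typescript
// Hypothetical helper: month-over-month change in Total_Lines from the CSV text
// written by generateCSV(). A plain split on commas is enough because that
// function writes no quoted fields.
function monthlyDeltas(csv: string): { month: string; delta: number }[] {
  const [headerLine, ...rows] = csv.trim().split('\n')
  const headers = headerLine.split(',')
  const monthCol = headers.indexOf('Month')
  const totalCol = headers.indexOf('Total_Lines')

  const points = rows.map(row => {
    const cells = row.split(',')
    return { month: cells[monthCol], total: Number(cells[totalCol]) }
  })

  // The first month has no predecessor, so deltas start from the second row
  return points.slice(1).map((p, i) => ({ month: p.month, delta: p.total - points[i].total }))
}

// Placeholder hashes and TypeScript_Lines values; totals taken from the example output
const sampleCsv = [
  'Month,Commit,Total_Lines,TypeScript_Lines',
  '2023-10,aaaaaaa,12384,6000',
  '2023-11,bbbbbbb,18759,9000',
].join('\n')
console.log(JSON.stringify(monthlyDeltas(sampleCsv)))
// prints: [{"month":"2023-11","delta":6375}]
```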

Why This Approach Works

  1. Git-aware: Uses cloc --git to analyze historical commits without checking them out
  2. Efficient: Your working tree is never touched; cloc reads each commit's files from the Git object store (extracting a temporary copy) instead of requiring a checkout
  3. Accurate: Only files tracked in each commit are counted, so untracked and .gitignore'd files never inflate the numbers
  4. Flexible: Easy to customize exclusions and date ranges
  5. Reproducible: The same commits give the same counts, independent of working directory state

Troubleshooting

"Command not found: cloc"

Make sure you're running with nix shell:

nix shell nixpkgs#cloc -c bun run analyze-loc-history.ts

Empty results

Check if commits exist in your date range:

git log --oneline --since="2023-01-01" --until="2023-12-31"

Performance issues

For large repos, consider:

  • Reducing date range
  • Adding more directory exclusions
  • Running on a machine with more RAM/CPU
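Another cheap saving: a month without new commits resolves to the same hash as the month before it, because git log --until returns the latest earlier commit, so running cloc again would repeat identical work. The sketch below memoizes results by commit hash; memoizeByCommit is a hypothetical wrapper, and the stand-in counting function only exists to show that each distinct commit is processed once.

```typescript
// Hypothetical wrapper: cache results per commit hash so a commit that serves
// several months is analyzed only once.
function memoizeByCommit<T>(fn: (commit: string) => T): (commit: string) => T {
  const cache = new Map<string, T>()
  return (commit: string): T => {
    if (!cache.has(commit)) cache.set(commit, fn(commit))
    return cache.get(commit) as T
  }
}

// Stand-in for runClocForCommit that counts how often it really runs
let clocRuns = 0
const countLines = memoizeByCommit((commit: string) => {
  clocRuns++
  return { TypeScript: commit.length }
})

// Five months, but only two distinct commits (placeholder hashes)
for (const commit of ['aaaaaaa', 'aaaaaaa', 'bbbbbbb', 'bbbbbbb', 'bbbbbbb']) countLines(commit)
console.log(clocRuns) // 2
```

In main(), const clocForCommit = memoizeByCommit(runClocForCommit) could then replace the direct runClocForCommit(commit) call.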

Created by: Lines of Code Analysis Script
License: MIT
Dependencies: Nix (cloc), Bun/Node.js
