@heathdutton
Last active February 8, 2026 16:52

Fake GitHub Stars Analysis: January 2025 - Projects with suspected artificial star inflation

Fake GitHub Stars: January 2025 Analysis

This analysis uses StarScout, an open-source tool from the ICSE '26 research paper "Six Million (Suspected) Fake Stars on GitHub."

Detection Methodology

Two complementary heuristics identify suspected fake stars:

| Heuristic | What It Catches | Signal Strength |
| --- | --- | --- |
| Low-Activity | Throwaway accounts (single-day activity, ≤2 actions) | High confidence for intentional fraud |
| Clustered/Lockstep | Coordinated starring campaigns (many users starring same repos in tight timeframes) | Catches artificial amplification; some false positives from viral organic growth |
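
The two heuristics can be illustrated with a minimal sketch. This is not StarScout's implementation: the `(user, repo, day)` event layout, the function names `low_activity_users` and `lockstep_pairs`, and the `min_shared_repos` threshold are assumptions made for illustration. Only the "single-day activity, ≤2 actions" rule is taken from the table above.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical event record: (user, repo, day). Real GitHub event data is richer.
Event = tuple[str, str, date]


def low_activity_users(events: list[Event], max_actions: int = 2) -> set[str]:
    """Users whose whole history is at most `max_actions` events on a single day."""
    days_by_user: dict[str, list[date]] = defaultdict(list)
    for user, _repo, day in events:
        days_by_user[user].append(day)
    return {
        user
        for user, days in days_by_user.items()
        if len(days) <= max_actions and len(set(days)) == 1
    }


def lockstep_pairs(events: list[Event], min_shared_repos: int = 2) -> set[tuple[str, str]]:
    """Pairs of users who starred at least `min_shared_repos` of the same repos on the same day.

    A simplified stand-in for clustered/lockstep detection (assumed rule, not the paper's).
    """
    users_by_bucket: dict[tuple[str, date], set[str]] = defaultdict(set)
    for user, repo, day in events:
        users_by_bucket[(repo, day)].add(user)

    shared_repos: dict[tuple[str, str], set[str]] = defaultdict(set)
    for (repo, _day), users in users_by_bucket.items():
        for pair in combinations(sorted(users), 2):
            shared_repos[pair].add(repo)
    return {pair for pair, repos in shared_repos.items() if len(repos) >= min_shared_repos}


# Made-up sample data: two throwaway accounts and one ordinary user.
SAMPLE_EVENTS: list[Event] = [
    ("bot1", "a/x", date(2025, 1, 5)),
    ("bot1", "b/y", date(2025, 1, 5)),
    ("bot2", "a/x", date(2025, 1, 5)),
    ("bot2", "b/y", date(2025, 1, 5)),
    ("dev", "a/x", date(2025, 1, 5)),
    ("dev", "c/z", date(2025, 1, 9)),
    ("dev", "d/w", date(2025, 2, 1)),
]
```

In this toy data both heuristics single out `bot1` and `bot2`, while `dev` is not flagged even though it starred `a/x` on the same day. A single shared star is not enough to count as lockstep under the assumed threshold.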

Key Patterns Observed

1. Two Distinct Fraud Profiles

Intentional Fraud (Low-Activity Heuristic)

  • Crypto/trading bots, MEV extractors
  • Game hacks, cheats, exploits
  • Darkweb/tor directories
  • "Predictor" and gambling signal tools
  • Pirated software, cracks

Artificial Amplification (Clustered Heuristic)

  • AI/LLM projects riding hype cycles
  • Frontend component libraries seeking visibility
  • "Awesome" curated lists gaming discoverability
  • Blockchain/Web3 projects

2. Fraud Percentage Distribution

| Fake Star % | Repos | Interpretation |
| --- | --- | --- |
| 90-100% | 63 | Almost certainly fraudulent |
| 70-89% | 77 | Highly suspicious |
| 50-69% | 134 | Likely boosted |
| <50% | 3,068 | Mixed signals; may include organic growth |
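
The banding above amounts to a threshold lookup. The sketch below is illustrative only: the function name `interpret` is hypothetical, and because the table lists integer ranges, treating each band as closed on its lower bound (so 89.5% falls in the 70-89% band) is an assumption. The repo counts are copied from the table.

```python
def interpret(fake_pct: float) -> str:
    """Map a fake-star percentage to the interpretation used in the table."""
    if not 0 <= fake_pct <= 100:
        raise ValueError(f"percentage out of range: {fake_pct}")
    if fake_pct >= 90:
        return "Almost certainly fraudulent"
    if fake_pct >= 70:
        return "Highly suspicious"
    if fake_pct >= 50:
        return "Likely boosted"
    return "Mixed signals; may include organic growth"


# Repo counts per band, copied from the table.
REPO_COUNTS = {"90-100%": 63, "70-89%": 77, "50-69%": 134, "<50%": 3068}

total_repos = sum(REPO_COUNTS.values())
majority_fake = total_repos - REPO_COUNTS["<50%"]
majority_share = majority_fake / total_repos
```

By these counts, 274 of the 3,342 flagged repos (about 8.2%) have at least half of their stars flagged as fake.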

3. Category Breakdown (High-Confidence Fraud Only)

From repos with >50% fake stars:

| Category | Fake Stars | Description |
| --- | --- | --- |
| Bots/Automation | 6,741 | Trading bots, scrapers, automation tools |
| AI/LLM | 5,041 | AI wrappers, prompt tools |
| Hacks/Cheats | 3,169 | Game exploits, cracks |
| Predictors | 2,092 | Gambling/trading "signal" tools |
| Darkweb | 1,986 | Tor/onion link directories |

Top January 2025 Projects Flagged for Fake Stars

Cross-referencing the most-starred repos in January 2025 with our detection data:

High Suspicion (>20% fake OR >50k flagged stars)

| Project | Jan Stars | Flagged Stars | Fake % | Concern Level |
| --- | --- | --- | --- | --- |
| unionlabs/union | 19,942 | 65,309 | 47.4% | Very High |
| langflow-ai/langflow | 4,635 | 31,515 | 47.9% | Very High |
| raga-ai-hub/RagaAI-Catalyst | 2,892 | 5,522 | 49.5% | Very High |
| raga-ai-hub/AgentNeo | 3,567 | 1,398 | 42.3% | Very High |
| shardeum/shardeum | 4,137 | 6,077 | 42.2% | Very High |
| DigitalPlatDev/FreeDomain | 2,554 | 90,091 | 32.1% | High |
| anoma/anoma | 4,569 | 31,553 | 23.0% | High |
| linera-io/linera-protocol | 11,389 | 35,584 | 21.4% | High |
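
The inclusion rule in the heading (>20% fake OR >50k flagged stars) can be written as a simple predicate. This is a sketch under assumptions: the `Row` type and `is_high_suspicion` are hypothetical names, and the rows are copied from the table rather than pulled from the detection data.

```python
from typing import NamedTuple


class Row(NamedTuple):
    project: str
    jan_stars: int
    flagged_stars: int
    fake_pct: float


# Rows copied from the table above.
ROWS = [
    Row("unionlabs/union", 19_942, 65_309, 47.4),
    Row("langflow-ai/langflow", 4_635, 31_515, 47.9),
    Row("raga-ai-hub/RagaAI-Catalyst", 2_892, 5_522, 49.5),
    Row("raga-ai-hub/AgentNeo", 3_567, 1_398, 42.3),
    Row("shardeum/shardeum", 4_137, 6_077, 42.2),
    Row("DigitalPlatDev/FreeDomain", 2_554, 90_091, 32.1),
    Row("anoma/anoma", 4_569, 31_553, 23.0),
    Row("linera-io/linera-protocol", 11_389, 35_584, 21.4),
]


def is_high_suspicion(row: Row, pct_threshold: float = 20.0,
                      flagged_threshold: int = 50_000) -> bool:
    """True when either criterion from the section heading is met."""
    return row.fake_pct > pct_threshold or row.flagged_stars > flagged_threshold


# Projects that satisfy both criteria at once.
meets_both = [
    r.project for r in ROWS
    if r.fake_pct > 20.0 and r.flagged_stars > 50_000
]
```

Every listed project clears the 20% threshold on its own. Only unionlabs/union and DigitalPlatDev/FreeDomain also exceed 50k flagged stars.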

Massive Clustered Activity (Likely Amplification Campaigns)

These projects show enormous clustered starring activity. While some may be organic viral growth (especially established AI tools), the scale is notable:

| Project | Jan Stars | Clustered Stars | Notes |
| --- | --- | --- | --- |
| deepseek-ai/* | 120k+ | 600k+ total | AI hype cycle; likely mix of organic + amplified |
| open-webui/open-webui | 9,719 | 124,526 | Popular LLM UI |
| huggingface/open-r1 | 12,739 | 121,231 | DeepSeek-R1 replication |
| browser-use/browser-use | 13,369 | 119,278 | Browser automation AI |
| ollama/ollama | 10,704 | 117,344 | Local LLM runner |
| inkonchain/* | 98k+ | 89k+ total | Blockchain project; coordinated campaign |

Fraud by Sector

| Sector | Avg Fake % | Total Fake Stars | Profile |
| --- | --- | --- | --- |
| Bots/Automation | 50.2% | 9,873 | Highest fraud rate; trading bots, scrapers |
| Blockchain/Crypto | 35.8% | 8,727 | Second highest; investor-driven visibility |
| Hacks/Exploits | 26.2% | 6,663 | Game cheats, cracks |
| AI/ML | 11.8% | 177,153 | Lower rate but massive volume due to hype |
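
Ranking the sectors by rate and by volume gives different orderings, which is worth making explicit. A minimal sketch using only the figures from the table (the variable names are illustrative):

```python
# (average fake %, total fake stars) per sector, copied from the table.
SECTORS = {
    "Bots/Automation": (50.2, 9_873),
    "Blockchain/Crypto": (35.8, 8_727),
    "Hacks/Exploits": (26.2, 6_663),
    "AI/ML": (11.8, 177_153),
}

# Highest average fake-star rate first.
by_rate = sorted(SECTORS, key=lambda s: SECTORS[s][0], reverse=True)

# Highest total fake-star volume first.
by_volume = sorted(SECTORS, key=lambda s: SECTORS[s][1], reverse=True)

# AI/ML's share of all fake stars across these four sectors.
total_fake = sum(stars for _pct, stars in SECTORS.values())
ai_share = SECTORS["AI/ML"][1] / total_fake
```

AI/ML ranks last by rate but first by volume, accounting for roughly 87.5% of the 202,416 fake stars across these four sectors.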

Pattern: Blockchain/Crypto Projects

The second-highest fraud rate by sector (35.8% average, behind Bots/Automation at 50.2%). These projects are disproportionately represented among the flagged repos:

  • unionlabs/union (47% fake)
  • shardeum/shardeum (42% fake)
  • anoma/anoma (23% fake)
  • linera-io/linera-protocol (21% fake)
  • inkonchain/* (massive clustered activity)

Star-buying appears endemic in crypto/Web3, likely driven by investor relations and token launch visibility.

Pattern: AI/ML Projects

AI projects have a lower fraud rate (11.8%) but the highest absolute volume (177k fake stars) due to the sector's explosive growth. The AI hype cycle creates pressure for visibility.

Worst AI offenders:

  • langflow-ai/langflow (48% fake) - AI workflow builder
  • raga-ai-hub/RagaAI-Catalyst (50% fake) - AI testing platform
  • raga-ai-hub/AgentNeo (42% fake) - AI agent framework
  • openai/openai-fm (57% fake) - Voice model demo
  • sidetrip-ai/ici-core (82% fake) - AI assistant

Many smaller AI projects show 70-99% fake stars, suggesting a cottage industry of AI wrappers and "agents" using fake stars to appear legitimate.


Conclusion

Projects Whose Current Popularity Appears Substantially Inflated by Suspected Fake Stars:

| Project | Sector | Fake % | Verdict |
| --- | --- | --- | --- |
| unionlabs/union | Blockchain | 47% | Nearly half fake |
| langflow-ai/langflow | AI | 48% | Half fake; major AI tool |
| raga-ai-hub/* | AI | 42-50% | Both repos heavily inflated |
| shardeum/shardeum | Blockchain | 42% | Investor-driven fraud |
| DigitalPlatDev/FreeDomain | Utility | 32% | 90k flagged stars |
| openai/openai-fm | AI | 57% | Even OpenAI projects targeted |

The pattern is clear:

  1. Blockchain/crypto has the second-highest average fraud rate (~36%), behind only bots/automation (~50%)
  2. AI/ML has a lower rate (~12%) but by far the largest volume (177k fake stars); the hype cycle creates pressure to game visibility
  3. The two hottest sectors in tech are also among the most heavily inflated on GitHub

These projects' current visibility is substantially inflated by purchased or manufactured engagement.


Data Sources
