@danmoseley
Last active March 7, 2026 06:37
AI-Driven Issue Resolution for System.Text.Json: Strategic Analysis & Plan

The Landscape (262 open issues)

| Category | Count | Notes |
| --- | --- | --- |
| Bugs | 30 | Core correctness issues |
| Enhancements | 55 | Behavior improvements, not new API |
| API Suggestions | 76 | Need API review process |
| API Approved | 6 | Ready to implement |
| Remaining | ~95 | Unlabeled beyond the area tag, or other label combinations |

Age: ~70% of bugs and enhancements are 2+ years old. These are the "long tail" -- not urgent enough to prioritize, but real enough not to close.

Special labels: 8 "help wanted", 5 "in-pr" (already have a PR), 2 regressions, 5 partner-impact, 18 wishlist.

The Core Tension

You've identified the key insight: there are three bottleneck roles that are currently human-bound:

  1. Triage -- "Is this worth fixing?" / "Should we close this as won't-fix?"
  2. Design -- "We want behavior A not B" / "What's the right API shape?"
  3. Implementation + Review -- Write code, write tests, review the PR

AI can already do (3) for many issues. The bottleneck is (1) and (2), which are policy/judgment calls. But here's the thing: not all issues need all three steps. Some issues have already been triaged and the desired behavior is unambiguous.

Proposed Categorization: AI Tractability Tiers

Tier 1: "AI can fix this today, PR is near-trivial to review" (~15-25 issues)

These share characteristics:

  • The desired behavior is unambiguous (bug has clear repro, expected vs actual is obvious)
  • The fix is localized (one file, one code path)
  • Tests can be derived from the issue (repro steps = test case)
  • No design decisions needed

Examples from the current bugs:

  • #125237 Struct properties deserialized as null from stream (regression, has repro)
  • #123372 Breaking change in JsonNodeConverter (regression)
  • #113268 Deserialization failing for types that worked in net8 (regression)
  • #110450 JsonException deserializing async to nullable types
  • #75269 Polymorphic deserialization ignores PropertyNameCaseInsensitive
  • #92780 Virtual property with JsonPropertyName serialized twice (already in-pr)
  • #104700 Collection property without getter gives misleading error (help wanted)
  • #71784 Duplicate object keys causing ArgumentException (help wanted)

Also: the 6 API-approved issues are fully designed and just need implementation.

Why near-trivial to review: The reviewer's job is essentially "does the test match the issue, and does the fix make the test pass without breaking others." CI does most of that.

Tier 2: "AI can fix, but needs a human to make one decision first" (~30-40 issues)

These need a single "which way do we go" judgment call, after which the implementation is mechanical.

Examples:

  • #60560 Custom converter for Dict<string,object> collides with JsonExtensionData -- what should win?
  • #50078 JsonIgnore not inherited on override -- should it be? (13 reactions, clearly wanted)
  • #51165 JsonPropertyName inheritance preserves parent member -- intended?
  • #96996 Properties overriding setter only are ignored -- should they be supported?
  • #56212 JsonConverter inconsistency without public getter

Workflow: AI reads the issue + comments, proposes behavior A vs B to a human, human picks one in a comment, AI implements.

Tier 3: "Needs human design/API review" (~80-100 issues)

The 76 API suggestions + some enhancements. These need the API review process. AI cannot shortcut this -- it's a governance question, not a technical one.

However, AI can help by:

  • Drafting API proposals in the correct format
  • Doing feasibility analysis ("here's what the implementation would look like")
  • Identifying which suggestions are duplicates or already addressed

Tier 4: "Recommend closing" (~20-40 issues)

AI could scan for:

  • Issues from .NET 5/6 era that describe behaviors now changed
  • Issues with no activity for 3+ years and no reactions
  • Issues marked "wishlist" with very niche use cases
  • Issues that are duplicates of other issues
  • Issues where the workaround is trivial and the fix would be disproportionately complex

The 18 "wishlist" items are prime candidates for review.
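The scan heuristics above can be expressed as a simple filter. This is a minimal sketch: the issue record fields (`last_activity`, `reactions`, `labels`, `duplicate_of`) are illustrative names, not the GitHub API schema, and the thresholds mirror the bullets above.

```python
from datetime import date

def is_close_candidate(issue: dict, today: date = date(2026, 3, 1)) -> bool:
    """Flag an issue for human review as a potential close, per the heuristics above."""
    age_years = (today - issue["last_activity"]).days / 365
    stale = age_years >= 3 and issue["reactions"] == 0      # 3+ years idle, no demand
    niche_wishlist = "wishlist" in issue["labels"] and issue["reactions"] <= 1
    duplicate = issue.get("duplicate_of") is not None
    return stale or niche_wishlist or duplicate

# Hypothetical sample records to show the filter in action.
issues = [
    {"last_activity": date(2022, 1, 10), "reactions": 0, "labels": ["enhancement"]},
    {"last_activity": date(2025, 6, 1), "reactions": 13, "labels": ["bug"]},
    {"last_activity": date(2025, 9, 1), "reactions": 0, "labels": ["wishlist"]},
]
candidates = [i for i in issues if is_close_candidate(i)]
```

The point of keeping this as a pure function is that the AI's judgment layers on top: the filter produces *candidates*, and the AI (then the human) supplies the per-issue rationale.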

Practical Workflow: Avoiding the "200 PRs" Problem

Opening 200 PRs at once is a non-starter. But this can be solved:

Phase 1: Triage Sweep (AI-assisted, human-approved)

  • AI analyzes all 262 issues and produces a spreadsheet/report categorizing each into the tiers above
  • AI recommends ~20-40 for closure with justification
  • Human reviews the triage recommendations in bulk (much faster than individual issue analysis)
  • Result: Issue count drops to ~180-220; remaining issues are categorized

Phase 2: Low-Hanging Fruit (Tier 1 fixes, batched)

  • AI fixes Tier 1 issues in batches of 5-8 PRs at a time
  • Each PR: fix + tests + links to issue
  • Reviewer can review these quickly because the scope is small and behavior is unambiguous
  • Pace: ~5 PRs/week, clears Tier 1 in 3-5 weeks
  • API-approved items (6) get done first since they're fully spec'd

Phase 3: Decision-Gated Fixes (Tier 2)

  • AI posts a structured comment on each Tier 2 issue: "I can fix this. The design question is: A or B. Here's what each looks like."
  • Human responds with a one-line decision
  • AI implements based on the decision
  • Pace: depends on human response time for decisions

Phase 4: API Pipeline Acceleration (Tier 3)

  • AI drafts formal API proposals for promising suggestions
  • AI identifies duplicates and recommends consolidation
  • This feeds the existing API review process, just faster

Key Insights for the Future

What works today

  • Regression bugs are the sweet spot. Clear before/after, test = repro, fix = restore old behavior.
  • "Help wanted" bugs are pre-triaged as "yes, fix this." They're literally asking for someone to do the work.
  • API-approved issues are fully designed. Pure implementation.

What's still hard

  • Behavioral ambiguity. Many STJ bugs are really "is this the intended design?" questions. Edge cases in serialization where reasonable people disagree. AI can propose, but someone with context on STJ's design philosophy needs to decide.
  • Cross-cutting concerns. Some fixes touch the source generator, reflection path, AND the core reader/writer. These need architectural awareness.
  • Performance implications. STJ is perf-critical. Some "obvious" fixes (e.g., adding a null check) can have measurable perf impact on hot paths. Human judgment needed.

The review bottleneck is real but solvable

The "200 PRs" problem is a throughput issue, not a fundamental barrier. Solutions:

  • Batch by subsystem -- group related fixes so one reviewer context-loads once
  • AI-generated review summaries -- "This PR fixes #X. The change is: [one-sentence diff summary]. Test coverage: [list]. Perf impact: none (cold path)."
  • Graduated trust -- as AI PRs prove reliable, review can become lighter

The bigger picture

The real unlock isn't "AI fixes all bugs." It's "AI collapses the triage-to-fix pipeline." Today:

  1. User files issue (minutes)
  2. Human triages (days to weeks)
  3. Human prioritizes (weeks to never)
  4. Human implements (hours to days)
  5. Human reviews (hours to days)

With AI: steps 2-5 can happen in hours, with humans only needed for judgment calls in steps 2-3. The backlog doesn't accumulate because the fix comes almost as fast as the report.

Concrete Next Steps (Todos)

  1. triage-sweep: Build an AI triage report for all 262 issues -- categorize each into Tier 1/2/3/4 with justification
  2. close-candidates: From the triage sweep, extract the "recommend closing" list with per-issue rationale
  3. tier1-pilot: Pick 3-5 Tier 1 bugs and have AI create actual fix PRs as a proof of concept
  4. api-approved-impl: Implement the 6 API-approved issues (these are unambiguous and ready)
  5. tier2-decisions: For Tier 2 issues, draft the "A vs B" decision comments for human review
  6. measure-review-cost: After the pilot PRs, measure actual review time to validate the "near-trivial" claim

Issue Resolution Dashboard: Tooling Design

Design Principle: "Seconds Per Decision"

The human's job is to make judgment calls, not to gather context, navigate GitHub, write comments, or manage state. Every second spent on logistics is wasted. The tool should present pre-digested information and capture decisions with a single click.

Target metrics:

  • Close/keep decision: 5-10 seconds per issue (vs. minutes today)
  • "Go fix it" decision: 10-20 seconds per issue
  • "A vs B" design call: 30-60 seconds per issue
  • PR review (Tier 1): 2-5 minutes per PR (vs. 15-30 min unassisted)

The Dashboard: A Single-Page Web App

One URL. Three tabs/queues. Each is a stack of cards the human burns through.

Queue 1: "Recommend Closing" (Triage Sweep)

What the human sees: A card per issue, sorted by AI confidence (highest first).

┌─────────────────────────────────────────────────────────┐
│ #90140  Uri serialization is missing Scheme              │
│ ─────────────────────────────────────────────────────── │
│ Filed: 2024-01-15 | Reactions: 0 | Comments: 2          │
│ Labels: enhancement, wishlist                            │
│                                                          │
│ AI Summary: Requests that Uri.ToString() include the     │
│ scheme in JSON output. Workaround: custom converter      │
│ (3 lines). No activity in 2 years.                       │
│                                                          │
│ AI Recommendation: CLOSE - Niche request, trivial        │
│ workaround, no community demand.                         │
│ Confidence: 92%                                          │
│                                                          │
│ [Close with AI message] [Keep Open] [Open in GitHub ↗]  │
└─────────────────────────────────────────────────────────┘

Key UX decisions:

  • "Close with AI message" is the primary action (green, large) -- the AI has pre-drafted a polite close message explaining the rationale. Human doesn't write anything.
  • "Keep Open" is secondary -- optionally tag with a reason (dropdown: "valid but low priority", "needs discussion", "duplicate of #___")
  • "Open in GitHub" is the escape hatch for when the summary isn't enough
  • Keyboard shortcuts: C = close, K = keep, O = open in GitHub, plus a key to advance to the next card. The human can fly through these.
  • Cards auto-advance after action -- no "submit" step
  • Running counter: "14 of 37 reviewed | 9 closed, 5 kept"

Pre-drafted close message example:

Thank you for filing this issue. After reviewing the current backlog, we're closing this as the use case can be addressed with a custom JsonConverter (see [link to docs]). If this becomes a more broadly requested feature, please feel free to reopen or file a new issue with additional context.

The human can edit before sending, but the default should be good enough 95% of the time.

Queue 2: "Ready to Fix" (Tier 1 + Tier 2)

What the human sees: Issues grouped by sub-category, each with an AI analysis.

Tier 1 cards (no decision needed, just approval):

┌─────────────────────────────────────────────────────────┐
│ #125237  Struct properties deserialized as null (stream) │
│ ─────────────────────────────────────────────────────── │
│ Type: Regression (net9+) | Reactions: 0 | Comments: 2   │
│                                                          │
│ AI Analysis:                                             │
│ • Root cause: Buffer boundary handling in streaming      │
│   deserialization path loses nullable struct state        │
│ • Fix location: Utf8JsonReader or JsonSerializer stream  │
│   deserialization logic                                  │
│ • Test: Direct from repro (struct with nullable prop,    │
│   deserialize from stream vs bytes)                      │
│ • Risk: Low (regression fix restores prior behavior)     │
│ • Confidence: 88%                                        │
│                                                          │
│ [Create PR] [Skip] [Needs Discussion] [Open in GitHub ↗]│
└─────────────────────────────────────────────────────────┘

"Create PR" queues the issue for AI to fix. It doesn't block the human -- they keep reviewing cards while AI works in the background.

Tier 2 cards (one decision needed):

┌─────────────────────────────────────────────────────────┐
│ #50078  JsonIgnore not inherited on override (13 👍)     │
│ ─────────────────────────────────────────────────────── │
│ Type: Behavior question | Age: 4 years                   │
│                                                          │
│ AI Analysis:                                             │
│ The question: When a derived class overrides a property  │
│ marked [JsonIgnore] on the base, should the attribute    │
│ be inherited?                                            │
│                                                          │
│ Option A: Yes, inherit [JsonIgnore] (matches user        │
│   expectations, consistent with other frameworks)        │
│ Option B: No, override clears it (current behavior,      │
│   gives derived class full control)                      │
│                                                          │
│ AI recommends: A (13 reactions all want this)            │
│                                                          │
│ [Go with A] [Go with B] [Needs Discussion] [GitHub ↗]  │
└─────────────────────────────────────────────────────────┘

One click picks the behavior, AI implements it. No comment-writing, no context-gathering.

Sorting/filtering:

  • Sort by: confidence, age, reactions, risk level
  • Filter by: regressions only, help-wanted only, source-gen issues, etc.
  • "Quick mode": show only issues where AI confidence > 85%

Queue 3: "Review PRs" (After AI Creates PRs)

This is where AI-created PRs queue up for review. Not a replacement for GitHub's review UI, but a triage layer on top of it.

┌─────────────────────────────────────────────────────────┐
│ PR #9999  Fix: JsonIgnore inheritance on override        │
│ Fixes #50078 | +47 -3 | 2 files changed                 │
│ ─────────────────────────────────────────────────────── │
│ AI Review Summary:                                       │
│ • Changed: DefaultJsonTypeInfoResolver.cs (property      │
│   resolution now walks base class attributes)            │
│ • Tests: 3 new tests covering base/derived/multi-level  │
│ • Perf: No hot-path changes. Adds one attribute check   │
│   during type resolution (cold path, once per type).     │
│ • Breaking: Technically yes -- code relying on ignore    │
│   being cleared on override will change. But 13 users    │
│   explicitly asked for this behavior.                    │
│ • CI: ✅ All passing                                     │
│                                                          │
│ [Approve & Merge] [Review in GitHub ↗] [Request Changes] │
└─────────────────────────────────────────────────────────┘

The key insight: For Tier 1 fixes, the AI summary + passing CI should be enough for the reviewer to approve without reading every line. The "Review in GitHub" button is for when they want to dig deeper.


Batch Operations

For power users who trust the AI's judgment on high-confidence items:

┌─────────────────────────────────────────────────────────┐
│ ⚡ Batch Mode                                            │
│                                                          │
│ ☑ Close 12 issues (AI confidence > 90%)                  │
│ ☑ Create PRs for 8 Tier 1 bugs (confidence > 85%)       │
│ ☐ Auto-merge PRs with passing CI (confidence > 95%)     │
│                                                          │
│ [Preview Actions] [Execute]                              │
└─────────────────────────────────────────────────────────┘

"Preview Actions" expands to show each issue/PR that would be affected. Human can uncheck individual items. This is for the "I've validated that the AI is reliable on these types of issues" phase.
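The batch selection logic is a straightforward partition by recommended action and confidence. A sketch, assuming each issue carries the `ai_recommendation` shape from the data model later in this document; the threshold defaults mirror the Batch Mode card:

```python
def preview_batch(issues, close_threshold=0.90, fix_threshold=0.85):
    """Partition analyzed issues into batch actions by AI confidence."""
    to_close, to_fix = [], []
    for issue in issues:
        rec = issue["ai_recommendation"]
        if rec["action"] == "close" and rec["confidence"] > close_threshold:
            to_close.append(issue["number"])
        elif rec["action"] == "fix" and rec["confidence"] > fix_threshold:
            to_fix.append(issue["number"])
    return {"close": to_close, "create_pr": to_fix}

# Hypothetical sample: one high-confidence close, one high-confidence fix,
# one fix below threshold (stays in the manual queue).
sample = [
    {"number": 90140, "ai_recommendation": {"action": "close", "confidence": 0.92}},
    {"number": 50078, "ai_recommendation": {"action": "fix", "confidence": 0.91}},
    {"number": 60560, "ai_recommendation": {"action": "fix", "confidence": 0.70}},
]
plan = preview_batch(sample)
```

Returning a plan rather than executing it is deliberate: "Preview Actions" renders the plan, the human unchecks items, and only then does anything hit the GitHub API.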


Architecture

Option A: Static Site + GitHub Actions (Simpler)

  • Data layer: A GitHub Action runs periodically (or on-demand via workflow_dispatch) that:
    1. Fetches all open issues with the area label
    2. Runs AI analysis on each (categorize, summarize, recommend)
    3. Writes results to a JSON file in a repo (or gist)
  • Frontend: A static site (GitHub Pages) reads the JSON and renders the dashboard
  • Actions: Button clicks trigger GitHub API calls directly from the browser (using a GitHub App token or the user's PAT)
  • Pros: No server to maintain, free hosting, simple
  • Cons: Token management, no real-time updates, limited batch operations
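The Action's fetch-and-shape step (items 1 and 3 above) might look like the following sketch. It uses the public GitHub REST issues endpoint; `fetch_page` is a hypothetical helper, auth and pagination are omitted, and the AI analysis fields get filled in by a later pipeline stage:

```python
import json
import urllib.request

API = "https://api.github.com/repos/dotnet/runtime/issues"

def issues_url(label: str, page: int = 1) -> str:
    """Build the REST query for open issues carrying the area label."""
    return f"{API}?state=open&labels={label}&per_page=100&page={page}"

def to_record(raw: dict) -> dict:
    """Trim a GitHub issue payload down to the fields the dashboard needs."""
    return {
        "number": raw["number"],
        "title": raw["title"],
        "url": raw["html_url"],
        "labels": [l["name"] for l in raw["labels"]],
        "reactions": raw["reactions"]["total_count"],
        "status": "pending_review",
    }

def fetch_page(label: str, page: int = 1) -> list:
    """One page of open issues (real network call; a token is needed in practice
    to avoid rate limits)."""
    with urllib.request.urlopen(issues_url(label, page)) as resp:
        return [to_record(r) for r in json.load(resp)]
```

The Action would loop `fetch_page` until an empty page, run the AI analysis over each record, and commit the resulting JSON for the static site to read.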

Option B: Lightweight Server + React Frontend (More Capable)

  • Backend: A small API (Azure Function, or even a GitHub App) that:
    1. Manages the AI analysis pipeline
    2. Handles GitHub API calls (close issues, create PRs, post comments)
    3. Caches issue analysis results
    4. Tracks human decisions and AI outcomes over time
  • Frontend: React SPA with the queue-based card UI
  • Pros: Real-time, better batch operations, can track metrics, better auth
  • Cons: More to build and maintain

Recommendation: Start with Option A

The static site approach validates the UX with minimal investment. The JSON data file can be generated by a CLI tool or GitHub Action. Migrate to Option B only if the workflow proves valuable and needs more sophistication.


Data Model (the JSON that powers the dashboard)

{
  "generated_at": "2026-03-07T06:00:00Z",
  "issues": [
    {
      "number": 50078,
      "title": "JsonIgnore attribute is not inherited in overridden properties",
      "url": "https://github.com/dotnet/runtime/issues/50078",
      "created_at": "2021-03-22",
      "labels": ["bug", "area-System.Text.Json"],
      "reactions": 13,
      "comments": 10,
      "tier": 2,
      "ai_summary": "When a derived class overrides a property marked [JsonIgnore] on the base class, the attribute is not inherited, causing the property to be serialized unexpectedly.",
      "ai_recommendation": {
        "action": "fix",
        "confidence": 0.91,
        "rationale": "Clear user expectation (13 reactions), consistent with other serializer frameworks.",
        "decision_needed": {
          "question": "Should [JsonIgnore] be inherited on property overrides?",
          "options": [
            {"key": "A", "label": "Yes, inherit (matches user expectations)", "recommended": true},
            {"key": "B", "label": "No, override clears it (current behavior)"}
          ]
        }
      },
      "ai_close_message": null,
      "status": "pending_review",
      "human_decision": null,
      "pr_number": null
    },
    {
      "number": 90140,
      "title": "Uri serialization is missing Scheme",
      "url": "https://github.com/dotnet/runtime/issues/90140",
      "tier": 4,
      "ai_summary": "...",
      "ai_recommendation": {
        "action": "close",
        "confidence": 0.92,
        "rationale": "Niche request, trivial workaround, no community demand in 2 years."
      },
      "ai_close_message": "Thank you for filing this issue. After reviewing the backlog...",
      "status": "pending_review"
    }
  ]
}

Metrics & Feedback Loop

The dashboard should track:

  • Accuracy: How often did the human agree with AI's recommendation? (Target: >85%)
  • Override rate: How often did the human pick the non-recommended option?
  • PR quality: What % of AI PRs were merged without changes? With minor changes? Rejected?
  • Time per decision: Are we hitting the 5-10 second target for close/keep?
  • Throughput: Issues resolved per week

These metrics feed back into the AI prompts -- if the AI is consistently wrong about a category, adjust.
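The agreement metrics reduce to counting how often the human's choice matched the AI's recommendation. A sketch over a hypothetical decision log (the entry format is illustrative):

```python
def decision_metrics(log: list) -> dict:
    """Compute accuracy and override rate from (ai_recommended, human_chose) pairs."""
    if not log:
        return {"accuracy": None, "override_rate": None}
    agreed = sum(1 for e in log if e["human_chose"] == e["ai_recommended"])
    return {
        "accuracy": agreed / len(log),           # target: > 0.85
        "override_rate": 1 - agreed / len(log),
    }

# Hypothetical log: one close the human kept open, plus three agreements.
log = [
    {"ai_recommended": "close", "human_chose": "close"},
    {"ai_recommended": "close", "human_chose": "keep"},
    {"ai_recommended": "A", "human_chose": "A"},
    {"ai_recommended": "fix", "human_chose": "fix"},
]
m = decision_metrics(log)
```

Segmenting this by category (close vs. fix vs. A/B decisions) is what makes the feedback actionable -- an aggregate 85% can hide a category where the AI is consistently wrong.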


What This Doesn't Solve

  • API review process: The 76 API suggestions still need the existing governance. The tool can surface them and draft proposals, but the review meeting cadence is a human process.
  • Cross-team dependencies: Some issues involve other areas (ASP.NET, EF Core). The tool can flag these but can't resolve them.
  • Political/strategic decisions: "Should STJ try to be Newtonsoft-compatible?" is a direction question, not a per-issue question.

MVP Scope

To validate this approach with minimal investment:

  1. CLI tool that generates the JSON analysis for all 262 issues
  2. Single HTML file (no build step) that renders the card UI from the JSON
  3. Close action works via gh issue close (user runs from their terminal)
  4. Fix action queues to a local list; a second CLI tool processes the queue

This could be built in a day and tested on the real issue set.
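The close-action half of the MVP (step 3) could be a small queue processor wrapping the gh CLI. A sketch: `process_queue` and the decision format are hypothetical, and the default is dry-run so nothing happens until the human flips the flag:

```python
import subprocess

def close_command(number: int, message: str) -> list:
    """Build the gh CLI invocation for one approved close decision."""
    return ["gh", "issue", "close", str(number),
            "--repo", "dotnet/runtime", "--comment", message]

def process_queue(decisions: list, dry_run: bool = True) -> list:
    """Run (or just print) the close commands for human-approved decisions."""
    ran = []
    for d in decisions:
        cmd = close_command(d["number"], d["close_message"])
        if dry_run:
            print("DRY RUN:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # requires gh installed and authed
        ran.append(cmd)
    return ran
```

Using gh rather than raw API calls sidesteps the token-management problem from Option A: the CLI reuses the user's existing authentication.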
