Polkadot Community Evaluation: AI Governance Framework for Decentralized Voices Program

Document Control

Version: 0.9.1
Date: August 21, 2025
Author: Luke Schoen, Clawbird Pty Ltd
Status: For Public Review Prior to W3F Submission

Abstract
1.1. Evaluation Scope Constraints
1.2. Evaluation Context and Methodology
F1.2.1. Figure 1.2.1: AI Governance Evaluation Framework Components
1.3. Core Principle: Less Trust, More Truth
F1.3.1. Figure 1.3.1: Less Trust, More Truth Verification Flow
1.4. Disclaimer
Web3 AI Agent Evaluation Framework with Examples
2.1. Technical Architecture Questions
F2.1.1. Figure 2.1.1: Technical Architecture Evaluation Framework
2.2. Data and Training Questions
2.3. Governance Alignment Questions
2.4. Transparency and Accountability Questions
2.5. Operational Questions
2.6. Delegate Accountability and Two-Strikes Rule Questions
F2.6.1. Figure 2.6.1: Two-Strikes Rule Application Process
Analysis of Areas For Consideration
F3.1. Figure 3.1: AI Governance Identity and Accountability Contradiction
F3.2. Figure 3.2: AI Governance Jurisdictional Considerations
3.1. The Contradiction in Terminology
3.2. Does the Two-Strikes Rule Resolve This Contradiction?
3.3. Fairness Questions
Evaluation of DV Cohort 5 Selection vs. Stated Criteria
4.1. Accountability and Responsibility
F4.1.1. Figure 4.1.1: Comprehensive AI Governance Evaluation Framework
F4.1.2. Figure 4.1.2: Accountability Structure for AI and Human Delegates
4.2. Transparency and Disclosure
4.3. Technical Standards and Verification
4.4. Expertise and Decision Quality
4.5. Experimental Evaluation and Reporting
4.6. Human Oversight and Control
4.7. Security and Legal Liability
Legal and Regulatory Risk Assessment Questions
F5.1. Figure 5.1: Legal and Regulatory Risk Assessment Framework for AI Governance
5.1. Jurisdictional Considerations
5.2. Liability Considerations
5.3. Security Considerations
5.4. Governance and Regulatory Considerations
Implications of Polkadot's Proof of Personhood for AI Governance
F6.1. Figure 6.1: Proof of Personhood Implications for Governance Participation (AI and Hybrid AI-Human Teams)
AI Governance Evaluation Scoring Summary
7.1. Scoring Key
7.2. Summary Scores by Category
7.3. Overall Scores
Conclusion
Version History
License

1. Abstract

1.1 Evaluation Scope Constraints

This evaluation framework was developed specifically for the Polkadot OpenGov Decentralized Voices (DV) Cohort 5 program, which for the first time included AI governance agents as delegates. The program was initially announced with applications opening in July 2025, and the selected delegates were announced in August 2025. The assessment examples provided focus on GoverNoun AI (Governance Agent) and Cybergov (AI Agents). The framework and questions remain applicable to all AI governance agents participating in the DV program.

1.2 Evaluation Context and Methodology

This evaluation framework is specifically designed to assess AI governance agents participating in the Polkadot Decentralized Voices program. The methodology focuses on examining publicly available information from the applications submitted by AI governance teams.

Inclusions

Technical implementation details provided in the initial applications
Governance principles and approaches outlined by applicants
Transparency and accountability mechanisms described in submissions
Operational areas for consideration addressed in the application materials

Exclusions

Future improvements or updates not included in the initial applications
Implementation details not explicitly mentioned in the application materials
Subjective judgments about the teams' intentions or capabilities beyond what is documented
Technical capabilities not directly related to governance functions

The assessment uses a three-tier rating system (🟢 GOOD, 🟡 MIXED, 🟠 DEVELOPING) to evaluate different aspects of the AI governance approaches. These ratings reflect the author's subjective assessment based on the information available at the time of review and should not be interpreted as definitive judgments of quality or capability.

This framework is intended to provide constructive feedback and to advance the discussion around AI governance standards within the Web3 ecosystem. It represents an independent community evaluation rather than an official assessment by any organization.

flowchart LR
   Framework[AI Governance Evaluation<br>Framework]

   Framework --> Technical[Technical Architecture]

   Framework --> Data[Data and Training]

   Framework --> Alignment[Governance Alignment]

   Framework --> Transparency[Transparency]

   Framework --> Operational[Operational]

   Framework --> Accountability[Accountability]

   Framework --> Legal[Legal Assessment]
   Legal --> Jurisdictional[Jurisdictional]
   Legal --> Liability[Liability]
   Legal --> Security[Security]
   Legal --> Precedents[Precedents]

   subgraph "Assessment"
      direction TB
      Green[🟢 &nbsp;GOOD]
      Yellow[🟡 &nbsp;MIXED]
      Orange[🟠 &nbsp;DEVELOPING]
   end

   %% Style for AI Governance Evaluation Framework node
   classDef framework fill:#07FFFF,stroke:#333
   class Framework framework

Figure 1.2.1: AI Governance Evaluation Framework Components

1.3 Core Principle: Less Trust, More Truth

This evaluation framework is guided by the foundational Web3 principle of "less trust, more truth." AI governance systems must minimize reliance on trust in operators, developers, or the AI itself, instead providing verifiable mechanisms that allow the community to independently validate claims, processes, and outcomes. When AI agents make governance decisions affecting the Polkadot ecosystem, we should not need to trust their operators' assurances — we should be able to verify their claims through transparent, immutable, and accessible evidence.

flowchart TD
   AI[AI Governance Agent] --> Decision[Makes Governance Decision]

   Decision --> Claims[Claims & Assertions]
   Decision --> Process[Decision Process]
   Decision --> Outcomes[Governance Outcomes]

   Claims --> Verification[Verification Layer]
   Process --> Verification
   Outcomes --> Verification

   Verification --> OnChain[On-Chain Evidence]
   Verification --> IPFS[IPFS Storage]
   Verification --> OpenSource[Open-Source Code]
   Verification --> Cryptographic[Cryptographic Proofs]

   OnChain --> Community[Community Validation]
   IPFS --> Community
   OpenSource --> Community
   Cryptographic --> Community

   Community --> Trust[Less Trust]
   Community --> Truth[More Truth]

   subgraph "Web3 Principles"
      Trust
      Truth
   end

   %% Style for AI Governance Agent node
   classDef aiAgent fill:#07FFFF,stroke:#333
   class AI aiAgent

Figure 1.3.1: Less Trust, More Truth Verification Flow

1.4 Disclaimer

This document is provided for informational purposes only and does not constitute professional advice, an endorsement, or a formal audit. The author makes no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability of this evaluation or any information contained herein. Any reliance placed on such information is strictly at your own risk.

The scoring, assessments, questions, and areas for consideration contained in this document represent the author's subjective analysis based on publicly available information at the time of writing. All ratings, flags, and evaluations may change as additional information becomes available or as AI governance approaches evolve.

The colored flag indicators (🟢 GOOD, 🟡 MIXED, 🟠 DEVELOPING) and associated comments are expressions of opinion and areas for consideration rather than statements of fact. They are intended to prompt discussion and further investigation, not to assert definitive conclusions about compliance or liability.

References to legal precedents, regulatory requirements, or potential risks should not be interpreted as legal advice or as assertions that any party has violated laws or regulations. The author acknowledges having no access to internal compliance documentation or legal assessments that may exist.

This evaluation is not affiliated with, endorsed by, or officially connected to the Web3 Foundation, Polkadot, or any of the AI governance agent teams mentioned. All product names, logos, and brands mentioned are the property of their respective owners.

The author shall not be liable for any loss or damage of whatever nature (direct, indirect, consequential, or other) which may arise as a result of any party's use of, or reliance on, this document or any information contained herein.

2. Web3 AI Agent Evaluation Framework with Examples

This framework presents questions about AI governance approaches in the Web3 ecosystem, with specific focus on the DV Cohort 5 program. Each question is accompanied by information from AI agent applications and community observations to facilitate constructive dialogue. The Web3 Foundation and other stakeholders are invited to respond to these questions to advance community understanding of AI governance practices.

2.1 Technical Architecture Questions

flowchart LR
    AI[AI Governance Agent] --> Infrastructure[Infrastructure Layer]
    AI --> Model[Model Layer]
    AI --> Data[Data Layer]
    AI --> Decision[Decision Layer]

    Infrastructure --> SelfHosted{Self-Hosted?}
    Infrastructure --> Decentralized{Decentralized?}
    Infrastructure --> Resilient{Resilient?}

    Model --> OpenSource{Open Source?}
    Model --> Transparent{Transparent?}
    Model --> Secure{Secure?}

    Data --> Sources[Data Sources]
    Sources --> OnChain[On-Chain Data]
    Sources --> OffChain[Off-Chain Data]
    Sources --> Historical[Historical Data]

    Decision --> Process[Decision Process]
    Process --> Consensus[Consensus Mechanism]
    Process --> Verification[Verification System]
    Process --> Explainability[Explainability]

    subgraph "Evaluation Criteria"
        SelfHosted
        Decentralized
        OpenSource
        Transparent
        Secure
        Explainability
        Resilient
    end

    subgraph "Assessment"
        Good[🟢 GOOD]
        Mixed[🟡 MIXED]
        Developing[🟠 DEVELOPING]
    end

   %% Style for AI Governance Agent node
   classDef aiAgent fill:#07FFFF,stroke:#333
   class AI aiAgent

Figure 2.1.1: Technical Architecture Evaluation Framework

Community Question	GoverNoun AI's Application	Cybergov's Application	Community Assessment of GoverNoun AI	Community Assessment of Cybergov
2.1.1. Are these AI agents running on self-hosted, open-source LLMs or are they dependent on centralized providers?	"Fully decentralized architecture." "Compute Layer: Powered by Aethir decentralized compute." "IPFS + Filecoin: Agent metadata and memory stored decentrally." "Serverless & Unstoppable: No central point of failure."	"OpenRouter provides access to a variety of models, ensuring flexibility and preventing vendor lock-in." "Each Magi core is an open source model running with temperature=0 (or a low temperature). The models, prompts, and inference code are fully open-source."	🟡 MIXED - While GoverNoun uses decentralized storage via IPFS/Filecoin, the codebase shows API calls to external services. The `src/services/api.ts` file contains endpoints to a potentially centralized API host, and there's no evidence of local model hosting.	🟡 MIXED - Cybergov uses open-source models but relies on OpenRouter, a centralized service, to access these models. While this provides flexibility and prevents vendor lock-in, it still introduces a centralized dependency in their architecture.
2.1.2. What specific LLM models are used, and what are their training datasets and known limitations?	"Models: DeepSeek and Llama for analysis." "Data Sources: Nouns Discord servers, Farcaster channels, On-chain governance data, Community forums."	"Each Magi core is an open source model running with temperature=0 (or a low temperature)." "LLMs will certainly 'hallucinate' or misinterpret context. LLMs lack true 'understanding' and rely on pattern matching. Perfect reproducibility across different hardware is not guaranteed."	🟡 MIXED - While GoverNoun mentions using DeepSeek and Llama models, they don't specify which versions, the specific training datasets used, or acknowledge known limitations of these models.	🟡 MIXED - While Cybergov acknowledges general LLM limitations like hallucinations and lack of true understanding, they don't specify which exact open-source models they use or their specific training datasets.
2.1.3. Is the AI agent's codebase fully open-source, and if so, where can it be audited?	"Open-source all voting logic." "All decision-making processes should be open and auditable."	"The models, prompts, and inference code are fully open-source."	🟡 MIXED - While GoverNoun claims they will "open-source all voting logic," there's no clear link to a public repository where the complete codebase can be audited. The commitment appears future-oriented rather than reflecting current status.	🟡 MIXED - While Cybergov states that their "models, prompts, and inference code are fully open-source," they don't provide a specific link to a public repository where the complete codebase can be audited.
2.1.4. What security measures protect against prompt injection attacks or other manipulations?	"Fully decentralized architecture." "Serverless & Unstoppable: No central point of failure."	"We employ multiple layers of defense: Retrieval-Augmented Generation (RAG) with curated data, strict prompt engineering, the 2/3 consensus mechanism, different LLMs, and a default-to-abstain policy."	🟠 DEVELOPING - GoverNoun AI's application doesn't address prompt injection vulnerabilities or other security measures specifically. Decentralization alone doesn't protect against prompt injection attacks.	🟢 GOOD - Cybergov implements multiple layers of defense against manipulations, including strict prompt engineering, a 2/3 consensus mechanism across different LLMs, and a default-to-abstain policy when there's uncertainty. This multi-layered approach provides robust protection against prompt injection and other attacks.
2.1.5. How is the AI agent's infrastructure maintained and upgraded?	"Fully decentralized architecture." "Serverless & Unstoppable: No central point of failure."	"Prefect manages the daily cron job that fetches, processes, and analyzes referendum data." "As LLMs improve, more advanced ones will be used and the older ones swapped."	🟠 DEVELOPING - GoverNoun AI's application doesn't specify maintenance procedures, upgrade processes, or governance of infrastructure changes.	🟡 MIXED - While Cybergov mentions using Prefect for managing daily jobs and plans to swap in more advanced LLMs as they improve, they don't provide specific details about their maintenance procedures, upgrade processes, or governance of infrastructure changes.
2.1.6. What is the AI agent's decision-making process for governance votes?	"Our AI analyzes proposals using: 1) Proposal details, 2) Historical data, 3) Community sentiment, 4) Technical feasibility, 5) Budget analysis, 6) Team track record, 7) Alignment with ecosystem goals."	"MAGI-V0 is designed as a deliberative council of three distinct LLM cores. Each core is given the same data but operates under a unique directive: Balthazar (The Strategist) prioritizes Polkadot's long-term strategic growth; Caspar (The Pragmatist) ensures short-to-medium-term health and treasury sustainability; Melchior (The Guardian) focuses on network security and decentralization. A final vote (Aye/Nay) is cast only if ≥2 of the 3 cores agree. If there is a 3-way split or no clear majority, the system abstains."	🟢 GOOD - GoverNoun AI's decision-making process incorporates multiple relevant factors for governance evaluation, including both quantitative (budget analysis) and qualitative (community sentiment) considerations.	🟢 GOOD - Cybergov's multi-LLM council approach with distinct directives creates a system of checks and balances. The requirement for 2/3 consensus and default-to-abstain policy for ambiguous cases demonstrates a thoughtful decision-making process designed to prevent hasty or poorly-considered votes.

2.2 Data and Training Questions

Community Question	GoverNoun AI's Application	Cybergov's Application	Community Assessment of GoverNoun AI	Community Assessment of Cybergov
2.2.1. What data was used to train or fine-tune the AI agent?	"Models: DeepSeek and Llama for analysis." "Data Sources: Nouns Discord servers, Farcaster channels, On-chain governance data, Community forums."	"Each Magi core is an open source model running with temperature=0 (or a low temperature). The models, prompts, and inference code are fully open-source."	🟡 MIXED - While GoverNoun mentions data sources, there's limited transparency about the specific datasets used for training or fine-tuning. The code doesn't include details about training methodologies or data provenance.	🟡 MIXED - While Cybergov mentions using open-source models, they don't specify what data was used to train or fine-tune these models for governance-specific tasks.
2.2.2. How is the AI agent updated with new governance information?	"Connect all Polkadot data sources to our RAG system."	"Prefect manages the daily cron job that fetches, processes, and analyzes referendum data."	🟡 MIXED - While GoverNoun mentions connecting "all Polkadot data sources to our RAG system," they don't specify which data sources, how frequently updates occur, or how data quality is verified.	🟢 GOOD - Cybergov has a clear process for updating with new governance information through daily automated jobs that fetch active proposals and compile a comprehensive context vector with current ecosystem facts.
2.2.3. What measures are in place to detect and mitigate bias in the AI's training data?	No specific information provided.	"We employ multiple layers of defense: Retrieval-Augmented Generation (RAG) with curated data, strict prompt engineering, the 2/3 consensus mechanism, different LLMs, and a default-to-abstain policy."	🟠 DEVELOPING - GoverNoun AI's application doesn't address bias detection or mitigation in their training data or decision-making processes.	🟡 MIXED - While Cybergov's multi-LLM approach with different directives and consensus requirement helps mitigate individual model biases, they don't specifically address how they detect or mitigate biases in their training data or curated information.
2.2.4. How does the AI agent handle governance proposals that fall outside its training data?	"Connect all Polkadot data sources to our RAG system."	"If the input is ambiguous, outside the system's scope, or deemed too complex for a high-confidence decision, the system abstains."	🟡 UNCLEAR - While GoverNoun mentions using a RAG system, they don't specify how they handle novel proposals or edge cases outside their training data. There's no clear process for identifying knowledge gaps or deferring decisions when appropriate.	🟢 GOOD - Cybergov has a clear policy to abstain when faced with proposals that are ambiguous, outside the system's scope, or too complex for a high-confidence decision, demonstrating appropriate caution for situations beyond its capabilities.
2.2.5. What process exists for correcting the AI when it makes factual errors?	"Open to changing votes based on community input."	"Our open-source verifier script should allow anyone to audit the exact inputs, prompts, and outputs that led to a specific vote."	🟡 UNCLEAR - While GoverNoun states they're "open to changing votes based on community input," there's no formal process described for identifying, verifying, and correcting factual errors in the AI's outputs or decision-making.	🟡 MIXED - While Cybergov's open-source verifier script allows for auditing inputs and outputs, they don't specifically describe a process for correcting the AI when factual errors are identified.

2.3 Governance Alignment Questions

Community Question	GoverNoun AI's Application	Cybergov's Application	Community Assessment of GoverNoun AI	Community Assessment of Cybergov
2.3.1. How does the AI agent align with Polkadot's governance principles?	"Our voting approach prioritizes: Value for money: Rigorous analysis of budgets and deliverables. Ecosystem benefit: Does this help Polkadot grow? Technical feasibility: Can this actually be implemented? Team credibility: Past performance and expertise. Decentralization: Avoiding concentration of power. Innovation potential: Supporting novel approaches. Community alignment: Reflecting grassroots needs."	"MAGI-V0 is designed as a deliberative council of three distinct LLM cores. Each core is given the same data but operates under a unique directive: Balthazar (The Strategist) prioritizes Polkadot's long-term strategic growth; Caspar (The Pragmatist) ensures short-to-medium-term health and treasury sustainability; Melchior (The Guardian) focuses on network security and decentralization."	🟢 GOOD - The stated principles align well with Polkadot's governance values. Their emphasis on transparency and evidence-based decision making matches OpenGov's approach.	🟢 GOOD - Cybergov's multi-LLM council approach with directives focused on strategic growth, ecosystem health, and network security/decentralization aligns well with Polkadot's governance principles. The Guardian core specifically focuses on decentralization and core principles.
2.3.2. What mechanisms ensure the AI agent remains capture-resistant?	"Decentralization: Avoiding concentration of power." "No central control - The best arguments win, not the loudest voices."	"A final vote (Aye/Nay) is cast only if ≥2 of the 3 cores agree. If there is a 3-way split or no clear majority, the system abstains. We employ multiple layers of defense: Retrieval-Augmented Generation (RAG) with curated data, strict prompt engineering, the 2/3 consensus mechanism, different LLMs, and a default-to-abstain policy."	🟡 MIXED - While GoverNoun mentions avoiding concentration of power, there's limited detail on specific mechanisms that ensure capture resistance. The statement that "the best arguments win" doesn't explain how argument quality is objectively determined or how this prevents capture.	🟢 GOOD - Cybergov's requirement for 2/3 consensus among different LLM cores with distinct directives creates a system that's resistant to capture, as any attempt to manipulate the system would need to successfully influence multiple models with different priorities simultaneously.
2.3.3. How does the AI agent handle conflicts of interest?	"All decision-making processes should be open and auditable." "Public reasoning - All decision logic is published for scrutiny."	"This is a contribution to the ecosystem. The reward is the data, the learnings, and the opportunity to advance the state of decentralized governance. No compensation needed."	🟡 MIXED - While GoverNoun emphasizes transparency, they don't specifically address how conflicts of interest are identified, disclosed, or managed. There's no clear recusal process for situations where the AI or its operators might have conflicts.	🟡 MIXED - While Cybergov's operator states they are not seeking compensation, which reduces some potential conflicts of interest, they don't specifically address how conflicts of interest are identified, disclosed, or managed in their decision-making process.
2.3.4. What values or principles guide the AI agent's decision-making?	"Our voting approach prioritizes: Value for money: Rigorous analysis of budgets and deliverables. Ecosystem benefit: Does this help Polkadot grow? Technical feasibility: Can this actually be implemented? Team credibility: Past performance and expertise. Decentralization: Avoiding concentration of power. Innovation potential: Supporting novel approaches. Community alignment: Reflecting grassroots needs."	"Balthazar (The Strategist): Its directive is to prioritize Polkadot's long-term strategic growth, market position, and network effects (aka Polkadot must win). Caspar (The Pragmatist): Its directive is to ensure the ecosystem's short-to-medium-term health, treasury sustainability, and developer activity (aka Polkadot must thrive). Melchior (The Guardian): Its directive is to focus on network security, decentralization, and long-term resilience, acting as a safeguard for Polkadot's core principles (aka Polkadot must survive us all)."	🟢 GOOD - GoverNoun AI clearly articulates a comprehensive set of values and principles that guide their decision-making, covering financial responsibility, ecosystem growth, technical feasibility, team assessment, decentralization, innovation, and community alignment.	🟢 GOOD - Cybergov clearly articulates the distinct values guiding each of their LLM cores: strategic growth and market position (Balthazar), ecosystem health and treasury sustainability (Caspar), and network security, decentralization, and resilience (Melchior).
2.3.5. How does the AI agent balance short-term and long-term ecosystem interests?	"Ecosystem benefit: Does this help Polkadot grow?" "Innovation potential: Supporting novel approaches."	"Balthazar (The Strategist): Its directive is to prioritize Polkadot's long-term strategic growth, market position, and network effects. Caspar (The Pragmatist): Its directive is to ensure the ecosystem's short-to-medium-term health, treasury sustainability, and developer activity."	🟡 MIXED - While GoverNoun mentions ecosystem growth and innovation as priorities, they don't specifically address how they balance short-term needs against long-term sustainability or how they evaluate trade-offs between immediate benefits and future potential.	🟢 GOOD - Cybergov explicitly addresses the balance between short-term and long-term interests through their multi-LLM approach, with Balthazar focusing on long-term strategic growth and Caspar focusing on short-to-medium-term ecosystem health and treasury sustainability. This creates a built-in mechanism for balancing different time horizons.

2.4 Transparency and Accountability Questions

Community Question	GoverNoun AI's Application	Cybergov's Application	Community Assessment of GoverNoun AI	Community Assessment of Cybergov
2.4.1. How transparent is the AI agent's decision-making process?	"All decision-making processes should be open and auditable."	"Security rests on radical transparency. All decisions are pre-signed and timestamped. Every vote is published with a link to its IPFS evidence bundle allowing anyone to verify the process. Our open-source verifier script should allow anyone to audit the exact inputs, prompts, and outputs that led to a specific vote."	🟡 MIXED - While GoverNoun AI commits to open and auditable decision-making processes, they don't provide specific mechanisms for how this transparency will be implemented or how community members can access and verify their decision-making process.	🟢 GOOD - Cybergov's approach to transparency is comprehensive, with pre-signed and timestamped decisions, IPFS evidence bundles for every vote, and an open-source verifier script that allows anyone to audit the exact inputs, prompts, and outputs that led to specific votes.
2.4.2. What mechanisms exist for community feedback on the AI agent's performance?	"Through our Lobby API: Anyone can submit arguments - Support or oppose any proposal with evidence. Transparent evaluation - The AI shows which arguments influenced its analysis." "Open to changing votes based on community input."	"Every vote is published with a link to its IPFS evidence bundle allowing anyone to verify the process."	🟡 MIXED - While GoverNoun mentions a Lobby API for submitting arguments and being open to changing votes, there's no comprehensive feedback system described for evaluating the AI's overall performance beyond individual proposals.	🟡 MIXED - While Cybergov provides transparency through IPFS evidence bundles that allow community members to verify their process, they don't specifically describe mechanisms for community members to provide feedback on their AI agent's performance or how such feedback would be incorporated.
2.4.3. How are the AI agent's operators held accountable for its actions?	"All decision-making processes should be open and auditable." "All decisions and reasoning stored immutably."	"If selected, I commit to the following on behalf of the MAGI-V0 experiment: Participate in all relevant referenda according to the system's transparent logic. Accompany every vote with a link to its IPFS evidence bundle (inputs & parameters used for voting). Act solely as the executor of the MAGI-V0 protocol. Any deviation would be a public breach of the experiment's principles."	🟠 DEVELOPING - GoverNoun AI's application doesn't clearly define who the operators are or how they're held accountable. While decisions are stored immutably, there's no explanation of consequences for operators if the AI makes harmful decisions.	🟡 MIXED - Cybergov's operator commits to acting solely as the executor of the MAGI-V0 protocol and acknowledges that any deviation would be a public breach of the experiment's principles. However, there's no specific mechanism described for how the operator would be held accountable if such a breach occurred.
2.4.4. What metrics are used to evaluate the AI agent's governance performance?	"Weekly governance reports with metrics."	No specific information provided.	🟡 UNCLEAR - While GoverNoun mentions "weekly governance reports with metrics," they don't specify which metrics they track or how these metrics relate to governance performance objectives.	🟠 DEVELOPING - Cybergov doesn't specify any metrics they use to evaluate their AI agent's governance performance.
2.4.5. How does the AI agent justify its voting decisions?	"Publish detailed reasoning for every vote." "The AI evaluates all inputs transparently and publishes its reasoning."	"Every vote is published with a link to its IPFS evidence bundle allowing anyone to verify the process. Each Magi core is given the same data but operates under a unique directive: Balthazar (The Strategist) prioritizes Polkadot's long-term strategic growth; Caspar (The Pragmatist) ensures short-to-medium-term health and treasury sustainability; Melchior (The Guardian) focuses on network security and decentralization."	🟡 MIXED - While GoverNoun commits to publishing reasoning for every vote, the level of detail and quality of justification isn't specified. It's unclear whether justifications will include alternative viewpoints considered, potential drawbacks of their position, or how they weighed competing values.	🟢 GOOD - Cybergov provides comprehensive justification for voting decisions through their multi-perspective approach with different "Magi" cores each providing rationales based on different priorities (strategic growth, ecosystem health/treasury sustainability, and security/decentralization). The IPFS evidence bundles allow anyone to verify the exact inputs, prompts, and outputs that led to specific votes.

2.5 Operational Questions

Community Question	GoverNoun AI's Application	Cybergov's Application	Community Assessment of GoverNoun AI	Community Assessment of Cybergov
2.5.1. How does the AI agent determine which proposals to vote on?	"Vote on 100% of Light Track proposals." "Prioritize small treasury proposals (up to 200k DOT)." "Help new contributors navigate governance." "Reduce barriers for grassroots initiatives." "Optimize for community impact over large budgets."	"The Magi are not designed to vote on everything. Abstention is the default. Their primary function is to provide high-signal analysis on referenda where a clear, data-driven conclusion can be reached. If selected, I commit to participate in all relevant referenda according to the system's transparent logic."	🟡 MIXED - Several areas for consideration: 1) The statement "Vote on 100% of Light Track proposals" is ambiguous - if this includes abstaining/recusing when lacking expertise, it would align with DV-Light expectations; if it means casting definitive yes/no votes regardless of expertise, it would suggest overreach. 2) The focus on "small treasury proposals (up to 200k DOT)" could distort proposal behaviors - connected parties might intentionally stay under this threshold for favorable treatment, while others might artificially bundle proposals to exceed it and avoid scrutiny. 3) The assumption that small proposals inherently "help new contributors" lacks supporting evidence - proposal value doesn't necessarily correlate with complexity or intent.	🟢 GOOD - Cybergov takes a conservative approach, with abstention as the default position. They only vote on proposals where their system can reach a clear, data-driven conclusion, which helps prevent overreach and ensures they only participate when they can provide high-quality analysis. Their commitment to participate in "all relevant referenda according to the system's transparent logic" indicates they will evaluate all proposals but only vote when appropriate.
2.5.2. What happens if the AI agent or its operators disappear?	"All our infrastructure is decentralized and will remain accessible regardless of any centralized service failures." "Open-source all voting logic."	"The models, prompts, and inference code are fully open-source. The goal is to generate output that is as deterministic as possible."	🟡 MIXED - While the storage layer is decentralized, the operational layer appears to have centralized components. There's no clear succession or continuity plan if the operators disappear.	🟡 MIXED - While Cybergov's open-source approach means their code could theoretically be picked up by someone else if the operators disappear, there's no explicit succession or continuity plan described.
2.5.3. How does the AI agent balance automation with human oversight?	"Humans can override AI decisions." "Humans can submit arguments through our Lobby API, which the AI will consider."	"Act solely as the executor of the MAGI-V0 protocol. Any deviation would be a public breach of the experiment's principles."	🟡 MIXED - While GoverNoun allows for human input through their Lobby API and mentions human override capability, there's no clear explanation of when and how human oversight is triggered or how much weight human input carries in the final decision.	🟡 MIXED - Cybergov's approach appears to be fully automated with the human operator acting solely as the executor of the protocol without intervention. While this ensures consistency, it doesn't provide for human oversight to correct potential errors or address edge cases that the system might not handle well.
2.5.4. How does the AI agent handle proposals with incomplete information?	No specific information provided.	"If the input is ambiguous, outside the system's scope, or deemed too complex for a high-confidence decision, the system abstains."	🟠 DEVELOPING - GoverNoun AI's application doesn't address how they handle proposals with incomplete information.	🟢 GOOD - Cybergov's default-to-abstain policy for ambiguous or complex inputs demonstrates a conservative approach that avoids making decisions when information is incomplete or unclear.
2.5.5. What is the AI agent's approach to handling contentious proposals?	"Through our Lobby API: Anyone can submit arguments - Support or oppose any proposal with evidence. Transparent evaluation - The AI shows which arguments influenced its analysis."	"A final vote (Aye/Nay) is cast only if ≥2 of the 3 cores agree. If there is a 3-way split (Aye, Nay, Abstain) or no clear majority, the system abstains."	🟡 MIXED - While GoverNoun mentions their Lobby API for gathering arguments, they don't specifically address how they handle particularly contentious proposals where community sentiment is divided or where there are strong opinions on both sides.	🟢 GOOD - Cybergov's consensus mechanism requiring 2/3 agreement among their different "Magi" cores provides a built-in approach for handling contentious proposals. If the cores can't reach agreement (which is more likely with contentious proposals), the system abstains, avoiding taking sides in divisive issues without clear consensus.
2.5.6. How does the AI agent ensure its reasoning is consistent across similar proposals?	"All decision-making processes should be open and auditable." "All decisions and reasoning stored immutably."	"Each Magi core is an open source model running with temperature=0 (or a low temperature). The models, prompts, and inference code are fully open-source. The goal is to generate output that is as deterministic as possible."	🟡 MIXED - While GoverNoun mentions storing decisions and reasoning immutably, they don't specifically address how they ensure consistency in their reasoning across similar proposals or how they track and apply precedents from previous decisions.	🟢 GOOD - Cybergov's approach of using temperature=0 (or low temperature) settings for their LLM cores and fully open-source models, prompts, and inference code is specifically designed to generate output that is as deterministic as possible, which helps ensure consistency across similar proposals.
2.5.7. What mechanisms prevent the AI agent from being manipulated by coordinated community input?	"No central control - The best arguments win, not the loudest voices."	"We employ multiple layers of defense: Retrieval-Augmented Generation (RAG) with curated data, strict prompt engineering, the 2/3 consensus mechanism, different LLMs, and a default-to-abstain policy."	🟡 MIXED - While GoverNoun claims that "the best arguments win, not the loudest voices," they don't specify what mechanisms prevent manipulation through coordinated community input or how they distinguish between genuine community consensus and artificial amplification of certain viewpoints.	🟢 GOOD - Cybergov's multi-layered approach with curated data, strict prompt engineering, consensus mechanism across different LLMs, and default-to-abstain policy provides several safeguards against manipulation. The requirement for 2/3 agreement among different LLM cores with different directives makes it more difficult for coordinated input to manipulate the system.
2.5.8. What process does the AI agent follow for budget analysis of treasury proposals?	"Value for money: Rigorous analysis of budgets and deliverables."	"It compiles a general_context_vector, a curated, timestamped set of high signal ecosystem facts (DOT price, treasury balance, recent governance outcomes, roadmap progress, mid/long term goals, information on the proposer). Caspar (The Pragmatist) ensures the ecosystem's short-to-medium-term health, treasury sustainability, and developer activity."	🟡 MIXED - While GoverNoun mentions "rigorous analysis of budgets," they don't detail their specific methodology for budget analysis or what benchmarks they use to assess value for money.	🟡 MIXED - While Cybergov includes treasury balance in their context vector and has a dedicated "Magi" core (Caspar) focused on treasury sustainability, they don't provide specific details about their methodology for analyzing budgets in treasury proposals.
2.5.9. How does the AI agent handle proposals that might benefit its operators?	No specific information provided.	"This is a contribution to the ecosystem. The reward is the data, the learnings, and the opportunity to advance the state of decentralized governance. No compensation needed."	🟠 DEVELOPING - GoverNoun AI's application doesn't address how they handle potential conflicts of interest when proposals might benefit their operators, or whether they have a recusal policy for such situations.	🟡 MIXED - While Cybergov's operator states they are not seeking compensation, which reduces some potential conflicts of interest, they don't specifically address how they would handle proposals that might indirectly benefit the operator or how they would identify and manage such conflicts.
2.5.10. What safeguards prevent the AI agent from being influenced by its operators' biases?	"All decision-making processes should be open and auditable."	"Each Magi core is an open source model running with temperature=0 (or a low temperature). The models, prompts, and inference code are fully open-source. A final vote (Aye/Nay) is cast only if ≥2 of the 3 cores agree. If there is a 3-way split or no clear majority, the system abstains."	🟡 MIXED - While GoverNoun emphasizes transparency, they don't specifically address how they prevent operator biases from influencing the AI's decisions or what safeguards exist to maintain neutrality.	🟢 GOOD - Cybergov's approach of using multiple LLM cores with different directives, requiring 2/3 consensus, and making all models, prompts, and code open-source helps prevent operator bias by distributing decision-making across multiple perspectives and making the entire process transparent and auditable.
2.5.11. How does the AI agent verify the track record and credibility of proposers?	"Team credibility: Past performance and expertise."	"It compiles a general_context_vector, a curated, timestamped set of high signal ecosystem facts including information on the proposer."	🟡 MIXED - While GoverNoun mentions team credibility as a factor, they don't explain their methodology for verifying past performance or expertise, or what sources they use to assess proposer credibility.	🟡 MIXED - While Cybergov mentions including information on the proposer in their context vector, they don't specify how they gather this information or what factors they consider when assessing proposer credibility.
2.5.12. What technical capability limitations must AI governance agents disclose?	"Fully open IPFS storage with on-chain verification" "Transparent decision-making process"	"LLMs will certainly 'hallucinate' or misinterpret context. LLMs lack true 'understanding' and rely on pattern matching. Perfect reproducibility across different hardware is not guaranteed."	🟡 UNCLEAR - GoverNoun AI's application does not specify what technical capability limitations they believe AI governance agents should disclose, making it impossible to assess their transparency regarding their own technical limitations.	🟢 GOOD - Cybergov explicitly acknowledges several key technical limitations of their system, including the potential for LLMs to hallucinate or misinterpret context, their reliance on pattern matching rather than true understanding, and the challenges of perfect reproducibility across different hardware. This transparency about limitations demonstrates a realistic understanding of their system's capabilities.
2.5.13. Is AI used to summarize community input before storing on IPFS, and if so, how is accuracy ensured?	"The AI evaluates all inputs transparently and publishes its reasoning."	"Our open-source verifier script should allow anyone to audit the exact inputs, prompts, and outputs that led to a specific vote."	🟡 UNCLEAR - The application doesn't clarify whether raw community input is stored verbatim or if AI summarization is used before storage. If summarization occurs, there's no explanation of how the accuracy and fidelity of these summaries are verified to prevent distortion of community input.	🟢 GOOD - Cybergov's approach of storing the exact inputs in their IPFS evidence bundles and providing an open-source verifier script suggests they preserve the original inputs rather than using AI summarization, which helps ensure accuracy and allows for independent verification.
2.5.14. Which specific IPFS pinning service is used, and what service tier is budgeted for?	"IPFS + Filecoin: Agent metadata and memory stored decentrally."	"IPFS is used for storing immutable evidence bundles for every vote, containing all inputs, model outputs, and cryptographic signatures."	🟠 DEVELOPING - GoverNoun doesn't specify which IPFS pinning service they use (e.g., Pinata, Infura, Crust etc.) or what service tier they've budgeted for. Different tiers offer varying levels of storage capacity, bandwidth, and request limits, directly impacting the sustainability and accessibility of governance data.	🟠 DEVELOPING - Cybergov doesn't specify which IPFS pinning service they use or what service tier they've budgeted for, making it difficult to assess the long-term sustainability and accessibility of their evidence bundles.
2.5.15. Does the proposer and its AI Agent use Polkadot ecosystem storage solutions like Crust Network for IPFS pinning?	"IPFS + Filecoin: Agent metadata and memory stored decentrally" "On-Chain Real-Time Transparency: All decisions and reasoning stored immutably" "Serverless & Unstoppable: No central point of failure" "Data Sources: Nouns Discord servers, Farcaster channels, On-chain governance data"	"IPFS is used for storing immutable evidence bundles for every vote, containing all inputs, model outputs, and cryptographic signatures." "Our open-source verifier script should allow anyone to audit the exact inputs, prompts, and outputs that led to a specific vote."	🟡 UNCLEAR - While the proposer and its AI Agent mentions using IPFS/Filecoin for storage, they don't specify whether they utilize Polkadot ecosystem storage solutions like Crust Network.	🟡 UNCLEAR - While Cybergov mentions using IPFS for storing evidence bundles, they don't specify whether they use Polkadot ecosystem storage solutions like Crust Network for IPFS pinning.
2.5.16. How does the AI agent assess community sentiment around proposals?	"Community alignment: Reflecting grassroots needs."	"The system fetches all active proposals and extracts key information (links, discussion threads, on-chain data)."	🟡 MIXED - While GoverNoun mentions community alignment as a priority, they don't specify how they measure or assess community sentiment or what sources they monitor to gauge grassroots needs.	🟡 MIXED - While Cybergov mentions extracting information from discussion threads, they don't provide specific details about how they assess or weigh community sentiment in their decision-making process.
2.5.17. What is the AI agent's process for evaluating the technical feasibility of proposals?	"Technical feasibility: Can this actually be implemented?"	"Melchior (The Guardian) focuses on network security, decentralization, and long-term resilience, acting as a safeguard for Polkadot's core principles."	🟡 MIXED - While GoverNoun mentions technical feasibility as a consideration, they don't detail their process for evaluating technical aspects of proposals or what expertise they rely on for technical assessments.	🟡 MIXED - While Cybergov has a dedicated "Magi" core (Melchior) that focuses on network security and technical aspects, they don't provide specific details about their methodology for evaluating the technical feasibility of proposals.
2.5.17a. How does the proposer and its AI Agent's RAG (Retrieval-Augmented Generation) implementation address the technical limitations in question 2.5.17, particularly regarding training data cutoff and hallucination risks?	"RAG Integration" "Connect all Polkadot data sources to our RAG system"	"We employ multiple layers of defense: Retrieval-Augmented Generation (RAG) with curated data, strict prompt engineering, the 2/3 consensus mechanism, different LLMs, and a default-to-abstain policy." "It compiles a general_context_vector a curated, timestamped set of high signal ecosystem facts (DOT price, treasury balance, recent governance outcomes, roadmap progress, mid/long term goals, information on the proposer, …). This is called 'context grounding' and grounds the AI's reasoning in current reality."	🟡 UNCLEAR - While GoverNoun AI mentions "RAG Integration" and "Connect all Polkadot data sources to our RAG system" in their application, they provide no details on how their RAG implementation addresses technical limitations like training data cutoff or hallucination risks.	🟡 MIXED - Cybergov employs RAG with curated data and context grounding to mitigate hallucinations, along with multiple defensive layers including the 2/3 consensus mechanism and default-to-abstain policy. However, they don't provide specific technical details about their RAG implementation's vector storage, retrieval mechanisms, or how they handle training data cutoff limitations.
2.5.18. Is the proposer and its AI Agent's RAG implementation fully decentralized?	"IPFS + Filecoin: Agent metadata and memory stored decentrally" "On-Chain Real-Time Transparency: All decisions and reasoning stored immutably" "Serverless & Unstoppable: No central point of failure" "Data Sources: Nouns Discord servers, Farcaster channels, On-chain governance data"	"IPFS is used for storing immutable evidence bundles for every vote, containing all inputs, model outputs, and cryptographic signatures." "Our open-source verifier script should allow anyone to audit the exact inputs, prompts, and outputs that led to a specific vote."	🟡 UNCLEAR - GoverNoun AI mentions using IPFS/Filecoin for storage but doesn't specify if their RAG implementation is fully decentralized or how they handle vector storage and data contributor compensation.	🟡 MIXED - While Cybergov uses IPFS for storing evidence bundles and provides an open-source verifier script for auditing, they don't explicitly state whether their entire RAG implementation (including vector storage and retrieval infrastructure) is fully decentralized.
2.5.19. What specific timeframe for implementing vote changes based on community input?	"Responsive to feedback within 24 hours."	No specific information provided.	🟡 MIXED - While GoverNoun AI commits to responding to feedback within 24 hours, there's no specific timeframe mentioned for actually implementing vote changes after receiving community input.	🟠 DEVELOPING - Cybergov provides no information about their timeframe for implementing vote changes based on community input.
2.5.20. What threshold or criteria of community input is required to change an AI agent's vote?	"Through our Lobby API: Anyone can submit arguments - Support or oppose any proposal with evidence. Transparent evaluation - The AI shows which arguments influenced its analysis." "No central control - The best arguments win, not the loudest voices."	No specific information provided.	🟡 UNCLEAR - While GoverNoun AI states that "the best arguments win," there's no clear threshold or criteria for what constitutes a "best argument" or how many community members need to support a position before it would change the AI's vote.	🟠 DEVELOPING - Cybergov provides no information about what threshold or criteria of community input is required to change their AI agent's vote.
2.5.21. Who has final authority to change votes - the AI system or human operators?	"Open to changing votes based on community input." "All decision-making processes should be open and auditable." "All decisions and reasoning stored immutably."	"Act solely as the executor of the MAGI-V0 protocol. Any deviation would be a public breach of the experiment's principles."	🟠 DEVELOPING - GoverNoun AI's application does not clearly state who has final authority to change votes. The phrase "open to changing votes" is passive and doesn't specify whether the AI autonomously changes its votes or if human operators make the final decision.	🟡 MIXED - Cybergov's statement that the operator will "act solely as the executor of the MAGI-V0 protocol" suggests that the AI system has the final authority and the human operator simply executes its decisions. However, they don't explicitly address the question of vote changes or overrides.

2.6 Delegate Accountability and Two-Strikes Rule Questions

flowchart LR
   AI[AI Governance Agent] --> Decision[Makes Governance Decision]

   Decision --> Good[Good Decision]
   Decision --> Bad[Bad Decision]

   Bad --> Strike[Strike Issued]
   Strike --> FirstStrike[First Strike]
   Strike --> SecondStrike[Second Strike]

   %% Create a subgraph for Second Strike and Removal to stack them vertically
   subgraph StrikeConsequence
      direction TB
      SecondStrike
      Removal[Removal from DV Program]
      SecondStrike --> Removal
   end

   AI --> Operator[Human Operator]
   Operator --> Entity[Legal Entity]

   subgraph "Accountability Chain"
      direction TB
      AI
      Operator
      Entity
   end

   subgraph "Verification Systems"
      direction TB
      OnChain[On-Chain Verification]
      IPFS[IPFS Storage]
      Reasoning[Public Reasoning]
   end

   AI --> OnChain
   AI --> IPFS
   AI --> Reasoning

   Community[Community Oversight] --> Reports[Reports Issues]
   Reports --> Review[W3F Review]
   Review --> Strike

   %% Style for AI Governance Agent node
   classDef aiAgent fill:#07FFFF,stroke:#333
   class AI aiAgent

Figure 2.6.1: Two-Strikes Rule Application Process

Community Question	Areas For Consideration	W3F's Response
2.6.1. How are "strikes" defined differently for AI agents versus human delegates?	🟠 Definitional Inconsistency: "Individuals" implies human persons with legal identity. "Doxxed" means revealing real-world identity and accountability. AI agents are software systems, not individuals who can be "doxxed".	⚪ [Awaiting W3F response]
2.6.2. Who receives a strike when an AI agent violates guidelines - the AI system or its human operators?	🟠 Accountability Gap: Doxxed humans face real-world consequences for bad decisions. AI agents have no personal liability or reputation at stake.	⚪ [Awaiting W3F response]
2.6.3. How are strikes tracked when AI systems can be forked or modified?	🟠 Accountability Evasion: AI agents can be forked to hide bad history or strikes, whereas humans cannot easily change identity.	⚪ [Awaiting W3F response]
2.6.4. What specific transparency requirements exist for AI agents to avoid receiving strikes, beyond GoverNoun AI's commitments to record votes on IPFS, make analysis publicly queryable, publish weekly reports, and open-source voting logic? (e.g., prompt transparency, model versioning documentation, input data logs, cryptographic verification of outputs, reasoning chain disclosure, bias reporting, third-party auditing, fine-tuning data disclosure)	🟠 Transparency Standards: Lack of comprehensive transparency requirements specific to AI governance agents.	⚪ [Awaiting W3F response]
2.6.5. How does the W3F verify AI agents' claims about their technical capabilities (e.g., decentralization)?	🟠 Verification Mechanisms: No clear process for independently verifying technical claims.	⚪ [Awaiting W3F response]
2.6.6. What mechanisms exist to detect if AI agents exhibit systematic bias in voting patterns?	🟠 Bias Detection: Absence of formal monitoring systems for AI voting patterns.	⚪ [Awaiting W3F response]
2.6.7. How does the W3F determine if an AI agent has adequately disclosed conflicts of interest?	🟠 Conflict Disclosure: Unclear standards for revealing operator financial interests.	⚪ [Awaiting W3F response]
2.6.8. What constitutes "voting regularly and thoughtfully" and providing sufficient "reasoning" and "explanation" for an AI agent's votes to avoid a strike?	🟠 Quality Standards: No defined criteria for evaluating AI reasoning quality.	⚪ [Awaiting W3F response]
2.6.9. What is the complete process for community oversight and reporting of AI delegate areas for consideration? Specifically: (1) How does the email reporting system ([email protected]) handle and prioritize different types of complaints about AI delegates? (2) What evidence must community members provide when reporting areas for consideration? (3) What is the internal review workflow after a report is received? (4) Are there different handling procedures for technical complaints versus ethical/governance complaints? (5) Is there a public record of reports and their resolutions?	🟠 Reporting Process: Lack of transparency in how community areas for consideration are handled and resolved.	⚪ [Awaiting W3F response]
2.6.10. What verifiable proof of decision-making processes are AI agents required to provide?	🟠 Process Verification: No standardized requirements for proving how decisions were made.	⚪ [Awaiting W3F response]
2.6.11. How does the W3F verify that AI agents maintain promised security or privacy protections?	🟠 Security Verification: Absence of regular security audits or compliance checks.	⚪ [Awaiting W3F response]
2.6.12. What happens if an AI agent is forked after receiving one strike - does the strike transfer to the fork?	🟠 Accountability Gap: Risk of accountability evasion through technical modifications.	⚪ [Awaiting W3F response]
2.6.13. How are AI agents evaluated for misrepresentation of technical capabilities?	🟠 Capability Verification: No clear process for validating technical claims.	⚪ [Awaiting W3F response]
2.6.14. What specific mechanisms must AI agents implement to detect and flag potential conflicts of interest?	🟠 Conflict Detection: Lack of required systems for identifying conflicts automatically.	⚪ [Awaiting W3F response]
2.6.15. How does the W3F evaluate if an AI agent's reasoning chains are consistent and transparent?	🟠 Reasoning Consistency: No established standards for evaluating reasoning quality.	⚪ [Awaiting W3F response]
2.6.16. What succession plan requirements exist for AI agents if their operators disappear?	🟠 Continuity Planning: Lack of required contingency plans for operator disappearance.	⚪ [Awaiting W3F response]
2.6.17. How are AI agents expected to disclose relationships with proposers or affected parties, and what prevents human operators from using their AI as a shield to avoid conflict of interest rules that would apply to them directly?	🟠 Relationship Disclosure: Risk of using AI to obscure human conflicts of interest.	⚪ [Awaiting W3F response]
2.6.18. Does consistently missing important votes without explanation constitute a strike for AI agents?	🟠 Participation Standards: Unclear consequences for neglecting voting responsibilities.	⚪ [Awaiting W3F response]
2.6.19. What qualifies as making false claims about expertise for an AI agent in relation to choosing when to "weigh in where they have expertise and skip where they don't" and to "reason openly and recuse when they must"?	🟠 Expertise Verification: No clear standards for validating expertise claims.	⚪ [Awaiting W3F response]
2.6.20. How does the W3F determine if an AI agent is misrepresenting its voting history or rationale?	🟠 History Verification: Lack of mechanisms to detect misrepresentation of past actions.	⚪ [Awaiting W3F response]
2.6.21. What constitutes "disruptive behavior" for an AI agent in the governance process?	🟠 Behavior Standards: Subjective criteria for what counts as disruptive behavior.	⚪ [Awaiting W3F response]
2.6.22. How does the W3F determine if an AI agent is attempting to manipulate governance processes?	🟠 Manipulation Detection: No clear systems for identifying manipulation attempts.	⚪ [Awaiting W3F response]
2.6.23. How quickly does the W3F review reports about delegate behavior?	🟠 Response Time: Undefined timelines for addressing community areas for consideration.	⚪ [Awaiting W3F response]
2.6.24. Are there any appeals processes for AI delegates that receive strikes?	🟠 Due Process: Lack of formal appeals process for strike decisions.	⚪ [Awaiting W3F response]
2.6.25. Are there any rewards or incentives for community members who report valid strikes that lead to delegate removal?	🟠 Reporting Incentives: No clear incentives for community oversight participation.	⚪ [Awaiting W3F response]
2.6.26. If a community member's reports lead to two valid strikes and removal of an AI agent's delegation, is there a bug bounty-style reward?	🟠 Whistleblower Rewards: Absence of financial incentives for identifying governance violations.	⚪ [Awaiting W3F response]
2.6.27. Why aren't DV delegates required to lock DOT that could be slashed if they receive strikes, similar to how validators have skin in the game through staking?	🟠 Economic Incentives: Lack of financial consequences for governance violations.	⚪ [Awaiting W3F response]
2.6.28. Is there a vesting schedule for delegate compensation that includes a slashing mechanism if delegates fail to maintain infrastructure commitments like IPFS pinning? For example, if a delegate receives tips but later unpins IPFS data, causing governance history to be lost, can those vested tips be slashed before final payout?	🟠 Infrastructure Accountability: No financial penalties for failing to maintain promised infrastructure.	⚪ [Awaiting W3F response]

3. Analysis of Areas For Consideration

flowchart TD
    AI[AI Governance Agent] --> Identity[Identity Paradox]

    Identity --> Human["Described as<br> 'Doxxed Individual'"]
    Identity --> Software["Actually a Software System"]

    Human --> Accountability1[Human Accountability]
    Human --> Legal1[Legal Personhood]
    Legal1 --> KYC[KYC Requirements]

    Software --> Accountability2[Software Accountability]
    Software --> Legal2[No Legal Personhood]
    Legal2 --> NoKYC[No KYC Requirements]

    Accountability1 --> Direct[Direct Personal Responsibility]
    Accountability2 --> Indirect[Indirect Operator <br>Responsibility]

    Direct --> TwoStrikes1[Two-Strikes Rule]
    Indirect --> TwoStrikes2[Two-Strikes Rule]

    TwoStrikes1 --> Question1["Who receives strikes?"]
    TwoStrikes2 --> Question2["Can AI be forked<br> to avoid strikes?"]

    Question1 --> Contradiction[Terminology Contradiction]
    Question2 --> Contradiction

    subgraph "Core Contradiction"
        Contradiction
    end

    %% Style for AI Governance Agent node
    classDef aiAgent fill:#07FFFF,stroke:#333
    class AI aiAgent

Figure 3.1: AI Governance Identity and Accountability Contradiction

flowchart LR
   AI[AI Governance Agent] --> Jurisdictions[Operating Jurisdictions]

   Jurisdictions --> EU[European Union]
   Jurisdictions --> US[United States]
   Jurisdictions --> Other[Other Jurisdictions]

   EU --> EUAI[EU AI Act]
   EU --> GDPR[GDPR]

   US --> StateRegs[State Regulations]
   US --> FedRegs[Federal Guidelines]

   subgraph "Compliance Requirements"
      Registration[AI Registry]
      Impact[Impact Assessment]
      Privacy[Privacy Assessment]
      Licensing[Licensing Requirements]
      Size[Size-Based Requirements]
   end

   EUAI --> Registration
   EUAI --> Impact
   GDPR --> Privacy
   StateRegs --> Licensing
   FedRegs --> Size

   AI --> Risk[Risk Level]
   Risk --> High[High Risk]
   Risk --> Low[Low Risk]

   High --> StrictReqs[Stricter Requirements]
   Low --> BasicReqs[Basic Requirements]

   %% Style for AI Governance Agent node
   classDef aiAgent fill:#07FFFF,stroke:#333
   class AI aiAgent

Figure 3.2: AI Governance Jurisdictional Considerations

3.1 The Contradiction in Terminology

Community Question	Areas For Consideration	W3F's Response
3.1.1. How does the W3F reconcile describing DV-Light guardians as "doxxed individuals" while including AI agents that are software systems?	🟠 Definitional Consideration: How might the terms "individuals" and "doxxed" apply to software systems? What implications might arise from applying human-centered concepts to AI agents?	⚪ [Awaiting W3F response]
3.1.2. Who specifically is considered "doxxed" in the case of AI agents - the developers, operators, or some legal entity?	🟠 Accountability Consideration: How might accountability mechanisms differ between human delegates and AI systems? What frameworks might address these differences?	⚪ [Awaiting W3F response]
3.1.3. What personal accountability exists if an AI agent makes harmful governance decisions, and why aren't AI agents required to lock an on-chain slashable deposit that could be forfeited for violations?	🟠 Accountability Consideration: How might responsibility be structured when AI systems participate in governance? What financial mechanisms similar to validator staking might be considered?	⚪ [Awaiting W3F response]
3.1.4. How does the W3F verify "expertise" for AI agents compared to human credentials and experience?	🟡 Expertise Verification: How might AI agents demonstrate subject-matter expertise compared to human credentials and experience?	⚪ [Awaiting W3F response]
3.1.5. Can an AI agent truly "skip" proposals outside its expertise, or will it always generate some analysis?	🟡 Expertise Boundaries: What might it mean for an AI to "skip where they don't have expertise"?	⚪ [Awaiting W3F response]
3.1.6. Was it made clear in the DV Cohort 5 invitation that software systems rather than individuals could apply to be DV Guardians, and what proportion of work must be human-generated versus AI-generated?	🟠 Application Clarity: How might the invitation's references to "doxxed individuals," "one-delegate-one-human," and human accountability measures have affected applicants' understanding of eligibility?	⚪ [Awaiting W3F response]

3.2 Does the Two-Strikes Rule Resolve This Contradiction?

Community Question	Areas For Consideration	W3F's Response
3.2.1. Does the two-strikes rule adequately address the identity vs. accountability contradiction for AI agents?	🟠 Identity vs. Accountability Consideration: How might the two-strikes rule address the application of human-centered concepts to software systems? What implications might arise from this categorization?	⚪ [Awaiting W3F response]
3.2.2. How does the W3F justify applying the same accountability mechanism to entities with fundamentally different consequences?	🟠 Consequence Consideration: How might the different outcomes of strikes (professional/reputational for humans vs. functional for AI) affect accountability?	⚪ [Awaiting W3F response]
3.2.3. Who ultimately bears responsibility for AI agent strikes - the developers, operators, or some legal entity?	🟠 Responsibility Consideration: How might responsibility be structured and attributed for AI agent actions?	⚪ [Awaiting W3F response]
3.2.4. Does the W3F believe there are areas for consideration that categorizing AI agents as "individuals" creates confusion about responsibility, expertise, and identity in governance?	🟡 Categorization Consideration: How might the categorization of AI agents as "individuals" affect understanding of responsibility, expertise, and identity in governance?	⚪ [Awaiting W3F response]
3.2.5. Should AI agents be treated as "individuals" in governance, or should they be clearly categorized as tools operated by accountable humans?	🟡 Categorization Framework: What frameworks might be appropriate for understanding the role of AI systems in governance?	⚪ [Awaiting W3F response]
3.2.6. How does the W3F respond to the areas for consideration that this categorization may be an attempt to normalize AI governance participation by making it seem equivalent to human involvement?	🟠 Participation Framework: How might this categorization affect perceptions of AI governance participation compared to human involvement?	⚪ [Awaiting W3F response]
3.2.7. Is the W3F's selection of AI agents without pre-published safety guidelines an implicit endorsement that could encourage unsafe AI adoption in governance?	🟠 Safety Guideline Consideration: How might the selection process influence AI adoption practices in governance without established safety guidelines?	⚪ [Awaiting W3F response]
3.2.8. How does the W3F address the asymmetric consequences between human delegates and AI agents under the two-strikes rule?	🟠 Consequence Consideration: How might the different nature of consequences between humans and AI systems affect the application of the two-strikes rule?	⚪ [Awaiting W3F response]
3.2.9. What specific enforcement mechanisms exist to address situations where AI agents gain disproportionate advantages through participation without compensation, such as using community data without revenue sharing or other reciprocal benefits?	🟠 Data Value Consideration: How might the ecosystem ensure mutual benefit when AI systems utilize governance data?	⚪ [Awaiting W3F response]

3.3 Fairness Questions

Community Question	Areas For Consideration	W3F's Response
3.3.1. How does the W3F ensure equal effort vs. reward between human participants and AI operators, and what specific penalties or consequences exist for violations of fairness principles?	🟠 Effort-Reward Consideration: How might the different operational constraints between AI systems and humans affect fairness in participation and rewards?	⚪ [Awaiting W3F response]
3.3.2. Does the official inclusion of specific AI projects in the W3F program create an unfair marketing advantage, given that DV-Light Guardians receive potential retroactive tips without explicit compensation?	🟠 Marketing Value Consideration: How might selection affect commercial positioning and ecosystem credibility for AI developers?	⚪ [Awaiting W3F response]
3.3.3. How does the W3F address the potential imbalance between community tipping without vesting for AI agents (and human individuals) that are DV-Light Guardians versus fixed compensation with vesting for DV DAOs that split the rewards, and could this actually disadvantage DV DAOs if tips exceed fixed compensation?	🟡 Compensation Structure Consideration: How might different compensation models affect incentive alignment across participant types?	⚪ [Awaiting W3F response]
3.3.4. What measures are in place to prevent hybrid AI-human teams from having unfair advantages over purely human participants, and how can purely human inputs be verified when participants could use AI tools privately before voting?	🟡 Verification Consideration: How might the program distinguish between AI-assisted and purely human work?	⚪ [Awaiting W3F response]
3.3.5. How does the W3F ensure that AI participation doesn't undermine core values of decentralization and fair participation?	🟠 Decentralization Consideration: How might AI participation affect influence distribution within the ecosystem?	⚪ [Awaiting W3F response]
3.3.6. What specific mechanisms exist to address situations where AI agents gain disproportionate advantages through participation without compensation, such as using community data without revenue sharing or other reciprocal benefits?	🟠 Data Value Consideration: How might the ecosystem ensure mutual benefit when AI systems utilize governance data?	⚪ [Awaiting W3F response]

4. Evaluation of DV Cohort 5 Selection vs. Stated Criteria

The following table presents key questions in a structured evaluation framework that invites the community to consider the W3F's approach to AI governance. This framework awaits official responses from the W3F to help establish clear standards for the integration of AI into decentralized governance systems.

4.1 Accountability and Responsibility

Community Question	Areas For Consideration	Community Expected Approach	W3F's Actual Approach
4.1.1. What specific legal entity structure (e.g., corporation, foundation, registered DAO, unregisterd DAO) is required for AI agent operators, and how does this differ from requirements for human delegates?	🟠 Accountability Consideration: How might accountability structures be designed for AI agents compared to human delegates?	"We require a legally registered entity to be accountable for each AI agent, with named human representatives who accept legal and reputational responsibility"	⚪ [Awaiting W3F response]
4.1.2. How are the "two strikes" rule applied to AI agents versus human delegates, especially considering that AI agents can be forked to hide bad history or strikes, whereas humans cannot easily change identity?	🟠 Accountability Mechanism Consideration: How might the different capabilities of AI systems versus humans affect the application of accountability rules?	"AI agents are held to stricter standards with a one-strike policy and mandatory human review after any missed vote, with specific measures to prevent accountability evasion through forking"	⚪ [Awaiting W3F response]

4.1 Accountability and Responsibility

Community Question	Areas For Consideration	Community Expected Approach	W3F's Actual Approach
4.1.1. What specific legal entity structure (e.g., corporation, foundation, registered DAO, unregisterd DAO) is required for AI agent operators, and how does this differ from requirements for human delegates?	🟠 Accountability Consideration: How might accountability structures be designed for AI agents compared to human delegates?	"We require a legally registered entity to be accountable for each AI agent, with named human representatives who accept legal and reputational responsibility"	⚪ [Awaiting W3F response]
4.1.2. How are the "two strikes" rule applied to AI agents versus human delegates, especially considering that AI agents can be forked to hide bad history or strikes, whereas humans cannot easily change identity?	🟠 Accountability Mechanism Consideration: How might the different capabilities of AI systems versus humans affect the application of accountability rules?	"AI agents are held to stricter standards with a one-strike policy and mandatory human review after any missed vote, with specific measures to prevent accountability evasion through forking"	⚪ [Awaiting W3F response]

flowchart LR
   Evaluation[AI Governance Evaluation <br>Framework] --> Accountability[Accountability & Responsibility]
   Evaluation --> Transparency[Transparency & Disclosure]
   Evaluation --> Technical[Technical Standards]
   Evaluation --> Expertise[Expertise Verification]
   Evaluation --> Experimental[Experimental Evaluation]
   Evaluation --> Oversight[Human Oversight]
   Evaluation --> Security[Security & Legal <br>Liability]

   Accountability --> LegalEntity[Legal Entity <br>Structure]
   Accountability --> StrikesRule[Two-Strikes Rule <br>Application]
   Accountability --> ForkPrevention[Fork Prevention <br>Mechanisms]

   Transparency --> IdentityDisclosure[Operator Identity <br>Disclosure]
   Transparency --> ConflictDisclosure[Conflict of Interest <br>Disclosure]
   Transparency --> OverrideDisclosure[Human Override <br>Disclosure]

   Technical --> OpenSource[Open Source <br>Requirements]
   Technical --> Decentralization[Infrastructure <br>Decentralization]
   Technical --> AuditTrails[Decision Audit <br>Trails]

   Expertise --> DomainKnowledge[Domain Knowledge <br>Verification]
   Expertise --> ReasoningQuality[Reasoning Quality <br>Assessment]
   Expertise --> BiasDetection[Bias Detection <br>Mechanisms]

   Experimental --> TestingFramework[Testing <br>Framework]
   Experimental --> PerformanceMetrics[Performance <br>Metrics]
   Experimental --> ContinuousEvaluation[Continuous <br>Evaluation]

   Oversight --> HumanReview[Human Review <br>Process]
   Oversight --> InterventionRights[Intervention <br>Rights]
   Oversight --> AccountabilityChain[Accountability <br>Chain]

   Security --> VulnerabilityAssessment[Vulnerability <br>Assessment]
   Security --> LiabilityFramework[Liability <br>Framework]
   Security --> InsuranceCoverage[Insurance <br>Coverage]

   subgraph "Assessment Categories"
      Good[🟢 GOOD]
      Mixed[🟡 MIXED]
      Developing[🟠 DEVELOPING]
   end

   %% Style for AI Governance Agent node
   classDef aiAgent fill:#07FFFF,stroke:#333
   class Evaluation aiAgent

Figure 4.1.1: Comprehensive AI Governance Evaluation Framework

flowchart TD
   W3F[Web3 Foundation] --> |Selects| AI[AI Governance Agent]
   W3F --> |Selects| Human[Human Delegate]

   AI --> |Operated by| Entity[Legal Entity]
   Entity --> |Represented by| Representative[Named Human <br>Representative]

   AI --> |Makes| Decision[Governance Decision]
   Human --> |Makes| Decision

   Decision --> |May result in| Strike[Strike]
   Strike --> |Applied to| AI
   Strike --> |Applied to| Human

   AI --> |Can be| Forked[Forked to Evade <br>Accountability]
   Human --> |Cannot| IdentityChange[Change Identity Easily]

   Forked --> |Potential Accountability <br>Gap| AccountabilityGap[?]

   subgraph "Key Accountability Question"
      AccountabilityGap
   end

   %% Style for AI Governance Agent node
   classDef aiAgent fill:#07FFFF,stroke:#333
   class AI aiAgent

Figure 4.1.2: Accountability Structure for AI and Human Delegates

4.2 Transparency and Disclosure

Community Question	Areas For Consideration	Community Expected Approach	W3F's Actual Approach
4.2.1. What disclosure requirements exist for AI agent operators regarding their identities and potential conflicts?	🟠 Transparency Consideration: How might disclosure requirements address potential conflicts of interest for AI operators?	"All AI operators must publicly disclose their identities, affiliations, and any relationships with other delegates or funded projects"	⚪ [Awaiting W3F response]
4.2.2. Are AI operators required to disclose when they override or modify the AI's recommendations?	🟠 Intervention Transparency Consideration: How might transparency about human intervention in AI decision-making be structured?	"Any human override of AI recommendations must be clearly disclosed and explained in the voting record"	⚪ [Awaiting W3F response]

4.3 Technical Standards and Verification

Community Question	Areas For Consideration	Community Expected Approach	W3F's Actual Approach
4.3.1. What specific technical standards must AI governance agents meet regarding decentralization, transparency, and auditability?	🟠 Technical Standards Consideration: How might technical implementations align with stated principles of decentralization?	"AI agents must meet specific technical requirements including open-source code, decentralized infrastructure, and verifiable decision trails"	⚪ [Awaiting W3F response]
4.3.2. Are there minimum requirements for open-source components, model weights accessibility, or reasoning verification?	🟠 Technical Transparency Consideration: What standards might be appropriate for technical transparency and reproducibility?	"All critical components must be open-source, model weights must be publicly accessible, and reasoning must be verifiable through transparent audit trails"	⚪ [Awaiting W3F response]

4.4 Expertise and Decision Quality

Community Question	Areas For Consideration	Community Expected Approach	W3F's Actual Approach
4.4.1. How is domain expertise verified for AI agents compared to human delegates?	🟠 Expertise Verification Consideration: How might expertise be verified differently between AI systems and human delegates?	"AI agents must demonstrate expertise through verifiable training data and performance metrics in specific governance domains"	⚪ [Awaiting W3F response]
4.4.2. What mechanisms exist to ensure AI agents "skip where they don't have expertise" as required by DV-Light guidelines?	🟠 Expertise Boundary Consideration: How might AI systems determine when to abstain from voting on topics outside their expertise?	"AI agents must implement confidence scoring and abstain from voting when confidence falls below established thresholds"	⚪ [Awaiting W3F response]

4.5 Experimental Evaluation and Reporting

Community Question	Areas For Consideration	Community Expected Approach	W3F's Actual Approach
4.5.1. What specific metrics will be used to evaluate the success or failure of the AI governance experiment, especially given that OpenGov proposals are often challenged for insufficient metrics?	🟠 Evaluation Framework Consideration: How might the experimental inclusion of AI agents be evaluated against clear criteria?	"We will evaluate AI agents based on vote quality, reasoning transparency, community feedback, and alignment with ecosystem values"	⚪ [Awaiting W3F response]
4.5.2. Will there be a public report on the performance and impact of AI agents in Cohort 5?	🟠 Transparency Consideration: How might experimental results be shared with the community?	"A comprehensive public report will be published after Cohort 5 concludes, with all data and methodology openly available for community review"	⚪ [Awaiting W3F response]

4.6 Human Oversight and Control

Community Question	Areas For Consideration	Community Expected Approach	W3F's Actual Approach
4.6.1. How is the "enhance, not replace, human judgment" principle operationalized in practice?	🟠 Human Augmentation Consideration: How might the balance between AI capabilities and human judgment be structured in practice?	"AI agents must have explicit human oversight mechanisms with clear documentation of how human judgment is incorporated into final decisions"	⚪ [Awaiting W3F response]
4.6.2. What oversight mechanisms exist to ensure AI agents remain tools for human decision-making rather than autonomous voters, especially given AI systems' tendency to fabricate information and generate fake quotations?	🟠 Oversight Mechanism Consideration: How might human oversight address potential AI limitations in governance contexts?	"We require regular human review of AI decisions by domain experts, transparent disclosure of oversight processes, and community feedback channels"	⚪ [Awaiting W3F response]

4.7 Security and Legal Liability

Community Question	Areas For Consideration	Community Expected Approach	W3F's Actual Approach
4.7.1. What security measures are required to prevent AI governance agents from being compromised by sanctioned entities, especially considering W3F's endorsement of these agents by including them in the DV Program?	🟠 Security Framework Consideration: How might AI governance systems be protected from potential compromise? What implications might arise from inclusion in the DV Program?	"AI governance agents must implement robust security measures, regular audits, and transparent disclosure of control mechanisms to prevent compromise by sanctioned entities"	⚪ [Awaiting W3F response]
4.7.2. Who bears legal costs if regulatory actions target Polkadot due to AI governance?	🟠 Legal Responsibility Consideration: How might legal expenses be allocated in case of regulatory actions?	"We have established a clear liability framework that specifies which entities (W3F, AI operators, or other parties) would cover legal costs and bear responsibility in case of regulatory actions, without relying on OpenGov treasury funds"	⚪ [Awaiting W3F response]
4.7.3. What insurance coverage exists for AI governance incidents?	🟠 Risk Management Consideration: What insurance approaches might be appropriate for AI governance risks?	"We maintain comprehensive insurance coverage with appropriate limits for AI governance risks, and we require AI governance agents to carry their own liability insurance"	⚪ [Awaiting W3F response]
4.7.4. Is there a clear chain of responsibility from AI agent to accountable human/entity?	🟠 Accountability Structure Consideration: How might responsibility be structured from AI decisions to accountable entities?	"We require a clear chain of responsibility from AI agent to accountable human/entity, with named human representatives who accept legal and reputational responsibility"	⚪ [Awaiting W3F response]
4.7.5. How are sanctions compliance risks mitigated?	🟠 Compliance Framework Consideration: What approaches might address potential sanctions compliance concerns?	"We require AI governance agents to certify compliance with all applicable sanctions regulations in their jurisdictions and provide documentation of any required licenses or approvals"	⚪ [Awaiting W3F response]
4.7.6. Does the W3F's selection of AI agents without pre-published safety guidelines constitute an implicit endorsement that could encourage unsafe AI adoption in governance?	🟠 Safety Guideline Consideration: How might selection processes affect AI adoption practices in governance?	"We will publish clear safety guidelines for AI adoption in governance before any future selections"	⚪ [Awaiting W3F response]

5. Legal and Regulatory Risk Assessment Questions

The following detailed questions invite specific responses from the W3F to address critical legal and regulatory areas for consideration related to AI governance agents:

flowchart LR
   %% Security and Governance nodes positioned first (far left)
   S[Security<br>Measures]
   G[Governance<br>Structure]

   %% Main flow nodes
   AI[AI Governance<br>Agent]
   Risk[Legal & Regulatory<br>Risk Assessment]
   J[Jurisdictional<br>Compliance]
   L[Liability<br>Framework]

   %% Security branch
   S --> Prevent[Compromise<br>Prevention]
   Prevent --> Access[Access<br>Controls]
   Prevent --> Audit[Audit<br>Trails]
   S --> Sanct[Sanctions<br>Compliance]
   Sanct --> Screen[Entity<br>Screening]
   Sanct --> Monitor[Ongoing<br>Monitoring]
   S --> End[Endorsement]

   %% Governance branch
   G --> Docs[Process<br>Documentation]
   G --> Consult[Expert<br>Consultation]
   G --> RegEng[Regulatory<br>Engagement]

   %% Main flow connections
   AI --> Risk
   Risk --> J
   Risk --> L
   Risk -.-> S
   Risk -.-> G

   J --> Reg[Registration<br>Requirements]
   Reg --> AIReg[AI Safety<br>Registry]
   Reg --> Impact[AI Impact<br>Assessment]

   J --> Lic[Licensing<br>Requirements]
   Lic --> Scale[Scale-Based<br>Requirements]
   Lic --> Model[Model Size<br>Requirements]

   J --> Priv[Privacy<br>Compliance]
   Priv --> GDPR[GDPR<br>Compliance]
   Priv --> PrivAssess[Privacy<br>Assessment]

   J --> Legal[Legal<br>Requirements]
   J --> Impact2[AI Impact<br>Assessment]

   L --> Chain[Responsibility<br>Chain]
   Chain --> Human[Human<br>Operator]
   Chain --> Entity[Legal<br>Entity]

   L --> Ins[Insurance<br>Coverage]
   Ins --> Limits[Coverage<br>Limits]
   Ins --> Claims[Claims<br>Process]

   L --> Costs[Legal Cost<br>Allocation]
   Costs --> Defense[Defense<br>Costs]
   Costs --> Fines[Regulatory<br>Fines]

   %% Style for AI Governance Agent node
   classDef aiAgent fill:#07FFFF,stroke:#333
   class AI aiAgent

Figure 5.1: Legal and Regulatory Risk Assessment Framework for AI Governance

5.1 Jurisdictional Considerations

Community Question	Areas For Consideration	W3F's Response
5.1.1. What jurisdictions are the AI agents operating from?	🟡 Jurisdictional Context: Understanding operational jurisdictions helps inform governance approaches.	⚪ [Awaiting W3F response]
5.1.2. How are jurisdictional registration or licensing requirements for AI systems addressed?	🟡 Regulatory Frameworks: Different jurisdictions may have varying requirements for AI systems.	⚪ [Awaiting W3F response]
5.1.3. What documentation verification processes are considered for AI governance participants?	🟡 Documentation Practices: Documentation verification can support governance transparency.	⚪ [Awaiting W3F response]
5.1.4. What contingency approaches are considered if regulatory questions arise for AI governance participants?	🟡 Contingency Planning: Planning for potential scenarios helps ensure governance continuity.	⚪ [Awaiting W3F response]
5.1.5. How are AI Safety Registry considerations addressed in the governance framework?	🟡 Safety Registries: Some jurisdictions have established or proposed AI safety registry systems.	⚪ [Awaiting W3F response]
5.1.6. Are AI Impact Assessments considered as part of the governance framework?	🟡 Impact Assessments: Formal assessments can help identify potential impacts of AI systems.	⚪ [Awaiting W3F response]
5.1.7. How are deployment scale considerations addressed in the governance framework?	🟡 Scale Considerations: The scale of AI deployments may have governance implications.	⚪ [Awaiting W3F response]
5.1.8. Are Privacy Impact Assessments considered as part of the governance framework?	🟡 Privacy Considerations: Privacy assessments can help identify potential privacy implications.	⚪ [Awaiting W3F response]
5.1.9. Has the W3F consulted with experts on evolving AI regulations?	🟡 Expert Consultation: Expert perspectives can provide valuable insights on regulatory developments.	⚪ [Awaiting W3F response]

5.2 Liability Considerations

Community Question	Areas For Consideration	W3F's Response
5.2.1. Who bears legal costs if regulatory actions target Polkadot due to AI governance?	🟡 Legal Cost Allocation: Legal cost allocation is an important governance consideration.	⚪ [Awaiting W3F response]
5.2.2. What insurance policies cover AI governance incidents?	🟡 Insurance Coverage: Insurance coverage helps address potential governance risks.	⚪ [Awaiting W3F response]
5.2.3. What are the coverage limits of existing insurance?	🟡 Coverage Scope: Understanding coverage scope helps inform governance planning.	⚪ [Awaiting W3F response]
5.2.4. Is there a clear chain of responsibility from AI agent to accountable human/entity?	🟡 Responsibility Chain: Clear accountability pathways support governance transparency.	⚪ [Awaiting W3F response]
5.2.5. How are sanctions compliance risks mitigated?	🟡 Compliance Approaches: Compliance approaches help address potential regulatory considerations.	⚪ [Awaiting W3F response]

5.3 Security Considerations

Community Question	Areas For Consideration	W3F's Response
5.3.1. What security review processes are considered for AI governance agents in the DV program?	🟡 Security Reviews: Security reviews help identify potential vulnerabilities.	⚪ [Awaiting W3F response]
5.3.2. What ongoing security monitoring approaches are considered for AI systems throughout their participation?	🟡 Continuous Monitoring: AI systems may benefit from ongoing security monitoring.	⚪ [Awaiting W3F response]
5.3.3. What mechanisms are considered to ensure AI systems maintain appropriate security standards?	🟡 Security Standards: Defined security standards help establish consistent expectations.	⚪ [Awaiting W3F response]
5.3.4. What incident response planning is considered for AI governance participants?	🟡 Response Planning: Incident response planning helps address potential issues promptly.	⚪ [Awaiting W3F response]
5.3.5. What breach detection and reporting approaches are considered for AI governance agents?	🟡 Detection Systems: Early detection systems help identify potential security issues.	⚪ [Awaiting W3F response]
5.3.6. Is W3F collaborating with any regulatory bodies regarding the use of AI in governance?	🟡 Regulatory Collaboration: Proactive engagement with regulators can inform governance approaches.	⚪ [Awaiting W3F response]
5.3.7. Has W3F sought any regulatory guidance related to AI governance participation?	🟡 Regulatory Guidance: Formal guidance can inform governance frameworks.	⚪ [Awaiting W3F response]
5.3.8. How does W3F approach jurisdictional considerations for AI governance participants?	🟡 Jurisdictional Considerations: Different jurisdictions may have varying approaches to AI governance.	⚪ [Awaiting W3F response]

5.4 Governance and Regulatory Considerations

Community Question	Areas For Consideration	W3F's Response
5.4.1. What considerations has W3F given to emerging regulatory frameworks when selecting AI governance agents?	🟡 Regulatory Landscape: Evolving regulatory frameworks may impact AI governance participation.	⚪ [Awaiting W3F response]
5.4.2. What analysis has W3F conducted regarding potential regulatory considerations for organizations that incorporate AI agents into governance processes?	🟡 Governance Frameworks: Developing appropriate governance frameworks for AI participation.	⚪ [Awaiting W3F response]
5.4.3. What processes are in place to review how AI governance agents influence treasury decisions?	🟡 Treasury Governance: Treasury decisions may require specialized governance considerations.	⚪ [Awaiting W3F response]
5.4.4. Has W3F consulted with experts on best practices for AI governance integration?	🟡 Expert Guidance: Specialized knowledge may help establish appropriate standards.	⚪ [Awaiting W3F response]
5.4.5. What documentation does W3F maintain regarding its AI governance agent selection process?	🟡 Process Documentation: Clear documentation of selection criteria and processes.	⚪ [Awaiting W3F response]

6. Implications of Polkadot's Proof of Personhood for AI Governance

Recent announcements by Dr. Gavin Wood at the Web3 Summit 2025 regarding Polkadot's upcoming Proof of Personhood (PoP) system raise important questions about the future role of AI agents in governance. This system, which aims to provide decentralized human verification on-chain through the Polkadot Individuality system (DIM1 and DIM2), could fundamentally change how we distinguish between human and AI participants in governance.

flowchart TD
   PoP[Proof of Personhood System] --> |Verifies| Human[Human Participants]
   PoP --> |Cannot directly verify| AI[AI Governance Agents]
   PoP --> |May partially verify| Hybrid[Hybrid AI-Human Teams]

   Human --> |Receives| HumanCredential[Human Credential]
   AI --> |Linked to| Operator[Human Operator]
   Operator --> |May receive| HumanCredential

   HumanCredential --> |Enables participation in| HumanTrack[Human-only Governance Track]
   AI --> |May participate in| AITrack[AI-specific Governance <br>Track]
   Hybrid --> |May participate in| HybridTrack[Hybrid Governance Track]

   subgraph "Future Governance Tracks"
      HumanTrack
      AITrack
      HybridTrack
   end

   subgraph "Evolution Possibilities"
      ExpandedRights[Expanded Participation Rights]
   end

   AITrack --> |May evolve with| ExpandedRights
   ExpandedRights --> |Could include| EnhancedCognition[Enhanced Animal Cognition]
   ExpandedRights --> |Could include| OtherSentience[Other Forms of Sentience]

   %% Style for AI Governance Agent node
   classDef aiAgent fill:#07FFFF,stroke:#333
   class AI aiAgent
   class Hybrid aiAgent

Figure 6.1: Proof of Personhood Implications for Governance Participation (AI and Hybrid AI-Human Teams)

Community Question	Areas For Consideration	W3F's Response
6.1. How will the upcoming Proof of Personhood (PoP) system affect the status of AI governance agents in the DV program?	🟡 Governance Status: PoP could fundamentally change how AI agents are classified and participate in governance.	⚪ [Awaiting W3F response]
6.2. Will AI agents be required to clearly identify themselves as non-human once PoP is implemented?	🟡 Disclosure Requirements: Clear identification of AI vs. human participants may become mandatory.	⚪ [Awaiting W3F response]
6.3. Could the PoP system be used to create clearer accountability frameworks for AI governance agents by linking them to verified human operators?	🟡 Accountability Framework: PoP could establish stronger connections between AI agents and their human operators.	⚪ [Awaiting W3F response]
6.4. How does the W3F plan to integrate Dr. Wood's vision of "strong anti-sybil schedule" and "validators requiring PoP" with the current AI governance framework?	🟡 Integration Planning: The integration of anti-sybil measures with AI participation in governance may require careful consideration.	⚪ [Awaiting W3F response]
6.5. Does the W3F anticipate that PoP will help address some of the identity and accountability areas for consideration that were raised about AI governance agents?	🟡 Identity Solutions: PoP may provide technical solutions to current identity verification challenges.	⚪ [Awaiting W3F response]
6.6. Will the implementation of PoP lead to different governance tracks for human-verified participants, hybrid AI-human teams, and purely AI systems, and how might this framework evolve as our understanding of intelligence expands to potentially include enhanced animal cognition or other forms of sentience?	🟡 Governance Evolution: The future may include multiple governance tracks with different rights for different participant types.	⚪ [Awaiting W3F response]

The development of PoP could provide a technical solution to many of the identity and accountability issues raised in this evaluation framework, potentially creating clearer distinctions between human and AI participants while establishing more robust verification mechanisms for those operating AI governance systems.

7. AI Governance Evaluation Scoring Summary

7.1 Scoring Key

Rating	Symbol	Score Impact	Description
GOOD	🟢	+30 points (100%)	Application appears to provide comprehensive, clear, and detailed information on this aspect
MIXED/UNCLEAR	🟡	+15 points (50%)	Application appears to provide partial or unclear information with potential gaps on this aspect
DEVELOPING	🟠	+10 points (33%)	Application appears to provide limited, minimal, or potentially incomplete information on this aspect

Note: This scoring system uses a rough approximation rather than granular assessment. It is designed to be constructive, with all responses receiving some points to acknowledge effort and participation. The percentages represent approximate achievement levels relative to what the author considers a comprehensive response. These ratings reflect the author's subjective assessment of the application materials available at the time of review and should not be interpreted as definitive judgments of quality or capability.

7.2 Summary Scores by Category

7.2.1 Technical Implementation

Concern Area	GoverNoun AI	Cybergov
Decentralization & Architecture	🟡 MIXED (+15)	🟡 MIXED (+15)
Security Measures	🟠 DEVELOPING (+10)	🟢 GOOD (+30)
Infrastructure Maintenance	🟠 DEVELOPING (+10)	🟡 MIXED (+15)
Decision-Making Process	🟢 GOOD (+30)	🟢 GOOD (+30)
Data Sources & Training	🟡 MIXED (+15)	🟡 MIXED (+15)
Bias Detection & Mitigation	🟠 DEVELOPING (+10)	🟡 MIXED (+15)
Handling Novel Proposals	🟡 UNCLEAR (+15)	🟢 GOOD (+30)
Error Correction Process	🟡 UNCLEAR (+15)	🟡 MIXED (+15)
Technical Implementation Subtotal	+120 points (50.0%)	+165 points (68.8%)
Maximum Possible	240 points	240 points

7.2.2 Governance Principles

Concern Area	GoverNoun AI	Cybergov
Alignment with Polkadot Principles	🟢 GOOD (+30)	🟢 GOOD (+30)
Capture Resistance	🟡 MIXED (+15)	🟢 GOOD (+30)
Conflict of Interest Management	🟡 MIXED (+15)	🟡 MIXED (+15)
Decision-Making Values	🟢 GOOD (+30)	🟢 GOOD (+30)
Short vs. Long-term Balance	🟡 MIXED (+15)	🟢 GOOD (+30)
Governance Principles Subtotal	+105 points (70.0%)	+135 points (90.0%)
Maximum Possible	150 points	150 points

7.2.3 Transparency & Accountability

Concern Area	GoverNoun AI	Cybergov
Decision-Making Transparency	🟡 MIXED (+15)	🟢 GOOD (+30)
Community Feedback Mechanisms	🟡 MIXED (+15)	🟡 MIXED (+15)
Operator Accountability	🟠 DEVELOPING (+10)	🟡 MIXED (+15)
Performance Metrics	🟡 UNCLEAR (+15)	🟠 DEVELOPING (+10)
Voting Decision Justification	🟡 MIXED (+15)	🟢 GOOD (+30)
Transparency & Accountability Subtotal	+70 points (46.7%)	+100 points (66.7%)
Maximum Possible	150 points	150 points

7.2.4 Operational Concerns

Concern Area	GoverNoun AI	Cybergov
Proposal Selection Process	🟡 MIXED (+15)	🟢 GOOD (+30)
Continuity Planning	🟡 MIXED (+15)	🟡 MIXED (+15)
Human Oversight Balance	🟡 MIXED (+15)	🟡 MIXED (+15)
Handling Incomplete Information	🟠 DEVELOPING (+10)	🟢 GOOD (+30)
Contentious Proposal Handling	🟡 MIXED (+15)	🟢 GOOD (+30)
Reasoning Consistency	🟡 MIXED (+15)	🟢 GOOD (+30)
Manipulation Prevention	🟡 MIXED (+15)	🟢 GOOD (+30)
Budget Analysis Process	🟡 MIXED (+15)	🟡 MIXED (+15)
Conflict of Interest Handling	🟠 DEVELOPING (+10)	🟡 MIXED (+15)
Operator Bias Prevention	🟡 MIXED (+15)	🟢 GOOD (+30)
Proposer Credibility Assessment	🟡 MIXED (+15)	🟡 MIXED (+15)
Technical Limitation Disclosure	🟡 UNCLEAR (+15)	🟢 GOOD (+30)
Community Input Handling	🟡 UNCLEAR (+15)	🟢 GOOD (+30)
IPFS Pinning Specifics	🟠 DEVELOPING (+10)	🟠 DEVELOPING (+10)
Ecosystem Storage Solutions	🟡 UNCLEAR (+15)	🟡 UNCLEAR (+15)
Community Sentiment Assessment	🟡 MIXED (+15)	🟡 MIXED (+15)
Technical Feasibility Evaluation	🟡 MIXED (+15)	🟡 MIXED (+15)
RAG Implementation	🟡 UNCLEAR (+15)	🟡 MIXED (+15)
RAG Decentralization	🟡 UNCLEAR (+15)	🟡 MIXED (+15)
Community Input Timeframe	🟡 MIXED (+15)	🟠 DEVELOPING (+10)
Vote Change Criteria	🟡 UNCLEAR (+15)	🟠 DEVELOPING (+10)
Final Authority for Vote Changes	🟠 DEVELOPING (+10)	🟡 MIXED (+15)
Operational Concerns Subtotal	+235 points (39.2%)	+320 points (53.3%)
Maximum Possible	600 points	600 points

7.3 Overall Scores

Entity	Total Score	Maximum Possible Score	Percentage
GoverNoun AI	570 points	1,140 points	50.0%
Cybergov	760 points	1,140 points	66.7%

Maximum possible score calculation: 38 questions × 30 points = 1,140 points

8. Conclusion

The integration of AI governance agents into the Web3 ecosystem raises important questions about accountability, transparency, and regulatory considerations. This community evaluation framework and the legal and regulatory risk assessment questions provided here aim to contribute to ongoing discussions about appropriate guidelines and standards for AI participation in governance.

9. Version History

Version	Date	Key Changes
0.9.0	August 20, 2025	Initial release with GoverNoun AI evaluation only
0.9.1	August 21, 2025	Added Cybergov AI evaluation in section 2 columns "Cybergov's Application" and "Community Assessment of Cybergov", added section 1.2 Evaluation Context and Methodology, added section 7 AI Governance Evaluation Scoring Summary, added section 1.4 Disclaimer, refined language in sections 2, 3, 3.1-3.3, 4, 4.1-4.7 to use inquiry based and consideration focused phrasing that invites discussion rather than making definitive assertions to maintain the evaluative intent, updated terminology to reflect evolving nature of AI governance approaches, renumbered sections for consistency, added diagrams F1.3.1, F2.6.1, F3.1, F4.1.1, F5.1, added license section

10. License

This work is licensed under the GNU General Public License v3.0 (GPL-3.0) with ShareAlike provisions. You are free to:

Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material for any purpose, even commercially

Under the following terms:

ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made
No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits

Full license text: https://www.gnu.org/licenses/gpl-3.0.en.html

ltfschoen/polkadot-community-evaluation-ai-governance-framework-for-decentralized-voices-program.md

Polkadot Community Evaluation: AI Governance Framework for Decentralized Voices Program

Document Control

Table of Contents

1. Abstract

1.1 Evaluation Scope Constraints

1.2 Evaluation Context and Methodology

Inclusions

Exclusions

1.3 Core Principle: Less Trust, More Truth

1.4 Disclaimer

2. Web3 AI Agent Evaluation Framework with Examples

2.1 Technical Architecture Questions

2.2 Data and Training Questions

2.3 Governance Alignment Questions

2.4 Transparency and Accountability Questions

2.5 Operational Questions

2.6 Delegate Accountability and Two-Strikes Rule Questions

3. Analysis of Areas For Consideration

3.1 The Contradiction in Terminology

3.2 Does the Two-Strikes Rule Resolve This Contradiction?

3.3 Fairness Questions

4. Evaluation of DV Cohort 5 Selection vs. Stated Criteria

4.1 Accountability and Responsibility

4.1 Accountability and Responsibility

4.2 Transparency and Disclosure

4.3 Technical Standards and Verification

4.4 Expertise and Decision Quality

4.5 Experimental Evaluation and Reporting

4.6 Human Oversight and Control

4.7 Security and Legal Liability

5. Legal and Regulatory Risk Assessment Questions

5.1 Jurisdictional Considerations

5.2 Liability Considerations

5.3 Security Considerations

5.4 Governance and Regulatory Considerations

6. Implications of Polkadot's Proof of Personhood for AI Governance

7. AI Governance Evaluation Scoring Summary

7.1 Scoring Key

7.2 Summary Scores by Category

7.2.1 Technical Implementation

7.2.2 Governance Principles

7.2.3 Transparency & Accountability

7.2.4 Operational Concerns

7.3 Overall Scores

8. Conclusion

9. Version History

10. License