@orenyomtov
Created April 14, 2026 06:58
AutoGen Security Research Report - 8 findings including 3 CRITICAL

AutoGen Security Research Report

Executive Summary

This security research analyzed Microsoft's AutoGen multi-agent framework for vulnerabilities. The analysis identified 8 security findings, including 3 CRITICAL severity issues that could lead to arbitrary code execution.

Scope

  • Repository: https://github.com/microsoft/autogen
  • Packages Analyzed:
    • autogen-core (Core message passing and agent runtime)
    • autogen-agentchat (High-level multi-agent API)
    • autogen-ext (Extensions: code executors, MCP tools, HTTP tools)
    • autogen-studio (Web UI and API server)
    • Experimental packages

Key Findings Summary

| ID   | Finding                                         | Severity | Type      |
|------|-------------------------------------------------|----------|-----------|
| F001 | Arbitrary Code Execution via FunctionTool Config | CRITICAL | RCE       |
| F002 | Weak Code Sanitization in Approval Function     | HIGH     | Bypass    |
| F003 | LocalCommandLineCodeExecutor No Sandbox         | CRITICAL | RCE       |
| F004 | MCP StdioServerParams Command Injection         | CRITICAL | RCE       |
| F005 | Malicious Content in Agent Messages             | MEDIUM   | Injection |
| F006 | HTTP Tool SSRF Risk                             | MEDIUM   | SSRF      |
| F007 | Unsafe Pickle Deserialization                   | CRITICAL | RCE       |
| F008 | No Validation of WebSocket Team Config          | HIGH     | RCE       |

Critical Issues Requiring Immediate Action

F001 & F007: Arbitrary Code Execution

The framework has multiple paths to arbitrary code execution:

  1. FunctionTool._from_config() uses exec() on untrusted source code
  2. Memory Bank uses pickle.load() on potentially untrusted files

Recommendation: Replace with safe alternatives (AST parsing, JSON serialization).
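
As a minimal sketch of the JSON half of this recommendation, the example below stores and reloads plain records without pickle. The `MemoRecord` type, its fields, and the `save_records`/`load_records` helpers are hypothetical and do not reflect AutoGen's actual Memory Bank schema; the point is only that `json.loads` yields plain data, which can then be checked against an expected shape before use.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class MemoRecord:
    """Hypothetical record type; not AutoGen's actual memory schema."""
    topic: str
    insight: str


def save_records(path: Path, records: list[MemoRecord]) -> None:
    # JSON can only encode plain data, so nothing executable is written.
    path.write_text(json.dumps([asdict(r) for r in records]), encoding="utf-8")


def load_records(path: Path) -> list[MemoRecord]:
    # Unlike pickle.load, json.loads never runs code taken from the file.
    raw = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(raw, list):
        raise ValueError("expected a JSON list of records")
    records = []
    for item in raw:
        # Accept only objects shaped exactly like {"topic": str, "insight": str}.
        if not isinstance(item, dict) or set(item) != {"topic", "insight"}:
            raise ValueError(f"unexpected record shape: {item!r}")
        if not all(isinstance(value, str) for value in item.values()):
            raise ValueError("record fields must be strings")
        records.append(MemoRecord(**item))
    return records
```

A tampered file can still contain wrong data, but under this approach the worst outcome is a `ValueError` or a bad string, not code execution at load time.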

F003 & F004: Command Injection

  • LocalCommandLineCodeExecutor has no actual command filtering
  • MCP StdioServerParams allows arbitrary commands

Recommendation: Implement command allowlisting and sandboxing.
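
One possible shape for command allowlisting is sketched below. The `ALLOWED_EXECUTABLES` set and the `parse_allowed_command` helper are hypothetical names chosen for illustration and are not part of AutoGen; the sketch assumes the caller then runs the returned argument list without a shell.

```python
import shlex

# Hypothetical allowlist; a real deployment would choose its own entries.
ALLOWED_EXECUTABLES = frozenset({"ls", "cat", "git"})


def parse_allowed_command(command: str, allowed=ALLOWED_EXECUTABLES) -> list[str]:
    """Split a command into argv and reject it unless argv[0] is allowlisted."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    # Compare the exact executable name, so paths such as /tmp/git are rejected.
    if argv[0] not in allowed:
        raise PermissionError(f"executable not allowed: {argv[0]!r}")
    return argv
```

The returned list is meant to be passed to `subprocess.run(argv, shell=False)`, so separators such as `;` or `&&` reach the program as literal arguments instead of being interpreted by a shell. An allowlist does not replace sandboxing: allowing an interpreter such as `python` would still permit arbitrary code.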

F008: Unvalidated Team Configs

WebSocket team configs bypass validation and can trigger RCE.

Recommendation: Validate all team configs before loading.
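
A rough sketch of such a check follows. It assumes team configs arrive as nested JSON objects in which each component names its implementation in a `provider` string; the allowlist contents and the `find_disallowed_providers` helper are illustrative, not an existing AutoGen API.

```python
from typing import Any

# Hypothetical allowlist of component providers the server is willing to load.
ALLOWED_PROVIDERS = frozenset({
    "autogen_agentchat.teams.RoundRobinGroupChat",
    "autogen_agentchat.agents.AssistantAgent",
})


def find_disallowed_providers(node: Any, allowed=ALLOWED_PROVIDERS) -> list[str]:
    """Return every `provider` value in a nested config that is not allowlisted."""
    found: list[str] = []
    if isinstance(node, dict):
        if "provider" in node:
            provider = node["provider"]
            # Non-string providers are treated as disallowed too.
            if not isinstance(provider, str) or provider not in allowed:
                found.append(str(provider))
        for value in node.values():
            found.extend(find_disallowed_providers(value, allowed))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_disallowed_providers(item, allowed))
    return found
```

A server could run this on every config received over the WebSocket and refuse to load the team when the returned list is non-empty, which would stop a nested component that carries source code from ever being instantiated.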

Risk Model Notes

AutoGen is designed with code execution as a core feature. This fundamentally changes the threat model:

  • HIGH RISK: Local execution, untrusted configs, sandbox escapes
  • MEDIUM RISK: SSRF, prompt injection, context pollution
  • DESIGN LIMITATION: Complete isolation is not achievable without disabling features

Recommendations Summary

Short Term

  1. Document security boundaries clearly
  2. Add security warnings to dangerous APIs
  3. Implement command allowlisting
  4. Replace pickle with JSON

Medium Term

  1. Add sandboxed execution modes
  2. Implement component validation for all config sources
  3. Add audit logging for sensitive operations
  4. Create security best practices guide

Long Term

  1. Consider AST-based code analysis instead of string matching
  2. Implement multi-tenant isolation for production deployments
  3. Add optional security-hardened execution modes
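
To illustrate the first long-term item, the sketch below inspects Python source through the standard `ast` module instead of searching the raw text. The deny lists and the `find_risky_constructs` helper are assumptions made for this example and do not describe AutoGen's current approval logic.

```python
import ast

# Illustrative deny lists; not an exhaustive or authoritative policy.
RISKY_MODULES = frozenset({"os", "subprocess", "shutil"})
RISKY_CALLS = frozenset({"exec", "eval", "__import__"})


def find_risky_constructs(source: str) -> list[str]:
    """Report risky imports and direct calls found in the parsed syntax tree."""
    findings = []
    # ast.parse raises SyntaxError for code that does not parse.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in RISKY_MODULES:
                    findings.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in RISKY_MODULES:
                findings.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append(f"call {node.func.id}()")
    return findings
```

Because the check runs on the syntax tree, unusual spacing or aliasing in an import statement does not hide it, and a string literal that merely contains the text `import os` is not flagged. Static analysis of this kind can still be evaded by dynamic tricks such as `getattr`, so it complements sandboxing and does not replace it.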

Conclusion

AutoGen is a powerful but inherently high-risk framework due to its code execution capabilities. The identified vulnerabilities primarily stem from:

  1. Trusting user-provided configurations
  2. Using unsafe serialization in experimental features
  3. Inadequate sandboxing by default

Users should NOT deploy AutoGen with untrusted inputs without additional security controls.


Research conducted: 2025-04-14
Target: microsoft/autogen (main branch)
