Java Code Execution in LangGraph/LangChain

Overview

Adding Java code execution capabilities to LangGraph or LangChain workflows enables agents to write, compile, and run Java code dynamically. This pattern extends the reasoning capabilities of LLM agents with the ability to execute Java-specific operations, interact with JVM-based libraries, and leverage enterprise Java ecosystems.

graph TD
    A[LLM Agent] -->|Generates| B[Java Code]
    B --> C[Java Execution Tool]
    C -->|Compile| D[javac]
    D -->|Run| E[JVM]
    E -->|Output| F[Execution Results]
    F -->|Return to| A
    
    subgraph "Java Runtime Environment"
        D
        E
    end

Implementation Options

1. Custom Java Execution Tool

This approach creates a dedicated tool that can be integrated into any LangChain or LangGraph workflow:

from typing import Dict, Any, Optional
from langchain_core.tools import BaseTool
import subprocess
import tempfile
import os
import uuid
import json
import re

class JavaCodeExecutionTool(BaseTool):
    name: str = "java_code_executor"
    description: str = """
    Executes Java code and returns the output. The code should be a complete Java program with a main method.
    For example:
    ```java
    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println("Hello, World!");
        }
    }
    ```
    """
    
    java_home: Optional[str] = None
    timeout: int = 30  # seconds
    memory_limit: str = "512m"
    save_artifacts: bool = False
    artifacts_dir: str = "./java_artifacts"
    
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Use environment JAVA_HOME if not specified
        if not self.java_home:
            self.java_home = os.environ.get("JAVA_HOME")
            if not self.java_home:
                raise ValueError("JAVA_HOME not found in environment and not specified")
        
        # Create artifacts directory if needed
        if self.save_artifacts and not os.path.exists(self.artifacts_dir):
            os.makedirs(self.artifacts_dir)
    
    def _run(self, code: str) -> Dict[str, Any]:
        """Run Java code and return the result."""
        unique_id = str(uuid.uuid4())[:8]
        
        # Extract class name from code
        class_match = re.search(r"public\s+class\s+(\w+)", code)
        if not class_match:
            return {"success": False, "error": "No public class found in code"}
        
        class_name = class_match.group(1)
        file_name = f"{class_name}.java"
        
        # Create temporary directory for compilation
        with tempfile.TemporaryDirectory() as tmpdir:
            # Save code to file
            java_file_path = os.path.join(tmpdir, file_name)
            with open(java_file_path, "w") as f:
                f.write(code)
            
            # Save artifact if configured
            if self.save_artifacts:
                artifact_path = os.path.join(self.artifacts_dir, f"{class_name}_{unique_id}.java")
                with open(artifact_path, "w") as f:
                    f.write(code)
            
            # Compile Java file
            javac_path = os.path.join(self.java_home, "bin", "javac")
            compile_result = subprocess.run(
                [javac_path, file_name],
                cwd=tmpdir,
                capture_output=True,
                text=True,
                timeout=self.timeout
            )
            
            if compile_result.returncode != 0:
                return {
                    "success": False,
                    "stage": "compilation",
                    "error": compile_result.stderr,
                    "code": code
                }
            
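            # Run compiled Java class; a runaway program is reported as a timeout
            java_path = os.path.join(self.java_home, "bin", "java")
            try:
                run_result = subprocess.run(
                    [java_path, f"-Xmx{self.memory_limit}", class_name],
                    cwd=tmpdir,
                    capture_output=True,
                    text=True,
                    timeout=self.timeout
                )
            except subprocess.TimeoutExpired:
                return {
                    "success": False,
                    "stage": "execution",
                    "error": f"Execution timed out after {self.timeout} seconds",
                    "execution_id": unique_id
                }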
            
            return {
                "success": run_result.returncode == 0,
                "stdout": run_result.stdout,
                "stderr": run_result.stderr,
                "exit_code": run_result.returncode,
                "execution_id": unique_id,
                "code": code
            }

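# Example usage with a LangChain ReAct agent
def integrate_java_executor():
    from langchain_core.prompts import PromptTemplate
    from langchain_openai import ChatOpenAI
    from langchain.agents import AgentExecutor, create_react_agent
    
    # Create Java executor tool
    java_tool = JavaCodeExecutionTool(
        java_home="/usr/lib/jvm/java-17-openjdk",  # Adjust path as needed
        timeout=15,
        memory_limit="256m"
    )
    
    # Create LLM and tools
    llm = ChatOpenAI(temperature=0)
    tools = [java_tool]
    
    # create_react_agent requires the prompt to expose {tools}, {tool_names},
    # and {agent_scratchpad} in addition to {input}
    prompt = PromptTemplate.from_template(
        "You are a Java programming assistant. When asked to solve problems, "
        "write valid Java code with a main method.\n\n"
        "You have access to the following tools:\n\n{tools}\n\n"
        "Use the following format:\n\n"
        "Question: the input question you must answer\n"
        "Thought: you should always think about what to do\n"
        "Action: the action to take, should be one of [{tool_names}]\n"
        "Action Input: the input to the action\n"
        "Observation: the result of the action\n"
        "... (this Thought/Action/Action Input/Observation can repeat N times)\n"
        "Thought: I now know the final answer\n"
        "Final Answer: the final answer to the original input question\n\n"
        "Begin!\n\n"
        "Question: {input}\n"
        "Thought:{agent_scratchpad}"
    )
    
    # Create agent
    agent = create_react_agent(llm, tools, prompt)
    agent_executor = AgentExecutor(agent=agent, tools=tools)
    
    return agent_executor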
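
The tool above names the source file after the first `public class` match. That can misfire on a commented-out declaration, and it ignores a `package` line. The following is a minimal sketch of a stricter helper, assuming single-file programs; `find_main_class` and the regex constants are illustrative names, not part of LangChain.

```python
import re
from typing import Optional, Tuple

# Rough comment stripper: it does not understand string literals, so a "//"
# inside a Java string is stripped too. Good enough for locating declarations.
_COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
_PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)
_CLASS_RE = re.compile(
    r"\bpublic\s+(?:(?:final|abstract|strictfp)\s+)*class\s+(\w+)"
)


def find_main_class(code: str) -> Optional[Tuple[str, str]]:
    """Return (file_name, qualified_class_name), or None if no public class."""
    stripped = _COMMENT_RE.sub("", code)
    class_match = _CLASS_RE.search(stripped)
    if not class_match:
        return None
    class_name = class_match.group(1)
    package_match = _PACKAGE_RE.search(stripped)
    if package_match:
        qualified = f"{package_match.group(1)}.{class_name}"
    else:
        qualified = class_name
    return f"{class_name}.java", qualified
```

When the source declares a package, the compile step would also need `javac -d .` so that the class file lands in the matching directory, and `java` would be given the qualified name.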

2. Maven-Based Tool for Complex Java Projects

For more complex Java applications requiring dependencies:

class MavenJavaExecutor(BaseTool):
    name: str = "maven_java_executor"
    description: str = """
    Executes Java code using Maven for dependency management. Provide a complete Java project structure including pom.xml.
    """
    
    java_home: Optional[str] = None
    maven_home: Optional[str] = None
    timeout: int = 60  # seconds
    
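    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Fall back to the environment, mirroring the JAVA_HOME check in JavaCodeExecutionTool
        if not self.maven_home:
            self.maven_home = os.environ.get("MAVEN_HOME") or os.environ.get("M2_HOME")
            if not self.maven_home:
                raise ValueError("MAVEN_HOME not found in environment and not specified")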
    
    def _run(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        """Run Maven-based Java project."""
        project_files = input_data.get("files", {})
        main_class = input_data.get("main_class")
        
        if not main_class or not project_files.get("pom.xml"):
            return {"error": "Missing main_class or pom.xml"}
        
        with tempfile.TemporaryDirectory() as tmpdir:
            # Create project structure
            for file_path, content in project_files.items():
                full_path = os.path.join(tmpdir, file_path)
                os.makedirs(os.path.dirname(full_path), exist_ok=True)
                with open(full_path, "w") as f:
                    f.write(content)
            
            # Build with Maven
            mvn_path = os.path.join(self.maven_home, "bin", "mvn")
            build_result = subprocess.run(
                [mvn_path, "compile"],
                cwd=tmpdir,
                capture_output=True,
                text=True,
                timeout=self.timeout
            )
            
            if build_result.returncode != 0:
                return {
                    "success": False,
                    "stage": "build",
                    # Maven reports most build errors on stdout, so fall back to it
                    "error": build_result.stderr or build_result.stdout
                }
            
            # Run with Maven exec plugin
            run_result = subprocess.run(
                [mvn_path, "exec:java", f"-Dexec.mainClass={main_class}"],
                cwd=tmpdir,
                capture_output=True,
                text=True,
                timeout=self.timeout
            )
            
            return {
                "success": run_result.returncode == 0,
                "stdout": run_result.stdout,
                "stderr": run_result.stderr,
                "exit_code": run_result.returncode
            }
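
Because the keys of `files` come from the model, a key such as `../../evil.txt` or an absolute path would be written outside the temporary directory by the loop above. The following is a minimal sketch of a guard that could wrap the `os.path.join` call; `resolve_inside` is a hypothetical helper name.

```python
import os


def resolve_inside(base_dir: str, relative_path: str) -> str:
    """Join relative_path onto base_dir, refusing paths that escape base_dir."""
    if os.path.isabs(relative_path):
        raise ValueError(f"Absolute paths are not allowed: {relative_path!r}")
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, relative_path))
    # If target left base, their common prefix is an ancestor of base, not base
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"Path escapes the project directory: {relative_path!r}")
    return target
```

In `_run`, `full_path = resolve_inside(tmpdir, file_path)` would replace the plain join, with the `ValueError` turned into an error result for the agent.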

3. Docker-Based Execution (Safer Option)

For enhanced security and isolation:

class DockerJavaExecutor(BaseTool):
    name: str = "docker_java_executor"
    description: str = "Executes Java code inside a Docker container for enhanced security."
    
    docker_image: str = "openjdk:17-slim"
    timeout: int = 30  # seconds
    memory_limit: str = "512m"
    
    def _run(self, code: str) -> Dict[str, Any]:
        """Run Java code in Docker and return the result."""
        unique_id = str(uuid.uuid4())[:8]
        
        # Extract class name from code
        class_match = re.search(r"public\s+class\s+(\w+)", code)
        if not class_match:
            return {"success": False, "error": "No public class found in code"}
        
        class_name = class_match.group(1)
        file_name = f"{class_name}.java"
        
        with tempfile.TemporaryDirectory() as tmpdir:
            # Save code to file
            java_file_path = os.path.join(tmpdir, file_name)
            with open(java_file_path, "w") as f:
                f.write(code)
            
            # Run Docker container
            docker_cmd = [
                "docker", "run",
                "--rm",
                f"--memory={self.memory_limit}",
                f"--cpus=1",
                "--network=none",  # No network access
                "-v", f"{tmpdir}:/code",
                "-w", "/code",
                self.docker_image,
                "sh", "-c", f"javac {file_name} && java {class_name}"
            ]
            
            try:
                result = subprocess.run(
                    docker_cmd,
                    capture_output=True,
                    text=True,
                    timeout=self.timeout
                )
                
                return {
                    "success": result.returncode == 0,
                    "stdout": result.stdout,
                    "stderr": result.stderr,
                    "exit_code": result.returncode,
                    "execution_id": unique_id
                }
            except subprocess.TimeoutExpired:
                return {
                    "success": False,
                    "error": f"Execution timed out after {self.timeout} seconds",
                    "execution_id": unique_id
                }
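
The container above already limits memory, CPU, and network access. The following sketch shows how the argument list could be tightened further with standard `docker run` flags. The helper name `build_docker_command`, the container name, and the value `128` for the process limit are assumptions chosen for illustration and may need tuning for a given JVM.

```python
from typing import List


def build_docker_command(
    host_dir: str,
    image: str,
    shell_command: str,
    container_name: str,
    memory_limit: str = "512m",
) -> List[str]:
    """Assemble a `docker run` argument list with a tighter sandbox."""
    return [
        "docker", "run",
        "--rm",
        "--name", container_name,        # lets the caller `docker kill` it later
        f"--memory={memory_limit}",
        "--cpus=1",
        "--pids-limit=128",              # cap the number of processes and threads
        "--network=none",                # no network access
        "--cap-drop=ALL",                # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--read-only",                   # root filesystem is read-only
        "--tmpfs", "/tmp",               # writable scratch space for the JVM
        "-v", f"{host_dir}:/code",       # javac still writes .class files here
        "-w", "/code",
        image,
        "sh", "-c", shell_command,
    ]
```

Naming the container matters for the timeout path: `subprocess.run(..., timeout=...)` stops the `docker` client process, and the container itself may keep running, so the `TimeoutExpired` handler can follow up with `docker kill <container_name>`.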

Integration with LangGraph

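import re
from typing import TypedDict, List, Dict, Any
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph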

class JavaAgentState(TypedDict):
    messages: List[Dict[str, str]]
    java_code: str
    execution_results: Dict[str, Any]

def generate_java_code(state: JavaAgentState) -> JavaAgentState:
    """Node that generates Java code based on the conversation."""
    llm = ChatOpenAI(model="gpt-4")
    messages = state["messages"]
    
    response = llm.invoke([
        SystemMessage(content="Generate valid Java code to solve the user's problem. Include a main method."),
        *[HumanMessage(content=m["content"]) if m["role"] == "user" else 
          AIMessage(content=m["content"]) for m in messages]
    ])
    
    # Extract code from response
    code_pattern = r"```java\s*([\s\S]*?)\s*```"
    code_match = re.search(code_pattern, response.content)
    java_code = code_match.group(1) if code_match else response.content
    
    return {
        **state,
        "java_code": java_code
    }

def execute_java_code(state: JavaAgentState) -> JavaAgentState:
    """Node that executes the generated Java code."""
    java_tool = JavaCodeExecutionTool(java_home="/usr/lib/jvm/java-17-openjdk")
    code = state["java_code"]
    
    execution_results = java_tool.invoke(code)
    
    return {
        **state,
        "execution_results": execution_results
    }

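def format_response(state: JavaAgentState) -> JavaAgentState:
    """Node that formats the final response based on execution results."""
    results = state["execution_results"]
    code = state["java_code"]
    fence = "`" * 3  # Markdown code fence for the chat message
    
    if results.get("success"):
        response = (
            "I've executed the Java code successfully. Here's the result:\n\n"
            f"{fence}java\n{code}\n{fence}\n\n"
            f"Output:\n\n{fence}\n{results['stdout']}\n{fence}\n"
        )
    else:
        # Compilation failures are reported under "error", runtime failures under "stderr"
        error_text = results.get("stderr") or results.get("error", "")
        response = (
            "The Java code execution encountered an error:\n\n"
            f"{fence}java\n{code}\n{fence}\n\n"
            f"Error:\n\n{fence}\n{error_text}\n{fence}\n\n"
            "Let me fix that and try again."
        )
    
    messages = state["messages"] + [{"role": "assistant", "content": response}]
    
    return {
        **state,
        "messages": messages
    }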

# Create the graph
def create_java_execution_graph():
    workflow = StateGraph(JavaAgentState)
    
    # Add nodes
    workflow.add_node("generate_code", generate_java_code)
    workflow.add_node("execute_code", execute_java_code)
    workflow.add_node("format_response", format_response)
    
    # Add edges
    workflow.add_edge("generate_code", "execute_code")
    workflow.add_edge("execute_code", "format_response")
    workflow.set_entry_point("generate_code")
    workflow.set_finish_point("format_response")
    
    # Compile graph
    app = workflow.compile()
    
    return app

# Example usage
java_graph = create_java_execution_graph()
result = java_graph.invoke({
    "messages": [{"role": "user", "content": "Write a Java program that calculates the first 10 Fibonacci numbers"}],
    "java_code": "",
    "execution_results": {}
})


## Security Considerations

When implementing Java execution in LangGraph:

1. **Sandboxing**: Always execute untrusted code in a sandbox environment
   - Docker containers with resource limits
   - Java Security Manager policies (older JVMs only: the Security Manager was deprecated for removal in Java 17 by JEP 411)
   - No network access for execution containers

2. **Resource Limits**:
   - Set memory limits
   - Set execution timeouts
   - Limit CPU usage
   - Restrict disk I/O

3. **Input Validation**:
   - Validate that code doesn't include security-sensitive imports
   - Scan for potential exploits before execution
   - Prevent access to system properties and environment variables

4. **Output Handling**:
   - Sanitize outputs before returning to LLM
   - Limit output size to prevent memory issues
   - Handle execution failures gracefully
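
The input-validation and output-handling items above can be sketched as two small helpers. This is an illustrative deny-list, not a complete defense: regex checks are easy to evade through reflection or string concatenation, so they complement sandboxing instead of replacing it. `BLOCKED_PATTERNS`, `find_violations`, `truncate_output`, and the 4000-character default are all assumptions made for this example.

```python
import re
from typing import List

# Illustrative deny-list keyed by a human-readable category name
BLOCKED_PATTERNS = {
    "process execution": re.compile(
        r"\bRuntime\s*\.\s*getRuntime\b|\bProcessBuilder\b"
    ),
    "network access": re.compile(r"\bjava\s*\.\s*net\b"),
    "environment access": re.compile(
        r"\bSystem\s*\.\s*(getenv|getProperty|getProperties)\b"
    ),
    "reflection": re.compile(r"\bjava\s*\.\s*lang\s*\.\s*reflect\b"),
}


def find_violations(code: str) -> List[str]:
    """Return the deny-list categories that the Java source matches."""
    return [
        name for name, pattern in BLOCKED_PATTERNS.items() if pattern.search(code)
    ]


def truncate_output(text: str, max_chars: int = 4000) -> str:
    """Cap output size before it is added to the LLM context."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n... [{omitted} characters truncated]"
```

A tool could call `find_violations(code)` before compiling and return the category names as an error, then pass `stdout` and `stderr` through `truncate_output` before returning them.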

## LangGraph Integration Patterns

```mermaid
graph TD
    A[User Request] --> B[LLM Code Generator]
    B --> C[Code Validator]
    C -->|Valid| D[Java Execution Tool]
    C -->|Invalid| B
    D --> E[Result Parser]
    E -->|Success| F[Response Formatter]
    E -->|Failure| G[Error Handler]
    G --> B
    F --> H[User Response]
```

Pattern 1: Code Generation → Execution → Response

Simple sequential flow where the LLM generates code, executes it, and formats a response.

Pattern 2: Iterative Refinement

The LLM generates code, executes it, and if execution fails, uses the error message to refine the code and try again.

def error_handler(state: JavaAgentState) -> str:
    """Determine the next node based on execution results."""
    if state["execution_results"]["success"]:
        return "format_response"
    else:
        return "refine_code"

workflow.add_node("refine_code", refine_java_code)
workflow.add_conditional_edges(
    "execute_code",
    error_handler,
    {
        "format_response": "format_response",
        "refine_code": "refine_code"
    }
)
workflow.add_edge("refine_code", "execute_code")
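
As written, the `refine_code` to `execute_code` loop has no exit if the model keeps producing failing code. The following is a minimal sketch of a router with a retry cap, assuming the state carries an `attempts` counter that the execution node increments. `attempts`, `MAX_ATTEMPTS`, and `route_after_execution` are hypothetical names that do not appear in the original state definition.

```python
from typing import Any, Dict

MAX_ATTEMPTS = 3  # arbitrary cap chosen for illustration


def route_after_execution(state: Dict[str, Any]) -> str:
    """Pick the next node: finish on success or when retries are exhausted."""
    results = state.get("execution_results", {})
    if results.get("success"):
        return "format_response"
    if state.get("attempts", 0) >= MAX_ATTEMPTS:
        # Give up and let the formatter report the last error
        return "format_response"
    return "refine_code"
```

It returns the same two labels as `error_handler`, so it can be passed to `add_conditional_edges` with the same mapping. LangGraph's `recursion_limit` config value can serve as an additional backstop.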

Pattern 3: Test-Driven Development

The LLM first generates test cases, then generates code that passes those tests.

workflow.add_node("generate_tests", generate_java_tests)
workflow.add_node("generate_implementation", generate_java_implementation)
workflow.add_node("execute_tests", execute_java_tests)

workflow.add_edge("generate_tests", "generate_implementation")
workflow.add_edge("generate_implementation", "execute_tests")

Advanced Features

1. External Library Support

class GradleJavaExecutor(BaseTool):
    """Similar to MavenJavaExecutor but using Gradle.

    Useful for Android development or Spring Boot applications.
    """

2. Multi-File Project Support

Support for projects with multiple Java files:

def _run(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Run a multi-file Java project."""
    java_files = input_data.get("java_files", {})  # Dict of {filename: content}
    main_class = input_data.get("main_class")
    
    if not main_class or not java_files:
        return {"error": "Missing main_class or java_files"}
    
    with tempfile.TemporaryDirectory() as tmpdir:
        # Create all Java files
        for file_name, content in java_files.items():
            file_path = os.path.join(tmpdir, file_name)
            os.makedirs(os.path.dirname(file_path), exist_ok=True)
            with open(file_path, "w") as f:
                f.write(content)
        
        # Compile all Java files
        java_files_list = list(java_files.keys())
        javac_path = os.path.join(self.java_home, "bin", "javac")
        compile_result = subprocess.run(
            [javac_path] + java_files_list,
            cwd=tmpdir,
            capture_output=True,
            text=True,
            timeout=self.timeout  # bound compilation time, as the other tools do
        )
        
        # Rest of the execution code...

3. Interactive Mode

Support for interactive Java programs:

def _run(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Run Java code with interactive input."""
    code = input_data.get("code")
    inputs = input_data.get("inputs", [])  # List of strings to feed as stdin
    
    # Same setup as before...
    
    # Run with inputs
    process = subprocess.Popen(
        [java_path, class_name],
        cwd=tmpdir,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True
    )
    
    stdout, stderr = process.communicate(input="\n".join(inputs), timeout=self.timeout)
    
    return {
        "success": process.returncode == 0,
        "stdout": stdout,
        "stderr": stderr,
        "exit_code": process.returncode
    }
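
One detail in the interactive snippet: when `communicate` raises `TimeoutExpired`, the child process is not killed automatically, so it has to be killed and reaped explicitly. The following is a minimal sketch of that handling. `run_with_stdin` is a hypothetical helper, and a Python child process stands in for `java` so the example runs without a JDK.

```python
import subprocess
import sys
from typing import Any, Dict, List


def run_with_stdin(cmd: List[str], inputs: List[str], timeout: float) -> Dict[str, Any]:
    """Feed `inputs` to the child on stdin; kill it if it exceeds `timeout`."""
    process = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        # Trailing newline so the last input line is terminated
        stdout, stderr = process.communicate(
            input="\n".join(inputs) + "\n", timeout=timeout
        )
    except subprocess.TimeoutExpired:
        process.kill()
        # Second communicate() reaps the child and collects partial output
        stdout, stderr = process.communicate()
        return {"success": False, "timed_out": True, "stdout": stdout, "stderr": stderr}
    return {
        "success": process.returncode == 0,
        "timed_out": False,
        "stdout": stdout,
        "stderr": stderr,
        "exit_code": process.returncode,
    }


# Stand-in child process that upper-cases one line of input
echo = [sys.executable, "-c", "print(input().upper())"]
print(run_with_stdin(echo, ["hello"], timeout=10)["stdout"])
```

In the tool, `cmd` would be `[java_path, class_name]` with `cwd=tmpdir` added to the `Popen` call.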

Example Use Cases

1. **Algorithm Implementation and Testing**:
   - Generate and test sorting algorithms
   - Implement data structures with performance benchmarks
   - Solve coding challenge problems

2. **Java-Specific Tasks**:
   - Generate and run Java Spring Boot endpoints
   - Create and test Android components
   - Work with Java-specific libraries (JDBC, JPA, etc.)

3. **Educational Tools**:
   - Interactive Java programming tutorials
   - Automated code review and feedback
   - Gradual learning systems that build on concepts

Conclusion

Integrating Java execution capabilities into LangGraph workflows opens up powerful possibilities for LLM agents to work with Java codebases, enterprise systems, and JVM-based technologies. By implementing the appropriate security measures and integration patterns, these tools can safely extend LangGraph's capabilities to include Java in its ecosystem of supported languages.
