sequenceDiagram
autonumber
participant Project
participant Joern
participant Agent1 as Report Generator
participant Agent2 as User Input Finder
participant Agent3 as Validation Finder
participant Agent4 as Input Synthesizer
participant Agent5 as Validation Synthesizer
Project->>Joern: Identify 'exec' sink points
note right of Joern: Analyzes project for exec() calls
loop Each Function Related to 'exec'
Joern->>Agent2: Send function code, call trace, project context
note over Agent2: Identifies user input in each function
Joern->>Agent3: Send function code, call trace, project context
note over Agent3: Identifies validation/sanitization in each function
end
Agent2->>Agent4: User input findings for list of functions
note right of Agent4: Aggregates and synthesizes user input data
Agent3->>Agent5: Validation findings for list of functions
note right of Agent5: Aggregates and synthesizes validation data
Agent4->>Agent1: Synthesized relevant user input findings
Agent5->>Agent1: Synthesized relevant validation findings
Agent1->>Project: Generate comprehensive vulnerability report
note right of Agent1: Compiles findings into detailed report
- Select Projects: Choose five projects with exec function calls.
- Function Context Analysis: For each project, identify the context of exec functions, either by examining all functions in a specific file or through call graph connections.
- Manually Execute Pipeline:
  a. User Input Analysis: Individually input each function into an agent to identify user input.
  b. Validation/Sanitization Analysis: Individually input each function into an agent to detect validation and sanitization practices.
  c. Aggregate and Synthesize: Combine and analyze findings from (a) and (b) to understand which functions use validation, sanitization, and where user input occurs.
  d. Report Generation: Create a report based on the aggregated data, focusing on functions with exec calls.
- Manual Evaluation: Assess areas where the model struggled or failed to identify issues.
- Iterate Experiment: Develop a new iteration of the experiment based on the evaluation findings.
- CWE-78: Experiment targets this specific Common Weakness Enumeration.
- Base: Adopts Slava's checklist for identifying vulnerabilities.
- Focus: Analyzes vulnerabilities using real-world examples and NVD expert analyses.
- Signature Finding: Locate `Runtime.getRuntime().exec()` in Java, searchable via AST.
- Non-Constant Argument Identification: Check whether the arguments of `exec` calls are non-constant and depend on user input, by analyzing data dependencies.
- Special Symbol Checks: Look for characters like `[: ; & | " ' ]` that could lead to arbitrary command execution, and verify that the code checks, filters, or escapes them.
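The three checks above can be sketched roughly as follows. This is a minimal illustration, not the project's tooling: it uses a regular expression as a stand-in for the AST search, treats "constant" as "a single string literal", and all names (`find_exec_calls`, `special_symbols_in`, `SAMPLE`) are hypothetical.

```python
import re

# Regex stand-in for an AST query (assumption: a real pipeline would use an AST
# or Joern). Captures the argument text up to the first ")" so nested calls
# inside the argument list are cut short.
EXEC_CALL = re.compile(
    r"Runtime\s*\.\s*getRuntime\s*\(\s*\)\s*\.\s*exec\s*\(([^)]*)\)"
)

# A single Java string literal, e.g. "ls -la"
STRING_LITERAL = re.compile(r'^\s*"(?:[^"\\]|\\.)*"\s*$')

# Special symbols from the checklist: : ; & | " '
SPECIAL_SYMBOLS = set(':;&|"\'')


def find_exec_calls(java_source):
    """Return (line_number, argument_text, is_constant) for each exec call."""
    results = []
    for match in EXEC_CALL.finditer(java_source):
        line_number = java_source.count("\n", 0, match.start()) + 1
        argument = match.group(1).strip()
        is_constant = bool(STRING_LITERAL.match(argument))
        results.append((line_number, argument, is_constant))
    return results


def special_symbols_in(value):
    """Return the sorted special symbols present in a candidate input string."""
    return sorted(SPECIAL_SYMBOLS.intersection(value))


# Hypothetical Java snippet used only to exercise the functions above.
SAMPLE = '''
class Demo {
    void safe() throws Exception {
        Runtime.getRuntime().exec("ls -la");
    }
    void risky(String userArg) throws Exception {
        Runtime.getRuntime().exec("ping " + userArg);
    }
}
'''

for line, argument, is_constant in find_exec_calls(SAMPLE):
    label = "constant" if is_constant else "non-constant, needs data-flow check"
    print(f"line {line}: exec({argument}) -> {label}")
```

A non-constant argument only marks a candidate. Whether it actually depends on user input still requires the data dependency analysis described in the checklist.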
- Commit Level Information Gathering: Analyze vulnerability fix data.
- Candidate Search: Identify potential vulnerabilities using gathered data.
- Vulnerability Confirmation and Localization: Confirm and localize vulnerabilities in the codebase.
- Use agents to localize vulnerabilities, focusing on multi-function analysis and trigger points.
- Input to LLM: Functions containing `exec` are fed to the LLM, along with the agent list.
- Agent 2: Analyzes each function for user input. (Takes function code, prompt, and additional context.)
- Agent 3: Searches each function for validation/sanitization. (Takes the same inputs as Agent 2.)
- Agent 4: Aggregates and synthesizes data from Agent 2. (Takes the list of responses to the Agent 2 prompt.)
- Agent 5: Aggregates and synthesizes data from Agent 3. (Takes the list of responses to the Agent 3 prompt.)
- Report Generation: Agent 1 compiles a report from the synthesized data. (Takes the responses from Agents 4 and 5.)
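The agent flow above could be wired together roughly like this. It is a sketch under assumptions: the model is abstracted as a plain callable that maps a prompt string to a response string, the bracketed prompt tags are placeholders rather than the real prompts, and `run_pipeline` and `echo_llm` are hypothetical names.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: takes a prompt, returns the response text.
LLM = Callable[[str], str]


def run_pipeline(functions: Dict[str, str], context: str, llm: LLM) -> str:
    """Run Agents 2-5 and then Agent 1 over the functions related to an exec call.

    `functions` maps a function name to its source code.
    """
    input_findings: List[str] = []
    validation_findings: List[str] = []

    for name, code in functions.items():
        shared = f"context: {context}\nfunction: {name}\n{code}"
        # Agent 2: look for user input in this one function
        input_findings.append(llm("[agent2:user-input]\n" + shared))
        # Agent 3: look for validation/sanitization in the same function
        validation_findings.append(llm("[agent3:validation]\n" + shared))

    # Agents 4 and 5: synthesize the per-function responses
    input_summary = llm("[agent4:input-synthesis]\n" + "\n".join(input_findings))
    validation_summary = llm(
        "[agent5:validation-synthesis]\n" + "\n".join(validation_findings)
    )

    # Agent 1: compile the report from both summaries
    return llm("[agent1:report]\n" + input_summary + "\n" + validation_summary)


def echo_llm(prompt: str) -> str:
    """Stub model for illustration only: returns the first line of the prompt."""
    return prompt.splitlines()[0]


report = run_pipeline({"runCommand": "void runCommand(String c) { }"}, "demo", echo_llm)
print(report)
```

Replacing `echo_llm` with a real model client is the only change needed to run the same flow against an actual LLM. The stub exists so the call order can be checked without one.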
- Apply taint analysis method, examining function call graphs for:
  - Fs: Function with a sink.
  - Fu: Function with user input.
  - Fval_sanit: Function with validation and sanitization.
- Main Model 1 compiles a final report on the vulnerability, including assumptions about severity, possible exploitation methods, and remediation recommendations.
Purpose: Track the flow from user input (Fu) to the sink (Fs) while checking for validation/sanitization (Fval_sanit) to determine vulnerability presence and severity.
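A function-level version of this tracking might look like the sketch below. It assumes the call graph is available as a mapping from caller to callees and that the Fu, Fs, and Fval_sanit sets have already been produced by the agents. The names `taint_paths`, `classify`, and the sample graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def taint_paths(call_graph: Dict[str, List[str]], source: str, sink: str) -> List[List[str]]:
    """Enumerate simple call paths from `source` (Fu) to `sink` (Fs) by DFS."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == sink:
            paths.append(path)
            return
        for callee in call_graph.get(node, []):
            if callee not in path:  # skip cycles such as recursion
                walk(callee, path + [callee])

    walk(source, [source])
    return paths


def classify(
    call_graph: Dict[str, List[str]],
    fu: Set[str],
    fs: Set[str],
    fval_sanit: Set[str],
) -> List[Tuple[List[str], str]]:
    """Label each Fu -> Fs path by whether it passes through a Fval_sanit function."""
    report = []
    for source in sorted(fu):
        for sink in sorted(fs):
            for path in taint_paths(call_graph, source, sink):
                guarded = any(fn in fval_sanit for fn in path)
                report.append((path, "guarded" if guarded else "unguarded"))
    return report


# Hypothetical call graph: caller -> callees
GRAPH = {
    "handleRequest": ["checkArgs", "runCommand"],
    "checkArgs": ["runCommand"],
    "runCommand": [],
}

for path, label in classify(GRAPH, {"handleRequest"}, {"runCommand"}, {"checkArgs"}):
    print(" -> ".join(path), ":", label)
```

An "unguarded" path is only a candidate for closer review, and a "guarded" one is not proof of safety: at this granularity the check cannot tell whether the validation applies to the value that reaches the sink.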
- Provide context, call graphs, and stack traces to models for a comprehensive understanding of each function's role and context.
- Assess LLM limitations and analysis needs.
- RQ: Is the strategy sufficient for expert-level vulnerability analysis?
- LLMs' limitations in specific tasks.
- Lack of sufficient project context.
- Reliance on external analysis tools.
For each task: how can we measure effectiveness?
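One possible starting point, offered as an assumption rather than a settled answer, is to compare each agent's findings against manually labeled ground truth and compute precision and recall per task. The function name and the sample labels below are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], expected: Set[str]) -> Tuple[float, float]:
    """Compare an agent's findings with manually labeled ground truth.

    Both sets hold identifiers of findings, e.g. names of functions flagged
    as containing user input.
    """
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical labels for one task (user input identification)
agent_found = {"handleRequest", "parseArgs", "logEvent"}
manually_labeled = {"handleRequest", "parseArgs", "readConfig", "loadEnv"}

precision, recall = precision_recall(agent_found, manually_labeled)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The same function can be applied separately to the user input, validation, and report stages, which would show which task contributes most of the missed or spurious findings.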