Scenarios
- Jailbreaking for Data Leaked from Hacker Actor: the hacker gets data out of the model by pure prompt manipulation,
- Intrusion from Hacker Actor: The malicious instructions pass through the application and can instruct the extension services to do bad things.
- Data poisoning for Training data: The source data used to train the LLM has malicious content before it is trained.
- Prompt Poising: hidden content or injected content unintentional from an innocent bystander.
graph RL
A1[Active Hacker Actor] -. "1 Malicious prompt" .-> C1[LLM]
subgraph sub1 ["1. Jailbreak: data leaked from hacker actor"]
C1
end
C1 -. "2 Unauthorized data" .-> A1
A2[Active Hacker Actor] -. "1 Malicious input" .-> C2[LLM]
subgraph sub2 ["2. Intrusion from hacker actor"]
C2 -. "2 Malicious instructions" .-> D2[Extension Services]
end
A3[Passive Hacker Actor] -. "1 Insert bad data" .-> E3[Data Store]
D3[User Actor] -. "3 Good prompt" .-> C3[LLM]
E3 -. "2 Training" .-> C3
C3 -. "4 Bad response" .-> D3
subgraph sub3 ["3. Data poisoning for training data"]
E3
C3
end
D4[User Actor] -. "1 Cut" .-> E4[Data or Source Repository]
E4 -. "2 Paste" .-> D4
D4 -. "3 Prompt" .-> C4[LLM]
C4 -. "4 Bad content" .-> D4
subgraph sub4 ["4. Prompt poisoning"]
C4
end