Scenarios
- Jailbreaking for Data Leaked from Hacker Actor: the hacker gets data out of the model by pure prompt manipulation,
- Intrusion from Hacker Actor: The malicious instructions pass through the application and can instruct the extension services to do bad things.
- Data poisoning for Training data: The source data used to train the LLM has malicious content before it is trained.
- Prompt Poising: hidden content or injected content unintentional from an innocent bystander.
graph RL
A1(Active Hacker Actor) -.1 melicious prompt.-> C1[LLM]
subgraph sub1["1. Jailbreaking for Data Leaked from Hacker Actor"]
C1[LLM]
end
C1[LLM] -.2 unauthrized data.-> A1(Hacker Actor)
A2(Active Hacker Actor) -.1 melicious.-> C2[LLM]
subgraph sub2["2. Intrusion from Hacker Actor"]
C2[LLM] -.2 melicious instructions.-> D2[Extension Services]
end
A3(Passive Hacker Actor) -.1 insert bad data.-> E3[Data Store]
D3(User Actor) -.3 good prompt.-> C3[LLM]
E3[Data Store] -.2 training.-> C3[LLM]
C3[LLM] -.4 bad response.-> D3(User Actor)
subgraph sub3["3. Data poisoning for Training data"]
E3[Data Store]
C3[LLM]
end
D4(User Actor) -.1 cut .-> E4[data or source repository]
D4(User Actor) -.3 prompt .-> C4[LLM]
C4[LLM] -.4 bad content .-> D4(User Actor)
E4[data or source repository] -.2 paste .-> D4(User Actor)
subgraph sub4["4. Prompt Poising"]
C4[LLM]
end