gist:56f1dc33413274ddb4e4a148d6706b81

Scenarios

Jailbreaking for Data Leaked from Hacker Actor: the hacker gets data out of the model by pure prompt manipulation,
Intrusion from Hacker Actor: The malicious instructions pass through the application and can instruct the extension services to do bad things.
Data poisoning for Training data: The source data used to train the LLM has malicious content before it is trained.
Prompt Poising: hidden content or injected content unintentional from an innocent bystander.

   graph RL

  A1[Active Hacker Actor] -. "1 Malicious prompt" .-> C1[LLM]
  subgraph sub1 ["1. Jailbreak: data leaked from hacker actor"]
    C1
  end
  C1 -. "2 Unauthorized data" .-> A1

  A2[Active Hacker Actor] -. "1 Malicious input" .-> C2[LLM]
  subgraph sub2 ["2. Intrusion from hacker actor"]
    C2 -. "2 Malicious instructions" .-> D2[Extension Services]
  end

  A3[Passive Hacker Actor] -. "1 Insert bad data" .-> E3[Data Store]
  D3[User Actor] -. "3 Good prompt" .-> C3[LLM]
  E3 -. "2 Training" .-> C3
  C3 -. "4 Bad response" .-> D3
  subgraph sub3 ["3. Data poisoning for training data"]
    E3
    C3
  end

  D4[User Actor] -. "1 Cut" .-> E4[Data or Source Repository]
  E4 -. "2 Paste" .-> D4
  D4 -. "3 Prompt" .-> C4[LLM]
  C4 -. "4 Bad content" .-> D4
  subgraph sub4 ["4. Prompt poisoning"]
    C4
  end

brianray/gist:56f1dc33413274ddb4e4a148d6706b81