Scenarios

  1. Jailbreaking for data leaked to a hacker actor: the hacker gets data out of the model through pure prompt manipulation.
  2. Intrusion from a hacker actor: malicious instructions pass through the application and can direct the extension services to do harmful things (see the guard sketch after the diagram).
  3. Data poisoning of training data: the source data used to train the LLM contains malicious content before training.
  4. Prompt poisoning: hidden or injected content passed along unintentionally by an innocent bystander (see the scrubbing sketch after the diagram).
```mermaid
graph RL

  A1[Active Hacker Actor] -. "1 Malicious prompt" .-> C1[LLM]
  subgraph sub1 ["1. Jailbreak: data leaked from hacker actor"]
    C1
  end
  C1 -. "2 Unauthorized data" .-> A1

  A2[Active Hacker Actor] -. "1 Malicious input" .-> C2[LLM]
  subgraph sub2 ["2. Intrusion from hacker actor"]
    C2 -. "2 Malicious instructions" .-> D2[Extension Services]
  end

  A3[Passive Hacker Actor] -. "1 Insert bad data" .-> E3[Data Store]
  D3[User Actor] -. "3 Good prompt" .-> C3[LLM]
  E3 -. "2 Training" .-> C3
  C3 -. "4 Bad response" .-> D3
  subgraph sub3 ["3. Data poisoning for training data"]
    E3
    C3
  end

  D4[User Actor] -. "1 Cut" .-> E4[Data or Source Repository]
  E4 -. "2 Paste" .-> D4
  D4 -. "3 Prompt" .-> C4[LLM]
  C4 -. "4 Bad content" .-> D4
  subgraph sub4 ["4. Prompt poisoning"]
    C4
  end
```
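The two sketches below are not part of the original notes; they are minimal Python illustrations of how an application might begin to defend against two of the scenarios above. The first addresses scenario 2: the application, not the model, decides which extension-service actions are allowed to run. The `ACTION:` line convention, the action names, and the `route_llm_output` helper are all assumptions made for this example.

```python
# Scenario 2 sketch: the application, not the model, decides which
# extension-service actions may run. The "ACTION:" line convention, the
# action names, and this helper are assumptions made for illustration.
ALLOWED_ACTIONS = {"search_docs", "summarize"}

def route_llm_output(llm_output: str) -> list[str]:
    """Return the actions the LLM asked for, refusing anything unapproved."""
    approved = []
    for line in llm_output.splitlines():
        if not line.startswith("ACTION:"):
            continue
        action = line.removeprefix("ACTION:").strip()
        if action not in ALLOWED_ACTIONS:
            # Injected instructions that try to reach the extension
            # services are stopped here instead of being executed.
            raise ValueError(f"blocked unapproved action: {action!r}")
        approved.append(action)
    return approved

print(route_llm_output("Here is the summary you asked for.\nACTION: summarize"))
```

An allowlist keeps the decision on the application side, so a prompt that smuggles in extra instructions cannot enlarge what the extension services will actually do.

For scenario 4, one inexpensive precaution is to scrub whatever a user pastes before it is folded into a prompt. Zero-width and other non-printing format characters are one common carrier for hidden content; the character set and the `scrub_pasted_text` name below are again illustrative, not prescriptive.

```python
import unicodedata

# Scenario 4 sketch: strip zero-width and other non-printing "format"
# characters from pasted text before it is placed into a prompt.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def scrub_pasted_text(text: str) -> str:
    """Drop characters that render as nothing but can still carry content."""
    return "".join(
        ch for ch in text
        if ch not in ZERO_WIDTH and unicodedata.category(ch) != "Cf"
    )

print(repr(scrub_pasted_text("quarterly report\u200b\ufeff draft")))  # 'quarterly report draft'
```

Neither check is a complete defense on its own: hidden instructions can also arrive as perfectly visible text, and scenario 3 bypasses the prompt entirely by poisoning the training data.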