abhaybhargav · March 10, 2025 05:11
diff --git a/Issues with AI enabled apps b/Issues with AI enabled apps
 We're worried about our crazy AI adoption!


 Is something I've heard a lot of CISOs and ProdSec teams really worry about. And rightfully so. I see engineering teams start to plug-in LLMs into everything without thinking about security or privacy. This risk is heightened with Agents, because now, LLMs can literally call (sometimes extremely powerful) functions that can execute actions on your internal systems, APIs and more. And the AI landscape is massive and getting bigger every day. 

 This scares a lot of people and seems overwhelming. But let's break things down into smaller problems to make things easier to handle. 

 In my experience, 80%+ companies out there are building two types of apps: 

 * RAG (Retrieval Augmented Generation) apps  where the org's internal datasets are loaded into vector databases and LLMs use that as context to generate responses for chatbots, internal applications and more

 * Agents - Hooking up LLMs to functions so they can not only generate, but actually DO things like approve a user request, create an invoice, etc etc. 

 The most common types of issues that I see with these apps are as follows: 

 1. Access Control - Probably the biggest issue with both RAG and Agents is the lack of access control. This stems from the fact that Access Control is not considered in the design of these features. Lack of Access Control leads to RAG being leveraged for users gaining access to unauthorized datasets. For excessive agency issues and more

 2. Lack of Probablistic Input Validation - LLMs are probablistic, so input validation has to be able to deal with that. you can't do typical parameterized input validation. You need to consider using specialized BERTs or dual-LLM input validation and/or output validation to prevent against cheeky inputs.

 3. Sensitive Information Disclosure - This is similar to Access Control but a little different. Here, we're realing with issues where sensitive user data like PII, PHI, etc are exposed due to lack of masking or sanitization. This needs to be designed in the data ingestion pipelines and considered during output rendering

 I think getting a handle on these issues is a HUGE first step in getting asymmetric returns on your GenAI and LLM Security program
	We're worried about our crazy AI adoption!


	Is something I've heard a lot of CISOs and ProdSec teams really worry about. And rightfully so. I see engineering teams start to plug-in LLMs into everything without thinking about security or privacy. This risk is heightened with Agents, because now, LLMs can literally call (sometimes extremely powerful) functions that can execute actions on your internal systems, APIs and more. And the AI landscape is massive and getting bigger every day.

	This scares a lot of people and seems overwhelming. But let's break things down into smaller problems to make things easier to handle.

	In my experience, 80%+ companies out there are building two types of apps:

	* RAG (Retrieval Augmented Generation) apps where the org's internal datasets are loaded into vector databases and LLMs use that as context to generate responses for chatbots, internal applications and more

	* Agents - Hooking up LLMs to functions so they can not only generate, but actually DO things like approve a user request, create an invoice, etc etc.

	The most common types of issues that I see with these apps are as follows:

	1. Access Control - Probably the biggest issue with both RAG and Agents is the lack of access control. This stems from the fact that Access Control is not considered in the design of these features. Lack of Access Control leads to RAG being leveraged for users gaining access to unauthorized datasets. For excessive agency issues and more

	2. Lack of Probablistic Input Validation - LLMs are probablistic, so input validation has to be able to deal with that. you can't do typical parameterized input validation. You need to consider using specialized BERTs or dual-LLM input validation and/or output validation to prevent against cheeky inputs.

	3. Sensitive Information Disclosure - This is similar to Access Control but a little different. Here, we're realing with issues where sensitive user data like PII, PHI, etc are exposed due to lack of masking or sanitization. This needs to be designed in the data ingestion pipelines and considered during output rendering

	I think getting a handle on these issues is a HUGE first step in getting asymmetric returns on your GenAI and LLM Security program