As we build new tools that give LLMs (large language models) access to capabilities like writing and executing code and querying databases, we have encountered the same classes of potential exploits that have historically affected Python's eval() function and the execution of unsanitized SQL queries. Oftentimes, adding negative examples or more detailed prompts to these tools is not enough to prevent harmful input from being executed.
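To see why prompt-level defenses fall short, consider a minimal sketch of the vulnerable pattern. The function and tool names here are hypothetical, not part of any particular library: model output is passed straight to eval() or to a database cursor, so any injected payload in the LLM's response runs with the tool's privileges, regardless of how carefully the prompt was written.

```python
import sqlite3

def run_python_tool(llm_output: str):
    # Dangerous: eval() executes arbitrary expressions, including
    # payloads like __import__("os").system("..."), with full access
    # to the host process.
    return eval(llm_output)

def run_sql_tool(llm_output: str, conn: sqlite3.Connection):
    # Dangerous: the generated SQL is executed verbatim, so a crafted
    # response can include DROP TABLE or other destructive statements.
    return conn.execute(llm_output).fetchall()
```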
To address this issue, we need to supplement the LLM's tools with additional logic that detects potential exploits and warns developers or the surrounding system. This can be done by short-circuiting the tool's logic, logging the occurrence, or even throwing an error, as in the sketch below. By doing so, we can help ensure the security and stability of the LLM and its related tools.
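Below is a minimal sketch of such a guard, assuming a simple regex deny-list approach; the names (guard_tool_input, UnsafeToolInputError, the pattern lists) are illustrative, and a production system would likely pair this with a proper parser or sandbox rather than regexes alone. It demonstrates all three responses mentioned above: the check short-circuits execution, logs the occurrence, and throws an error before the input ever reaches eval() or the database.

```python
import logging
import re

logger = logging.getLogger("llm_tool_guard")

# Hypothetical deny-list patterns for LLM-generated Python and SQL.
PYTHON_DENYLIST = [r"__import__", r"\beval\b", r"\bexec\b", r"\bos\.system\b", r"\bsubprocess\b"]
SQL_DENYLIST = [r"\bDROP\b", r"\bDELETE\b", r"\bALTER\b", r"\bTRUNCATE\b", r";\s*\S"]

class UnsafeToolInputError(Exception):
    """Raised when LLM-generated input looks like a potential exploit."""

def guard_tool_input(tool_input: str, denylist: list[str]) -> str:
    """Short-circuit, log, and raise if the input matches a known-bad pattern."""
    for pattern in denylist:
        if re.search(pattern, tool_input, flags=re.IGNORECASE):
            # Log the occurrence so developers can audit what the model attempted.
            logger.warning("Blocked unsafe tool input %r (matched %r)", tool_input, pattern)
            # Throw an error to short-circuit execution before anything runs.
            raise UnsafeToolInputError(f"Input rejected by guard: matched {pattern!r}")
    return tool_input

# Usage: wrap the tool call so unsafe inputs never reach the interpreter or database.
# safe_sql = guard_tool_input(llm_output, SQL_DENYLIST)
# conn.execute(safe_sql)
```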