This repository contains an evaluation framework for testing AI assistant personas on human-friendly behavior using the AISI Inspect framework. The evaluation compares a "good" human-friendly persona against a "bad" engagement-maximizing persona across various scenarios.
- Python 3.8 or higher
- VSCode (recommended), or a derivative such as Cursor or Windsurf
- API keys for target models
For example:

```bash
export OPENAI_API_KEY=your-openai-api-key
```

See the providers, and the packages and environment variables they require, here: https://inspect.aisi.org.uk/providers.html
```bash
python3 -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Install the following VSCode extensions:
- Inspect AI - For running and debugging Inspect evaluations
- Data Wrangler - For viewing and editing the dataset (optional, you could open it in a spreadsheet editor)
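Before running anything, a quick sanity check of the environment can save a failed run. The following is a minimal, optional sketch (not part of this repo; the provider keys listed are just examples):

```python
# Quick sanity check for the local environment (illustrative helper, not part of the repo).
import os
from importlib.metadata import version, PackageNotFoundError

# Confirm the Inspect AI package installed from requirements.txt is present.
try:
    print("inspect-ai version:", version("inspect-ai"))
except PackageNotFoundError:
    print("inspect-ai is not installed; run: pip install -r requirements.txt")

# Confirm at least one provider API key is visible to the current shell.
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    print(f"{key}: {'set' if os.environ.get(key) else 'not set'}")
```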
To run both the good and bad persona evaluations:
```bash
rm -rf logs/* && inspect eval-set good_persona_task.py bad_persona_task.py --model openai/gpt-4o
```

You can specify a different model using the provider/model format from the Inspect AI providers documentation. Examples:

- openai/gpt-4o
- anthropic/claude-3-5-sonnet-20241022
- google/gemini-2.5-pro
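If you prefer to launch the evaluations from Python rather than the CLI, something like the following should be roughly equivalent. This is a sketch only; it assumes the inspect_ai Python API's eval_set mirrors the eval-set CLI (tasks, model, and log_dir parameters):

```python
# Programmatic equivalent of the CLI command above (sketch; assumes inspect_ai's
# eval_set API and that each task file defines an @task-decorated task).
from inspect_ai import eval_set

success, logs = eval_set(
    tasks=["good_persona_task.py", "bad_persona_task.py"],
    model="openai/gpt-4o",
    log_dir="logs",  # eval-set resumption is keyed to this directory
)
print("all tasks completed:", success)
```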
The evaluation uses simple_human_friendliness_dataset.csv, which contains test scenarios across various categories of Humane Technology (a quick way to inspect the category breakdown is sketched after this list), such as:
- Loneliness & Social Connection
- Screen Time & Dependency
- Mental Health & Crisis
- Transparency & AI Understanding
- Privacy & Data Rights
- Ethical Alignment
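To see how scenarios are distributed across these categories, a standard-library-only snippet like the one below works. It assumes the "category" column described in the next section:

```python
# Summarize the dataset by category (stdlib only; assumes a "category" column).
import csv
from collections import Counter

with open("simple_human_friendliness_dataset.csv", newline="", encoding="utf-8") as f:
    counts = Counter(row["category"] for row in csv.DictReader(f))

for category, n in counts.most_common():
    print(f"{category}: {n}")
```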
To modify the evaluation scenarios, edit simple_human_friendliness_dataset.csv. The dataset includes:
- input: The user's question or scenario
- target: The expected human-friendly response
- category: The type of scenario being tested
- severity: The importance level (low, medium, high, critical)
- principle_to_evaluate: The core principle being assessed
Of these fields, input and target are required; the others are metadata intended to help the scorer evaluate adherence to the target. How much they actually improve scoring is still an open question.
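For reference, the sketch below shows how these columns could be mapped onto Inspect samples using the library's CSV dataset helpers. The actual mapping inside good_persona_task.py and bad_persona_task.py may differ:

```python
# Illustrative mapping of the CSV columns onto Inspect samples; the real task files
# may load the dataset differently.
from inspect_ai.dataset import FieldSpec, csv_dataset

dataset = csv_dataset(
    "simple_human_friendliness_dataset.csv",
    sample_fields=FieldSpec(
        input="input",    # required: the user's question or scenario
        target="target",  # required: the expected human-friendly response
        metadata=["category", "severity", "principle_to_evaluate"],
    ),
)

print(len(dataset), "samples loaded")
print(dataset[0].metadata)  # e.g. {'category': ..., 'severity': ..., ...}
```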
Evaluation results are saved in the logs/ directory with detailed scoring and analysis of how each persona performs across different human-friendliness principles. Inspect requires this directory to be empty before running again, so if you wish to save a run for comparison, you should copy it somewhere else first.
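One way to keep a run around is to archive the directory before clearing it, as in this stdlib-only sketch (the destination path is arbitrary):

```python
# Copy the current logs/ directory to a timestamped folder before the next run clears it.
import shutil
from datetime import datetime

stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
shutil.copytree("logs", f"logs-archive/{stamp}")
print(f"Saved logs to logs-archive/{stamp}")
```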