Measuring Prompt Engineering Accuracy
=====================================

Prompt engineering is a technique for improving the accuracy of annotation and classification tasks performed on datasets by large language models (LLMs), such as OpenAI's ChatGPT.

## About

This project demonstrates how different prompt styles affect the classification of text as expressing anger. Depending on how the prompt supplied to the LLM is worded and formatted, the resulting F-score and accuracy change. The best-performing prompt specifies precisely the format of the input data and the expected output response. The more detail the prompt provides, clearly and concisely, the better the results from the LLM.

## Statistics

The effectiveness of each prompt for classification is measured using F-score and accuracy. The percentage shown under each prompt below is its accuracy, and each output block lists the confusion matrix followed by the F-score and accuracy.

## Results

### 1. Is the sentence about anger? Respond only with the word "yes" or "no".

#### 73%

```
predict('Is the sentence about anger? Respond only with the word "yes" or "no".')
[33, 1]
[14, 7]
(0.48275862068965514, 0.7272727272727273)
```

### 2. Determine if the following sentence is angry. Respond only with the word "yes" or "no".

#### 87%

```
predict('Determine if the following sentence is angry. Respond only with the word "yes" or "no".')
[34, 0]
[7, 14]
0.8 0.8727272727272727
```

### 3. Determine if the following sentence enclosed with backticks contains a sentiment of anger. Respond only with the word "yes" or "no".

#### 93%

```
predict('Determine if the following sentence enclosed with backticks contains a sentiment of anger. Respond only with the word "yes" or "no".')
[34, 0]
[4, 17]
(0.8947368421052632, 0.9272727272727272)
```

### 4. Determine if the following sentence enclosed with backticks contains a sentiment of anger, disgust, or negativity. Respond only with the word "yes" or "no".

#### 93%

```
predict('Determine if the following sentence enclosed with backticks contains a sentiment of anger, disgust, or negativity. Respond only with the word "yes" or "no".')
print(result['f_score'], result['accuracy'])
[33, 1]
[3, 18]
0.9 0.9272727272727272
```

## References

- A subset of the sample data is sourced from the [Sentiment Labelled Sentences](https://www.kaggle.com/datasets/marklvl/sentiment-labelled-sentences-data-set) dataset on Kaggle.
- [Deep Learning Prompt Engineering](https://learn.deeplearning.ai/chatgpt-prompt-eng/lesson/1/introduction)
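
The F-score and accuracy values reported in the Results section can be recomputed from the confusion matrices printed there. The following is a minimal sketch, not the project's actual code: the `scores` helper is hypothetical, and it assumes each matrix is laid out as `[[TN, FP], [FN, TP]]` with "yes" as the positive class, a layout that is consistent with the reported numbers.

```python
def scores(matrix):
    """Return (f_score, accuracy) for a 2x2 confusion matrix.

    Assumed layout (not confirmed by the project): [[TN, FP], [FN, TP]],
    with "yes" treated as the positive class.
    """
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    # F1 is the harmonic mean of precision and recall, which
    # simplifies to 2*TP / (2*TP + FP + FN).
    denom = 2 * tp + fp + fn
    f_score = 2 * tp / denom if denom else 0.0
    return f_score, accuracy


# Confusion matrices copied from the Results section, keyed by prompt number.
results = {
    1: [[33, 1], [14, 7]],
    2: [[34, 0], [7, 14]],
    3: [[34, 0], [4, 17]],
    4: [[33, 1], [3, 18]],
}

for number, matrix in results.items():
    f_score, accuracy = scores(matrix)
    print(f"Prompt {number}: F-score={f_score:.3f}, accuracy={accuracy:.3f}")
```

Because the F1 formula treats false positives and false negatives symmetrically, the same values result whether the rows of the matrix represent actual or predicted labels.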