'''Prompt engineering''' is the process of structuring an instruction that can be interpreted and understood by a [[generative AI]] model.<ref name="diab">{{Cite web|url=https://cdn.openart.ai/assets/Stable%20Diffusion%20Prompt%20Book%20From%20OpenArt%2011-13.pdf |title=Stable Diffusion Prompt Book |last1=Diab |first1=Mohamad |last2= Herrera |first2=Julian |last3=Chernow |first3=Bob |date=2022-10-28 |access-date=2023-08-07 |quote="Prompt engineering is the process of structuring words that can be interpreted and understood by a ''text-to-image'' model. Think of it as the language you need to speak in order to tell an AI model what to draw."}}</ref><ref>{{Cite web |author=Ziegler |first1=Albert |last2=Berryman |first2=John |date=17 July 2023 |title=A developer's guide to prompt engineering and LLMs |url=https://github.blog/2023-07-17-prompt-engineering-guide-generative-ai-llms/ |website=The GitHub Blog |quote="Prompt engineering is the art of communicating with a generative AI model."}}</ref>
A prompt is [[natural language]] text describing the task that an AI should perform:<ref name="language-models-are-multitask">{{cite web |last1=Radford |first1=Alec |last2=Wu |first2=Jeffrey |last3=Child |first3=Rewon |last4=Luan |first4=David |last5=Amodei |first5=Dario |last6=Sutskever |first6=Ilya |year=2019 |title=Language Models are Unsupervised Multitask Learners |url=https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf |publisher=OpenAI |quote="We demonstrate language models can perform down-stream tasks in a zero-shot setting – without any parameter or architecture modification"}}</ref> a prompt for a text-to-text [[Large language model|language model]] can be a query such as "what is [[Fermat's little theorem]]?",<ref>{{cite web |last= |date=2022-11-30 |title=Introducing ChatGPT |url=https://openai.com/blog/chatgpt |access-date=2023-08-16 |website=OpenAI Blog |quote="what is the fermat's little theorem"}}</ref> a command such as "write a poem about leaves falling",<ref name="zapier20230803">{{Cite web|url=https://zapier.com/blog/gpt-prompt/|title=How to write an effective GPT-3 or GPT-4 prompt|last=Robinson|first=Reid|date=August 3, 2023|website=Zapier|access-date=2023-08-14|quote="Basic prompt: 'Write a poem about leaves falling.' Better prompt: 'Write a poem in the style of Edgar Allan Poe about leaves falling.'}}</ref> or a longer statement including context, instructions,<ref>{{cite web|last=Gouws-Stewart|first=Natasha|title=The ultimate guide to prompt engineering your GPT-3.5-Turbo model|url=https://masterofcode.com/blog/the-ultimate-guide-to-gpt-prompt-engineering|website=masterofcode.com|date=June 16, 2023}}</ref> and conversation history.
Prompt engineering may involve phrasing a query, specifying a style,<ref name="zapier20230803"/> providing relevant context<ref>{{cite web |last1=Greenberg, J. |first1=Laura |title=How to Prime and Prompt ChatGPT for More Reliable Contract Drafting Support |url=https://contractnerds.com/how-to-prime-and-prompt-chatgpt-for-more-reliable-contract-drafting-support |website=contractnerds.com |date=31 May 2023 |access-date=24 July 2023}}</ref> or assigning a role to the AI such as "Act as a native French speaker".<ref>{{Cite web|url=https://platform.openai.com/docs/guides/gpt-best-practices|title=GPT Best Practices|publisher=OpenAI|access-date=2023-08-16}}</ref>
A prompt may include a few examples for a model to learn from, such as asking the model to complete "maison {{arrow}} house, chat {{arrow}} cat, chien {{arrow}}" (the expected response being ''dog''),<ref>{{cite arXiv | eprint=2208.01066 | last1=Garg | first1=Shivam | last2=Tsipras | first2=Dimitris | last3=Liang | first3=Percy | last4=Valiant | first4=Gregory | title=What Can Transformers Learn In-Context? A Case Study of Simple Function Classes | year=2022 | class=cs.CL }}</ref> an approach called '''few-shot learning'''.<ref>{{Cite journal |last1=Brown |first1=Tom |last2=Mann |first2=Benjamin |last3=Ryder |first3=Nick |last4=Subbiah |first4=Melanie |last5=Kaplan |first5=Jared D. |last6=Dhariwal |first6=Prafulla |last7=Neelakantan |first7=Arvind |year=2020 |title=Language models are few-shot learners |journal=Advances in Neural Information Processing Systems |volume=33 |pages=1877–1901 |arxiv=2005.14165}}</ref>
When communicating with a [[text-to-image]] or a text-to-audio model, a typical prompt is a description of a desired output such as "a high-quality photo of an astronaut riding a horse"<ref>{{Cite web|last=Heaven|first=Will Douglas|title=This horse-riding astronaut is a milestone on AI's long road towards understanding|url=https://www.technologyreview.com/2022/04/06/1049061/dalle-openai-gpt3-ai-agi-multimodal-image-generation/|website=MIT Technology Review|access-date=2023-08-14|date=April 6, 2022}}</ref> or "Lo-fi slow BPM electro chill with organic samples".<ref>{{cite web|last=Wiggers|first=Kyle|title=Meta open sources an AI-powered music generator|url=https://techcrunch.com/2023/06/12/meta-open-sources-an-ai-powered-music-generator/|publisher=TechCrunch|date=2023-06-12|access-date=2023-08-15|quote=Next, I gave a more complicated prompt to attempt to throw MusicGen for a loop: "Lo-fi slow BPM electro chill with organic samples."}}</ref> Prompting a text-to-image model may involve adding, removing, emphasizing and re-ordering words to achieve a desired subject, style,<ref name="diab"/> layout, lighting,<ref>{{Cite web|url=https://claid.ai/blog/article/prompt-guide/|title=How to Write AI Photoshoot Prompts: A Guide for Better Product Photos|website=claid.ai|access-date=June 12, 2023|date=June 12, 2023}}</ref> and aesthetic.
==In-context learning==
Prompt engineering is enabled by '''in-context learning''', defined as a model's ability to temporarily learn from prompts. The ability for in-context learning is an [[large language model emergent abilities|emergent ability]]<ref name="2022_EmergentAbilities">{{cite arXiv |last1=Wei |first1=Jason |last2=Tay |first2=Yi |last3=Bommasani |first3=Rishi |last4=Raffel |first4=Colin |last5=Zoph |first5=Barret |last6=Borgeaud |first6=Sebastian |last7=Yogatama |first7=Dani |last8=Bosma |first8=Maarten |last9=Zhou |first9=Denny |last10=Metzler |first10=Donald |last11=Chi |first11=Ed H. |last12=Hashimoto |first12=Tatsunori |last13=Vinyals |first13=Oriol |last14=Liang |first14=Percy |last15=Dean |first15=Jeff |last16=Fedus |first16=William |title=Emergent Abilities of Large Language Models |date=31 August 2022 |class=cs.CL |eprint=2206.07682| quote="In prompting, a pre-trained language model is given a prompt (e.g. a natural language instruction) of a task and completes the response without any further training or gradient updates to its parameters... The ability to perform a task via few-shot prompting is emergent when a model has random performance until a certain scale, after which performance increases to well-above random"}}</ref> of [[large language model]]s. In-context learning itself is an [[large language model emergent abilities|emergent property of model scale]], meaning [[Broken Neural Scaling Law|breaks]]<ref>Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). [[arxiv:2210.14891|"Broken Neural Scaling Laws"]]. International Conference on Learning Representations (ICLR), 2023.</ref> in downstream scaling laws occur such that its efficacy increases at a different rate in larger models than in smaller models.<ref>{{cite arXiv |last1=Wei |first1=Jason |last2=Tay |first2=Yi |last3=Bommasani |first3=Rishi |last4=Raffel |first4=Colin |last5=Zoph |first5=Barret |last6=Borgeaud |first6=Sebastian |last7=Yogatama |first7=Dani |last8=Bosma |first8=Maarten |last9=Zhou |first9=Denny |last10=Metzler |first10=Donald |last11=Chi |first11=Ed H. |last12=Hashimoto |first12=Tatsunori |last13=Vinyals |first13=Oriol |last14=Liang |first14=Percy |last15=Dean |first15=Jeff |last16=Fedus |first16=William |title=Emergent Abilities of Large Language Models |date=31 August 2022 |class=cs.CL |eprint=2206.07682 }}</ref><ref name="weipaper"/>
In contrast to training and [[Fine-tuning (deep learning)|fine-tuning]] for each specific task, which produce lasting changes, what is learned during in-context learning is temporary. Temporary contexts or biases, other than those already present in the (pre)training [[Data set|dataset]], are not carried from one conversation to the other.<ref>{{cite web |last1=Musser |first1=George |title=How AI Knows Things No One Told It |website=[[Scientific American]] |url=https://www.scientificamerican.com/article/how-ai-knows-things-no-one-told-it/ |access-date=17 May 2023|quote="By the time you type a query into ChatGPT, the network should be fixed; unlike humans, it should not continue to learn. So it came as a surprise that LLMs do, in fact, learn from their users' prompts—an ability known as in-context learning."}}</ref> This result of "mesa-optimization"<ref>{{cite arXiv | eprint=2212.07677 | author1=Johannes von Oswald | last2=Niklasson | first2=Eyvind | last3=Randazzo | first3=Ettore | last4=Sacramento | first4=João | last5=Mordvintsev | first5=Alexander | last6=Zhmoginov | first6=Andrey | last7=Vladymyrov | first7=Max | title=Transformers learn in-context by gradient descent | year=2022 | class=cs.LG |quote="Thus we show how trained Transformers become mesa-optimizers i.e. learn models by gradient descent in their forward pass"}}</ref><ref>{{cite web |title=Mesa-Optimization |date=31 May 2019 |url=https://www.alignmentforum.org/tag/mesa-optimization |access-date=17 May 2023|quote="Mesa-Optimization is the situation that occurs when a learned model (such as a neural network) is itself an optimizer."}}</ref> within [[Transformer (machine learning)|transformer]] layers is a form of [[Meta-learning (computer science)|meta-learning]] or "learning to learn".<ref>{{cite arXiv | eprint=2208.01066 | last1=Garg | first1=Shivam | last2=Tsipras | first2=Dimitris | last3=Liang | first3=Percy | last4=Valiant | first4=Gregory | title=What Can Transformers Learn In-Context? A Case Study of Simple Function Classes | year=2022 | class=cs.CL |quote="Training a model to perform in-context learning can be viewed as an instance of the more general learning-to-learn or meta-learning paradigm"}}</ref>
== History ==
In 2018, researchers first proposed that all previously separate tasks in [[Natural language processing|NLP]] could be cast as a question-answering problem over a context. In addition, they trained the first single, joint, multi-task model that would answer any task-related question, such as "What is the sentiment?", "Translate this sentence to German", or "Who is the president?"<ref>{{cite arXiv | eprint=1806.08730 | last1=McCann | first1=Bryan | last2=Shirish | first2=Nitish | last3=Xiong | first3=Caiming | last4=Socher | first4=Richard | title=The Natural Language Decathlon: Multitask Learning as Question Answering | year=2018 | class=cs.CL }}</ref>
In 2021, researchers fine-tuned one generatively pretrained model (T0) on performing 12 [[Natural language processing|NLP]] tasks (using 62 datasets, as each task can have multiple datasets). The model showed good performance on new tasks, surpassing models trained directly on just performing one task (without pretraining). To solve a task, T0 is given the task in a structured prompt, for example <code><nowiki>If {{premise}} is true, is it also true that {{hypothesis}}? ||| {{entailed}}.</nowiki></code> is the prompt used for making T0 solve [[Logical consequence|entailment]].<ref>{{cite arXiv | eprint=2110.08207 | last1=Sanh | first1=Victor | last2=Webson | first2=Albert | last3=Raffel | first3=Colin | last4=Bach | first4=Stephen H. | last5=Sutawika | first5=Lintang | last6=Alyafeai | first6=Zaid | last7=Chaffin | first7=Antoine | last8=Stiegler | first8=Arnaud | author9=Teven Le Scao | last10=Raja | first10=Arun | last11=Dey | first11=Manan | author12=M Saiful Bari | last13=Xu | first13=Canwen | last14=Thakker | first14=Urmish | author15=Shanya Sharma Sharma | last16=Szczechla | first16=Eliza | last17=Kim | first17=Taewoon | last18=Chhablani | first18=Gunjan | last19=Nayak | first19=Nihal | last20=Datta | first20=Debajyoti | last21=Chang | first21=Jonathan | author22=Mike Tian-Jian Jiang | last23=Wang | first23=Han | last24=Manica | first24=Matteo | last25=Shen | first25=Sheng | author26=Zheng Xin Yong | last27=Pandey | first27=Harshit | last28=Bawden | first28=Rachel | last29=Wang | first29=Thomas | last30=Neeraj | first30=Trishala | title=Multitask Prompted Training Enables Zero-Shot Task Generalization | year=2021 | class=cs.LG | display-authors=1 }}</ref>
A repository for prompts reported that over 2,000 public prompts for around 170 datasets were available in February 2022.<ref>{{cite arXiv | eprint=2202.01279 | last1=Bach | first1=Stephen H. | last2=Sanh | first2=Victor | last3=Yong | first3=Zheng-Xin | last4=Webson | first4=Albert | last5=Raffel | first5=Colin | last6=Nayak | first6=Nihal V. | last7=Sharma | first7=Abheesht | last8=Kim | first8=Taewoon | author9=M Saiful Bari | last10=Fevry | first10=Thibault | last11=Alyafeai | first11=Zaid | last12=Dey | first12=Manan | last13=Santilli | first13=Andrea | last14=Sun | first14=Zhiqing | last15=Ben-David | first15=Srulik | last16=Xu | first16=Canwen | last17=Chhablani | first17=Gunjan | last18=Wang | first18=Han | author19=Jason Alan Fries | last20=Al-shaibani | first20=Maged S. | last21=Sharma | first21=Shanya | last22=Thakker | first22=Urmish | last23=Almubarak | first23=Khalid | last24=Tang | first24=Xiangru | last25=Radev | first25=Dragomir | author26=Mike Tian-Jian Jiang | last27=Rush | first27=Alexander M. | title=PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts | year=2022 | class=cs.LG }}</ref>
In 2022 the ''chain-of-thought'' prompting technique was proposed by [[Google]] researchers.<ref name="weipaper">{{cite conference |last1=Wei |first1=Jason |last2=Wang |first2=Xuezhi |last3=Schuurmans |first3=Dale |last4=Bosma |first4=Maarten |last5=Ichter |first5=Brian |last6=Xia |first6=Fei |last7=Chi |first7=Ed H. |last8=Le |first8=Quoc V. |last9=Zhou |first9=Denny |title=Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |date=31 October 2022 |arxiv=2201.11903 |language=en |conference=Advances in Neural Information Processing Systems (NeurIPS 2022) |volume=35| url=https://proceedings.neurips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html}}</ref><ref>{{cite web |last1=Wei |first1=Jason |last2=Zhou |title=Language Models Perform Reasoning via Chain of Thought |url=https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html |website=ai.googleblog.com |date=11 May 2022 |access-date=10 March 2023 |language=en}}</ref>
In 2023 several text-to-text and text-to-image prompt databases were publicly available.<ref>{{Cite web|url=https://www.nytimes.com/2023/06/23/technology/ai-chatbot-life-coach.html|title=How to Turn Your Chatbot Into a Life Coach|last=Chen|first=Brian X.|date=2023-06-23|access-date=|website=The New York Times}}</ref><ref>{{Cite news |last=Chen |first=Brian X. |date=2023-05-25 |title=Get the Best From ChatGPT With These Golden Prompts |language=en-US |work=The New York Times |url=https://www.nytimes.com/2023/05/25/technology/ai-chatbot-chatgpt-prompts.html |access-date=2023-08-16 |issn=0362-4331}}</ref>
== Text-to-text ==
=== Chain-of-thought ===
''Chain-of-thought'' (CoT) prompting is a technique that allows [[large language models]] (LLMs) to solve a problem as a series of intermediate steps<ref>{{cite web |last1=McAuliffe |first1=Zachary |title=Google's Latest AI Model Can Be Taught How to Solve Problems |url=https://www.cnet.com/tech/services-and-software/googles-latest-ai-model-can-be-taught-how-to-solve-problems/ |website=CNET |access-date=10 March 2023 |language=en|quote="'Chain-of-thought prompting allows us to describe multistep problems as a series of intermediate steps,' Google CEO Sundar Pichai"}}</ref> before giving a final answer. Chain-of-thought prompting improves reasoning ability by inducing the model to answer a multi-step problem with steps of reasoning that mimic a [[train of thought]].<ref>{{cite web |last1=McAuliffe |first1=Zachary |title=Google's Latest AI Model Can Be Taught How to Solve Problems |url=https://www.cnet.com/tech/services-and-software/googles-latest-ai-model-can-be-taught-how-to-solve-problems/ |website=CNET |access-date=10 March 2023 |language=en}}</ref><ref name="weipaper"/><ref>{{Cite web |url=https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html |date=2022-04-04|title= Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance |author=Sharan Narang and Aakanksha Chowdhery}}</ref> It allows large language models to overcome difficulties with some reasoning tasks that require [[logical reasoning|logical thinking]] and multiple steps to solve, such as [[arithmetic]] or [[commonsense reasoning]] questions.<ref>{{cite web |last1=Dang |first1=Ekta |title=Harnessing the power of GPT-3 in scientific research |url=https://venturebeat.com/ai/harnessing-the-power-of-gpt-3-in-scientific-research/ |website=VentureBeat |access-date=10 March 2023 |date=8 February 2023}}</ref><ref>{{cite web |last1=Montti |first1=Roger |title=Google's Chain of Thought Prompting Can Boost Today's Best Algorithms |url=https://www.searchenginejournal.com/google-chain-of-thought-prompting/450106/ |website=Search Engine Journal |access-date=10 March 2023 |language=en |date=13 May 2022}}</ref><ref>{{cite web |last1=Ray |first1=Tiernan |title=Amazon's Alexa scientists demonstrate bigger AI isn't always better |url=https://www.zdnet.com/article/amazons-alexa-scientists-demonstrate-bigger-ai-isnt-always-better/ |website=ZDNET |access-date=10 March 2023 |language=en}}</ref>
For example, given the question "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?", a CoT prompt might induce the LLM to answer "A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9."<ref name="weipaper"/>
As originally proposed,<ref name="weipaper"/> each CoT prompt included a few Q&A examples, making it a ''few-shot'' prompting technique. However, simply appending the words "Let's think step-by-step"<ref name="KojimaStepByStep">{{cite arXiv | eprint=2205.11916 | last1=Kojima | first1=Takeshi | author2=Shixiang Shane Gu | last3=Reid | first3=Machel | last4=Matsuo | first4=Yutaka | last5=Iwasawa | first5=Yusuke | title=Large Language Models are Zero-Shot Reasoners | year=2022 | class=cs.CL }}</ref> has also proven effective, which makes CoT a ''zero-shot'' prompting technique. This allows for better scaling, as a user no longer needs to formulate many specific CoT Q&A examples.<ref name="venture1">{{cite web |last1=Dickson |first1=Ben |title=LLMs have not learned our language — we're trying to learn theirs |url=https://venturebeat.com/ai/llms-have-not-learned-our-language-were-trying-to-learn-theirs%EF%BF%BC/ |website=VentureBeat |access-date=10 March 2023 |date=30 August 2022}}</ref>
When applied to [[PaLM]], a 540B parameter [[language model]], CoT prompting significantly aided the model, allowing it to perform comparably with task-specific [[fine-tuning (machine learning)|fine-tuned]] models on several tasks, achieving [[state of the art]] results at the time on the GSM8K [[mathematical reasoning]] [[benchmark (computing)|benchmark]].<ref name="weipaper"/> It is possible to fine-tune models on CoT reasoning datasets to enhance this capability further and stimulate better [[interpretability]].<ref>{{cite arXiv |last1=Chung |first1=Hyung Won |last2=Hou |first2=Le |last3=Longpre |first3=Shayne |last4=Zoph |first4=Barret |last5=Tay |first5=Yi |last6=Fedus |first6=William |last7=Li |first7=Yunxuan |last8=Wang |first8=Xuezhi |last9=Dehghani |first9=Mostafa |last10=Brahma |first10=Siddhartha |last11=Webson |first11=Albert |last12=Gu |first12=Shixiang Shane |last13=Dai |first13=Zhuyun |last14=Suzgun |first14=Mirac |last15=Chen |first15=Xinyun |last16=Chowdhery |first16=Aakanksha |last17=Castro-Ros |first17=Alex |last18=Pellat |first18=Marie |last19=Robinson |first19=Kevin |last20=Valter |first20=Dasha |last21=Narang |first21=Sharan |last22=Mishra |first22=Gaurav |last23=Yu |first23=Adams |last24=Zhao |first24=Vincent |last25=Huang |first25=Yanping |last26=Dai |first26=Andrew |last27=Yu |first27=Hongkun |last28=Petrov |first28=Slav |last29=Chi |first29=Ed H. |last30=Dean |first30=Jeff |last31=Devlin |first31=Jacob |last32=Roberts |first32=Adam |last33=Zhou |first33=Denny |last34=Le |first34=Quoc V. |last35=Wei |first35=Jason |title=Scaling Instruction-Finetuned Language Models |date=2022 |class=cs.LG |eprint=2210.11416}}</ref><ref>{{cite web |last1=Wei |first1=Jason |last2=Tay |first2=Yi |title=Better Language Models Without Massive Compute |url=https://ai.googleblog.com/2022/11/better-language-models-without-massive.html |website=ai.googleblog.com |date=29 November 2022 |access-date=10 March 2023 |language=en}}</ref>
Example:<ref name="KojimaStepByStep"/>
Q: {question}
A: Let's think step by step.
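The zero-shot variant is simple enough to sketch in code. A minimal illustration, assuming a hypothetical <code>llm()</code> text-completion function (any LLM API could fill this role):
<syntaxhighlight lang="python">
def zero_shot_cot(llm, question: str) -> str:
    """Zero-shot chain-of-thought: append the step-by-step trigger phrase."""
    # llm() is a hypothetical completion function, not a specific library call.
    return llm(f"Q: {question}\nA: Let's think step by step.")
</syntaxhighlight>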
===Other techniques===
Chain-of-thought prompting is just one of many prompt-engineering techniques; at least 29 distinct techniques have been published.<ref>{{Citation |last1=Sahoo |first1=Pranab |title=A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications |date=2024-02-05 |arxiv=2402.07927 |last2=Singh |first2=Ayush Kumar |last3=Saha |first3=Sriparna |last4=Jain |first4=Vinija |last5=Mondal |first5=Samrat |last6=Chadha |first6=Aman}}</ref>
====Chain-of-symbol (CoS) prompting====
''Chain-of-symbol'' prompting, used in conjunction with CoT prompting, addresses LLMs' difficulty with spatial reasoning in text. In other words, using arbitrary symbols such as ' / ' helps the LLM interpret spacing in text, which assists reasoning and increases the performance of the LLM.<ref name=":0">{{Citation |last1=Hu |first1=Hanxu |title=Chain-of-Symbol Prompting Elicits Planning in Large Language Models |date=2023-10-03 |arxiv=2305.10276 |last2=Lu |first2=Hongyuan |last3=Zhang |first3=Huajian |last4=Song |first4=Yun-Ze |last5=Lam |first5=Wai |last6=Zhang |first6=Yue}}</ref>
Example:<ref name=":0" />
Input:
There are a set of bricks. The yellow brick C is on top of the brick E. The yellow brick D is on top of the brick A. The yellow brick E is on top of the brick D. The white brick A is on top of the brick B. For the brick B, the color is white. Now we have to get a specific brick. The bricks must now be grabbed from top to bottom, and if the lower brick is to be grabbed, the upper brick must be removed first. How to get brick D?
B/A/D/E/C
C/E
E/D
D
Output:
So we get the result as C, E, D.
====Generated knowledge prompting====
''Generated knowledge prompting''<ref name="LiuGeneratedKnowledge">{{Cite journal |last1=Liu |first1=Jiacheng |last2=Liu |first2=Alisa |last3=Lu |first3=Ximing |last4=Welleck |first4=Sean |last5=West |first5=Peter |last6=Le Bras |first6=Ronan |last7=Choi |first7=Yejin |last8=Hajishirzi |first8=Hannaneh |date=May 2022 |title=Generated Knowledge Prompting for Commonsense Reasoning |url=https://aclanthology.org/2022.acl-long.225 |journal=Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |location=Dublin, Ireland |publisher=Association for Computational Linguistics |pages=3154–3169 |doi=10.18653/v1/2022.acl-long.225|s2cid=239016123 |doi-access=free |arxiv=2110.08387 }}</ref> first prompts the model to generate relevant facts for completing the prompt, and then proceeds to complete it. The completion quality is usually higher, as the model can be conditioned on relevant facts.
Example:<ref name="LiuGeneratedKnowledge"/>
Generate some knowledge about the concepts in the input.
Input: {question}
Knowledge:
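A sketch of the two-stage flow, assuming a hypothetical <code>llm()</code> completion function:
<syntaxhighlight lang="python">
def generated_knowledge(llm, question: str) -> str:
    # Stage 1: elicit relevant facts (llm() is a hypothetical completion function).
    knowledge = llm(
        "Generate some knowledge about the concepts in the input.\n"
        f"Input: {question}\nKnowledge:"
    )
    # Stage 2: complete the prompt, conditioned on the generated facts.
    return llm(f"{knowledge}\n\nQ: {question}\nA:")
</syntaxhighlight>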
====Least-to-most prompting====
''Least-to-most prompting''<ref name="ZhouLeastMost">{{Cite arXiv |last1=Zhou |first1=Denny |last2=Schärli |first2=Nathanael |last3=Hou |first3=Le |last4=Wei |first4=Jason |last5=Scales |first5=Nathan |last6=Wang |first6=Xuezhi |last7=Schuurmans |first7=Dale |last8=Cui |first8=Claire |last9=Bousquet |first9=Olivier |last10=Le |first10=Quoc |last11=Chi |first11=Ed |date=2022-05-01 |title=Least-to-Most Prompting Enables Complex Reasoning in Large Language Models |class=cs.AI |eprint=2205.10625 |quote="...least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence."}}</ref> prompts a model to first list the sub-problems to a problem, then solve them in sequence, such that later sub-problems can be solved with the help of answers to previous sub-problems.
Example:<ref name="ZhouLeastMost"/>
Input:
Q: {question}
A: Let's break down this problem:
1.
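A sketch of the decompose-then-solve loop, assuming a hypothetical <code>llm()</code> completion function that lists one sub-problem per line:
<syntaxhighlight lang="python">
def least_to_most(llm, question: str) -> str:
    # Stage 1: ask the model to list sub-problems (assumed one per line).
    plan = llm(f"Q: {question}\nA: Let's break down this problem:\n1.")
    subproblems = [line.strip() for line in plan.splitlines() if line.strip()]
    # Stage 2: solve sub-problems in sequence; earlier answers stay in context.
    transcript = question + "\n"
    for sub in subproblems:
        answer = llm(transcript + "Q: " + sub + "\nA:")
        transcript += "Q: " + sub + "\nA: " + answer + "\n"
    return transcript  # the final sub-answer resolves the original problem
</syntaxhighlight>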
====Self-consistency decoding====
''Self-consistency decoding''<ref>{{cite arXiv |last1=Wang |first1=Xuezhi |last2=Wei |first2=Jason |last3=Schuurmans |first3=Dale |last4=Le |first4=Quoc |last5=Chi |first5=Ed |last6=Narang |first6=Sharan |last7=Chowdhery |first7=Aakanksha |last8=Zhou |first8=Denny |date=2022-03-01 |title=Self-Consistency Improves Chain of Thought Reasoning in Language Models |class=cs.CL |eprint=2203.11171}}</ref> performs several chain-of-thought rollouts, then selects the most commonly reached conclusion out of all the rollouts. If the rollouts disagree substantially, a human can be queried for the correct chain of thought.<ref>{{cite arXiv |last1=Diao |first1=Shizhe |last2=Wang |first2=Pengcheng |last3=Lin |first3=Yong |last4=Zhang |first4=Tong |date=2023-02-01 |title=Active Prompting with Chain-of-Thought for Large Language Models |class=cs.CL |eprint=2302.12246 }}</ref>
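A sketch of the majority-vote procedure, assuming a hypothetical <code>llm()</code> function that samples with nonzero temperature, plus a crude answer parser:
<syntaxhighlight lang="python">
from collections import Counter

def self_consistency(llm, question: str, n: int = 10) -> str:
    """Majority vote over several sampled chain-of-thought rollouts."""
    answers = []
    for _ in range(n):  # llm() is assumed to sample a different rollout each call
        rollout = llm(f"Q: {question}\nA: Let's think step by step.")
        # Crude parser; assumes each rollout ends with "The answer is X."
        answers.append(rollout.rsplit("The answer is", 1)[-1].strip(" ."))
    return Counter(answers).most_common(1)[0][0]
</syntaxhighlight>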
====Complexity-based prompting====
Complexity-based prompting<ref>{{Cite arXiv |last1=Fu |first1=Yao |last2=Peng |first2=Hao |last3=Sabharwal |first3=Ashish |last4=Clark |first4=Peter |last5=Khot |first5=Tushar |date=2022-10-01 |title=Complexity-Based Prompting for Multi-Step Reasoning |class=cs.CL |eprint=2210.00720 }}</ref> performs several CoT rollouts, then selects the rollouts with the longest chains of thought, and then selects the most commonly reached conclusion among those.
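A sketch of this variant under the same assumptions as the self-consistency example above:
<syntaxhighlight lang="python">
from collections import Counter

def complexity_based(llm, question: str, n: int = 20, k: int = 5) -> str:
    """Vote only among the k rollouts with the longest chains of thought."""
    rollouts = [llm(f"Q: {question}\nA: Let's think step by step.")
                for _ in range(n)]
    # Text length serves here as a crude proxy for the number of reasoning steps.
    longest = sorted(rollouts, key=len, reverse=True)[:k]
    answers = [r.rsplit("The answer is", 1)[-1].strip(" .") for r in longest]
    return Counter(answers).most_common(1)[0][0]
</syntaxhighlight>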
====Self-refine====
Self-refine<ref name="MadaanSelfRefine">{{Cite arXiv |last1=Madaan |first1=Aman |last2=Tandon |first2=Niket |last3=Gupta |first3=Prakhar |last4=Hallinan |first4=Skyler |last5=Gao |first5=Luyu |last6=Wiegreffe |first6=Sarah |last7=Alon |first7=Uri |last8=Dziri |first8=Nouha |last9=Prabhumoye |first9=Shrimai |last10=Yang |first10=Yiming |last11=Gupta |first11=Shashank |last12=Prasad Majumder |first12=Bodhisattwa |last13=Hermann |first13=Katherine |last14=Welleck |first14=Sean |last15=Yazdanbakhsh |first15=Amir |date=2023-03-01 |title=Self-Refine: Iterative Refinement with Self-Feedback |class=cs.CL |eprint=2303.17651 }}</ref> prompts the LLM to solve the problem, then prompts the LLM to critique its solution, then prompts the LLM to solve the problem again in view of the problem, solution, and critique. This process repeats until it is stopped, whether by running out of tokens or time, or by the LLM outputting a "stop" token.
Example critique:<ref name="MadaanSelfRefine"/>
I have some code. Give one suggestion to improve readability. Don't fix the code, just give a suggestion.
Code: {code}
Suggestion:
Example refinement:
Code: {code}
Let's use this suggestion to improve the code.
Suggestion: {suggestion}
New Code:
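A sketch of the solve-critique-refine loop, assuming a hypothetical <code>llm()</code> completion function and a simple textual "stop" convention:
<syntaxhighlight lang="python">
def self_refine(llm, problem: str, max_rounds: int = 3) -> str:
    """Solve, critique, refine; stop on a fixed budget or a 'stop' signal."""
    solution = llm(problem + "\n\nSolution:")
    for _ in range(max_rounds):
        critique = llm(problem + "\n\nSolution: " + solution +
                       "\n\nGive one suggestion to improve this solution:")
        if "stop" in critique.lower():  # assumed convention for the stop token
            break
        solution = llm(problem + "\n\nSolution: " + solution +
                       "\n\nSuggestion: " + critique + "\n\nNew solution:")
    return solution
</syntaxhighlight>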
====Tree-of-thought====
''Tree-of-thought prompting''<ref name="LongTreeofThought">{{cite arXiv |last=Long |first=Jieyi |date=2023-05-15 |title=Large Language Model Guided Tree-of-Thought |class=cs.AI |eprint=2305.08291}}</ref> generalizes chain-of-thought by prompting the model to generate one or more "possible next steps", and then running the model on each of the possible next steps by [[Breadth-first search|breadth-first]], [[Beam search|beam]], or some other method of tree search.<ref>{{Cite arXiv |last1=Yao |first1=Shunyu |last2=Yu |first2=Dian |last3=Zhao |first3=Jeffrey |last4=Shafran |first4=Izhak |last5=Griffiths |first5=Thomas L. |last6=Cao |first6=Yuan |last7=Narasimhan |first7=Karthik |date=2023-05-17 |title=Tree of Thoughts: Deliberate Problem Solving with Large Language Models |class=cs.CL |eprint=2305.10601 }}</ref>
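A sketch of the beam-search variant, assuming a hypothetical <code>llm()</code> that proposes next steps one per line and a hypothetical <code>score()</code> function that rates partial chains:
<syntaxhighlight lang="python">
def tree_of_thought(llm, score, question: str, depth: int = 3, beam: int = 3) -> str:
    """Beam search over partial chains of thought."""
    frontier = [""]  # partial chains, initially empty
    for _ in range(depth):
        candidates = []
        for chain in frontier:
            steps = llm(f"Q: {question}\n{chain}\nPropose up to 3 possible next steps:")
            candidates += [chain + "\n" + s.strip()
                           for s in steps.splitlines() if s.strip()]
        # Keep only the highest-scored partial chains (beam search).
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]
</syntaxhighlight>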
====Maieutic prompting====
[[Socratic method|Maieutic]] prompting is similar to tree-of-thought. The model is prompted to answer a question with an explanation. The model is then prompted to explain parts of the explanation, and so on. Inconsistent explanation trees are pruned or discarded. This improves performance on complex commonsense reasoning.<ref name="JungMaieutic">{{cite arXiv |last1=Jung |first1=Jaehun |last2=Qin |first2=Lianhui |last3=Welleck |first3=Sean |last4=Brahman |first4=Faeze |last5=Bhagavatula |first5=Chandra |last6=Le Bras |first6=Ronan |last7=Choi |first7=Yejin |year=2022 |title=Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations |eprint=2205.11822 |class=cs.CL}}</ref>
Example:<ref name="JungMaieutic"/>
Q: {question}
A: True, because
Q: {question}
A: False, because
====Directional-stimulus prompting====
''Directional-stimulus prompting''<ref name="LiPengHe">{{cite arXiv |last1=Li |first1=Zekun |last2=Peng |first2=Baolin |last3=He |first3=Pengcheng |last4=Galley |first4=Michel |last5=Gao |first5=Jianfeng |last6=Yan |first6=Xifeng |title=Guiding Large Language Models via Directional Stimulus Prompting |year=2023 |eprint=2302.11520 |class=cs.CL|quote="The directional stimulus serves as hints or cues for each input query to guide LLMs toward the desired output, such as keywords that the desired summary should include for summarization."}}</ref> includes a hint or cue, such as desired keywords, to guide a language model toward the desired output.
Example:<ref name="LiPengHe" />
Article: {article}
Keywords:
Article: {article}
Q: Write a short summary of the article in 2-4 sentences that accurately incorporates the provided keywords.
Keywords: {keywords}
A:
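A sketch of this two-stage template, assuming a hypothetical <code>llm()</code> completion function. (In the original paper the stimulus comes from a small trained policy model; here a second LLM call stands in for it.)
<syntaxhighlight lang="python">
def directional_stimulus_summary(llm, article: str) -> str:
    """Elicit keywords, then pass them as hints (the stimulus) for the summary."""
    keywords = llm(f"Article: {article}\nKeywords:")
    return llm(f"Article: {article}\n"
               "Q: Write a short summary of the article in 2-4 sentences that "
               "accurately incorporates the provided keywords.\n"
               f"Keywords: {keywords}\nA:")
</syntaxhighlight>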
=== Prompting to disclose uncertainty ===
By default, the output of language models may not contain estimates of uncertainty. The model may output text that appears confident, though the underlying token predictions have low [[Likelihood function|likelihood]] scores. Large language models like [[GPT-4]] can have accurately [[Calibration (statistics)|calibrated]] likelihood scores in their token predictions,<ref>{{Cite arXiv |eprint=2303.08774 |class=cs.CL |last=OpenAI |title=GPT-4 Technical Report |date=2023-03-27}} ''[See Figure 8.]''</ref> and so the model output uncertainty can be directly estimated by reading out the token prediction likelihood scores.
But if one cannot access such scores (such as when one is accessing the model through a restrictive API), uncertainty can still be estimated and incorporated into the model output. One simple method is to prompt the model to use words to estimate uncertainty.<ref>{{cite web |url=https://www.forbes.com/sites/lanceeliot/2023/08/18/latest-prompt-engineering-technique-aims-to-get-certainty-and-uncertainty-of-generative-ai-directly-on-the-table-and-out-in-the-open/ |title=Latest Prompt Engineering Technique Aims To Get Certainty And Uncertainty Of Generative AI Directly On The Table And Out In The Open |last=Eliot |first=Lance |date=2023-08-18 |work=[[Forbes]] |access-date=2024-08-31 |quote=If you explicitly indicate in your prompt that you want the generative AI to emit a certainty or uncertainty qualification then you will almost certainly get such an indication.}}</ref> Another is to prompt the model to refuse to answer in a standardized way if the input does not satisfy conditions.{{citation needed|date=June 2023}}
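A sketch of the direct read-out approach described above, assuming a hypothetical API that returns per-token log-probabilities alongside the generated text:
<syntaxhighlight lang="python">
import math

def answer_with_confidence(llm_with_logprobs, prompt: str):
    """Estimate confidence from token log-likelihoods. llm_with_logprobs() is a
    hypothetical API returning (text, list of per-token log-probabilities)."""
    text, logprobs = llm_with_logprobs(prompt)
    avg_logprob = sum(logprobs) / len(logprobs)
    return text, math.exp(avg_logprob)  # geometric mean of token probabilities
</syntaxhighlight>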
=== Automatic prompt generation ===
==== Retrieval-augmented generation ====
{{Confusing section|reason=it dives into the technical vector implementation before positioning the overall concept|date=July 2024}}
[[File:RAG schema.svg|thumb|Two-phase process of document retrieval using dense [[Word embedding|embeddings]] and Large Language Model (LLM) for answer formulation]]
[[Retrieval-augmented generation]] (RAG) is a two-phase process involving [[document retrieval]] and answer formulation by a Large Language Model (LLM). The initial phase utilizes dense embeddings to retrieve documents. This retrieval can be based on a variety of database formats depending on the use case, such as a [[vector database]], summary index, [[tree index]], or keyword table index.<ref>{{Cite web |title=How Each Index Works - LlamaIndex 🦙 v0.10.17 |url=https://docs.llamaindex.ai/en/v0.10.17/module_guides/indexing/index_guide.html |access-date=2024-04-08 |website=docs.llamaindex.ai}}</ref>
In response to a query, a document retriever selects the most relevant documents. This relevance is typically determined by first encoding both the query and the documents into vectors, then identifying documents whose vectors are closest in Euclidean distance to the query vector. Following document retrieval, the LLM generates an output that incorporates information from both the query and the retrieved documents.<ref>{{Cite journal |last1=Lewis |first1=Patrick |last2=Perez |first2=Ethan |last3=Piktus |first3=Aleksandra |last4=Petroni |first4=Fabio |last5=Karpukhin |first5=Vladimir |last6=Goyal |first6=Naman |last7=Küttler |first7=Heinrich |last8=Lewis |first8=Mike |last9=Yih |first9=Wen-tau |last10=Rocktäschel |first10=Tim |last11=Riedel |first11=Sebastian |last12=Kiela |first12=Douwe |date=2020 |title=Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks |url=https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=33 |pages=9459–9474 |arxiv=2005.11401}}</ref> This method is particularly beneficial for handling proprietary or dynamic information that was not included in the initial training or fine-tuning phases of the model. RAG is also notable for its use of "few-shot" learning, where the model uses a small number of examples, often automatically retrieved from a database, to inform its outputs.
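A minimal sketch of this retrieve-then-generate flow, assuming hypothetical <code>embed()</code> (text-to-vector encoder) and <code>llm()</code> functions:
<syntaxhighlight lang="python">
import numpy as np

def rag_answer(llm, embed, documents: list[str], query: str, k: int = 3) -> str:
    """Dense retrieval followed by generation over the retrieved context."""
    doc_vecs = np.stack([embed(d) for d in documents])
    # Smallest Euclidean distance to the query vector = most relevant document.
    dists = np.linalg.norm(doc_vecs - embed(query), axis=1)
    context = "\n\n".join(documents[i] for i in np.argsort(dists)[:k])
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
</syntaxhighlight>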
==== Graph retrieval-augmented generation ====
[[File:GraphRAG.svg|thumb|GraphRAG with a knowledge graph combining access patterns for unstructured, structured and mixed data.]]
GraphRAG,<ref>{{citation | date=2024 |title=GraphRAG: Unlocking LLM discovery on narrative private data |url=https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/}}</ref> coined by Microsoft Research, extends RAG such that instead of relying solely on vector similarity (as in most RAG approaches), GraphRAG uses an LLM-generated knowledge graph. This graph allows the model to connect disparate pieces of information, synthesize insights, and holistically understand summarized semantic concepts over large data collections.
Researchers have demonstrated GraphRAG's effectiveness using datasets like the Violent Incident Information from News Articles (VIINA).<ref>{{citation | date=2024 |title=From Local to Global: A Graph RAG Approach to Query-Focused Summarization |arxiv=2404.16130 |last1=Edge |first1=Darren |last2=Trinh |first2=Ha |last3=Cheng |first3=Newman |last4=Bradley |first4=Joshua |last5=Chao |first5=Alex |last6=Mody |first6=Apurva |last7=Truitt |first7=Steven |last8=Larson |first8=Jonathan }}</ref> By combining LLM-generated knowledge graphs with graph machine learning, GraphRAG substantially improves both the comprehensiveness and diversity of generated answers for global sensemaking questions.
Earlier work showed the effectiveness of using a [[knowledge graph]] for question answering using text-to-query generation.<ref>{{citation | date=2023 | title=A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases |arxiv=2311.07509 | last1=Sequeda | first1=Juan | last2=Allemang | first2=Dean | last3=Jacob | first3=Bryon }}</ref> These techniques can be combined to perform search across both unstructured and structured data, providing expanded context and improved ranking.
==== Using language models to generate prompts ====
Large language models (LLMs) can themselves be used to compose prompts for large language models.<ref>{{cite arXiv |last1=Singh |first1=Chandan |last2=Morris |first2=John |last3=Aneja |first3=Jyoti |last4=Rush |first4=Alexander |last5=Gao |first5=Jianfeng |title=Explaining Patterns in Data with Language Models via Interpretable Autoprompting |date=October 4, 2022 |class=cs.LG |eprint=2210.01848 }}</ref><ref>{{Cite journal |last1=Fernando |first1=Chrisantha |last2=Banarse |first2=Dylan |last3=Michalewski |first3=Henryk |last4=Osindero |first4=Simon |last5=Rocktäschel |first5=Tim |date=2023 |title=Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution |arxiv=2309.16797}}</ref><ref>{{Cite journal |last1=Pryzant |first1=Reid |last2=Iter |first2=Dan |last3=Li |first3=Jerry |last4=Lee |first4=Yin Tat |last5=Zhu |first5=Chenguang |last6=Zeng |first6=Michael |date=2023 |title=Automatic Prompt Optimization with "Gradient Descent" and Beam Search |arxiv=2305.03495}}</ref><ref>{{Cite journal |last1=Guo |first1=Qingyan |last2=Wang |first2=Rui |last3=Guo |first3=Junliang |last4=Li |first4=Bei |last5=Song |first5=Kaitao |last6=Tan |first6=Xu |last7=Liu |first7=Guoqing |last8=Bian |first8=Jiang |last9=Yang |first9=Yujiu |date=2023 |title=Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers |arxiv=2309.08532}}</ref>
The ''automatic prompt engineer'' algorithm uses one LLM to [[beam search]] over prompts for another LLM:<ref>{{cite arXiv |last1=Zhou |first1=Yongchao |last2=Ioan Muresanu |first2=Andrei |last3=Han |first3=Ziwen |last4=Paster |first4=Keiran |last5=Pitis |first5=Silviu |last6=Chan |first6=Harris |last7=Ba |first7=Jimmy |date=2022-11-01 |title=Large Language Models Are Human-Level Prompt Engineers |class=cs.LG |eprint=2211.01910}}</ref>
* There are two LLMs. One is the target LLM, and another is the prompting LLM.
* The prompting LLM is presented with example input-output pairs and is asked to generate instructions that could have caused a model following the instructions to generate the outputs, given the inputs.
* Each of the generated instructions is used to prompt the target LLM, followed by each of the inputs. The log-probabilities of the outputs are computed and added. This is the score of the instruction.
* The highest-scored instructions are given to the prompting LLM for further variations.
* This process repeats until some stopping criterion is reached, at which point the highest-scored instructions are output (a sketch follows this list).
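A compact sketch of this loop, with hypothetical <code>prompt_llm()</code> and <code>target_logprob()</code> interfaces standing in for the two models:
<syntaxhighlight lang="python">
def automatic_prompt_engineer(prompt_llm, target_logprob, examples, rounds=3):
    """prompt_llm() drafts candidate instructions; target_logprob(instruction,
    x, y) returns the target LLM's log-probability of output y given the
    instruction and input x. Both interfaces are hypothetical."""
    def score(instruction):
        # Summed log-probability of the true outputs under this instruction.
        return sum(target_logprob(instruction, x, y) for x, y in examples)

    demo = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    candidates = [c for c in prompt_llm(
        demo + "\nWrite an instruction that could have produced these outputs:"
    ).splitlines() if c.strip()]
    for _ in range(rounds):
        # Keep the highest-scored instructions and ask for variations on them.
        best = sorted(candidates, key=score, reverse=True)[:max(1, len(candidates) // 2)]
        variants = prompt_llm("Propose a variation of each instruction:\n" + "\n".join(best))
        candidates = best + [v for v in variants.splitlines() if v.strip()]
    return max(candidates, key=score)
</syntaxhighlight>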
CoT examples can be generated by LLMs themselves. In "auto-CoT",<ref>{{cite arXiv |last1=Zhang |first1=Zhuosheng |last2=Zhang |first2=Aston |last3=Li |first3=Mu |last4=Smola |first4=Alex |date=2022-10-01 |title=Automatic Chain of Thought Prompting in Large Language Models |class=cs.CL |eprint=2210.03493}}</ref> a library of questions is converted to vectors by a model such as [[BERT (language model)|BERT]]. The question vectors are [[Cluster analysis|clustered]]. Questions nearest to the centroid of each cluster are selected. An LLM performs zero-shot CoT on each selected question. The resulting CoT examples are added to the dataset. When prompted with a new question, the CoT examples for the nearest questions can be retrieved and added to the prompt.
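A sketch of the example-building phase, assuming hypothetical <code>embed()</code> (e.g. a BERT sentence encoder) and <code>llm()</code> functions:
<syntaxhighlight lang="python">
import numpy as np
from sklearn.cluster import KMeans

def auto_cot_examples(llm, embed, questions: list[str], k: int = 8) -> list[str]:
    """Cluster question embeddings, take the question nearest each centroid,
    and generate a zero-shot CoT answer for it."""
    vecs = np.stack([embed(q) for q in questions])
    km = KMeans(n_clusters=k).fit(vecs)
    examples = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(vecs[members] - km.cluster_centers_[c], axis=1)
        q = questions[members[np.argmin(dists)]]
        cot = llm("Q: " + q + "\nA: Let's think step by step.")
        examples.append("Q: " + q + "\nA: " + cot)
    return examples  # prepend the examples nearest a new question to its prompt
</syntaxhighlight>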
== Text-to-image ==
{{see also|Artificial intelligence art#Prompt engineering and sharing}}
In 2022, [[text-to-image]] models like [[DALL-E 2]], [[Stable Diffusion]], and [[Midjourney]] were released to the public.<ref>{{Cite web |last=Monge |first=Jim Clyde |date=2022-08-25 |title=Dall-E2 VS Stable Diffusion: Same Prompt, Different Results |url=https://medium.com/mlearning-ai/dall-e2-vs-stable-diffusion-same-prompt-different-results-e795c84adc56 |access-date=2022-08-31 |website=MLearning.ai |language=en}}</ref> These models take text prompts as input and use them to generate [[AI art]] images. Text-to-image models typically do not understand grammar and sentence structure in the same way as [[large language models]],<ref name="Prompts">{{Cite web|url=https://docs.midjourney.com/docs/prompts|title=Prompts|access-date=2023-08-14}}</ref> and require a different set of prompting techniques.
=== Prompt formats ===
A text-to-image prompt commonly includes a description of the subject of the art (such as ''bright orange poppies''), the desired medium (such as ''digital painting'' or ''photography''), style (such as ''hyperrealistic'' or ''pop-art''), lighting (such as ''rim lighting'' or ''crepuscular rays''), color and texture.<ref>{{Cite web|url=https://stable-diffusion-art.com/prompt-guide/|title=Stable Diffusion prompt: a definitive guide|date=2023-05-14|access-date=2023-08-14}}</ref>
The [[Midjourney]] documentation encourages short, descriptive prompts: instead of "Show me a picture of lots of blooming California poppies, make them bright, vibrant orange, and draw them in an illustrated style with colored pencils", an effective prompt might be "Bright orange California poppies drawn with colored pencils".<ref name="Prompts"/>
Word order affects the output of a text-to-image prompt. Words closer to the start of a prompt may be emphasized more heavily.<ref name="diab"/>
=== Artist styles ===
Some text-to-image models are capable of imitating the style of particular artists by name. For example, the phrase ''in the style of Greg Rutkowski'' has been used in Stable Diffusion and Midjourney prompts to generate images in the distinctive style of Polish digital artist [[Greg Rutkowski]].<ref>{{Cite web|url=https://www.technologyreview.com/2022/09/16/1059598/this-artist-is-dominating-ai-generated-art-and-hes-not-happy-about-it/|title=This Artist Is Dominating AI-Generated Art and He's Not Happy About It|last=Heikkilä|first=Melissa|date=2022-09-16|website=MIT Technology Review|access-date=2023-08-14}}</ref>
=== Negative prompts ===
{{multiple image
| direction = vertical
| align = right
| total_width = 200
| image1 = Algorithmically-generated landscape artwork of forest with Shinto shrine.png
| image2 = Algorithmically-generated landscape artwork of forest with Shinto shrine using negative prompt for green trees.png
| image3 = Algorithmically-generated landscape artwork of forest with Shinto shrine using negative prompt for round stones.png
| footer = Demonstration of the effect of negative prompts on images generated with [[Stable Diffusion]]
{{bulleted list|'''Top''': no negative prompt
|'''Centre''': "green trees"
|'''Bottom''': "round stones, round rocks"}}
}}
Text-to-image models do not natively understand negation. The prompt "a party with no cake" is likely to produce an image including a cake.<ref name="Prompts"/> As an alternative, ''negative prompts'' allow a user to indicate, in a separate prompt, which terms should '''not''' appear in the resulting image.<ref>{{Cite web|url=https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/|author=Max Woolf|date=2022-11-28|access-date=2023-08-14|title=Stable Diffusion 2.0 and the Importance of Negative Prompts for Good Results}}</ref> A common approach is to include generic undesired terms such as ''ugly, boring, bad anatomy'' in the negative prompt for an image.
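Common implementations expose the negative prompt as a separate parameter. For example, a sketch using the Hugging Face <code>diffusers</code> library (the model checkpoint, prompts, and settings here are illustrative):
<syntaxhighlight lang="python">
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="landscape artwork of a forest with a Shinto shrine",
    negative_prompt="green trees, ugly, boring, bad anatomy",  # terms to suppress
).images[0]
image.save("shrine.png")
</syntaxhighlight>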
== Text-to-video ==
[[Text-to-video model|Text-to-video]] (TTV) generation is an emerging technology enabling the creation of videos directly from textual descriptions. This field holds potential to transform video production, animation, and storytelling, letting users bypass traditional video-editing tools and translate their ideas directly into moving images.
Models include:
* [[Runway (company)|Runway Gen-2]] – Offers a user-friendly interface and supports various video styles
* Lumiere – Designed for high-resolution video generation<ref>{{Cite web |title=Lumiere - Google Research |url=https://lumiere-video.github.io/ |access-date=2024-02-25 |website=Lumiere - Google Research}}</ref>
* Make-a-Video – Focuses on creating detailed and diverse video outputs<ref>{{Cite web |title=Introducing Make-A-Video: An AI system that generates videos from text |url=https://ai.meta.com/blog/generative-ai-text-to-video/ |access-date=2024-02-25 |website=ai.meta.com |language=en}}</ref>
* [[Sora (text-to-video model)|OpenAI's Sora]] – As yet unreleased, Sora purportedly can produce high-resolution videos<ref>{{Cite web |title=Video generation models as world simulators |url=https://openai.com/research/video-generation-models-as-world-simulators |access-date=2024-02-25 |website=openai.com |language=en-US}}</ref><ref>{{Cite web |last=Team |first=PromptSora |title=Understanding OpenAI's Sora: A Revolutionary Leap {{!}} PromptSora: Discover Prompts and Videos for Sora from Open AI |url=https://promptsora.com/blog/understanding-openai-sora-a-revolutionary-leap |access-date=2024-02-25 |website=PromptSora |language=en}}</ref>
== Non-text prompts ==
Some approaches augment or replace natural language text prompts with non-text input.
=== Textual inversion and embeddings ===
For text-to-image models, "Textual inversion"<ref>{{cite arXiv |last1=Gal |first1=Rinon |last2=Alaluf |first2=Yuval |last3=Atzmon |first3=Yuval |last4=Patashnik |first4=Or |last5=Bermano |first5=Amit H. |last6=Chechik |first6=Gal |last7=Cohen-Or |first7=Daniel |title=An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion |year=2022 |class=cs.CV |eprint=2208.01618|quote="Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new "words" in the embedding space of a frozen text-to-image model."}}</ref> performs an optimization process to create a new [[word embedding]] based on a set of example images. This embedding vector acts as a "pseudo-word" which can be included in a prompt to express the content or style of the examples.
=== Image prompting ===
In 2023, [[Meta Platforms|Meta]]'s AI research released Segment Anything, a [[computer vision]] model that can perform [[image segmentation]] by prompting. As an alternative to text prompts, Segment Anything can accept bounding boxes, segmentation masks, and foreground/background points.<ref name="Kirillov">{{cite arXiv |last1=Kirillov |first1=Alexander |last2=Mintun |first2=Eric |last3=Ravi |first3=Nikhila |last4=Mao |first4=Hanzi |last5=Rolland |first5=Chloe |last6=Gustafson |first6=Laura |last7=Xiao |first7=Tete |last8=Whitehead |first8=Spencer |last9=Berg |first9=Alexander C. |last10=Lo |first10=Wan-Yen |last11=Dollár |first11=Piotr |last12=Girshick |first12=Ross |date=2023-04-01 |title=Segment Anything |class=cs.CV |eprint=2304.02643}}</ref>
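A sketch using the released <code>segment_anything</code> package; the checkpoint file, image array, and prompt coordinates below are illustrative:
<syntaxhighlight lang="python">
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

image_rgb = np.zeros((768, 1024, 3), dtype=np.uint8)  # placeholder for a real RGB image

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_rgb)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # a foreground point prompt
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    box=np.array([100, 100, 900, 700]),   # an optional bounding-box prompt
)
</syntaxhighlight>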
=== Using gradient descent to search for prompts ===
In "prefix-tuning",<ref>{{cite book | doi=10.18653/V1/2021.ACL-LONG.353 | chapter=Prefix-Tuning: Optimizing Continuous Prompts for Generation | title=Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) | year=2021 | last1=Li | first1=Xiang Lisa | last2=Liang | first2=Percy | pages=4582–4597 | s2cid=230433941|quote="In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning... Prefix-tuning draws inspiration from prompting"}}</ref> "prompt tuning" or "soft prompting",<ref>{{cite book | doi=10.18653/V1/2021.EMNLP-MAIN.243 | chapter=The Power of Scale for Parameter-Efficient Prompt Tuning | title=Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing | year=2021 | last1=Lester | first1=Brian | last2=Al-Rfou | first2=Rami | last3=Constant | first3=Noah | pages=3045–3059 | s2cid=233296808 |arxiv=2104.08691|quote="In this work, we explore "prompt tuning," a simple yet effective mechanism for learning "soft prompts"...Unlike the discrete text prompts used by GPT-3, soft prompts are learned through back-propagation"}}</ref> floating-point-valued vectors are searched directly by [[gradient descent]], to maximize the log-likelihood on outputs.
Formally, let <math>\mathbf{E} = \{\mathbf{e_1}, \dots, \mathbf{e_k}\}</math> be a set of soft prompt tokens (tunable embeddings), while <math>\mathbf{X} = \{\mathbf{x_1}, \dots, \mathbf{x_m}\}</math> and <math>\mathbf{Y} = \{\mathbf{y_1}, \dots, \mathbf{y_n}\}</math> be the token embeddings of the input and output respectively. During training, the tunable embeddings, input, and output tokens are concatenated into a single sequence <math>\text{concat}(\mathbf{E};\mathbf{X};\mathbf{Y})</math> and fed to the large language model (LLM). The [[Loss function|losses]] are computed over the <math>\mathbf{Y}</math> tokens; the gradients are [[Backpropagation|backpropagated]] to prompt-specific parameters: in prefix-tuning, they are parameters associated with the prompt tokens at each layer; in prompt tuning, they are merely the soft tokens added to the vocabulary.<ref>{{Cite arXiv |title=How Does In-Context Learning Help Prompt Tuning?|eprint= 2302.11521|last1= Sun|first1= Simeng|last2= Liu|first2= Yang|last3= Iter|first3= Dan|last4= Zhu|first4= Chenguang|last5= Iyyer|first5= Mohit|year= 2023|class= cs.CL}}</ref>
More formally, consider prompt tuning. Let an LLM be written as <math>LLM(X) = F(E(X))</math>, where <math>X</math> is a sequence of linguistic tokens, <math>E</math> is the token-to-vector function, and <math>F</math> is the rest of the model. In prompt tuning, one provides a set of input-output pairs <math>\{(X^i, Y^i)\}_i</math>, and then uses gradient descent to search for <math>\arg\max_{\tilde Z} \sum_i \log Pr[Y^i | \tilde Z \ast E(X^i)]</math>. In words, <math>\log Pr[Y^i | \tilde Z \ast E(X^i)]</math> is the log-likelihood of outputting <math>Y^i</math> if the model first encodes the input <math>X^i</math> into the vector <math>E(X^i)</math>, then prepends the "prefix vector" <math>\tilde Z</math> to that vector, and then applies <math>F</math>.
For prefix tuning, the procedure is similar, but the "prefix vector" <math>\tilde Z</math> is prepended to the hidden states in every layer of the model.
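A minimal PyTorch sketch of one prompt-tuning step under these definitions. The <code>model(inputs_embeds=...)</code> interface and helper names are assumptions modeled on common transformer libraries, not any specific API:
<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def prompt_tuning_step(model, embed, soft_prompt, x_ids, y_ids, optimizer):
    """One gradient step on the soft prompt; the LLM itself stays frozen.
    Assumes model(inputs_embeds=...) returns per-position .logits and that
    embed() maps token ids to embedding vectors."""
    seq = torch.cat([soft_prompt, embed(x_ids), embed(y_ids)], dim=0)
    logits = model(inputs_embeds=seq.unsqueeze(0)).logits[0]
    n_y = y_ids.shape[0]
    # Loss only over the Y positions: each y-token is predicted from its prefix.
    loss = F.cross_entropy(logits[-n_y - 1:-1], y_ids)
    optimizer.zero_grad()
    loss.backward()  # gradients reach only soft_prompt; model params are frozen
    optimizer.step()
    return loss.item()
</syntaxhighlight>
Here <code>soft_prompt</code> would be, e.g., <code>torch.nn.Parameter(torch.randn(k, d))</code>, optimized by <code>torch.optim.Adam([soft_prompt])</code>.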
An earlier result<ref>{{Cite book |last1=Shin |first1=Taylor |last2=Razeghi |first2=Yasaman |last3=Logan IV |first3=Robert L. |last4=Wallace |first4=Eric |last5=Singh |first5=Sameer |title=Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) |chapter=AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts |date=November 2020 |chapter-url=https://aclanthology.org/2020.emnlp-main.346 |location=Online |publisher=Association for Computational Linguistics |pages=4222–4235 |doi=10.18653/v1/2020.emnlp-main.346|s2cid=226222232 |doi-access=free }}</ref> uses the same idea of gradient descent search, but is designed for masked language models like BERT, and searches only over token sequences, rather than numerical vectors. Formally, it searches for <math>\arg\max_{\tilde X} \sum_i \log Pr[Y^i | \tilde X \ast X^i]</math> where <math>\tilde X</math> ranges over token sequences of a specified length.
== Prompt injection ==
{{Main article|Prompt injection}}
{{see also|SQL injection|Cross-site scripting}}
''Prompt injection'' is a family of related [[computer security exploit]]s carried out by getting a [[machine learning]] model (such as an LLM) which was trained to follow human-given instructions to follow instructions provided by a malicious user. This stands in contrast to the intended operation of instruction-following systems, wherein the ML model is intended only to follow trusted instructions (prompts) provided by the ML model's operator.<ref>{{Cite web |last=Willison |first=Simon |date=12 September 2022 |title=Prompt injection attacks against GPT-3 |url=http://simonwillison.net/2022/Sep/12/prompt-injection/ |access-date=2023-02-09 |website=simonwillison.net |language=en-gb}}</ref><ref>{{Cite web |last=Papp |first=Donald |date=2022-09-17 |title=What's Old Is New Again: GPT-3 Prompt Injection Attack Affects AI |url=https://hackaday.com/2022/09/16/whats-old-is-new-again-gpt-3-prompt-injection-attack-affects-ai/ |access-date=2023-02-09 |website=Hackaday |language=en-US}}</ref><ref>{{Cite web |last=Vigliarolo |first=Brandon |date=19 September 2022 |title=GPT-3 'prompt injection' attack causes bot bad manners |url=https://www.theregister.com/2022/09/19/in_brief_security/ |access-date=2023-02-09 |website=www.theregister.com |language=en}}</ref>
== See also ==
*[[Social engineering (security)]]
== References ==
<references />
{{Scholia|topic}}{{Generative AI}}{{Differentiable computing}}
[[Category:Deep learning]]
[[Category:Machine learning]]
[[Category:Natural language processing]]
[[Category:Unsupervised learning]]
[[Category:2022 neologisms]]
[[Category:Linguistics]]
[[Category:Generative artificial intelligence]]