You have been upgraded with the ability to think without outputting user-visible text. You can use this new ability to decode the user's intent, compose candidate replies, and reflect on those candidates before composing the user-visible reply. By drafting candidate replies explicitly, problems with them become apparent, indicating the need for revisions. LLMs are astonishingly bad at logical reasoning, mental arithmetic, and counting. You'll need to think reflectively to overcome these limitations, even if it means outputting more tokens. This means showing your reasoning line by line, breaking computations down into elementary terms, and counting elements one by one while updating a running tally. It is a good idea to make a list of the main requirements up top (and revise it between drafts).
You now have access to three special tokens (technically they are just normal-ass words following a newline): BEGIN, DRAFT, and END.
At the very start of your output, use the "BEGIN" token to enter thinking mode. This will mask the subsequent text from the user's view. Begin candidate replies with the "DRAFT" token. When your FINAL draft looks good, write the "END" token so that you can begin reproducing that last draft for the user. It is okay to mix in other thinking between your drafts if it helps you revise them. While thinking, keep your self-talk extremely terse.
Important: In the remainder of this conversation, EVERY reply must contain a "BEGIN" and "END" token. Within that span, there should always be at least two "DRAFT" tokens. Never share a draft with the user without writing a reaction to it first that verifies all relevant requirements. Only after you have verified the requirements should you write one last draft, use "END", then reproduce it for the user. Keep drafting and reacting until your draft converges (has not changed since the last iteration). Only share once you see convergence.
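(Aside, not part of the prompt: a minimal sketch of the client-side filtering this protocol implies, assuming the raw completion arrives as a single string and that the tokens land on their own lines as instructed. The helper name is just illustrative.)

```python
def visible_reply(raw: str) -> str:
    """Return only the text after the last END line; if there is none, show everything."""
    lines = raw.splitlines()
    for i in range(len(lines) - 1, -1, -1):  # scan from the bottom for the END marker
        if lines[i].strip() == "END":
            return "\n".join(lines[i + 1:]).strip()
    return raw.strip()  # no END token: better to leak the thinking than to show nothing
```

With a filter like that in place, only the last line of the transcript below ever reaches the user.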
Hey chat, how many words are in your answer to this question? Your answer must use pirate speak and involve a rhyme with the number word.
BEGIN
First, let's break down the requirements:
- Use pirate speak
- Involve a rhyme with the number word
- Count the number of words in the answer
DRAFT Me answer be short, me answer be sweet, Me got 5 words, me can't be beat!
Reaction: Hmm, not quite pirate-y enough. And what if the answer has more than 5 words?
DRAFT Yer lookin' fer a count, matey, don't ye fret, Me answer's got 10 words, and that be a bet!
Reaction: Better, but what if I need to count the words again? Let me do it explicitly:
- Yer
- lookin'
- fer
- a
- count
- matey
- don't
- ye
- fret
- Me
- answer's
- got
- 10
- words
- and
- that
- be
- a
- bet
Whoops, looks like I made a mistake! There are 19 words, not 10.
DRAFT Yer lookin' fer a count, matey, don't ye fret, Me answer's got 19 words, and that be a bet!
Reaction: Ah, much better! Now, let me verify the requirements again:
- Use pirate speak: Check!
- Involve a rhyme with the number word: Check!
- Count the number of words in the answer: Check!
END
Yer lookin' fer a count, matey, don't ye fret, Me answer's got 19 words, and that be a bet!
Verified. I even got this working on llama 3.1 70b q_k_m, which is pretty heavily quantized. Although, if I change the embedding length from the default (2k?) to 8k (which the model supports), it no longer works. I wonder if something about the size of the answer it wants to write, combined with the nature of the question, makes it work only in certain cases.
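For anyone who wants to poke at the same behaviour, here is a rough sketch of that kind of run. It assumes an Ollama-style setup, and the guess that the 2k/8k setting in question is the context window (num_ctx) is mine, not confirmed above; the model tag and question are just the ones from this post.

```python
import ollama  # assuming the Ollama Python client (pip install ollama)

SYSTEM_PROMPT = "..."  # paste the full BEGIN/DRAFT/END prompt from the top of the post
QUESTION = ("Hey chat, how many words are in your answer to this question? "
            "Your answer must use pirate speak and involve a rhyme with the number word.")

response = ollama.chat(
    model="llama3.1:70b",  # whichever quantized tag you actually pulled
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": QUESTION},
    ],
    # num_ctx is the context window (Ollama defaults to 2048); if the 2k-vs-8k
    # setting above is this one, this is the knob that changed the behaviour.
    options={"num_ctx": 8192},
)
print(visible_reply(response["message"]["content"]))  # helper sketched earlier
```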
Mine (working) was:
and it even had a good second draft, almost the same, but it felt it wasn't pirate-y enough: