- The paper presents persona-based models that improve the coherence of responses generated by sequence-to-sequence models such as the Neural Conversational Model.
- Link to the paper
- Defined as the character that an agent performs during conversational interactions.
- Combination of identity, language, behaviour and interaction style.
- May be adapted during the conversation itself.
- The Neural Conversational Model fails to maintain a consistent persona throughout a conversation, which results in incoherent responses.
- Models the personality of only the respondent.
- Represents each speaker as a vector (embedding) that encodes speaker-specific information (e.g., age, gender).
- The model learns to cluster users along these traits using the data alone.
- The vector v for the given speaker is used, along with the context vector and the target representation generated at the previous time step, to generate the current output representation.
- v is learnt along with other model parameters.
- Since the model learns the representation for each speaker, it can infer answers to certain questions about a given speaker even if the question has not been explicitly answered in the context of the given user (using the answers for similar users).
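
The speaker model's decoding step can be sketched as follows. This is a minimal, illustrative sketch and not the paper's implementation: it assumes an LSTM decoder whose gates read the concatenation of the previous hidden state, the previous target word embedding, and the speaker vector v. The names `lstm_step_with_speaker` and `speaker_emb`, the dimensions, and the random weights are all hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def lstm_step_with_speaker(W, h_prev, c_prev, e_prev, v):
    """One LSTM decoder step conditioned on a speaker vector v.

    W has 4*K rows (input, forget, output gates and candidate, stacked)
    and len(h_prev) + len(e_prev) + len(v) columns.
    """
    K = len(h_prev)
    x = h_prev + e_prev + v  # concatenate the three inputs
    z = matvec(W, x)
    i = [sigmoid(a) for a in z[0:K]]
    f = [sigmoid(a) for a in z[K:2 * K]]
    o = [sigmoid(a) for a in z[2 * K:3 * K]]
    g = [math.tanh(a) for a in z[3 * K:4 * K]]
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

# Hypothetical toy setup: hidden size 4, word embedding size 3, speaker size 2.
random.seed(0)
K, E, S = 4, 3, 2
W = [[random.uniform(-0.5, 0.5) for _ in range(K + E + S)] for _ in range(4 * K)]
speaker_emb = {"alice": [0.9, -0.2], "bob": [-0.7, 0.4]}  # learnt jointly in practice

h0, c0 = [0.0] * K, [0.0] * K
e_prev = [0.1, 0.2, 0.3]  # embedding of the previously generated word
h_a, c_a = lstm_step_with_speaker(W, h0, c0, e_prev, speaker_emb["alice"])
h_b, c_b = lstm_step_with_speaker(W, h0, c0, e_prev, speaker_emb["bob"])
print(h_a)
print(h_b)
```

With identical context and previous word, the two speakers produce different hidden states, which is how the speaker vector can steer the generated response.
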
- Models the personality of both the speaker and addressee.
- Associates a representation $V_{i,j}$ with each pair of users to capture the style of user i towards user j.
- $V_{i,j} = \tanh(W_1 \cdot v_i + W_2 \cdot v_j)$
- $V_{i,j}$ is used in the same way as v in the speaker model.
- The Speaker-Addressee model derives its generalization capabilities from the speaker embeddings.
- For example, even if two speakers have never engaged in a conversation, the conversations between speakers similar to the two given speakers can be used to capture the associated representation.
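
The pairwise representation can be computed as in the sketch below, which combines the speaker and addressee embeddings through two weight matrices and a tanh. The function name `interaction_vector` and the toy matrices and vectors are assumptions chosen only for illustration; in the model, the weights and embeddings would be learnt.

```python
import math

def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def interaction_vector(W1, W2, v_i, v_j):
    """Combine speaker embedding v_i and addressee embedding v_j with a tanh layer."""
    a = matvec(W1, v_i)
    b = matvec(W2, v_j)
    return [math.tanh(x + y) for x, y in zip(a, b)]

# Hypothetical toy parameters.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[0.5, 0.0], [0.0, 0.5]]
v_i = [1.0, 0.0]
v_j = [0.0, 2.0]

V_ij = interaction_vector(W1, W2, v_i, v_j)  # style of user i towards user j
V_ji = interaction_vector(W1, W2, v_j, v_i)  # style of user j towards user i
print(V_ij, V_ji)
```

Because the two embeddings pass through different matrices, the representation is directional: the vector for i addressing j generally differs from the vector for j addressing i.
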
- Generate N-best lists with beam size B = 200 and a maximum length of 20 for generated candidates.
- At each time step, examine all B × B possible next-word candidates, and add all hypotheses ending with an EOS token to the N-best list.
- Rerank the generated N-best list using the scoring function from Li et al. to avoid generic and commonplace responses.
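
The decoding steps above can be sketched on a toy next-word model. Everything here is an assumption made for illustration: `toy_next_log_probs` stands in for the trained decoder, and the reranking score (forward log-probability, plus a weighted reverse-model score, plus a length bonus) is only one plausible form of such a scoring function, with `reverse_log_prob`, `lam`, and `gamma` as hypothetical names and values.

```python
import math

EOS = "<eos>"

def toy_next_log_probs(prefix):
    # Hypothetical stand-in for the decoder: log-probabilities of next tokens.
    if len(prefix) == 0:
        probs = {"i": 0.6, "ok": 0.4}
    elif prefix[-1] == "i":
        probs = {"dunno": 0.7, "live": 0.3}
    elif prefix[-1] == "live":
        probs = {"in": 1.0}
    elif prefix[-1] == "in":
        probs = {"london": 1.0}
    else:
        probs = {EOS: 1.0}
    return {tok: math.log(p) for tok, p in probs.items()}

def beam_search_nbest(next_log_probs, beam_size, max_len):
    """Collect every hypothesis that ends in EOS while running beam search."""
    beams = [((), 0.0)]
    nbest = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            # Top-B next words per hypothesis gives at most B x B candidates.
            options = sorted(next_log_probs(prefix).items(),
                             key=lambda kv: kv[1], reverse=True)[:beam_size]
            for tok, lp in options:
                candidates.append((prefix + (tok,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == EOS:
                nbest.append((seq, score))   # finished hypothesis
            elif len(beams) < beam_size:
                beams.append((seq, score))   # keep the best unfinished ones
        if not beams:
            break
    return nbest

def rerank(nbest, reverse_log_prob=lambda seq: 0.0, lam=0.0, gamma=0.5):
    """Rescore with an assumed reverse-model term and a length bonus."""
    def score(item):
        seq, log_p = item
        length = len(seq) - 1  # do not count EOS
        return log_p + lam * reverse_log_prob(seq) + gamma * length
    return sorted(nbest, key=score, reverse=True)

nbest = beam_search_nbest(toy_next_log_probs, beam_size=2, max_len=6)
best_by_likelihood = max(nbest, key=lambda item: item[1])[0]
best_after_rerank = rerank(nbest)[0][0]
print(best_by_likelihood)
print(best_after_rerank)
```

In this toy setup the most likely hypothesis is a short, generic reply, while the length bonus promotes the longer, more specific one, which is the kind of effect reranking is meant to achieve.
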
- Twitter Persona Dataset: tweet sequences from the Twitter FireHose with frequent (at least 60) engagements.
- Twitter Sordoni Dataset: similar to the Twitter Persona Dataset but with more references per message (up to 10).
- Television Transcripts: since this dataset alone is too small to train an open-domain dialogue model, a standard SEQ2SEQ model is first trained on the OpenSubtitles dataset and then fine-tuned on the transcripts dataset.
- The proposed models yield improvements in both perplexity and BLEU scores over baseline SEQ2SEQ models.
- Similar gains are observed in speaker consistency as measured by human judges.
- The paper does not evaluate what the speaker embeddings actually capture. It suggests that the embeddings should encode attributes such as age and gender, but the embeddings themselves are not explored.