Diffusion Models for Text Classification: A Comprehensive Survey (2021-2025)

Research on diffusion-based text classification is surprisingly nascent—only 5 core papers directly apply diffusion models to text classification tasks, despite the explosion of diffusion work in NLP for generation. The field emerged in 2022-2024, with ROIC-DM (2024) being the first to use diffusion directly as a text classifier. However, foundational work on diffusion classifiers in computer vision (2021-2023) established the theoretical framework, and discrete diffusion models like D3PM provide the technical foundations for text applications. Most papers focus on adversarial robustness and uncertainty quantification rather than pure accuracy gains, suggesting diffusion's strength lies in providing more reliable, robust classification.

Direct applications of diffusion to text classification

The most significant finding is how limited this research area remains. Between 2021-2025, only a handful of papers explicitly use diffusion models for text classification tasks, making this a genuine research frontier.

ROIC-DM: Robust Text Inference and Classification via Diffusion Model

Authors: Shilong Yuan, Wei Yuan, Hongzhi Yin, Tieke He
Venue: arXiv preprint
Date: January 2024
arXiv: 2401.03514

This groundbreaking paper is the first to introduce diffusion models specifically for text classification and inference tasks. ROIC-DM adapts the diffusion framework for classification rather than generation: the forward process gradually adds noise to label vectors, and a denoising network conditioned on the input text runs the reverse process to recover the label. A key innovation is the use of pre-trained language models (such as BERT) as "advisors" during the denoising process. The model demonstrates superior robustness against adversarial attacks (TextFooler, BERT-Attack) compared to traditional language models, even when those are equipped with defense mechanisms. Evaluated on AG NEWS, SST-2, and MRPC datasets.
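
To make the mechanism concrete, the sketch below illustrates label-space diffusion conditioned on a text embedding. It is a minimal illustration rather than the authors' implementation: the denoiser architecture, noise schedule, step count, and training loss are all placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 100                                       # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)         # linear noise schedule (assumed)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

class LabelDenoiser(nn.Module):
    """Toy denoiser: predicts the clean label from (noisy label, text embedding, timestep)."""
    def __init__(self, num_classes, text_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes + text_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )
    def forward(self, y_noisy, text_emb, t):
        t_feat = t.float().unsqueeze(-1) / T   # crude timestep feature
        return self.net(torch.cat([y_noisy, text_emb, t_feat], dim=-1))

def train_step(model, text_emb, y_onehot, optimizer):
    """One ROIC-DM-style step: forward-diffuse the label vector, then train the
    denoiser to recover the label conditioned on the text."""
    b = y_onehot.size(0)
    t = torch.randint(0, T, (b,))
    a_bar = alpha_bars[t].unsqueeze(-1)
    noise = torch.randn_like(y_onehot)
    y_noisy = a_bar.sqrt() * y_onehot + (1 - a_bar).sqrt() * noise   # forward process on labels
    logits = model(y_noisy, text_emb, t)                             # reverse step conditioned on text
    loss = F.cross_entropy(logits, y_onehot.argmax(dim=-1))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```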

DiffusionABSA: Improving Aspect-Based Sentiment Analysis with Diffusion Models

Authors: Shunyu Liu, Jie Zhou, Qunxi Zhu, Qin Chen, Qingchun Bai, Jun Xiao, Liang He
Venue: LREC-COLING 2024
Date: February 2024
arXiv: 2402.15289
Pages: 10324–10335

DiffusionABSA applies diffusion models to aspect-based sentiment analysis (ABSA), treating aspect extraction and sentiment classification as a progressive denoising task. The model gradually adds noise to aspect terms during training and learns a denoising process that progressively restores these terms. Uses a syntax-aware temporal attention mechanism to capture relationships between aspects and surrounding text. Particularly effective at determining aspect boundaries (start and end indices) for long aspect terms. Evaluated on 8 benchmark ABSA datasets. Code available at github.com/Qlb6x/DiffusionABSA.

DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising

Authors: Zhenhao Li, Huichi Zhou, Marek Rei, Lucia Specia
Venue: ACL 2025 (Volume 1: Long Papers)
Date: June 2024
arXiv: 2407.00248
Pages: 9259-9274

Unlike ROIC-DM which uses diffusion as the classifier itself, DiffuseDef incorporates a diffusion layer as a plug-and-play denoiser between the encoder and classifier. The diffusion layer is trained to predict and remove noise from hidden representations (not from text tokens or labels directly). During inference, adversarial hidden states are noised and then iteratively denoised, with multiple denoised states ensembled for robust predictions. Works as a plug-and-play defense layer for any existing classifier. Evaluated on AG News, IMDB, and QNLI datasets. Code available at github.com/Nickeilf/DiffuseDef.
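
The inference-time idea can be sketched as follows; the encoder, denoiser, and classifier interfaces, the step counts, and the noise scale are assumptions standing in for the paper's trained components.

```python
import torch

@torch.no_grad()
def diffusedef_predict(encoder, denoiser, classifier, input_ids,
                       n_ensemble=5, n_steps=5, noise_scale=0.1):
    """Noise the encoder's hidden state, iteratively denoise it, and ensemble
    several denoised copies before classifying (DiffuseDef-style sketch)."""
    h = encoder(input_ids)                              # clean (possibly adversarial) hidden state
    logits_sum = 0.0
    for _ in range(n_ensemble):
        h_t = h + noise_scale * torch.randn_like(h)     # add Gaussian noise
        for step in reversed(range(n_steps)):           # iterative denoising
            t = torch.full(h.shape[:1], step, device=h.device)
            h_t = h_t - denoiser(h_t, t)                # denoiser predicts residual noise (assumed interface)
        logits_sum = logits_sum + classifier(h_t)
    return (logits_sum / n_ensemble).argmax(dim=-1)     # ensembled prediction
```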

An Effective Deployment of Diffusion LM for Data Augmentation in Low-Resource Sentiment Classification

Authors: Zhuowei Chen, Lianxi Wang, Yuben Wu, Xinfeng Liao, Yujia Tian, Junyang Zhong
Venue: EMNLP 2024
Date: November 2024
ACL Anthology: 2024.emnlp-main.109

DiffusionCLS leverages a diffusion language model to capture in-domain knowledge and generate pseudo samples for sentiment classification by reconstructing strong label-related tokens. Uses a Noise-Resistant Training objective to help the model generalize in low-resource scenarios. This paper demonstrates using diffusion for data augmentation rather than as the classifier itself, showing an alternative approach to improving text classification.

Reconstructing representations using diffusion models for multimodal sentiment analysis

Authors: Various (MRC-D3AE model)
Venue: Knowledge-Based Systems (Elsevier)
Date: 2024
DOI: 10.1016/j.knosys.2024.112070

Uses "Diverse diffusion denoising autoencoders (D3AE)" to enhance multimodal sentiment analysis by reconstructing multimodal representations. Incorporates machine reading comprehension (MRC) framework to enhance text modality, then uses diffusion models across multiple time intervals for efficient reconstruction. Achieves state-of-the-art performance on CMU-MOSI and CMU-MOSEI datasets. Single-step denoising strategy across multiple time intervals distinguishes this approach.

Discrete diffusion and D3PM for text classification

Discrete diffusion models represent the natural framework for text data, operating directly on discrete tokens rather than requiring continuous embeddings. However, most D3PM work has focused on generation rather than classification.

D3PM: Structured Denoising Diffusion Models in Discrete State-Spaces

Authors: Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, Rianne van den Berg
Venue: NeurIPS 2021
Date: July 2021
arXiv: 2107.03006

The foundational D3PM paper that introduced Discrete Denoising Diffusion Probabilistic Models, extending diffusion models to discrete data by generalizing the multinomial diffusion model beyond uniform transition probabilities. Introduces corruption processes with transition matrices that mimic Gaussian kernels, nearest neighbors in embedding space, and absorbing states. While primarily focused on generation tasks (character-level text and image generation), it laid the groundwork for all subsequent discrete diffusion work. Achieved strong results on LM1B and introduced an auxiliary cross-entropy loss combined with the variational lower bound.
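
For reference, the uniform-transition special case of the D3PM forward process can be written in a few lines; the vocabulary size and corruption rate below are arbitrary.

```python
import torch

def uniform_transition_matrix(vocab_size, beta_t):
    """D3PM-style uniform transition matrix: keep the current token with
    probability (1 - beta_t), otherwise jump to a uniformly random token."""
    Q = torch.full((vocab_size, vocab_size), beta_t / vocab_size)
    Q += torch.eye(vocab_size) * (1.0 - beta_t)
    return Q                                   # each row is a valid categorical distribution

def q_sample(x_prev, Q_t):
    """Sample x_t ~ Cat(x_t; p = onehot(x_{t-1}) @ Q_t) for a batch of token ids."""
    probs = Q_t[x_prev]                        # row of Q_t for each current token id
    return torch.multinomial(probs, num_samples=1).squeeze(-1)

# toy usage: corrupt a sequence of 10 token ids from a 50-token vocabulary
x0 = torch.randint(0, 50, (10,))
Q = uniform_transition_matrix(50, beta_t=0.1)
x1 = q_sample(x0, Q)
```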

Generative or Discriminative? Revisiting Text Classification in the Era of Transformers

Authors: Siva Rajesh Kasa, Karan Gupta, Sumegh Roychowdhury, Ashutosh Kumar, Yaswanth Biruduraju, Santhosh Kumar Kasa, Nikhil Priyatam Pattisapu, Arindam Bhattacharya, Shailendra Agarwal, Vijay Huddar
Venue: ICML 2025
Date: June 2025
arXiv: 2506.12181

This paper presents the first comprehensive evaluation of modern generative and discriminative architectures for text classification, explicitly including Discrete Diffusion models alongside Auto-regressive modeling, Masked Language Modeling, and Encoders. Examines sample efficiency, calibration, noise robustness, and ordinality across diverse text classification scenarios. The discrete diffusion approach is based on Score Entropy Discrete Diffusion (SEDD). Provides practical guidance for selecting appropriate modeling approaches based on real-world constraints like latency and data availability.

Authentic Discrete Diffusion Model (ADD)

Authors: Xiao Li, Jiaqi Zhang, Shuxiang Zhang, Tianshui Chen, Liang Lin, Guangrun Wang
Venue: arXiv preprint
Date: October 2025
arXiv: 2510.01047

ADD fundamentally redefines discrete diffusion by preserving core diffusion characteristics directly in one-hot space, rather than relying on continuous latent spaces or masking policies. Introduces a timestep-conditioned cross-entropy loss between the model's outputs and original one-hot labels, establishing a bridge between discriminative and generative learning. Demonstrates ADD's application to both classification tasks (single-token generation) and text generation tasks (multi-token generation, image captioning), achieving superior performance compared to baselines. Novel framework that unifies discriminative and generative learning in authentic discrete diffusion space.
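
A rough sketch of the timestep-conditioned cross-entropy idea follows, with a simple interpolation toward the uniform distribution standing in for whatever corruption ADD actually applies in one-hot space; the model interface is likewise assumed.

```python
import torch
import torch.nn.functional as F

def add_style_loss(model, x_onehot, cond, T=100):
    """Timestep-conditioned cross-entropy in one-hot space (ADD-style sketch):
    corrupt the one-hot target, ask the model to recover the original class,
    and score it with cross-entropy against the clean one-hot label."""
    b, k = x_onehot.shape
    t = torch.randint(0, T, (b,))
    w = (t.float() / T).unsqueeze(-1)
    # placeholder corruption: interpolate toward the uniform distribution over classes
    x_t = (1 - w) * x_onehot + w * torch.full_like(x_onehot, 1.0 / k)
    logits = model(x_t, cond, t)               # assumed interface: (corrupted target, condition, timestep)
    return F.cross_entropy(logits, x_onehot.argmax(dim=-1))
```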

Score Entropy Discrete Diffusion (SEDD)

Authors: Aaron Lou, Chenlin Meng, Stefano Ermon
Venue: ICML 2024
Date: October 2023
arXiv: 2310.16834

SEDD proposes "score entropy," a novel loss function that extends score matching to discrete spaces, significantly boosting performance for discrete diffusion models. For comparable model sizes, SEDD reduces perplexity by 25-75% compared to existing language diffusion paradigms and is competitive with autoregressive models, outperforming GPT-2. Generates faithful text without requiring distribution annealing. Referenced as the discrete diffusion baseline in multiple classification papers.

A Reparameterized Discrete Diffusion Model for Text Generation

Authors: Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong
Venue: ICLR 2024
Date: February 2023
arXiv: 2302.05737

Develops a reparameterized discrete diffusion model providing an alternative formulation of sampling from discrete diffusion processes. Features more effective training and decoding techniques that could benefit classification applications. While focused on text generation, the improved training and sampling methods establish technical foundations applicable to discriminative tasks.

Generative classifiers using diffusion with Bayes rule

The most theoretically principled approach to diffusion-based classification trains class-conditional models P(text|class) and applies Bayes' theorem to compute P(class|text) for classification.
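
Whatever the underlying likelihood estimator, the final classification step is the same. A minimal sketch, assuming per-class conditional log-likelihoods (for diffusion models, typically ELBO estimates) are already available:

```python
import torch

def bayes_classify(log_px_given_y, log_prior):
    """p(y|x) is proportional to p(x|y) p(y): combine per-class conditional
    log-likelihoods with the class prior and normalize."""
    log_joint = log_px_given_y + log_prior           # shape: (num_classes,)
    return torch.softmax(log_joint, dim=-1)

# toy usage with 3 classes; the log-likelihoods would come from a class-conditional diffusion model
log_px_given_y = torch.tensor([-10.2, -9.7, -11.5])
log_prior = torch.log(torch.tensor([1/3, 1/3, 1/3]))
print(bayes_classify(log_px_given_y, log_prior))     # posterior over classes
```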

CARD: Classification and Regression Diffusion Models

Authors: Xizewen Han, Huangjie Zheng, Mingyuan Zhou
Venue: NeurIPS 2022
Date: June 2022
arXiv: 2206.07275

CARD combines a denoising diffusion-based conditional generative model with a pre-trained conditional mean estimator to predict the full distribution of y given x rather than point estimates. Provides instance-level confidence assessment for classification and outstanding conditional distribution prediction, especially when the conditional distribution is multi-modal. Uses continuous diffusion in label space. Outperforms Bayesian neural networks for uncertainty estimation. While designed for general classification and regression, directly applicable to text classification by treating categorical labels as response variables. Code available at github.com/XzwHan/CARD.
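
A sketch of how instance-level confidence can be read off such a model: draw repeated reverse-diffusion samples in label space and treat the empirical class frequencies as a confidence estimate. The sampler and mean-estimator interfaces are assumptions, not CARD's actual API.

```python
import torch

@torch.no_grad()
def card_style_confidence(reverse_sampler, x, f_mean, n_samples=50):
    """CARD-style confidence (sketch): sample many label vectors from the reverse
    diffusion conditioned on the input and a pre-trained mean estimator f_mean(x),
    then use the empirical class frequencies as per-instance confidence."""
    votes = []
    for _ in range(n_samples):
        y_hat = reverse_sampler(x, f_mean(x))        # one reverse-diffusion sample in label space (assumed interface)
        votes.append(y_hat.argmax(dim=-1))
    votes = torch.stack(votes)                       # (n_samples, batch)
    num_classes = y_hat.shape[-1]
    probs = torch.stack([(votes == c).float().mean(dim=0) for c in range(num_classes)], dim=-1)
    return probs.argmax(dim=-1), probs               # prediction and per-instance confidence
```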

Diffusion Guided Language Modeling

Authors: Justin Lovelace, Varsha Kishore, Yiwei Chen, Kilian Weinberger
Venue: ACL 2024 (Findings)
Date: August 2024
ACL Anthology: 2024.findings-acl.887

Uses a guided diffusion model to produce a latent proposal that steers an auto-regressive language model to generate text with desired properties. Inherits the unmatched fluency of the auto-regressive approach and the plug-and-play flexibility of diffusion. Controlling a new attribute requires only training a single logistic regression classifier, which can be used for text classification via the diffusion guidance mechanism. Demonstrates flexible text control while maintaining generation quality.

Diffusion-LM Improves Controllable Text Generation

Authors: Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori Hashimoto
Venue: NeurIPS 2022
Date: May 2022
arXiv: 2205.14217

A non-autoregressive language model based on continuous diffusions that iteratively denoises sequences of Gaussian vectors into word vectors. Can condition on arbitrary classifiers that look at complex, global properties of sentences. Enables simple gradient-based algorithms to perform complex, controllable generation tasks by using classifiers for guidance. While focused on generation, establishes the framework for using external classifiers to guide diffusion processes in text.
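
The guidance mechanism itself is a small modification of the reverse step: nudge the denoised latent along the gradient of an external classifier's log-probability for the target class. The sketch below assumes generic denoiser and classifier callables rather than Diffusion-LM's actual components.

```python
import torch

def guided_denoise_step(denoise_step, classifier_logp, x_t, t, target_class, guidance_scale=1.0):
    """One classifier-guided reverse step (Diffusion-LM-style sketch): take the model's
    proposed denoised latent, then move it along the gradient of log p(target_class | x)
    from an external classifier operating on the latent sequence."""
    x_prev = denoise_step(x_t, t)                        # unguided reverse step (assumed interface)
    x_prev = x_prev.detach().requires_grad_(True)
    log_p = classifier_logp(x_prev)[:, target_class].sum()
    grad = torch.autograd.grad(log_p, x_prev)[0]
    return (x_prev + guidance_scale * grad).detach()     # gradient ascent on the class log-probability
```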

Likelihood-Based Diffusion Language Models (Plaid)

Authors: Ishaan Gulrajani, Tatsunori Hashimoto
Venue: arXiv preprint
Date: May 2023
arXiv: 2305.18619

Introduces several methodological improvements for maximum-likelihood training of diffusion language models and studies scaling laws for diffusion models to find compute-optimal training regimes. The models can be used for likelihood-based classification tasks by computing log-likelihoods for different classes, providing a foundation for Bayesian classification approaches with diffusion models.

Robust Classification via a Single Diffusion Model (RDC)

Authors: Huanran Chen, Yinpeng Dong, Zhengyi Wang, Xiao Yang, Chengqi Duan, Hang Su, Jun Zhu
Venue: ICML 2024
Date: May 2023
arXiv: 2305.15241

Proposes the Robust Diffusion Classifier (RDC), a generative classifier built from a single pre-trained diffusion model that is designed to be adversarially robust. RDC calculates the class probability p(y|x) via Bayes' theorem by estimating p(x|y) with the diffusion model's conditional likelihood. The conditional likelihood is approximated by the variational lower bound (ELBO), which involves computing the noise-prediction loss for every class under different noise levels. Introduces multi-head diffusion as a new backbone and develops efficient sampling strategies to reduce computational cost.
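
The per-class likelihood estimation at the heart of RDC (and of the zero-shot diffusion classifiers discussed below) can be sketched as a Monte Carlo average of class-conditional noise-prediction errors; the model interface, timestep weighting, and sample counts here are simplifications.

```python
import torch

@torch.no_grad()
def diffusion_classifier_logits(eps_model, x, num_classes, alpha_bars, n_timesteps=25):
    """Approximate log p(x|y) for each class by the negative average noise-prediction
    error of a class-conditional diffusion model. Lower denoising error for a class
    corresponds to a higher conditional likelihood (ELBO surrogate, up to constants)."""
    T = alpha_bars.shape[0]
    logits = torch.zeros(num_classes)
    for y in range(num_classes):
        errs = []
        for _ in range(n_timesteps):
            t = torch.randint(0, T, (1,))
            a_bar = alpha_bars[t]
            noise = torch.randn_like(x)
            x_t = a_bar.sqrt() * x + (1 - a_bar).sqrt() * noise     # forward-diffuse the input
            eps_hat = eps_model(x_t, t, torch.tensor([y]))          # class-conditional noise prediction (assumed interface)
            errs.append(((eps_hat - noise) ** 2).mean())
        logits[y] = -torch.stack(errs).mean()
    return logits    # argmax gives the predicted class; add log p(y) for a non-uniform prior
```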

Your Diffusion Model is Secretly a Certifiably Robust Classifier

Authors: Huanran Chen, Yinpeng Dong, Shitong Shao, Zhongkai Hao, Xiao Yang, Hang Su, Jun Zhu
Venue: NeurIPS 2024
Date: February 2024
arXiv: 2402.02316

Provides theoretical guarantees for diffusion classifiers by proving they possess O(1) Lipschitzness and establishing their certified robustness. Generalizes diffusion classifiers to classify Gaussian-corrupted data by introducing "Noised Diffusion Classifiers" (NDCs), deriving evidence lower bounds (ELBOs) for these distributions, and calculating classification probabilities via Bayes' theorem. Achieves over 80% certified robustness on CIFAR-10 under adversarial perturbations with ℓ2 norms less than 0.25, using a single off-the-shelf diffusion model without additional data.

Score-Based Generative Classifiers

Authors: Roland S. Zimmermann, Lukas Schott, Yang Song, Benjamin A. Dunn, David A. Klindt
Venue: NeurIPS 2021 Workshop (Deep Generative Models and Downstream Applications)
Date: October 2021
arXiv: 2110.00473

The foundational work on using score-based generative models (closely related to diffusion models) as classifiers. Proposes score-based generative classifiers (SBGC) that model the conditional likelihood p(x|y) of an image for each label and predict the label that maximizes this likelihood via Bayes' rule: argmax_y p(y|x) = argmax_y p(x|y)p(y). Achieves state-of-the-art classification accuracy for generative classifiers on CIFAR-10. While applied to computer vision, establishes the theoretical foundation for all subsequent diffusion-based classification work.

Zero-shot classification with diffusion models

Most zero-shot diffusion classification work focuses on image classification using text-to-image models, but the theoretical framework applies to text classification.

Your Diffusion Model is Secretly a Zero-Shot Classifier

Authors: Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak
Venue: ICCV 2023
Date: March 2023
arXiv: 2303.16203

Shows that density estimates from large-scale text-to-image diffusion models (Stable Diffusion, Imagen) can be leveraged for zero-shot classification without additional training. The "Diffusion Classifier" approach uses the ELBO to approximate log p(x|c) by measuring how well different text prompts predict the noise added to input images. Achieves 79.1% top-1 accuracy extracting classifiers from class-conditional diffusion models (DiT) trained on ImageNet. Demonstrates strong multimodal compositional reasoning abilities. While focused on image classification, establishes the theoretical framework for using diffusion models as zero-shot classifiers. Project page: diffusion-classifier.github.io.

Text-to-Image Diffusion Models are Zero-Shot Classifiers

Authors: Kevin Clark, Priyank Jaini
Venue: NeurIPS 2023
Date: March 2023
arXiv: 2303.15233

Evaluates text-to-image diffusion models (Stable Diffusion and Imagen) as zero-shot classifiers by using their denoising ability given text descriptions as a proxy for label likelihood. Uses ELBO approximation with diffusion models to classify images based on text descriptions. Achieves state-of-the-art results on shape/texture bias tests and attribute binding tasks. Compares with CLIP on image classification datasets, showing diffusion models can perform zero-shot classification through likelihood estimation.

A Simple and Efficient Baseline for Zero-Shot Generative Classification (Gaussian Diffusion Classifiers)

Authors: Various
Venue: arXiv preprint
Date: December 2024
arXiv: 2412.12594

Proposes Gaussian Diffusion Classifiers (GDC) that dramatically improve efficiency over previous diffusion-based classifiers, reducing classification time from ~1000 seconds to 0.03 seconds per ImageNet image. Combines text-to-image diffusion models with DINOv2 for zero-shot classification. Addresses the computational bottleneck that previously made diffusion classifiers impractical, representing a major step toward practical deployment.

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Authors: Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Reddy Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang
Venue: TMLR (Transactions on Machine Learning Research)
Date: May 2023
arXiv: 2305.10722

Turns pre-trained text-to-image diffusion models into few-shot discriminative learners by using cross-attention scores from Stable Diffusion to capture mutual influence between visual and textual information. Fine-tunes via attention-based prompt learning for image-text matching. Outperforms CLIP on compositional tasks. Code available at github.com/eric-ai-lab/Discffusion. Project page: sites.google.com/view/discffusion.

Additional relevant work on diffusion for NLP

Several papers explore diffusion mechanisms in text classification through alternative approaches, including graph neural networks and classifier guidance.

Deep Attention Diffusion Graph Neural Networks for Text Classification

Authors: Yonghao Liu, Renchu Guan, Fausto Giunchiglia, Yanchun Liang, Xiaoyue Feng
Venue: EMNLP 2021
Date: November 2021
ACL Anthology: 2021.emnlp-main.642

Proposes DADGNN model for text classification using graph neural networks with diffusion mechanisms. Addresses limitations of GNN-based text classification that only consider one-hop neighborhoods by using attention diffusion to bridge interaction between words and distant neighbors. While "diffusion" here refers to information propagation on graphs rather than denoising diffusion probabilistic models, represents early work connecting diffusion concepts with text classification.
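
The graph-diffusion flavor of "diffusion" here amounts to mixing multi-hop powers of an attention matrix so that distant words can influence each other. A minimal sketch with geometrically decaying hop weights (the paper's exact weighting scheme may differ):

```python
import torch

def attention_diffusion(attn, num_hops=4, decay=0.5):
    """Multi-hop attention diffusion (sketch): mix powers of a one-hop attention
    matrix so each word can attend to distant neighbors on the word graph."""
    n = attn.size(0)
    diffused = torch.zeros_like(attn)
    power = torch.eye(n)
    weight, norm = 1.0, 0.0
    for _ in range(num_hops):
        power = power @ attn                    # k-hop attention
        diffused = diffused + weight * power
        norm += weight
        weight *= decay
    return diffused / norm

# toy usage: a random row-stochastic one-hop attention matrix over 6 word nodes
a = torch.softmax(torch.randn(6, 6), dim=-1)
print(attention_diffusion(a).sum(dim=-1))       # rows remain (approximately) normalized
```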

Classifiers Guided Controllable Text Generation for Discrete Diffusion Language Models

Authors: Hang Jiang, Guoyong Cai, Sihui Li
Venue: NLPCC 2024 (Part III)
Date: November 2024

Uses classifiers to guide controllable text generation in discrete diffusion language models, combining diffusion models with classification guidance for improved text generation control. Demonstrates the interplay between classification and generation in discrete diffusion frameworks.

DiffuDetox: A Mixed Diffusion Model for Text Detoxification

Authors: Griffin Floto and others
Venue: ACL 2023
Date: June 2023
arXiv: 2306.08505

Combines conditional and unconditional diffusion models for text detoxification, a task closely tied to binary toxicity classification. The two models are combined using a method inspired by the gradient of an implicit classifier p(c|x), derived via Bayes' rule. The conditional model detoxifies the text while the unconditional model guides the sampling process.
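
The mixing step is essentially classifier-free guidance: the gap between the conditional and unconditional noise predictions plays the role of the implicit classifier's gradient. A one-line sketch (the guidance weight is illustrative):

```python
def mixed_guidance(eps_cond, eps_uncond, w=2.0):
    """Classifier-free-guidance-style mixing (sketch): the difference between the
    conditional and unconditional predictions steers sampling toward the condition,
    acting like the gradient of an implicit classifier log p(c|x)."""
    return eps_uncond + w * (eps_cond - eps_uncond)   # works elementwise on tensors or floats
```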

Conclusion: An emerging frontier with untapped potential

Diffusion-based text classification remains remarkably underdeveloped compared to the explosion of diffusion work in text generation and image classification. Only 5 papers directly apply diffusion to text classification tasks (2021-2025), with most appearing in 2024. This scarcity reveals a significant research opportunity: the theoretical frameworks exist, the foundational models have been developed, but systematic application to discriminative NLP tasks remains nascent.

The existing work clusters around three core strengths of diffusion classifiers: adversarial robustness (ROIC-DM, DiffuseDef), uncertainty quantification (CARD), and generative flexibility (ADD, generative-or-discriminative comparison). Papers consistently show diffusion models excel not at raw accuracy, but at providing more reliable, certifiable, and interpretable classifications. The certified robustness results (80%+ on CIFAR-10 under adversarial perturbations) suggest diffusion's true value lies in safety-critical applications.

Discrete diffusion models like D3PM, SEDD, and ADD provide the natural technical foundation for text, yet classification applications remain sparse. The recent ICML 2025 paper comparing generative versus discriminative approaches signals growing interest, while dramatic efficiency improvements (1000x speedup in GDC) address previous computational barriers. As diffusion models mature and computational costs decrease, we expect rapid expansion of this research frontier in 2025-2026, particularly for applications requiring robust, calibrated, and explainable text classification.

Comment from @bigsnarfdude (author):

LLM abuse can be detected through a combination of automated technical methods and human monitoring, focusing on identifying unusual usage patterns, specific linguistic characteristics, and deviations from intended behavior.
Technical Detection Methods
Prompt Analysis and Validation:
Input Sanitization: Implementing robust mechanisms to filter or sanitize user inputs before they reach the LLM to neutralize hidden instructions or malformed data.
Anomaly Detection: Leveraging machine learning algorithms to identify unusual or potentially malicious prompts, such as those containing suspicious keywords, unusual character sequences, or attempts to bypass safety rules (jailbreaking).
LLM-as-a-Judge: Using a separate, potentially fine-tuned, LLM to classify prompts as clean or malicious. This LLM can check for compliance with safety guidelines or semantic similarity to known attacks.
Output Analysis:
Content Moderation: Automatically flagging or filtering outputs that contain hate speech, explicit material, misinformation, or other harmful content using pre-trained classifiers or specialized LLMs.
Fact Verification: Cross-checking claims made by the LLM against reliable, external knowledge bases (e.g., government websites, academic databases) to detect hallucinations or the spread of misinformation.
Linguistic/Statistical Pattern Analysis: Analyzing the LLM's output for specific linguistic patterns (e.g., lower perplexity, less diverse vocabulary, overly formulaic structure) that are characteristic of machine-generated text compared to human writing; a minimal sketch of this idea appears after this list.
Watermarking: Embedding invisible signals or patterns into the LLM's output during the generation process, which can later be detected to verify if the text originated from a specific model.
System-Level Monitoring:
API Monitoring: Tracking API calls for unusual access patterns, such as a sudden spike in requests or calls with unusual parameters, which could indicate an attacker attempting to abuse the AI infrastructure.
Tracing and Logging: Tracing the flow of prompts and responses through the entire application chain (especially in RAG systems) to identify how an initial innocuous prompt might have mutated into a malicious instruction.
Human-in-the-Loop and Best Practices
Human Review: Implementing a human-in-the-loop system where human experts review flagged cases that require nuanced contextual judgment, especially in critical applications like healthcare or finance.
Adversarial Testing: Continuously testing the LLM with carefully crafted adversarial prompts (e.g., prompt injection, jailbreaks) to discover vulnerabilities and improve detection systems.
Developer Education: Training developers on secure coding practices for AI systems and raising awareness about potential LLM security threats.
Role-Based Access Control (RBAC): Restricting access to certain sensitive AI functionalities based on user roles and trust scores to minimize the attack surface.
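
As a concrete illustration of the linguistic/statistical-pattern item above, the sketch below scores a text's perplexity under GPT-2 together with its type-token ratio and flags text that is both unusually predictable and lexically repetitive. The thresholds and the choice of scoring model are assumptions; production detectors are substantially more sophisticated.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def suspicion_score(text, ppl_threshold=25.0, ttr_threshold=0.45):
    """Flag text as possibly machine-generated when it is both unusually predictable
    (low perplexity under a small LM) and lexically repetitive (low type-token ratio).
    Thresholds are illustrative only."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss                  # average next-token negative log-likelihood
    perplexity = torch.exp(loss).item()
    tokens = text.lower().split()
    ttr = len(set(tokens)) / max(len(tokens), 1)        # vocabulary diversity
    flagged = perplexity < ppl_threshold and ttr < ttr_threshold
    return {"perplexity": perplexity, "type_token_ratio": ttr, "flagged": flagged}

print(suspicion_score("The quick brown fox jumps over the lazy dog."))
```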
