@masta-g3
Last active November 15, 2024 18:51
Updated 2024-11-15
Cedille: A large autoregressive French language model
The Wisdom of Hindsight Makes Language Models Better Instruction Followers
ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks
Query2doc: Query Expansion with Large Language Models
The Internal State of an LLM Knows When It's Lying
Structured information extraction from complex scientific text with fine-tuned large language models
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
Large Language Models Encode Clinical Knowledge
PoET: A generative model of protein families as sequences-of-sequences
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Modeling Protein Using Large-scale Pretrain Language Model
A Watermark for Large Language Models
GPT is becoming a Turing machine: Here are some ways to program it
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Large Language Models are Zero-Shot Reasoners
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
How is ChatGPT's behavior changing over time?
Meta-Transformer: A Unified Framework for Multimodal Learning
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
Getting More out of Large Language Models for Proofs
Teaching Small Language Models to Reason
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Learning to Retrieve In-Context Examples for Large Language Models
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Context-Aware Abbreviation Expansion Using Large Language Models
Focused Transformer: Contrastive Training for Context Scaling
Flash normalization: fast RMSNorm for LLMs
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Long-range Language Modeling with Self-retrieval
Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
Towards Generalist Biomedical AI
Shortcut Learning of Large Language Models in Natural Language Understanding
Quantifying Memorization Across Neural Language Models
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
Copy Is All You Need
Automatic Chain of Thought Prompting in Large Language Models
Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models
Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Evaluating the Text-to-SQL Capabilities of Large Language Models
On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Are Emergent Abilities of Large Language Models a Mirage?
Enhancing Network Management Using Code Generated by Large Language Models
Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks
ThinkSum: Probabilistic reasoning over sets using large language models
On the Tool Manipulation Capability of Open-source Large Language Models
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
WavJourney: Compositional Audio Creation with Large Language Models
ChatGPT, Can You Generate Solutions for my Coding Exercises? An Evaluation on its Effectiveness in an undergraduate Java Programming Course
Secrets of RLHF in Large Language Models Part I: PPO
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
Challenges and Applications of Large Language Models
SPOT: Knowledge-Enhanced Language Representations for Information Extraction
Kosmos-2: Grounding Multimodal Large Language Models to the World
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
SKILL: Structured Knowledge Infusion for Large Language Models
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Understanding Social Reasoning in Language Models with Language Models
The Science of Detecting LLM-Generated Texts
CausalLM is not optimal for in-context learning
Questioning the Survey Responses of Large Language Models
Extending Context Window of Large Language Models via Positional Interpolation
ChatGPT and a New Academic Reality: Artificial Intelligence-Written Research Papers and the Ethics of the Large Language Models in Scholarly Publishing
Probing Factually Grounded Content Transfer with Factual Ablation
Teach LLMs to Personalize -- An Approach inspired by Writing Education
Pre-Trained Large Language Models for Industrial Control
WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Self-Alignment with Instruction Backtranslation
Guiding Pretraining in Reinforcement Learning with Large Language Models
Large Language Models are Zero-Shot Rankers for Recommender Systems
Model evaluation for extreme risks
Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL
A Simple and Effective Pruning Approach for Large Language Models
Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
PromptChainer: Chaining Large Language Model Prompts through Visual Programming
PIPPA: A Partially Synthetic Conversational Dataset
Let's Verify Step by Step
Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
Large Language Models Are Reasoning Teachers
GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
Connecting Neural Response measurements & Computational Models of language: a non-comprehensive guide
Accelerating LLM Inference with Staged Speculative Decoding
Large Language Models for Supply Chain Optimization
Do Large Language Models know what humans know?
Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction
Faithful Chain-of-Thought Reasoning
AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
Superposition of many models into one
Learning to Model the World with Language
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Unifying Large Language Models and Knowledge Graphs: A Roadmap
RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
QLoRA: Efficient Finetuning of Quantized LLMs
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Co-Writing with Opinionated Language Models Affects Users' Views
Language models show human-like content effects on reasoning
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code
OpenAGI: When LLM Meets Domain Experts
Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
Beyond Generating Code: Evaluating GPT on a Data Visualization Course
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
LLM-Rec: Personalized Recommendation via Prompting Large Language Models
Studying Large Language Model Generalization with Influence Functions
Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change)
From Sparse to Soft Mixtures of Experts
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models
Large Language Model Guided Tree-of-Thought
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
When Geometric Deep Learning Meets Pretrained Protein Language Models
Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level
Language models are weak learners
How Many Demonstrations Do You Need for In-context Learning?
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Gorilla: Large Language Model Connected with Massive APIs
Automatic Generation of Programming Exercises and Code Explanations using Large Language Models
Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models
Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models
WebArena: A Realistic Web Environment for Building Autonomous Agents
Language Models can Solve Computer Tasks
ChatGPT Is on the Horizon: Could a Large Language Model Be All We Need for Intelligent Transportation?
Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling
Invariant Language Modeling
Solving Quantitative Reasoning Problems with Language Models
Personality Traits in Large Language Models
Prompting Large Language Models with Speech Recognition Abilities
Selective Annotation Makes Language Models Better Few-Shot Learners
Using Captum to Explain Generative Language Models
Fine-Tuning Language Models with Just Forward Passes
In-context Autoencoder for Context Compression in a Large Language Model
Entity Projection via Machine Translation for Cross-Lingual NER
OctoPack: Instruction Tuning Code Large Language Models
AlpaGasus: Training A Better Alpaca with Fewer Data
Large Language Models Are Human-Level Prompt Engineers
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach
Large Language Models Can Self-Improve
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
More Agents Is All You Need
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Teaching Algorithmic Reasoning via In-context Learning
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python
KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Automatic Evaluation of Attribution by Large Language Models
Generative Agents: Interactive Simulacra of Human Behavior
ALERT: Adapting Language Models to Reasoning Tasks
How does the pre-training objective affect what large language models learn about linguistic properties?
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
FLIRT: Feedback Loop In-context Red Teaming
News Summarization and Evaluation in the Era of GPT-3
Galactica: A Large Language Model for Science
Towards Reasoning in Large Language Models: A Survey
Chain-Of-Thought Prompting Under Streaming Batch: A Case Study
Shepherd: A Critic for Language Model Generation
Emergent autonomous scientific research capabilities of large language models
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
Social Simulacra: Creating Populated Prototypes for Social Computing Systems
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Universal and Transferable Adversarial Attacks on Aligned Language Models
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
Complexity-Based Prompting for Multi-Step Reasoning
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Scaling TransNormer to 175 Billion Parameters
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Learning ASR pathways: A sparse multilingual ASR model
Stay on topic with Classifier-Free Guidance
Constitutional AI: Harmlessness from AI Feedback
Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis
Teaching Arithmetic to Small Transformers
Demystifying GPT Self-Repair for Code Generation
Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education
Link-Context Learning for Multimodal LLMs
Large Language Models Perform Diagnostic Reasoning
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
AgentBench: Evaluating LLMs as Agents
Xmodel-LM Technical Report
Simple synthetic data reduces sycophancy in large language models
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
Re-visiting Automated Topic Model Evaluation with Large Language Models
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
Adaptive Test Generation Using a Large Language Model
Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
PaLM: Scaling Language Modeling with Pathways
Teaching Large Language Models to Self-Debug
Building Cooperative Embodied Agents Modularly with Large Language Models
Urdu text in natural scene images: a new dataset and preliminary text detection
LIMA: Less Is More for Alignment
Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs
GPT-NER: Named Entity Recognition via Large Language Models
Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge
Code as Policies: Language Model Programs for Embodied Control
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models
Inspecting and Editing Knowledge Representations in Language Models
TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Med-Flamingo: a Multimodal Medical Few-shot Learner
Jigsaw: Large Language Models meet Program Synthesis
Large Language Models Struggle to Learn Long-Tail Knowledge
Llama 2: Open Foundation and Fine-Tuned Chat Models
Textbooks Are All You Need
Crowd Score: A Method for the Evaluation of Jokes using Large Language Model AI Voters as Judges
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
Three Bricks to Consolidate Watermarks for Large Language Models
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
One-shot Machine Teaching: Cost Very Few Examples to Converge Faster
Theory of Mind May Have Spontaneously Emerged in Large Language Models
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Language Is Not All You Need: Aligning Perception with Language Models
Mind's Eye: Grounded Language Model Reasoning through Simulation
StarCoder: may the source be with you!
Self-Critique Prompting with Large Language Models for Inductive Instructions
PaLM 2 Technical Report
Repository-Level Prompt Generation for Large Language Models of Code
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Measuring and Narrowing the Compositionality Gap in Language Models
Differentially Private Fine-tuning of Language Models
A Latent Space Theory for Emergent Abilities in Large Language Models
Reflexion: Language Agents with Verbal Reinforcement Learning
Ambient Adventures: Teaching ChatGPT on Developing Complex Stories
LEACE: Perfect linear concept erasure in closed form
Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods
A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models
Voyager: An Open-Ended Embodied Agent with Large Language Models
FinGPT: Open-Source Financial Large Language Models
Block Belief Propagation for Parameter Learning in Markov Random Fields
Lost in the Middle: How Language Models Use Long Contexts
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
Ada-Ranker: A Data Distribution Adaptive Ranking Paradigm for Sequential Recommendation
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
The Hydra Effect: Emergent Self-repair in Language Model Computations
Educational data augmentation in physics education research using ChatGPT
PolyLM: An Open Source Polyglot Large Language Model
Towards Expert-Level Medical Question Answering with Large Language Models
Is GPT-4 a Good Data Analyst?
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions
ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models
Seeing ChatGPT Through Students' Eyes: An Analysis of TikTok Data
LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
ReAct: Synergizing Reasoning and Acting in Language Models
Augmenting Language Models with Long-Term Memory
BloombergGPT: A Large Language Model for Finance
A Systematic Evaluation of Large Language Models of Code
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Robot Task Planning and Situation Handling in Open Worlds
Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences
Emergent Abilities of Large Language Models
Can Large Language Models design a Robot?
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Clinical Camel: An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding
DarkBERT: A Language Model for the Dark Side of the Internet
Measuring Faithfulness in Chain-of-Thought Reasoning
Retentive Network: A Successor to Transformer for Large Language Models
Dissociating language and thought in large language models: a cognitive perspective
Large Language Models are Better Reasoners with Self-Verification
Can large language models reason about medical questions?
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
ARB: Advanced Reasoning Benchmark for Large Language Models
Rethinking with Retrieval: Faithful Large Language Model Inference
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
Explainable Verbal Reasoner Plus (EVR+): A Natural Language Reasoning Framework that Supports Diverse Compositional Reasoning
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Large Language Models as Corporate Lobbyists
MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Talking About Large Language Models
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Large Language Models Can Be Easily Distracted by Irrelevant Context
Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
OpenICL: An Open-Source Framework for In-context Learning
Emergence of Maps in the Memories of Blind Navigation Agents
PMC-LLaMA: Further Finetuning LLaMA on Medical Papers
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
Learning to Reason and Memorize with Self-Notes
ChemCrow: Augmenting large-language models with chemistry tools
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Learning to Compress Prompts with Gist Tokens
Unlimiformer: Long-Range Transformers with Unlimited Length Input
StructGPT: A General Framework for Large Language Model to Reason over Structured Data
ChatGPT: Applications, Opportunities, and Threats
Memory Augmented Large Language Models are Computationally Universal
PaLM-E: An Embodied Multimodal Language Model
M2T: Masking Transformers Twice for Faster Decoding
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Auditing large language models: a three-layered approach
Language models in molecular discovery
Offsite-Tuning: Transfer Learning without Full Model
MusicLM: Generating Music From Text
Context-faithful Prompting for Large Language Models
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models
The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models
GPTutor: a ChatGPT-powered programming tool for code explanation
Larger language models do in-context learning differently
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker
ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
Multimodal Chain-of-Thought Reasoning in Language Models
Recitation-Augmented Language Models
Hyena Hierarchy: Towards Larger Convolutional Language Models
Eight Things to Know about Large Language Models
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
A Survey on Model Compression for Large Language Models
Active Retrieval Augmented Generation
Toolformer: Language Models Can Teach Themselves to Use Tools
Evaluating Verifiability in Generative Search Engines
Augmented Language Models: a Survey
Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness
Giraffe: Adventures in Expanding Context Lengths in LLMs
LLM As DBA
Scaling Transformer to 1M tokens and beyond with RMT
TidyBot: Personalized Robot Assistance with Large Language Models
Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering
Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
Active Prompting with Chain-of-Thought for Large Language Models
A Categorical Archive of ChatGPT Failures
Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity
Better Language Models of Code through Self-Improvement
DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
The Capacity for Moral Self-Correction in Large Language Models
Poisoning Language Models During Instruction Tuning
Prompt2Model: Generating Deployable Models from Natural Language Instructions
Data Selection for Language Models via Importance Resampling
Enabling Conversational Interaction with Mobile UI using Large Language Models
Evidence of Meaning in Language Models Trained on Programs
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
Symbol tuning improves in-context learning in language models
REPLUG: Retrieval-Augmented Black-Box Language Models
Why do Nearest Neighbor Language Models Work?
Prismer: A Vision-Language Model with An Ensemble of Experts
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Self-evolving Agents with reflective and memory-augmented abilities
CALYPSO: LLMs as Dungeon Masters' Assistants
Mind your Language (Model): Fact-Checking LLMs and their Role in NLP Research and Practice
Code Llama: Open Foundation Models for Code
Ground Manipulator Primitive Tasks to Executable Actions using Large Language Models
Faithful to Whom? Questioning Interpretability Measures in NLP
Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis
Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts
How Good Are Large Language Models at Out-of-Distribution Detection?
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
Can Large Language Models Find And Fix Vulnerable Software?
Large Language Models for Software Engineering: A Systematic Literature Review
Informed Named Entity Recognition Decoding for Generative Language Models
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models
Better Zero-Shot Reasoning with Role-Play Prompting
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning
Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis
A Survey on Large Language Model based Autonomous Agents
Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions
Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Evaluating ChatGPT and GPT-4 for Visual Programming
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
D4: Improving LLM Pretraining via Document De-Duplication and Diversification
Cabrita: closing the gap for foreign languages
GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems
ProAgent: Building Proactive Cooperative AI with Large Language Models
Instruction Position Matters in Sequence Generation with Large Language Models
Knowledge-Enhanced Multi-Label Few-Shot Product Attribute-Value Extraction
SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
Large Language Model as Autonomous Decision Maker
Large Language Models as Superpositions of Cultural Perspectives
Activation Addition: Steering Language Models Without Optimization
Enhancing Recommender Systems with Large Language Model Reasoning Graphs
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
An Empirical Study on Challenging Math Problem Solving with GPT-4
Forward-Backward Reasoning in Large Language Models for Verification
Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights using Generative AI
Dynamic Planning with a LLM
"Guinea Pig Trials" Utilizing GPT: A Novel Smart Agent-Based Modeling Approach for Studying Firm Competition and Collusion
Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models
Bridging the Gap: Deciphering Tabular Data Using Large Language Model
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Prompting Is Programming: A Query Language for Large Language Models
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
Knowledge Graph Prompting for Multi-Document Question Answering
GPT detectors are biased against non-native English writers
GradientCoin: A Peer-to-Peer Decentralized Large Language Models
RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models
IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Can Language Models Learn to Listen?
Detecting The Corruption Of Online Questionnaires By Artificial Intelligence
Towards an Understanding of Large Language Models in Software Engineering Tasks
YaRN: Efficient Context Window Extension of Large Language Models
An Examination of the Compositionality of Large Generative Vision-Language Models
Company Similarity using Large Language Models
LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs
Instruction Tuning for Large Language Models: A Survey
Language to Rewards for Robotic Skill Synthesis
Is There Any Social Principle for LLM-Based Agents?
A Study on Robustness and Reliability of Large Language Model Code Generation
Leveraging Large Language Models for Pre-trained Recommender Systems
Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models
LLaSM: Large Language and Speech Model
SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue
FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt
ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Pretraining on the Test Set Is All You Need
The AI Revolution in Education: Will AI Replace or Assist Teachers in Higher Education?
Reinforced Self-Training (ReST) for Language Modeling
Fast Inference from Transformers via Speculative Decoding
LoRA: Low-Rank Adaptation of Large Language Models
Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models
AI Deception: A Survey of Examples, Risks, and Potential Solutions
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Towards Applying Powerful Large AI Models in Classroom Teaching: Opportunities, Challenges and Prospects
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Blockwise Parallel Decoding for Deep Autoregressive Models
Assigning AI: Seven Approaches for Students, with Prompts
Conformal Prediction with Large Language Models for Multi-Choice Question Answering
Attention: Marginal Probability is All You Need?
Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
XGen-7B Technical Report
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
Can Programming Languages Boost Each Other via Instruction Tuning?
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Efficient RLHF: Reducing the Memory Usage of PPO
Universal Self-adaptive Prompting
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior
One Wide Feedforward is All You Need
Better Zero-Shot Reasoning with Self-Adaptive Prompting
BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
SoTaNa: The Open-Source Software Development Assistant
GPT Can Solve Mathematical Problems Without a Calculator
Physically Grounded Vision-Language Models for Robotic Manipulation
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
FLM-101B: An Open LLM and How to Train It with $100K Budget
LaMDA: Language Models for Dialog Applications
LMDX: Language Model-based Document Information Extraction and Localization
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Do Multilingual Language Models Think Better in English?
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Textbooks Are All You Need II: phi-1.5 technical report
Replacing softmax with ReLU in Vision Transformers
Investigating Answerability of LLMs for Long-Form Question Answering
Vector Search with OpenAI Embeddings: Lucene Is All You Need
The Rise and Potential of Large Language Model Based Agents: A Survey
Cure the headache of Transformers via Collinear Constrained Attention
Uncovering mesa-optimization algorithms in Transformers
Large Language Models for Compiler Optimization
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Chain-of-Verification Reduces Hallucination in Large Language Models
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Compositional Foundation Models for Hierarchical Planning
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
Sparse Autoencoders Find Highly Interpretable Features in Language Models
DreamLLM: Synergistic Multimodal Comprehension and Creation
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
Improving Language Models with Advantage-based Offline Policy Gradients
Improving Factuality and Reasoning in Language Models through Multiagent Debate
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Boolformer: Symbolic Regression of Logic Functions with Transformers
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
TP-Aware Dequantization
LASER: LLM Agent with State-Space Exploration for Web Navigation
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Baichuan 2: Open Large-scale Language Models
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Efficient Benchmarking (of Language Models)
Context is Environment
Analyzing Transformer Dynamics as Movement through Embedding Space
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
RMT: Retentive Networks Meet Vision Transformers
Stack-and-Delay: a new codebook pattern for music generation
Neurons in Large Language Models: Dead, N-gram, Positional
Large Language Model for Science: A Study on P vs. NP
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Data Augmentation for Spoken Language Understanding via Pretrained Language Models
Petals: Collaborative Inference and Fine-tuning of Large Models
Scaling Laws for Sparsely-Connected Foundation Models
Kosmos-2.5: A Multimodal Literate Model
PDFTriage: Question Answering over Long, Structured Documents
Statistical Rejection Sampling Improves Preference Optimization
Stabilizing RLHF through Advantage Model and Selective Rehearsal
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Leveraging Contextual Information for Effective Entity Salience Detection
NExT-GPT: Any-to-Any Multimodal LLM
Are Emergent Abilities in Large Language Models just In-Context Learning?
RACE: Large-scale ReAding Comprehension Dataset From Examinations
Large-Scale Automatic Audiobook Creation
Recovering from Privacy-Preserving Masking with Large Language Models
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations
Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology
What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
RAIN: Your Language Models Can Align Themselves without Finetuning
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
Hypothesis Search: Inductive Reasoning with Language Models
Agents: An Open-source Framework for Autonomous Language Agents
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Gated recurrent neural networks discover attention
Contrastive Decoding Improves Reasoning in Large Language Models
Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts
FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Adapting Large Language Models via Reading Comprehension
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
MindAgent: Emergent Gaming Interaction
Graph Neural Prompting with Large Language Models
Sparks of Artificial General Intelligence: Early experiments with GPT-4
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Efficient Post-training Quantization with FP8 Formats
Taken out of context: On measuring situational awareness in LLMs
Jointly Training Large Autoregressive Multimodal Models
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Curriculum Learning with Adam: The Devil Is in the Wrong Details
OWL: A Large Language Model for IT Operations
Faith and Fate: Limits of Transformers on Compositionality
CodePlan: Repository-level Coding using LLMs and Planning
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Efficient Memory Management for Large Language Model Serving with PagedAttention
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
SCREWS: A Modular Framework for Reasoning with Revisions
Transformer models: an introduction and catalog
Small-scale proxies for large-scale Transformer training instabilities
Effective Long-Context Scaling of Foundation Models
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
Qwen Technical Report
Attention Approximates Sparse Distributed Memory
Calibrating LLM-Based Evaluator
Ambiguity-Aware In-Context Learning with Large Language Models
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Vision Transformers Need Registers
Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Language Modeling Is Compression
MentalLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
Aligning Large Multimodal Models with Factually Augmented RLHF
Large Language Models as Optimizers
SlimPajama-DC: Understanding Data Combinations for LLM Training
Finite Scalar Quantization: VQ-VAE Made Simple
Physics of Language Models: Part 3.2, Knowledge Manipulation
Efficient Streaming Language Models with Attention Sinks
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
LLM-grounded Video Diffusion Models
Enable Language Models to Implicitly Learn Self-Improvement From Data
Emergent Analogical Reasoning in Large Language Models
RA-DIT: Retrieval-Augmented Dual Instruction Tuning
Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
Large Language Models Cannot Self-Correct Reasoning Yet
SmartPlay: A Benchmark for LLMs as Intelligent Agents
Language Models Represent Space and Time
Retrieval meets Long Context Large Language Models
Borges and AI
Can large language models provide useful feedback on research papers? A large-scale empirical analysis
Ring Attention with Blockwise Transformers for Near-Infinite Context
Can Language Models be Instructed to Protect Personal Information?
QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Who's Harry Potter? Approximate Unlearning in LLMs
Low-Resource Languages Jailbreak GPT-4
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
EcoAssistant: Using LLM Assistant More Affordably and Accurately
How FaR Are Large Language Models From Agents with Theory-of-Mind?
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
HeaP: Hierarchical Policies for Web Actions using LLMs
A Long Way to Go: Investigating Length Correlations in RLHF
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
Think before you speak: Training Language Models With Pause Tokens
Mistral 7B
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation
Large Language Models can Learn Rules
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
Large Language Models Are Zero-Shot Time Series Forecasters
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Learning Interactive Real-World Simulators
FireAct: Toward Language Agent Fine-tuning
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Text Embeddings Reveal (Almost) As Much As Text
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
Lemur: Harmonizing Natural Language and Code for Language Agents
LangNav: Language as a Perceptual Representation for Navigation
The LAMBADA dataset: Word prediction requiring a broad discourse context
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Toward Joint Language Modeling for Speech Units and Text
MemGPT: Towards LLMs as Operating Systems
A Zero-Shot Language Agent for Computer Control with Structured Reflection
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
The Consensus Game: Language Model Generation via Equilibrium Search
Table-GPT: Table-tuned GPT for Diverse Table Tasks
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens
Arbitrary Length Generalization for Addition
"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation
Deep Learning Scaling is Predictable, Empirically
MLQA: Evaluating Cross-lingual Extractive Question Answering
OpenAssistant Conversations -- Democratizing Large Language Model Alignment
Intersectional Bias in Hate Speech and Abusive Language Datasets
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Reducing malicious use of synthetic media research: Considerations and potential release practices for machine learning
AI Ethics Issues in Real World: Evidence from AI Incident Database
Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models
BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT
Measuring Mathematical Problem Solving With the MATH Dataset
Can Machines Learn Morality? The Delphi Experiment
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
UNKs Everywhere: Adapting Multilingual Language Models to New Scripts
AndroidEnv: A Reinforcement Learning Platform for Android
Demoting Racial Bias in Hate Speech Detection
Social Bias Frames: Reasoning about Social and Power Implications of Language
Characterising Bias in Compressed Models
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Towards Robust Toxic Content Classification
The Challenge of Value Alignment: from Fairer Algorithms to AI Safety
Towards Continual Knowledge Learning of Language Models
The Pushshift Reddit Dataset
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
What's in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus
One Epoch Is All You Need
Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
NewsQA: A Machine Comprehension Dataset
AmbiPun: Generating Humorous Puns with Ambiguous Context
Deal or No Deal? End-to-End Learning for Negotiation Dialogues
Competition-Level Code Generation with AlphaCode
STaR: Bootstrapping Reasoning With Reasoning
Efficient Neural Architecture Search via Parameter Sharing
Recursively Summarizing Books with Human Feedback
Habitat: A Platform for Embodied AI Research
Generate & Rank: A Multi-task Framework for Math Word Problems
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Mitigating Statistical Bias within Differentially Private Synthetic Data
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
RecGPT: Generative Pre-training for Text-based Recommendation
TruthfulQA: Measuring How Models Mimic Human Falsehoods
An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Controlling Style in Generated Dialogue
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search
Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Societal Biases in Language Generation: Progress and Challenges
Counterfactual Fairness in Text Classification through Robustness
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions
Deep Double Descent: Where Bigger Models and More Data Hurt
Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations
InCoder: A Generative Model for Code Infilling and Synthesis
Back to the Future: On Potential Histories in NLP
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Sharp Minima Can Generalize For Deep Nets
Self-attention Does Not Need $O(n^2)$ Memory
Measuring the Carbon Intensity of AI in Cloud Instances
SocialIQA: Commonsense Reasoning about Social Interactions
Generating Long Sequences with Sparse Transformers
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
QAmeleon: Multilingual QA with Only 5 Examples
CTRL: A Conditional Transformer Language Model for Controllable Generation
Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models
Generating Fake Cyber Threat Intelligence Using Transformer-Based Models
Impact of Pretraining Term Frequencies on Few-Shot Reasoning
Is neural language acquisition similar to natural? A chronological probing study
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent
Buffer Overflow in Mixture of Experts
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Bag of Tricks for Efficient Text Classification
Automatic Detection of Machine Generated Text: A Critical Survey
Adversarial Training for Large Neural Language Models
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
TALM: Tool Augmented Language Models
Training Language Models with Language Feedback
Toxicity in Multilingual Machine Translation at Scale
PEER: A Collaborative Language Model
On the Multilingual Capabilities of Very Large-Scale English Language Models
LLaMA: Open and Efficient Foundation Language Models
SECure: A Social and Environmental Certificate for AI Systems
Gaussian Error Linear Units (GELUs)
RoFormer: Enhanced Transformer with Rotary Position Embedding
Measuring Massive Multitask Language Understanding
ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension
To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making
Leveraging QA Datasets to Improve Generative Data Augmentation
Decoupled Weight Decay Regularization
A Distributional Approach to Controlled Text Generation
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
The Turking Test: Can Language Models Understand Instructions?
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Language Models (Mostly) Know What They Know
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Towards Understanding and Mitigating Social Biases in Language Models
Discovering and Categorising Language Biases in Reddit
Reducing Sentiment Bias in Language Models via Counterfactual Evaluation
Training Verifiers to Solve Math Word Problems
The Curse of Recursion: Training on Generated Data Makes Models Forget
Compositional Semantic Parsing with Large Language Models
Transforming Question Answering Datasets Into Natural Language Inference Datasets
Bringing the People Back In: Contesting Benchmark Machine Learning Datasets
The Values Encoded in Machine Learning Research
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems
Ethical and social risks of harm from Language Models
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Understanding HTML with Large Language Models
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
AudioLM: a Language Modeling Approach to Audio Generation
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
Behavior Cloned Transformers are Neurosymbolic Reasoners
Adversarial Attacks and Defenses in Images, Graphs and Text: A Review
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
Thou shalt not hate: Countering Online Hate Speech
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
Participation is not a Design Fix for Machine Learning
Retrieval Augmentation Reduces Hallucination in Conversation
Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize
How Many Data Samples is an Additional Instruction Worth?
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
Crosslingual Generalization through Multitask Finetuning
The Curious Case of Neural Text Degeneration
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
VinaLLaMA: LLaMA-based Vietnamese Foundation Model
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
Evaluating the Social Impact of Generative AI Systems in Systems and Society
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
Towards A Rigorous Science of Interpretable Machine Learning
An Analysis of the Automatic Bug Fixing Performance of ChatGPT
Investigating Failures of Automatic Translation in the Case of Unambiguous Gender
Chat as Expected: Learning to Manipulate Black-box Neural Dialogue Models
Defending Against Neural Fake News
Analyzing Dynamic Adversarial Training Data in the Limit
Criticality in Formal Languages and Statistical Physics
Generating Wikipedia by Summarizing Long Sequences
Gender Bias in Contextualized Word Embeddings
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Deep Generative Dual Memory Network for Continual Learning
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Persistent Anti-Muslim Bias in Large Language Models
Mirages: On Anthropomorphism in Dialogue Systems
Deep Learning for Symbolic Mathematics
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
A Survey On Universal Adversarial Attack
Atlas: Few-shot Learning with Retrieval Augmented Language Models
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning
A framework for the extraction of Deep Neural Networks by leveraging public data
Recipes for building an open-domain chatbot
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Measuring the Effects of Data Parallelism on Neural Network Training
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
X-SQL: reinforce schema representation with context
Constructing Datasets for Multi-hop Reading Comprehension Across Documents
FastText.zip: Compressing text classification models
The State and Fate of Linguistic Diversity and Inclusion in the NLP World
A General Language Assistant as a Laboratory for Alignment
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention
Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly
Transformer tricks: Precomputing the first layer
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
Deep Learning Based Text Classification: A Comprehensive Review
Automated Hate Speech Detection and the Problem of Offensive Language
Multi-Dimensional Gender Bias Classification
Extracting Training Data from Large Language Models
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
SPLADE-v3: New baselines for SPLADE
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
FlowQA: Grasping Flow in History for Conversational Machine Comprehension
Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey
Improving alignment of dialogue agents via targeted human judgements
Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing
Explanation in Artificial Intelligence: Insights from the Social Sciences
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Revealing Persona Biases in Dialogue Systems
GeDi: Generative Discriminator Guided Sequence Generation
Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
UL2: Unifying Language Learning Paradigms
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Evaluating the Underlying Gender Bias in Contextualized Word Embeddings
Does Gender Matter? Towards Fairness in Dialogue Systems
Energy and Policy Considerations for Deep Learning in NLP
Tools Fail: Detecting Silent Errors in Faulty Tools
The False Promise of Imitating Proprietary LLMs
Directional Bias Amplification
Hierarchical Text-Conditional Image Generation with CLIP Latents
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons
Task-aware Retrieval with Instructions
Do Prompt-Based Models Really Understand the Meaning of their Prompts?
Reading Wikipedia to Answer Open-Domain Questions
Supervising Model Attention with Human Explanations for Robust Natural Language Inference
Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis
Latent Retrieval for Weakly Supervised Open Domain Question Answering
Teaching language models to support answers with verified quotes
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
MasakhaNER: Named Entity Recognition for African Languages
Predicting the Type and Target of Offensive Posts in Social Media
Learning to Model Editing Processes
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering
Zero-Shot Fine-Grained Style Transfer: Leveraging Distributed Continuous Style Representations to Transfer To Unseen Styles
Quantifying the Carbon Emissions of Machine Learning
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Chasing Carbon: The Elusive Environmental Footprint of Computing
Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion
Distilling Reasoning Capabilities into Smaller Language Models
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks
WebGPT: Browser-assisted question-answering with human feedback
Making Large Language Models Better Reasoners with Step-Aware Verifier
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
SGPT: GPT Sentence Embeddings for Semantic Search
Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Arbitrary Textual Style Transfer with Small Language Models
Building a Conversational Agent Overnight with Dialogue Self-Play
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection
Neural Machine Translation of Rare Words with Subword Units
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Know What You Don't Know: Unanswerable Questions for SQuAD
Longformer: The Long-Document Transformer
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
A Constructive Prediction of the Generalization Error Across Scales
Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases
KERMIT: Generative Insertion-Based Modeling for Sequences
mGPT: Few-Shot Learners Go Multilingual
The Natural Language Decathlon: Multitask Learning as Question Answering
A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents
A Survey of Race, Racism, and Anti-Racism in NLP
Unraveling the Hidden Environmental Impacts of AI Solutions for Environment
SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering
Hyperbolic Image-Text Representations
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Pretraining Language Models with Human Preferences
Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English
MTEB: Massive Text Embedding Benchmark
Interscript: A dataset for interactive learning of scripts through error feedback
Looped Transformers as Programmable Computers
Inner Monologue: Embodied Reasoning through Planning with Language Models
No Language Left Behind: Scaling Human-Centered Machine Translation
Collaborative Storytelling with Large-scale Neural Language Models
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
Recipes for Safety in Open-domain Chatbots
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations
Pre-Trained Language Models for Interactive Decision-Making
Can Large Language Models Really Improve by Self-critiquing Their Own Plans?
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Formal Algorithms for Transformers
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
HellaSwag: Can a Machine Really Finish Your Sentence?
Teaching Language Models to Self-Improve through Interactive Demonstrations
Ranking LLM-Generated Loop Invariants for Program Verification
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets
When can transformers reason with abstract symbols?
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Language Models are Few-shot Multilingual Learners
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
AutoMix: Automatically Mixing Language Models
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Pre-trained Summarization Distillation
TEQ: Trainable Equivalent Transformation for Quantization of LLMs
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Improving Large Language Model Fine-tuning for Solving Math Problems
Language Models are General-Purpose Interfaces
Llemma: An Open Language Model For Mathematics
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
Gender Bias in Machine Translation
Towards a Human-like Open-Domain Chatbot
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
A Network-based End-to-End Trainable Task-oriented Dialogue System
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Cloze-driven Pretraining of Self-attention Networks
Universal Language Model Fine-tuning for Text Classification
OPT: Open Pre-trained Transformer Language Models
Towards Zero-Label Language Learning
GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models
Fine-tuned Language Models are Continual Learners
3D-GPT: Procedural 3D Modeling with Large Language Models
PAL: Program-aided Language Models
Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning
Large Language Models for Software Engineering: Survey and Open Problems
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
Self-critiquing models for assisting human evaluators
Towards Understanding Sycophancy in Language Models
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Finetuned Language Models Are Zero-Shot Learners
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Generating Sequences by Learning to Self-Correct
The Depth-to-Width Interplay in Self-Attention
Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning
Internet-augmented language models through few-shot prompting for open-domain question answering
GLM-130B: An Open Bilingual Pre-trained Model
Three scenarios for continual learning
Eureka: Human-Level Reward Design via Coding Large Language Models
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
An Explanation of In-context Learning as Implicit Bayesian Inference
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Snapshot Ensembles: Train 1, get M for free
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
On the Planning Abilities of Large Language Models -- A Critical Investigation
Efficient Estimation of Word Representations in Vector Space
Visualizing the Loss Landscape of Neural Nets
Contrastive Preference Learning: Learning from Human Feedback without RL
High-Resolution Image Synthesis with Latent Diffusion Models
I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents
H2O Open Ecosystem for State-of-the-art Large Language Models
Calibrate Before Use: Improving Few-Shot Performance of Language Models
All-in-One Image-Grounded Conversational Agents
Interactive Task Planning with Language Models
Can AI-Generated Text be Reliably Detected?
BitNet: Scaling 1-bit Transformers for Large Language Models
Scaling Laws for Neural Language Models
Self-Refine: Iterative Refinement with Self-Feedback
Adversarial Environment Generation for Learning to Navigate the Web
Cross-Lingual Language Model Meta-Pretraining
Creative Robot Tool Use with Large Language Models
Simple and Effective Multi-Paragraph Reading Comprehension
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
VeRA: Vector-based Random Matrix Adaptation
Open-Ended Learning Leads to Generally Capable Agents
Exploring the Boundaries of GPT-4 in Radiology
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
Eliciting Human Preferences with Language Models
One-Shot Learning from a Demonstration with Hierarchical Latent Language
OpenAgents: An Open Platform for Language Agents in the Wild
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Specific versus General Principles for Constitutional AI
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Task2Vec: Task Embedding for Meta-Learning
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams
Tuna: Instruction Tuning using Feedback from Large Language Models
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Transcending Scaling Laws with 0.1% Extra Compute
InstructExcel: A Benchmark for Natural Language Instruction in Excel
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning
A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets
Understanding Retrieval Augmentation for Long-Form Question Answering
A Neural Conversational Model
Exploring the Limits of Language Modeling
Scaling Instruction-Finetuned Language Models
Learning Performance-Improving Code Edits
Training Compute-Optimal Large Language Models
Instruction Tuning with GPT-4
Holistic Evaluation of Language Models
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Large Language Models as Analogical Reasoners
Negative Training for Neural Dialogue Response Generation
On the Opportunities and Risks of Foundation Models
Dissecting In-Context Learning of Translations in GPTs
Carbon Emissions and Large Neural Network Training
Faithful Reasoning Using Large Language Models
Detecting Pretraining Data from Large Language Models
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Unified Language Model Pre-training for Natural Language Understanding and Generation
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Predictability and Surprise in Large Generative Models
Alignment of Language Agents
Zephyr: Direct Distillation of LM Alignment
Binding Language Models in Symbolic Languages
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
The Evolved Transformer
Detecting Hate Speech with GPT-3
Learning to summarize from human feedback
Efficient Large Scale Language Modeling with Mixtures of Experts
Jailbreaking Black Box Large Language Models in Twenty Queries
How do Language Models Bind Entities in Context?
Program Synthesis with Large Language Models
Challenges in Detoxifying Language Models
A Deep Reinforced Model for Abstractive Summarization
Moral Foundations of Large Language Models
Training Production Language Models without Memorizing User Data
A Deep Reinforcement Learning Chatbot
RT-1: Robotics Transformer for Real-World Control at Scale
Entity Tracking in Language Models
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval
Controlled Decoding from Language Models
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
FP8-LM: Training FP8 Large Language Models
The Perils & Promises of Fact-checking with Large Language Models
Imitation versus Innovation: What children can do that large language and language-and-vision models cannot (yet)?
Unsolved Problems in ML Safety
Woodpecker: Hallucination Correction for Multimodal Large Language Models
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Data-Centric Financial Large Language Models
CodeFusion: A Pre-trained Diffusion Model for Code Generation
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Personas as a Way to Model Truthfulness in Language Models
PockEngine: Sparse and Efficient Fine-tuning in a Pocket
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
CLEX: Continuous Length Extrapolation for Large Language Models
ALCUNA: Large Language Models Meet New Knowledge
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Large Language Models as Generalizable Policies for Embodied Tasks
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Linear Representations of Sentiment in Large Language Models
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
MM-VID: Advancing Video Understanding with GPT-4V(ision)
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
ChipNeMo: Domain-Adapted LLMs for Chip Design
What's In My Big Data?
Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve
Idempotent Generative Network
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
NEFTune: Noisy Embeddings Improve Instruction Finetuning
The Impact of Depth and Width on Transformer Language Model Generalization
FlashDecoding++: Faster Large Language Model Inference on GPUs
Skywork: A More Open Bilingual Foundation Model
GRIM: GRaph-based Interactive narrative visualization for gaMes
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
Does GPT-4 Pass the Turing Test?
Text Rendering Strategies for Pixel Language Models
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Learning From Mistakes Makes LLM Better Reasoner
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
Ultra-Long Sequence Distributed Transformer
Ziya2: Data-centric Learning is All LLMs Need
GLaMM: Pixel Grounding Large Multimodal Model
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
Unveiling Safety Vulnerabilities of Large Language Models
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Levels of AGI: Operationalizing Progress on the Path to AGI
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning
Co-training and Co-distillation for Quality Improvement and Compression of Language Models
CogVLM: Visual Expert for Pretrained Language Models
Tailoring Self-Rationalizers with Multi-Reward Distillation
NExT-Chat: An LMM for Chat, Detection and Segmentation
The Efficiency Misnomer
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
Training Dynamics of Contextual N-Grams in Language Models
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Large Language Models Understand and Can be Enhanced by Emotional Stimuli
Gzip versus bag-of-words for text classification
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
GPT4All: An Ecosystem of Open Source Compressed Language Models
Evaluating Large Language Models: A Comprehensive Survey
Leveraging Large Language Models for Automated Proof Synthesis in Rust
GPTScore: Evaluate as You Desire
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Simple and Controllable Music Generation
Can LLMs Follow Simple Rules?
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
Memory Augmented Language Models through Mixture of Word Experts
Language Models can be Logical Solvers
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
ADaPT: As-Needed Decomposition and Planning with Language Models
FinGPT: Large Generative Models for a Small Language
Simplifying Transformer Blocks
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
Prompt Engineering a Prompt Engineer
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves
Accelerating Large Language Model Decoding with Speculative Sampling
Alternating Updates for Efficient Transformers
White-Box Transformers via Sparse Rate Reduction
ChatAnything: Facetime Chat with LLM-Enhanced Personas
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
LayoutPrompter: Awaken the Design Ability of Large Language Models
Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
Trusted Source Alignment in Large Language Models
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
The ART of LLM Refinement: Ask, Refine, and Trust
Fine-tuning Language Models for Factuality
A Survey on Language Models for Code
DiLoCo: Distributed Low-Communication Training of Language Models
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks
Fusion-Eval: Integrating Evaluators with LLMs
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers
SiRA: Sparse Mixture of Low Rank Adaptation
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
UT5: Pretraining Non autoregressive T5 with unrolled denoising
Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
Contrastive Chain-of-Thought Prompting
Learning to Filter Context for Retrieval-Augmented Generation
Large Language Models for Automated Open-domain Scientific Hypotheses Discovery
M²UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models
System 2 Attention (is something you might need too)
GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
Language Models are Multilingual Chain-of-Thought Reasoners
ProAgent: From Robotic Process Automation to Agentic Process Automation
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
Exponentially Faster Language Modelling
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
ToolTalk: Evaluating Tool-Usage in a Conversational Setting
Testing Language Model Agents Safely in the Wild
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
Orca 2: Teaching Small Language Models How to Reason
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
On Leakage of Code Generation Evaluation Datasets
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
SelfEval: Leveraging the discriminative nature of generative models for evaluation
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Transformer Memory as a Differentiable Search Index
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
DeiT III: Revenge of the ViT
Scaling Vision Transformers to 22 Billion Parameters
On Calibration of Modern Neural Networks
A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Attention Is All You Need
Acceleration via Fractal Learning Rate Schedules
Transformers learn in-context by gradient descent
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Toy Models of Superposition
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Unified Scaling Laws for Routed Language Models
CLIPPO: Image-and-Language Understanding from Pixels Only
Task-Specific Skill Localization in Fine-tuned Language Models
Discovering Latent Knowledge in Language Models Without Supervision
OCR-free Document Understanding Transformer
Language Models are Few-Shot Learners
Progress measures for grokking via mechanistic interpretability
Learning Transferable Visual Models From Natural Language Supervision
Zero-Shot Text-to-Image Generation
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems
Language Models as Agent Models
Learning Models of Individual Behavior in Chess
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
Ask Me Anything: A simple strategy for prompting language models
Training language models to follow instructions with human feedback
Sequence to Sequence Learning with Neural Networks
SegGPT: Segmenting Everything In Context
A data-driven approach for learning to control computers
Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation
Unifying Vision, Text, and Layout for Universal Document Processing
Memorizing Transformers
GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
A Succinct Summary of Reinforcement Learning
Symbolic Discovery of Optimization Algorithms
Confronting Reward Model Overoptimization with Constrained RLHF
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
A Cookbook of Self-Supervised Learning
Training Language Models with Language Feedback at Scale
Answering Questions by Meta-Reasoning over Multiple Chains of Thought
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
Adversarial Examples for Evaluating Reading Comprehension Systems
Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
ImageBind: One Embedding Space To Bind Them All
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
Scaling Data-Constrained Language Models
Efficient LLM Inference on CPUs
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
Efficiently Scaling Transformer Inference
One Model To Learn Them All
Brain decoding: toward real-time reconstruction of visual perception
GLU Variants Improve Transformer
Vision Transformers with Mixed-Resolution Tokenization
HyperNetworks
InRank: Incremental Low-Rank Learning
Text-to-Image Diffusion Models are Zero-Shot Classifiers
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
MAGVLT: Masked Generative Vision-and-Language Transformer
DINOv2: Learning Robust Visual Features without Supervision
What learning algorithm is in-context learning? Investigations with linear models
Any-to-Any Generation via Composable Diffusion
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Shortformer: Better Language Modeling using Shorter Inputs
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
PaLI: A Jointly-Scaled Multilingual Language-Image Model
The alignment problem from a deep learning perspective
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Jailbreaking is Best Solved by Definition
Multimodal Analogical Reasoning over Knowledge Graphs
Segment Everything Everywhere All at Once
DocPrompting: Generating Code by Retrieving the Docs
Emergent Tool Use From Multi-Agent Autocurricula
Root Mean Square Layer Normalization
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
Efficient Training of Language Models to Fill in the Middle
AI for Mathematics: A Cognitive Science Perspective
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
The First Room-Temperature Ambient-Pressure Superconductor
Segment Anything
Less is More: Parameter-Free Text Classification with Gzip
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
A Generalist Agent
Meet in the Middle: A New Pre-training Paradigm
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Can Humans Do Less-Than-One-Shot Learning?
Diffusion-LM Improves Controllable Text Generation
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Text-to-3D using Gaussian Splatting
Precise Zero-Shot Dense Retrieval without Relevance Labels
Brainformers: Trading Simplicity for Efficiency
DETRs Beat YOLOs on Real-time Object Detection
OtterHD: A High-Resolution Multi-modality Model
Rethinking the Role of Token Retrieval in Multi-Vector Retrieval
ConvNets Match Vision Transformers at Scale
Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models
Scaling Robot Learning with Semantically Imagined Experience
Do LLMs exhibit human-like response biases? A case study in survey design
READ: Recurrent Adaptation of Large Transformers
Benchmarking Neural Network Training Algorithms
Automatic Gradient Descent: Deep Learning without Hyperparameters
Layer Normalization
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Implicit Representations of Meaning in Neural Language Models
Calibrated Chaos: Variance Between Runs of Neural Network Training is Harmless and Inevitable
SqueezeLLM: Dense-and-Sparse Quantization
Optimisation & Generalisation in Networks of Neurons
Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals
Transformers as Recognizers of Formal Languages: A Survey on Expressivity
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks
Decoupled Context Processing for Context Augmented Language Modeling
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
The Transient Nature of Emergent In-Context Learning in Transformers
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
Matryoshka Diffusion Models
Show Your Work: Scratchpads for Intermediate Computation with Language Models
Beyond neural scaling laws: beating power law scaling via data pruning
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Going Deeper with Convolutions
TimeGPT-1
Capabilities of GPT-4 on Medical Challenge Problems
Training Large Language Models Efficiently with Sparsity and Dataflow
Optimal Policies Tend to Seek Power
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
Thinking Like Transformers
Why think step by step? Reasoning emerges from the locality of experience
Mixture-of-Experts with Expert Choice Routing
GPT-4 Technical Report
Scaling Expert Language Models with Unsupervised Domain Discovery
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Mass-Editing Memory in a Transformer
Erasing Concepts from Diffusion Models
Physics of Language Models: Part 1, Context-Free Grammar
Flamingo: a Visual Language Model for Few-Shot Learning
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
Semantic Tokenizer for Enhanced Natural Language Processing
On Limitations of the Transformer Architecture
A Survey of Large Language Models
Affordances from Human Videos as a Versatile Representation for Robotics
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Conditioning Predictive Models: Risks and Strategies
Implicit Chain of Thought Reasoning via Knowledge Distillation
Scaling Laws for Transfer
Risks from Learned Optimization in Advanced Machine Learning Systems
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Bayesian Optimization of Catalysts With In-context Learning
Teach LLMs to Phish: Stealing Private Information from Language Models
LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization
Knowledge Graphs
Language Modelling with Pixels
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Chinchilla Scaling: A replication attempt
Retrofitting Word Vectors to Semantic Lexicons
CoLT5: Faster Long-Range Transformers with Conditional Computation
Deep contextualized word representations
Boosted Prompt Ensembles for Large Language Models
Recurrent Memory Transformer
Multitask Prompted Training Enables Zero-Shot Task Generalization
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
Monarch: Expressive Structured Matrices for Efficient and Accurate Training
On the Turing Completeness of Modern Neural Network Architectures
Generalized Out-of-Distribution Detection: A Survey
AugGPT: Leveraging ChatGPT for Text Data Augmentation
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
Human-Timescale Adaptation in an Open-Ended Task Space
Sigmoid Loss for Language Image Pre-Training
OpenScene: 3D Scene Understanding with Open Vocabularies
Nougat: Neural Optical Understanding for Academic Documents
SoundStorm: Efficient Parallel Audio Generation
Text and Code Embeddings by Contrastive Pre-Training
Fine-Tuning Language Models from Human Preferences
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models
Effective Theory of Transformers at Initialization
ST-MoE: Designing Stable and Transferable Sparse Expert Models
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
Natural Selection Favors AIs over Humans
ART: Automatic multi-step reasoning and tool-use for large language models
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Visual Instruction Tuning
Efficiently Modeling Long Sequences with Structured State Spaces
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges
Mastering Diverse Domains through World Models
Simplified State Space Layers for Sequence Modeling
Offline RL for Natural Language Generation with Implicit Language Q Learning
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Deduplicating Training Data Mitigates Privacy Risks in Language Models
Self-supervised Learning: Generative or Contrastive
Towards Automated Circuit Discovery for Mechanistic Interpretability
Neural Story Planning
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements
Dota 2 with Large Scale Deep Reinforcement Learning
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
The Matrix Calculus You Need For Deep Learning
ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models
DeepNet: Scaling Transformers to 1,000 Layers
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
LLMs cannot find reasoning errors, but can correct them!
Pretraining Without Attention
Large language models are not zero-shot communicators
Semi-supervised Sequence Learning
Improving language models by retrieving from trillions of tokens
Synthetic Data from Diffusion Models Improves ImageNet Classification
Level Generation Through Large Language Models
How Does Generative Retrieval Scale to Millions of Passages?
State Spaces Aren't Enough: Machine Translation Needs Attention
Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Evaluating Large Language Models Trained on Code
Injecting structural hints: Using language models to study inductive biases in language learning
The case for 4-bit precision: k-bit Inference Scaling Laws
Divide-or-Conquer? Which Part Should You Distill Your LLM?
Downstream Datasets Make Surprisingly Good Pretraining Corpora
ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark
Fast Transformer Decoding: One Write-Head is All You Need
NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities
Towards Deep Learning Models Resistant to Adversarial Attacks
A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Large Language Models as General Pattern Machines
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
Fast and forward stable randomized algorithms for linear least-squares problems
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
Twist Decoding: Diverse Generators Guide Each Other
Monolith: Real Time Recommendation System With Collisionless Embedding Table
On-Device Training Under 256KB Memory
Meta-Learning in Neural Networks: A Survey
The Linear Representation Hypothesis and the Geometry of Large Language Models
The Power of Scale for Parameter-Efficient Prompt Tuning
LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Spreading vectors for similarity search
REFINER: Reasoning Feedback on Intermediate Representations
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Low-code LLM: Visual Programming over LLMs
Decoding speech perception from non-invasive brain recordings
Towards Agile Text Classifiers for Everyone
Cramming: Training a Language Model on a Single GPU in One Day
Text-to-Table: A New Way of Information Extraction
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
WizardLM: Empowering Large Language Models to Follow Complex Instructions
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
ViperGPT: Visual Inference via Python Execution for Reasoning
Spatial-Language Attention Policies for Efficient Robot Learning
Improved Baselines with Visual Instruction Tuning
Decision Transformer: Reinforcement Learning via Sequence Modeling
What Algorithms can Transformers Learn? A Study in Length Generalization
Tracking Everything Everywhere All at Once
Bad Global Minima Exist and SGD Can Reach Them
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
Fine-Tuning LLaMA for Multi-Stage Text Retrieval
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Optimizing Memory Mapping Using Deep Reinforcement Learning
A General Theoretical Paradigm to Understand Learning from Human Preferences
Beyond Words: A Comprehensive Survey of Sentence Representations
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Adding Gradient Noise Improves Learning for Very Deep Networks
Positional Description Matters for Transformers Arithmetic
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Calibrated Language Models Must Hallucinate
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
Online Decision Transformer
Benchmarking Large Language Models for News Summarization
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
Scalable Extraction of Training Data from (Production) Language Models
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
Visual In-Context Prompting
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
GAIA: a benchmark for General AI Assistants
More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia
Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text
Chain-of-Thought Reasoning is a Policy Improvement Operator
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
Thinking Fast and Slow in Large Language Models
Towards Accurate Differential Diagnosis with Large Language Models
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Vanishing Gradients in Reinforcement Finetuning of Language Models
The History and Risks of Reinforcement Learning and Human Feedback
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Video Language Planning
Thread of Thought Unraveling Chaotic Contexts
PaSS: Parallel Speculative Sampling
SeaLLMs -- Large Language Models for Southeast Asia
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
An LLM Compiler for Parallel Function Calling
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
WinoGrande: An Adversarial Winograd Schema Challenge at Scale
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Magicoder: Source Code Is All You Need
SILC: Improving Vision Language Pretraining with Self-Distillation
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
An Early Evaluation of GPT-4V(ision)
Farzi Data: Autoregressive Data Distillation
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Towards a Unified View of Parameter-Efficient Transfer Learning
Beyond Surface: Probing LLaMA Across Scales and Layers
TiC-CLIP: Continual Training of CLIP Models
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
GOAT: GO to Any Thing
Nash Learning from Human Feedback
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
Axiomatic Preference Modeling for Longform Question Answering
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
Efficient Monotonic Multihead Attention
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Are LLMs Useful in the Poorest Schools? theTeacherAI in Sierra Leone
De-Diffusion Makes Text a Strong Cross-Modal Interface
Dolphins: Multimodal Language Model for Driving
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture
Efficient Transformer Knowledge Distillation: A Performance Review
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments
Instruction-tuning Aligns LLMs to the Human Brain
Large Language Model Alignment: A Survey
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
Instruction-Following Evaluation for Large Language Models
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Pre-Training to Learn in Context
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Large Language Models for Mathematicians
WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words
Language Model Inversion
Training Chain-of-Thought via Latent-Variable Inference
The Quantization Model of Neural Scaling
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses
TinyGSM: achieving >80% on GSM8k with small language models
Context Tuning for Retrieval Augmented Generation
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
TigerBot: An Open Multilingual Multitask LLM
PromptBench: A Unified Library for Evaluation of Large Language Models
Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
Challenges with unsupervised LLM knowledge discovery
A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Honeybee: Locality-enhanced Projector for Multimodal LLM
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
ProTIP: Progressive Tool Retrieval Improves Planning
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection
Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models
SparQ Attention: Bandwidth-Efficient LLM Inference
Silkie: Preference Distillation for Large Visual Language Models
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Algorithmic Collusion by Large Language Models
Mathematical Language Models: A Survey
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Pixel Aligned Language Models
PathFinder: Guided Search over Multi-Step Reasoning Paths
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Vision-Language Models as a Source of Rewards
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"
Language-Informed Visual Concept Learning
Evaluation of Large Language Models for Decision Making in Autonomous Driving
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Extending Context Window of Large Language Models via Semantic Compression
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Formal Aspects of Language Modeling
Large Language Models on Graphs: A Comprehensive Survey
Merlin: Empowering Multimodal LLMs with Foresight Minds
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming
Generating Illustrated Instructions
Alignment for Honesty
Paloma: A Benchmark for Evaluating Language Model Fit
Self-Evaluation Improves Selective Generation in Large Language Models
Nomic Embed: Training a Reproducible Long Context Text Embedder
Rejuvenating image-GPT as Strong Visual Representation Learners
Object Recognition as Next Token Prediction
Foundation Models in Robotics: Applications, Challenges, and the Future
Distributed Inference and Fine-tuning of Large Language Models Over The Internet
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Data Management For Large Language Models: A Survey
AtP*: An efficient and scalable method for localizing LLM behaviour to components
Knowledge Distillation of Large Language Models
Faithful Persona-based Conversational Dataset Generation with Large Language Models
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Weight subcloning: direct initialization of transformers using larger pretrained ones
Segment and Caption Anything
Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
OneLLM: One Framework to Align All Modalities with Language
Steering Llama 2 via Contrastive Activation Addition
VILA: On Pre-training for Visual Language Models
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions
HyperAttention: Long-context Attention in Near-Linear Time
LLM360: Towards Fully Transparent Open-Source LLMs
Efficient Transformers with Dynamic Token Pooling
GIVT: Generative Infinite-Vocabulary Transformers
Modeling Context in Referring Expressions
The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Text-Conditioned Resampler For Long Form Video Understanding
Gemini: A Family of Highly Capable Multimodal Models
LLMs are Not Just Next Token Predictors
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Cascade Speculative Drafting for Even Faster LLM Inference
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
VideoPoet: A Large Language Model for Zero-Shot Video Generation
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
AppAgent: Multimodal Agents as Smartphone Users
Time is Encoded in the Weights of Finetuned Language Models
Generative Multimodal Models are In-Context Learners
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Mini-GPTs: Efficient Large Language Models through Contextual Pruning
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
An In-depth Look at Gemini's Language Abilities
Retrieval-Augmented Generation for Large Language Models: A Survey
Intriguing Properties of Quantization at Scale
Parrot Captions Teach CLIP to Spot Text
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
YAYI 2: Multilingual Open-Source Large Language Models
Reasons to Reject? Aligning Language Models with Judgments
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion
Exploiting Novel GPT-4 APIs
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
PreCog: Exploring the Relation between Memorization and Performance in Pre-trained Language Models
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
LLM4VG: Large Language Models Evaluation for Video Grounding
Shai: A large language model for asset management
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Supervised Knowledge Makes Large Language Models Better In-context Learners
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases
The LLM Surgeon
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
MobileVLM: A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Task Contamination: Language Models May Not Be Few-Shot Anymore
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Learning Vision from Models Rivals Learning Vision from Data
TinyLlama: An Open-Source Small Language Model
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Making Large Language Models A Better Foundation For Dense Retrieval
LARP: Language-Agent Role Play for Open-World Games
A Survey of Reasoning with Foundation Models
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks
Towards the Law of Capacity Gap in Distilling Language Models
At Which Training Stage Does Code Data Help LLMs Reasoning?
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery
STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
A Comprehensive Study of Knowledge Editing for Large Language Models
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
Orion-14B: Open-source Multilingual Large Language Models
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
DocLLM: A layout-aware generative language model for multimodal document understanding
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
GeoGalactica: A Scientific Large Language Model in Geoscience
Improving Text Embeddings with Large Language Models
Boosting Large Language Model for Speech Synthesis: An Empirical Study
TrustLLM: Trustworthiness in Large Language Models
Unicron: Economizing Self-Healing LLM Training at Scale
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Proving Test Set Contamination in Black Box Language Models
LLaMA Pro: Progressive LLaMA with Block Expansion
LLM Augmented LLMs: Expanding Capabilities through Composition
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers
Understanding LLMs: A Comprehensive Overview from Training to Inference
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
A Vision Check-up for Language Models
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Multilingual Instruction Tuning With Just a Pinch of Multilinguality
WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope
GPT-4V(ision) is a Generalist Web Agent, if Grounded
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Mind2Web: Towards a Generalist Agent for the Web
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DocGraphLM: Documental Graph Language Model for Information Extraction
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
TOFU: A Task of Fictitious Unlearning for LLMs
Transformers are Multi-State RNNs
Secrets of RLHF in Large Language Models Part II: Reward Modeling
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism
Towards Conversational Diagnostic AI
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Efficient LLM inference solution on Intel GPU
I am a Strange Dataset: Metalinguistic Tests for Language Models
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models
The Impact of Reasoning Step Length on Large Language Models
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
Mixtral of Experts
ChatQA: Building GPT-4 Level Conversational QA Models
TeleChat Technical Report
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach
MaLA-500: Massive Language Adaptation of Large Language Models
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
Theory of Mind abilities of Large Language Models in Human-Robot Interaction: An Illusion?
State of What Art? A Call for Multi-Prompt LLM Evaluation
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Compressing Context to Enhance Inference Efficiency of Large Language Models
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks
VMamba: Visual State Space Model
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Self-Rewarding Language Models
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Asynchronous Local-SGD Training for Language Modeling
ReFT: Reasoning with Reinforced Fine-Tuning
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Tuning Language Models by Proxy
Scalable Pre-training of Large Autoregressive Image Models
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Extending LLMs' Context Window with 100 Samples
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
SPADE: Synthesizing Assertions for Large Language Model Pipelines
Foundations of Vector Retrieval
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Evaluating the Moral Beliefs Encoded in LLMs
Boosting Theory-of-Mind Performance in Large Language Models via Prompting
MambaByte: Token-free Selective State Space Model
RakutenAI-7B: Extending Large Language Models for Japanese
MM-LLMs: Recent Advances in MultiModal Large Language Models
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
Small Language Model Meets with Reinforced Vision Vocabulary
WARM: On the Benefits of Weight Averaged Reward Models
In-Context Learning for Extreme Multi-Label Classification
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
What Are Tools Anyway? A Survey from the Language Model Perspective
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Mission: Impossible Language Models
Benchmarking LLMs via Uncertainty Quantification
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
H2O-Danube-1.8B Technical Report
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Representation Engineering: A Top-Down Approach to AI Transparency
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Efficient Tool Use with Chain-of-Abstraction Reasoning
YOLO-World: Real-Time Open-Vocabulary Object Detection
Weaver: Foundation Models for Creative Writing
Weak-to-Strong Jailbreaking on Large Language Models
Transfer Learning for Text Diffusion Models
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
Watermarking Makes Language Models Radioactive
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
Generative Expressive Robot Behaviors using Large Language Models
Efficient Exploration for LLMs
Can Large Language Models Understand Context?
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?
OLMo: Accelerating the Science of Language Models
Tree Prompting: Efficient Task Adaptation without Fine-Tuning
CroissantLLM: A Truly Bilingual French-English Language Model
Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model
Transforming and Combining Rewards for Aligning Large Language Models
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Scaling Laws for Downstream Task Performance of Large Language Models
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Seven Failure Points When Engineering a Retrieval Augmented Generation System
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations
Multi-line AI-assisted Code Authoring
Self-Discover: Large Language Models Self-Compose Reasoning Structures
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Training-Free Consistent Text-to-Image Generation
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Rethinking Optimization and Architecture for Tiny Language Models
LiPO: Listwise Preference Optimization through Learning-to-Rank
BlackMamba: Mixture of Experts for State-Space Models
Rethinking Interpretability in the Era of Large Language Models
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
K-Level Reasoning with Large Language Models
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models
Specialized Language Models with Cheap Inference from Limited Domain Data
Repeat After Me: Transformers are Better than State Space Models at Copying
A Survey on Hallucination in Large Vision-Language Models
Corrective Retrieval Augmented Generation
A Comprehensive Survey of Compression Algorithms for Language Models
Leveraging Large Language Models for NLG Evaluation: A Survey
The Power of Noise: Redefining Retrieval for RAG Systems
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Red Teaming Visual Language Models
Knowledge Fusion of Large Language Models
A Survey of Resource-efficient LLM and Multimodal Foundation Models
Lexinvariant Language Models
Noise2Music: Text-conditioned Music Generation with Diffusion Models
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery
Mathematical Capabilities of ChatGPT
AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
Large Language Models for Mathematical Reasoning: Progresses and Challenges
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Driving Everywhere with Large Language Model Policy Adaptation
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
SpiRit-LM: Interleaved Spoken and Written Language Model
Multilingual E5 Text Embeddings: A Technical Report
In-Context Principle Learning from Mistakes
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
Hydragen: High-Throughput LLM Inference with Shared Prefixes
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
Fast Timing-Conditioned Latent Audio Diffusion
Direct Language Model Alignment from Online AI Feedback
Grandmaster-Level Chess Without Search
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Tandem Transformers for Inference Efficient LLMs
World Model on Million-Length Video And Language With RingAttention
Lumos: Empowering Multimodal LLMs with Scene Text Recognition
Suppressing Pink Elephants with Direct Principle Feedback
Policy Improvement using Language Feedback Models
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Scaling Laws for Fine-Grained Mixture of Experts
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts
Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
ODIN: Disentangled Reward Mitigates Hacking in RLHF
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Generative Representational Instruction Tuning
ChemLLM: A Chemical Large Language Model
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
DeAL: Decoding-time Alignment for Large Language Models
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
SubGen: Token Generation in Sublinear Time and Memory
Keyframer: Empowering Animation Design using Large Language Models
Large Language Model for Table Processing: A Survey
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Approaching Human-Level Forecasting with Language Models
A phase transition between positional and semantic learning in a solvable model of dot-product attention
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Premise Order Matters in Reasoning with Large Language Models
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment
Chain-of-Thought Reasoning Without Prompting
BitDelta: Your Fine-Tune May Only Be Worth One Bit
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Data Engineering for Scaling Language Models to 128K Context
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
How to Train Data-Efficient LLMs
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
Arrows of Time for Large Language Models
Coercing LLMs to do and reveal (almost) anything
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
User-LLM: Efficient LLM Contextualization with User Embeddings
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Instruction-tuned Language Models are Better Knowledge Learners
The FinBen: An Holistic Financial Benchmark for Large Language Models
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
The boundary of neural network trainability is fractal
Reformatted Alignment
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
OneBit: Towards Extremely Low-bit Large Language Models
CoLLaVO: Crayon Large Language and Vision mOdel
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
RLVF: Learning from Verbal Feedback without Overgeneralization
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Efficient Guided Generation for Large Language Models
SPAR: Personalized Content-Based Recommendation via Long Engagement Attention
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Generative Language Modeling for Automated Theorem Proving
Automated Unit Test Improvement using Large Language Models at Meta
LLM Agents can Autonomously Hack Websites
Large Language Models: A Survey
In-Context Retrieval-Augmented Language Models
Consolidating Attention Features for Multi-view Image Editing
LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Scaling Up LLM Reviews for Google Ads Content Moderation
Subobject-level Image Tokenization
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
EvoPrompting: Language Models for Code-Level Neural Architecture Search
Goal Driven Discovery of Distributional Differences via Language Descriptions
ChatMusician: Understanding and Generating Music Intrinsically with LLM
GPTVQ: The Blessing of Dimensionality for LLM Quantization
FuseChat: Knowledge Fusion of Chat Models
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Large Language Models for Data Annotation: A Survey
LoRA+: Efficient Low Rank Adaptation of Large Models
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Towards Optimal Learning of Language Models
Evaluating Very Long-Term Conversational Memory of LLM Agents
Training-Free Long-Context Scaling of Large Language Models
Disentangled 3D Scene Generation with Layout Learning
Do Large Language Models Latently Perform Multi-Hop Reasoning?
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Nemotron-4 15B Technical Report
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Towards Open-ended Visual Quality Comparison
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
MOSAIC: A Modular System for Assistive and Interactive Cooking
Priority Sampling of Large Language Models for Compilers
Simple linear attention language models balance the recall-throughput tradeoff
API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
StarCoder 2 and The Stack v2: The Next Generation
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Simulacra as Conscious Exotica
Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence
Enhancing Vision-Language Pre-training with Rich Supervisions
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap
PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval
Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
Emergent and Predictable Memorization in Large Language Models
Design2Code: How Far Are We From Automating Front-End Engineering?
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Empowering Large Language Model Agents through Action Learning
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
RT-H: Action Hierarchies Using Language
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Datasets for Large Language Models: A Comprehensive Survey
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models
Do Efficient Transformers Really Save Computation?
MathPrompter: Mathematical Reasoning using Large Language Models
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Can Large Language Models Reason and Plan?
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
Common 7B Language Models Already Possess Strong Math Capabilities
Yi: Open Foundation Models by 01.AI
Teaching Large Language Models to Reason with Reinforcement Learning
SaulLM-7B: A pioneering Large Language Model for Law
Online Adaptation of Language Models with a Memory of Amortized Contexts
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Learning to Decode Collaboratively with Multiple Language Models
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
The Unreasonable Effectiveness of Eccentric Automatic Prompts
A Survey on Evaluation of Large Language Models
The pitfalls of next-token prediction
Stealing Part of a Production Language Model
Algorithmic progress in language models
Thinking Tokens for Language Modeling
Is Cosine-Similarity of Embeddings Really About Similarity?
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Can't Remember Details in Long Documents? You Need Some R&R
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
Retrieval-Augmented Generation for AI-Generated Content: A Survey
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
3D-VLA: A 3D Vision-Language-Action Generative World Model
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
GPT on a Quantum Computer
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
GiT: Towards Generalist Vision Transformer through Universal Language Interface
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Social Skill Training with Large Language Models
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Veagle: Advancements in Multimodal Representation Learning
Simple and Scalable Strategies to Continually Pre-train Large Language Models
SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents
Language models scale reliably with over-training and on downstream tasks
Gemma: Open Models Based on Gemini Research and Technology
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
On the Societal Impact of Open Foundation Models
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Chronos: Learning the Language of Time Series
Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
ORPO: Monolithic Preference Optimization without Reference Model
MoAI: Mixture of All Intelligence for Large Language and Vision Models
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
DeepSeek-VL: Towards Real-World Vision-Language Understanding
How Far Are We from Intelligent Visual Deductive Reasoning?
Small Models are Valuable Plug-ins for Large Language Models
Backtracing: Retrieving the Cause of the Query
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Learning to Generate Better Than Your LLM
Meta-in-context learning in large language models
LERF: Language Embedded Radiance Fields
Eliciting Latent Predictions from Transformers with the Tuned Lens
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Resurrecting Recurrent Neural Networks for Long Sequences
An Overview on Language Models: Recent Developments and Outlook
A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library
A Survey of Evaluation Metrics Used for NLG Systems
SummEval: Re-evaluating Summarization Evaluation
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models
Logits of API-Protected LLMs Leak Proprietary Information
Knowledge Conflicts for LLMs: A Survey
Revolutionizing Mental Health Care through LangChain: A Journey with a Large Language Model
Will GPT-4 Run DOOM?
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization
Large language models surpass human experts in predicting neuroscience results
Reliable, Adaptable, and Attributable Language Models with Retrieval
You Need to Pay Better Attention
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
Stable LM 2 1.6B Technical Report
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
A Survey on Data Selection for Language Models
PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails
Repetition Improves Language Model Embeddings
How Transformers Learn Causal Structure with Gradient Descent
Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models
Analysing The Impact of Sequence Composition on Language Model Pre-Training
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Bayesian Reward Models for LLM Alignment
KMMLU: Measuring Massive Multitask Language Understanding in Korean
Dissecting Human and LLM Preferences
Exploring Value Biases: How LLMs Deviate Towards the Ideal
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
Why are Sensitive Functions Hard for Transformers?
Agents Need Not Know Their Purpose
Copyright Traps for Large Language Models
DoRA: Weight-Decomposed Low-Rank Adaptation
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Rethinking Machine Unlearning for Large Language Models
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Improving Black-box Robustness with In-Context Rewriting
Secret Collusion Among Generative AI Agents
Natural Language Reinforcement Learning
Universal Neural Functionals
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
LESS: Selecting Influential Data for Targeted Instruction Tuning
Building Your Own Product Copilot: Challenges, Opportunities, and Needs
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Continual Learning for Large Language Models: A Survey
Towards Efficient and Exact Optimization of Language Model Alignment
HyperZ$\cdot$Z$\cdot$W Operator Connects Slow-Fast Networks for Full Context Interaction
OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Spike No More: Stabilizing the Pre-training of Large Language Models
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention
Zoology: Measuring and Improving Recall in Efficient Language Models
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
LoBaSS: Gauging Learnability in Supervised Fine-tuning Data
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
Instruction Tuning with Human Curriculum
MatFormer: Nested Transformer for Elastic Inference
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
xVal: A Continuous Number Encoding for Large Language Models
Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models
Human Feedback is not Gold Standard
DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation
Headless Language Models: Learning without Predicting with Contrastive Weight Tying
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Do language models plan ahead for future tokens?
CAME: Confidence-guided Adaptive Memory Efficient Optimization
Improving Language Plasticity via Pretraining with Active Forgetting
AdANNS: A Framework for Adaptive Semantic Search
Strategic Reasoning with Language Models
MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies
Sparse is Enough in Scaling Transformers
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
A Theory on Adam Instability in Large-Scale Machine Learning
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Are Language Models Worse than Humans at Following Prompts? It's Complicated
PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition
Transformer Language Models without Positional Encodings Still Learn Positional Information
Sequence Parallelism: Long Sequence Training from System Perspective
Bio-inspired Structure Identification in Language Embeddings
Transformers without Tears: Improving the Normalization of Self-Attention
Neural Text Generation with Unlikelihood Training
MASS: Masked Sequence to Sequence Pre-training for Language Generation
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
TnT-LLM: Text Mining at Scale with Large Language Models
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Larimar: Large Language Models with Episodic Memory Control
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding
Uni-SMART: Universal Science Multimodal Analysis and Research Transformer
RAFT: Adapting Language Model to Domain Specific RAG
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
Language Agents as Optimizable Graphs
Comparative Study of Large Language Model Architectures on Frontier
Optimizing Distributed Training on Frontier for Large Language Models
Striped Attention: Faster Ring Attention for Causal Transformers
Block-Recurrent Transformers
Addressing Some Limitations of Transformers with Feedback Memory
Reverse Training to Nurse the Reversal Curse
Evaluating Frontier Models for Dangerous Capabilities
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
When Do We Not Need Larger Vision Models?
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Towards 3D Molecule-Text Interpretation in Language Models
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Mixture of Soft Prompts for Controllable Data Generation
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
Evolutionary Optimization of Model Merging Recipes
Semiparametric Token-Sequence Co-Supervision
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
On Learning to Summarize with Large Language Models as References
Scalable Prompt Generation for Semi-supervised Learning with Language Models
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
MyVLM: Personalizing VLMs for User-Specific Queries
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Recourse for reclamation: Chatting with generative language models
On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
The MiniPile Challenge for Data-Efficient Language Models
OmniNet: Omnidirectional Representations from Transformers
Arcee's MergeKit: A Toolkit for Merging Large Language Models
FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
The Case for Co-Designing Model Architectures with Hardware
The Unreasonable Ineffectiveness of the Deeper Layers
Improving Text-to-Image Consistency via Automatic Prompt Optimization
InternLM2 Technical Report
AIOS: LLM Agent Operating System
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Can large language models explore in-context?
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
VidLA: Video-Language Alignment at Scale
Compiler generated feedback for Large Language Models
sDPO: Don't Use Your Data All at Once
Polaris: A Safety-focused LLM Constellation Architecture for Healthcare
RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners
LLM4Decompile: Decompiling Binary Code with Large Language Models
Getting the most out of your tokenizer for pre-training and domain adaptation
How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Editing Large Language Models: Problems, Methods, and Opportunities
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Long-form factuality in large language models
Towards a World-English Language Model for On-Device Virtual Assistants
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling
STaR-GATE: Teaching Language Models to Ask Clarifying Questions
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
LITA: Language Instructed Temporal-Localization Assistant
TextCraftor: Your Text Encoder Can be Image Quality Controller
Mechanistic Design and Scaling of Hybrid Architectures
Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Blockwise Parallel Transformer for Large Context Models
Large Language Models Can Be Strong Differentially Private Learners
Head-wise Shareable Attention for Large Language Models
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models
ReALM: Reference Resolution As Language Modeling
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning
DiJiang: Efficient Large Language Models through Compact Kernelization
Jamba: A Hybrid Transformer-Mamba Language Model
Localizing Paragraph Memorization in Language Models
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Group Preference Optimization: Few-Shot Alignment of Large Language Models
Communicative Agents for Software Development
Preference Ranking Optimization for Human Alignment
The CRINGE Loss: Learning what language not to model
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Attribute First, then Generate: Locally-attributable Grounded Text Generation
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
FABLES: Evaluating faithfulness and content selection in book-length summarization
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
WavLLM: Towards Robust and Adaptive Speech Large Language Model
MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text
ST-LLM: Large Language Models Are Effective Temporal Learners
Advancing LLM Reasoning Generalists with Preference Trees
Best Practices and Lessons Learned on Synthetic Data for Language Models
Long-context LLMs Struggle with Long In-context Learning
HyperCLOVA X Technical Report
Poro 34B and the Blessing of Multilinguality
Octopus v2: On-device language model for super agent
Are large language models superhuman chemists?
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers
Auxiliary task demands mask the capabilities of smaller language models
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Data Interpreter: An LLM Agent For Data Science
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Training LLMs over Neurally Compressed Text
Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
ReFT: Representation Finetuning for Language Models
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
Noise-Aware Training of Layout-Aware Language Models
AI and the Problem of Knowledge Collapse
Learning to Plan and Generate Text with Citations
The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models
An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models
ALOHa: A New Measure for Hallucination in Captioning Models
Efficient Multi-Vector Dense Retrieval Using Bit Vectors
Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization
Iterative Forward Tuning Boosts In-context Learning in Language Models
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Stream of Search (SoS): Learning to Search in Language
Large Product Key Memory for Pretrained Language Models
Large Memory Layers with Product Keys
BRAVE: Broadening the visual encoding of vision-language models
Adapting LLaMA Decoder to Vision Transformer
RULER: What's the Real Context Size of Your Long-Context Language Models?
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Reconstructing Hand-Held Objects in 3D
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
MuPT: A Generative Symbolic Music Pretrained Transformer
OmniFusion Technical Report
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
CodecLM: Aligning Language Models with Tailored Synthetic Data
SambaLingo: Teaching Large Language Models New Languages
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Koala: Key frame-conditioned long video-LLM
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Understanding Emergent Abilities of Language Models from the Loss Perspective
Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code
Making Large Language Models Better Data Creators
On Surgical Fine-tuning for Language Encoders
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets
Embedding Democratic Values into Social Media AIs via Societal Objective Functions
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning
Less is More: Selective Layer Finetuning with SubTuning
Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning
AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling
Cut the CARP: Fishing for zero-shot story evaluation
LLoCO: Learning Long Contexts Offline
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Rho-1: Not All Tokens Are What You Need
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Audio Dialogues: Dialogues dataset for audio and music understanding
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
Entity-Level Sentiment Analysis (ELSA): An exploratory task survey
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Mechanics of Next Token Prediction with Self-Attention
Scaling Laws of RoPE-based Extrapolation
Pre-training Small Base LMs with Fewer Tokens
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
THOUGHTSCULPT: Reasoning with Intermediate Revision and Search
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought
Toward a Theory of Tokenization in LLMs
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca
Learn Your Reference Model for Real Good Alignment
Large Language Models are as persuasive as humans, but why? About the cognitive effort and moral-emotional language of LLM arguments
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
TransformerFAM: Feedback attention is working memory
On Speculative Decoding for Multimodal Large Language Models
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Generative Disco: Text-to-Video Generation for Music Visualization
Self-playing Adversarial Language Game Enhances LLM Reasoning
Compression Represents Intelligence Linearly
The Illusion of State in State-Space Models
ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past
A Thorough Examination of Decoding Methods in the Era of LLMs
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Should You Mask 15% in Masked Language Modeling?
Finetuning Pretrained Transformers into RNNs
BLINK: Multimodal Large Language Models Can See but Not Perceive
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes
Fewer Truncations Improve Language Modeling
Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Many-Shot In-Context Learning
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
Exploring the landscape of large language models: Foundations, techniques, and challenges
Automated Social Science: Language Models as Scientist and Subjects
Language Models Still Struggle to Zero-shot Reason about Time Series
Stepwise Alignment for Constrained Language Model Policy Optimization
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
Language Imbalance Can Boost Cross-lingual Generalisation
Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Large Language Models are Few-Shot Health Learners
How Far Can We Go with Practical Function-Level Program Repair?
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
A Survey on Retrieval-Augmented Text Generation for Large Language Models
A RAG Method for Source Code Inquiry Tailored to Long-Context LLMs
How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior
State Space Model for New-Generation Network Alternative to Transformers: A Survey
LLM In-Context Recall is Prompt Dependent
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Towards Large Language Models as Copilots for Theorem Proving in Lean
Characterizing LLM Abstention Behavior in Science QA with Context Perturbations
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Aligning language models with human preferences
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation
RAR-b: Reasoning as Retrieval Benchmark
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Deep Reinforcement Learning with a Natural Language Action Space
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
FlowMind: Automatic Workflow Generation with LLMs
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
DataComp: In search of the next generation of multimodal datasets
Stable and low-precision training for large-scale vision-language models
Multi-Head Mixture-of-Experts
Transformers Can Represent $n$-gram Language Models
Pegasus-v1 Technical Report
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
SnapKV: LLM Knows What You are Looking for Before Generation
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
A Survey on Self-Evolution of Large Language Models
Retrieval Head Mechanistically Explains Long-Context Factuality
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
SPLATE: Sparse Late Interaction Retrieval
VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
AgentKit: Flow Engineering with Graphs, not Coding
Rethinking LLM Memorization through the Lens of Adversarial Compression
What's the Magic Word? A Control Theory of LLM Prompting
Adapting Language Models to Compress Contexts
Investigating the Role of Feed-Forward Networks in Transformers Using Parallel Attention and Feed-Forward Net Design
LMentry: A Language Model Benchmark of Elementary Language Tasks
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners
Graph Machine Learning in the Era of Large Language Models (LLMs)
NExT: Teaching Large Language Models to Reason about Code Execution
"If the Machine Is As Good As Me, Then What Use Am I?" -- How the Use of ChatGPT Changes Young Professionals' Perception of Productivity and Accomplishment
Can Language Models Solve Olympiad Programming?
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages
Make Your LLM Fully Utilize the Context
Weak-to-Strong Extrapolation Expedites Alignment
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Continual Learning of Large Language Models: A Comprehensive Survey
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
Tele-FLM Technical Report
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
MoDE: CLIP Data Experts via Clustering
Universal Adversarial Triggers Are Not Universal
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Improving Dictionary Learning with Gated Sparse Autoencoders
BASS: Batched Attention-optimized Speculative Sampling
CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Image Segmentation Using Text and Image Prompts
Holistic Safety and Responsibility Evaluations of Advanced AI Models
WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction
NORMAD: A Benchmark for Measuring the Cultural Adaptability of Large Language Models
Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation
Efficient Continual Pre-training for Building Domain Specific Large Language Models
DeLighT: Deep and Light-weight Transformer
Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically
GeckOpt: LLM System Efficiency via Intent-Based Tool Selection
Better Synthetic Data by Retrieving and Transforming Existing Datasets
Relational Graph Convolutional Networks for Sentiment Analysis
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Nyonic Technical Report
LLM Evaluators Recognize and Favor Their Own Generations
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
A Survey of Generative Search and Recommendation in the Era of Large Language Models
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
A Primer on the Inner Workings of Transformer-based Language Models
U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF
zkLLM: Zero Knowledge Proofs for Large Language Models
A Survey on the Memory Mechanism of Large Language Model based Agents
Large Language Model Agent as a Mechanical Designer
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Near to Mid-term Risks and Opportunities of Open Source Generative AI
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Benchmarking Mobile Device Control Agents across Diverse Configurations
Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark
Assessing The Potential Of Mid-Sized Language Models For Clinical QA
Conformal Prediction for Natural Language Processing: A Survey
Dual Modalities of Text: Visual and Textual Generative Pre-training
AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback
Predicting Emergent Abilities with Infinite Resolution Evaluation
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
Hallucination of Multimodal Large Language Models: A Survey
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Benchmarking Benchmark Leakage in Large Language Models
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations
ChuXin: 1.6B Technical Report
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
LEGENT: Open Platform for Embodied Agents
From Persona to Personalization: A Survey on Role-Playing Language Agents
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Autonomous LLM-driven research from data to human-verifiable research papers
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo
Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare
Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Beyond Words: A Mathematical Framework for Interpreting Large Language Models
BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers
Ranked List Truncation for Large Language Model-based Re-Ranking
Building a Large Japanese Web Corpus for Large Language Models
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
DOCCI: Descriptions of Connected and Contrasting Images
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Better & Faster Large Language Models via Multi-token Prediction
When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
Extending Llama-3's Context Ten-Fold Overnight
Octopus v4: Graph of language models
Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics
ChatGPTest: opportunities and cautionary tales of utilizing AI for questionnaire pretesting
How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library
Faster Convergence for Transformer Fine-tuning with Line Search Methods
Linear Transformers Are Secretly Fast Weight Programmers
FLAME: Factuality-Aware Alignment for Large Language Models
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
In-Context Learning Creates Task Vectors
WildChat: 1M ChatGPT Interaction Logs in the Wild
"In-Context Learning" or: How I learned to stop worrying and love "Applied Information Retrieval"
LLM-AD: Large Language Model based Audio Description System
PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Self-Play Preference Optimization for Language Model Alignment
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models
Automatic Creative Selection with Cross-Modal Matching
Harmonic LLMs are Trustworthy
On Training a Neural Network to Explain Binaries
In-Context Learning with Long-Context Models: An In-Depth Exploration
Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning
Aligning LLM Agents by Learning Latent Preference from User Edits
How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
Neural Networks Learn Statistics of Increasing Complexity
Emerging Properties in Self-Supervised Vision Transformers
Advancing Multimodal Medical Capabilities of Gemini
"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust
D2PO: Discriminator-Guided DPO with Response Evaluation Models
Controllable Text Generation in the Instruction-Tuning Era
MANTIS: Interleaved Multi-Image Instruction Tuning
A Philosophical Introduction to Language Models - Part II: The Way Forward
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing
How do Large Language Models Handle Multilingualism?
FinBERT: Financial Sentiment Analysis with Pre-trained Language Models
Modeling Emotions and Ethics with Large Language Models
Structured Chemistry Reasoning with Large Language Models
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences
Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving
Characterising the Creative Process in Humans and Large Language Models
ECC Analyzer: Extract Trading Signal from Earnings Conference Calls using Large Language Model for Stock Performance Prediction
Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering
AlphaMath Almost Zero: process Supervision without process
MAmmoTH2: Scaling Instructions from the Web
Is Flash Attention Stable?
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
What matters when building vision-language models?
The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates
Understanding LLMs Requires More Than Statistical Generalization
Efficient and Economic Large Language Model Inference with Attention Offloading
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law
Large Language Models are Inconsistent and Biased Evaluators
101 Billion Arabic Words Dataset
What is Sentiment Meant to Mean to Language Models?
GPT-4 passes most of the 297 written Polish Board Certification Examinations
Text Quality-Based Pruning for Efficient Training of Language Models
Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
On the Evaluation of Machine-Generated Reports
Automatic Programming: Large Language Models and Beyond
Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs
Multi-hop Question Answering over Knowledge Graphs using Large Language Models
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Parallel Structures in Pre-training Data Yield In-Context Learning
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders
Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Beyond Helpfulness and Harmlessness: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
ReZero is All You Need: Fast Convergence at Large Depth
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts
A Transformer with Stack Attention
xLSTM: Extended Long Short-Term Memory
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
The Silicone Ceiling: Auditing GPT's Race and Gender Biases in Hiring
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference
Sketch Then Generate: Providing Incremental User Feedback and Guiding LLM Code Generation through Language-Oriented Code Sketches
Assemblage: Automatic Binary Dataset Construction for Machine Learning
Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
CLLMs: Consistency Large Language Models
You Only Cache Once: Decoder-Decoder Architectures for Language Models
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
Chain of Thoughtlessness: An Analysis of CoT in Planning
LLMs Can Patch Up Missing Relevance Judgments in Evaluation
Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
How Susceptible are Large Language Models to Ideological Manipulation?
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models
Can We Use Large Language Models to Fill Relevance Judgment Holes?
Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models
PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models
Automating the Enterprise with Foundation Models
Enhancing Q-Learning with Large Language Model Heuristics
Can Nuanced Language Lead to More Actionable Insights? Exploring the Role of Generative AI in Analytical Narrative Structure
Language Modeling Using Tensor Trains
PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation
Semantic Scaling: Bayesian Ideal Point Estimates with Large Language Models
HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis
One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations
Large Language Models (LLMs) as Agents for Augmented Democracy
Scaling Laws for Forgetting When Fine-Tuning Large Language Models
GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence
Natural Language Processing RELIES on Linguistics
Probing Multimodal LLMs as World Models for Driving
AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models
Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation
A Causal Explainable Guardrails for Large Language Models
In-Context Symbolic Regression: Leveraging Language Models for Function Discovery
Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models
Value Augmented Sampling for Language Model Alignment and Personalization
Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology
A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models
Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory
Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers
Which Nigerian-Pidgin does Generative AI speak?: Issues about Representativeness and Bias for Multilingual and Low Resource Languages
Sub-goal Distillation: A Method to Improve Small Language Agents
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Linearizing Large Language Models
Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval
LMD3: Language Model Data Density Dependence
State-Free Inference of State-Space Models: The Transfer Function Approach
Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance
Masked Structural Growth for 2x Faster Language Model Pre-training
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
A Generalist Learner for Multifaceted Medical Image Interpretation
The Platonic Representation Hypothesis
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking
Zero-Shot Tokenizer Transfer
RLHF Workflow: From Reward Modeling to Online RLHF
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation
SUTRA: Scalable Multilingual Language Model Architecture
ERAGent: Enhancing Retrieval-Augmented Language Models with Improved Accuracy, Efficiency, and Personalization
Large Language Models as Planning Domain Generators
Explaining Text Similarity in Transformer Models
Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning
Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent
Exposing Attention Glitches with Flip-Flop Language Modeling
CodeT5+: Open Code Large Language Models for Code Understanding and Generation
CinePile: A Long Video Question Answering Dataset and Benchmark
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models
Understanding the performance gap between online and offline alignment algorithms
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
SpeechVerse: A Large-scale Generalizable Audio Language Model
Compositional Text-to-Image Generation with Dense Blob Representations
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness
People cannot distinguish GPT-4 from a human in a Turing test
LLM-Augmented Agent-Based Modelling for Social Simulations: Challenges and Opportunities
What Can Natural Language Processing Do for Peer Review?
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
Improving Transformers with Dynamically Composable Multi-Head Attention
Word2World: Generating Stories and Worlds through Large Language Models
Ask Again, Then Fail: Large Language Models' Vacillations in Judgement
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis
Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs
Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models
Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models
Measuring Implicit Bias in Explicitly Unbiased Large Language Models
UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Many-Shot In-Context Learning in Multimodal Foundation Models
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
LoRA Learns Less and Forgets Less
Using ChatGPT for Thematic Analysis
Are Large Pre-Trained Language Models Leaking Your Personal Information?
Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre
HMT: Hierarchical Memory Transformer for Long Context Language Processing
Air Gap: Protecting Privacy-Conscious Conversational Agents
Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models
LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages
MarkLLM: An Open-Source Toolkit for LLM Watermarking
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Towards Uncertainty-Aware Language Agent
Observational Scaling Laws and the Predictability of Language Model Performance
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
Inducing Group Fairness in LLM-Based Decisions
CELA: Cost-Efficient Language Model Alignment for CTR Prediction
RDRec: Rationale Distillation for LLM-based Recommendation
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers
INDUS: Effective and Efficient Language Models for Scientific Applications
Dynamic data sampler for cross-language transfer learning in large language models
Grounded 3D-LLM with Referent Tokens
PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining
MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization
WavCraft: Audio Editing and Generation with Large Language Models
Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives
Transformers learn to implement preconditioned gradient descent for in-context learning
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Towards Modular LLMs by Building and Reusing a Library of LoRAs
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Latent State Estimation Helps UI Agents to Reason
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Large Language Models Meet NLP: A Survey
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
Blind Baselines Beat Membership Inference Attacks for Foundation Models
Your Transformer is Secretly Linear
Can AI Relate: Testing Large Language Model Response for Mental Health Support
Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue!
Large Language Models are Biased Reinforcement Learners
ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios
SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks
Keep It Private: Unsupervised Privatization of Online Text
Generative AI and Large Language Models for Cyber Security: All Insights You Need
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents
Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations
Leveraging Reinforcement Learning and Large Language Models for Code Optimization
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
Large Language Models Are Not Robust Multiple Choice Selectors
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Not All Language Model Features Are Linear
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Dense Connector for MLLMs
A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns
Bitune: Bidirectional Instruction-Tuning
Lessons from the Trenches on Reproducible Evaluation of Language Models
Multi-turn Reinforcement Learning from Preference Human Feedback
Base of RoPE Bounds Context Length
Top-Down Partitioning for Efficient List-Wise Ranking
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
Agent Planning with World Knowledge Model
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Distributed Speculative Inference of Large Language Models
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
RAGE Against the Machine: Retrieval-Augmented LLM Explanations
Efficient Multimodal Large Language Models: A Survey
Natural Language Can Help Bridge the Sim2Real Gap
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models
Infinite Limits of Multi-head Transformer Dynamics
News Recommendation with Category Description by a Large Language Model
Evaluation of the Programming Skills of Large Language Models
AI-Assisted Assessment of Coding Practices in Modern Code Review
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
Super Tiny Language Models
RE-Adapt: Reverse Engineered Adaptation of Large Language Models
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning
"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data
Instruction Tuning With Loss Over Instructions
GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM
SignLLM: Sign Languages Production Large Language Models
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Are Long-LLMs A Necessity For Long-Context Tasks?
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition
Extracting Prompts by Inverting LLM Outputs
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Aya 23: Open Weight Releases to Further Multilingual Progress
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct
OLAPH: Improving Factuality in Biomedical Long-form Question Answering
Tailoring Vaccine Messaging with Common-Ground Opinions
Efficient Adversarial Training in LLMs with Continuous Attacks
AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings
Neural Scaling Laws for Embodied AI
Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust
The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation
"The Death of Wikipedia?" -- Exploring the Impact of ChatGPT on Wikipedia Engagement
Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning
Eliciting Latent Knowledge from Quirky Language Models
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Matryoshka Multimodal Models
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Transformers Can Do Arithmetic with the Right Embeddings
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning
An Introduction to Vision-Language Modeling
Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models
Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?
Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
Zamba: A Compact 7B SSM Hybrid Model
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
MoEUT: Mixture-of-Experts Universal Transformers
DAGER: Exact Gradient Inversion for Large Language Models
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
The Impact of Positional Encoding on Length Generalization in Transformers
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks
Phase Transitions in the Output Distribution of Large Language Models
Crafting Interpretable Embeddings by Asking LLMs Questions
gzip Predicts Data-dependent Scaling Laws
Spectral Editing of Activations for Large Language Model Alignment
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment
Learning to Reason via Program Generation, Emulation, and Search
Hacc-Man: An Arcade Game for Jailbreaking LLMs
CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval
FinTextQA: A Dataset for Long-form Financial Question Answering
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
Don't Forget to Connect! Improving RAG with Graph-based Reranking
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models
Faithful Logical Reasoning via Symbolic Chain-of-Thought
2BP: 2-Stage Backpropagation
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
Fine-tuning Large Language Models with Sequential Instructions
Evaluating the Factual Consistency of Large Language Models Through News Summarization
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Robust Preference Optimization through Reward Model Distillation
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Matryoshka Query Transformer for Large Vision-Language Models
Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets
Offline Regularised Reinforcement Learning for Large Language Models Alignment
LLMs achieve adult human performance on higher-order theory of mind tasks
On the Role of Attention Masks and LayerNorm in Transformers
OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
Preference Learning Algorithms Do Not Learn Preference Rankings
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions
Contextual Position Encoding: Learning to Count What's Important
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Linking In-context Learning in Transformers to Human Episodic Memory
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
Bayesian Online Natural Gradient (BONG)
Data Augmentation Vision Transformer for Fine-grained Image Classification
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Don't drop your samples! Coherence-aware training benefits Conditional diffusion
Large Language Models Can Self-Improve At Web Agent Tasks
Group Robust Preference Optimization in Reward-free RLHF
Evaluating Large Language Model Biases in Persona-Steered Generation
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Aligning to Thousands of Preferences via System Message Generalization
DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
Generating Query Recommendations via LLMs
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
Position: Foundation Agents as the Paradigm Shift for Decision Making
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
A Survey on Vision-Language-Action Models for Embodied AI
Large Language Models Can Self-Correct with Minimal Effort
Language Models with Conformal Factuality Guarantees
Prompt Optimization with Human Feedback
GPT is Not an Annotator: The Necessity of Human Annotation in Fairness Benchmark Construction
RealitySummary: On-Demand Mixed Reality Document Enhancement using Large Language Models
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars
Certifiably Robust RAG against Retrieval Corruption
Want To Reduce Labeling Cost? GPT-3 Can Help
Embedding-Aligned Language Models
Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models
CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
Large Language Models are Zero-Shot Next Location Predictors
There and Back Again: The AI Alignment Paradox
Expanded Gating Ranges Improve Activation Functions
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA
SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Stress-Testing Capability Elicitation With Password-Locked Models
Knowledge Circuits in Pretrained Transformers
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models
Learning the Language of Protein Structure
Zyda: A 1.3T Dataset for Open Language Modeling
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Towards Scalable Automated Alignment of LLMs: A Survey
Pretrained Hybrids with MAD Skills
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
Controlling Large Language Model Agents with Entropic Activation Steering
A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
Transfer Q Star: Principled Decoding for LLM Alignment
Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory
To Believe or Not to Believe Your LLM
Scalable MatMul-free Language Modeling
Meta-Designing Quantum Experiments with Language Models
Extended Mind Transformers
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback
How to Understand Whole Software Repository?
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Automated Focused Feedback Generation for Scientific Writing Assistance
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
Item-Language Model for Conversational Recommendation
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Parrot: Multilingual Visual Instruction Tuning
Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
A Study of Optimizations for Fine-tuning Large Language Models
Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses
The Impossibility of Fair LLMs
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
Are We Done with MMLU?
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead
Pre-trained Large Language Models Use Fourier Features to Compute Addition
CLMASP: Coupling Large Language Models with Answer Set Programming for Robotic Task Planning
PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs
Chain of Agents: Large Language Models Collaborating on Long-Context Tasks
DiffUHaul: A Training-Free Method for Object Dragging in Images
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model
ABodyBuilder3: Improved and scalable antibody structure predictions
A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models
DsDm: Model-Aware Dataset Selection with Datamodels
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Improving Alignment and Robustness with Short Circuiting
Semantically Diverse Language Generation for Uncertainty Estimation in Language Models
Matching Anything by Segmenting Anything
What Do Language Models Learn in Context? The Structured Task Hypothesis
Scaling and evaluating sparse autoencoders
Verbalized Machine Learning: Revisiting Machine Learning with Language Models
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
Iteration Head: A Mechanistic Study of Chain-of-Thought
Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention
Does your data spark joy? Performance gains from domain upsampling at the end of training
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
CRAG -- Comprehensive RAG Benchmark
Mixture-of-Agents Enhances Large Language Model Capabilities
Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach
MAIRA-2: Grounded Radiology Report Generation
Proofread: Fixes All Errors with One Tap
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning
Large Language Model Confidence Estimation via Black-Box Access
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Towards a Personal Health Large Language Model
Tx-LLM: A Large Language Model for Therapeutics
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Unified Text-to-Image Generation and Retrieval
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
BERTs are Generative In-Context Learners
Is Free Self-Alignment Possible?
TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Creativity Has Left the Chat: The Price of Debiasing Language Models
UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor
Can Language Models Serve as Text-Based World Simulators?
How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad
Contrastive learning of T cell receptor representations
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
On the Reliability of Watermarks for Large Language Models
A Survey of Diffusion Models in Natural Language Processing
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
Learning to Grow Pretrained Models for Efficient Transformer Training
An Image is Worth 32 Tokens for Reconstruction and Generation
Simple and Effective Masked Diffusion Language Models
Instant 3D Human Avatar Generation using Image Diffusion Models
TextGrad: Automatic "Differentiation" via Text
Spectrum: Targeted Training on Signal to Noise Ratio
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Multimodal Belief Prediction
McEval: Massively Multilingual Code Evaluation
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Merging Improves Self-Critique Against Jailbreak Attacks
Confabulation: The Surprising Value of Large Language Model Hallucinations
The Prompt Report: A Systematic Survey of Prompting Techniques
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
Parallelizing Linear Transformers with the Delta Rule over Sequence Length
LLM Dataset Inference: Did you train on my dataset?
Towards Lifelong Learning of Large Language Models: A Survey
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages
Attention as a Hypernetwork
ConStat: Performance-Based Contamination Detection in Large Language Models
What If We Recaption Billions of Web Images with LLaMA-3?
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Discovering Preference Optimization Algorithms with and for Large Language Models
Large Language Models Must Be Taught to Know What They Don't Know
An Empirical Study of Mamba-based Language Models
Collective Constitutional AI: Aligning a Language Model with Public Input
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Explore the Limits of Omni-modal Pretraining at Scale
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Large Language Model Unlearning via Embedding-Corrupted Prompts
Grounding Multimodal Large Language Models in Actions
BertaQA: How Much Do Language Models Know About Local Culture?
VCR: Visual Caption Restoration
Hibou: A Family of Foundational Vision Transformers for Pathology
Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe
Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost
Improving Retrieval for RAG based Question Answering Models on Financial Documents
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
Transformers meet Neural Algorithmic Reasoners
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
OpenVLA: An Open-Source Vision-Language-Action Model
ReMI: A Dataset for Reasoning with Multiple Images
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
Investigating the translation capabilities of Large Language Models trained on parallel data only
Multi-Agent Software Development through Cross-Team Collaboration
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
UnO: Unsupervised Occupancy Fields for Perception and Forecasting
HelpSteer2: Open-source dataset for training top-performing reward models
Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Real2Code: Reconstruct Articulated Objects via Code Generation
DafnyBench: A Benchmark for Formal Software Verification
Estimating the Hallucination Rate of Generative AI
RWKV-CLIP: A Robust Vision-Language Representation Learner
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
Early Weight Averaging meets High Learning Rates for LLM Pre-training
Text Embeddings by Weakly-Supervised Contrastive Pre-training
Promptagator: Few-shot Dense Retrieval From 8 Examples
RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder
InPars: Data Augmentation for Information Retrieval using Large Language Models
Reconciling Kaplan and Chinchilla Scaling Laws
Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals
Cycles of Thought: Measuring LLM Confidence through Stable Explanations
From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation
Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models
Are you still on track!? Catching LLM Task Drift with Activations
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback
Quantifying Variance in Evaluation Benchmarks
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Evaluation of Large Language Models: STEM education and Gender Stereotypes
Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation
Mixture-of-Subspaces in Low-Rank Adaptation
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions
GEB-1.3B: Open Lightweight Large Language Model
Rapport-Driven Virtual Agent: Rapport Building Dialogue Strategy for Improving User Experience at First Meeting
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
Large language model validity via enhanced conformal prediction methods
Decoding the Diversity: A Review of the Indic AI Research Landscape
Advancing High Resolution Vision-Language Models in Biomedicine
Bayesian Statistical Modeling with Predictors from LLMs
Self-Supervised Speech Representations are More Phonetic than Semantic
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Needle In A Multimodal Haystack
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
DataComp-LM: In search of the next generation of training sets for language models
Set-Based Prompting: Provably Solving the Language Model Order Dependency Problem
The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models
Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
Language Modeling with Editable External Knowledge
WPO: Enhancing RLHF with Weighted Preference Optimization
VideoLLM-online: Online Video Large Language Model for Streaming Video
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Task Me Anything
Refusal in Language Models Is Mediated by a Single Direction
DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models
Evaluating Open Language Models Across Task Types, Application Domains, and Reasoning Types: An In-Depth Experimental Analysis
GUICourse: From General Vision Language Models to Versatile GUI Agents
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
In-Context Editing: Learning Knowledge from Self-Induced Distributions
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
Breaking the Attention Bottleneck
STAR: SocioTechnical Approach to Red Teaming Language Models
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training
AudioPaLM: A Large Language Model That Can Speak and Listen
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators
ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Improving Multi-Agent Debate with Sparse Communication Topology
Meta Reasoning for Large Language Models
A Simple and Effective $L_2$ Norm-Based Strategy for KV Cache Compression
Unifying Multimodal Retrieval via Document Screenshot Embedding
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Deep Bayesian Active Learning for Preference Modeling in Large Language Models
OLMES: A Standard for Language Model Evaluations
Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement
What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models
News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation
Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning
VoCo-LLaMA: Towards Vision Compression with Large Language Models
TroL: Traversal of Layers for Large Language and Vision Models
BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM
Statistical Uncertainty in Word Embeddings: GloVe-V
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks
Large Scale Transfer Learning for Tabular Data via Language Modeling
Transcoders Find Interpretable LLM Feature Circuits
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Tokenization Falling Short: The Curse of Tokenization
Can LLM be a Personalized Judge?
NAST: Noise Aware Speech Tokenization for Speech Language Models
Bootstrapping Language Models with DPO Implicit Rewards
The Impact of Initialization on LoRA Finetuning Dynamics
StatBot.Swiss: Bilingual Open Data Exploration in Natural Language
Adversarial Attacks on Multimodal Agents
Estimating Knowledge in Large Language Models Without Generating a Single Token
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations
Long Code Arena: a Set of Benchmarks for Long-Context Code Models
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
Instruction Pre-Training: Language Models are Supervised Multitask Learners
LLMatDesign: Autonomous Materials Discovery with Large Language Models
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
AgentReview: Exploring Peer Review Dynamics with LLM Agents
$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Are LLMs Naturally Good at Synthetic Tabular Data Generation?
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
Measuring memorization in RLHF for code completion
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance
garak: A Framework for Security Probing Large Language Models
Leading Whitespaces of Language Models' Subword Vocabulary Poses a Confound for Calculating Word Probabilities
GenQA: Generating Millions of Instructions from a Handful of Prompts
Transferring Knowledge from Large Foundation Models to Small Downstream Models
NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Evidence of a log scaling law for political persuasion with large language models
LiveMind: Low-latency Large Language Models with Simultaneous Inference
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation
Improving Visual Commonsense in Language Models via Multiple Image Generation
Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma?
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
HARE: HumAn pRiors, a key to small language model Efficiency
Delving into ChatGPT usage in academic writing through excess vocabulary
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems
Interpretability of Language Models via Task Spaces
Surface Form Competition: Why the Highest Probability Answer Isn't Always Right
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
CodeRAG-Bench: Can Retrieval Augment Code Generation?
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models
Large Language Models are Null-Shot Learners
SGLang: Efficient Execution of Structured Language Model Programs
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment
Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions
Learning to Retrieve Iteratively for In-Context Learning
Jailbreaking as a Reward Misspecification Problem
Information Guided Regularization for Fine-tuning Language Models
Unlocking the Global Synergies in Low-Rank Adapters
Towards Retrieval Augmented Generation over Large Video Libraries
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework
RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation
Exploring Design Choices for Building Language-Specific LLMs
ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification
Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task
Data Contamination Can Cross Language Barriers
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report
Probing the Decision Boundaries of In-context Learning in Large Language Models
CancerLLM: A Large Language Model in Cancer Domain
CarLLaVA: Vision language models for camera-only closed-loop driving
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
OATH-Frames: Characterizing Online Attitudes Towards Homelessness with LLM Assistants
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
Long Context Transfer from Language to Vision
Efficient Continual Pre-training by Mitigating the Stability Gap
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
Sparse High Rank Adapters
Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
WARP: On the Benefits of Weight Averaged Rewarded Policies
Scaling Laws for Linear Complexity Language Models
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
Preference Tuning For Toxicity Mitigation Generalizes Across Languages
FIRST: Faster Improved Listwise Reranking with Single Token Decoding
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models
Confidence Regulation Neurons in Language Models
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics
Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations
Hallucination is Inevitable: An Innate Limitation of Large Language Models
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Steering Without Side Effects: Improving Post-Deployment Control of Language Models
Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network
PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data
MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate
PostMark: A Robust Blackbox Watermark for Large Language Models
Can LLMs Learn Macroeconomic Narratives from Social Media?
Embodied Instruction Following in Unknown Environments
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
Data curation via joint example selection further accelerates multimodal learning
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
LongIns: A Challenging Long-context Instruction-based Exam for LLMs
Multi-property Steering of Large Language Models with Dynamic Activation Composition
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Benchmarking Mental State Representations in Language Models
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
Delving into the Utilisation of ChatGPT in Scientific Publications in Astronomy
How to Compute the Probability of a Word
Unlocking Continual Learning Abilities in Language Models
Large Language Models Assume People are More Rational than We Really are
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
Finding Transformer Circuits with Edge Pruning
A mathematical perspective on Transformers
Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
LLMs' Classification Performance is Overclaimed
Cross-Modality Safety Alignment
Bridging Law and Data: Augmenting Reasoning via a Semi-Structured Dataset with IRAC methodology
Preference Distillation for Personalized Generative Recommendation
DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents
Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG
Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
Associative Recurrent Memory Transformer
Symbolic Learning Enables Self-Evolving Agents
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
From Rewriting to Remembering: Common Ground for Conversational QA Models
Adversarial Search Engine Optimization for Large Language Models
A Closer Look into Mixture-of-Experts in Large Language Models
Multimodal foundation world models for generalist embodied agents
Do they mean 'us'? Interpreting Referring Expressions in Intergroup Bias
MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
Efficacy of Language Model Self-Play in Non-Zero-Sum Games
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
Large Language Models are Interpretable Learners
Are Language Models Actually Useful for Time Series Forecasting?
CAVE: Controllable Authorship Verification Explanations
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records
One Thousand and One Pairs: A "novel" challenge for long-context language models
Breaking the Frame: Image Retrieval by Visual Overlap Prediction
CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms
Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation
A Benchmark for Learning to Translate a New Language from One Grammar Book
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
Aligning Teacher with Student Preferences for Tailored Training Data Generation
Simulating Classroom Education with LLM-Empowered Agents
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
Re-Ranking Step by Step: Investigating Pre-Filtering for Re-Ranking with Large Language Models
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
Can LLMs Learn by Teaching? A Preliminary Study
The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models
Is Programming by Example solved by LLMs?
Suri: Multi-constraint Instruction Following for Long-form Text Generation
Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?
LiveBench: A Challenging, Contamination-Free LLM Benchmark
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation
Revealing Fine-Grained Values and Opinions in Large Language Models
T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation
ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs
News Deja Vu: Connecting Past and Present with Semantic Search
Contrastive Entity Coreference and Disambiguation for Historical Texts
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models
Self-Retrieval: Building an Information Retrieval System with One Large Language Model
Cognitive Architectures for Language Agents
Adaptable Logical Control for Large Language Models
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
DistiLRR: Transferring Code Repair for Low-Resource Programming Languages
A Critical Study of What Code-LLMs (Do Not) Learn
"Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline
Efficient Evolutionary Search Over Chemical Space with Large Language Models
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Understanding and Mitigating Language Confusion in LLMs
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Scaling Synthetic Data Creation with 1,000,000,000 Personas
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning
The Remarkable Robustness of LLMs: Stages of Inference?
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Following Length Constraints in Instructions
AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Direct Preference Knowledge Distillation for Large Language Models
Investigating How Large Language Models Leverage Internal Knowledge to Perform Complex Reasoning
Monitoring Latent World States in Language Models with Propositional Probes
RouteLLM: Learning to Route LLMs with Preference Data
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
RaTEScore: A Metric for Radiology Report Generation
PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks
Flora: Low-Rank Adapters Are Secretly Gradient Compressors
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Scaling Laws for Fact Memorization of Large Language Models
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
RegMix: Data Mixture as Regression for Language Model Pre-training
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
ColPali: Efficient Document Retrieval with Vision Language Models
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER
MIRAI: Evaluating LLM Agents for Event Forecasting
Searching for Best Practices in Retrieval-Augmented Generation
$\text{Memory}^3$: Language Modeling with Explicit Memory
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
BERGEN: A Benchmarking Library for Retrieval-Augmented Generation
M2QA: Multi-domain Multilingual Question Answering
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Brevity is the soul of wit: Pruning long files for code generation
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention
From RAG to RICHES: Retrieval Interlaced with Sequence Generation
LiteSearch: Efficacious Tree Search for LLM
Detection and Measurement of Syntactic Templates in Generated Text
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models
UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
Compressing Search with Language Models
Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization
ProgressGym: Alignment with a Millennium of Moral Progress
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models
Changing Answer Order Can Decrease MMLU Accuracy
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Understanding Alignment in Multimodal LLMs: A Comprehensive Study
ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions
Why does in-context learning fail sometimes? Evaluating in-context learning on open and closed questions
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
A Review of Large Language Models and Autonomous Agents in Chemistry
Agentless: Demystifying LLM-based Software Engineering Agents
Eliminating Position Bias of Language Models: A Mechanistic Approach
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset
FLoRA: Low-Rank Core Space for N-dimension
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
TokenPacker: Efficient Visual Projector for Multimodal LLM
Investigating Decoder-only Large Language Models for Speech-to-text Translation
Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Evaluating Human Alignment and Model Faithfulness of LLM Rationale
Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
On the Limitations of Fine-tuned Judge Models for LLM Evaluation
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
Tweetorial Hooks: Generative AI Tools to Motivate Science on Social Media
A Solvable Model of Neural Scaling Laws
Hopfield Networks is All You Need
Improving Transformer Models by Reordering their Sublayers
A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses
Prompt Stability Scoring for Text Annotation with Large Language Models
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application
AI-native Memory: A Pathway from LLMs Towards AGI
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
From Efficient Multimodal Models to World Models: A Survey
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments
LLMs can learn self-restraint through iterative self-reflection
ReGround: Improving Textual and Spatial Grounding at No Cost
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
Large language models can accurately predict searcher preferences
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
Large Language Models Enable Few-Shot Clustering
LM vs LM: Detecting Factual Errors via Cross Examination
Perspectives on Large Language Models for Relevance Judgment
Human-like Summarization Evaluation with ChatGPT
ChatGPT as a Factual Inconsistency Evaluator for Text Summarization
Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
How Does Quantization Affect Multilingual LLMs?
Are Large Language Models Consistent over Value-laden Questions?
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
Tree Search for Language Model Agents
Towards Compositionality in Concept Learning
Unified Auto-Encoding with Masked Diffusion
GraphEdit: Large Language Models for Graph Structure Learning
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
LLM-Select: Feature Selection with Large Language Models
Improving Reward Models with Synthetic Critiques
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
On scalable oversight with weak LLMs judging strong LLMs
Fast Forwarding Low-Rank Training
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
Mixture of A Million Experts
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
Anthropocentric bias and the possibility of artificial cognition
AgentInstruct: Toward Generative Teaching with Agentic Flows
HEMM: Holistic Evaluation of Multimodal Foundation Models
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
52B to 1T: Lessons Learned via Tele-FLM Series
Reasoning in Large Language Models: A Geometric Perspective
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling
Synthetic Multimodal Question Generation
Unveiling Encoder-Free Vision-Language Models
$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens
Distilling System 2 into System 1
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
Granular Privacy Control for Geolocation with Vision Language Models
VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models
Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval
Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
Multi-Object Hallucination in Vision-Language Models
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
LLMBox: A Comprehensive Library for Large Language Models
Training Task Experts through Retrieval Based Distillation
Language Models Encode Collaborative Signals in Recommendation
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
When LLMs Play the Telephone Game: Cumulative Changes and Attractors in Iterated Cultural Transmissions
LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction
MeMemo: On-device Retrieval Augmentation for Private and Personalized Text Generation
Machine Unlearning Fails to Remove Data Poisoning Attacks
BeHonest: Benchmarking Honesty in Large Language Models
Emu: Generative Pretraining in Multimodality
Enabling Large Language Models to Generate Text with Citations
Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
Vision language models are blind
Composable Interventions for Language Models
A Single Transformer for Scalable Vision-Language Modeling
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
Decoding-Time Language Model Alignment with Multiple Objectives
WebCanvas: Benchmarking Web Agents in Online Environments
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Visual representations in the human brain are aligned with large language models
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
RAG vs. Long Context: Examining Frontier Large Language Models for Environmental Review Document Comprehension
Inference Performance Optimization for Large Language Models on CPUs
LETS-C: Leveraging Language Embedding for Time Series Classification
Just read twice: closing the recall gap for recurrent language models
How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging
Knowledge Composition using Task Vectors with Learned Anisotropic Scaling
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Forcing Diffuse Distributions out of Language Models
Evaluating LLMs at Detecting Errors in LLM Responses
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
R-Tuning: Instructing Large Language Models to Say 'I Don't Know'
Label Supervised LLaMA Finetuning
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool
Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
Review-LLM: Harnessing Large Language Models for Personalized Review Generation
Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
MAVIS: Mathematical Visual Instruction Tuning
Automata-based constraints for language model decoding
GTA: A Benchmark for General Tool Agents
SEED-Story: Multimodal Long Story Generation with Large Language Model
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On
PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Genomic Language Models: Opportunities and Challenges
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Self-Recognition in Language Models
Deconstructing What Makes a Good Optimizer for Language Models
Teaching Transformers Causal Reasoning through Axiomatic Training
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
Why are Visually-Grounded Language Models Bad at Image Classification?
LoQT: Low Rank Adapters for Quantized Training
Metron: Holistic Performance Evaluation Framework for LLM Inference Systems
Lynx: An Open Source Hallucination Evaluation Model
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging
LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models
Human-like Episodic Memory for Infinite Context LLMs
MUSCLE: A Model Update Strategy for Compatible LLM Evolution
H2O-Danube3 Technical Report
Context Embeddings for Efficient Answer Generation in RAG
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
RoboMorph: Evolving Robot Morphology using Large Language Models
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
New Desiderata for Direct Preference Optimization
Characterizing Prompt Compression Methods for Long Context Inference
Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Accuracy is Not All You Need
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models
Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs
Universal Neurons in GPT2 Language Models
Agent Instructs Large Language Models to be General Zero-Shot Reasoners
Qwen2 Technical Report
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
LAB-Bench: Measuring Capabilities of Language Models for Biology Research
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models
Representing Rule-based Chatbots with Transformers
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
Benchmarking Language Model Creativity: A Case Study on Code Generation
Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules
Spontaneous Reward Hacking in Iterative Self-Refinement
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
LLM Circuit Analyses Are Consistent Across Training and Scale
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together
Bridging the Gap Between Information Seeking and Product Search Systems: Q&A Recommendation for E-commerce
When is the consistent prediction likely to be a correct prediction?
Transformer tricks: Removing weights for skipless transformers
Transformers represent belief state geometry in their residual stream
Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models
A Survey on LoRA of Large Language Models
No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
Patch-Level Training for Large Language Models
E5-V: Universal Embeddings with Multimodal Large Language Models
Case2Code: Learning Inductive Reasoning with Synthetic Data
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections
The Art of Saying No: Contextual Noncompliance in Language Models
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression
Practical Unlearning for Large Language Models
Does Refusal Training in LLMs Generalize to the Past Tense?
Automatic Prompt Optimization with "Gradient Descent" and Beam Search
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization
Understanding Reference Policies in Direct Preference Optimization
Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation
PM-LLM-Benchmark: Evaluating Large Language Models on Process Mining Tasks
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation
Weak-to-Strong Reasoning
Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation
Benchmarking Vision Language Models for Cultural Understanding
DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks
DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems
Scaling Granite Code Models to 128K Context
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Understanding Counting in Small Transformers: The Interplay between Attention and Feed-Forward Layers
Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
Lean-STaR: Learning to Interleave Thinking and Proving
GAVEL: Generating Games Via Evolution and Language Models
Transformer Layers as Painters
AUITestAgent: Automatic Requirements Oriented GUI Function Testing
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Training on the Test Task Confounds Evaluation and Emergence
The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing
PaliGemma: A versatile 3B VLM for transfer
A Survey on Mixture of Experts
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Consent in Crisis: The Rapid Decline of the AI Data Commons
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
The Vision of Autonomic Computing: Can LLMs Make It a Reality?
EVLM: An Efficient Vision-Language Model for Visual Understanding
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition
SciCode: A Research Coding Benchmark Curated by Scientists
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
VideoGameBunny: Towards vision assistants for video games
GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization
NNsight and NDIF: Democratizing Access to Foundation Model Internals
Fractal Patterns May Illuminate the Success of Next-Token Prediction
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
NV-Retriever: Improving text embedding models with effective hard-negative mining
Efficient Retrieval with Learned Similarities
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Gated Linear Attention Transformers with Hardware-Efficient Training
SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM
Discrete Flow Matching
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
BOND: Aligning LLMs with Best-of-N Distillation
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Shared Imagination: LLMs Hallucinate Alike
Aligning Large Language Models with Human: A Survey
Compact Language Models via Pruning and Knowledge Distillation
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability
Demystifying Chains, Trees, and Graphs of Thoughts
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing
Testing Occupational Gender Bias in Language Models: Towards Robust Measurement and Zero-Shot Debiasing
PERSONA: A Reproducible Testbed for Pluralistic Alignment
Scalify: scale propagation for efficient low-precision LLM training
Reinforced Prompt Personalization for Recommendation with Large Language Models
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
DDK: Distilling Domain Knowledge for Efficient Large Language Models
Course-Correction: Safety Alignment Using Synthetic Preferences
Longhorn: State Space Models are Amortized Online Learners
u-$μ$P: The Unit-Scaled Maximal Update Parametrization
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
Fluent Student-Teacher Redteaming
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Very Large-Scale Multi-Agent Simulation in AgentScope
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
$VILA^2$: VILA Augmented VILA
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
Do Generative AI Models Output Harm while Representing Non-Western Cultures: Evidence from A Community-Centered Approach
Visual Haystacks: Answering Harder Questions About Sets of Images
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Prover-Verifier Games improve legibility of LLM outputs
Exploring Advanced Large Language Models with LLMsuite
Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques
The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities
Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
RadioRAG: Factual Large Language Models for Enhanced Diagnostics in Radiology Using Dynamic Retrieval Augmented Generation
RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering
The Art of Refusal: A Survey of Abstention in Large Language Models
SALMON: Self-Alignment with Instructable Reward Models
Small Molecule Optimization with Large Language Models
Generation Constraint Scaling Can Mitigate Hallucination
A Survey on Employing Large Language Models for Text-to-SQL Tasks
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
Prompt Injection Attacks on Large Language Models in Oncology
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
Diffusion Feedback Helps CLIP See Better
Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains
PersonaGym: Evaluating Persona Agents and LLMs
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
Transformers need glasses! Information over-squashing in language tasks
ThinK: Thinner Key Cache by Query-Driven Pruning
Meltemi: The first open Large Language Model for Greek
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework
Machine Unlearning in Generative AI: A Survey
A Large Encoder-Decoder Family of Foundation Models For Chemical Language
AI-Assisted Generation of Difficult Math Questions
Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
Demystifying Verbatim Memorization in Large Language Models
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
The Llama 3 Herd of Models
ShieldGemma: Generative AI Content Moderation Based on Gemma
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Adaptive Retrieval-Augmented Generation for Conversational Systems
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems
Latxa: An Open Language Model and Evaluation Suite for Basque
Improving Retrieval Augmented Language Model with Self-Reasoning
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Data Contamination Report from the 2024 CONDA Shared Task
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack
Are LLMs classical or nonmonotonic reasoners? Lessons from generics
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Tamper-Resistant Safeguards for Open-Weight LLMs
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
OmniParser for Pure Vision Based GUI Agent
Finch: Prompt-guided Key-Value Cache Compression
Gemma 2: Improving Open Language Models at a Practical Size
Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning
Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Apple Intelligence Foundation Language Models
Multi-group Uncertainty Quantification for Long-form Text Generation
MaskInversion: Localized Embeddings via Optimization of Explainability Maps
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
Transformers are Universal In-context Learners
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation
Leveraging LLM Reasoning Enhances Personalized Recommender Systems
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
A Survey of Mamba
Jailbreaking Text-to-Image Models with LLM-Based Agents
Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins
MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training
Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks
Generative Retrieval with Preference Optimization for E-commerce Search
The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation
Improving Retrieval in Sponsored Search by Leveraging Query Context Signals
GRAD-SUM: Leveraging Gradient Summarization for Optimal Prompt Engineering
Crafting the Path: Robust Query Rewriting for Information Retrieval
Harnessing Large Language Models for Multimodal Product Bundling
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era
Beyond Benchmarks: Evaluating Embedding Model Similarity for Retrieval Augmented Generation Systems
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models
Vortex under Ripplet: An Empirical Study of RAG-enabled Applications
MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models
Neurocache: Efficient Vector Retrieval for Long-range Language Modeling
Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I.
AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment
Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation
Retrieval-augmented generation in multilingual settings
Optimization of Retrieval-Augmented Generation Context with Outlier Detection
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models
Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification
LumberChunker: Long-Form Narrative Document Segmentation
Entropy-Based Decoding for Retrieval-Augmented Large Language Models
Improving Zero-shot LLM Re-Ranker with Risk Minimization
A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens
D2LLM: Decomposed and Distilled Large Language Models for Semantic Search
Retrieval Augmented Zero-Shot Text Classification
APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking
StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation
PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation
Unified Active Retrieval for Retrieval Augmented Generation
LLM-enhanced Reranking in Recommender Systems
Intermediate Distillation: Data-Efficient Distillation from Black-Box LLMs for Information Retrieval
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs
Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens
A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks
Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling
Blowfish: Topological and statistical signatures for quantifying ambiguity in semantic search
Async Learned User Embeddings for Ads Delivery Optimization
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents
RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation
MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model
Evaluating the External and Parametric Knowledge Fusion of Large Language Models
DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation
Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers
RAG Does Not Work for Enterprises
One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models
Voice Jailbreak Attacks Against GPT-4o
CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control
DeeperImpact: Optimizing Sparse Learned Index Structures
Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration
Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation
RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference
RaFe: Ranking Feedback Improves Query Rewriting for RAG
Question-Based Retrieval using Atomic Units for Enterprise RAG
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation
Words Blending Boxes. Obfuscating Queries in Information Retrieval using Differential Privacy
Redefining Information Retrieval of Structured Database via Large Language Models
Contextualization with SPLADE for High Recall Retrieval
Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning
Comparative Analysis of Retrieval Systems in the Real World
Semi-Parametric Retrieval via Binary Token Index
Efficient and Responsible Adaptation of Large Language Models for Robust Top-k Recommendations
GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model
Retrieval-Oriented Knowledge for Click-Through Rate Prediction
Leveraging Large Language Models for Multimodal Search
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
From Matching to Generation: A Survey on Generative Information Retrieval
Retrieval Augmented Generation for Domain-specific Question Answering
Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding
Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models
Dubo-SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL
Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers
Consolidating Ranking and Relevance Predictions of Large Language Models through Post-Processing
Recall-Augmented Ranking: Enhancing Click-Through Rate Prediction Accuracy with Cross-Stage Data
The Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation
Efficient Prompting Methods for Large Language Models: A Survey
Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models
PMG: Personalized Multimodal Generation with Large Language Models
RecGPT: Generative Personalized Prompts for Sequential Recommendation via ChatGPT Training Paradigm
Taxonomy and Analysis of Sensitive User Queries in Generative AI Search
Generative Information Retrieval Evaluation
End-to-end training of Multimodal Model and ranking Model
Event-enhanced Retrieval in Real-time Search
Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation
Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models
CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts
Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models
Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
Shallow Cross-Encoders for Low-Latency Retrieval
Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models
Generate then Retrieve: Conversational Response Retrieval Using LLMs as Answer and Query Generators
Are Large Language Models Good at Utility Judgments?
SelfIE: Self-Interpretation of Large Language Model Embeddings
Make Large Language Model a Better Ranker
Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check
CoLLEGe: Concept Embedding Generation for Large Language Models
Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation
JORA: JAX Tensor-Parallel LoRA Library for Retrieval Augmented Fine-Tuning
Improving the Robustness of Dense Retrievers Against Typos via Multi-Positive Contrastive Learning
Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases
Investigating the performance of Retrieval-Augmented Generation and fine-tuning for the development of AI-driven knowledge-based systems
RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback
ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval
RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design
Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners
LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems
An Interpretable Ensemble of Graph and Language Models for Improving Search Relevance in E-Commerce
LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction
Embedding-based search in JetBrains IDEs
RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records
Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges
ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework
Meta-Task Prompting Elicits Embeddings from Large Language Models
The First Place Solution of WSDM Cup 2024: Leveraging Large Language Models for Conversational Multi-Doc QA
Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation
Corpus-Steered Query Expansion with Large Language Models
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering
The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)
Large Language Model Augmented Exercise Retrieval for Personalized Language Learning
ESE: Espresso Sentence Embeddings
ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling
Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions
Retrieval Helps or Hurts? A Deeper Dive into the Efficacy of Retrieval Augmentation to Language Models
Are ELECTRA's Sentence Embeddings Beyond Repair? The Case of Semantic Textual Similarity
Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge
ARKS: Active Retrieval in Knowledge Soup for Code Generation
Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from Large Language Models
BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs
TriSampler: A Better Negative Sampling Principle for Dense Retrieval
EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models
Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models
Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning
PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
T-RAG: Lessons from the LLM Trenches
Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models
Non-autoregressive Generative Models for Reranking Recommendation
History, Development, and Principles of Large Language Models-An Introductory Survey
Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback
Leveraging LLMs for Unsupervised Dense Retriever Ranking
RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation
Retrieve to Explain: Evidence-driven Predictions with Language Models
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
Locally-Adaptive Quantization for Streaming Vector Search
HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QA
When Large Language Models Meet Vector Databases: A Survey
Data-efficient Fine-tuning for LLM-based Recommendation
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
Re3val: Reinforced and Reranked Generative Retrieval
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models
Generative Dense Retrieval: Memory Can Be a Burden
The Chronicles of RAG: The Retriever, the Chunk and the Generator
Curator: Efficient Indexing for Multi-Tenant Vector Databases
Bridging the Preference Gap between Retrievers and LLMs
InRanker: Distilled Rankers for Zero-shot Information Retrieval
Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis
ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback
Unsupervised hard Negative Augmentation for contrastive learning
Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models
RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation
Large Language Models are Not Stable Recommender Systems
ESPN: Memory-Efficient Multi-Vector Information Retrieval
Unlocking the Potential of Large Language Models for Explainable Recommendations
Preliminary Study on Incremental Learning for Large Language Model-based Recommender Systems
Agent4Ranking: Semantic Robust Ranking via Personalized Query Rewriting Using Multi-agent LLM
Dense X Retrieval: What Retrieval Granularity Should We Use?
End-to-End Retrieval with Learned Dense and Sparse Representations Using Lucene
IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions
ControlRec: Bridging the Semantic Gap between Language Model and Personalized Recommendation
RecExplainer: Aligning Large Language Models for Explaining Recommendation Models
Golden-Retriever: High-Fidelity Agentic Retrieval Augmented Generation for Industrial Knowledge Base
Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders
On Retrieval Augmentation and the Limitations of Language Model Training
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
Text Retrieval with Multi-Stage Re-Ranking Models
LLatrieval: LLM-Verified Retrieval for Verifiable Generation
CoverBench: A Challenging Benchmark for Complex Claim Verification
Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion
Exploring Fine-tuning ChatGPT for News Recommendation
Self-Taught Evaluators
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
Mixture of Experts with Mixture of Precisions for Tuning Quality of Service
The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads
Language Model Can Listen While Speaking
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models
Can LLMs predict the convergence of Stochastic Gradient Descent?
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
LLaVA-OneVision: Easy Visual Task Transfer
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs
A Real-Time Adaptive Multi-Stream GPU System for Online Approximate Nearest Neighborhood Search
Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future
Generative Retrieval with Few-shot Indexing
Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer
StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation
Synthesizing Text-to-SQL Data from Weak and Strong LLMs
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data
EXAONE 3.0 7.8B Instruction Tuned Language Model
Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
Learning Task Decomposition to Assist Humans in Competitive Programming
Better Alignment with Instruction Back-and-Forth Translation
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection
Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation
ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
Diffusion Guided Language Modeling
Conversational Prompt Engineering
Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP
Pairing Clustered Inverted Indexes with kNN Graphs for Fast Approximate Retrieval over Learned Sparse Representations
Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning
EfficientRAG: Efficient Retriever for Multi-Hop Question Answering
Pairwise Judgment Formulation for Semantic Embedding Model in Web Search
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency
Interpreting Attention Layer Outputs with Sparse Autoencoders
Fine-tuning language models to find agreement among humans with diverse preferences
VITA: Towards Open-Source Interactive Omni Multimodal LLM
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks
Early Exit Strategies for Approximate k-NN Search in Dense Retrieval
HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction
Relevance Filtering for Embedding-based Retrieval
OpenResearcher: Unleashing AI for Accelerated Scientific Research
Enhancing Relevance of Embedding-based Retrieval at Walmart
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Natural Language Outlines for Code: Literate Programming in the LLM Era
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation
PhysBERT: A Text Embedding Model for Physics Scientific Literature
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Med42-v2: A Suite of Clinical LLMs
Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
Layerwise Recurrent Router for Mixture-of-Experts
Prompt Tuning as User Inherent Profile Inference Machine
Large Language Model Agent in Financial Trading: A Survey
Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models
Design Proteins Using Large Language Models: Enhancements and Comparative Analyses
Hermes 3 Technical Report
FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data
WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs
Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
Aquila2 Technical Report
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Hierarchical Structured Neural Network for Retrieval
BMX: Entropy-weighted Similarity and Semantic-enhanced Lexical Search
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance
Can Large Language Models Understand Symbolic Graphics Programs?
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System
Mamba Retriever: Utilizing Mamba for Effective and Efficient Dense Retrieval
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability
Post-Training Sparse Attention with Double Sparsity
Large language models can be zero-shot anomaly detectors for time series?
The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm
FuseChat: Knowledge Fusion of Chat Models
Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
NL2OR: Solve Complex Operations Research Problems Using Natural Language Inputs
Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models
Min P Sampling: Balancing Creativity and Coherence at High Temperature
LLM Stability: A detailed analysis with some surprises
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
A Survey on Benchmarks of Multimodal Large Language Models
Where is the signal in tokenization space?
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering
Can Large Language Models Reason? A Characterization via 3-SAT
Large language models can consistently generate high-quality content for election disinformation operations
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Meta Knowledge for Retrieval Augmented Large Language Models
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models
Graph Retrieval-Augmented Generation: A Survey
Patched MOA: optimizing inference for diverse software development tasks
Patched RTC: evaluating LLMs for diverse software development tasks
InstructCoder: Instruction Tuning Large Language Models for Code Editing
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model
HMoE: Heterogeneous Mixture of Experts for Language Modeling
Synergistic Approach for Simultaneous Optimization of Monolingual, Cross-lingual, and Multilingual Information Retrieval
Analysis of Plan-based Retrieval for Grounded Text Generation
NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
Goldfish: Monolingual Language Models for 350 Languages
BLADE: Benchmarking Language Model Agents for Data-Driven Science
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation
Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
LLM Pruning and Distillation in Practice: The Minitron Approach
Critique-out-Loud Reward Models
FocusLLM: Scaling LLM's Context by Parallel Decoding
First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
StructuredRAG: JSON Response Formatting with Large Language Models
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation
Mistral-SPLADE: LLMs for better Learned Sparse Retrieval
CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models
Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer
Great Memory, Shallow Reasoning: Limits of kNN-LMs
Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data
Flexora: Flexible Low Rank Adaptation for Large Language Models
Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information
Controllable Text Generation for Large Language Models: A Survey
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
Drama Engine: A Framework for Narrative Agents
Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search
Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese
SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
Automating Thought of Search: A Journey Towards Soundness and Completeness
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs
Matmul or No Matmul in the Era of 1-bit LLMs
Cross-Modal Safety Alignment: Is textual unlearning all you need?
Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study
Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution
QUB-Cirdan at "Discharge Me!": Zero shot discharge letter generation by open-source LLM
Exploring Backdoor Attacks against Large Language Model-based Decision Making
Phantom: General Trigger Attacks on Retrieval Augmented Language Generation
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
Visual Perception by Large Language Model's Weights
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization
PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization
Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs
InstructionCP: A fast approach to transfer Large Language Models into target language
KNOW: A Real-World Ontology for Knowledge Capture with Large Language Models
InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning
Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Are Large Language Models Chameleons?
Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding
Towards Next-Generation Urban Decision Support Systems through AI-Powered Construction of Scientific Ontology using Large Language Models -- A Case in Optimizing Intermodal Freight Transportation
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery
Can Graph Learning Improve Task Planning?
MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors
Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners
Language Generation with Strictly Proper Scoring Rules
Compressing Large Language Models using Low Rank and Low Precision Decomposition
Video Enriched Retrieval Augmented Generation Using Aligned Video Captions
Mechanistic Interpretability of Binary and Ternary Transformers
Enhanced Robot Arm at the Edge with NLP and Vision Systems
Generative Query Reformulation Using Ensemble Prompting, Document Fusion, and Relevance Feedback
HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
THREAD: Thinking Deeper with Recursive Spawning
Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
Autoformalizing Euclidean Geometry
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization
MotionLLM: Multimodal Motion-Language Learning with Large Language Models
Exploring the LLM Journey from Cognition to Expression with Linear Representations
A Large Language Model-based multi-agent manufacturing system for intelligent shopfloor
TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing
Laurel: Generating Dafny Assertions Using Large Language Models
LLMs for User Interest Exploration in Large-scale Recommendation Systems
Devil's Advocate: Anticipatory Reflection for LLM Agents
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models
Mechanism Design for LLM Fine-tuning with Multiple Reward Models
FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference
A statistical framework for weak-to-strong generalization
No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks
GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
C3LLM: Conditional Multimodal Content Generation Using Large Language Models
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting
Finetuning Large Language Model for Personalized Ranking
Towards Completeness-Oriented Tool Retrieval for Large Language Models
Keypoint-based Progressive Chain-of-Thought Distillation for LLMs
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models
Claim Verification in the Age of Large Language Models: A Survey
Streaming Long Video Understanding with Large Language Models
Your Large Language Models Are Leaving Fingerprints
WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
Why Not Transform Chat Large Language Models to Non-English?
TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework
Sunnie: An Anthropomorphic LLM-Based Conversational Agent for Mental Well-Being Activity Recommendation
CG-FedLLM: How to Compress Gradients in Federated Fine-tuning for Large Language Models
DSTI at LLMs4OL 2024 Task A: Intrinsic versus extrinsic knowledge for type classification
How to set AdamW's weight decay as you scale model and dataset size
Safety Alignment for Vision Language Models
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Large Language Models are Effective Priors for Causal Graph Discovery
HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model
WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness
LIRE: listwise reward enhancement for preference alignment
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction
TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems
AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs
Large Language Models (LLMs) Assisted Wireless Network Deployment in Urban Settings
Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance
Towards Evaluating and Building Versatile Large Language Models for Medicine
LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction
RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment
Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Domain-specific long text classification from sparse relevant information
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation
Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews
Insights from Benchmarking Frontier Language Models on Web App Code Generation
Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Semantic Alignment for Multimodal Large Language Models
Memory-Efficient LLM Training with Online Subspace Descent
A Survey of Hallucination in Large Foundation Models
MEDCO: Medical Education Copilots Based on A Multi-Agent Framework
Customizing Language Models with Instance-wise LoRA for Sequential Recommendation
Towards Realistic Synthetic User-Generated Content: A Scaffolding Approach to Generating Online Discussions
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
MobileQuant: Mobile-friendly Quantization for On-device Language Models
LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs
LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!
Efficient Detection of Toxic Prompts in Large Language Models
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time
A Web-Based Solution for Federated Learning with LLM-Based Automation
NanoFlow: Towards Optimal Large Language Model Serving Throughput
A Survey of Large Language Models for European Languages
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
Challenges and Responses in the Practice of Large Language Models
PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars
Inverse Scaling: When Bigger Isn't Better
Generative Verifiers: Reward Modeling as Next-Token Prediction
Project SHADOW: Symbolic Higher-order Associative Deductive reasoning On Wikidata using LM probing
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding
MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
PAT: Pruning-Aware Tuning for Large Language Models
Text2SQL is Not Enough: Unifying AI and Databases with TAG
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express
Agentic Retrieval-Augmented Generation for Time Series Analysis
LLM-3D Print: Large Language Models To Monitor and Control 3D Printing
A Law of Next-Token Prediction in Large Language Models
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
Efficient LLM Scheduling by Learning to Rank
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models
Decentralized LLM Inference over Edge Networks with Energy Harvesting
LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature
Geometry of Lightning Self-Attention: Identifiability and Dimension
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models
Conan-embedding: General Text Embedding with More and Better Negative Samples
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
ReMamba: Equip Mamba with Effective Long-Sequence Modeling
Awes, Laws, and Flaws From Today's LLM Research
Persuasion Games using Large Language Models
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
Transformers Meet ACT-R: Repeat-Aware and Sequential Listening Session Recommendation
A Survey on Evaluating Large Language Models in Code Generation Tasks
Law of Vision Representation in MLLMs
SynDL: A Large-Scale Synthetic Test Collection
Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models
StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements
Understanding the User: An Intent-Based Ranking Dataset
Iterative Graph Alignment
Icing on the Cake: Automatic Code Summarization at Ericsson
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
LLMs generate structurally realistic social networks but overestimate political homophily
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Rethinking Tokenization: Crafting Better Tokenizers for Large Language Models
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models
MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents
LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation
GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
InkubaLM: A small language model for low-resource African languages
SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization
Automatic Differential Diagnosis using Transformer-Based Multi-Label Sequence Classification
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems
CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation
MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
MemLong: Memory-Augmented Retrieval for Long Text Modeling
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists
CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models
Selective Preference Optimization via Token-Level Reward Function Estimation
Impact of ChatGPT on the writing style of condensed matter physicists
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback
Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models
ImageBind-LLM: Multi-modality Instruction Tuning
Transformers as Support Vector Machines
LLM-GAN: Construct Generative Adversarial Network Through Large Language Models For Explainable Fake News Detection
RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer
OLMoE: Open Mixture-of-Experts Language Models
BEAVER: An Enterprise Benchmark for Text-to-SQL
Foundations of Large Language Model Compression -- Part 1: Weight Quantization
Contemporary Model Compression on Large Language Models Inference
rerankers: A Lightweight Python Library to Unify Ranking Methods
FuzzCoder: Byte-level Fuzzing Test via Large Language Model
LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models
Focus Agent: LLM-Powered Virtual Focus Group
A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks
AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction
In Defense of RAG in the Era of Long-Context Language Models
Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
ProGRes: Prompted Generative Rescoring on ASR n-Best
Augmented Reality without Borders: Achieving Precise Localization Without Maps
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
CogVLM2: Visual Language Models for Image and Video Understanding
Mamba or Transformer for Time Series Forecasting? Mixture of Universals (MoU) Is All You Need
In-Context Imitation Learning via Next-Token Prediction
A Practitioner's Guide to Continual Multimodal Pretraining
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Configurable Foundation Models: Building LLMs from a Modular Perspective
Towards a Unified View of Preference Learning for Large Language Models: A Survey
A Comparative Study of Pre-training and Self-training
Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?
RouterRetriever: Exploring the Benefits of Routing over Multiple Expert Embedding Models
Diversify-verify-adapt: Efficient and Robust Retrieval-Augmented Ambiguous Question Answering
NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining
Unforgettable Generalization in Language Models
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation
GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI
Imitating Language via Scalable Inverse Reinforcement Learning
Statically Contextualizing Large Language Models with Typed Holes
ContextCite: Attributing Model Generation to Context
TinyAgent: Function Calling at the Edge
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
Ruri: Japanese General Text Embeddings
On-Device Language Models: A Comprehensive Review
Political DEBATE: Efficient Zero-shot and Few-shot Classifiers for Political Text
Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments
Building Math Agents with Multi-Turn Iterative Preference Learning
Large Language Models and Cognitive Science: A Comprehensive Review of Similarities, Differences, and Challenges
Attention Heads of Large Language Models: A Survey
Planning In Natural Language Improves LLM Search For Code Generation
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization
From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents
Extracting Paragraphs from LLM Token Activations
xLAM: A Family of Large Action Models to Empower AI Agent Systems
Large Language Model-Based Agents for Software Engineering: A Survey
SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration
Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities
Evolution of Social Norms in LLM Agents using Natural Language
A Static Evaluation of Code Completion by Large Language Models
Universal Transformers
Hardware Acceleration of LLMs: A comprehensive survey and comparison
Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation
The Compressor-Retriever Architecture for Language Model OS
A Learnable Agent Collaboration Network Framework for Personalized Multimodal AI Search Engine
A Survey for Large Language Models in Biomedicine
Watermarking Techniques for Large Language Models: A Survey
Genetic Approach to Mitigate Hallucination in Generative IR
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs
Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs
Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models
An overview of domain-specific foundation model: key technologies, applications and challenges
Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts
GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding
RETAIN: Interactive Tool for Regression Testing Guided LLM Migration
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data
MoRe Fine-Tuning with 10x Fewer Parameters
Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
AnyMatch -- Efficient Zero-Shot Entity Matching with a Small Language Model
Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish
Benchmarking Chinese Knowledge Rectification in Large Language Models
A System and Benchmark for LLM-based Q&A on Heterogeneous Data
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery
CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning
Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs
Achieving Peak Performance for Large Language Models: A Systematic Review
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models
Improving Pretraining Data Using Perplexity Correlations
LLMs Will Always Hallucinate, and We Need to Live With This
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
How Does Code Pretraining Affect Language Model Task Performance?
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Radiology-Llama2: Best-in-Class Large Language Model for Radiology
Synthetic continued pretraining
Agent Workflow Memory
Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering
STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM
What is the Role of Small Models in the LLM Era: A Survey
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering
Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes?
Length Desensitization in Directed Preference Optimization
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
Can Large Language Models Unlock Novel Scientific Research Ideas?
SongCreator: Lyrics-based Universal Song Generation
Self-Harmonized Chain of Thought
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Generative Hierarchical Materials Search
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
What Makes a Maze Look Like a Maze?
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
Retrieval Augmented Thought Process for Private Data Handling in Healthcare
Dense Reward for Free in Reinforcement Learning from Human Feedback
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG
Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection
Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT
Representation Tuning
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models
Alleviating Hallucinations in Large Language Models with Scepticism Modeling
SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning
Harmonic Reasoning in Large Language Models
STLM Engineering Report: Dropout
Towards Automated Machine Learning Research
Optimization Hyper-parameter Laws for Large Language Models
Residual Stream Analysis with Multi-Layer SAEs
LAST: Language Model Aware Speech Tokenization
A Fused Large Language Model for Predicting Startup Success
Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers
Accelerating Large Language Model Training with Hybrid GPU-based Compression
Training on the Benchmark Is Not All You Need
From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning
LanguaShrink: Reducing Token Overhead with Psycholinguistics
EPO: Hierarchical LLM Agents with Environment Preference Optimization
Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games
Harmonized Speculative Sampling
Why transformers are obviously good models of language
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
How transformers learn structured data: insights from hierarchical filtering
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection
Search-Based LLMs for Code Optimization
Memorization In In-Context Learning
Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in?
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
Demystifying the Communication Characteristics for Distributed Transformer Models
In-Context Learning with Representations: Contextual Generalization of Trained Transformers
Performance Law of Large Language Models
Importance Weighting Can Help Large Language Models Self-Improve
Acquiring Bidirectionality via Large and Small Language Models
Extracting Sentence Embeddings from Pretrained Transformer Models
Instruct Large Language Models to Generate Scientific Literature Survey Step by Step
LLMs can Schedule
A Unified Framework for Model Editing
AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies
Introducing the NewsPaLM MBR and QE Dataset: LLM-Generated High-Quality Parallel Data Outperforms Traditional Web-Crawled Data
Animate, or Inanimate, That is the Question for Large Language Models
Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training
From Words to Worth: Newborn Article Impact Prediction with LLM
Is Child-Directed Speech Effective Training Data for Language Models?
Automated Theorem Provers Help Improve Large Language Model Reasoning
SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models
Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages
Cross-layer Attention Sharing for Large Language Models
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer
Reconsidering Token Embeddings with the Definitions for Pre-trained Language Models
On the Resilience of Multi-Agent Systems with Malicious Agents
Disentangling Dense Embeddings with Sparse Autoencoders
SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning
Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens
Entropy, Thermodynamics and the Geometrization of the Language Model
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
CultureVo: The Serious Game of Utilizing Gen AI for Enhancing Cultural Intelligence
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2
LLMs' Understanding of Natural Language Revealed
Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models
Do Language Models Have a Critical Period for Language Acquisition?
Understanding Memorisation in LLMs: Dynamics, Influencing Factors, and Implications
Towards Effective and Efficient Continual Pre-training of Large Language Models
Climbing the Complexity Ladder with Expressive Attention
Towards More Accurate Prediction of Human Empathy and Emotion in Text and Multi-turn Conversations by Combining Advanced NLP, Transformers-based Networks, and Linguistic Methodologies
I Could've Asked That: Reformulating Unanswerable Questions
Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment
On the Design and Analysis of LLM-Based Algorithms
Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data
A mathematical framework of intelligence and consciousness based on Riemannian Geometry
Enhancing Training Efficiency Using Packing with Flash Attention
Banishing LLM Hallucinations Requires Rethinking Generalization
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser
Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
A Notion of Complexity for Theory of Mind via Discrete World Models
Tree Cross Attention
Sentence Bottleneck Autoencoders from Transformer Language Models
Neural Machine Translation without Embeddings
Agents in Software Engineering: Survey, Landscape, and Vision
Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions
Programming Refusal with Conditional Activation Steering
AIPO: Improving Training Objective for Iterative Preference Optimization
Your Weak LLM is Secretly a Strong Teacher for Alignment
Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task
Fusing Dynamics Equation: A Social Opinions Prediction Algorithm with LLM-based Agents
CPL: Critical Planning Step Learning Boosts LLM Generalization in Reasoning Tasks
LLM Critics Help Catch LLM Bugs
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
Reasoning with Language Model is Planning with World Model
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models
Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
BERT Rediscovers the Classical NLP Pipeline
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Assessing Adversarial Robustness of Large Language Models: An Empirical Study
Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs
LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning
Instigating Cooperation among LLM Agents Using Adaptive Information Modulation
Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation
beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems
ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots
From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey
On the Diagram of Thought
CROSS-JEM: Accurate and Efficient Cross-encoders for Short-text Ranking Tasks
Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator
HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications
Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
AudioBERT: Audio Knowledge Augmented Language Model
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models
Qwen2.5-Coder Technical Report
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
A Controlled Study on Long Context Extension and Generalization in LLMs
GRIN: GRadient-INformed MoE
LLMs + Persona-Plug = Personalized LLMs
Human-like Affective Cognition in Foundation Models
Designing Interfaces for Multimodal Vector Search Applications
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
A Framework for Ranking Content Providers Using Prompt Engineering and Self-Attention Network
Scaling FP8 training to trillion-token LLMs
NVLM: Open Frontier-Class Multimodal LLMs
LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement
Towards Time Series Reasoning with LLMs
Learning Spatially-Aware Language and Audio Embedding
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models
LOLA -- An Open-Source Massively Multilingual Large Language Model
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer
Semformer: Transformer Language Models with Semantic Planning
Embedding Geometries of Contrastive Language-Image Pre-Training
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models
On the limits of agency in agent-based models
Schrodinger's Memory: Large Language Models
Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study
Stable Language Model Pre-training by Reducing Embedding Variability
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
The Expressive Power of Transformers with Chain of Thought
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions
Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
Training Language Models to Self-Correct via Reinforcement Learning
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
Enhancing E-commerce Product Title Translation with Retrieval-Augmented Generation and Large Language Models
Language Models Learn to Mislead Humans via RLHF
Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL
MEXMA: Token-level objectives improve sentence representations
Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning
BERT-VBD: Vietnamese Multi-Document Summarization Framework
Measuring Human and AI Values based on Generative Psychometrics with Large Language Models
RoMath: A Mathematical Reasoning Benchmark in Romanian
Compressing LLMs: The Truth is Rarely Pure and Never Simple
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
Knowledge-Based Domain-Oriented Data Augmentation for Enhancing Unsupervised Sentence Embedding
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling
AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances
Retrieval-Augmented Test Generation: How Far Are We?
Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning
RAD-Bench: Evaluating Large Language Models Capabilities in Retrieval Augmented Dialogues
Should RAG Chatbots Forget Unimportant Conversations? Exploring Importance and Forgetting with Psychological Insights
Linear Recency Bias During Training Improves Transformers' Fit to Reading Times
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models
Making Large Language Models into World Models with Precondition and Effect Knowledge
Linguini: A benchmark for language-agnostic linguistic reasoning
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking
SLIMER-IT: Zero-Shot NER on Italian Language
Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization
Adaptive Large Language Models By Layerwise Attention Shortcuts
Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors
MindScape Study: Integrating LLM and Behavioral Sensing for Personalized AI-Driven Journaling Experiences
Language Models "Grok" to Copy
Autoregressive + Chain of Thought ≃ Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
What You Say = What You Want? Teaching Humans to Articulate Requirements for LLMs
When Context Leads but Parametric Memory Follows in Large Language Models
SELF-[IN]CORRECT: LLMs Struggle with Discriminating Self-Generated Responses
Mixture of Diverse Size Experts
Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent
Semi-Supervised Reward Modeling via Iterative Self-Training
Spectral Filters, Dark Signals, and Attention Sinks
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection
ChainBuddy: An AI Agent System for Generating LLM Pipelines
ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models
RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion
RRM: Robust Reward Model Training Mitigates Reward Hacking
AutoVerus: Automated Proof Generation for Rust Code
LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models
Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts
TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning
Jailbreaking Large Language Models with Symbolic Mathematics
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments
An adapted large language model facilitates multiple medical tasks in diabetes care
KTO: Model Alignment as Prospect Theoretic Optimization
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Towards Understanding Grokking: An Effective Theory of Representation Learning
What Makes Good In-Context Examples for GPT-3?
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Learning from Contrastive Prompts: Automated Optimization and Adaptation
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
Phantom of Latent for Large Language and Vision Models
Target-Aware Language Modeling via Granular Data Sampling
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
A Case Study of Web App Coding with OpenAI Reasoning Models
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
Robust Training Objectives Improve Embedding-based Retrieval in Industrial Recommendation Systems
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking
LLM-Assisted Visual Analytics: Opportunities and Challenges
Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling
Instruction Following without Instruction Tuning
OmniBench: Towards The Future of Universal Omni-Language Models
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely
A Survey on the Honesty of Large Language Models
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering
MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents
Making Text Embedders Few-Shot Learners
Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation
Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA
EuroLLM: Multilingual Language Models for Europe
Small Language Models: Survey, Measurements, and Insights
Reward-Robust RLHF in LLMs
Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts
Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning
Block-Attention for Low-Latency RAG
Federated Large Language Models: Current Progress and Future Directions
Visual Prompting in Multimodal Large Language Models: A Survey
Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization
DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling
Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing
Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ
Context-Enhanced LLM-Based Framework for Automatic Test Refactoring
MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks
RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems
A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms
Disentangling Questions from Query Generation for Task-Adaptive Retrieval
Boosting Healthcare LLMs Through Retrieved Context
FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression
Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization
NoTeeline: Supporting Real-Time Notetaking from Keypoints with Large Language Models
A Comprehensive Survey of Bias in LLMs: Current Landscape and Future Directions
Bone: Block Affine Transformation as Parameter Efficient Fine-tuning Methods for Large Language Models
EgoLM: Multi-Modal Language Model of Egocentric Motions
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
Looped Transformers for Length Generalization
Automatic Instruction Evolving for Large Language Models
Towards More Relevant Product Search Ranking Via Large Language Models: An Empirical Study
Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)
Infer Human's Intentions Before Following Natural Language Instructions
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends
VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and Optimized Search
ISO: Overlap of Computation and Communication within Seqenence For LLM Inference
Here's Charlie! Realising the Semantic Web vision of Agents in the age of LLMs
Multi-language Unit Test Generation using LLMs
CLUE: Concept-Level Uncertainty Estimation for Large Language Models
Hallucination Detection in LLMs: Fast and Memory-Efficient Finetuned Models
Alignment-Aware Model Extraction Attacks on Large Language Models
Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL
Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs
Hypothesizing Missing Causal Variables with LLMs
Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs
Membership Inference Attacks Against In-Context Learning
Deploying a Retrieval based Response Model for Task Oriented Dialogues
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction
FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
Large Language Models Can Understanding Depth from Monocular Images
Addition is All You Need for Energy-efficient Language Models
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Can Models Learn Skill Composition from Examples?
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
Hyper-Connections
Visual Question Decomposition on Multimodal Large Language Models
DiaSynth -- Synthetic Dialogue Generation Framework
On the Implications of Verbose LLM Outputs: A Case Study in Translation Evaluation
LML: Language Model Learning a Dataset for Data-Augmented Prediction
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Emu3: Next-Token Prediction is All You Need
Learning the Latent Rules of a Game from Data: A Chess Story
Cottention: Linear Transformers With Cosine Attention
Do We Need Domain-Specific Embedding Models? An Empirical Investigation
Data Analysis in the Era of Generative AI
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization
VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback
SciDFM: A Large Language Model with Mixture-of-Experts for Science
Generative Retrieval Meets Multi-Graded Relevance
CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models
An Adversarial Perspective on Machine Unlearning for AI Safety
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making
A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
Natural Language Processing Methods for the Study of Protein-Ligand Interactions
Solving math word problems with process- and outcome-based feedback
Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG
Law of the Weakest Link: Cross Capabilities of Large Language Models
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect
LoRA Dropout as a Sparsity Regularizer for Overfitting Control
Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs
Embodied-RAG: General non-parametric Embodied Memory for Retrieval and Generation
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
Closed-loop Long-horizon Robotic Planning via Equilibrium Sequence Modeling
HelpSteer2-Preference: Complementing Ratings with Preferences
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
Quantifying Generalization Complexity for Large Language Models
Not All LLM Reasoners Are Created Equal
LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
FactAlign: Long-form Factuality Alignment of Large Language Models
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Contrastive Localized Language-Image Pre-Training
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Large Language Models as Markov Chains
Distilling an End-to-End Voice Assistant Without Instruction Training Data
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
General Preference Modeling with Preference Representations for Aligning Language Models
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
FlashMask: Efficient and Rich Mask Extension of FlashAttention
Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting
Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks
KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head
Understanding Higher-Order Correlations Among Semantic Components in Embeddings
Calibrating Language Models with Adaptive Temperature Scaling
On the Inductive Bias of Stacking Towards Improving Reasoning
Training Language Models to Win Debates with Self-Play Improves Judge Accuracy
Intelligence at the Edge of Chaos
Contextual Document Embeddings
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning
SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
AutoTrain: No-code training for state-of-the-art models
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
The Perfect Blend: Redefining RLHF with Mixture of Judges
How Much Can RAG Help the Reasoning of LLM?
ENTP: Encoder-only Next Token Prediction
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
A General Framework for Producing Interpretable Semantic Text Embeddings
Showing LLM-Generated Code Selectively Based on Confidence of LLMs
Autoregressive Large Language Models are Computationally Universal
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise
Intrinsic Evaluation of RAG Systems for Deep-Logic Questions
Erasing Conceptual Knowledge from Language Models
Selective Attention Improves Transformer
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs
ARB-LLM: Alternating Refined Binarizations for Large Language Models
Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning
In-context Learning in Presence of Spurious Correlations
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
Efficient 1-bit tensor approximations
When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1
Differential Transformer
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
DEPT: Decoupled Embeddings for Pre-training Language Models
Fast State Restoration in LLM Serving with HCache
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Reward-RAG: Enhancing RAG with Reward Driven Supervision
Named Clinical Entity Recognition Benchmark
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
Why Do We Need Weight Decay in Modern Deep Learning?
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
Algorithmic Capabilities of Random Transformers
Inference Scaling for Long-Context Retrieval Augmented Generation
Preference Optimization as Probabilistic Inference
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization
LongGenBench: Long-context Generation Benchmark
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
ToolGen: Unified Tool Retrieval and Calling via Generation
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
A generative framework to bridge data-driven models and scientific theories in language neuroscience
Hyper-multi-step: The Truth Behind Difficult Long-context Tasks
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
Steering Large Language Models between Code Execution and Textual Reasoning
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Archon: An Architecture Search Framework for Inference-Time Techniques
Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes
Data Selection via Optimal Control for Language Models
Upcycling Large Language Models into Mixture of Experts
Temporal Reasoning Transfer from Text to Video
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
MM-Ego: Towards Building Egocentric Multimodal LLMs
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
Can Transformers Reason Logically? A Study in SAT Solving
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Personalized Visual Instruction Tuning
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Pixtral 12B
Self-Boosting Large Language Models with Synthetic Preference Data
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA
Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders
Multimodal Situational Safety
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
CursorCore: Assist Programming through Aligning Anything
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Falcon Mamba: The First Competitive Attention-free 7B Language Model
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments
Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models
Does Spatial Cognition Emerge in Frontier Models?
Round and Round We Go! What makes Rotary Positional Encodings useful?
Large Language Model Enhanced Text-to-SQL Generation: A Survey
Tracking Universal Features Through Fine-Tuning and Model Merging
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space
SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks
Response Tuning: Aligning Large Language Models without Instruction
Collective Critics for Creative Story Generation
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
Emergent properties with repeated examples
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe
Intriguing Properties of Large Language and Vision Models
Benchmarking Agentic Workflow Generation
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
Think Twice: A Human-like Two-stage Conversational Agent for Emotional Response Generation
WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents
Vector-ICL: In-context Learning with Continuous Vector Representations
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
LLM Cascade with Multi-Objective Optimal Consideration
No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, Even for Vigilant Users
The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks
LLMs Are In-Context Reinforcement Learners
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models
Accelerated Preference Optimization for Large Language Model Alignment
How to Train Long-Context Language Models (Effectively)
GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning
SimpleStrat: Diversifying Language Model Generation with Stratification
Mentor-KD: Making Small Language Models Better Multi-step Reasoners
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory
Baichuan-Omni Technical Report
KV Prediction for Improved Time to First Token
Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
Benign Overfitting in Single-Head Attention
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy
PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More
RL, but don't do anything I wouldn't do
From Tokens to Words: On the Inner Lexicon of LLMs
Neuron-Level Sequential Editing for Large Language Models
Mixture of Attentions For Speculative Decoding
Integrating Natural Language Prompting Tasks in Introductory Programming Courses
Benign or Not-Benign Overfitting in Token Selection of Attention Mechanism
Causal Inference with Large Language Model: A Survey
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
PeerArg: Argumentative Peer Review with LLMs
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Thinking LLMs: General Instruction Following with Thought Generation
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Rethinking Data Selection at Scale: Random Selection is Almost All You Need
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models
Tree of Problems: Improving structured problem solving with compositionality
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training
Think While You Generate: Discrete Diffusion with Planned Denoising
Strong Model Collapse
Fundamental Limitations on Subquadratic Alternatives to Transformers
On The Computational Complexity of Self-Attention
Primer: Searching for Efficient Transformers for Language Modeling
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
Agent-as-a-Judge: Evaluate Agents with Agents
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Empirical Study of Mutual Reinforcement Effect and Application in Few-shot Text Classification Tasks via Prompt
LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models
What Matters in Transformers? Not All Attention is Needed
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
A Hitchhiker's Guide to Scaling Law Estimation
How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs
Survey and Evaluation of Converging Architecture in LLMs based on Footsteps of Operations
Agentic Information Retrieval
In-Context Learning Enables Robot Action Prediction in LLMs
Exploring Model Kinship for Merging Large Language Models
Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL
BenTo: Benchmark Task Reduction with In-Context Transferability
Revealing the Barriers of Language Agents in Planning
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Prompt Compression for Large Language Models: A Survey
Model Balancing Helps Low-data Training and Fine-tuning
The Moral Case for Using Language Model Agents for Recommendation
OMCAT: Omni Context Aware Transformer
FLARE: Faithful Logic-Aided Reasoning and Exploration
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
Persistent Topological Features in Large Language Models
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
Large Language Model Evaluation via Matrix Nuclear-Norm
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Parameter-Efficient Fine-Tuning of State Space Models
How Do Multilingual Models Remember? Investigating Multilingual Factual Recall Mechanisms
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities
LightRAG: Simple and Fast Retrieval-Augmented Generation
Large Language Model-Based Evolutionary Optimizer: Reasoning with elitism
γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Can MLLMs Understand the Deep Implication Behind Chinese Images?
Retrospective Learning from Interactions
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
Harnessing Webpage UIs for Text-Rich Visual Understanding
Looking Inward: Language Models Can Learn About Themselves by Introspection
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
Improving Multi-modal Large Language Model through Boosting Vision Capabilities
Persistent Pre-Training Poisoning of LLMs
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant
Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation
Roadmap towards Superhuman Speech Understanding using Large Language Models
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
A Little Human Data Goes A Long Way
AERO: Softmax-Only LLMs for Efficient Private Inference
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging
Improving Instruction-Following in Language Models through Activation Steering
JudgeBench: A Benchmark for Evaluating LLM-based Judges
From Commands to Prompts: LLM-based Semantic File System for AIOS
MoH: Multi-Head Attention as Mixture-of-Head Attention
When Attention Sink Emerges in Language Models: An Empirical View
Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key
FlatQuant: Flatness Matters for LLM Quantization
MedMobile: A mobile-sized language model with expert-level clinical capabilities
Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models
Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small
SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval
SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
TopoLM: brain-like spatio-functional organization in a topographic language model
Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers
Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts
GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings
Teaching Models to Balance Resisting and Accepting Persuasion
Do LLMs "know" internally when they follow instructions?
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers and Fully-Connected Neural Networks for Causally Constrained Predictions
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Goal Inference from Open-Ended Dialog
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
SymNoise: Advancing Language Model Fine-tuning with Symmetric Noise
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
Pre-training Distillation for Large Language Models: A Design Space Exploration
Improve Vision Language Model Chain-of-thought Reasoning
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Baichuan Alignment Technical Report
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation
Decomposing The Dark Matter of Sparse Autoencoders
Sparse Universal Transformer
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
Diverging Preferences: When do Annotators Disagree and do Models Know?
Do LLMs estimate uncertainty well in instruction-following?
Large Language Models Are Overparameterized Text Encoders
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy
Generative Reward Models
Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception
Content Enhanced BERT-based Text-to-SQL Generation
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
Cascade Reward Sampling for Efficient Decoding-Time Alignment
Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models
Data Agnostic RoBERTa-based Natural Language to SQL Query Generation
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation
Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement
Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
In-context learning and Occam's razor
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
Zero-shot Model-based Reinforcement Learning using Large Language Models
SMART: Self-learning Meta-strategy Agent for Reasoning Tasks
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
Transformers are Efficient Compilers, Provably
LongReward: Improving Long-context Large Language Models with AI Feedback
Automatically Interpreting Millions of Features in Large Language Models
You can remove GPT2's LayerNorm by fine-tuning
An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Value Residual Learning For Alleviating Attention Concentration In Transformers
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
Aligning Large Language Models via Self-Steering Optimization
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes
Beyond Retrieval: Generating Narratives in Conversational Recommender Systems
Bridging Search and Recommendation in Generative Retrieval: Does One Task Help the Other?
STAR: A Simple Training-free Approach for Recommendations using Large Language Models
SouLLMate: An Application Enhancing Diverse Mental Health Support with Adaptive LLMs, Prompt Engineering, and RAG Techniques
EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search
Improving Pinterest Search Relevance Using Large Language Models
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Pyramid Vector Quantization for LLMs
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts
LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
Stick-breaking Attention
SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Frontiers in Intelligent Colonoscopy
Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy
LLM-based Optimization of Compound AI Systems: A Survey
Improving Parallel Program Performance Through DSL-Driven Code Generation with LLM Optimizers
M-RewardBench: Evaluating Reward Models in Multilingual Settings
MedINST: Meta Dataset of Biomedical Instructions
ALTA: Compiler-Based Analysis of Transformers
SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Should We Really Edit Language Models? On the Evaluation of Edited Language Models
Why Does the Effective Context Length of LLMs Fall Short?
RRADistill: Distilling LLMs' Passage Ranking Ability for Document Re-Ranking of Long-Tail Queries in a Search Engine
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch
LOGO -- Long cOntext aliGnment via efficient preference Optimization
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference
Language Models are Symbolic Learners in Arithmetic
Balancing Label Quantity and Quality for Scalable Elicitation
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline
SpinQuant: LLM quantization with learned rotations
WAFFLE: Multi-Modal Model for Automated Front-End Development
AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Can Knowledge Editing Really Correct Hallucinations?
When "A Helpful Assistant" Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
Provably Robust Watermarks for Open-Source Language Models
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations
Rethinking Softmax: Self-Attention with Polynomial Activations
SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning
Understanding Players as if They Are Talking to the Game in a Customized Language: A Pilot Study
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction
Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
DreamLIP: Language-Image Pre-training with Long Captions
Inductive Biases and Variable Creation in Self-Attention Mechanisms
An LLM Agent for Automatic Geospatial Data Analysis
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
Long Term Memory: The Foundation of AI Self-Evolution
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
LeanAgent: Lifelong Learning for Formal Theorem Proving
Little Giants: Synthesizing High-Quality Embedding Data at Scale
Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers
A Survey of Conversational Search
Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction
How LLMs Aid in UML Modeling: An Exploratory Study with Novice Analysts
Teach Multimodal LLMs to Comprehend Electrocardiographic Images
Knowledge Graph Enhanced Language Agents for Recommendation
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Counting Ability of Large Language Models and Impact of Tokenization
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
Reflection-Bench: probing AI intelligence with reflection
PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
Analysing the Residual Stream of Language Models Under Knowledge Conflicts
CoqPilot, a plugin for LLM-based generation of proofs
Measuring memorization through probabilistic discoverable extraction
Computational Bottlenecks of Training Small-scale Large Language Models
Mixture of Parrots: Experts improve memorization more than reasoning
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
A Survey of Small Language Models
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation
KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Plan×RAG: Planning-guided Retrieval Augmented Generation
Faster WIND: Accelerating Iterative Best-of-N Distillation for LLM Alignment
Language Models And A Second Opinion Use Case: The Pocket Professional
Fast Best-of-N Decoding via Speculative Rejection
UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers
RARe: Retrieval Augmented Retrieval with In-Context Examples
Towards Next-Generation LLM-based Recommender Systems: A Survey and Beyond
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction
Large Language Models Reflect the Ideology of their Creators
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
A Survey on Data Synthesis and Augmentation for Large Language Models
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Understanding Synthetic Context Extension via Retrieval Heads
Matryoshka: Learning to Drive Black-Box LLMs with LLMs
The Geometry of Concepts: Sparse Autoencoder Feature Structure
Attacking Vision-Language Computer Agents via Pop-ups
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
CLEAR: Character Unlearning in Textual and Visual Modalities
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
Distinguishing Ignorance from Error in LLM Hallucinations
Learning and Unlearning of Fabricated Knowledge in Language Models
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
On the Role of Depth and Looping for In-Context Learning with Task Diversity
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'
Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback
Accelerating Direct Preference Optimization with Prefix Sharing
AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels
QTIP: Quantization with Trellises and Incoherence Processing
EMMA: End-to-End Multimodal Model for Autonomous Driving
SciPIP: An LLM-based Scientific Paper Idea Proposer
Zipfian Whitening
On Memorization of Large Language Models in Logical Reasoning
Stealing User Prompts from Mixture of Experts
Toxicity of the Commons: Curating Open-Source Pre-Training Data
RuleRAG: Rule-guided retrieval-augmented generation with language models for question answering
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function
SelfCodeAlign: Self-Alignment for Code Generation
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
Nearest Neighbor Normalization Improves Multimodal Retrieval
Language Models can Self-Lengthen to Generate Long Texts
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models
Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts
Weight decay induces low-rank attention layers
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists
Toward Understanding In-context vs. In-weight Learning
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks
BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
AAAR-1.0: Assessing AI's Potential to Assist Research
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective
Failure Modes of LLMs for Causal Reasoning on Narratives
Are Decoder-Only Large Language Models the Silver Bullet for Code Search?
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Physics in Next-token Prediction
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
GPT or BERT: why not both?
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF
Thinking Forward and Backward: Effective Backward Planning with Large Language Models
Context Parallelism for Scalable Million-Token Inference
RAGViz: Diagnose and Visualize Retrieval-Augmented Generation
DynaSaur: Large Language Agents Beyond Predefined Actions
Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Survey of Cultural Awareness in Language Models: Text and Beyond
LLM-KT: A Versatile Framework for Knowledge Transfer from Large Language Models to Collaborative Filtering
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
E2E-AFG: An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
GRS-QA -- Graph Reasoning-Structured Question Answering Dataset
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Beyond Utility: Evaluating LLM as Recommender
Rationale-Guided Retrieval Augmented Generation for Medical Question Answering
Personalization of Large Language Models: A Survey
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
How Does Critical Batch Size Scale in Pre-training?
Scaling Optimal LR Across Token Horizons
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
Not All Memories are Created Equal: Learning to Forget by Expiring
Inference Optimal VLMs Need Only One Visual Token but Larger Models
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Sample-Efficient Alignment for LLMs
LLaMo: Large Language Model-based Molecular Graph Assistant
Controlling Language and Diffusion Models by Transporting Activations
A Scalable Communication Protocol for Networks of Large Language Models
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Lightning IR: Straightforward Fine-tuning and Inference of Transformer-based Language Models for Information Retrieval
Wave Network: An Ultra-Small Language Model
Model Equality Testing: Which Model Is This API Serving?
A linguistic analysis of undesirable outcomes in the era of generative AI
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Long Context RAG Performance of Large Language Models
LASER: Attention with Exponential Transformation
Photon: Federated LLM Pre-Training
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba
MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models
Can LLMs make trade-offs involving stipulated pain and pleasure states?
Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers
Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically
Teaching Models to Improve on Tape
Evolving Alignment via Asymmetric Self-Play
Scaling LLM Inference with Optimized Sample Compute Allocation
Self-Consistency Preference Optimization
Tiny Transformers Excel at Sentence Compression
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond
A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
LoRA vs Full Fine-tuning: An Illusion of Equivalence
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model
LSHBloom: Memory-efficient, Extreme-scale Document Deduplication
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Towards Reliable Alignment: Uncertainty-aware RLHF
Abrupt Learning in Transformers: A Case Study on Matrix Completion
MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression
O1 Replication Journey: A Strategic Progress Report -- Part 1
KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
Methods of improving LLM training stability
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Generalized Probabilistic Attention Mechanism in Transformers
Economic Anthropology in the Era of Generative Artificial Intelligence
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
MoDification: Mixture of Depths Made Easy
Speciesism in Natural Language Processing Research
Reducing the Transformer Architecture to a Minimum
MoR: Mixture of Ranks for Low-Rank Adaptation Tuning
Metacognitive Monitoring: A Human Ability Beyond Generative Artificial Intelligence
Hypothesis Testing the Circuit Hypothesis in LLMs
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
Theoretical Analysis of Hierarchical Language Recognition and Generation by Transformers without Positional Encoding
Conformity in Large Language Models
Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing
A Case for AI Consciousness: Language Agents and Global Workspace Theory
Local and Global Decoding in Text Generation
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
Geometric Signatures of Compositionality Across a Language Model's Lifetime
Is Parameter Collision Hindering Continual Learning in LLMs?
Reverse Modeling in Large Language Models
On the Proper Treatment of Tokenization in Psycholinguistics
Post-edits Are Preferences Too
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity
EmbedLLM: Learning Compact Representations of Large Language Models
Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1
Mitigating Memorization In Language Models
House of Cards: Massive Weights in LLMs
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training
RisingBALLER: A player is a token, a match is a sentence, A path towards a foundational model for football players data analytics
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
Self-Updatable Large Language Models with Parameter Integration
Are LLMs Aware that Some Questions are not Open-ended?
Vision Language Models See What You Want but not What You See
A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions
1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Analyzing The Language of Visual Tokens
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Model merging with SVD to tie the Knots
Best Practices for Distilling Large Language Models into BERT for Web Search Ranking
Interpretable Language Modeling via Induction-head Ngram Models
Unlearning in- vs. out-of-distribution data in LLMs under gradient-based method
GUI Agents with Foundation Models: A Comprehensive Survey
Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
DELIFT: Data Efficient Language model Instruction Fine Tuning
Aioli: A Unified Optimization Framework for Language Model Data Mixing
LBPE: Long-token-first Tokenization to Improve Large Language Models
Balancing Pipeline Parallelism with Vocabulary Parallelism
Fox-1 Technical Report
STAND-Guard: A Small Task-Adaptive Content Moderation Model
Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs
CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs?
Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale
Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning
LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions
Scattered Forest Search: Smarter Code Space Exploration with LLMs
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models
An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking
ZipNN: Lossless Compression for AI Models
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
Number Cookbook: Number Understanding of Language Models and How to Improve It
Mixtures of In-Context Learners
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models
AFlow: Automating Agentic Workflow Generation
Recycled Attention: Efficient inference for long-context language models
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Counterfactual Generation from Language Models
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge
Game-theoretic LLM: Agent Workflow for Negotiation Games
Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
More Expressive Attention with Negative Weights
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation
LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering
Learning Code Preference via Synthetic Evolution
Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation
Scaling Laws for Precision
Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders
RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Likelihood as a Performance Gauge for Retrieval-Augmented Generation
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Entropy Controllable Direct Preference Optimization
SecEncoder: Logs are All You Need in Security
Rapid Response: Mitigating LLM Jailbreaks with a Few Examples
Toward Optimal Search and Retrieval for RAG
The Super Weight in Large Language Models
Multi-Modal Forecaster: Jointly Predicting Time Series and Textual Data
What Should Baby Models Read? Exploring Sample-Efficient Data Composition on Model Performance
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems
Towards Low-bit Communication for Tensor Parallel LLM Inference
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?
SetLexSem Challenge: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models
Stronger Models are NOT Stronger Teachers for Instruction Tuning
Hardware and Software Platform Inference
Direct Preference Optimization Using Sparse Feature-Level Constraints
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection
Can sparse autoencoders be used to decompose and interpret steering vectors?
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models
Large Language Models Can Self-Improve in Long-context Reasoning
Language Models as Causal Effect Generators
Model Stealing for Any Low-Rank Language Model
Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs
XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL
Controllable Context Sensitivity and the Knob Behind It
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
Pie: Pooling CPU Memory for LLM Inference
Cut Your Losses in Large-Vocabulary Language Models
LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?
Squeezed Attention: Accelerating Long Context Length LLM Inference
Hermes: A Large Language Model Framework on the Journey to Autonomous Networks