Updated 2024-11-15
- Cedille: A large autoregressive French language model
- The Wisdom of Hindsight Makes Language Models Better Instruction Followers
- ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks
- Query2doc: Query Expansion with Large Language Models
- The Internal State of an LLM Knows When It's Lying
- Structured information extraction from complex scientific text with fine-tuned large language models
- TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
- Large Language Models Encode Clinical Knowledge
- PoET: A generative model of protein families as sequences-of-sequences
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
- Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services
- SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
- Modeling Protein Using Large-scale Pretrain Language Model
- A Watermark for Large Language Models
- GPT is becoming a Turing machine: Here are some ways to program it
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
- Large Language Models are Zero-Shot Reasoners
- From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
- How is ChatGPT's behavior changing over time?
- Meta-Transformer: A Unified Framework for Multimodal Learning
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
- Getting More out of Large Language Models for Proofs
- Teaching Small Language Models to Reason
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
- Learning to Retrieve In-Context Examples for Large Language Models
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- Context-Aware Abbreviation Expansion Using Large Language Models
- Focused Transformer: Contrastive Training for Context Scaling
- Flash normalization: fast RMSNorm for LLMs
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
- Long-range Language Modeling with Self-retrieval
- Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
- Towards Generalist Biomedical AI
- Shortcut Learning of Large Language Models in Natural Language Understanding
- Quantifying Memorization Across Neural Language Models
- LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
- Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
- Copy Is All You Need
- Automatic Chain of Thought Prompting in Large Language Models
- Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks
- Evaluating the Text-to-SQL Capabilities of Large Language Models
- On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- Are Emergent Abilities of Large Language Models a Mirage?
- Enhancing Network Management Using Code Generated by Large Language Models
- Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks
- ThinkSum: Probabilistic reasoning over sets using large language models
- On the Tool Manipulation Capability of Open-source Large Language Models
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
- WavJourney: Compositional Audio Creation with Large Language Models
- ChatGPT, Can You Generate Solutions for my Coding Exercises? An Evaluation on its Effectiveness in an undergraduate Java Programming Course
- Secrets of RLHF in Large Language Models Part I: PPO
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
- Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
- Challenges and Applications of Large Language Models
- SPOT: Knowledge-Enhanced Language Representations for Information Extraction
- Kosmos-2: Grounding Multimodal Large Language Models to the World
- Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
- SKILL: Structured Knowledge Infusion for Large Language Models
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Understanding Social Reasoning in Language Models with Language Models
- The Science of Detecting LLM-Generated Texts
- CausalLM is not optimal for in-context learning
- Questioning the Survey Responses of Large Language Models
- Extending Context Window of Large Language Models via Positional Interpolation
- ChatGPT and a New Academic Reality: Artificial Intelligence-Written Research Papers and the Ethics of the Large Language Models in Scholarly Publishing
- Probing Factually Grounded Content Transfer with Factual Ablation
- Teach LLMs to Personalize -- An Approach inspired by Writing Education
- Pre-Trained Large Language Models for Industrial Control
- WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
- LongNet: Scaling Transformers to 1,000,000,000 Tokens
- Self-Alignment with Instruction Backtranslation
- Guiding Pretraining in Reinforcement Learning with Large Language Models
- Large Language Models are Zero-Shot Rankers for Recommender Systems
- Model evaluation for extreme risks
- Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL
- A Simple and Effective Pruning Approach for Large Language Models
- Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
- Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
- TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
- VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
- LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
- PromptChainer: Chaining Large Language Model Prompts through Visual Programming
- PIPPA: A Partially Synthetic Conversational Dataset
- Let's Verify Step by Step
- Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics
- SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
- Large Language Models Are Reasoning Teachers
- GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
- Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
- Connecting Neural Response measurements & Computational Models of language: a non-comprehensive guide
- Accelerating LLM Inference with Staged Speculative Decoding
- Large Language Models for Supply Chain Optimization
- Do Large Language Models know what humans know?
- Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction
- Faithful Chain-of-Thought Reasoning
- AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
- Superposition of many models into one
- Learning to Model the World with Language
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- Unifying Large Language Models and Knowledge Graphs: A Roadmap
- RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
- QLoRA: Efficient Finetuning of Quantized LLMs
- Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
- Co-Writing with Opinionated Language Models Affects Users' Views
- Language models show human-like content effects on reasoning
- Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
- Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code
- OpenAGI: When LLM Meets Domain Experts
- Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
- Beyond Generating Code: Evaluating GPT on a Data Visualization Course
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models
- Studying Large Language Model Generalization with Influence Functions
- Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change)
- From Sparse to Soft Mixtures of Experts
- Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
- INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
- Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models
- Large Language Model Guided Tree-of-Thought
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
- LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
- When Geometric Deep Learning Meets Pretrained Protein Language Models
- Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level
- Language models are weak learners
- How Many Demonstrations Do You Need for In-context Learning?
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
- Gorilla: Large Language Model Connected with Massive APIs
- Automatic Generation of Programming Exercises and Code Explanations using Large Language Models
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models
- Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models
- WebArena: A Realistic Web Environment for Building Autonomous Agents
- Language Models can Solve Computer Tasks
- ChatGPT Is on the Horizon: Could a Large Language Model Be All We Need for Intelligent Transportation?
- Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling
- Invariant Language Modeling
- Solving Quantitative Reasoning Problems with Language Models
- Personality Traits in Large Language Models
- Prompting Large Language Models with Speech Recognition Abilities
- Selective Annotation Makes Language Models Better Few-Shot Learners
- Using Captum to Explain Generative Language Models
- Fine-Tuning Language Models with Just Forward Passes
- In-context Autoencoder for Context Compression in a Large Language Model
- Entity Projection via Machine Translation for Cross-Lingual NER
- OctoPack: Instruction Tuning Code Large Language Models
- AlpaGasus: Training A Better Alpaca with Fewer Data
- Large Language Models Are Human-Level Prompt Engineers
- DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
- CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct
- Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
- Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach
- Large Language Models Can Self-Improve
- Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
- More Agents Is All You Need
- Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
- Teaching Algorithmic Reasoning via In-context Learning
- SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
- BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
- The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python
- KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Automatic Evaluation of Attribution by Large Language Models
- Generative Agents: Interactive Simulacra of Human Behavior
- ALERT: Adapting Language Models to Reasoning Tasks
- How does the pre-training objective affect what large language models learn about linguistic properties?
- PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
- LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
- Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks
- Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
- FLIRT: Feedback Loop In-context Red Teaming
- News Summarization and Evaluation in the Era of GPT-3
- Galactica: A Large Language Model for Science
- Towards Reasoning in Large Language Models: A Survey
- Chain-Of-Thought Prompting Under Streaming Batch: A Case Study
- Shepherd: A Critic for Language Model Generation
- Emergent autonomous scientific research capabilities of large language models
- Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
- Social Simulacra: Creating Populated Prototypes for Social Computing Systems
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
- A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
- Complexity-Based Prompting for Multi-Step Reasoning
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
- FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
- Scaling TransNormer to 175 Billion Parameters
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
- A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
- Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
- Learning ASR pathways: A sparse multilingual ASR model
- Stay on topic with Classifier-Free Guidance
- Constitutional AI: Harmlessness from AI Feedback
- Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis
- Teaching Arithmetic to Small Transformers
- Demystifying GPT Self-Repair for Code Generation
- Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education
- Link-Context Learning for Multimodal LLMs
- Large Language Models Perform Diagnostic Reasoning
- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
- AgentBench: Evaluating LLMs as Agents
- Xmodel-LM Technical Report
- Simple synthetic data reduces sycophancy in large language models
- Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
- Re-visiting Automated Topic Model Evaluation with Large Language Models
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
- Adaptive Test Generation Using a Large Language Model
- Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
- PaLM: Scaling Language Modeling with Pathways
- Teaching Large Language Models to Self-Debug
- Building Cooperative Embodied Agents Modularly with Large Language Models
- Urdu text in natural scene images: a new dataset and preliminary text detection
- LIMA: Less Is More for Alignment
- Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs
- GPT-NER: Named Entity Recognition via Large Language Models
- Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge
- Code as Policies: Language Model Programs for Embodied Control
- Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
- From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
- Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models
- Inspecting and Editing Knowledge Representations in Language Models
- TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents
- Large language models effectively leverage document-level context for literary translation, but critical errors persist
- Med-Flamingo: a Multimodal Medical Few-shot Learner
- Jigsaw: Large Language Models meet Program Synthesis
- Large Language Models Struggle to Learn Long-Tail Knowledge
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Textbooks Are All You Need
- Crowd Score: A Method for the Evaluation of Jokes using Large Language Model AI Voters as Judges
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
- Three Bricks to Consolidate Watermarks for Large Language Models
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
- One-shot Machine Teaching: Cost Very Few Examples to Converge Faster
- Theory of Mind May Have Spontaneously Emerged in Large Language Models
- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
- Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
- Tiny LVLM-eHub: Early Multimodal Experiments with Bard
- Language Is Not All You Need: Aligning Perception with Language Models
- Mind's Eye: Grounded Language Model Reasoning through Simulation
- StarCoder: may the source be with you!
- Self-Critique Prompting with Large Language Models for Inductive Instructions
- PaLM 2 Technical Report
- Repository-Level Prompt Generation for Large Language Models of Code
- L-Eval: Instituting Standardized Evaluation for Long Context Language Models
- Measuring and Narrowing the Compositionality Gap in Language Models
- Differentially Private Fine-tuning of Language Models
- A Latent Space Theory for Emergent Abilities in Large Language Models
- Reflexion: Language Agents with Verbal Reinforcement Learning
- Ambient Adventures: Teaching ChatGPT on Developing Complex Stories
- LEACE: Perfect linear concept erasure in closed form
- Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods
- A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- FinGPT: Open-Source Financial Large Language Models
- Block Belief Propagation for Parameter Learning in Markov Random Fields
- Lost in the Middle: How Language Models Use Long Contexts
- Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
- Ada-Ranker: A Data Distribution Adaptive Ranking Paradigm for Sequential Recommendation
- Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
- BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
- The Hydra Effect: Emergent Self-repair in Language Model Computations
- Educational data augmentation in physics education research using ChatGPT
- PolyLM: An Open Source Polyglot Large Language Model
- Towards Expert-Level Medical Question Answering with Large Language Models
- Is GPT-4 a Good Data Analyst?
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions
- ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models
- Seeing ChatGPT Through Students' Eyes: An Analysis of TikTok Data
- LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
- ReAct: Synergizing Reasoning and Acting in Language Models
- Augmenting Language Models with Long-Term Memory
- BloombergGPT: A Large Language Model for Finance
- A Systematic Evaluation of Large Language Models of Code
- GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
- Robot Task Planning and Situation Handling in Open Worlds
- Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences
- Emergent Abilities of Large Language Models
- Can Large Language Models design a Robot?
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models
- Clinical Camel: An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding
- DarkBERT: A Language Model for the Dark Side of the Internet
- Measuring Faithfulness in Chain-of-Thought Reasoning
- Retentive Network: A Successor to Transformer for Large Language Models
- Dissociating language and thought in large language models: a cognitive perspective
- Large Language Models are Better Reasoners with Self-Verification
- Can large language models reason about medical questions?
- Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
- ARB: Advanced Reasoning Benchmark for Large Language Models
- Rethinking with Retrieval: Faithful Large Language Model Inference
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models
- Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
- Explainable Verbal Reasoner Plus (EVR+): A Natural Language Reasoning Framework that Supports Diverse Compositional Reasoning
- Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
- Large Language Models as Corporate Lobbyists
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
- Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation
- OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
- Talking About Large Language Models
- Platypus: Quick, Cheap, and Powerful Refinement of LLMs
- Large Language Models Can Be Easily Distracted by Irrelevant Context
- Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
- OpenICL: An Open-Source Framework for In-context Learning
- Emergence of Maps in the Memories of Blind Navigation Agents
- PMC-LLaMA: Further Finetuning LLaMA on Medical Papers
- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
- UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
- Learning to Reason and Memorize with Self-Notes
- ChemCrow: Augmenting large-language models with chemistry tools
- Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
- Learning to Compress Prompts with Gist Tokens
- Unlimiformer: Long-Range Transformers with Unlimited Length Input
- StructGPT: A General Framework for Large Language Model to Reason over Structured Data
- ChatGPT: Applications, Opportunities, and Threats
- Memory Augmented Large Language Models are Computationally Universal
- PaLM-E: An Embodied Multimodal Language Model
- M2T: Masking Transformers Twice for Faster Decoding
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
- A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
- Auditing large language models: a three-layered approach
- Language models in molecular discovery
- Offsite-Tuning: Transfer Learning without Full Model
- MusicLM: Generating Music From Text
- Context-faithful Prompting for Large Language Models
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models
- The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models
- GPTutor: a ChatGPT-powered programming tool for code explanation
- Larger language models do in-context learning differently
- MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
- Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker
- ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
- Multimodal Chain-of-Thought Reasoning in Language Models
- Recitation-Augmented Language Models
- Hyena Hierarchy: Towards Larger Convolutional Language Models
- Eight Things to Know about Large Language Models
- PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
- A Survey on Model Compression for Large Language Models
- Active Retrieval Augmented Generation
- Toolformer: Language Models Can Teach Themselves to Use Tools
- Evaluating Verifiability in Generative Search Engines
- Augmented Language Models: a Survey
- Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness
- Giraffe: Adventures in Expanding Context Lengths in LLMs
- LLM As DBA
- Scaling Transformer to 1M tokens and beyond with RMT
- TidyBot: Personalized Robot Assistance with Large Language Models
- Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering
- Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
- Active Prompting with Chain-of-Thought for Large Language Models
- A Categorical Archive of ChatGPT Failures
- Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity
- Better Language Models of Code through Self-Improvement
- DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
- The Capacity for Moral Self-Correction in Large Language Models
- Poisoning Language Models During Instruction Tuning
- Prompt2Model: Generating Deployable Models from Natural Language Instructions
- Data Selection for Language Models via Importance Resampling
- Enabling Conversational Interaction with Mobile UI using Large Language Models
- Evidence of Meaning in Language Models Trained on Programs
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
- Symbol tuning improves in-context learning in language models
- REPLUG: Retrieval-Augmented Black-Box Language Models
- Why do Nearest Neighbor Language Models Work?
- Prismer: A Vision-Language Model with An Ensemble of Experts
- AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
- Self-evolving Agents with reflective and memory-augmented abilities
- CALYPSO: LLMs as Dungeon Masters' Assistants
- Mind your Language (Model): Fact-Checking LLMs and their Role in NLP Research and Practice
- Code Llama: Open Foundation Models for Code
- Ground Manipulator Primitive Tasks to Executable Actions using Large Language Models
- Faithful to Whom? Questioning Interpretability Measures in NLP
- Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis
- Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts
- How Good Are Large Language Models at Out-of-Distribution Detection?
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
- Can Large Language Models Find And Fix Vulnerable Software?
- Large Language Models for Software Engineering: A Systematic Literature Review
- Informed Named Entity Recognition Decoding for Generative Language Models
- Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
- Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models
- Better Zero-Shot Reasoning with Role-Play Prompting
- Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning
- Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis
- A Survey on Large Language Model based Autonomous Agents
- Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions
- Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
- Evaluating ChatGPT and GPT-4 for Visual Programming
- Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
- D4: Improving LLM Pretraining via Document De-Duplication and Diversification
- Cabrita: closing the gap for foreign languages
- GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems
- ProAgent: Building Proactive Cooperative AI with Large Language Models
- Instruction Position Matters in Sequence Generation with Large Language Models
- Knowledge-Enhanced Multi-Label Few-Shot Product Attribute-Value Extraction
- SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
- LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
- Large Language Model as Autonomous Decision Maker
- Large Language Models as Superpositions of Cultural Perspectives
- Activation Addition: Steering Language Models Without Optimization
- Enhancing Recommender Systems with Large Language Model Reasoning Graphs
- GPTEval: A Survey on Assessments of ChatGPT and GPT-4
- An Empirical Study on Challenging Math Problem Solving with GPT-4
- Forward-Backward Reasoning in Large Language Models for Verification
- Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights using Generative AI
- Dynamic Planning with a LLM
- "Guinea Pig Trials" Utilizing GPT: A Novel Smart Agent-Based Modeling Approach for Studying Firm Competition and Collusion
- Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models
- Bridging the Gap: Deciphering Tabular Data Using Large Language Model
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling
- Prompting Is Programming: A Query Language for Large Language Models
- EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
- Knowledge Graph Prompting for Multi-Document Question Answering
- GPT detectors are biased against non-native English writers
- GradientCoin: A Peer-to-Peer Decentralized Large Language Models
- RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models
- IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning
- Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
- Time Travel in LLMs: Tracing Data Contamination in Large Language Models
- Can Language Models Learn to Listen?
- Detecting The Corruption Of Online Questionnaires By Artificial Intelligence
- Towards an Understanding of Large Language Models in Software Engineering Tasks
- YaRN: Efficient Context Window Extension of Large Language Models
- An Examination of the Compositionality of Large Generative Vision-Language Models
- Company Similarity using Large Language Models
- LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs
- Instruction Tuning for Large Language Models: A Survey
- Language to Rewards for Robotic Skill Synthesis
- Is There Any Social Principle for LLM-Based Agents?
- A Study on Robustness and Reliability of Large Language Model Code Generation
- Leveraging Large Language Models for Pre-trained Recommender Systems
- Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models
- LLaSM: Large Language and Speech Model
- SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
- DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue
- FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt
- ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
- Pretraining on the Test Set Is All You Need
- The AI Revolution in Education: Will AI Replace or Assist Teachers in Higher Education?
- Reinforced Self-Training (ReST) for Language Modeling
- Fast Inference from Transformers via Speculative Decoding
- LoRA: Low-Rank Adaptation of Large Language Models
- Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models
- AI Deception: A Survey of Examples, Risks, and Potential Solutions
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
- Towards Applying Powerful Large AI Models in Classroom Teaching: Opportunities, Challenges and Prospects
- ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- Blockwise Parallel Decoding for Deep Autoregressive Models
- Assigning AI: Seven Approaches for Students, with Prompts
- Conformal Prediction with Large Language Models for Multi-Choice Question Answering
- Attention: Marginal Probability is All You Need?
- Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
- Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
- MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
- Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
- Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
- OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
- XGen-7B Technical Report
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
- Can Programming Languages Boost Each Other via Instruction Tuning?
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
- Efficient RLHF: Reducing the Memory Usage of PPO
- Universal Self-adaptive Prompting
- ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
- Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior
- One Wide Feedforward is All You Need
- Better Zero-Shot Reasoning with Self-Adaptive Prompting
- BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models
- Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
- SoTaNa: The Open-Source Software Development Assistant
- GPT Can Solve Mathematical Problems Without a Calculator
- Physically Grounded Vision-Language Models for Robotic Manipulation
- FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
- FLM-101B: An Open LLM and How to Train It with $100K Budget
- LaMDA: Language Models for Dialog Applications
- LMDX: Language Model-based Document Information Extraction and Localization
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
- Do Multilingual Language Models Think Better in English?
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
- TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
- Textbooks Are All You Need II: phi-1.5 technical report
- Replacing softmax with ReLU in Vision Transformers
- Investigating Answerability of LLMs for Long-Form Question Answering
- Vector Search with OpenAI Embeddings: Lucene Is All You Need
- The Rise and Potential of Large Language Model Based Agents: A Survey
- Cure the headache of Transformers via Collinear Constrained Attention
- Uncovering mesa-optimization algorithms in Transformers
- Large Language Models for Compiler Optimization
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
- Chain-of-Verification Reduces Hallucination in Large Language Models
- AstroLLaMA: Towards Specialized Foundation Models in Astronomy
- Compositional Foundation Models for Hierarchical Planning
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
- Sparse Autoencoders Find Highly Interpretable Features in Language Models
- DreamLLM: Synergistic Multimodal Comprehension and Creation
- Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
- Improving Language Models with Advantage-based Offline Policy Gradients
- Improving Factuality and Reasoning in Language Models through Multiagent Debate
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
- BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
- Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
- Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants
- Boolformer: Symbolic Regression of Logic Functions with Transformers
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
- No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
- TP-Aware Dequantization
- LASER: LLM Agent with State-Space Exploration for Web Navigation
- An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
- Baichuan 2: Open Large-scale Language Models
- Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
- Efficient Benchmarking (of Language Models)
- Context is Environment
- Analyzing Transformer Dynamics as Movement through Embedding Space
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
- RMT: Retentive Networks Meet Vision Transformers
- Stack-and-Delay: a new codebook pattern for music generation
- Neurons in Large Language Models: Dead, N-gram, Positional
- Large Language Model for Science: A Study on P vs. NP
- LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models
- Petals: Collaborative Inference and Fine-tuning of Large Models
- Scaling Laws for Sparsely-Connected Foundation Models
- Kosmos-2.5: A Multimodal Literate Model
- PDFTriage: Question Answering over Long, Structured Documents
- Statistical Rejection Sampling Improves Preference Optimization
- Stabilizing RLHF through Advantage Model and Selective Rehearsal
- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
- Leveraging Contextual Information for Effective Entity Salience Detection
- NExT-GPT: Any-to-Any Multimodal LLM
- Are Emergent Abilities in Large Language Models just In-Context Learning?
- RACE: Large-scale ReAding Comprehension Dataset From Examinations
- Large-Scale Automatic Audiobook Creation
- Recovering from Privacy-Preserving Masking with Large Language Models
- Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations
- Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology
- What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
- RAIN: Your Language Models Can Align Themselves without Finetuning
- When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
- Hypothesis Search: Inductive Reasoning with Language Models
- Agents: An Open-source Framework for Autonomous Language Agents
- A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
- Gated recurrent neural networks discover attention
- Contrastive Decoding Improves Reasoning in Large Language Models
- Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts
- FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
- Adapting Large Language Models via Reading Comprehension
- DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
- MindAgent: Emergent Gaming Interaction
- Graph Neural Prompting with Large Language Models
- Sparks of Artificial General Intelligence: Early experiments with GPT-4
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- Efficient Post-training Quantization with FP8 Formats
- Taken out of context: On measuring situational awareness in LLMs
- Jointly Training Large Autoregressive Multimodal Models
- The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
- Curriculum Learning with Adam: The Devil Is in the Wrong Details
- OWL: A Large Language Model for IT Operations
- Faith and Fate: Limits of Transformers on Compositionality
- CodePlan: Repository-level Coding using LLMs and Planning
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
- SCREWS: A Modular Framework for Reasoning with Revisions
- Transformer models: an introduction and catalog
- Small-scale proxies for large-scale Transformer training instabilities
- Effective Long-Context Scaling of Foundation Models
- VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
- Qwen Technical Report
- Attention Approximates Sparse Distributed Memory
- Calibrating LLM-Based Evaluator
- Ambiguity-Aware In-Context Learning with Large Language Models
- GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
- Vision Transformers Need Registers
- Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
- AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
- Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition
- Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
- Language Modeling Is Compression
- MentalLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
- Aligning Large Multimodal Models with Factually Augmented RLHF
- Large Language Models as Optimizers
- SlimPajama-DC: Understanding Data Combinations for LLM Training
- Finite Scalar Quantization: VQ-VAE Made Simple
- Physics of Language Models: Part 3.2, Knowledge Manipulation
- Efficient Streaming Language Models with Attention Sinks
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
- LLM-grounded Video Diffusion Models
- Enable Language Models to Implicitly Learn Self-Improvement From Data
- Emergent Analogical Reasoning in Large Language Models
- RA-DIT: Retrieval-Augmented Dual Instruction Tuning
- Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
- Large Language Models Cannot Self-Correct Reasoning Yet
- SmartPlay: A Benchmark for LLMs as Intelligent Agents
Language Models Represent Space and Time | |
Retrieval meets Long Context Large Language Models | |
Borges and AI | |
Can large language models provide useful feedback on research papers? A large-scale empirical analysis | |
Ring Attention with Blockwise Transformers for Near-Infinite Context | |
Can Language Models be Instructed to Protect Personal Information? | |
QuIP: 2-Bit Quantization of Large Language Models With Guarantees | |
Who's Harry Potter? Approximate Unlearning in LLMs | |
Low-Resource Languages Jailbreak GPT-4 | |
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | |
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning | |
EcoAssistant: Using LLM Assistant More Affordably and Accurately | |
How FaR Are Large Language Models From Agents with Theory-of-Mind? | |
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | |
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation | |
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation | |
HeaP: Hierarchical Policies for Web Actions using LLMs | |
A Long Way to Go: Investigating Length Correlations in RLHF | |
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation | |
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors | |
Think before you speak: Training Language Models With Pause Tokens | |
Mistral 7B | |
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? | |
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity | |
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading | |
RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation | |
Large Language Models can Learn Rules | |
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency | |
Large Language Models Are Zero-Shot Time Series Forecasters | |
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models | |
Learning Interactive Real-World Simulators | |
FireAct: Toward Language Agent Fine-tuning | |
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining | |
Text Embeddings Reveal (Almost) As Much As Text | |
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation | |
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics | |
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models | |
Lemur: Harmonizing Natural Language and Code for Language Agents | |
LangNav: Language as a Perceptual Representation for Navigation | |
The LAMBADA dataset: Word prediction requiring a broad discourse context | |
Octopus: Embodied Vision-Language Programmer from Environmental Feedback | |
Toward Joint Language Modeling for Speech Units and Text | |
MemGPT: Towards LLMs as Operating Systems | |
A Zero-Shot Language Agent for Computer Control with Structured Reflection | |
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models | |
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training | |
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules | |
The Consensus Game: Language Model Generation via Equilibrium Search | |
Table-GPT: Table-tuned GPT for Diverse Table Tasks | |
PaLI-3 Vision Language Models: Smaller, Faster, Stronger | |
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens | |
Arbitrary Length Generalization for Addition | |
"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation | |
Deep Learning Scaling is Predictable, Empirically | |
MLQA: Evaluating Cross-lingual Extractive Question Answering | |
OpenAssistant Conversations -- Democratizing Large Language Model Alignment | |
Intersectional Bias in Hate Speech and Abusive Language Datasets | |
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | |
Reducing malicious use of synthetic media research: Considerations and potential release practices for machine learning | |
AI Ethics Issues in Real World: Evidence from AI Incident Database | |
Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models | |
BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT | |
Measuring Mathematical Problem Solving With the MATH Dataset | |
Can Machines Learn Morality? The Delphi Experiment | |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | |
UNKs Everywhere: Adapting Multilingual Language Models to New Scripts | |
AndroidEnv: A Reinforcement Learning Platform for Android | |
Demoting Racial Bias in Hate Speech Detection | |
Social Bias Frames: Reasoning about Social and Power Implications of Language | |
Characterising Bias in Compressed Models | |
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | |
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback | |
Towards Robust Toxic Content Classification | |
The Challenge of Value Alignment: from Fairer Algorithms to AI Safety | |
Towards Continual Knowledge Learning of Language Models | |
The Pushshift Reddit Dataset | |
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs | |
Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation | |
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? | |
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling | |
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack | |
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems | |
What's in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus | |
One Epoch Is All You Need | |
Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading | |
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango | |
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System | |
Plug and Play Language Models: A Simple Approach to Controlled Text Generation | |
NewsQA: A Machine Comprehension Dataset | |
AmbiPun: Generating Humorous Puns with Ambiguous Context | |
Deal or No Deal? End-to-End Learning for Negotiation Dialogues | |
Competition-Level Code Generation with AlphaCode | |
STaR: Bootstrapping Reasoning With Reasoning | |
Efficient Neural Architecture Search via Parameter Sharing | |
Recursively Summarizing Books with Human Feedback | |
Habitat: A Platform for Embodied AI Research | |
Generate & Rank: A Multi-task Framework for Math Word Problems | |
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity | |
Mitigating Statistical Bias within Differentially Private Synthetic Data | |
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning | |
RecGPT: Generative Pre-training for Text-based Recommendation | |
TruthfulQA: Measuring How Models Mimic Human Falsehoods | |
An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models | |
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks | |
Controlling Style in Generated Dialogue | |
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation | |
Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search | |
Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation | |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | |
Societal Biases in Language Generation: Progress and Challenges | |
Counterfactual Fairness in Text Classification through Robustness | |
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions | |
Deep Double Descent: Where Bigger Models and More Data Hurt | |
Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations | |
InCoder: A Generative Model for Code Infilling and Synthesis | |
Back to the Future: On Potential Histories in NLP | |
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization | |
Sharp Minima Can Generalize For Deep Nets | |
Self-attention Does Not Need $O(n^2)$ Memory | |
Measuring the Carbon Intensity of AI in Cloud Instances | |
SocialIQA: Commonsense Reasoning about Social Interactions | |
Generating Long Sequences with Sparse Transformers | |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | |
QAmeleon: Multilingual QA with Only 5 Examples | |
CTRL: A Conditional Transformer Language Model for Controllable Generation | |
Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models | |
Generating Fake Cyber Threat Intelligence Using Transformer-Based Models | |
Impact of Pretraining Term Frequencies on Few-Shot Reasoning | |
Is neural language acquisition similar to natural? A chronological probing study | |
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents | |
Buffer Overflow in Mixture of Experts | |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | |
Bag of Tricks for Efficient Text Classification | |
Automatic Detection of Machine Generated Text: A Critical Survey | |
Adversarial Training for Large Neural Language Models | |
Diffsound: Discrete Diffusion Model for Text-to-sound Generation | |
TALM: Tool Augmented Language Models | |
Training Language Models with Language Feedback | |
Toxicity in Multilingual Machine Translation at Scale | |
PEER: A Collaborative Language Model | |
On the Multilingual Capabilities of Very Large-Scale English Language Models | |
LLaMA: Open and Efficient Foundation Language Models | |
SECure: A Social and Environmental Certificate for AI Systems | |
Gaussian Error Linear Units (GELUs) | |
RoFormer: Enhanced Transformer with Rotary Position Embedding | |
Measuring Massive Multitask Language Understanding | |
ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension | |
To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making | |
Leveraging QA Datasets to Improve Generative Data Augmentation | |
Decoupled Weight Decay Regularization | |
A Distributional Approach to Controlled Text Generation | |
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | |
The Turking Test: Can Language Models Understand Instructions? | |
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | |
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation | |
Language Models (Mostly) Know What They Know | |
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned | |
Towards Understanding and Mitigating Social Biases in Language Models | |
Discovering and Categorising Language Biases in Reddit | |
Reducing Sentiment Bias in Language Models via Counterfactual Evaluation | |
Training Verifiers to Solve Math Word Problems | |
The Curse of Recursion: Training on Generated Data Makes Models Forget | |
Compositional Semantic Parsing with Large Language Models | |
Transforming Question Answering Datasets Into Natural Language Inference Datasets | |
Bringing the People Back In: Contesting Benchmark Machine Learning Datasets | |
The Values Encoded in Machine Learning Research | |
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning | |
Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems | |
Ethical and social risks of harm from Language Models | |
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems | |
Understanding HTML with Large Language Models | |
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning | |
AudioLM: a Language Modeling Approach to Audio Generation | |
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | |
Behavior Cloned Transformers are Neurosymbolic Reasoners | |
Adversarial Attacks and Defenses in Images, Graphs and Text: A Review | |
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models | |
Thou shalt not hate: Countering Online Hate Speech | |
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020) | |
Participation is not a Design Fix for Machine Learning | |
Retrieval Augmentation Reduces Hallucination in Conversation | |
Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize | |
How Many Data Samples is an Additional Instruction Worth? | |
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims | |
Crosslingual Generalization through Multitask Finetuning | |
The Curious Case of Neural Text Degeneration | |
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction | |
VinaLLaMA: LLaMA-based Vietnamese Foundation Model | |
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference | |
Evaluating the Social Impact of Generative AI Systems in Systems and Society | |
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | |
Towards A Rigorous Science of Interpretable Machine Learning | |
An Analysis of the Automatic Bug Fixing Performance of ChatGPT | |
Investigating Failures of Automatic Translation in the Case of Unambiguous Gender | |
Chat as Expected: Learning to Manipulate Black-box Neural Dialogue Models | |
Defending Against Neural Fake News | |
Analyzing Dynamic Adversarial Training Data in the Limit | |
Criticality in Formal Languages and Statistical Physics | |
Generating Wikipedia by Summarizing Long Sequences | |
Gender Bias in Contextualized Word Embeddings | |
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset | |
Deep Generative Dual Memory Network for Continual Learning | |
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | |
Persistent Anti-Muslim Bias in Large Language Models | |
Mirages: On Anthropomorphism in Dialogue Systems | |
Deep Learning for Symbolic Mathematics | |
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | |
A Survey On Universal Adversarial Attack | |
Atlas: Few-shot Learning with Retrieval Augmented Language Models | |
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | |
Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning | |
A framework for the extraction of Deep Neural Networks by leveraging public data | |
Recipes for building an open-domain chatbot | |
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent | |
Measuring the Effects of Data Parallelism on Neural Network Training | |
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports | |
Kosmos-G: Generating Images in Context with Multimodal Large Language Models | |
X-SQL: reinforce schema representation with context | |
Constructing Datasets for Multi-hop Reading Comprehension Across Documents | |
FastText.zip: Compressing text classification models | |
The State and Fate of Linguistic Diversity and Inclusion in the NLP World | |
A General Language Assistant as a Laboratory for Alignment | |
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention | |
Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly | |
Transformer tricks: Precomputing the first layer | |
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms | |
Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech | |
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model | |
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving | |
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection | |
Deep Learning Based Text Classification: A Comprehensive Review | |
Automated Hate Speech Detection and the Problem of Offensive Language | |
Multi-Dimensional Gender Bias Classification | |
Extracting Training Data from Large Language Models | |
ProsocialDialog: A Prosocial Backbone for Conversational Agents | |
Cross-Task Generalization via Natural Language Crowdsourcing Instructions | |
SPLADE-v3: New baselines for SPLADE | |
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection | |
FlowQA: Grasping Flow in History for Conversational Machine Comprehension | |
Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey | |
Improving alignment of dialogue agents via targeted human judgements | |
Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing | |
Explanation in Artificial Intelligence: Insights from the Social Sciences | |
RoBERTa: A Robustly Optimized BERT Pretraining Approach | |
Revealing Persona Biases in Dialogue Systems | |
GeDi: Generative Discriminator Guided Sequence Generation | |
Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech | |
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | |
UL2: Unifying Language Learning Paradigms | |
Self-Instruct: Aligning Language Models with Self-Generated Instructions | |
Evaluating the Underlying Gender Bias in Contextualized Word Embeddings | |
Does Gender Matter? Towards Fairness in Dialogue Systems | |
Energy and Policy Considerations for Deep Learning in NLP | |
Tools Fail: Detecting Silent Errors in Faulty Tools | |
The False Promise of Imitating Proprietary LLMs | |
Directional Bias Amplification | |
Hierarchical Text-Conditional Image Generation with CLIP Latents | |
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection | |
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | |
Task-aware Retrieval with Instructions | |
Do Prompt-Based Models Really Understand the Meaning of their Prompts? | |
Reading Wikipedia to Answer Open-Domain Questions | |
Supervising Model Attention with Human Explanations for Robust Natural Language Inference | |
Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis | |
Latent Retrieval for Weakly Supervised Open Domain Question Answering | |
Teaching language models to support answers with verified quotes | |
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension | |
MasakhaNER: Named Entity Recognition for African Languages | |
Predicting the Type and Target of Offensive Posts in Social Media | |
Learning to Model Editing Processes | |
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model | |
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering | |
Zero-Shot Fine-Grained Style Transfer: Leveraging Distributed Continuous Style Representations to Transfer To Unseen Styles | |
Quantifying the Carbon Emissions of Machine Learning | |
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping | |
Chasing Carbon: The Elusive Environmental Footprint of Computing | |
Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion | |
Distilling Reasoning Capabilities into Smaller Language Models | |
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning | |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | |
CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks | |
WebGPT: Browser-assisted question-answering with human feedback | |
Making Large Language Models Better Reasoners with Step-Aware Verifier | |
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books | |
SGPT: GPT Sentence Embeddings for Semantic Search | |
Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Arbitrary Textual Style Transfer with Small Language Models | |
Building a Conversational Agent Overnight with Dialogue Self-Play | |
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks | |
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets | |
A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection | |
Neural Machine Translation of Rare Words with Subword Units | |
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection | |
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation | |
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models | |
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | |
Know What You Don't Know: Unanswerable Questions for SQuAD | |
Longformer: The Long-Document Transformer | |
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus | |
A Constructive Prediction of the Generalization Error Across Scales | |
Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases | |
KERMIT: Generative Insertion-Based Modeling for Sequences | |
mGPT: Few-Shot Learners Go Multilingual | |
The Natural Language Decathlon: Multitask Learning as Question Answering | |
A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents | |
A Survey of Race, Racism, and Anti-Racism in NLP | |
Unraveling the Hidden Environmental Impacts of AI Solutions for Environment | |
SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding | |
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | |
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering | |
Hyperbolic Image-Text Representations | |
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey | |
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models | |
Pretraining Language Models with Human Preferences | |
Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English | |
MTEB: Massive Text Embedding Benchmark | |
Interscript: A dataset for interactive learning of scripts through error feedback | |
Looped Transformers as Programmable Computers | |
Inner Monologue: Embodied Reasoning through Planning with Language Models | |
No Language Left Behind: Scaling Human-Centered Machine Translation | |
Collaborative Storytelling with Large-scale Neural Language Models | |
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge | |
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation | |
Recipes for Safety in Open-domain Chatbots | |
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations | |
Pre-Trained Language Models for Interactive Decision-Making | |
Can Large Language Models Really Improve by Self-critiquing Their Own Plans? | |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | |
Formal Algorithms for Transformers | |
An Emulator for Fine-Tuning Large Language Models using Small Language Models | |
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
Democratizing Reasoning Ability: Tailored Learning from Large Language Model | |
HellaSwag: Can a Machine Really Finish Your Sentence? | |
Teaching Language Models to Self-Improve through Interactive Demonstrations | |
Ranking LLM-Generated Loop Invariants for Program Verification | |
Approximating Two-Layer Feedforward Networks for Efficient Transformers | |
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets | |
When can transformers reason with abstract symbols? | |
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models | |
Language Models are Few-shot Multilingual Learners | |
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP | |
AutoMix: Automatically Mixing Language Models | |
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models | |
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | |
Pre-trained Summarization Distillation | |
TEQ: Trainable Equivalent Transformation for Quantization of LLMs | |
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning | |
Improving Large Language Model Fine-tuning for Solving Math Problems | |
Language Models are General-Purpose Interfaces | |
Llemma: An Open Language Model For Mathematics | |
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | |
Gender Bias in Machine Translation | |
Towards a Human-like Open-Domain Chatbot | |
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation | |
A Network-based End-to-End Trainable Task-oriented Dialogue System | |
Safe RLHF: Safe Reinforcement Learning from Human Feedback | |
Cloze-driven Pretraining of Self-attention Networks | |
Universal Language Model Fine-tuning for Text Classification | |
OPT: Open Pre-trained Transformer Language Models | |
Towards Zero-Label Language Learning | |
GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems | |
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models | |
Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models | |
Fine-tuned Language Models are Continual Learners | |
3D-GPT: Procedural 3D Modeling with Large Language Models | |
PAL: Program-aided Language Models | |
Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning | |
Large Language Models for Software Engineering: Survey and Open Problems | |
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots | |
Self-critiquing models for assisting human evaluators | |
Towards Understanding Sycophancy in Language Models | |
SALMONN: Towards Generic Hearing Abilities for Large Language Models | |
Finetuned Language Models Are Zero-Shot Learners | |
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them | |
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search | |
Generating Sequences by Learning to Self-Correct | |
The Depth-to-Width Interplay in Self-Attention | |
Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning | |
Internet-augmented language models through few-shot prompting for open-domain question answering | |
GLM-130B: An Open Bilingual Pre-trained Model | |
Three scenarios for continual learning | |
Eureka: Human-Level Reward Design via Coding Large Language Models | |
GPT-NeoX-20B: An Open-Source Autoregressive Language Model | |
An Explanation of In-context Learning as Implicit Bayesian Inference | |
AgentTuning: Enabling Generalized Agent Abilities for LLMs | |
Snapshot Ensembles: Train 1, get M for free | |
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model | |
On the Planning Abilities of Large Language Models -- A Critical Investigation | |
Efficient Estimation of Word Representations in Vector Space | |
Visualizing the Loss Landscape of Neural Nets | |
Contrastive Preference Learning: Learning from Human Feedback without RL | |
High-Resolution Image Synthesis with Latent Diffusion Models | |
I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents | |
H2O Open Ecosystem for State-of-the-art Large Language Models | |
Calibrate Before Use: Improving Few-Shot Performance of Language Models | |
All-in-One Image-Grounded Conversational Agents | |
Interactive Task Planning with Language Models | |
Can AI-Generated Text be Reliably Detected? | |
BitNet: Scaling 1-bit Transformers for Large Language Models | |
Scaling Laws for Neural Language Models | |
Self-Refine: Iterative Refinement with Self-Feedback | |
Adversarial Environment Generation for Learning to Navigate the Web | |
Cross-Lingual Language Model Meta-Pretraining | |
Creative Robot Tool Use with Large Language Models | |
Simple and Effective Multi-Paragraph Reading Comprehension | |
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | |
VeRA: Vector-based Random Matrix Adaptation | |
Open-Ended Learning Leads to Generally Capable Agents | |
Exploring the Boundaries of GPT-4 in Radiology | |
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs | |
High-Dimensional Continuous Control Using Generalized Advantage Estimation | |
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | |
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | |
Eliciting Human Preferences with Language Models | |
One-Shot Learning from a Demonstration with Hierarchical Latent Language | |
OpenAgents: An Open Platform for Language Agents in the Wild | |
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation | |
Specific versus General Principles for Constitutional AI | |
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | |
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | |
Task2Vec: Task Embedding for Meta-Learning | |
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams | |
Tuna: Instruction Tuning using Feedback from Large Language Models | |
In-Context Pretraining: Language Modeling Beyond Document Boundaries | |
Self-Consistency Improves Chain of Thought Reasoning in Language Models | |
Transcending Scaling Laws with 0.1% Extra Compute | |
InstructExcel: A Benchmark for Natural Language Instruction in Excel | |
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing | |
Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning | |
A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets | |
Understanding Retrieval Augmentation for Long-Form Question Answering | |
A Neural Conversational Model | |
Exploring the Limits of Language Modeling | |
Scaling Instruction-Finetuned Language Models | |
Learning Performance-Improving Code Edits | |
Training Compute-Optimal Large Language Models | |
Instruction Tuning with GPT-4 | |
Holistic Evaluation of Language Models | |
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | |
Large Language Models as Analogical Reasoners | |
Negative Training for Neural Dialogue Response Generation | |
On the Opportunities and Risks of Foundation Models | |
Dissecting In-Context Learning of Translations in GPTs | |
Carbon Emissions and Large Neural Network Training | |
Faithful Reasoning Using Large Language Models | |
Detecting Pretraining Data from Large Language Models | |
Motif: Intrinsic Motivation from Artificial Intelligence Feedback | |
Unified Language Model Pre-training for Natural Language Understanding and Generation | |
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | |
Predictability and Surprise in Large Generative Models | |
Alignment of Language Agents | |
Zephyr: Direct Distillation of LM Alignment | |
Binding Language Models in Symbolic Languages | |
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | |
The Evolved Transformer | |
Detecting Hate Speech with GPT-3 | |
Learning to summarize from human feedback | |
Efficient Large Scale Language Modeling with Mixtures of Experts | |
Jailbreaking Black Box Large Language Models in Twenty Queries | |
How do Language Models Bind Entities in Context? | |
Program Synthesis with Large Language Models | |
Challenges in Detoxifying Language Models | |
A Deep Reinforced Model for Abstractive Summarization | |
Moral Foundations of Large Language Models | |
Training Production Language Models without Memorizing User Data | |
A Deep Reinforcement Learning Chatbot | |
RT-1: Robotics Transformer for Real-World Control at Scale | |
Entity Tracking in Language Models | |
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval | |
Controlled Decoding from Language Models | |
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models | |
FP8-LM: Training FP8 Large Language Models | |
The Perils & Promises of Fact-checking with Large Language Models | |
Imitation versus Innovation: What children can do that large language and language-and-vision models cannot (yet)? | |
Unsolved Problems in ML Safety | |
Woodpecker: Hallucination Correction for Multimodal Large Language Models | |
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications | |
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time | |
Data-Centric Financial Large Language Models | |
CodeFusion: A Pre-trained Diffusion Model for Code Generation | |
TRAMS: Training-free Memory Selection for Long-range Language Modeling | |
Personas as a Way to Model Truthfulness in Language Models | |
PockEngine: Sparse and Efficient Fine-tuning in a Pocket | |
LLM-FP4: 4-Bit Floating-Point Quantized Transformers | |
CLEX: Continuous Length Extrapolation for Large Language Models | |
ALCUNA: Large Language Models Meet New Knowledge | |
JudgeLM: Fine-tuned Large Language Models are Scalable Judges | |
Large Language Models as Generalizable Policies for Embodied Tasks | |
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers | |
ControlLLM: Augment Language Models with Tools by Searching on Graphs | |
Linear Representations of Sentiment in Large Language Models | |
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B | |
The Generative AI Paradox: "What It Can Create, It May Not Understand" | |
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving | |
MM-VID: Advancing Video Understanding with GPT-4V(ision) | |
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation | |
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | |
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing | |
ChipNeMo: Domain-Adapted LLMs for Chip Design | |
What's In My Big Data? | |
Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve | |
Idempotent Generative Network | |
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning | |
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | |
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models | |
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? | |
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise | |
NEFTune: Noisy Embeddings Improve Instruction Finetuning | |
The Impact of Depth and Width on Transformer Language Model Generalization | |
FlashDecoding++: Faster Large Language Model Inference on GPUs | |
Skywork: A More Open Bilingual Foundation Model | |
GRIM: GRaph-based Interactive narrative visualization for gaMes | |
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery | |
Does GPT-4 Pass the Turing Test? | |
Text Rendering Strategies for Pixel Language Models | |
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling | |
Learning From Mistakes Makes LLM Better Reasoner | |
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning | |
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation | |
Ultra-Long Sequence Distributed Transformer | |
Ziya2: Data-centric Learning is All LLMs Need | |
GLaMM: Pixel Grounding Large Multimodal Model | |
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | |
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | |
Unveiling Safety Vulnerabilities of Large Language Models | |
Prompt Cache: Modular Attention Reuse for Low-Latency Inference | |
Levels of AGI: Operationalizing Progress on the Path to AGI | |
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model | |
Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning | |
Co-training and Co-distillation for Quality Improvement and Compression of Language Models | |
CogVLM: Visual Expert for Pretrained Language Models | |
Tailoring Self-Rationalizers with Multi-Reward Distillation | |
NExT-Chat: An LMM for Chat, Detection and Segmentation | |
The Efficiency Misnomer | |
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion | |
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs | |
Training Dynamics of Contextual N-Grams in Language Models | |
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents | |
Large Language Models Understand and Can be Enhanced by Emotional Stimuli | |
Gzip versus bag-of-words for text classification | |
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models | |
GPT4All: An Ecosystem of Open Source Compressed Language Models | |
Evaluating Large Language Models: A Comprehensive Survey | |
Leveraging Large Language Models for Automated Proof Synthesis in Rust | |
GPTScore: Evaluate as You Desire | |
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | |
S-LoRA: Serving Thousands of Concurrent LoRA Adapters | |
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency | |
Finding Neurons in a Haystack: Case Studies with Sparse Probing | |
Simple and Controllable Music Generation | |
Can LLMs Follow Simple Rules? | |
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM | |
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models | |
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning | |
Memory Augmented Language Models through Mixture of Word Experts | |
Language Models can be Logical Solvers | |
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | |
ADaPT: As-Needed Decomposition and Planning with Language Models | |
FinGPT: Large Generative Models for a Small Language | |
Simplifying Transformer Blocks | |
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs | |
Prompt Engineering a Prompt Engineer | |
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | |
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves | |
Accelerating Large Language Model Decoding with Speculative Sampling | |
Alternating Updates for Efficient Transformers | |
White-Box Transformers via Sparse Rate Reduction | |
ChatAnything: Facetime Chat with LLM-Enhanced Personas | |
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data | |
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 | |
LayoutPrompter: Awaken the Design Ability of Large Language Models | |
Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations | |
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation | |
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning | |
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text | |
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | |
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models | |
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer | |
Trusted Source Alignment in Large Language Models | |
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations | |
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | |
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?" | |
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster | |
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure | |
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | |
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming | |
The ART of LLM Refinement: Ask, Refine, and Trust | |
Fine-tuning Language Models for Factuality | |
A Survey on Language Models for Code | |
DiLoCo: Distributed Low-Communication Training of Language Models | |
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks | |
Fusion-Eval: Integrating Evaluators with LLMs | |
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers | |
SiRA: Sparse Mixture of Low Rank Adaptation | |
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives | |
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation | |
UT5: Pretraining Non-autoregressive T5 with unrolled denoising | |
Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models | |
Tied-LoRA: Enhancing parameter efficiency of LoRA with weight tying | |
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models | |
Contrastive Chain-of-Thought Prompting | |
Learning to Filter Context for Retrieval-Augmented Generation | |
Large Language Models for Automated Open-domain Scientific Hypotheses Discovery | |
M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models | |
System 2 Attention (is something you might need too) | |
GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration | |
Language Models are Multilingual Chain-of-Thought Reasoners | |
ProAgent: From Robotic Process Automation to Agentic Process Automation | |
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers | |
Exponentially Faster Language Modelling | |
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 | |
ToolTalk: Evaluating Tool-Usage in a Conversational Setting | |
Testing Language Model Agents Safely in the Wild | |
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort | |
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning | |
Orca 2: Teaching Small Language Models How to Reason | |
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections | |
On Leakage of Code Generation Evaluation Datasets | |
GPQA: A Graduate-Level Google-Proof Q&A Benchmark | |
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | |
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning | |
SelfEval: Leveraging the discriminative nature of generative models for evaluation | |
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | |
UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework | |
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores | |
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
HiPPO: Recurrent Memory with Optimal Polynomial Projections | |
Transformer Memory as a Differentiable Search Index | |
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | |
DeiT III: Revenge of the ViT | |
Scaling Vision Transformers to 22 Billion Parameters | |
On Calibration of Modern Neural Networks | |
A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks | |
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers | |
Attention Is All You Need | |
Acceleration via Fractal Learning Rate Schedules | |
Transformers learn in-context by gradient descent | |
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models | |
Toy Models of Superposition | |
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis | |
Unified Scaling Laws for Routed Language Models | |
CLIPPO: Image-and-Language Understanding from Pixels Only | |
Task-Specific Skill Localization in Fine-tuned Language Models | |
Discovering Latent Knowledge in Language Models Without Supervision | |
OCR-free Document Understanding Transformer | |
Language Models are Few-Shot Learners | |
Progress measures for grokking via mechanistic interpretability | |
Learning Transferable Visual Models From Natural Language Supervision | |
Zero-Shot Text-to-Image Generation | |
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models | |
muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems | |
Language Models as Agent Models | |
Learning Models of Individual Behavior in Chess | |
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning | |
Ask Me Anything: A simple strategy for prompting language models | |
Training language models to follow instructions with human feedback | |
Sequence to Sequence Learning with Neural Networks | |
SegGPT: Segmenting Everything In Context | |
A data-driven approach for learning to control computers | |
Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation | |
Unifying Vision, Text, and Layout for Universal Document Processing | |
Memorizing Transformers | |
GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling | |
Beyond Memorization: Violating Privacy Via Inference with Large Language Models | |
A Succinct Summary of Reinforcement Learning | |
Symbolic Discovery of Optimization Algorithms | |
Confronting Reward Model Overoptimization with Constrained RLHF | |
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | |
A Cookbook of Self-Supervised Learning | |
Training Language Models with Language Feedback at Scale | |
Answering Questions by Meta-Reasoning over Multiple Chains of Thought | |
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment | |
SemDeDup: Data-efficient learning at web-scale through semantic deduplication | |
Adversarial Examples for Evaluating Reading Comprehension Systems | |
Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction | |
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP | |
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning | |
ImageBind: One Embedding Space To Bind Them All | |
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks | |
Scaling Data-Constrained Language Models | |
Efficient LLM Inference on CPUs | |
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models | |
Efficiently Scaling Transformer Inference | |
One Model To Learn Them All | |
Brain decoding: toward real-time reconstruction of visual perception | |
GLU Variants Improve Transformer | |
Vision Transformers with Mixed-Resolution Tokenization | |
HyperNetworks | |
InRank: Incremental Low-Rank Learning | |
Text-to-Image Diffusion Models are Zero-Shot Classifiers | |
CoBIT: A Contrastive Bi-directional Image-Text Generation Model | |
MAGVLT: Masked Generative Vision-and-Language Transformer | |
DINOv2: Learning Robust Visual Features without Supervision | |
What learning algorithm is in-context learning? Investigations with linear models | |
Any-to-Any Generation via Composable Diffusion | |
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints | |
Shortformer: Better Language Modeling using Shorter Inputs | |
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity | |
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture | |
PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
The alignment problem from a deep learning perspective | |
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | |
Jailbreaking is Best Solved by Definition | |
Multimodal Analogical Reasoning over Knowledge Graphs | |
Segment Everything Everywhere All at Once | |
DocPrompting: Generating Code by Retrieving the Docs | |
Emergent Tool Use From Multi-Agent Autocurricula | |
Root Mean Square Layer Normalization | |
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans | |
Efficient Training of Language Models to Fill in the Middle | |
AI for Mathematics: A Cognitive Science Perspective | |
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators | |
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? | |
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | |
The First Room-Temperature Ambient-Pressure Superconductor | |
Segment Anything | |
Less is More: Parameter-Free Text Classification with Gzip | |
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions | |
A Generalist Agent | |
Meet in the Middle: A New Pre-training Paradigm | |
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations | |
Can Humans Do Less-Than-One-Shot Learning? | |
Diffusion-LM Improves Controllable Text Generation | |
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking | |
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets | |
Text-to-3D using Gaussian Splatting | |
Precise Zero-Shot Dense Retrieval without Relevance Labels | |
Brainformers: Trading Simplicity for Efficiency | |
DETRs Beat YOLOs on Real-time Object Detection | |
OtterHD: A High-Resolution Multi-modality Model | |
Rethinking the Role of Token Retrieval in Multi-Vector Retrieval | |
ConvNets Match Vision Transformers at Scale | |
Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models | |
Scaling Robot Learning with Semantically Imagined Experience | |
Do LLMs exhibit human-like response biases? A case study in survey design | |
READ: Recurrent Adaptation of Large Transformers | |
Benchmarking Neural Network Training Algorithms | |
Automatic Gradient Descent: Deep Learning without Hyperparameters | |
Layer Normalization | |
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | |
Implicit Representations of Meaning in Neural Language Models | |
Calibrated Chaos: Variance Between Runs of Neural Network Training is Harmless and Inevitable | |
SqueezeLLM: Dense-and-Sparse Quantization | |
Optimisation & Generalisation in Networks of Neurons | |
Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals | |
Transformers as Recognizers of Formal Languages: A Survey on Expressivity | |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | |
Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks | |
Decoupled Context Processing for Context Augmented Language Modeling | |
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | |
The Transient Nature of Emergent In-Context Learning in Transformers | |
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning | |
Matryoshka Diffusion Models | |
Show Your Work: Scratchpads for Intermediate Computation with Language Models | |
Beyond neural scaling laws: beating power law scaling via data pruning | |
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? | |
Going Deeper with Convolutions | |
TimeGPT-1 | |
Capabilities of GPT-4 on Medical Challenge Problems | |
Training Large Language Models Efficiently with Sparsity and Dataflow | |
Optimal Policies Tend to Seek Power | |
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | |
Thinking Like Transformers | |
Why think step by step? Reasoning emerges from the locality of experience | |
Mixture-of-Experts with Expert Choice Routing | |
GPT-4 Technical Report | |
Scaling Expert Language Models with Unsupervised Domain Discovery | |
End-to-End Spatio-Temporal Action Localisation with Video Transformers | |
Mass-Editing Memory in a Transformer | |
Erasing Concepts from Diffusion Models | |
Physics of Language Models: Part 1, Context-Free Grammar | |
Flamingo: a Visual Language Model for Few-Shot Learning | |
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs | |
Semantic Tokenizer for Enhanced Natural Language Processing | |
On Limitations of the Transformer Architecture | |
A Survey of Large Language Models | |
Affordances from Human Videos as a Versatile Representation for Robotics | |
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | |
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | |
Conditioning Predictive Models: Risks and Strategies | |
Implicit Chain of Thought Reasoning via Knowledge Distillation | |
Scaling Laws for Transfer | |
Risks from Learned Optimization in Advanced Machine Learning Systems | |
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression | |
Bayesian Optimization of Catalysts With In-context Learning | |
Teach LLMs to Phish: Stealing Private Information from Language Models | |
LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization | |
Knowledge Graphs | |
Language Modelling with Pixels | |
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | |
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | |
Chinchilla Scaling: A replication attempt | |
Retrofitting Word Vectors to Semantic Lexicons | |
CoLT5: Faster Long-Range Transformers with Conditional Computation | |
Deep contextualized word representations | |
Boosted Prompt Ensembles for Large Language Models | |
Recurrent Memory Transformer | |
Multitask Prompted Training Enables Zero-Shot Task Generalization | |
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs | |
Monarch: Expressive Structured Matrices for Efficient and Accurate Training | |
On the Turing Completeness of Modern Neural Network Architectures | |
Generalized Out-of-Distribution Detection: A Survey | |
AugGPT: Leveraging ChatGPT for Text Data Augmentation | |
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism | |
SLiC-HF: Sequence Likelihood Calibration with Human Feedback | |
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | |
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold | |
Human-Timescale Adaptation in an Open-Ended Task Space | |
Sigmoid Loss for Language Image Pre-Training | |
OpenScene: 3D Scene Understanding with Open Vocabularies | |
Nougat: Neural Optical Understanding for Academic Documents | |
SoundStorm: Efficient Parallel Audio Generation | |
Text and Code Embeddings by Contrastive Pre-Training | |
Fine-Tuning Language Models from Human Preferences | |
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT | |
Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models | |
Effective Theory of Transformers at Initialization | |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | |
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models | |
Natural Selection Favors AIs over Humans | |
ART: Automatic multi-step reasoning and tool-use for large language models | |
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection | |
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models | |
Visual Instruction Tuning | |
Efficiently Modeling Long Sequences with Structured State Spaces | |
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges | |
Mastering Diverse Domains through World Models | |
Simplified State Space Layers for Sequence Modeling | |
Offline RL for Natural Language Generation with Implicit Language Q Learning | |
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
Deduplicating Training Data Mitigates Privacy Risks in Language Models | |
Self-supervised Learning: Generative or Contrastive | |
Towards Automated Circuit Discovery for Mechanistic Interpretability | |
Neural Story Planning | |
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training | |
Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements | |
Dota 2 with Large Scale Deep Reinforcement Learning | |
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability | |
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head | |
The Matrix Calculus You Need For Deep Learning | |
ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models | |
DeepNet: Scaling Transformers to 1,000 Layers | |
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens | |
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection | |
LLMs cannot find reasoning errors, but can correct them! | |
Pretraining Without Attention | |
Large language models are not zero-shot communicators | |
Semi-supervised Sequence Learning | |
Improving language models by retrieving from trillions of tokens | |
Synthetic Data from Diffusion Models Improves ImageNet Classification | |
Level Generation Through Large Language Models | |
How Does Generative Retrieval Scale to Millions of Passages? | |
State Spaces Aren't Enough: Machine Translation Needs Attention | |
Data Distributional Properties Drive Emergent In-Context Learning in Transformers | |
Evaluating Large Language Models Trained on Code | |
Injecting structural hints: Using language models to study inductive biases in language learning | |
The case for 4-bit precision: k-bit Inference Scaling Laws | |
Divide-or-Conquer? Which Part Should You Distill Your LLM? | |
Downstream Datasets Make Surprisingly Good Pretraining Corpora | |
ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark | |
Fast Transformer Decoding: One Write-Head is All You Need | |
NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities | |
Towards Deep Learning Models Resistant to Adversarial Attacks | |
A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards | |
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok | |
Large Language Models as General Pattern Machines | |
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models | |
Fast and forward stable randomized algorithms for linear least-squares problems | |
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training | |
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models | |
Twist Decoding: Diverse Generators Guide Each Other | |
Monolith: Real Time Recommendation System With Collisionless Embedding Table | |
On-Device Training Under 256KB Memory | |
Meta-Learning in Neural Networks: A Survey | |
The Linear Representation Hypothesis and the Geometry of Large Language Models | |
The Power of Scale for Parameter-Efficient Prompt Tuning | |
LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction | |
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention | |
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers | |
GLM: General Language Model Pretraining with Autoregressive Blank Infilling | |
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference | |
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning | |
Spreading vectors for similarity search | |
REFINER: Reasoning Feedback on Intermediate Representations | |
Learning to Learn Faster from Human Feedback with Language Model Predictive Control | |
Low-code LLM: Visual Programming over LLMs | |
Decoding speech perception from non-invasive brain recordings | |
Towards Agile Text Classifiers for Everyone | |
Cramming: Training a Language Model on a Single GPU in One Day | |
Text-to-Table: A New Way of Information Extraction | |
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP | |
WizardLM: Empowering Large Language Models to Follow Complex Instructions | |
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints | |
ViperGPT: Visual Inference via Python Execution for Reasoning | |
Spatial-Language Attention Policies for Efficient Robot Learning | |
Improved Baselines with Visual Instruction Tuning | |
Decision Transformer: Reinforcement Learning via Sequence Modeling | |
What Algorithms can Transformers Learn? A Study in Length Generalization | |
Tracking Everything Everywhere All at Once | |
Bad Global Minima Exist and SGD Can Reach Them | |
Directly Fine-Tuning Diffusion Models on Differentiable Rewards | |
Fine-Tuning LLaMA for Multi-Stage Text Retrieval | |
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | |
EVA-CLIP: Improved Training Techniques for CLIP at Scale | |
Optimizing Memory Mapping Using Deep Reinforcement Learning | |
A General Theoretical Paradigm to Understand Learning from Human Preferences | |
Beyond Words: A Comprehensive Survey of Sentence Representations | |
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training | |
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought | |
Adding Gradient Noise Improves Learning for Very Deep Networks | |
Positional Description Matters for Transformers Arithmetic | |
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up? | |
Calibrated Language Models Must Hallucinate | |
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks | |
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement | |
Online Decision Transformer | |
Benchmarking Large Language Models for News Summarization | |
Overthinking the Truth: Understanding how Language Models Process False Demonstrations | |
Scalable Extraction of Training Data from (Production) Language Models | |
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is? | |
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | |
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization | |
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization | |
Visual In-Context Prompting | |
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | |
GAIA: a benchmark for General AI Assistants | |
More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory | |
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia | |
Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text | |
Chain-of-Thought Reasoning is a Policy Improvement Operator | |
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | |
Thinking Fast and Slow in Large Language Models | |
Towards Accurate Differential Diagnosis with Large Language Models | |
Mamba: Linear-Time Sequence Modeling with Selective State Spaces | |
Vanishing Gradients in Reinforcement Finetuning of Language Models | |
The History and Risks of Reinforcement Learning and Human Feedback | |
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning | |
Video Language Planning | |
Thread of Thought Unraveling Chaotic Contexts | |
PaSS: Parallel Speculative Sampling | |
SeaLLMs -- Large Language Models for Southeast Asia | |
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models | |
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models | |
An LLM Compiler for Parallel Function Calling | |
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation | |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | |
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey | |
Magicoder: Source Code Is All You Need | |
SILC: Improving Vision Language Pretraining with Self-Distillation | |
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | |
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | |
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents | |
An Early Evaluation of GPT-4V(ision) | |
Farzi Data: Autoregressive Data Distillation | |
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models | |
One Embedder, Any Task: Instruction-Finetuned Text Embeddings | |
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents | |
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want | |
Towards a Unified View of Parameter-Efficient Transfer Learning | |
Beyond Surface: Probing LLaMA Across Scales and Layers | |
TiC-CLIP: Continual Training of CLIP Models | |
GPT4Point: A Unified Framework for Point-Language Understanding and Generation | |
GOAT: GO to Any Thing | |
Nash Learning from Human Feedback | |
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs | |
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency | |
Axiomatic Preference Modeling for Longform Question Answering | |
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling | |
Efficient Monotonic Multihead Attention | |
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings | |
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | |
Are LLMs Useful in the Poorest Schools? theTeacherAI in Sierra Leone | |
De-Diffusion Makes Text a Strong Cross-Modal Interface | |
Dolphins: Multimodal Language Model for Driving | |
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture | |
Efficient Transformer Knowledge Distillation: A Performance Review | |
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | |
Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments | |
Instruction-tuning Aligns LLMs to the Human Brain | |
Large Language Model Alignment: A Survey | |
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | |
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics | |
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models | |
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs | |
Instruction-Following Evaluation for Large Language Models | |
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs | |
Pre-Training to Learn in Context | |
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks | |
Large Language Models for Mathematicians | |
WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words | |
Language Model Inversion | |
Training Chain-of-Thought via Latent-Variable Inference | |
The Quantization Model of Neural Scaling | |
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses | |
TinyGSM: achieving >80% on GSM8k with small language models | |
Context Tuning for Retrieval Augmented Generation | |
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning | |
TigerBot: An Open Multilingual Multitask LLM | |
PromptBench: A Unified Library for Evaluation of Large Language Models | |
Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions | |
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models | |
Challenges with unsupervised LLM knowledge discovery | |
A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges | |
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning | |
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | |
Honeybee: Locality-enhanced Projector for Multimodal LLM | |
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation | |
ProTIP: Progressive Tool Retrieval Improves Planning | |
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets | |
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models | |
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding | |
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection | |
Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models | |
SparQ Attention: Bandwidth-Efficient LLM Inference | |
Silkie: Preference Distillation for Large Visual Language Models | |
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models | |
Algorithmic Collusion by Large Language Models | |
Mathematical Language Models: A Survey | |
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention | |
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects | |
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes | |
Pixel Aligned Language Models | |
PathFinder: Guided Search over Multi-Step Reasoning Paths | |
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models | |
Vision-Language Models as a Source of Rewards | |
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations | |
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3" | |
Language-Informed Visual Concept Learning | |
Evaluation of Large Language Models for Decision Making in Autonomous Driving | |
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent | |
Extending Context Window of Large Language Models via Semantic Compression | |
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions | |
Formal Aspects of Language Modeling | |
Large Language Models on Graphs: A Comprehensive Survey | |
Merlin: Empowering Multimodal LLMs with Foresight Minds | |
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey | |
"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming | |
Generating Illustrated Instructions | |
Alignment for Honesty | |
Paloma: A Benchmark for Evaluating Language Model Fit | |
Self-Evaluation Improves Selective Generation in Large Language Models | |
Nomic Embed: Training a Reproducible Long Context Text Embedder | |
Rejuvenating image-GPT as Strong Visual Representation Learners | |
Object Recognition as Next Token Prediction | |
Foundation Models in Robotics: Applications, Challenges, and the Future | |
Distributed Inference and Fine-tuning of Large Language Models Over The Internet | |
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning | |
Data Management For Large Language Models: A Survey | |
AtP*: An efficient and scalable method for localizing LLM behaviour to components | |
Knowledge Distillation of Large Language Models | |
Faithful Persona-based Conversational Dataset Generation with Large Language Models | |
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze! | |
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks | |
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism | |
Localized Symbolic Knowledge Distillation for Visual Commonsense Models | |
Weight subcloning: direct initialization of transformers using larger pretrained ones | |
Segment and Caption Anything | |
Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation | |
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models | |
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator | |
OneLLM: One Framework to Align All Modalities with Language | |
Steering Llama 2 via Contrastive Activation Addition | |
VILA: On Pre-training for Visual Language Models | |
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions | |
HyperAttention: Long-context Attention in Near-Linear Time | |
LLM360: Towards Fully Transparent Open-Source LLMs | |
Efficient Transformers with Dynamic Token Pooling | |
GIVT: Generative Infinite-Vocabulary Transformers | |
Modeling Context in Referring Expressions | |
The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes | |
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise | |
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model | |
Text-Conditioned Resampler For Long Form Video Understanding | |
Gemini: A Family of Highly Capable Multimodal Models | |
LLMs are Not Just Next Token Predictors | |
LLM in a flash: Efficient Large Language Model Inference with Limited Memory | |
Cascade Speculative Drafting for Even Faster LLM Inference | |
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model | |
VideoPoet: A Large Language Model for Zero-Shot Video Generation | |
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | |
AppAgent: Multimodal Agents as Smartphone Users | |
Time is Encoded in the Weights of Finetuned Language Models | |
Generative Multimodal Models are In-Context Learners | |
Cached Transformers: Improving Transformers with Differentiable Memory Cache | |
Mini-GPTs: Efficient Large Language Models through Contextual Pruning | |
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | |
An In-depth Look at Gemini's Language Abilities | |
Retrieval-Augmented Generation for Large Language Models: A Survey | |
Intriguing Properties of Quantization at Scale | |
Parrot Captions Teach CLIP to Spot Text | |
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math | |
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning | |
YAYI 2: Multilingual Open-Source Large Language Models | |
Reasons to Reject? Aligning Language Models with Judgments | |
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation | |
LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding | |
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion | |
Exploiting Novel GPT-4 APIs | |
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | |
VCoder: Versatile Vision Encoders for Multimodal Large Language Models | |
PreCog: Exploring the Relation between Memorization and Performance in Pre-trained Language Models | |
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases | |
LLM4VG: Large Language Models Evaluation for Video Grounding | |
Shai: A large language model for asset management | |
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation | |
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment | |
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 | |
Supervised Knowledge Makes Large Language Models Better In-context Learners | |
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | |
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases | |
The LLM Surgeon | |
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action | |
MobileVLM: A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices | |
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | |
Task Contamination: Language Models May Not Be Few-Shot Anymore | |
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training | |
Learning Vision from Models Rivals Learning Vision from Data | |
TinyLlama: An Open-Source Small Language Model | |
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models | |
PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity Compensation | |
Making Large Language Models A Better Foundation For Dense Retrieval | |
LARP: Language-Agent Role Play for Open-World Games | |
A Survey of Reasoning with Foundation Models | |
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape | |
Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs | |
Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks | |
Towards the Law of Capacity Gap in Distilling Language Models | |
At Which Training Stage Does Code Data Help LLMs Reasoning? | |
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve | |
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery | |
STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition | |
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers | |
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | |
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning | |
A Comprehensive Study of Knowledge Editing for Large Language Models | |
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM | |
Orion-14B: Open-source Multilingual Large Language Models | |
LLaMA Beyond English: An Empirical Study on Language Capability Transfer | |
DocLLM: A layout-aware generative language model for multimodal document understanding | |
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training | |
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | |
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models | |
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models | |
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws | |
GeoGalactica: A Scientific Large Language Model in Geoscience | |
Improving Text Embeddings with Large Language Models | |
Boosting Large Language Model for Speech Synthesis: An Empirical Study | |
TrustLLM: Trustworthiness in Large Language Models | |
Unicron: Economizing Self-Healing LLM Training at Scale | |
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining | |
Proving Test Set Contamination in Black Box Language Models | |
LLaMA Pro: Progressive LLaMA with Block Expansion | |
LLM Augmented LLMs: Expanding Capabilities through Composition | |
LLaVA-$φ$: Efficient Multi-Modal Assistant with Small Language Model | |
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers | |
Understanding LLMs: A Comprehensive Overview from Training to Inference | |
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers | |
A Vision Check-up for Language Models | |
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts | |
Multilingual Instruction Tuning With Just a Pinch of Multilinguality | |
WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope | |
GPT-4V(ision) is a Generalist Web Agent, if Grounded | |
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs | |
Mind2Web: Towards a Generalist Agent for the Web | |
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism | |
DocGraphLM: Documental Graph Language Model for Information Extraction | |
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache | |
TOFU: A Task of Fictitious Unlearning for LLMs | |
Transformers are Multi-State RNNs | |
Secrets of RLHF in Large Language Models Part II: Reward Modeling | |
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | |
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages | |
A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism | |
Towards Conversational Diagnostic AI | |
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training | |
Efficient LLM inference solution on Intel GPU | |
I am a Strange Dataset: Metalinguistic Tests for Language Models | |
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk | |
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models | |
The Impact of Reasoning Step Length on Large Language Models | |
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models | |
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding | |
Mixtral of Experts | |
ChatQA: Building GPT-4 Level Conversational QA Models | |
TeleChat Technical Report | |
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | |
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon | |
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding | |
Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach | |
MaLA-500: Massive Language Adaptation of Large Language Models | |
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks | |
Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion? | |
State of What Art? A Call for Multi-Prompt LLM Evaluation | |
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting | |
Compressing Context to Enhance Inference Efficiency of Large Language Models | |
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks | |
VMamba: Visual State Space Model | |
DiffusionGPT: LLM-Driven Text-to-Image Generation System | |
Self-Rewarding Language Models | |
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | |
Asynchronous Local-SGD Training for Language Modeling | |
ReFT: Reasoning with Reinforced Fine-Tuning | |
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | |
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference | |
Tuning Language Models by Proxy | |
Scalable Pre-training of Large Autoregressive Image Models | |
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation | |
Extending LLMs' Context Window with 100 Samples | |
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models | |
SPADE: Synthesizing Assertions for Large Language Model Pipelines | |
Foundations of Vector Retrieval | |
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation | |
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | |
Evaluating the Moral Beliefs Encoded in LLMs | |
Boosting Theory-of-Mind Performance in Large Language Models via Prompting | |
MambaByte: Token-free Selective State Space Model | |
RakutenAI-7B: Extending Large Language Models for Japanese | |
MM-LLMs: Recent Advances in MultiModal Large Language Models | |
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents | |
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding | |
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study | |
Small Language Model Meets with Reinforced Vision Vocabulary | |
WARM: On the Benefits of Weight Averaged Reward Models | |
In-Context Learning for Extreme Multi-Label Classification | |
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | |
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text | |
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark | |
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs | |
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models | |
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion | |
What Are Tools Anyway? A Survey from the Language Model Perspective | |
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models | |
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection | |
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment | |
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation | |
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | |
Mission: Impossible Language Models | |
Benchmarking LLMs via Uncertainty Quantification | |
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models | |
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering | |
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence | |
H2O-Danube-1.8B Technical Report | |
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design | |
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion | |
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI | |
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models | |
Representation Engineering: A Top-Down Approach to AI Transparency | |
LongAlign: A Recipe for Long Context Alignment of Large Language Models | |
Scavenging Hyena: Distilling Transformers into Long Convolution Models | |
Efficient Tool Use with Chain-of-Abstraction Reasoning | |
YOLO-World: Real-Time Open-Vocabulary Object Detection | |
Weaver: Foundation Models for Creative Writing | |
Weak-to-Strong Jailbreaking on Large Language Models | |
Transfer Learning for Text Diffusion Models | |
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis | |
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives | |
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | |
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling | |
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception | |
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | |
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | |
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture | |
Watermarking Makes Language Models Radioactive | |
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities | |
SliceGPT: Compress Large Language Models by Deleting Rows and Columns | |
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support | |
Generative Expressive Robot Behaviors using Large Language Models | |
Efficient Exploration for LLMs | |
Can Large Language Models Understand Context? | |
SymbolicAI: A framework for logic-based approaches combining generative models and solvers | |
Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization? | |
OLMo: Accelerating the Science of Language Models | |
Tree Prompting: Efficient Task Adaptation without Fine-Tuning | |
CroissantLLM: A Truly Bilingual French-English Language Model | |
Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model | |
Transforming and Combining Rewards for Aligning Large Language Models | |
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models | |
Scaling Laws for Downstream Task Performance of Large Language Models | |
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research | |
Seven Failure Points When Engineering a Retrieval Augmented Generation System | |
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | |
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks | |
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations | |
Multi-line AI-assisted Code Authoring | |
Self-Discover: Large Language Models Self-Compose Reasoning Structures | |
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | |
Training-Free Consistent Text-to-Image Generation | |
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | |
Shortened LLaMA: A Simple Depth Pruning for Large Language Models | |
Rethinking Optimization and Architecture for Tiny Language Models | |
LiPO: Listwise Preference Optimization through Learning-to-Rank | |
BlackMamba: Mixture of Experts for State-Space Models | |
Rethinking Interpretability in the Era of Large Language Models | |
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | |
TravelPlanner: A Benchmark for Real-World Planning with Language Agents | |
K-Level Reasoning with Large Language Models | |
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | |
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models | |
Specialized Language Models with Cheap Inference from Limited Domain Data | |
Repeat After Me: Transformers are Better than State Space Models at Copying | |
A Survey on Hallucination in Large Vision-Language Models | |
Corrective Retrieval Augmented Generation | |
A Comprehensive Survey of Compression Algorithms for Language Models | |
Leveraging Large Language Models for NLG Evaluation: A Survey | |
The Power of Noise: Redefining Retrieval for RAG Systems | |
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | |
Red Teaming Visual Language Models | |
Knowledge Fusion of Large Language Models | |
A Survey of Resource-efficient LLM and Multimodal Foundation Models | |
Lexinvariant Language Models | |
Noise2Music: Text-conditioned Music Generation with Diffusion Models | |
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery | |
Mathematical Capabilities of ChatGPT | |
AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation | |
Large Language Models for Mathematical Reasoning: Progresses and Challenges | |
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | |
Driving Everywhere with Large Language Model Policy Adaptation | |
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue | |
SpiRit-LM: Interleaved Spoken and Written Language Model | |
Multilingual E5 Text Embeddings: A Technical Report | |
In-Context Principle Learning from Mistakes | |
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains | |
Hydragen: High-Throughput LLM Inference with Shared Prefixes | |
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay | |
Fast Timing-Conditioned Latent Audio Diffusion | |
Direct Language Model Alignment from Online AI Feedback | |
Grandmaster-Level Chess Without Search | |
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text | |
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | |
Tandem Transformers for Inference Efficient LLMs | |
World Model on Million-Length Video And Language With RingAttention | |
Lumos: Empowering Multimodal LLMs with Scene Text Recognition | |
Suppressing Pink Elephants with Direct Principle Feedback | |
Policy Improvement using Language Feedback Models | |
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | |
Scaling Laws for Fine-Grained Mixture of Experts | |
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | |
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model | |
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts | |
Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping | |
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement | |
ODIN: Disentangled Reward Mitigates Hacking in RLHF | |
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting | |
A Tale of Tails: Model Collapse as a Change of Scaling Laws | |
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | |
Generative Representational Instruction Tuning | |
ChemLLM: A Chemical Large Language Model | |
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning | |
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | |
DeAL: Decoding-time Alignment for Large Language Models | |
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling | |
SubGen: Token Generation in Sublinear Time and Memory | |
Keyframer: Empowering Animation Design using Large Language Models | |
Large Language Model for Table Processing: A Survey | |
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | |
Approaching Human-Level Forecasting with Language Models | |
A phase transition between positional and semantic learning in a solvable model of dot-product attention | |
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning | |
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks | |
Large Language Model based Multi-Agents: A Survey of Progress and Challenges | |
Premise Order Matters in Reasoning with Large Language Models | |
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment | |
Chain-of-Thought Reasoning Without Prompting | |
BitDelta: Your Fine-Tune May Only Be Worth One Bit | |
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | |
Data Engineering for Scaling Language Models to 128K Context | |
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization | |
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts | |
How to Train Data-Efficient LLMs | |
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects | |
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers | |
GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency | |
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents | |
Arrows of Time for Large Language Models | |
Coercing LLMs to do and reveal (almost) anything | |
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens | |
Speculative Streaming: Fast LLM Inference without Auxiliary Models | |
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting | |
User-LLM: Efficient LLM Contextualization with User Embeddings | |
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models | |
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization | |
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts | |
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | |
Instruction-tuned Language Models are Better Knowledge Learners | |
The FinBen: An Holistic Financial Benchmark for Large Language Models | |
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling | |
The boundary of neural network trainability is fractal | |
Reformatted Alignment | |
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning | |
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration | |
OneBit: Towards Extremely Low-bit Large Language Models | |
CoLLaVO: Crayon Large Language and Vision mOdel | |
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models | |
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | |
RLVF: Learning from Verbal Feedback without Overgeneralization | |
In Search of Needles in an 11M Haystack: Recurrent Memory Finds What LLMs Miss | |
Linear Transformers with Learnable Kernel Functions are Better In-Context Models | |
Efficient Guided Generation for Large Language Models | |
SPAR: Personalized Content-Based Recommendation via Long Engagement Attention | |
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models | |
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling | |
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows | |
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing | |
Generative Language Modeling for Automated Theorem Proving | |
Automated Unit Test Improvement using Large Language Models at Meta | |
LLM Agents can Autonomously Hack Websites | |
Large Language Models: A Survey | |
In-Context Retrieval-Augmented Language Models | |
Consolidating Attention Features for Multi-view Image Editing | |
LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models | |
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | |
Scaling Up LLM Reviews for Google Ads Content Moderation | |
Subobject-level Image Tokenization | |
TinyLLaVA: A Framework of Small-scale Large Multimodal Models | |
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming | |
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models | |
LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons | |
EvoPrompting: Language Models for Code-Level Neural Architecture Search | |
Goal Driven Discovery of Distributional Differences via Language Descriptions | |
ChatMusician: Understanding and Generating Music Intrinsically with LLM | |
GPTVQ: The Blessing of Dimensionality for LLM Quantization | |
FuseChat: Knowledge Fusion of Chat Models | |
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | |
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning | |
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs | |
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition | |
Large Language Models for Data Annotation: A Survey | |
LoRA+: Efficient Low Rank Adaptation of Large Models | |
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | |
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | |
Towards Optimal Learning of Language Models | |
Evaluating Very Long-Term Conversational Memory of LLM Agents | |
Training-Free Long-Context Scaling of Large Language Models | |
Disentangled 3D Scene Generation with Layout Learning | |
Do Large Language Models Latently Perform Multi-Hop Reasoning? | |
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts | |
Nemotron-4 15B Technical Report | |
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding | |
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding | |
Towards Open-ended Visual Quality Comparison | |
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method | |
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT | |
Orca-Math: Unlocking the potential of SLMs in Grade School Math | |
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | |
MOSAIC: A Modular System for Assistive and Interactive Cooking | |
Priority Sampling of Large Language Models for Compilers | |
Simple linear attention language models balance the recall-throughput tradeoff | |
API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access | |
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models | |
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models | |
StarCoder 2 and The Stack v2: The Next Generation | |
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | |
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs | |
Simulacra as Conscious Exotica | |
Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence | |
Enhancing Vision-Language Pre-training with Rich Supervisions | |
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | |
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs | |
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters | |
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | |
PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval | |
Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey | |
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? | |
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | |
Emergent and Predictable Memorization in Large Language Models | |
Design2Code: How Far Are We From Automating Front-End Engineering? | |
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models | |
MathScale: Scaling Instruction Tuning for Mathematical Reasoning | |
Empowering Large Language Model Agents through Action Learning | |
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use | |
RT-H: Action Hierarchies Using Language | |
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models | |
Resonance RoPE: Improving Context Length Generalization of Large Language Models | |
Datasets for Large Language Models: A Comprehensive Survey | |
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models | |
Do Efficient Transformers Really Save Computation? | |
MathPrompter: Mathematical Reasoning using Large Language Models | |
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT | |
Can Large Language Models Reason and Plan? | |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error | |
Common 7B Language Models Already Possess Strong Math Capabilities | |
Yi: Open Foundation Models by 01.AI | |
Teaching Large Language Models to Reason with Reinforcement Learning | |
SaulLM-7B: A pioneering Large Language Model for Law | |
Online Adaptation of Language Models with a Memory of Amortized Contexts | |
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | |
Learning to Decode Collaboratively with Multiple Language Models | |
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect | |
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | |
The Unreasonable Effectiveness of Eccentric Automatic Prompts | |
A Survey on Evaluation of Large Language Models | |
The pitfalls of next-token prediction | |
Stealing Part of a Production Language Model | |
Algorithmic progress in language models | |
Thinking Tokens for Language Modeling | |
Is Cosine-Similarity of Embeddings Really About Similarity? | |
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment | |
Can't Remember Details in Long Documents? You Need Some R&R | |
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | |
Retrieval-Augmented Generation for AI-Generated Content: A Survey | |
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History | |
3D-VLA: A 3D Vision-Language-Action Generative World Model | |
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | |
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | |
GPT on a Quantum Computer | |
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding | |
GiT: Towards Generalist Vision Transformer through Universal Language Interface | |
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences | |
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring | |
Social Skill Training with Large Language Models | |
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control | |
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset | |
Veagle: Advancements in Multimodal Representation Learning | |
Simple and Scalable Strategies to Continually Pre-train Large Language Models | |
SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents | |
Language models scale reliably with over-training and on downstream tasks | |
Gemma: Open Models Based on Gemini Research and Technology | |
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | |
On the Societal Impact of Open Foundation Models | |
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | |
Chronos: Learning the Language of Time Series | |
Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings | |
ORPO: Monolithic Preference Optimization without Reference Model | |
MoAI: Mixture of All Intelligence for Large Language and Vision Models | |
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models | |
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU | |
DeepSeek-VL: Towards Real-World Vision-Language Understanding | |
How Far Are We from Intelligent Visual Deductive Reasoning? | |
Small Models are Valuable Plug-ins for Large Language Models | |
Backtracing: Retrieving the Cause of the Query | |
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies | |
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks | |
Learning to Generate Better Than Your LLM | |
Meta-in-context learning in large language models | |
LERF: Language Embedded Radiance Fields | |
Eliciting Latent Predictions from Transformers with the Tuned Lens | |
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | |
Resurrecting Recurrent Neural Networks for Long Sequences | |
An Overview on Language Models: Recent Developments and Outlook | |
A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library | |
A Survey of Evaluation Metrics Used for NLG Systems | |
SummEval: Re-evaluating Summarization Evaluation | |
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | |
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | |
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | |
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | |
Logits of API-Protected LLMs Leak Proprietary Information | |
Knowledge Conflicts for LLMs: A Survey | |
Revolutionizing Mental Health Care through LangChain: A Journey with a Large Language Model | |
Will GPT-4 Run DOOM? | |
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | |
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models | |
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization | |
Large language models surpass human experts in predicting neuroscience results | |
Reliable, Adaptable, and Attributable Language Models with Retrieval | |
You Need to Pay Better Attention | |
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval | |
Stable LM 2 1.6B Technical Report | |
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation | |
A Survey on Data Selection for Language Models | |
PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails | |
Repetition Improves Language Model Embeddings | |
How Transformers Learn Causal Structure with Gradient Descent | |
Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models | |
Analysing The Impact of Sequence Composition on Language Model Pre-Training | |
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | |
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models | |
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models | |
Bayesian Reward Models for LLM Alignment | |
KMMLU: Measuring Massive Multitask Language Understanding in Korean | |
Dissecting Human and LLM Preferences | |
Exploring Value Biases: How LLMs Deviate Towards the Ideal | |
Do Llamas Work in English? On the Latent Language of Multilingual Transformers | |
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models | |
Why are Sensitive Functions Hard for Transformers? | |
Agents Need Not Know Their Purpose | |
Copyright Traps for Large Language Models | |
DoRA: Weight-Decomposed Low-Rank Adaptation | |
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks | |
Rethinking Machine Unlearning for Large Language Models | |
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | |
Improving Black-box Robustness with In-Context Rewriting | |
Secret Collusion Among Generative AI Agents | |
Natural Language Reinforcement Learning | |
Universal Neural Functionals | |
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks | |
LESS: Selecting Influential Data for Targeted Instruction Tuning | |
Building Your Own Product Copilot: Challenges, Opportunities, and Needs | |
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs | |
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache | |
Continual Learning for Large Language Models: A Survey | |
Towards Efficient and Exact Optimization of Language Model Alignment | |
HyperZ$\cdot$Z$\cdot$W Operator Connects Slow-Fast Networks for Full Context Interaction | |
OMPGPT: A Generative Pre-trained Transformer Model for OpenMP | |
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | |
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding | |
Spike No More: Stabilizing the Pre-training of Large Language Models | |
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems | |
Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention | |
Zoology: Measuring and Improving Recall in Efficient Language Models | |
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer | |
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | |
LoBaSS: Gauging Learnability in Supervised Fine-tuning Data | |
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective | |
Instruction Tuning with Human Curriculum | |
MatFormer: Nested Transformer for Elastic Inference | |
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning | |
xVal: A Continuous Number Encoding for Large Language Models | |
Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models | |
Human Feedback is not Gold Standard | |
DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation | |
Headless Language Models: Learning without Predicting with Contrastive Weight Tying | |
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models | |
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning | |
Do language models plan ahead for future tokens? | |
CAME: Confidence-guided Adaptive Memory Efficient Optimization | |
Improving Language Plasticity via Pretraining with Active Forgetting | |
AdANNS: A Framework for Adaptive Semantic Search | |
Strategic Reasoning with Language Models | |
MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies | |
Sparse is Enough in Scaling Transformers | |
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | |
A Theory on Adam Instability in Large-Scale Machine Learning | |
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning | |
Are Language Models Worse than Humans at Following Prompts? It's Complicated | |
PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition | |
Transformer Language Models without Positional Encodings Still Learn Positional Information | |
Sequence Parallelism: Long Sequence Training from System Perspective | |
Bio-inspired Structure Identification in Language Embeddings | |
Transformers without Tears: Improving the Normalization of Self-Attention | |
Neural Text Generation with Unlikelihood Training | |
MASS: Masked Sequence to Sequence Pre-training for Language Generation | |
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs | |
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | |
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | |
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs | |
TnT-LLM: Text Mining at Scale with Large Language Models | |
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | |
Larimar: Large Language Models with Episodic Memory Control | |
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | |
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | |
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data | |
PERL: Parameter Efficient Reinforcement Learning from Human Feedback | |
Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding | |
Uni-SMART: Universal Science Multimodal Analysis and Research Transformer | |
RAFT: Adapting Language Model to Domain Specific RAG | |
Recurrent Drafter for Fast Speculative Decoding in Large Language Models | |
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations | |
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models | |
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews | |
Language Agents as Optimizable Graphs | |
Comparative Study of Large Language Model Architectures on Frontier | |
Optimizing Distributed Training on Frontier for Large Language Models | |
Striped Attention: Faster Ring Attention for Causal Transformers | |
Block-Recurrent Transformers | |
Addressing Some Limitations of Transformers with Feedback Memory | |
Reverse Training to Nurse the Reversal Curse | |
Evaluating Frontier Models for Dangerous Capabilities | |
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model | |
When Do We Not Need Larger Vision Models? | |
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | |
Towards 3D Molecule-Text Interpretation in Language Models | |
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis | |
Mixture of Soft Prompts for Controllable Data Generation | |
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | |
Evolutionary Optimization of Model Merging Recipes | |
Semiparametric Token-Sequence Co-Supervision | |
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries | |
On Learning to Summarize with Large Language Models as References | |
Scalable Prompt Generation for Semi-supervised Learning with Language Models | |
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models | |
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | |
MyVLM: Personalizing VLMs for User-Specific Queries | |
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference | |
Recourse for reclamation: Chatting with generative language models | |
On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial | |
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging | |
The MiniPile Challenge for Data-Efficient Language Models | |
OmniNet: Omnidirectional Representations from Transformers | |
Arcee's MergeKit: A Toolkit for Merging Large Language Models | |
FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications | |
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces | |
The Case for Co-Designing Model Architectures with Hardware | |
The Unreasonable Ineffectiveness of the Deeper Layers | |
Improving Text-to-Image Consistency via Automatic Prompt Optimization | |
InternLM2 Technical Report | |
AIOS: LLM Agent Operating System | |
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression | |
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | |
Can large language models explore in-context? | |
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series | |
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | |
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text | |
AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models | |
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | |
VidLA: Video-Language Alignment at Scale | |
Compiler generated feedback for Large Language Models | |
sDPO: Don't Use Your Data All at Once | |
Polaris: A Safety-focused LLM Constellation Architecture for Healthcare | |
RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners | |
LLM4Decompile: Decompiling Binary Code with Large Language Models | |
Getting the most out of your tokenizer for pre-training and domain adaptation | |
How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese | |
Wider and Deeper LLM Networks are Fairer LLM Evaluators | |
Editing Large Language Models: Problems, Methods, and Opportunities | |
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | |
Long-form factuality in large language models | |
Towards a World-English Language Model for On-Device Virtual Assistants | |
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | |
MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling | |
STaR-GATE: Teaching Language Models to Ask Clarifying Questions | |
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding | |
LITA: Language Instructed Temporal-Localization Assistant | |
TextCraftor: Your Text Encoder Can be Image Quality Controller | |
Mechanistic Design and Scaling of Hybrid Architectures | |
Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines | |
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore | |
Blockwise Parallel Transformer for Large Context Models | |
Large Language Models Can Be Strong Differentially Private Learners | |
Head-wise Shareable Attention for Large Language Models | |
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models | |
ReALM: Reference Resolution As Language Modeling | |
Gecko: Versatile Text Embeddings Distilled from Large Language Models | |
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs | |
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer | |
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning | |
DiJiang: Efficient Large Language Models through Compact Kernelization | |
Jamba: A Hybrid Transformer-Mamba Language Model | |
Localizing Paragraph Memorization in Language Models | |
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction | |
Group Preference Optimization: Few-Shot Alignment of Large Language Models | |
Communicative Agents for Software Development | |
Preference Ranking Optimization for Human Alignment | |
The CRINGE Loss: Learning what language not to model | |
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | |
Attribute First, then Generate: Locally-attributable Grounded Text Generation | |
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models | |
FABLES: Evaluating faithfulness and content selection in book-length summarization | |
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward | |
WavLLM: Towards Robust and Adaptive Speech Large Language Model | |
MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text | |
ST-LLM: Large Language Models Are Effective Temporal Learners | |
Advancing LLM Reasoning Generalists with Preference Trees | |
Best Practices and Lessons Learned on Synthetic Data for Language Models | |
Long-context LLMs Struggle with Long In-context Learning | |
HyperCLOVA X Technical Report | |
Poro 34B and the Blessing of Multilinguality | |
Octopus v2: On-device language model for super agent | |
Are large language models superhuman chemists? | |
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model | |
A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course | |
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | |
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models | |
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models | |
Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers | |
Auxiliary task demands mask the capabilities of smaller language models | |
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | |
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity | |
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? | |
Data Interpreter: An LLM Agent For Data Science | |
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | |
Training LLMs over Neurally Compressed Text | |
Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | |
ReFT: Representation Finetuning for Language Models | |
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models | |
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | |
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? | |
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | |
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models | |
Noise-Aware Training of Layout-Aware Language Models | |
AI and the Problem of Knowledge Collapse | |
Learning to Plan and Generate Text with Citations | |
The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models | |
An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models | |
ALOHa: A New Measure for Hallucination in Captioning Models | |
Efficient Multi-Vector Dense Retrieval Using Bit Vectors | |
Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization | |
Iterative Forward Tuning Boosts In-context Learning in Language Models | |
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | |
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | |
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues | |
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences | |
Stream of Search (SoS): Learning to Search in Language | |
Large Product Key Memory for Pretrained Language Models | |
Large Memory Layers with Product Keys | |
BRAVE: Broadening the visual encoding of vision-language models | |
Adapting LLaMA Decoder to Vision Transformer | |
RULER: What's the Real Context Size of Your Long-Context Language Models? | |
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models | |
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD | |
Reconstructing Hand-Held Objects in 3D | |
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | |
MuPT: A Generative Symbolic Music Pretrained Transformer | |
OmniFusion Technical Report | |
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders | |
CodecLM: Aligning Language Models with Tailored Synthetic Data | |
SambaLingo: Teaching Large Language Models New Languages | |
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | |
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | |
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation | |
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models | |
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws | |
Koala: Key frame-conditioned long video-LLM | |
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | |
Understanding Emergent Abilities of Language Models from the Loss Perspective | |
Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code | |
Making Large Language Models Better Data Creators | |
On Surgical Fine-tuning for Language Encoders | |
AdaLomo: Low-memory Optimization with Adaptive Learning Rate | |
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | |
Embedding Democratic Values into Social Media AIs via Societal Objective Functions | |
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning | |
Less is More: Selective Layer Finetuning with SubTuning | |
Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning | |
AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling | |
Cut the CARP: Fishing for zero-shot story evaluation | |
LLoCO: Learning Long Contexts Offline | |
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | |
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | |
Rho-1: Not All Tokens Are What You Need | |
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | |
Audio Dialogues: Dialogues dataset for audio and music understanding | |
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples | |
JetMoE: Reaching Llama2 Performance with 0.1M Dollars | |
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents | |
Entity-Level Sentiment Analysis (ELSA): An exploratory task survey | |
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models | |
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | |
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators | |
Mechanics of Next Token Prediction with Self-Attention | |
Scaling Laws of RoPE-based Extrapolation | |
Pre-training Small Base LMs with Fewer Tokens | |
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies | |
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck | |
THOUGHTSCULPT: Reasoning with Intermediate Revision and Search | |
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers | |
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data | |
Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | |
Toward a Theory of Tokenization in LLMs | |
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | |
Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca | |
Learn Your Reference Model for Real Good Alignment | |
Large Language Models are as persuasive as humans, but why? About the cognitive effort and moral-emotional language of LLM arguments | |
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models | |
TransformerFAM: Feedback attention is working memory | |
On Speculative Decoding for Multimodal Large Language Models | |
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length | |
Generative Disco: Text-to-Video Generation for Music Visualization | |
Self-playing Adversarial Language Game Enhances LLM Reasoning | |
Compression Represents Intelligence Linearly | |
The Illusion of State in State-Space Models | |
ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past | |
A Thorough Examination of Decoding Methods in the Era of LLMs | |
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA | |
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? | |
Should You Mask 15% in Masked Language Modeling? | |
Finetuning Pretrained Transformers into RNNs | |
BLINK: Multimodal Large Language Models Can See but Not Perceive | |
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | |
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment | |
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | |
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data | |
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | |
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation | |
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | |
Fewer Truncations Improve Language Modeling | |
Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection | |
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity | |
Many-Shot In-Context Learning | |
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning | |
Exploring the landscape of large language models: Foundations, techniques, and challenges | |
Automated Social Science: Language Models as Scientist and Subjects | |
Language Models Still Struggle to Zero-shot Reason about Time Series | |
Stepwise Alignment for Constrained Language Model Policy Optimization | |
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents | |
Language Imbalance Can Boost Cross-lingual Generalisation | |
Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge | |
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | |
LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency | |
TextSquare: Scaling up Text-Centric Visual Instruction Tuning | |
Large Language Models are Few-Shot Health Learners | |
How Far Can We Go with Practical Function-Level Program Repair? | |
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation | |
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | |
A Survey on Retrieval-Augmented Text Generation for Large Language Models | |
A RAG Method for Source Code Inquiry Tailored to Long-Context LLMs | |
How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior | |
State Space Model for New-Generation Network Alternative to Transformers: A Survey | |
LLM In-Context Recall is Prompt Dependent | |
Reducing hallucination in structured outputs via Retrieval-Augmented Generation | |
Towards Large Language Models as Copilots for Theorem Proving in Lean | |
Characterizing LLM Abstention Behavior in Science QA with Context Perturbations | |
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function | |
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences | |
Aligning language models with human preferences | |
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding | |
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation | |
Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation | |
RAR-b: Reasoning as Retrieval Benchmark | |
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models | |
Deep Reinforcement Learning with a Natural Language Action Space | |
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | |
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study | |
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions | |
FlowMind: Automatic Workflow Generation with LLMs | |
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference | |
DataComp: In search of the next generation of multimodal datasets | |
Stable and low-precision training for large-scale vision-language models | |
Multi-Head Mixture-of-Experts | |
Transformers Can Represent $n$-gram Language Models | |
Pegasus-v1 Technical Report | |
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs | |
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework | |
SnapKV: LLM Knows What You are Looking for Before Generation | |
SpaceByte: Towards Deleting Tokenization from Large Language Modeling | |
A Survey on Self-Evolution of Large Language Models | |
Retrieval Head Mechanistically Explains Long-Context Factuality | |
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels | |
SPLATE: Sparse Late Interaction Retrieval | |
VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models | |
AgentKit: Flow Engineering with Graphs, not Coding | |
Rethinking LLM Memorization through the Lens of Adversarial Compression | |
What's the Magic Word? A Control Theory of LLM Prompting | |
Adapting Language Models to Compress Contexts | |
Investigating the Role of Feed-Forward Networks in Transformers Using Parallel Attention and Feed-Forward Net Design | |
LMentry: A Language Model Benchmark of Elementary Language Tasks | |
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning | |
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners | |
Graph Machine Learning in the Era of Large Language Models (LLMs) | |
NExT: Teaching Large Language Models to Reason about Code Execution | |
"If the Machine Is As Good As Me, Then What Use Am I?" -- How the Use of ChatGPT Changes Young Professionals' Perception of Productivity and Accomplishment | |
Can Language Models Solve Olympiad Programming? | |
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs | |
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models | |
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | |
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages | |
Make Your LLM Fully Utilize the Context | |
Weak-to-Strong Extrapolation Expedites Alignment | |
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | |
Continual Learning of Large Language Models: A Comprehensive Survey | |
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding | |
Tele-FLM Technical Report | |
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning | |
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs | |
Let's Think Dot by Dot: Hidden Computation in Transformer Language Models | |
MoDE: CLIP Data Experts via Clustering | |
Universal Adversarial Triggers Are Not Universal | |
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models | |
Improving Dictionary Learning with Gated Sparse Autoencoders | |
BASS: Batched Attention-optimized Speculative Sampling | |
CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models | |
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data | |
Image Segmentation Using Text and Image Prompts | |
Holistic Safety and Responsibility Evaluations of Advanced AI Models | |
WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction | |
NORMAD: A Benchmark for Measuring the Cultural Adaptability of Large Language Models | |
Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | |
Efficient Continual Pre-training for Building Domain Specific Large Language Models | |
DeLighT: Deep and Light-weight Transformer | |
Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically | |
GeckOpt: LLM System Efficiency via Intent-Based Tool Selection | |
Better Synthetic Data by Retrieving and Transforming Existing Datasets | |
Relational Graph Convolutional Networks for Sentiment Analysis | |
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | |
Foundational Challenges in Assuring Alignment and Safety of Large Language Models | |
Nyonic Technical Report | |
LLM Evaluators Recognize and Favor Their Own Generations | |
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | |
A Survey of Generative Search and Recommendation in the Era of Large Language Models | |
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs | |
A Primer on the Inner Workings of Transformer-based Language Models | |
U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF | |
zkLLM: Zero Knowledge Proofs for Large Language Models | |
A Survey on the Memory Mechanism of Large Language Model based Agents | |
Large Language Model Agent as a Mechanical Designer | |
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs | |
Near to Mid-term Risks and Opportunities of Open Source Generative AI | |
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks | |
Benchmarking Mobile Device Control Agents across Diverse Configurations | |
Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark | |
Assessing The Potential Of Mid-Sized Language Models For Clinical QA | |
Conformal Prediction for Natural Language Processing: A Survey | |
Dual Modalities of Text: Visual and Textual Generative Pre-training | |
AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback | |
Predicting Emergent Abilities with Infinite Resolution Evaluation | |
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding | |
Hallucination of Multimodal Large Language Models: A Survey | |
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting | |
Benchmarking Benchmark Leakage in Large Language Models | |
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations | |
ChuXin: 1.6B Technical Report | |
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models | |
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval | |
LEGENT: Open Platform for Embodied Agents | |
From Persona to Personalization: A Survey on Role-Playing Language Agents | |
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments | |
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models | |
Autonomous LLM-driven research from data to human-verifiable research papers | |
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | |
Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare | |
Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration | |
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation | |
Beyond Words: A Mathematical Framework for Interpreting Large Language Models | |
BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers | |
Ranked List Truncation for Large Language Model-based Re-Ranking | |
Building a Large Japanese Web Corpus for Large Language Models | |
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases | |
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey | |
DOCCI: Descriptions of Connected and Contrasting Images | |
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | |
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | |
Better & Faster Large Language Models via Multi-token Prediction | |
When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively | |
Extending Llama-3's Context Ten-Fold Overnight | |
Octopus v4: Graph of language models | |
Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics | |
ChatGPTest: opportunities and cautionary tales of utilizing AI for questionnaire pretesting | |
How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library | |
Faster Convergence for Transformer Fine-tuning with Line Search Methods | |
Linear Transformers Are Secretly Fast Weight Programmers | |
FLAME: Factuality-Aware Alignment for Large Language Models | |
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | |
In-Context Learning Creates Task Vectors | |
WildChat: 1M ChatGPT Interaction Logs in the Wild | |
"In-Context Learning" or: How I learned to stop worrying and love "Applied Information Retrieval" | |
LLM-AD: Large Language Model based Audio Description System | |
PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval | |
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report | |
Self-Play Preference Optimization for Language Model Alignment | |
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 | |
A Careful Examination of Large Language Model Performance on Grade School Arithmetic | |
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge | |
Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models | |
Automatic Creative Selection with Cross-Modal Matching | |
Harmonic LLMs are Trustworthy | |
On Training a Neural Network to Explain Binaries | |
In-Context Learning with Long-Context Models: An In-Depth Exploration | |
Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning | |
Aligning LLM Agents by Learning Latent Preference from User Edits | |
How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis | |
Neural Networks Learn Statistics of Increasing Complexity | |
Emerging Properties in Self-Supervised Vision Transformers | |
Advancing Multimodal Medical Capabilities of Gemini | |
"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust | |
D2PO: Discriminator-Guided DPO with Response Evaluation Models | |
Controllable Text Generation in the Instruction-Tuning Era | |
MANTIS: Interleaved Multi-Image Instruction Tuning | |
A Philosophical Introduction to Language Models - Part II: The Way Forward | |
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | |
How do Large Language Models Handle Multilingualism? | |
FinBERT: Financial Sentiment Analysis with Pre-trained Language Models | |
Modeling Emotions and Ethics with Large Language Models | |
Structured Chemistry Reasoning with Large Language Models | |
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | |
To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO | |
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference | |
MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences | |
Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving | |
Characterising the Creative Process in Humans and Large Language Models | |
ECC Analyzer: Extract Trading Signal from Earnings Conference Calls using Large Language Model for Stock Performance Prediction | |
Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering | |
AlphaMath Almost Zero: process Supervision without process | |
MAmmoTH2: Scaling Instructions from the Web | |
Is Flash Attention Stable? | |
ImageInWords: Unlocking Hyper-Detailed Image Descriptions | |
What matters when building vision-language models? | |
The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates | |
Understanding LLMs Requires More Than Statistical Generalization | |
Efficient and Economic Large Language Model Inference with Attention Offloading | |
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law | |
Large Language Models are Inconsistent and Biased Evaluators | |
101 Billion Arabic Words Dataset | |
What is Sentiment Meant to Mean to Language Models? | |
GPT-4 passes most of the 297 written Polish Board Certification Examinations | |
Text Quality-Based Pruning for Efficient Training of Language Models | |
Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant | |
On the Evaluation of Machine-Generated Reports | |
Automatic Programming: Large Language Models and Beyond | |
Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs | |
Multi-hop Question Answering over Knowledge Graphs using Large Language Models | |
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference | |
Parallel Structures in Pre-training Data Yield In-Context Learning | |
BooookScore: A systematic exploration of book-length summarization in the era of LLMs | |
Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | |
Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders | |
Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions | |
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training | |
Beyond Helpfulness and Harmlessness: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning | |
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
ReZero is All You Need: Fast Convergence at Large Depth | |
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | |
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | |
A Transformer with Stack Attention | |
xLSTM: Extended Long Short-Term Memory | |
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions | |
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | |
The Silicone Ceiling: Auditing GPT's Race and Gender Biases in Hiring | |
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform | |
Granite Code Models: A Family of Open Foundation Models for Code Intelligence | |
FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference | |
Sketch Then Generate: Providing Incremental User Feedback and Guiding LLM Code Generation through Language-Oriented Code Sketches | |
Assemblage: Automatic Binary Dataset Construction for Machine Learning | |
Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application | |
Modeling Caption Diversity in Contrastive Vision-Language Pretraining | |
CLLMs: Consistency Large Language Models | |
You Only Cache Once: Decoder-Decoder Architectures for Language Models | |
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context | |
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control | |
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals | |
Chain of Thoughtlessness: An Analysis of CoT in Planning | |
LLMs Can Patch Up Missing Relevance Judgments in Evaluation | |
Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures | |
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | |
How Susceptible are Large Language Models to Ideological Manipulation? | |
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | |
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers | |
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | |
Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models | |
Can We Use Large Language Models to Fill Relevance Judgment Holes? | |
Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias | |
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models | |
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics | |
The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models | |
PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models | |
Automating the Enterprise with Foundation Models | |
Enhancing Q-Learning with Large Language Model Heuristics | |
Can Nuanced Language Lead to More Actionable Insights? Exploring the Role of Generative AI in Analytical Narrative Structure | |
Language Modeling Using Tensor Trains | |
PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation | |
Semantic Scaling: Bayesian Ideal Point Estimates with Large Language Models | |
HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis | |
One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations | |
Large Language Models (LLMs) as Agents for Augmented Democracy | |
Scaling Laws for Forgetting When Fine-Tuning Large Language Models | |
GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence | |
Natural Language Processing RELIES on Linguistics | |
Probing Multimodal LLMs as World Models for Driving | |
AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models | |
Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation | |
A Causal Explainable Guardrails for Large Language Models | |
In-Context Symbolic Regression: Leveraging Language Models for Function Discovery | |
Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models | |
Value Augmented Sampling for Language Model Alignment and Personalization | |
Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology | |
A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models | |
Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory | |
Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers | |
Which Nigerian-Pidgin does Generative AI speak?: Issues about Representativeness and Bias for Multilingual and Low Resource Languages | |
Sub-goal Distillation: A Method to Improve Small Language Agents | |
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | |
Linearizing Large Language Models | |
Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval | |
LMD3: Language Model Data Density Dependence | |
State-Free Inference of State-Space Models: The Transfer Function Approach | |
Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance | |
Masked Structural Growth for 2x Faster Language Model Pre-training | |
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots | |
A Generalist Learner for Multifaceted Medical Image Interpretation | |
The Platonic Representation Hypothesis | |
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | |
A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking | |
Zero-Shot Tokenizer Transfer | |
RLHF Workflow: From Reward Modeling to Online RLHF | |
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation | |
SUTRA: Scalable Multilingual Language Model Architecture | |
ERAGent: Enhancing Retrieval-Augmented Language Models with Improved Accuracy, Efficiency, and Personalization | |
Large Language Models as Planning Domain Generators | |
Explaining Text Similarity in Transformer Models | |
Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning | |
Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent | |
Exposing Attention Glitches with Flip-Flop Language Modeling | |
CodeT5+: Open Code Large Language Models for Code Understanding and Generation | |
CinePile: A Long Video Question Answering Dataset and Benchmark | |
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | |
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory | |
Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models | |
Understanding the performance gap between online and offline alignment algorithms | |
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models | |
SpeechVerse: A Large-scale Generalizable Audio Language Model | |
Compositional Text-to-Image Generation with Dense Blob Representations | |
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | |
People cannot distinguish GPT-4 from a human in a Turing test | |
LLM-Augmented Agent-Based Modelling for Social Simulations: Challenges and Opportunities | |
What Can Natural Language Processing Do for Peer Review? | |
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | |
Improving Transformers with Dynamically Composable Multi-Head Attention | |
Word2World: Generating Stories and Worlds through Large Language Models | |
Ask Again, Then Fail: Large Language Models' Vacillations in Judgement | |
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models | |
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model | |
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis | |
Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs | |
Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models | |
Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models | |
Measuring Implicit Bias in Explicitly Unbiased Large Language Models | |
UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models | |
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | |
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | |
Chameleon: Mixed-Modal Early-Fusion Foundation Models | |
Many-Shot In-Context Learning in Multimodal Foundation Models | |
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model | |
LoRA Learns Less and Forgets Less | |
Using ChatGPT for Thematic Analysis | |
Are Large Pre-Trained Language Models Leaking Your Personal Information? | |
Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre | |
HMT: Hierarchical Memory Transformer for Long Context Language Processing | |
Air Gap: Protecting Privacy-Conscious Conversational Agents | |
Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models | |
LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages | |
MarkLLM: An Open-Source Toolkit for LLM Watermarking | |
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations | |
Towards Uncertainty-Aware Language Agent | |
Observational Scaling Laws and the Predictability of Language Model Performance | |
Layer-Condensed KV Cache for Efficient Inference of Large Language Models | |
Inducing Group Fairness in LLM-Based Decisions | |
CELA: Cost-Efficient Language Model Alignment for CTR Prediction | |
RDRec: Rationale Distillation for LLM-based Recommendation | |
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers | |
INDUS: Effective and Efficient Language Models for Scientific Applications | |
Dynamic data sampler for cross-language transfer learning in large language models | |
Grounded 3D-LLM with Referent Tokens | |
PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition | |
Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining | |
MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization | |
WavCraft: Audio Editing and Generation with Large Language Models | |
Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives | |
Transformers learn to implement preconditioned gradient descent for in-context learning | |
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting | |
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | |
Imp: Highly Capable Large Multimodal Models for Mobile Devices | |
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | |
Towards Modular LLMs by Building and Reusing a Library of LoRAs | |
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework | |
Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks | |
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | |
Latent State Estimation Helps UI Agents to Reason | |
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention | |
Large Language Models Meet NLP: A Survey | |
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference | |
Blind Baselines Beat Membership Inference Attacks for Foundation Models | |
Your Transformer is Secretly Linear | |
Can AI Relate: Testing Large Language Model Response for Mental Health Support | |
Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue! | |
Large Language Models are Biased Reinforcement Learners | |
ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios | |
SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks | |
Keep It Private: Unsupervised Privatization of Online Text | |
Generative AI and Large Language Models for Cyber Security: All Insights You Need | |
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | |
Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations | |
Leveraging Reinforcement Learning and Large Language Models for Code Optimization | |
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models | |
Large Language Models Are Not Robust Multiple Choice Selectors | |
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | |
Not All Language Model Features Are Linear | |
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data | |
Dense Connector for MLLMs | |
A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns | |
Bitune: Bidirectional Instruction-Tuning | |
Lessons from the Trenches on Reproducible Evaluation of Language Models | |
Multi-turn Reinforcement Learning from Preference Human Feedback | |
Base of RoPE Bounds Context Length | |
Top-Down Partitioning for Efficient List-Wise Ranking | |
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | |
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token | |
Agent Planning with World Knowledge Model | |
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | |
Distributed Speculative Inference of Large Language Models | |
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations | |
RAGE Against the Machine: Retrieval-Augmented LLM Explanations | |
Efficient Multimodal Large Language Models: A Survey | |
Natural Language Can Help Bridge the Sim2Real Gap | |
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research | |
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models | |
On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models | |
Infinite Limits of Multi-head Transformer Dynamics | |
News Recommendation with Category Description by a Large Language Model | |
Evaluation of the Programming Skills of Large Language Models | |
AI-Assisted Assessment of Coding Practices in Modern Code Review | |
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery | |
Super Tiny Language Models | |
RE-Adapt: Reverse Engineered Adaptation of Large Language Models | |
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning | |
"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data | |
Instruction Tuning With Loss Over Instructions | |
GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation | |
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models | |
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach | |
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | |
Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification | |
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | |
SignLLM: Sign Languages Production Large Language Models | |
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training | |
Are Long-LLMs A Necessity For Long-Context Tasks? | |
iVideoGPT: Interactive VideoGPTs are Scalable World Models | |
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition | |
Extracting Prompts by Inverting LLM Outputs | |
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization | |
Aya 23: Open Weight Releases to Further Multilingual Progress | |
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | |
OLAPH: Improving Factuality in Biomedical Long-form Question Answering | |
Tailoring Vaccine Messaging with Common-Ground Opinions | |
Efficient Adversarial Training in LLMs with Continuous Attacks | |
AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings | |
Neural Scaling Laws for Embodied AI | |
Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust | |
The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub | |
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models | |
G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation | |
"The Death of Wikipedia?" -- Exploring the Impact of ChatGPT on Wikipedia Engagement | |
Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning | |
Eliciting Latent Knowledge from Quirky Language Models | |
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | |
Matryoshka Multimodal Models | |
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models | |
Transformers Can Do Arithmetic with the Right Embeddings | |
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning | |
An Introduction to Vision-Language Modeling | |
Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models | |
Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words? | |
Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective | |
Zamba: A Compact 7B SSM Hybrid Model | |
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters | |
MoEUT: Mixture-of-Experts Universal Transformers | |
DAGER: Exact Gradient Inversion for Large Language Models | |
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | |
The Impact of Positional Encoding on Length Generalization in Transformers | |
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks | |
Phase Transitions in the Output Distribution of Large Language Models | |
Crafting Interpretable Embeddings by Asking LLMs Questions | |
gzip Predicts Data-dependent Scaling Laws | |
Spectral Editing of Activations for Large Language Model Alignment | |
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment | |
Learning to Reason via Program Generation, Emulation, and Search | |
Hacc-Man: An Arcade Game for Jailbreaking LLMs | |
CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval | |
FinTextQA: A Dataset for Long-form Financial Question Answering | |
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks | |
Don't Forget to Connect! Improving RAG with Graph-based Reranking | |
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass | |
LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models | |
Faithful Logical Reasoning via Symbolic Chain-of-Thought | |
2BP: 2-Stage Backpropagation | |
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections | |
Fine-tuning Large Language Models with Sequential Instructions | |
Evaluating the Factual Consistency of Large Language Models Through News Summarization | |
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | |
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs | |
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | |
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series | |
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution | |
Robust Preference Optimization through Reward Model Distillation | |
Jina CLIP: Your CLIP Model Is Also Your Text Retriever | |
Matryoshka Query Transformer for Large Vision-Language Models | |
Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets | |
Offline Regularised Reinforcement Learning for Large Language Models Alignment | |
LLMs achieve adult human performance on higher-order theory of mind tasks | |
On the Role of Attention Masks and LayerNorm in Transformers | |
OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | |
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice | |
On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization | |
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | |
Xwin-LM: Strong and Scalable Alignment Practice for LLMs | |
GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning | |
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts | |
Enhancing Large Vision Language Models with Self-Training on Image Comprehension | |
Preference Learning Algorithms Do Not Learn Preference Rankings | |
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | |
Contextual Position Encoding: Learning to Count What's Important | |
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement | |
Linking In-context Learning in Transformers to Human Episodic Memory | |
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models | |
Bayesian Online Natural Gradient (BONG) | |
Data Augmentation Vision Transformer for Fine-grained Image Classification | |
MotionLLM: Understanding Human Behaviors from Human Motions and Videos | |
Don't drop your samples! Coherence-aware training benefits Conditional diffusion | |
Large Language Models Can Self-Improve At Web Agent Tasks | |
Group Robust Preference Optimization in Reward-free RLHF | |
Evaluating Large Language Model Biases in Persona-Steered Generation | |
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads | |
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable | |
Is In-Context Learning Sufficient for Instruction Following in LLMs? | |
Aligning to Thousands of Preferences via System Message Generalization | |
DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories | |
Generating Query Recommendations via LLMs | |
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding | |
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning | |
Position: Foundation Agents as the Paradigm Shift for Decision Making | |
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression | |
A Survey on Vision-Language-Action Models for Embodied AI | |
Large Language Models Can Self-Correct with Minimal Effort | |
Language Models with Conformal Factuality Guarantees | |
Prompt Optimization with Human Feedback | |
GPT is Not an Annotator: The Necessity of Human Annotation in Fairness Benchmark Construction | |
RealitySummary: On-Demand Mixed Reality Document Enhancement using Large Language Models | |
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars | |
Certifiably Robust RAG against Retrieval Corruption | |
Want To Reduce Labeling Cost? GPT-3 Can Help | |
Embedding-Aligned Language Models | |
Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code | |
A Survey of Multimodal Large Language Model from A Data-centric Perspective | |
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | |
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | |
CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search | |
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | |
Large Language Models are Zero-Shot Next Location Predictors | |
There and Back Again: The AI Alignment Paradox | |
Expanded Gating Ranges Improve Activation Functions | |
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models | |
The Geometry of Categorical and Hierarchical Concepts in Large Language Models | |
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA | |
SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought | |
Grokfast: Accelerated Grokking by Amplifying Slow Gradients | |
Stress-Testing Capability Elicitation With Password-Locked Models | |
Knowledge Circuits in Pretrained Transformers | |
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models | |
Learning the Language of Protein Structure | |
Zyda: A 1.3T Dataset for Open Language Modeling | |
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model | |
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark | |
Towards Scalable Automated Alignment of LLMs: A Survey | |
Pretrained Hybrids with MAD Skills | |
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback | |
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling | |
Controlling Large Language Model Agents with Entropic Activation Steering | |
A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians | |
Transfer Q Star: Principled Decoding for LLM Alignment | |
Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation | |
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step | |
ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory | |
To Believe or Not to Believe Your LLM | |
Scalable MatMul-free Language Modeling | |
Meta-Designing Quantum Experiments with Language Models | |
Extended Mind Transformers | |
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | |
LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback | |
How to Understand Whole Software Repository? | |
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs | |
Automated Focused Feedback Generation for Scientific Writing Assistance | |
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM | |
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning | |
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | |
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms | |
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes | |
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs | |
Item-Language Model for Conversational Recommendation | |
Block Transformer: Global-to-Local Language Modeling for Fast Inference | |
Parrot: Multilingual Visual Instruction Tuning | |
Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data | |
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities | |
A Study of Optimizations for Fine-tuning Large Language Models | |
Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses | |
The Impossibility of Fair LLMs | |
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | |
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments | |
Are We Done with MMLU? | |
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | |
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training | |
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead | |
Pre-trained Large Language Models Use Fourier Features to Compute Addition | |
CLMASP: Coupling Large Language Models with Answer Set Programming for Robotic Task Planning | |
PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs | |
Chain of Agents: Large Language Models Collaborating on Long-Context Tasks | |
DiffUHaul: A Training-Free Method for Object Dragging in Images | |
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model | |
ABodyBuilder3: Improved and scalable antibody structure predictions | |
A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models | |
DsDm: Model-Aware Dataset Selection with Datamodels | |
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools | |
Improving Alignment and Robustness with Short Circuiting | |
Semantically Diverse Language Generation for Uncertainty Estimation in Language Models | |
Matching Anything by Segmenting Anything | |
What Do Language Models Learn in Context? The Structured Task Hypothesis | |
Scaling and evaluating sparse autoencoders | |
Verbalized Machine Learning: Revisiting Machine Learning with Language Models | |
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller | |
Iteration Head: A Mechanistic Study of Chain-of-Thought | |
Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention | |
Does your data spark joy? Performance gains from domain upsampling at the end of training | |
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | |
CRAG -- Comprehensive RAG Benchmark | |
Mixture-of-Agents Enhances Large Language Model Capabilities | |
Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach | |
MAIRA-2: Grounded Radiology Report Generation | |
Proofread: Fixes All Errors with One Tap | |
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning | |
Large Language Model Confidence Estimation via Black-Box Access | |
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | |
Towards a Personal Health Large Language Model | |
Tx-LLM: A Large Language Model for Therapeutics | |
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization | |
Unified Text-to-Image Generation and Retrieval | |
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers | |
BERTs are Generative In-Context Learners | |
Is Free Self-Alignment Possible? | |
TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools | |
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models | |
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | |
Creativity Has Left the Chat: The Price of Debiasing Language Models | |
UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor | |
Can Language Models Serve as Text-Based World Simulators? | |
How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad | |
Contrastive learning of T cell receptor representations | |
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching | |
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned | |
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models | |
MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering | |
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models | |
OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models | |
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | |
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks | |
On the Reliability of Watermarks for Large Language Models | |
A Survey of Diffusion Models in Natural Language Processing | |
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be | |
Learning to Grow Pretrained Models for Efficient Transformer Training | |
An Image is Worth 32 Tokens for Reconstruction and Generation | |
Simple and Effective Masked Diffusion Language Models | |
Instant 3D Human Avatar Generation using Image Diffusion Models | |
TextGrad: Automatic "Differentiation" via Text | |
Spectrum: Targeted Training on Signal to Noise Ratio | |
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | |
Multimodal Belief Prediction | |
McEval: Massively Multilingual Code Evaluation | |
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | |
Merging Improves Self-Critique Against Jailbreak Attacks | |
Confabulation: The Surprising Value of Large Language Model Hallucinations | |
The Prompt Report: A Systematic Survey of Prompting Techniques | |
Improve Mathematical Reasoning in Language Models by Automated Process Supervision | |
MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering | |
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | |
Parallelizing Linear Transformers with the Delta Rule over Sequence Length | |
LLM Dataset Inference: Did you train on my dataset? | |
Towards Lifelong Learning of Large Language Models: A Survey | |
PowerInfer-2: Fast Large Language Model Inference on a Smartphone | |
LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages | |
Attention as a Hypernetwork | |
ConStat: Performance-Based Contamination Detection in Large Language Models | |
What If We Recaption Billions of Web Images with LLaMA-3? | |
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing | |
Discovering Preference Optimization Algorithms with and for Large Language Models | |
Large Language Models Must Be Taught to Know What They Don't Know | |
An Empirical Study of Mamba-based Language Models | |
Collective Constitutional AI: Aligning a Language Model with Public Input | |
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination | |
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models | |
Explore the Limits of Omni-modal Pretraining at Scale | |
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition | |
Large Language Model Unlearning via Embedding-Corrupted Prompts | |
Grounding Multimodal Large Language Models in Actions | |
BertaQA: How Much Do Language Models Know About Local Culture? | |
VCR: Visual Caption Restoration | |
Hibou: A Family of Foundational Vision Transformers for Pathology | |
Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe | |
Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost | |
Improving Retrieval for RAG based Question Answering Models on Financial Documents | |
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey | |
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding | |
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | |
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | |
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations | |
Transformers meet Neural Algorithmic Reasoners | |
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding | |
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | |
OpenVLA: An Open-Source Vision-Language-Action Model | |
ReMI: A Dataset for Reasoning with Multiple Images | |
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning | |
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts | |
Investigating the translation capabilities of Large Language Models trained on parallel data only | |
Multi-Agent Software Development through Cross-Team Collaboration | |
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation | |
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus | |
UnO: Unsupervised Occupancy Fields for Perception and Forecasting | |
HelpSteer2: Open-source dataset for training top-performing reward models | |
Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs | |
Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus | |
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery | |
Real2Code: Reconstruct Articulated Objects via Code Generation | |
DafnyBench: A Benchmark for Formal Software Verification | |
Estimating the Hallucination Rate of Generative AI | |
RWKV-CLIP: A Robust Vision-Language Representation Learner | |
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark | |
Early Weight Averaging meets High Learning Rates for LLM Pre-training | |
Text Embeddings by Weakly-Supervised Contrastive Pre-training | |
Promptagator: Few-shot Dense Retrieval From 8 Examples | |
RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder | |
InPars: Data Augmentation for Information Retrieval using Large Language Models | |
Reconciling Kaplan and Chinchilla Scaling Laws | |
Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation | |
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval | |
SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals | |
Cycles of Thought: Measuring LLM Confidence through Stable Explanations | |
From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation | |
Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models | |
Are you still on track!? Catching LLM Task Drift with Activations | |
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement | |
UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback | |
Quantifying Variance in Evaluation Benchmarks | |
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | |
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | |
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack | |
Evaluation of Large Language Models: STEM education and Gender Stereotypes | |
Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation | |
Mixture-of-Subspaces in Low-Rank Adaptation | |
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation | |
CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions | |
GEB-1.3B: Open Lightweight Large Language Model | |
Rapport-Driven Virtual Agent: Rapport Building Dialogue Strategy for Improving User Experience at First Meeting | |
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery | |
Large language model validity via enhanced conformal prediction methods | |
Decoding the Diversity: A Review of the Indic AI Research Landscape | |
Advancing High Resolution Vision-Language Models in Biomedicine | |
Bayesian Statistical Modeling with Predictors from LLMs | |
Self-Supervised Speech Representations are More Phonetic than Semantic | |
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | |
Needle In A Multimodal Haystack | |
mDPO: Conditional Preference Optimization for Multimodal Large Language Models | |
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% | |
DataComp-LM: In search of the next generation of training sets for language models | |
Set-Based Prompting: Provably Solving the Language Model Order Dependency Problem | |
The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models | |
Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models | |
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning | |
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | |
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models | |
Language Modeling with Editable External Knowledge | |
WPO: Enhancing RLHF with Weighted Preference Optimization | |
VideoLLM-online: Online Video Large Language Model for Streaming Video | |
How Do Large Language Models Acquire Factual Knowledge During Pretraining? | |
Task Me Anything | |
Refusal in Language Models Is Mediated by a Single Direction | |
DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models | |
Evaluating Open Language Models Across Task Types, Application Domains, and Reasoning Types: An In-Depth Experimental Analysis | |
GUICourse: From General Vision Language Models to Versatile GUI Agents | |
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens | |
In-Context Editing: Learning Knowledge from Self-Induced Distributions | |
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | |
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation | |
Breaking the Attention Bottleneck | |
STAR: SocioTechnical Approach to Red Teaming Language Models | |
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents | |
HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies | |
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training | |
AudioPaLM: A Large Language Model That Can Speak and Listen | |
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | |
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation | |
MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators | |
ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation | |
Full Parameter Fine-tuning for Large Language Models with Limited Resources | |
Improving Multi-Agent Debate with Sparse Communication Topology | |
Meta Reasoning for Large Language Models | |
A Simple and Effective $L_2$ Norm-Based Strategy for KV Cache Compression | |
Unifying Multimodal Retrieval via Document Screenshot Embedding | |
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | |
Deep Bayesian Active Learning for Preference Modeling in Large Language Models | |
OLMES: A Standard for Language Model Evaluations | |
Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement | |
What Are the Odds? Language Models Are Capable of Probabilistic Reasoning | |
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries | |
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | |
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | |
Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages | |
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models | |
News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation | |
Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech | |
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models | |
JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning | |
VoCo-LLaMA: Towards Vision Compression with Large Language Models | |
TroL: Traversal of Layers for Large Language and Vision Models | |
BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM | |
Statistical Uncertainty in Word Embeddings: GloVe-V | |
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks | |
Large Scale Transfer Learning for Tabular Data via Language Modeling | |
Transcoders Find Interpretable LLM Feature Circuits | |
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | |
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | |
Tokenization Falling Short: The Curse of Tokenization | |
Can LLM be a Personalized Judge? | |
NAST: Noise Aware Speech Tokenization for Speech Language Models | |
Bootstrapping Language Models with DPO Implicit Rewards | |
The Impact of Initialization on LoRA Finetuning Dynamics | |
StatBot.Swiss: Bilingual Open Data Exploration in Natural Language | |
Adversarial Attacks on Multimodal Agents | |
Estimating Knowledge in Large Language Models Without Generating a Single Token | |
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning | |
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models | |
Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways | |
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline | |
A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges | |
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations | |
Long Code Arena: a Set of Benchmarks for Long-Context Code Models | |
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization | |
Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs | |
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding | |
Instruction Pre-Training: Language Models are Supervised Multitask Learners | |
LLMatDesign: Autonomous Materials Discovery with Large Language Models | |
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? | |
AgentReview: Exploring Peer Review Dynamics with LLM Agents | |
$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains | |
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | |
Are LLMs Naturally Good at Synthetic Tabular Data Generation? | |
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning | |
Measuring memorization in RLHF for code completion | |
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces | |
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | |
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance | |
garak: A Framework for Security Probing Large Language Models | |
Leading Whitespaces of Language Models' Subword Vocabulary Poses a Confound for Calculating Word Probabilities | |
GenQA: Generating Millions of Instructions from a Handful of Prompts | |
Transferring Knowledge from Large Foundation Models to Small Downstream Models | |
NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | |
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch | |
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | |
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs | |
DeciMamba: Exploring the Length Extrapolation Potential of Mamba | |
Evidence of a log scaling law for political persuasion with large language models | |
LiveMind: Low-latency Large Language Models with Simultaneous Inference | |
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | |
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation | |
Improving Visual Commonsense in Language Models via Multiple Image Generation | |
Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma? | |
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | |
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers | |
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level | |
HARE: HumAn pRiors, a key to small language model Efficiency | |
Delving into ChatGPT usage in academic writing through excess vocabulary | |
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems | |
Interpretability of Language Models via Task Spaces | |
Surface Form Competition: Why the Highest Probability Answer Isn't Always Right | |
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | |
CodeRAG-Bench: Can Retrieval Augment Code Generation? | |
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models | |
Large Language Models are Null-Shot Learners | |
SGLang: Efficient Execution of Structured Language Model Programs | |
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs | |
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment | |
Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation | |
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models | |
How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions | |
Learning to Retrieve Iteratively for In-Context Learning | |
Jailbreaking as a Reward Misspecification Problem | |
Information Guided Regularization for Fine-tuning Language Models | |
Unlocking the Global Synergies in Low-Rank Adapters | |
Towards Retrieval Augmented Generation over Large Video Libraries | |
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection | |
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework | |
RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation | |
Exploring Design Choices for Building Language-Specific LLMs | |
ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights | |
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | |
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | |
Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task | |
Data Contamination Can Cross Language Barriers | |
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges | |
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models | |
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | |
Probing the Decision Boundaries of In-context Learning in Large Language Models | |
CancerLLM: A Large Language Model in Cancer Domain | |
CarLLaVA: Vision language models for camera-only closed-loop driving | |
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs | |
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | |
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters | |
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models | |
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions | |
OATH-Frames: Characterizing Online Attitudes Towards Homelessness with LLM Assistants | |
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | |
Long Context Transfer from Language to Vision | |
Efficient Continual Pre-training by Mitigating the Stability Gap | |
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs | |
Sparse High Rank Adapters | |
Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking | |
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages | |
WARP: On the Benefits of Weight Averaged Rewarded Policies | |
Scaling Laws for Linear Complexity Language Models | |
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | |
Preference Tuning For Toxicity Mitigation Generalizes Across Languages | |
FIRST: Faster Improved Listwise Reranking with Single Token Decoding | |
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context | |
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers | |
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models | |
Confidence Regulation Neurons in Language Models | |
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization | |
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs | |
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models | |
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | |
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics | |
Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations | |
Hallucination is Inevitable: An Innate Limitation of Large Language Models | |
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | |
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models | |
Steering Without Side Effects: Improving Post-Deployment Control of Language Models | |
Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention | |
Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network | |
PARIKSHA : A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data | |
MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate | |
PostMark: A Robust Blackbox Watermark for Large Language Models | |
Can LLMs Learn Macroeconomic Narratives from Social Media? | |
Embodied Instruction Following in Unknown Environments | |
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | |
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon | |
Data curation via joint example selection further accelerates multimodal learning | |
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment | |
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients | |
LongIns: A Challenging Long-context Instruction-based Exam for LLMs | |
Multi-property Steering of Large Language Models with Dynamic Activation Composition | |
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale | |
Benchmarking Mental State Representations in Language Models | |
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA | |
Delving into the Utilisation of ChatGPT in Scientific Publications in Astronomy | |
How to Compute the Probability of a Word | |
Unlocking Continual Learning Abilities in Language Models | |
Large Language Models Assume People are More Rational than We Really are | |
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track | |
Finding Transformer Circuits with Edge Pruning | |
A mathematical perspective on Transformers | |
Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation | |
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt | |
LLMs' Classification Performance is Overclaimed | |
Cross-Modality Safety Alignment | |
Bridging Law and Data: Augmenting Reasoning via a Semi-Structured Dataset with IRAC methodology | |
Preference Distillation for Personalized Generative Recommendation | |
DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents | |
Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG | |
Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game | |
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving | |
Associative Recurrent Memory Transformer | |
Symbolic Learning Enables Self-Evolving Agents | |
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | |
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | |
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs | |
From Rewriting to Remembering: Common Ground for Conversational QA Models | |
Adversarial Search Engine Optimization for Large Language Models | |
A Closer Look into Mixture-of-Experts in Large Language Models | |
Multimodal foundation world models for generalist embodied agents | |
Do they mean 'us'? Interpreting Referring Expressions in Intergroup Bias | |
MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool | |
Efficacy of Language Model Self-Play in Non-Zero-Sum Games | |
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | |
Large Language Models are Interpretable Learners | |
Are Language Models Actually Useful for Time Series Forecasting? | |
CAVE: Controllable Authorship Verification Explanations | |
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers | |
EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records | |
One Thousand and One Pairs: A "novel" challenge for long-context language models | |
Breaking the Frame: Image Retrieval by Visual Overlap Prediction | |
CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans | |
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning | |
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models | |
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms | |
Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation | |
A Benchmark for Learning to Translate a New Language from One Grammar Book | |
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | |
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding | |
Aligning Teacher with Student Preferences for Tailored Training Data Generation | |
Simulating Classroom Education with LLM-Empowered Agents | |
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation | |
Re-Ranking Step by Step: Investigating Pre-Filtering for Re-Ranking with Large Language Models | |
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | |
LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users | |
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | |
Can LLMs Learn by Teaching? A Preliminary Study | |
The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models | |
Is Programming by Example solved by LLMs? | |
Suri: Multi-constraint Instruction Following for Long-form Text Generation | |
Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? | |
LiveBench: A Challenging, Contamination-Free LLM Benchmark | |
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data | |
VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation | |
Revealing Fine-Grained Values and Opinions in Large Language Models | |
T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings | |
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models | |
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data | |
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | |
ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models | |
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | |
News Deja Vu: Connecting Past and Present with Semantic Search | |
Contrastive Entity Coreference and Disambiguation for Historical Texts | |
SAIL: Self-Improving Efficient Online Alignment of Large Language Models | |
AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | |
Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets | |
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models | |
Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models | |
Self-Retrieval: Building an Information Retrieval System with One Large Language Model | |
Cognitive Architectures for Language Agents | |
Adaptable Logical Control for Large Language Models | |
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More | |
DistiLRR: Transferring Code Repair for Low-Resource Programming Languages | |
A Critical Study of What Code-LLMs (Do Not) Learn | |
"Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline | |
Efficient Evolutionary Search Over Chemical Space with Large Language Models | |
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | |
Understanding and Mitigating Language Confusion in LLMs | |
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | |
Scaling Synthetic Data Creation with 1,000,000,000 Personas | |
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning | |
The Remarkable Robustness of LLMs: Stages of Inference? | |
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | |
Following Length Constraints in Instructions | |
AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation | |
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs | |
Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification | |
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | |
Direct Preference Knowledge Distillation for Large Language Models | |
Investigating How Large Language Models Leverage Internal Knowledge to Perform Complex Reasoning | |
Monitoring Latent World States in Language Models with Propositional Probes | |
RouteLLM: Learning to Route LLMs with Preference Data | |
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks | |
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph | |
RaTEScore: A Metric for Radiology Report Generation | |
PhyloLM : Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks | |
Flora: Low-Rank Adapters Are Secretly Gradient Compressors | |
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | |
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation | |
Scaling Laws for Fact Memorization of Large Language Models | |
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data | |
RegMix: Data Mixture as Regression for Language Model Pre-training | |
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives | |
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging | |
ColPali: Efficient Document Retrieval with Vision Language Models | |
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion | |
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems | |
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | |
Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER | |
MIRAI: Evaluating LLM Agents for Event Forecasting | |
Searching for Best Practices in Retrieval-Augmented Generation | |
$\text{Memory}^3$: Language Modeling with Explicit Memory | |
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation | |
BERGEN: A Benchmarking Library for Retrieval-Augmented Generation | |
M2QA: Multi-domain Multilingual Question Answering | |
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning | |
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs | |
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning | |
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation | |
Brevity is the soul of wit: Pruning long files for code generation | |
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention | |
From RAG to RICHES: Retrieval Interlaced with Sequence Generation | |
LiteSearch: Efficacious Tree Search for LLM | |
Detection and Measurement of Syntactic Templates in Generated Text | |
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | |
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | |
Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models | |
UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI | |
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | |
Compressing Search with Language Models | |
Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization | |
ProgressGym: Alignment with a Millennium of Moral Progress | |
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models | |
Changing Answer Order Can Decrease MMLU Accuracy | |
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | |
Understanding Alignment in Multimodal LLMs: A Comprehensive Study | |
ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions | |
Why does in-context learning fail sometimes? Evaluating in-context learning on open and closed questions | |
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models | |
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding | |
A Review of Large Language Models and Autonomous Agents in Chemistry | |
Agentless: Demystifying LLM-based Software Engineering Agents | |
Eliminating Position Bias of Language Models: A Mechanistic Approach | |
Resolving Discrepancies in Compute-Optimal Scaling of Language Models | |
OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset | |
FLoRA: Low-Rank Core Space for N-dimension | |
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | |
TokenPacker: Efficient Visual Projector for Multimodal LLM | |
Investigating Decoder-only Large Language Models for Speech-to-text Translation | |
Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models | |
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | |
Evaluating Human Alignment and Model Faithfulness of LLM Rationale | |
Finding Blind Spots in Evaluator LLMs with Interpretable Checklists | |
On the Limitations of Fine-tuned Judge Models for LLM Evaluation | |
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment | |
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | |
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | |
Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning | |
Tweetorial Hooks: Generative AI Tools to Motivate Science on Social Media | |
A Solvable Model of Neural Scaling Laws | |
Hopfield Networks is All You Need | |
Improving Transformer Models by Reordering their Sublayers | |
A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses | |
Prompt Stability Scoring for Text Annotation with Large Language Models | |
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application | |
AI-native Memory: A Pathway from LLMs Towards AGI | |
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations | |
From Efficient Multimodal Models to World Models: A Survey | |
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments | |
LLMs can learn self-restraint through iterative self-reflection | |
ReGround: Improving Textual and Spatial Grounding at No Cost | |
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria | |
Large language models can accurately predict searcher preferences | |
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering | |
Large Language Models Enable Few-Shot Clustering | |
LM vs LM: Detecting Factual Errors via Cross Examination | |
Perspectives on Large Language Models for Relevance Judgment | |
Human-like Summarization Evaluation with ChatGPT | |
ChatGPT as a Factual Inconsistency Evaluator for Text Summarization | |
Self-Evaluation as a Defense Against Adversarial Attacks on LLMs | |
How Does Quantization Affect Multilingual LLMs? | |
Are Large Language Models Consistent over Value-laden Questions? | |
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs | |
Tree Search for Language Model Agents | |
Towards Compositionality in Concept Learning | |
Unified Auto-Encoding with Masked Diffusion | |
GraphEdit: Large Language Models for Graph Structure Learning | |
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization | |
LLM-Select: Feature Selection with Large Language Models | |
Improving Reward Models with Synthetic Critiques | |
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models | |
An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models | |
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | |
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition | |
On scalable oversight with weak LLMs judging strong LLMs | |
Fast Forwarding Low-Rank Training | |
Learning to (Learn at Test Time): RNNs with Expressive Hidden States | |
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models | |
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents | |
Mixture of A Million Experts | |
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | |
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | |
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs | |
Anthropocentric bias and the possibility of artificial cognition | |
AgentInstruct: Toward Generative Teaching with Agentic Flows | |
HEMM: Holistic Evaluation of Multimodal Foundation Models | |
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks | |
52B to 1T: Lessons Learned via Tele-FLM Series | |
Reasoning in Large Language Models: A Geometric Perspective | |
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | |
Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling | |
Synthetic Multimodal Question Generation | |
Unveiling Encoder-Free Vision-Language Models | |
$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens | |
Distilling System 2 into System 1 | |
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages | |
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models | |
Granular Privacy Control for Geolocation with Vision Language Models | |
VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models | |
Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval | |
Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course | |
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation | |
Multi-Object Hallucination in Vision-Language Models | |
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | |
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models | |
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty | |
PAS: Data-Efficient Plug-and-Play Prompt Augmentation System | |
An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models | |
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct | |
LLMBox: A Comprehensive Library for Large Language Models | |
Training Task Experts through Retrieval Based Distillation | |
Language Models Encode Collaborative Signals in Recommendation | |
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | |
When LLMs Play the Telephone Game: Cumulative Changes and Attractors in Iterated Cultural Transmissions | |
LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking | |
Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction | |
MeMemo: On-device Retrieval Augmentation for Private and Personalized Text Generation | |
Machine Unlearning Fails to Remove Data Poisoning Attacks | |
BeHonest: Benchmarking Honesty in Large Language Models | |
Emu: Generative Pretraining in Multimodality | |
Enabling Large Language Models to Generate Text with Citations | |
Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities | |
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps | |
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | |
Vision language models are blind | |
Composable Interventions for Language Models | |
A Single Transformer for Scalable Vision-Language Modeling | |
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension | |
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs | |
Decoding-Time Language Model Alignment with Multiple Objectives | |
WebCanvas: Benchmarking Web Agents in Online Environments | |
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | |
Visual representations in the human brain are aligned with large language models | |
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | |
RAG vs. Long Context: Examining Frontier Large Language Models for Environmental Review Document Comprehension | |
Inference Performance Optimization for Large Language Models on CPUs | |
LETS-C: Leveraging Language Embedding for Time Series Classification | |
Just read twice: closing the recall gap for recurrent language models | |
How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions | |
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts | |
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging | |
Knowledge Composition using Task Vectors with Learned Anisotropic Scaling | |
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | |
Forcing Diffuse Distributions out of Language Models | |
Evaluating LLMs at Detecting Errors in LLM Responses | |
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models | |
R-Tuning: Instructing Large Language Models to Say `I Don't Know' | |
Label Supervised LLaMA Finetuning | |
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models | |
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool | |
Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses | |
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | |
Review-LLM: Harnessing Large Language Models for Personalized Review Generation | |
Do Vision and Language Models Share Concepts? A Vector Space Alignment Study | |
MAVIS: Mathematical Visual Instruction Tuning | |
Automata-based constraints for language model decoding | |
GTA: A Benchmark for General Tool Agents | |
SEED-Story: Multimodal Long Story Generation with Large Language Model | |
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective | |
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | |
PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents | |
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception | |
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients | |
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models | |
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models | |
Genomic Language Models: Opportunities and Challenges | |
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | |
Self-Recognition in Language Models | |
Deconstructing What Makes a Good Optimizer for Language Models | |
Teaching Transformers Causal Reasoning through Axiomatic Training | |
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey) | |
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation | |
ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context | |
Why are Visually-Grounded Language Models Bad at Image Classification? | |
LoQT: Low Rank Adapters for Quantized Training | |
Metron: Holistic Performance Evaluation Framework for LLM Inference Systems | |
Lynx: An Open Source Hallucination Evaluation Model | |
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging | |
LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models | |
Human-like Episodic Memory for Infinite Context LLMs | |
MUSCLE: A Model Update Strategy for Compatible LLM Evolution | |
H2O-Danube3 Technical Report | |
Context Embeddings for Efficient Answer Generation in RAG | |
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models | |
RoboMorph: Evolving Robot Morphology using Large Language Models | |
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers | |
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training | |
New Desiderata for Direct Preference Optimization | |
Characterizing Prompt Compression Methods for Long Context Inference | |
Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency | |
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing | |
MUSE: Machine Unlearning Six-Way Evaluation for Language Models | |
Accuracy is Not All You Need | |
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models | |
Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs | |
Universal Neurons in GPT2 Language Models | |
Agent Instructs Large Language Models to be General Zero-Shot Reasoners | |
Qwen2 Technical Report | |
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism | |
LAB-Bench: Measuring Capabilities of Language Models for Biology Research | |
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | |
MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models | |
Representing Rule-based Chatbots with Transformers | |
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs | |
Benchmarking Language Model Creativity: A Case Study on Code Generation | |
Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules | |
Spontaneous Reward Hacking in Iterative Self-Refinement | |
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients | |
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | |
LLM Circuit Analyses Are Consistent Across Training and Scale | |
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation | |
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? | |
Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development | |
Fast Matrix Multiplications for Lookup Table-Quantized LLMs | |
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | |
Bridging the Gap Between Information Seeking and Product Search Systems: Q&A Recommendation for E-commerce | |
When is the consistent prediction likely to be a correct prediction? | |
Transformer tricks: Removing weights for skipless transformers | |
Transformers represent belief state geometry in their residual stream | |
Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | |
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | |
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models | |
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment | |
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | |
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces | |
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | |
A Survey on LoRA of Large Language Models | |
No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | |
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | |
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos | |
Patch-Level Training for Large Language Models | |
E5-V: Universal Embeddings with Multimodal Large Language Models | |
Case2Code: Learning Inductive Reasoning with Synthetic Data | |
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | |
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models | |
Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections | |
The Art of Saying No: Contextual Noncompliance in Language Models | |
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | |
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention | |
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models | |
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression | |
Practical Unlearning for Large Language Models | |
Does Refusal Training in LLMs Generalize to the Past Tense? | |
Automatic Prompt Optimization with "Gradient Descent" and Beam Search | |
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies | |
CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization | |
Understanding Reference Policies in Direct Preference Optimization | |
Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation | |
PM-LLM-Benchmark: Evaluating Large Language Models on Process Mining Tasks | |
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore | |
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study | |
Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation | |
Weak-to-Strong Reasoning | |
Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation | |
Benchmarking Vision Language Models for Cultural Understanding | |
DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations | |
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach | |
A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks | |
DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems | |
Scaling Granite Code Models to 128K Context | |
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | |
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models | |
Understanding Counting in Small Transformers: The Interplay between Attention and Feed-Forward Layers | |
Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning | |
Lean-STaR: Learning to Interleave Thinking and Proving | |
GAVEL: Generating Games Via Evolution and Language Models | |
Transformer Layers as Painters | |
AUITestAgent: Automatic Requirements Oriented GUI Function Testing | |
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | |
Training on the Test Task Confounds Evaluation and Emergence | |
The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing | |
PaliGemma: A versatile 3B VLM for transfer | |
A Survey on Mixture of Experts | |
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning | |
Consent in Crisis: The Rapid Decline of the AI Data Commons | |
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities | |
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders | |
The Vision of Autonomic Computing: Can LLMs Make It a Reality? | |
EVLM: An Efficient Vision-Language Model for Visual Understanding | |
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference | |
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle | |
Internal Consistency and Self-Feedback in Large Language Models: A Survey | |
Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition | |
SciCode: A Research Coding Benchmark Curated by Scientists | |
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | |
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning | |
VideoGameBunny: Towards vision assistants for video games | |
GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization | |
NNsight and NDIF: Democratizing Access to Foundation Model Internals | |
Fractal Patterns May Illuminate the Success of Next-Token Prediction | |
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | |
NV-Retriever: Improving text embedding models with effective hard-negative mining | |
Efficient Retrieval with Learned Similarities | |
Knowledge Mechanisms in Large Language Models: A Survey and Perspective | |
Gated Linear Attention Transformers with Hardware-Efficient Training | |
SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM | |
Discrete Flow Matching | |
MIBench: Evaluating Multimodal Large Language Models over Multiple Images | |
BOND: Aligning LLMs with Best-of-N Distillation | |
Foundational Models Defining a New Era in Vision: A Survey and Outlook | |
Shared Imagination: LLMs Hallucinate Alike | |
Aligning Large Language Models with Human: A Survey | |
Compact Language Models via Pruning and Knowledge Distillation | |
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization | |
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis | |
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings | |
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models | |
To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability | |
Demystifying Chains, Trees, and Graphs of Thoughts | |
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model | |
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation | |
PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing | |
Testing Occupational Gender Bias in Language Models: Towards Robust Measurement and Zero-Shot Debiasing | |
PERSONA: A Reproducible Testbed for Pluralistic Alignment | |
Scalify: scale propagation for efficient low-precision LLM training | |
Reinforced Prompt Personalization for Recommendation with Large Language Models | |
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents | |
DDK: Distilling Domain Knowledge for Efficient Large Language Models | |
Course-Correction: Safety Alignment Using Synthetic Preferences | |
Longhorn: State Space Models are Amortized Online Learners | |
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization | |
Recursive Introspection: Teaching Language Model Agents How to Self-Improve | |
Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption | |
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? | |
Fluent Student-Teacher Redteaming | |
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? | |
Efficient Inference of Vision Instruction-Following Models with Elastic Cache | |
Very Large-Scale Multi-Agent Simulation in AgentScope | |
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents | |
$VILA^2$: VILA Augmented VILA | |
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? | |
Do Generative AI Models Output Harm while Representing Non-Western Cultures: Evidence from A Community-Centered Approach | |
Visual Haystacks: Answering Harder Questions About Sets of Images | |
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | |
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach | |
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | |
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | |
Prover-Verifier Games improve legibility of LLM outputs | |
Exploring Advanced Large Language Models with LLMsuite | |
Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques | |
The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities | |
Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond | |
LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | |
RadioRAG: Factual Large Language Models for Enhanced Diagnostics in Radiology Using Dynamic Retrieval Augmented Generation | |
RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering | |
The Art of Refusal: A Survey of Abstention in Large Language Models | |
SALMON: Self-Alignment with Instructable Reward Models | |
Small Molecule Optimization with Large Language Models | |
Generation Constraint Scaling Can Mitigate Hallucination | |
A Survey on Employing Large Language Models for Text-to-SQL Tasks | |
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement | |
Prompt Injection Attacks on Large Language Models in Oncology | |
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher | |
Theia: Distilling Diverse Vision Foundation Models for Robot Learning | |
Diffusion Feedback Helps CLIP See Better | |
Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models | |
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | |
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages | |
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge | |
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain | |
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models | |
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification | |
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains | |
PersonaGym: Evaluating Persona Agents and LLMs | |
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training | |
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? | |
Transformers need glasses! Information over-squashing in language tasks | |
ThinK: Thinner Key Cache by Query-Driven Pruning | |
Meltemi: The first open Large Language Model for Greek | |
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework | |
Machine Unlearning in Generative AI: A Survey | |
A Large Encoder-Decoder Family of Foundation Models For Chemical Language | |
AI-Assisted Generation of Difficult Math Questions | |
Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions | |
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models | |
Demystifying Verbatim Memorization in Large Language Models | |
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs | |
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | |
The Llama 3 Herd of Models | |
ShieldGemma: Generative AI Content Moderation Based on Gemma | |
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts | |
Adaptive Retrieval-Augmented Generation for Conversational Systems | |
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent | |
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems | |
Latxa: An Open Language Model and Evaluation Suite for Basque | |
Improving Retrieval Augmented Language Model with Self-Reasoning | |
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey | |
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? | |
Data Contamination Report from the 2024 CONDA Shared Task | |
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning | |
Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack | |
Are LLMs classical or nonmonotonic reasoners? Lessons from generics | |
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | |
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | |
Tamper-Resistant Safeguards for Open-Weight LLMs | |
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model | |
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning | |
OmniParser for Pure Vision Based GUI Agent | |
Finch: Prompt-guided Key-Value Cache Compression | |
Gemma 2: Improving Open Language Models at a Practical Size | |
Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs | |
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models | |
An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | |
Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning | |
Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | |
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs | |
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations | |
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs | |
Apple Intelligence Foundation Language Models | |
Multi-group Uncertainty Quantification for Long-form Text Generation | |
MaskInversion: Localized Embeddings via Optimization of Explainability Maps | |
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | |
Transformers are Universal In-context Learners | |
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | |
In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation | |
Leveraging LLM Reasoning Enhances Personalized Recommender Systems | |
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | |
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | |
A Survey of Mamba | |
Jailbreaking Text-to-Image Models with LLM-Based Agents | |
Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins | |
MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training | |
Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks | |
Generative Retrieval with Preference Optimization for E-commerce Search | |
The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation | |
Improving Retrieval in Sponsored Search by Leveraging Query Context Signals | |
GRAD-SUM: Leveraging Gradient Summarization for Optimal Prompt Engineering | |
Crafting the Path: Robust Query Rewriting for Information Retrieval | |
Harnessing Large Language Models for Multimodal Product Bundling | |
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems | |
All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era | |
Beyond Benchmarks: Evaluating Embedding Model Similarity for Retrieval Augmented Generation Systems | |
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting | |
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models | |
Vortex under Ripplet: An Empirical Study of RAG-enabled Applications | |
MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models | |
Neurocache: Efficient Vector Retrieval for Long-range Language Modeling | |
Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I. | |
AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment | |
Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation | |
Retrieval-augmented generation in multilingual settings | |
Optimization of Retrieval-Augmented Generation Context with Outlier Detection | |
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models | |
Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification | |
LumberChunker: Long-Form Narrative Document Segmentation | |
Entropy-Based Decoding for Retrieval-Augmented Large Language Models | |
Improving Zero-shot LLM Re-Ranker with Risk Minimization | |
A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens | |
D2LLM: Decomposed and Distilled Large Language Models for Semantic Search | |
Retrieval Augmented Zero-Shot Text Classification | |
APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking | |
StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation | |
PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval | |
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation | |
Unified Active Retrieval for Retrieval Augmented Generation | |
LLM-enhanced Reranking in Recommender Systems | |
Intermediate Distillation: Data-Efficient Distillation from Black-Box LLMs for Information Retrieval | |
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG | |
The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs | |
Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens | |
A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks | |
Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling | |
Blowfish: Topological and statistical signatures for quantifying ambiguity in semantic search | |
Async Learned User Embeddings for Ads Delivery Optimization | |
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents | |
RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation | |
MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model | |
Evaluating the External and Parametric Knowledge Fusion of Large Language Models | |
DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | |
Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers | |
RAG Does Not Work for Enterprises | |
One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models | |
Voice Jailbreak Attacks Against GPT-4o | |
CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control | |
DeeperImpact: Optimizing Sparse Learned Index Structures | |
Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning | |
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration | |
Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation | |
RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference | |
RaFe: Ranking Feedback Improves Query Rewriting for RAG | |
Question-Based Retrieval using Atomic Units for Enterprise RAG | |
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation | |
Words Blending Boxes. Obfuscating Queries in Information Retrieval using Differential Privacy | |
Redefining Information Retrieval of Structured Database via Large Language Models | |
Contextualization with SPLADE for High Recall Retrieval | |
Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning | |
Comparative Analysis of Retrieval Systems in the Real World | |
Semi-Parametric Retrieval via Binary Token Index | |
Efficient and Responsible Adaptation of Large Language Models for Robust Top-k Recommendations | |
GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model | |
Retrieval-Oriented Knowledge for Click-Through Rate Prediction | |
Leveraging Large Language Models for Multimodal Search | |
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs | |
From Matching to Generation: A Survey on Generative Information Retrieval | |
Retrieval Augmented Generation for Domain-specific Question Answering | |
Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding | |
Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering | |
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models | |
Dubo-SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL | |
Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers | |
Consolidating Ranking and Relevance Predictions of Large Language Models through Post-Processing | |
Recall-Augmented Ranking: Enhancing Click-Through Rate Prediction Accuracy with Cross-Stage Data | |
The Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation | |
Efficient Prompting Methods for Large Language Models: A Survey | |
Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models | |
PMG: Personalized Multimodal Generation with Large Language Models | |
RecGPT: Generative Personalized Prompts for Sequential Recommendation via ChatGPT Training Paradigm | |
Taxonomy and Analysis of Sensitive User Queries in Generative AI Search | |
Generative Information Retrieval Evaluation | |
End-to-end training of Multimodal Model and ranking Model | |
Event-enhanced Retrieval in Real-time Search | |
Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation | |
Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models | |
CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems | |
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | |
Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts | |
Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models | |
Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation | |
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems | |
Shallow Cross-Encoders for Low-Latency Retrieval | |
Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models | |
Generate then Retrieve: Conversational Response Retrieval Using LLMs as Answer and Query Generators | |
Are Large Language Models Good at Utility Judgments? | |
SelfIE: Self-Interpretation of Large Language Model Embeddings | |
Make Large Language Model a Better Ranker | |
Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check | |
CoLLEGe: Concept Embedding Generation for Large Language Models | |
Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation | |
JORA: JAX Tensor-Parallel LoRA Library for Retrieval Augmented Fine-Tuning | |
Improving the Robustness of Dense Retrievers Against Typos via Multi-Positive Contrastive Learning | |
Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases | |
Investigating the performance of Retrieval-Augmented Generation and fine-tuning for the development of AI-driven knowledge-based systems | |
RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback | |
ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval | |
RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems | |
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design | |
Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners | |
LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems | |
An Interpretable Ensemble of Graph and Language Models for Improving Search Relevance in E-Commerce | |
LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction | |
Embedding-based search in JetBrains IDEs | |
RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records | |
Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges | |
ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework | |
Meta-Task Prompting Elicits Embeddings from Large Language Models | |
The First Place Solution of WSDM Cup 2024: Leveraging Large Language Models for Conversational Multi-Doc QA | |
Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation | |
Corpus-Steered Query Expansion with Large Language Models | |
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering | |
The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) | |
Large Language Model Augmented Exercise Retrieval for Personalized Language Learning | |
ESE: Espresso Sentence Embeddings | |
ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling | |
Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions | |
Retrieval Helps or Hurts? A Deeper Dive into the Efficacy of Retrieval Augmentation to Language Models | |
Are ELECTRA's Sentence Embeddings Beyond Repair? The Case of Semantic Textual Similarity | |
Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge | |
ARKS: Active Retrieval in Knowledge Soup for Code Generation | |
Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from Large Language Models | |
BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence | |
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs | |
TriSampler: A Better Negative Sampling Principle for Dense Retrieval | |
EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models | |
Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models | |
Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning | |
PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers | |
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | |
T-RAG: Lessons from the LLM Trenches | |
Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models | |
REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models | |
Non-autoregressive Generative Models for Reranking Recommendation | |
History, Development, and Principles of Large Language Models-An Introductory Survey | |
Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback | |
Leveraging LLMs for Unsupervised Dense Retriever Ranking | |
RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation | |
Retrieve to Explain: Evidence-driven Predictions with Language Models | |
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models | |
Locally-Adaptive Quantization for Streaming Vector Search | |
HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QA | |
When Large Language Models Meet Vector Databases: A Survey | |
Data-efficient Fine-tuning for LLM-based Recommendation | |
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models | |
Re3val: Reinforced and Reranked Generative Retrieval | |
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models | |
Generative Dense Retrieval: Memory Can Be a Burden | |
The Chronicles of RAG: The Retriever, the Chunk and the Generator | |
Curator: Efficient Indexing for Multi-Tenant Vector Databases | |
Bridging the Preference Gap between Retrievers and LLMs | |
InRanker: Distilled Rankers for Zero-shot Information Retrieval | |
Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis | |
ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback | |
Unsupervised hard Negative Augmentation for contrastive learning | |
Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models | |
RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation | |
Large Language Models are Not Stable Recommender Systems | |
ESPN: Memory-Efficient Multi-Vector Information Retrieval | |
Unlocking the Potential of Large Language Models for Explainable Recommendations | |
Preliminary Study on Incremental Learning for Large Language Model-based Recommender Systems | |
Agent4Ranking: Semantic Robust Ranking via Personalized Query Rewriting Using Multi-agent LLM | |
Dense X Retrieval: What Retrieval Granularity Should We Use? | |
End-to-End Retrieval with Learned Dense and Sparse Representations Using Lucene | |
IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions | |
ControlRec: Bridging the Semantic Gap between Language Model and Personalized Recommendation | |
RecExplainer: Aligning Large Language Models for Explaining Recommendation Models | |
Golden-Retriever: High-Fidelity Agentic Retrieval Augmented Generation for Industrial Knowledge Base | |
Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders | |
On Retrieval Augmentation and the Limitations of Language Model Training | |
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems | |
Text Retrieval with Multi-Stage Re-Ranking Models | |
LLatrieval: LLM-Verified Retrieval for Verifiable Generation | |
CoverBench: A Challenging Benchmark for Complex Claim Verification | |
Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion | |
Exploring Fine-tuning ChatGPT for News Recommendation | |
Self-Taught Evaluators | |
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation | |
The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models | |
Mixture of Experts with Mixture of Precisions for Tuning Quality of Service | |
The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models | |
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | |
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | |
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | |
Language Model Can Listen While Speaking | |
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models | |
Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping | |
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | |
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | |
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models | |
Can LLMs predict the convergence of Stochastic Gradient Descent? | |
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines | |
LLaVA-OneVision: Easy Visual Task Transfer | |
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design | |
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters | |
Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs | |
A Real-Time Adaptive Multi-Stream GPU System for Online Approximate Nearest Neighborhood Search | |
Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering | |
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | |
Generative Retrieval with Few-shot Indexing | |
Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval | |
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills | |
Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer | |
StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation | |
Synthesizing Text-to-SQL Data from Weak and Strong LLMs | |
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models | |
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | |
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time | |
Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data | |
EXAONE 3.0 7.8B Instruction Tuned Language Model | |
Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access | |
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | |
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models | |
Learning Task Decomposition to Assist Humans in Competitive Programming | |
Better Alignment with Instruction Back-and-Forth Translation | |
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models | |
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers | |
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection | |
Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation | |
ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning | |
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI | |
Diffusion Guided Language Modeling | |
Conversational Prompt Engineering | |
Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP | |
Pairing Clustered Inverted Indexes with kNN Graphs for Fast Approximate Retrieval over Learned Sparse Representations | |
Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning | |
EfficientRAG: Efficient Retriever for Multi-Hop Question Answering | |
Pairwise Judgment Formulation for Semantic Embedding Model in Web Search | |
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | |
Interpreting Attention Layer Outputs with Sparse Autoencoders | |
Fine-tuning language models to find agreement among humans with diverse preferences | |
VITA: Towards Open-Source Interactive Omni Multimodal LLM | |
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? | |
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding | |
Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks | |
Early Exit Strategies for Approximate k-NN Search in Dense Retrieval | |
HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction | |
Relevance Filtering for Embedding-based Retrieval | |
OpenResearcher: Unleashing AI for Accelerated Scientific Research | |
Enhancing Relevance of Embedding-based Retrieval at Walmart | |
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | |
Natural Language Outlines for Code: Literate Programming in the LLM Era | |
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities | |
Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective | |
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | |
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | |
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents | |
Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation | |
PhysBERT: A Text Embedding Model for Physics Scientific Literature | |
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment | |
Med42-v2: A Suite of Clinical LLMs | |
Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers | |
PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting | |
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | |
The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation | |
Layerwise Recurrent Router for Mixture-of-Experts | |
Prompt Tuning as User Inherent Profile Inference Machine | |
Large Language Model Agent in Financial Trading: A Survey | |
Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models | |
Design Proteins Using Large Language Models: Enhancements and Comparative Analyses | |
Hermes 3 Technical Report | |
FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data | |
WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs | |
Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM | |
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning | |
Aquila2 Technical Report | |
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents | |
Hierarchical Structured Neural Network for Retrieval | |
BMX: Entropy-weighted Similarity and Semantic-enhanced Lexical Search | |
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs | |
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs | |
MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance | |
Can Large Language Models Understand Symbolic Graphics Programs? | |
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | |
DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System | |
Mamba Retriever: Utilizing Mamba for Effective and Efficient Dense Retrieval | |
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability | |
Post-Training Sparse Attention with Double Sparsity | |
Large language models can be zero-shot anomaly detectors for time series? | |
The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community | |
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm | |
FuseChat: Knowledge Fusion of Chat Models | |
Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | |
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | |
NL2OR: Solve Complex Operations Research Problems Using Natural Language Inputs | |
Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models | |
Min P Sampling: Balancing Creativity and Coherence at High Temperature | |
LLM Stability: A detailed analysis with some surprises | |
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models | |
A Survey on Benchmarks of Multimodal Large Language Models | |
Where is the signal in tokenization space? | |
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations | |
W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering | |
Cropper: Vision-Language Model for Image Cropping through In-Context Learning | |
Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering | |
Can Large Language Models Reason? A Characterization via 3-SAT | |
Large language models can consistently generate high-quality content for election disinformation operations | |
LongVILA: Scaling Long-Context Visual Language Models for Long Videos | |
Meta Knowledge for Retrieval Augmented Large Language Models | |
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges | |
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models | |
Graph Retrieval-Augmented Generation: A Survey | |
Patched MOA: optimizing inference for diverse software development tasks | |
Patched RTC: evaluating LLMs for diverse software development tasks | |
InstructCoder: Instruction Tuning Large Language Models for Code Editing | |
To Code, or Not To Code? Exploring Impact of Code in Pre-training | |
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model | |
HMoE: Heterogeneous Mixture of Experts for Language Modeling | |
Synergistic Approach for Simultaneous Optimization of Monolingual, Cross-lingual, and Multilingual Information Retrieval | |
Analysis of Plan-based Retrieval for Grounded Text Generation | |
NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency | |
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique | |
Goldfish: Monolingual Language Models for 350 Languages | |
BLADE: Benchmarking Language Model Agents for Data-Driven Science | |
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering | |
Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation | |
Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs | |
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses | |
LLM Pruning and Distillation in Practice: The Minitron Approach | |
Critique-out-Loud Reward Models | |
FocusLLM: Scaling LLM's Context by Parallel Decoding | |
First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models | |
StructuredRAG: JSON Response Formatting with Large Language Models | |
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | |
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation | |
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | |
Mistral-SPLADE: LLMs for better Learned Sparse Retrieval | |
CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models | |
Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer | |
Great Memory, Shallow Reasoning: Limits of $k$NN-LMs | |
Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data | |
Flexora: Flexible Low Rank Adaptation for Large Language Models | |
Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information | |
Controllable Text Generation for Large Language Models: A Survey | |
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | |
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications | |
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs | |
Drama Engine: A Framework for Narrative Agents | |
Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search | |
Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment | |
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | |
SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models | |
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM | |
Automating Thought of Search: A Journey Towards Soundness and Completeness | |
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | |
Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs | |
Matmul or No Matmul in the Era of 1-bit LLMs | |
Cross-Modal Safety Alignment: Is textual unlearning all you need? | |
Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study | |
Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution | |
QUB-Cirdan at "Discharge Me!": Zero shot discharge letter generation by open-source LLM | |
Exploring Backdoor Attacks against Large Language Model-based Decision Making | |
Phantom: General Trigger Attacks on Retrieval Augmented Language Generation | |
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters | |
Visual Perception by Large Language Model's Weights | |
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs | |
Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization | |
PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization | |
Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory | |
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs | |
InstructionCP: A fast approach to transfer Large Language Models into target language | |
KNOW: A Real-World Ontology for Knowledge Capture with Large Language Models | |
InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning | |
Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning | |
One-Shot Safety Alignment for Large Language Models via Optimal Dualization | |
Are Large Language Models Chameleons? | |
Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding | |
Towards Next-Generation Urban Decision Support Systems through AI-Powered Construction of Scientific Ontology using Large Language Models -- A Case in Optimizing Intermodal Freight Transportation | |
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | |
Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery | |
Can Graph Learning Improve Task Planning? | |
MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors | |
Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners | |
Language Generation with Strictly Proper Scoring Rules | |
Compressing Large Language Models using Low Rank and Low Precision Decomposition | |
Video Enriched Retrieval Augmented Generation Using Aligned Video Captions | |
Mechanistic Interpretability of Binary and Ternary Transformers | |
Enhanced Robot Arm at the Edge with NLP and Vision Systems | |
Generative Query Reformulation Using Ensemble Prompting, Document Fusion, and Relevance Feedback | |
HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs | |
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | |
THREAD: Thinking Deeper with Recursive Spawning | |
Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching | |
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention | |
LLM-Assisted Static Analysis for Detecting Security Vulnerabilities | |
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs | |
Autoformalizing Euclidean Geometry | |
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding | |
Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization | |
MotionLLM: Multimodal Motion-Language Learning with Large Language Models | |
Exploring the LLM Journey from Cognition to Expression with Linear Representations | |
A Large Language Model-based multi-agent manufacturing system for intelligent shopfloor | |
TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing | |
Laurel: Generating Dafny Assertions Using Large Language Models | |
LLMs for User Interest Exploration in Large-scale Recommendation Systems | |
Devil's Advocate: Anticipatory Reflection for LLM Agents | |
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs | |
Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models | |
Mechanism Design for LLM Fine-tuning with Multiple Reward Models | |
FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference | |
A statistical framework for weak-to-strong generalization | |
No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks | |
GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | |
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection | |
C3LLM: Conditional Multimodal Content Generation Using Large Language Models | |
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting | |
Finetuning Large Language Model for Personalized Ranking | |
Towards Completeness-Oriented Tool Retrieval for Large Language Models | |
Keypoint-based Progressive Chain-of-Thought Distillation for LLMs | |
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models | |
Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models | |
Claim Verification in the Age of Large Language Models: A Survey | |
Streaming Long Video Understanding with Large Language Models | |
Your Large Language Models Are Leaving Fingerprints | |
WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response | |
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions | |
Why Not Transform Chat Large Language Models to Non-English? | |
TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment | |
LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework | |
Sunnie: An Anthropomorphic LLM-Based Conversational Agent for Mental Well-Being Activity Recommendation | |
CG-FedLLM: How to Compress Gradients in Federated Fine-tuning for Large Language Models | |
DSTI at LLMs4OL 2024 Task A: Intrinsic versus extrinsic knowledge for type classification | |
How to set AdamW's weight decay as you scale model and dataset size | |
Safety Alignment for Vision Language Models | |
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation | |
Large Language Models are Effective Priors for Causal Graph Discovery | |
HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model | |
WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness | |
LIRE: listwise reward enhancement for preference alignment | |
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction | |
TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models | |
RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering | |
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | |
Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems | |
AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs | |
Large Language Models (LLMs) Assisted Wireless Network Deployment in Urban Settings | |
Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance | |
Towards Evaluating and Building Versatile Large Language Models for Medicine | |
LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction | |
RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment | |
Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers | |
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | |
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | |
Domain-specific long text classification from sparse relevant information | |
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | |
Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews | |
Insights from Benchmarking Frontier Language Models on Web App Code Generation | |
Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | |
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates | |
Semantic Alignment for Multimodal Large Language Models | |
Memory-Efficient LLM Training with Online Subspace Descent | |
A Survey of Hallucination in Large Foundation Models | |
MEDCO: Medical Education Copilots Based on A Multi-Agent Framework | |
Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | |
Towards Realistic Synthetic User-Generated Content: A Scaffolding Approach to Generating Online Discussions | |
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java | |
The Mamba in the Llama: Distilling and Accelerating Hybrid Models | |
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning | |
MobileQuant: Mobile-friendly Quantization for On-device Language Models | |
LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs | |
LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal! | |
Efficient Detection of Toxic Prompts in Large Language Models | |
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler | |
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time | |
A Web-Based Solution for Federated Learning with LLM-Based Automation | |
NanoFlow: Towards Optimal Large Language Model Serving Throughput | |
A Survey of Large Language Models for European Languages | |
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments | |
Challenges and Responses in the Practice of Large Language Models | |
PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | |
Inverse Scaling: When Bigger Isn't Better | |
Generative Verifiers: Reward Modeling as Next-Token Prediction | |
Project SHADOW: Symbolic Higher-order Associative Deductive reasoning On Wikidata using LM probing | |
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline | |
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | |
MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce | |
Writing in the Margins: Better Inference Pattern for Long Context Retrieval | |
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning | |
PAT: Pruning-Aware Tuning for Large Language Models | |
Text2SQL is Not Enough: Unifying AI and Databases with TAG | |
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations | |
Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express | |
Agentic Retrieval-Augmented Generation for Time Series Analysis | |
LLM-3D Print: Large Language Models To Monitor and Control 3D Printing | |
A Law of Next-Token Prediction in Large Language Models | |
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | |
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration | |
Efficient LLM Scheduling by Learning to Rank | |
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | |
Decentralized LLM Inference over Edge Networks with Energy Harvesting | |
LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments | |
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | |
Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature | |
Geometry of Lightning Self-Attention: Identifiability and Dimension | |
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models | |
Conan-embedding: General Text Embedding with More and Better Negative Samples | |
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models | |
ReMamba: Equip Mamba with Effective Long-Sequence Modeling | |
Awes, Laws, and Flaws From Today's LLM Research | |
Persuasion Games using Large Language Models | |
Can Unconfident LLM Annotations Be Used for Confident Conclusions? | |
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling | |
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever | |
Transformers Meet ACT-R: Repeat-Aware and Sequential Listening Session Recommendation | |
A Survey on Evaluating Large Language Models in Code Generation Tasks | |
Law of Vision Representation in MLLMs | |
SynDL: A Large-Scale Synthetic Test Collection | |
Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models | |
StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements | |
Understanding the User: An Intent-Based Ranking Dataset | |
Iterative Graph Alignment | |
Icing on the Cake: Automatic Code Summarization at Ericsson | |
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | |
LLMs generate structurally realistic social networks but overestimate political homophily | |
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems | |
Rethinking Tokenization: Crafting Better Tokenizers for Large Language Models | |
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | |
Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | |
MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents | |
LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation | |
GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs | |
InkubaLM: A small language model for low-resource African languages | |
SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section | |
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization | |
Automatic Differential Diagnosis using Transformer-Based Multi-Label Sequence Classification | |
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding | |
AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems | |
CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation | |
MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models | |
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer | |
MemLong: Memory-Augmented Retrieval for Long Text Modeling | |
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training | |
Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning | |
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | |
CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | |
Selective Preference Optimization via Token-Level Reward Function Estimation | |
Impact of ChatGPT on the writing style of condensed matter physicists | |
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback | |
Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models | |
ImageBind-LLM: Multi-modality Instruction Tuning | |
Transformers as Support Vector Machines | |
LLM-GAN: Construct Generative Adversarial Network Through Large Language Models For Explainable Fake News Detection | |
RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer | |
OLMoE: Open Mixture-of-Experts Language Models | |
BEAVER: An Enterprise Benchmark for Text-to-SQL | |
Foundations of Large Language Model Compression -- Part 1: Weight Quantization | |
Contemporary Model Compression on Large Language Models Inference | |
rerankers: A Lightweight Python Library to Unify Ranking Methods | |
FuzzCoder: Byte-level Fuzzing Test via Large Language Model | |
LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models | |
Focus Agent: LLM-Powered Virtual Focus Group | |
A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks | |
AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction | |
In Defense of RAG in the Era of Long-Context Language Models | |
Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information | |
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models | |
ProGRes: Prompted Generative Rescoring on ASR n-Best | |
Augmented Reality without Borders: Achieving Precise Localization Without Maps | |
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | |
CogVLM2: Visual Language Models for Image and Video Understanding | |
Mamba or Transformer for Time Series Forecasting? Mixture of Universals (MoU) Is All You Need | |
In-Context Imitation Learning via Next-Token Prediction | |
A Practitioner's Guide to Continual Multimodal Pretraining | |
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | |
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture | |
Configurable Foundation Models: Building LLMs from a Modular Perspective | |
Towards a Unified View of Preference Learning for Large Language Models: A Survey | |
A Comparative Study of Pre-training and Self-training | |
Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models? | |
RouterRetriever: Exploring the Benefits of Routing over Multiple Expert Embedding Models | |
Diversify-verify-adapt: Efficient and Robust Retrieval-Augmented Ambiguous Question Answering | |
NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval | |
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild | |
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining | |
Unforgettable Generalization in Language Models | |
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation | |
GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI | |
Imitating Language via Scalable Inverse Reinforcement Learning | |
Statically Contextualizing Large Language Models with Typed Holes | |
ContextCite: Attributing Model Generation to Context | |
TinyAgent: Function Calling at the Edge | |
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts | |
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action | |
Ruri: Japanese General Text Embeddings | |
On-Device Language Models: A Comprehensive Review | |
Political DEBATE: Efficient Zero-shot and Few-shot Classifiers for Political Text | |
Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | |
Building Math Agents with Multi-Turn Iterative Preference Learning | |
Large Language Models and Cognitive Science: A Comprehensive Review of Similarities, Differences, and Challenges | |
Attention Heads of Large Language Models: A Survey | |
Planning In Natural Language Improves LLM Search For Code Generation | |
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization | |
From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents | |
Extracting Paragraphs from LLM Token Activations | |
xLAM: A Family of Large Action Models to Empower AI Agent Systems | |
Large Language Model-Based Agents for Software Engineering: A Survey | |
SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration | |
Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries | |
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities | |
Evolution of Social Norms in LLM Agents using Natural Language | |
A Static Evaluation of Code Completion by Large Language Models | |
Universal Transformers | |
Hardware Acceleration of LLMs: A comprehensive survey and comparison | |
Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation | |
The Compressor-Retriever Architecture for Language Model OS | |
A Learnable Agent Collaboration Network Framework for Personalized Multimodal AI Search Engine | |
A Survey for Large Language Models in Biomedicine | |
Watermarking Techniques for Large Language Models: A Survey | |
Genetic Approach to Mitigate Hallucination in Generative IR | |
Theory, Analysis, and Best Practices for Sigmoid Self-Attention | |
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | |
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs | |
Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs | |
Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models | |
An overview of domain-specific foundation model: key technologies, applications and challenges | |
Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts | |
GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | |
RETAIN: Interactive Tool for Regression Testing Guided LLM Migration | |
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data | |
MoRe Fine-Tuning with 10x Fewer Parameters | |
Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity | |
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | |
AnyMatch -- Efficient Zero-Shot Entity Matching with a Small Language Model | |
Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models | |
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct | |
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish | |
Benchmarking Chinese Knowledge Rectification in Large Language Models | |
A System and Benchmark for LLM-based Q&A on Heterogeneous Data | |
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | |
CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning | |
Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications | |
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs | |
Achieving Peak Performance for Large Language Models: A Systematic Review | |
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models | |
Improving Pretraining Data Using Perplexity Correlations | |
LLMs Will Always Hallucinate, and We Need to Live With This | |
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance | |
How Does Code Pretraining Affect Language Model Task Performance? | |
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation | |
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More | |
Radiology-Llama2: Best-in-Class Large Language Model for Radiology | |
Synthetic continued pretraining | |
Agent Workflow Memory | |
Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | |
STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM | |
What is the Role of Small Models in the LLM Era: A Survey | |
LLaMA-Omni: Seamless Speech Interaction with Large Language Models | |
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering | |
Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? | |
Length Desensitization in Directed Preference Optimization | |
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | |
Can Large Language Models Unlock Novel Scientific Research Ideas? | |
SongCreator: Lyrics-based Universal Song Generation | |
Self-Harmonized Chain of Thought | |
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | |
AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge | |
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications | |
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation | |
Generative Hierarchical Materials Search | |
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | |
What Makes a Maze Look Like a Maze? | |
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | |
Retrieval Augmented Thought Process for Private Data Handling in Healthcare | |
Dense Reward for Free in Reinforcement Learning from Human Feedback | |
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG | |
Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models | |
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection | |
Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT | |
Representation Tuning | |
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning | |
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models | |
Alleviating Hallucinations in Large Language Models with Scepticism Modeling | |
SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | |
Harmonic Reasoning in Large Language Models | |
STLM Engineering Report: Dropout | |
Towards Automated Machine Learning Research | |
Optimization Hyper-parameter Laws for Large Language Models | |
Residual Stream Analysis with Multi-Layer SAEs | |
LAST: Language Model Aware Speech Tokenization | |
A Fused Large Language Model for Predicting Startup Success | |
Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers | |
Accelerating Large Language Model Training with Hybrid GPU-based Compression | |
Training on the Benchmark Is Not All You Need | |
From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning | |
LanguaShrink: Reducing Token Overhead with Psycholinguistics | |
EPO: Hierarchical LLM Agents with Environment Preference Optimization | |
Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | |
Harmonized Speculative Sampling | |
Why transformers are obviously good models of language | |
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | |
How transformers learn structured data: insights from hierarchical filtering | |
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data | |
SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection | |
Search-Based LLMs for Code Optimization | |
Memorization In In-Context Learning | |
Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? | |
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | |
Demystifying the Communication Characteristics for Distributed Transformer Models | |
In-Context Learning with Representations: Contextual Generalization of Trained Transformers | |
Performance Law of Large Language Models | |
Importance Weighting Can Help Large Language Models Self-Improve | |
Acquiring Bidirectionality via Large and Small Language Models | |
Extracting Sentence Embeddings from Pretrained Transformer Models | |
Instruct Large Language Models to Generate Scientific Literature Survey Step by Step | |
LLMs can Schedule | |
A Unified Framework for Model Editing | |
AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies | |
Introducing the NewsPaLM MBR and QE Dataset: LLM-Generated High-Quality Parallel Data Outperforms Traditional Web-Crawled Data | |
Animate, or Inanimate, That is the Question for Large Language Models | |
Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks | |
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression | |
Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training | |
From Words to Worth: Newborn Article Impact Prediction with LLM | |
Is Child-Directed Speech Effective Training Data for Language Models? | |
Automated Theorem Provers Help Improve Large Language Model Reasoning | |
SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models | |
Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages | |
Cross-layer Attention Sharing for Large Language Models | |
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs | |
Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer | |
Reconsidering Token Embeddings with the Definitions for Pre-trained Language Models | |
On the Resilience of Multi-Agent Systems with Malicious Agents | |
Disentangling Dense Embeddings with Sparse Autoencoders | |
SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context | |
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning | |
Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens | |
Entropy, Thermodynamics and the Geometrization of the Language Model | |
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning | |
CultureVo: The Serious Game of Utilizing Gen AI for Enhancing Cultural Intelligence | |
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 | |
LLMs' Understanding of Natural Language Revealed | |
Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models | |
Do Language Models Have a Critical Period for Language Acquisition? | |
Understanding Memorisation in LLMs: Dynamics, Influencing Factors, and Implications | |
Towards Effective and Efficient Continual Pre-training of Large Language Models | |
Climbing the Complexity Ladder with Expressive Attention | |
Towards More Accurate Prediction of Human Empathy and Emotion in Text and Multi-turn Conversations by Combining Advanced NLP, Transformers-based Networks, and Linguistic Methodologies | |
I Could've Asked That: Reformulating Unanswerable Questions | |
Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | |
On the Design and Analysis of LLM-Based Algorithms | |
Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data | |
A mathematical framework of intelligence and consciousness based on Riemannian Geometry | |
Enhancing Training Efficiency Using Packing with Flash Attention | |
Banishing LLM Hallucinations Requires Rethinking Generalization | |
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser | |
Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata | |
A Notion of Complexity for Theory of Mind via Discrete World Models | |
Tree Cross Attention | |
Sentence Bottleneck Autoencoders from Transformer Language Models | |
Neural Machine Translation without Embeddings | |
Agents in Software Engineering: Survey, Landscape, and Vision | |
Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions | |
Programming Refusal with Conditional Activation Steering | |
AIPO: Improving Training Objective for Iterative Preference Optimization | |
Your Weak LLM is Secretly a Strong Teacher for Alignment | |
Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task | |
Fusing Dynamics Equation: A Social Opinions Prediction Algorithm with LLM-based Agents | |
CPL: Critical Planning Step Learning Boosts LLM Generalization in Reasoning Tasks | |
LLM Critics Help Catch LLM Bugs | |
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | |
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models | |
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training | |
Reasoning with Language Model is Planning with World Model | |
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | |
Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models | |
Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles | |
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | |
BERT Rediscovers the Classical NLP Pipeline | |
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents | |
Assessing Adversarial Robustness of Large Language Models: An Empirical Study | |
Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs | |
LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning | |
Instigating Cooperation among LLM Agents Using Adaptive Information Modulation | |
Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation | |
beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems | |
ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code | |
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots | |
From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs | |
jina-embeddings-v3: Multilingual Embeddings With Task LoRA | |
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey | |
On the Diagram of Thought | |
CROSS-JEM: Accurate and Efficient Cross-encoders for Short-text Ranking Tasks | |
Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator | |
HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications | |
Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding | |
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | |
AudioBERT: Audio Knowledge Augmented Language Model | |
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation | |
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models | |
Qwen2.5-Coder Technical Report | |
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | |
A Controlled Study on Long Context Extension and Generalization in LLMs | |
GRIN: GRadient-INformed MoE | |
LLMs + Persona-Plug = Personalized LLMs | |
Human-like Affective Cognition in Foundation Models | |
Designing Interfaces for Multimodal Vector Search Applications | |
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation | |
A Framework for Ranking Content Providers Using Prompt Engineering and Self-Attention Network | |
Scaling FP8 training to trillion-token LLMs | |
NVLM: Open Frontier-Class Multimodal LLMs | |
LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents | |
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | |
Towards Time Series Reasoning with LLMs | |
Learning Spatially-Aware Language and Audio Embedding | |
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | |
LOLA -- An Open-Source Massively Multilingual Large Language Model | |
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse | |
SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer | |
Semformer: Transformer Language Models with Semantic Planning | |
Embedding Geometries of Contrastive Language-Image Pre-Training | |
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models | |
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B | |
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | |
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models | |
On the limits of agency in agent-based models | |
Schrodinger's Memory: Large Language Models | |
Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison | |
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation | |
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing | |
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study | |
Stable Language Model Pre-training by Reducing Embedding Variability | |
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems | |
The Expressive Power of Transformers with Chain of Thought | |
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution | |
MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions | |
Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing | |
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation | |
Training Language Models to Self-Correct via Reinforcement Learning | |
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization | |
Enhancing E-commerce Product Title Translation with Retrieval-Augmented Generation and Large Language Models | |
Language Models Learn to Mislead Humans via RLHF | |
Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL | |
MEXMA: Token-level objectives improve sentence representations | |
Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories | |
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries | |
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning | |
MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | |
BERT-VBD: Vietnamese Multi-Document Summarization Framework | |
Measuring Human and AI Values based on Generative Psychometrics with Large Language Models | |
RoMath: A Mathematical Reasoning Benchmark in Romanian | |
Compressing LLMs: The Truth is Rarely Pure and Never Simple | |
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions | |
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines | |
Knowledge-Based Domain-Oriented Data Augmentation for Enhancing Unsupervised Sentence Embedding | |
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling | |
AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances | |
Retrieval-Augmented Test Generation: How Far Are We? | |
Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning | |
RAD-Bench: Evaluating Large Language Models Capabilities in Retrieval Augmented Dialogues | |
Should RAG Chatbots Forget Unimportant Conversations? Exploring Importance and Forgetting with Psychological Insights | |
Linear Recency Bias During Training Improves Transformers' Fit to Reading Times | |
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models | |
Making Large Language Models into World Models with Precondition and Effect Knowledge | |
Linguini: A benchmark for language-agnostic linguistic reasoning | |
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | |
Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking | |
SLIMER-IT: Zero-Shot NER on Italian Language | |
Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization | |
Adaptive Large Language Models By Layerwise Attention Shortcuts | |
Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors | |
MindScape Study: Integrating LLM and Behavioral Sensing for Personalized AI-Driven Journaling Experiences | |
Language Models "Grok" to Copy | |
Autoregressive + Chain of Thought ≃ Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer | |
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy? | |
What You Say = What You Want? Teaching Humans to Articulate Requirements for LLMs | |
When Context Leads but Parametric Memory Follows in Large Language Models | |
SELF-[IN]CORRECT: LLMs Struggle with Discriminating Self-Generated Responses | |
Mixture of Diverse Size Experts | |
Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent | |
Semi-Supervised Reward Modeling via Iterative Self-Training | |
Spectral Filters, Dark Signals, and Attention Sinks | |
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers | |
Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection | |
ChainBuddy: An AI Agent System for Generating LLM Pipelines | |
ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources | |
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models | |
RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion | |
RRM: Robust Reward Model Training Mitigates Reward Hacking | |
AutoVerus: Automated Proof Generation for Rust Code | |
LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models | |
Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey | |
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench | |
Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts | |
TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning | |
Jailbreaking Large Language Models with Symbolic Mathematics | |
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments | |
An adapted large language model facilitates multiple medical tasks in diabetes care | |
KTO: Model Alignment as Prospect Theoretic Optimization | |
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs | |
Towards Understanding Grokking: An Effective Theory of Representation Learning | |
What Makes Good In-Context Examples for GPT-3? | |
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | |
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping | |
Learning from Contrastive Prompts: Automated Optimization and Adaptation | |
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond | |
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs | |
Phantom of Latent for Large Language and Vision Models | |
Target-Aware Language Modeling via Granular Data Sampling | |
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling | |
A Case Study of Web App Coding with OpenAI Reasoning Models | |
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency | |
Robust Training Objectives Improve Embedding-based Retrieval in Industrial Recommendation Systems | |
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking | |
LLM-Assisted Visual Analytics: Opportunities and Challenges | |
Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling | |
Instruction Following without Instruction Tuning | |
OmniBench: Towards The Future of Universal Omni-Language Models | |
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely | |
A Survey on the Honesty of Large Language Models | |
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models | |
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | |
MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents | |
Making Text Embedders Few-Shot Learners | |
Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation | |
Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA | |
EuroLLM: Multilingual Language Models for Europe | |
Small Language Models: Survey, Measurements, and Insights | |
Reward-Robust RLHF in LLMs | |
Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts | |
Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning | |
Block-Attention for Low-Latency RAG | |
Federated Large Language Models: Current Progress and Future Directions | |
Visual Prompting in Multimodal Large Language Models: A Survey | |
Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents | |
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | |
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models | |
Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization | |
DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling | |
Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing | |
Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval | |
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | |
Context-Enhanced LLM-Based Framework for Automatic Test Refactoring | |
MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks | |
RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems | |
A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms | |
Disentangling Questions from Query Generation for Task-Adaptive Retrieval | |
Boosting Healthcare LLMs Through Retrieved Context | |
FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression | |
Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | |
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale | |
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization | |
NoTeeline: Supporting Real-Time Notetaking from Keypoints with Large Language Models | |
A Comprehensive Survey of Bias in LLMs: Current Landscape and Future Directions | |
Bone: Block Affine Transformation as Parameter Efficient Fine-tuning Methods for Large Language Models | |
EgoLM: Multi-Modal Language Model of Egocentric Motions | |
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions | |
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search | |
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores | |
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | |
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | |
Looped Transformers for Length Generalization | |
Automatic Instruction Evolving for Large Language Models | |
Towards More Relevant Product Search Ranking Via Large Language Models: An Empirical Study | |
Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA) | |
Infer Human's Intentions Before Following Natural Language Instructions | |
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends | |
VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and Optimized Search | |
ISO: Overlap of Computation and Communication within Sequence For LLM Inference | |
Here's Charlie! Realising the Semantic Web vision of Agents in the age of LLMs | |
Multi-language Unit Test Generation using LLMs | |
CLUE: Concept-Level Uncertainty Estimation for Large Language Models | |
Hallucination Detection in LLMs: Fast and Memory-Efficient Finetuned Models | |
Alignment-Aware Model Extraction Attacks on Large Language Models | |
Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL | |
Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs | |
Hypothesizing Missing Causal Variables with LLMs | |
Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs | |
Membership Inference Attacks Against In-Context Learning | |
Deploying a Retrieval based Response Model for Task Oriented Dialogues | |
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference | |
Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction | |
FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment | |
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching | |
Large Language Models Can Understand Depth from Monocular Images | |
Addition is All You Need for Energy-efficient Language Models | |
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices | |
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | |
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models | |
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | |
Can Models Learn Skill Composition from Examples? | |
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code | |
Hyper-Connections | |
Visual Question Decomposition on Multimodal Large Language Models | |
DiaSynth -- Synthetic Dialogue Generation Framework | |
On the Implications of Verbose LLM Outputs: A Case Study in Translation Evaluation | |
LML: Language Model Learning a Dataset for Data-Augmented Prediction | |
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models | |
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | |
Emu3: Next-Token Prediction is All You Need | |
Learning the Latent Rules of a Game from Data: A Chess Story | |
Cottention: Linear Transformers With Cosine Attention | |
Do We Need Domain-Specific Embedding Models? An Empirical Investigation | |
Data Analysis in the Era of Generative AI | |
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization | |
VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback | |
SciDFM: A Large Language Model with Mixture-of-Experts for Science | |
Generative Retrieval Meets Multi-Graded Relevance | |
CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | |
An Adversarial Perspective on Machine Unlearning for AI Safety | |
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult | |
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows | |
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making | |
A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders | |
Natural Language Processing Methods for the Study of Protein-Ligand Interactions | |
Solving math word problems with process- and outcome-based feedback | |
Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG | |
Law of the Weakest Link: Cross Capabilities of Large Language Models | |
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | |
Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect | |
LoRA Dropout as a Sparsity Regularizer for Overfitting Control | |
Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs | |
Embodied-RAG: General non-parametric Embodied Memory for Retrieval and Generation | |
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning | |
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation | |
Closed-loop Long-horizon Robotic Planning via Equilibrium Sequence Modeling | |
HelpSteer2-Preference: Complementing Ratings with Preferences | |
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | |
Quantifying Generalization Complexity for Large Language Models | |
Not All LLM Reasoners Are Created Equal | |
LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks | |
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis | |
FactAlign: Long-form Factuality Alignment of Large Language Models | |
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs | |
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding | |
BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation | |
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | |
Contrastive Localized Language-Image Pre-Training | |
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models | |
Large Language Models as Markov Chains | |
Distilling an End-to-End Voice Assistant Without Instruction Training Data | |
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation | |
General Preference Modeling with Preference Representations for Aligning Language Models | |
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? | |
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | |
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | |
FlashMask: Efficient and Rich Mask Extension of FlashAttention | |
Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting | |
Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks | |
KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head | |
Understanding Higher-Order Correlations Among Semantic Components in Embeddings | |
Calibrating Language Models with Adaptive Temperature Scaling | |
On the Inductive Bias of Stacking Towards Improving Reasoning | |
Training Language Models to Win Debates with Self-Play Improves Judge Accuracy | |
Intelligence at the Edge of Chaos | |
Contextual Document Embeddings | |
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning | |
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning | |
SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics | |
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | |
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | |
AutoTrain: No-code training for state-of-the-art models | |
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models | |
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning | |
The Perfect Blend: Redefining RLHF with Mixture of Judges | |
How Much Can RAG Help the Reasoning of LLM? | |
ENTP: Encoder-only Next Token Prediction | |
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models | |
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | |
A General Framework for Producing Interpretable Semantic Text Embeddings | |
Showing LLM-Generated Code Selectively Based on Confidence of LLMs | |
Autoregressive Large Language Models are Computationally Universal | |
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise | |
Intrinsic Evaluation of RAG Systems for Deep-Logic Questions | |
Erasing Conceptual Knowledge from Language Models | |
Selective Attention Improves Transformer | |
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs | |
ARB-LLM: Alternating Refined Binarizations for Large Language Models | |
Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning | |
In-context Learning in Presence of Spurious Correlations | |
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark | |
ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model | |
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs | |
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | |
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | |
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly | |
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training | |
Efficient 1-bit tensor approximations | |
When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 | |
Differential Transformer | |
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | |
DEPT: Decoupled Embeddings for Pre-training Language Models | |
Fast State Restoration in LLM Serving with HCache | |
TLDR: Token-Level Detective Reward Model for Large Vision Language Models | |
Reward-RAG: Enhancing RAG with Reward Driven Supervision | |
Named Clinical Entity Recognition Benchmark | |
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs | |
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | |
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | |
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | |
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | |
Why Do We Need Weight Decay in Modern Deep Learning? | |
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | |
Algorithmic Capabilities of Random Transformers | |
Inference Scaling for Long-Context Retrieval Augmented Generation | |
Preference Optimization as Probabilistic Inference | |
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | |
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References | |
Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization | |
LongGenBench: Long-context Generation Benchmark | |
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? | |
nGPT: Normalized Transformer with Representation Learning on the Hypersphere | |
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks | |
ToolGen: Unified Tool Retrieval and Calling via Generation | |
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions | |
A generative framework to bridge data-driven models and scientific theories in language neuroscience | |
Hyper-multi-step: The Truth Behind Difficult Long-context Tasks | |
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | |
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation | |
Steering Large Language Models between Code Execution and Textual Reasoning | |
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention | |
Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach | |
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search | |
Archon: An Architecture Search Framework for Inference-Time Techniques | |
Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes | |
Data Selection via Optimal Control for Language Models | |
Upcycling Large Language Models into Mixture of Experts | |
Temporal Reasoning Transfer from Text to Video | |
TRACE: Temporal Grounding Video LLM via Causal Event Modeling | |
MM-Ego: Towards Building Egocentric Multimodal LLMs | |
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | |
Can Transformers Reason Logically? A Study in SAT Solving | |
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates | |
Personalized Visual Instruction Tuning | |
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | |
Pixtral 12B | |
Self-Boosting Large Language Models with Synthetic Preference Data | |
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA | |
Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | |
Multimodal Situational Safety | |
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs | |
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning | |
CursorCore: Assist Programming through Aligning Anything | |
TinyEmo: Scaling down Emotional Reasoning via Metric Projection | |
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders | |
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | |
Falcon Mamba: The First Competitive Attention-free 7B Language Model | |
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments | |
Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models | |
Does Spatial Cognition Emerge in Frontier Models? | |
Round and Round We Go! What makes Rotary Positional Encodings useful? | |
Large Language Model Enhanced Text-to-SQL Generation: A Survey | |
Tracking Universal Features Through Fine-Tuning and Model Merging | |
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG | |
Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space | |
SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks | |
Response Tuning: Aligning Large Language Models without Instruction | |
Collective Critics for Creative Story Generation | |
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints | |
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment | |
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System | |
Emergent properties with repeated examples | |
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization | |
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs | |
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents | |
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | |
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning | |
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe | |
Intriguing Properties of Large Language and Vision Models | |
Benchmarking Agentic Workflow Generation | |
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models | |
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition | |
Think Twice: A Human-like Two-stage Conversational Agent for Emotional Response Generation | |
WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents | |
Vector-ICL: In-context Learning with Continuous Vector Representations | |
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | |
LLM Cascade with Multi-Objective Optimal Consideration | |
No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, Even for Vigilant Users | |
The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks | |
LLMs Are In-Context Reinforcement Learners | |
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | |
Accelerated Preference Optimization for Large Language Model Alignment | |
How to Train Long-Context Language Models (Effectively) | |
GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning | |
SimpleStrat: Diversifying Language Model Generation with Stratification | |
Mentor-KD: Making Small Language Models Better Multi-step Reasoners | |
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | |
Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory | |
Baichuan-Omni Technical Report | |
KV Prediction for Improved Time to First Token | |
Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation | |
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? | |
Adam Exploits ℓ∞-geometry of Loss Landscape via Coordinate-wise Adaptivity | |
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment | |
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining | |
Benign Overfitting in Single-Head Attention | |
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models | |
I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy | |
PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness | |
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models | |
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | |
RL, but don't do anything I wouldn't do | |
From Tokens to Words: On the Inner Lexicon of LLMs | |
Neuron-Level Sequential Editing for Large Language Models | |
Mixture of Attentions For Speculative Decoding | |
Integrating Natural Language Prompting Tasks in Introductory Programming Courses | |
Benign or Not-Benign Overfitting in Token Selection of Attention Mechanism | |
Causal Inference with Large Language Model: A Survey | |
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | |
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models | |
SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI | |
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | |
PeerArg: Argumentative Peer Review with LLMs | |
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | |
Thinking LLMs: General Instruction Following with Thought Generation | |
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | |
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents | |
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models | |
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation | |
Rethinking Data Selection at Scale: Random Selection is Almost All You Need | |
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling | |
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | |
Tree of Problems: Improving structured problem solving with compositionality | |
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | |
Think While You Generate: Discrete Diffusion with Planned Denoising | |
Strong Model Collapse | |
Fundamental Limitations on Subquadratic Alternatives to Transformers | |
On The Computational Complexity of Self-Attention | |
Primer: Searching for Efficient Transformers for Language Modeling | |
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models | |
Agent-as-a-Judge: Evaluate Agents with Agents | |
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | |
Empirical Study of Mutual Reinforcement Effect and Application in Few-shot Text Classification Tasks via Prompt | |
LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models | |
What Matters in Transformers? Not All Attention is Needed | |
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | |
A Hitchhiker's Guide to Scaling Law Estimation | |
How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs | |
Survey and Evaluation of Converging Architecture in LLMs based on Footsteps of Operations | |
Agentic Information Retrieval | |
In-Context Learning Enables Robot Action Prediction in LLMs | |
Exploring Model Kinship for Merging Large Language Models | |
Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL | |
BenTo: Benchmark Task Reduction with In-Context Transferability | |
Revealing the Barriers of Language Agents in Planning | |
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs | |
Prompt Compression for Large Language Models: A Survey | |
Model Balancing Helps Low-data Training and Fine-tuning | |
The Moral Case for Using Language Model Agents for Recommendation | |
OMCAT: Omni Context Aware Transformer | |
FLARE: Faithful Logic-Aided Reasoning and Exploration | |
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence | |
Persistent Topological Features in Large Language Models | |
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions | |
Large Language Model Evaluation via Matrix Nuclear-Norm | |
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains | |
Taming Overconfidence in LLMs: Reward Calibration in RLHF | |
Parameter-Efficient Fine-Tuning of State Space Models | |
How Do Multilingual Models Remember? Investigating Multilingual Factual Recall Mechanisms | |
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements | |
DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities | |
LightRAG: Simple and Fast Retrieval-Augmented Generation | |
Large Language Model-Based Evolutionary Optimizer: Reasoning with elitism | |
$\gamma-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models | |
Can MLLMs Understand the Deep Implication Behind Chinese Images? | |
Retrospective Learning from Interactions | |
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models | |
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | |
Harnessing Webpage UIs for Text-Rich Visual Understanding | |
Looking Inward: Language Models Can Learn About Themselves by Introspection | |
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment | |
Improving Multi-modal Large Language Model through Boosting Vision Capabilities | |
Persistent Pre-Training Poisoning of LLMs | |
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model | |
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning | |
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant | |
Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems | |
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | |
Roadmap towards Superhuman Speech Understanding using Large Language Models | |
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation | |
A Little Human Data Goes A Long Way | |
AERO: Softmax-Only LLMs for Efficient Private Inference | |
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging | |
Improving Instruction-Following in Language Models through Activation Steering | |
JudgeBench: A Benchmark for Evaluating LLM-based Judges | |
From Commands to Prompts: LLM-based Semantic File System for AIOS | |
MoH: Multi-Head Attention as Mixture-of-Head Attention | |
When Attention Sink Emerges in Language Models: An Empirical View | |
Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key | |
FlatQuant: Flatness Matters for LLM Quantization | |
MedMobile: A mobile-sized language model with expert-level clinical capabilities | |
Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models | |
Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small | |
SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval | |
SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction | |
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | |
TopoLM: brain-like spatio-functional organization in a topographic language model | |
Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers | |
Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts | |
GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings | |
Teaching Models to Balance Resisting and Accepting Persuasion | |
Do LLMs "know" internally when they follow instructions? | |
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers and Fully-Connected Neural Networks for Causally Constrained Predictions | |
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning | |
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models | |
Goal Inference from Open-Ended Dialog | |
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement | |
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs | |
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation | |
Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media | |
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces | |
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization | |
SymNoise: Advancing Language Model Fine-tuning with Symmetric Noise | |
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance | |
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution | |
Pre-training Distillation for Large Language Models: A Design Space Exploration | |
Improve Vision Language Model Chain-of-thought Reasoning | |
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style | |
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages | |
Baichuan Alignment Technical Report | |
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation | |
Decomposing The Dark Matter of Sparse Autoencoders | |
Sparse Universal Transformer | |
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | |
Diverging Preferences: When do Annotators Disagree and do Models Know? | |
Do LLMs estimate uncertainty well in instruction-following? | |
Large Language Models Are Overparameterized Text Encoders | |
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | |
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs | |
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy | |
Generative Reward Models | |
Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception | |
Content Enhanced BERT-based Text-to-SQL Generation | |
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | |
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | |
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities | |
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning | |
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment | |
Cascade Reward Sampling for Efficient Decoding-Time Alignment | |
Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging | |
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models | |
Data Agnostic RoBERTa-based Natural Language to SQL Query Generation | |
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation | |
Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement | |
Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training | |
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant | |
In-context learning and Occam's razor | |
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers | |
Zero-shot Model-based Reinforcement Learning using Large Language Models | |
SMART: Self-learning Meta-strategy Agent for Reasoning Tasks | |
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs | |
Transformers are Efficient Compilers, Provably | |
LongReward: Improving Long-context Large Language Models with AI Feedback | |
Automatically Interpreting Millions of Features in Large Language Models | |
You can remove GPT2's LayerNorm by fine-tuning | |
An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning | |
MiniPLM: Knowledge Distillation for Pre-Training Language Models | |
Value Residual Learning For Alleviating Attention Concentration In Transformers | |
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging | |
Aligning Large Language Models via Self-Steering Optimization | |
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | |
Beyond Retrieval: Generating Narratives in Conversational Recommender Systems | |
Bridging Search and Recommendation in Generative Retrieval: Does One Task Help the Other? | |
STAR: A Simple Training-free Approach for Recommendations using Large Language Models | |
SouLLMate: An Application Enhancing Diverse Mental Health Support with Adaptive LLMs, Prompt Engineering, and RAG Techniques | |
EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search | |
Improving Pinterest Search Relevance Using Large Language Models | |
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data | |
Pyramid Vector Quantization for LLMs | |
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts | |
LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering | |
Stick-breaking Attention | |
SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains | |
Scaling Diffusion Language Models via Adaptation from Autoregressive Models | |
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation | |
Frontiers in Intelligent Colonoscopy | |
Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy | |
LLM-based Optimization of Compound AI Systems: A Survey | |
Improving Parallel Program Performance Through DSL-Driven Code Generation with LLM Optimizers | |
M-RewardBench: Evaluating Reward Models in Multilingual Settings | |
MedINST: Meta Dataset of Biomedical Instructions | |
ALTA: Compiler-Based Analysis of Transformers | |
SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback | |
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | |
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | |
Should We Really Edit Language Models? On the Evaluation of Edited Language Models | |
Why Does the Effective Context Length of LLMs Fall Short? | |
RRADistill: Distilling LLMs' Passage Ranking Ability for Document Re-Ranking of Long-Tail Queries in a Search Engine | |
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch | |
LOGO -- Long cOntext aliGnment via efficient preference Optimization | |
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models | |
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs | |
Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits | |
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | |
Language Models are Symbolic Learners in Arithmetic | |
Balancing Label Quantity and Quality for Scalable Elicitation | |
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm | |
AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline | |
SpinQuant: LLM quantization with learned rotations | |
WAFFLE: Multi-Modal Model for Automated Front-End Development | |
AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning | |
FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs | |
Taipan: Efficient and Expressive State Space Language Models with Selective Attention | |
Can Knowledge Editing Really Correct Hallucinations? | |
When "A Helpful Assistant" Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models | |
Distill Visual Chart Reasoning Ability from LLMs to MLLMs | |
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs | |
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning | |
Provably Robust Watermarks for Open-Source Language Models | |
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations | |
Rethinking Softmax: Self-Attention with Polynomial Activations | |
SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning | |
Understanding Players as if They Are Talking to the Game in a Customized Language: A Pilot Study | |
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI | |
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models | |
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment | |
Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction | |
Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | |
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering | |
DreamLIP: Language-Image Pre-training with Long Captions | |
Inductive Biases and Variable Creation in Self-Attention Mechanisms | |
An LLM Agent for Automatic Geospatial Data Analysis | |
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning | |
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models | |
Long Term Memory: The Foundation of AI Self-Evolution | |
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization | |
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs | |
LeanAgent: Lifelong Learning for Formal Theorem Proving | |
Little Giants: Synthesizing High-Quality Embedding Data at Scale | |
Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers | |
A Survey of Conversational Search | |
Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction | |
How LLMs Aid in UML Modeling: An Exploratory Study with Novice Analysts | |
Teach Multimodal LLMs to Comprehend Electrocardiographic Images | |
Knowledge Graph Enhanced Language Agents for Recommendation | |
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | |
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training | |
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning | |
VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs | |
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark | |
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback | |
Counting Ability of Large Language Models and Impact of Tokenization | |
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | |
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance | |
Reflection-Bench: probing AI intelligence with reflection | |
PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles | |
Analysing the Residual Stream of Language Models Under Knowledge Conflicts | |
CoqPilot, a plugin for LLM-based generation of proofs | |
Measuring memorization through probabilistic discoverable extraction | |
Computational Bottlenecks of Training Small-scale Large Language Models | |
Mixture of Parrots: Experts improve memorization more than reasoning | |
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation | |
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | |
A Survey of Small Language Models | |
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation | |
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction | |
LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation | |
KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation | |
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models | |
Plan$\times$RAG: Planning-guided Retrieval Augmented Generation | |
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment | |
Language Models And A Second Opinion Use Case: The Pocket Professional | |
Fast Best-of-N Decoding via Speculative Rejection | |
UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers | |
RARe: Retrieval Augmented Retrieval with In-Context Examples | |
Towards Next-Generation LLM-based Recommender Systems: A Survey and Beyond | |
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation | |
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction | |
Large Language Models Reflect the Ideology of their Creators | |
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration | |
A Survey on Data Synthesis and Augmentation for Large Language Models | |
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | |
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization | |
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation | |
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training | |
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA | |
Understanding Synthetic Context Extension via Retrieval Heads | |
Matryoshka: Learning to Drive Black-Box LLMs with LLMs | |
The Geometry of Concepts: Sparse Autoencoder Feature Structure | |
Attacking Vision-Language Computer Agents via Pop-ups | |
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | |
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse | |
CLEAR: Character Unlearning in Textual and Visual Modalities | |
Aligning Audio-Visual Joint Representations with an Agentic Workflow | |
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval | |
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters | |
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | |
Distinguishing Ignorance from Error in LLM Hallucinations | |
Learning and Unlearning of Fabricated Knowledge in Language Models | |
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models | |
On the Role of Depth and Looping for In-Context Learning with Task Diversity | |
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet' | |
Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges | |
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback | |
Accelerating Direct Preference Optimization with Prefix Sharing | |
AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels | |
QTIP: Quantization with Trellises and Incoherence Processing | |
EMMA: End-to-End Multimodal Model for Autonomous Driving | |
SciPIP: An LLM-based Scientific Paper Idea Proposer | |
Zipfian Whitening | |
On Memorization of Large Language Models in Logical Reasoning | |
Stealing User Prompts from Mixture of Experts | |
Toxicity of the Commons: Curating Open-Source Pre-Training Data | |
RuleRAG: Rule-guided retrieval-augmented generation with language models for question answering | |
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function | |
SelfCodeAlign: Self-Alignment for Code Generation | |
Constraint Back-translation Improves Complex Instruction Following of Large Language Models | |
Nearest Neighbor Normalization Improves Multimodal Retrieval | |
Language Models can Self-Lengthen to Generate Long Texts | |
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models | |
Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | |
Weight decay induces low-rank attention layers | |
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | |
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists | |
Toward Understanding In-context vs. In-weight Learning | |
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks | |
BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments | |
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages | |
AAAR-1.0: Assessing AI's Potential to Assist Research | |
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective | |
Failure Modes of LLMs for Causal Reasoning on Narratives | |
Are Decoder-Only Large Language Models the Silver Bullet for Code Search? | |
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks | |
Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses | |
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources | |
Physics in Next-token Prediction | |
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization | |
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | |
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity | |
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | |
GPT or BERT: why not both? | |
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | |
SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF | |
Thinking Forward and Backward: Effective Backward Planning with Large Language Models | |
Context Parallelism for Scalable Million-Token Inference | |
RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | |
DynaSaur: Large Language Agents Beyond Predefined Actions | |
Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks | |
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | |
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | |
Survey of Cultural Awareness in Language Models: Text and Beyond | |
LLM-KT: A Versatile Framework for Knowledge Transfer from Large Language Models to Collaborative Filtering | |
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models | |
E2E-AFG: An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation | |
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation | |
GRS-QA -- Graph Reasoning-Structured Question Answering Dataset | |
BitNet a4.8: 4-bit Activations for 1-bit LLMs | |
Beyond Utility: Evaluating LLM as Recommender | |
Rationale-Guided Retrieval Augmented Generation for Medical Question Answering | |
Personalization of Large Language Models: A Survey | |
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | |
How Does Critical Batch Size Scale in Pre-training? | |
Scaling Optimal LR Across Token Horizons | |
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding | |
Not All Memories are Created Equal: Learning to Forget by Expiring | |
Inference Optimal VLMs Need Only One Visual Token but Larger Models | |
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems | |
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | |
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge | |
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs | |
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | |
Sample-Efficient Alignment for LLMs | |
LLaMo: Large Language Model-based Molecular Graph Assistant | |
Controlling Language and Diffusion Models by Transporting Activations | |
A Scalable Communication Protocol for Networks of Large Language Models | |
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | |
Lightning IR: Straightforward Fine-tuning and Inference of Transformer-based Language Models for Information Retrieval | |
Wave Network: An Ultra-Small Language Model | |
Model Equality Testing: Which Model Is This API Serving? | |
A linguistic analysis of undesirable outcomes in the era of generative AI | |
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level | |
Long Context RAG Performance of Large Language Models | |
LASER: Attention with Exponential Transformation | |
Photon: Federated LLM Pre-Training | |
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis | |
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models | |
Evaluation data contamination in LLMs: how do we measure it and (when) does it matter? | |
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba | |
MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue | |
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models | |
Can LLMs make trade-offs involving stipulated pain and pleasure states? | |
Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers | |
Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically | |
Teaching Models to Improve on Tape | |
Evolving Alignment via Asymmetric Self-Play | |
Scaling LLM Inference with Optimized Sample Compute Allocation | |
Self-Consistency Preference Optimization | |
Tiny Transformers Excel at Sentence Compression | |
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models | |
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination | |
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond | |
A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness | |
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks | |
LoRA vs Full Fine-tuning: An Illusion of Equivalence | |
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | |
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models | |
RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval | |
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model | |
LSHBloom: Memory-efficient, Extreme-scale Document Deduplication | |
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? | |
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | |
Towards Reliable Alignment: Uncertainty-aware RLHF | |
Abrupt Learning in Transformers: A Case Study on Matrix Completion | |
MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression | |
O1 Replication Journey: A Strategic Progress Report -- Part 1 | |
KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing | |
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition | |
Methods of improving LLM training stability | |
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs | |
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts | |
Generalized Probabilistic Attention Mechanism in Transformers | |
Economic Anthropology in the Era of Generative Artificial Intelligence | |
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation | |
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | |
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference | |
MoDification: Mixture of Depths Made Easy | |
Speciesism in Natural Language Processing Research | |
Reducing the Transformer Architecture to a Minimum | |
MoR: Mixture of Ranks for Low-Rank Adaptation Tuning | |
Metacognitive Monitoring: A Human Ability Beyond Generative Artificial Intelligence | |
Hypothesis Testing the Circuit Hypothesis in LLMs | |
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression | |
Theoretical Analysis of Hierarchical Language Recognition and Generation by Transformers without Positional Encoding | |
Conformity in Large Language Models | |
Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing | |
A Case for AI Consciousness: Language Agents and Global Workspace Theory | |
Local and Global Decoding in Text Generation | |
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators | |
Geometric Signatures of Compositionality Across a Language Model's Lifetime | |
Is Parameter Collision Hindering Continual Learning in LLMs? | |
Reverse Modeling in Large Language Models | |
On the Proper Treatment of Tokenization in Psycholinguistics | |
Post-edits Are Preferences Too | |
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization | |
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity | |
EmbedLLM: Learning Compact Representations of Large Language Models | |
Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 | |
Mitigating Memorization In Language Models | |
House of Cards: Massive Weights in LLMs | |
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models | |
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models | |
Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training | |
RisingBALLER: A player is a token, a match is a sentence, A path towards a foundational model for football players data analytics | |
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards | |
Self-Updatable Large Language Models with Parameter Integration | |
Are LLMs Aware that Some Questions are not Open-ended? | |
Vision Language Models See What You Want but not What You See | |
A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions | |
1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models | |
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book? | |
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | |
Analyzing The Language of Visual Tokens | |
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities | |
Model merging with SVD to tie the Knots | |
Best Practices for Distilling Large Language Models into BERT for Web Search Ranking | |
Interpretable Language Modeling via Induction-head Ngram Models | |
Unlearning in- vs. out-of-distribution data in LLMs under gradient-based method | |
GUI Agents with Foundation Models: A Comprehensive Survey | |
Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks | |
DELIFT: Data Efficient Language model Instruction Fine Tuning | |
Aioli: A Unified Optimization Framework for Language Model Data Mixing | |
LBPE: Long-token-first Tokenization to Improve Large Language Models | |
Balancing Pipeline Parallelism with Vocabulary Parallelism | |
Fox-1 Technical Report | |
STAND-Guard: A Small Task-Adaptive Content Moderation Model | |
Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs | |
CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement | |
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | |
FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs? | |
Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale | |
Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning | |
LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions | |
Scattered Forest Search: Smarter Code Space Exploration with LLMs | |
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study | |
RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models | |
An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking | |
ZipNN: Lossless Compression for AI Models | |
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation | |
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI | |
Number Cookbook: Number Understanding of Language Models and How to Improve It | |
Mixtures of In-Context Learners | |
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | |
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | |
Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models | |
AFlow: Automating Agentic Workflow Generation | |
Recycled Attention: Efficient inference for long-context language models | |
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | |
Counterfactual Generation from Language Models | |
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models | |
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models | |
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization | |
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM | |
Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge | |
Game-theoretic LLM: Agent Workflow for Negotiation Games | |
Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction | |
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | |
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models | |
More Expressive Attention with Negative Weights | |
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | |
LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models | |
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | |
Learning Code Preference via Synthetic Evolution | |
Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation | |
Scaling Laws for Precision | |
Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders | |
RedCode: Risky Code Execution and Generation Benchmark for Code Agents | |
Likelihood as a Performance Gauge for Retrieval-Augmented Generation | |
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows | |
Entropy Controllable Direct Preference Optimization | |
SecEncoder: Logs are All You Need in Security | |
Rapid Response: Mitigating LLM Jailbreaks with a Few Examples | |
Toward Optimal Search and Retrieval for RAG | |
The Super Weight in Large Language Models | |
Multi-Modal Forecaster: Jointly Predicting Time Series and Textual Data | |
What Should Baby Models Read? Exploring Sample-Efficient Data Composition on Model Performance | |
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems | |
Towards Low-bit Communication for Tensor Parallel LLM Inference | |
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? | |
SetLexSem Challenge: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models | |
Stronger Models are NOT Stronger Teachers for Instruction Tuning | |
Hardware and Software Platform Inference | |
Direct Preference Optimization Using Sparse Feature-Level Constraints | |
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection | |
Can sparse autoencoders be used to decompose and interpret steering vectors? | |
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models | |
Large Language Models Can Self-Improve in Long-context Reasoning | |
Language Models as Causal Effect Generators | |
Model Stealing for Any Low-Rank Language Model | |
Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs | |
XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL | |
Controllable Context Sensitivity and the Knob Behind It | |
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models | |
Pie: Pooling CPU Memory for LLM Inference | |
Cut Your Losses in Large-Vocabulary Language Models | |
LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs | |
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models | |
A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look | |
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? | |
Squeezed Attention: Accelerating Long Context Length LLM Inference | |
Hermes: A Large Language Model Framework on the Journey to Autonomous Networks | |