#!/usr/bin/env python3
"""
dataloader_benchmark.py - A script to benchmark Ray Data performance on image datasets.
"""
import argparse
import os
import time

import ray
import torch
import torch.utils.data
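
Only the imports of the Ray Data variant survive in this gist, so the sketch below is an illustrative reconstruction rather than the original benchmark body. It assumes a local image directory, uniformly sized images, and only public Ray Data calls (ray.data.read_images, iter_torch_batches); the path, batch size, and output format are placeholders.

def benchmark_ray_data(data_dir: str, batch_size: int = 64) -> None:
    """Time one full pass over an image directory using Ray Data."""
    ray.init(ignore_reinit_error=True)
    # read_images builds a distributed dataset; add a resize map step first
    # if the images are not already a uniform size.
    ds = ray.data.read_images(data_dir)
    start = time.perf_counter()
    num_images = 0
    for batch in ds.iter_torch_batches(batch_size=batch_size):
        num_images += batch["image"].shape[0]
    elapsed = time.perf_counter() - start
    print(f"{num_images} images in {elapsed:.2f}s "
          f"({num_images / elapsed:.1f} img/s)")
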
#!/usr/bin/env python3
"""
dataloader_benchmark.py - A script to benchmark PyTorch DataLoader performance on image datasets.
"""
import argparse
import os
import time
import torch
import torch.utils.data
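
The PyTorch DataLoader variant is likewise truncated to its imports. A minimal equivalent loop might look like the following; torchvision's ImageFolder and transforms are an assumption (they are not imported above), and the path, image size, batch size, and worker count are placeholders.

from torchvision import datasets, transforms

def benchmark_dataloader(data_dir: str, batch_size: int = 64,
                         num_workers: int = 8) -> None:
    """Time one full pass over an ImageFolder dataset via DataLoader."""
    transform = transforms.Compose(
        [transforms.Resize((224, 224)), transforms.ToTensor()])
    dataset = datasets.ImageFolder(data_dir, transform=transform)
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, num_workers=num_workers, shuffle=False)
    start = time.perf_counter()
    num_images = 0
    for images, _labels in loader:
        num_images += images.shape[0]
    elapsed = time.perf_counter() - start
    print(f"{num_images} images in {elapsed:.2f}s "
          f"({num_images / elapsed:.1f} img/s)")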

Ray Data LLM

High-performance batch inference for large language models, powered by Ray Data.

Overview

Ray Data LLM provides an efficient, scalable solution for batch processing LLM inference workloads with:

  • High Throughput: Optimized performance using vLLM's paged attention and continuous batching
  • Distributed Processing: Scale across multiple GPUs and machines using Ray Data
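
As a concrete illustration of the two points above, the snippet below uses the ray.data.llm module that ships with Ray 2.44+ (vLLMEngineProcessorConfig and build_llm_processor). The model, sampling parameters, and batch size are placeholders, and field names can differ slightly between Ray releases (older versions spell model_source as model).

import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    engine_kwargs={"max_model_len": 8192},
    concurrency=1,    # number of vLLM engine replicas
    batch_size=64,    # rows per inference batch
)
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What does paged attention do?"}])
ds = processor(ds)   # runs distributed batch inference
ds.show()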

RayLLM-Batch

Engine Configurations

Here is the full set of supported engine configuration fields:

model_id: <HF model ID or local model path>
llm_engine: vllm
accelerator_type: <GPU type>
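
For example, a filled-in configuration might look like this (the model and GPU type are placeholders, not recommendations):

model_id: meta-llama/Llama-3.1-8B-Instruct
llm_engine: vllm
accelerator_type: L4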

vLLM Batch

A high-performance batch inference library for Large Language Models (LLMs) powered by vLLM and Ray.

Overview

vLLM-Batch enables efficient, large-scale batch processing of LLM inference workloads with:

  • High Throughput: Optimized performance using vLLM's paged attention and continuous batching
  • Distributed Processing: Scale across multiple GPUs and machines using Ray Data

vLLM Batch

High-throughput batch inference for LLMs using vLLM.

Quick Start

from vllm_batch import BatchProcessor, LLMConfig, PromptConfig

# Basic usage
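# NOTE: the gist truncates here, so everything below is an illustrative guess;
# the field names and the BatchProcessor/PromptConfig call pattern are
# assumptions, not the library's documented API.
llm_config = LLMConfig(model_id="meta-llama/Llama-3.1-8B-Instruct")   # assumed field name
prompt_config = PromptConfig(prompts=["Summarize paged attention."])  # assumed field name
processor = BatchProcessor(llm_config=llm_config)                     # assumed constructor
for result in processor.run(prompt_config):                           # assumed method
    print(result)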

Ray Serve LLM Primitives

Deploy LLMs with full OpenAI API compatibility using Ray Serve primitives.

Features

  • 🚀 Automatic scaling and load balancing
  • 🔄 Tensor parallelism for large models
  • 🎯 JSON mode and function calling
  • 📦 Multi-LoRA adapter support
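
The feature list above maps closely onto the ray.serve.llm module in current Ray releases (2.43+). The sketch below shows what such a deployment looks like with that upstream API; the served model name, weights source, accelerator type, and replica counts are placeholders, and the primitives in this gist may differ from the upstream module.

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="llama-3.1-8b",                          # name clients request
        model_source="meta-llama/Llama-3.1-8B-Instruct",  # weights to load
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=4),
    ),
    accelerator_type="A10G",
)

# Expose an OpenAI-compatible HTTP app (/v1/chat/completions, /v1/models, ...).
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)

Any OpenAI SDK client can then talk to the deployment by pointing base_url at the Serve HTTP address.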