import argparse
from datetime import datetime, timedelta
from typing import Dict
import numpy as np
import pandas as pd
# from benchmark import Benchmark
import ray

#!/usr/bin/env python3
"""
dataloader_benchmark.py - A script to benchmark Ray Data performance on image datasets.
"""
import ray
import argparse
import os
import time
import torch
import torch.utils.data
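
The header above shows only the imports of the Ray Data variant of the benchmark. As a rough illustration of what such a Ray Data image benchmark loop can look like (the data path, batch size, and timing logic below are assumptions for illustration, not the gist's actual implementation):

#!/usr/bin/env python3
"""Minimal sketch: time how fast Ray Data can stream image batches to PyTorch."""
import time
import ray

def benchmark_ray_data(data_dir: str, batch_size: int = 128, num_epochs: int = 1) -> None:
    ds = ray.data.read_images(data_dir)  # lazily read an image directory
    for epoch in range(num_epochs):
        num_images = 0
        start = time.perf_counter()
        # iter_torch_batches yields dicts of torch tensors keyed by column name
        for batch in ds.iter_torch_batches(batch_size=batch_size):
            num_images += len(batch["image"])
        elapsed = time.perf_counter() - start
        print(f"epoch {epoch}: {num_images / elapsed:.1f} images/s")

if __name__ == "__main__":
    ray.init()
    benchmark_ray_data("/path/to/images")  # hypothetical path
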
#!/usr/bin/env python3
"""
dataloader_benchmark.py - A script to benchmark PyTorch DataLoader performance on image datasets.
"""
import argparse
import os
import time
import torch
import torch.utils.data
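
For comparison, a minimal PyTorch DataLoader throughput loop might look like the following. It uses a synthetic TensorDataset so it runs without an image folder; the dataset shape, batch size, and worker count are illustrative assumptions rather than the script's actual contents:

#!/usr/bin/env python3
"""Minimal sketch: measure PyTorch DataLoader throughput in images per second."""
import time
import torch
import torch.utils.data

def benchmark_dataloader(dataset, batch_size: int = 128, num_workers: int = 4) -> None:
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, num_workers=num_workers, shuffle=False
    )
    num_images = 0
    start = time.perf_counter()
    for images, labels in loader:
        num_images += images.size(0)
    elapsed = time.perf_counter() - start
    print(f"{num_images / elapsed:.1f} images/s")

if __name__ == "__main__":
    # Synthetic stand-in for an image dataset: 10,000 fake 3x224x224 "images".
    fake = torch.utils.data.TensorDataset(
        torch.randn(10_000, 3, 224, 224), torch.zeros(10_000, dtype=torch.long)
    )
    benchmark_dataloader(fake)
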

Ray Data LLM

High-performance batch inference for large language models, powered by Ray Data.

Overview

Ray Data LLM provides an efficient, scalable solution for batch processing LLM inference workloads with:

  • High Throughput: Optimized performance using vLLM's paged attention and continuous batching
  • Distributed Processing: Scale across multiple GPUs and machines using Ray Data
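
As a rough usage sketch, based on the ray.data.llm module as documented in recent Ray releases (the model name, sampling parameters, and exact argument names such as model_source are illustrative assumptions and may differ between Ray versions):

import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

# Configure a vLLM engine for batch inference (example values, not recommendations).
config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",
    engine_kwargs={"max_model_len": 8192},
    concurrency=1,   # number of vLLM engine replicas
    batch_size=64,   # rows per batch sent to the engine
)

# Build a processor that turns a prompt column into chat messages and collects outputs.
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"], **row),
)

ds = ray.data.from_items([{"prompt": "Explain paged attention in one sentence."}])
ds = processor(ds)
ds.show(limit=1)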

RayLLM-Batch

Engine Configurations

Here is the full set of supported engine configuration fields:

model_id: <HF model ID or local model path>
llm_engine: vllm
accelerator_type: <GPU type>
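
For illustration, a filled-in configuration might look like the following; the model ID and accelerator type are example values only, not recommendations:

model_id: meta-llama/Llama-3.1-8B-Instruct
llm_engine: vllm
accelerator_type: L4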

vLLM Batch

A high-performance batch inference library for Large Language Models (LLMs) powered by vLLM and Ray.

Overview

vLLM-Batch enables efficient, large-scale batch processing of LLM inference workloads with:

  • High Throughput: Optimized performance using vLLM's paged attention and continuous batching
  • Distributed Processing: Scale across multiple GPUs and machines using Ray Data

vLLM Batch

High-throughput batch inference for LLMs using vLLM.

Quick Start

from vllm_batch import BatchProcessor, LLMConfig, PromptConfig

# Basic usage
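
The quick-start example is truncated after the import line. The snippet below is only a guess at how the imported pieces might fit together: BatchProcessor, LLMConfig, and PromptConfig come from the import above, but every constructor argument and method shown here is an assumption, not vllm_batch's documented API.

from vllm_batch import BatchProcessor, LLMConfig, PromptConfig

# Hypothetical wiring -- argument names below are assumptions, not the real API.
llm_config = LLMConfig(model="meta-llama/Llama-3.1-8B-Instruct")                 # assumed signature
prompt_config = PromptConfig(system_prompt="You are a helpful assistant.")       # assumed signature

processor = BatchProcessor(llm_config=llm_config, prompt_config=prompt_config)   # assumed signature
results = processor.run(["What is Ray Data?", "Summarize paged attention."])     # assumed method
for result in results:
    print(result)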
