A high-performance batch inference library for Large Language Models (LLMs) powered by vLLM and Ray.
vLLM-Batch enables efficient, large-scale batch processing of LLM inference workloads with:
- High Throughput: Optimized performance using vLLM's PagedAttention and continuous batching
- Distributed Processing: Scale across multiple GPUs and machines using Ray Data (sketched below)
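
The following is a minimal sketch of the underlying pattern, not vLLM-Batch's actual API: Ray Data's `map_batches` with a stateful actor class that hosts a vLLM engine, so each replica loads the model once and reuses it across batches. The model name, batch size, and output path are illustrative assumptions.

```python
import ray
from vllm import LLM, SamplingParams


class VLLMPredictor:
    """Stateful Ray actor: loads the model once, reuses it across batches."""

    def __init__(self):
        # Assumed model; any vLLM-supported checkpoint works here.
        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
        self.sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

    def __call__(self, batch: dict) -> dict:
        # vLLM applies continuous batching internally over this list of prompts.
        outputs = self.llm.generate(list(batch["prompt"]), self.sampling_params)
        batch["generated_text"] = [o.outputs[0].text for o in outputs]
        return batch


ray.init()
ds = ray.data.from_items([{"prompt": f"Summarize item {i}."} for i in range(1000)])
results = ds.map_batches(
    VLLMPredictor,
    batch_size=64,   # prompts handed to vLLM per call
    concurrency=2,   # number of replica actors; scale with available GPUs
    num_gpus=1,      # one GPU reserved per replica
)
results.write_parquet("/tmp/vllm_batch_outputs")
```

Because each actor holds its own engine, throughput scales roughly linearly with `concurrency` until the cluster runs out of GPUs, while Ray Data streams batches between replicas.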