High-performance batch inference for large language models, powered by Ray Data.
Ray Data LLM provides an efficient, scalable solution for batch LLM inference workloads, offering:
- High Throughput: Optimized performance using vLLM's paged attention and continuous batching
- Distributed Processing: Scale across multiple GPUs and machines using Ray Data
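
As a quick illustration of both points, here is a minimal sketch that builds a vLLM-backed processor and runs it over a small in-memory dataset. It assumes a recent Ray release (2.44+) where `ray.data.llm` ships `vLLMEngineProcessorConfig` and `build_llm_processor`, with vLLM installed alongside (`pip install -U "ray[data]" vllm`); the model name, batch size, and sampling parameters are placeholders to adapt to your workload.

```python
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

# Configure the vLLM engine; `concurrency` controls how many engine
# replicas Ray Data spreads the batches across (typically one per GPU).
config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    engine_kwargs={
        "max_model_len": 8192,
        "enable_chunked_prefill": True,
    },
    concurrency=1,  # number of vLLM engine replicas
    batch_size=64,  # rows per batch sent to each replica
)

# Build a processor: `preprocess` maps each input row to a chat request,
# `postprocess` pulls the generated text back out of the result row.
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": row["prompt"]},
        ],
        sampling_params=dict(temperature=0.3, max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is Ray Data?"}])
ds = processor(ds)  # lazy; executes when the dataset is consumed
ds.show(limit=1)
```

Because the processor is just a Ray Data transformation, the same pipeline scales from a single GPU to a multi-node cluster by raising `concurrency`, with continuous batching handled inside each vLLM replica.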