All things LLM Inference
- WebSocket Response API support in the sglang router, following OpenAI's wss Response API; 15-55% latency improvement in multi-turn chat, tool-use, and agentic workloads
- Virtual Token Counter router in the vllm-project/aibrix gateway plugin for fairness guarantees in inference routing; 4.5% latency improvement from fairness-aware routing
- Lock-free arena allocator: up to 30% improvement in TTFT and up to 50% improvement in goodput on mid-sized models, verified on sglang with and without HiCache deployments
- SPSC & MPSC counter benchmarks for cross-platform and hardware-level measurement
- Developed a high-performance, low-latency load balancer on top of Netty-based non-blocking I/O and a Disruptor (ring-buffer) architecture that outperformed Nginx
- Debugged performance issues using JFR (Java Flight Recorder)
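The lock-free arena item above can be illustrated with a minimal bump-pointer sketch: threads reserve contiguous ranges with a single CAS, so allocation never takes a lock. The class and method names here are illustrative only and are not taken from the sglang codebase.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Minimal lock-free bump-pointer arena. Each allocation reserves a
 * contiguous byte range by CAS-advancing a shared offset, so the hot
 * path is a single atomic instruction with no locking.
 * Illustrative sketch only; not the actual sglang allocator.
 */
final class LockFreeArena {
    private final byte[] buffer;
    private final AtomicLong offset = new AtomicLong(0);

    LockFreeArena(int capacity) {
        this.buffer = new byte[capacity];
    }

    /** Returns the start offset of a reserved block, or -1 if the arena is full. */
    long allocate(int size) {
        while (true) {
            long current = offset.get();
            long next = current + size;
            if (next > buffer.length) {
                return -1; // out of space; caller falls back or resets the arena
            }
            if (offset.compareAndSet(current, next)) {
                return current; // exclusive ownership of [current, next)
            }
        }
    }

    /** Resetting frees the whole arena at once (epoch-style reuse). */
    void reset() {
        offset.set(0);
    }
}
```

Freeing everything in one `reset()` is what makes the arena cheap: there is no per-allocation bookkeeping to contend on, which is why this pattern helps TTFT under concurrent request bursts.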
- Google Summer of Code
- Code Repo
- Blog Posts
- MQTT and AMQP protocols as transports; simple edge-filtering capabilities
- Performance benchmarks for LIOTA; unit test suites with 95%+ code coverage
- Support for CoAP and XMPP protocols
- Edge Intelligence – OTA updates of LIOTA applications and ML models.
- Code Repo
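The edge-filtering capability above can be sketched as a deadband filter: a gateway only forwards a sensor reading when it has moved more than a threshold since the last forwarded value, cutting uplink traffic at the edge. This is a hypothetical illustration in Java; it is not LIOTA's actual filter API.

```java
/**
 * Deadband edge filter: suppresses readings that change by less than
 * a threshold since the last forwarded value, so the device only
 * transmits meaningful deltas upstream.
 * Hypothetical sketch; not taken from the LIOTA codebase.
 */
final class DeadbandFilter {
    private final double threshold;
    private Double lastSent = null; // null until the first reading is forwarded

    DeadbandFilter(double threshold) {
        this.threshold = threshold;
    }

    /** Returns true if the reading should be forwarded upstream. */
    boolean accept(double reading) {
        if (lastSent == null || Math.abs(reading - lastSent) >= threshold) {
            lastSent = reading; // remember what the upstream side last saw
            return true;
        }
        return false; // within the deadband: drop at the edge
    }
}
```

With a 0.5-degree threshold, a temperature stream like 20.0, 20.2, 20.6 forwards only 20.0 and 20.6; the 20.2 reading is filtered at the edge.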