This tutorial captures end-to-end reference flows for running AIPerf against vLLM-hosted models. Each chapter covers a specific OpenAI-compatible endpoint: how to launch the vLLM server, run the AIPerf benchmark, and interpret
This guide walks you through using TensorRT-Cloud to run performance sweeps with the TRT LLM PyTorch backend.
---> THIS GUIDE IS PSEUDOCODE AND JUST A PRODUCT MANAGER'S SUGGESTION. WE HAVE NOT YET BUILT THIS FEATURE. <---
Unlike the C++ backend, which uses ahead-of-time (AoT) compilation, TRT LLM PyTorch uses just-in-time (JIT) compilation. Because there is no separate engine-build step, the performance sweep configures runtime parameters rather than build parameters.
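To make the idea of sweeping runtime parameters concrete, here is a minimal sketch of how a sweep grid could be expanded into individual configurations. The parameter names and values are illustrative placeholders and `expand_sweep` is a hypothetical helper; none of this is a confirmed TensorRT-Cloud or TRT LLM interface.

```python
from itertools import product

# Hypothetical runtime parameters for illustration only; the names and
# values are assumptions, not a confirmed TensorRT-Cloud sweep schema.
SWEEP_SPACE = {
    "max_batch_size": [8, 16, 32],
    "max_num_tokens": [2048, 4096],
    "kv_cache_free_gpu_mem_fraction": [0.8, 0.9],
}


def expand_sweep(space):
    """Return one dict per combination of runtime parameter values."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


configs = expand_sweep(SWEEP_SPACE)
print(f"{len(configs)} runtime configurations to benchmark")
for config in configs[:3]:
    print(config)
```

Since nothing is compiled ahead of time, each configuration in such a grid would only change how the server is launched, which is why a sweep of this kind can be expressed as plain data.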
import speech_recognition as sr
from text_to_speech import text_to_speech  # a different module, but you can guess what it does
import pygame

pygame.mixer.init()

def capture_speech():
    r = sr.Recognizer()
    with sr.Microphone() as source:  # use the default microphone as the audio source
        pygame.mixer.Sound("open_mic.wav").play()  # play a prompt sound so that the user knows that they can speak
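import speech_recognition as sr
import time
import os
import sys

# Silence stdout while importing modules that print a banner on import
sys.stdout = open(os.devnull, 'w')  # os.devnull is portable, unlike a hard-coded '/dev/null'
from text_to_speech import text_to_speech
import pygame
sys.stdout = sys.__stdout__  # restore normal output

pygame.mixer.init()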
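The snippet above silences import-time output by swapping `sys.stdout` by hand. As a minimal sketch of the same idea, the standard library's `contextlib.redirect_stdout` restores the previous stream automatically, even if an exception is raised. `quiet_stdout` and `noisy_setup` are hypothetical names used only for illustration, and `noisy_setup` stands in for a module that prints a banner. Like the original approach, this only affects writes that go through Python's `sys.stdout`.

```python
import contextlib
import os


@contextlib.contextmanager
def quiet_stdout():
    """Temporarily send Python-level stdout writes to the null device."""
    with open(os.devnull, "w") as devnull:
        with contextlib.redirect_stdout(devnull):
            yield


def noisy_setup():
    # Stand-in for an import or call that prints a banner (hypothetical)
    print("banner text that should not reach the terminal")
    return "ready"


with quiet_stdout():
    status = noisy_setup()

print(status)  # stdout is back to normal here
```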
import threading
import time

condition = threading.Condition()

def lock_door():
    global condition
    while True:
        print("Lock Thread: Door lock armed")
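The snippet above is cut off before the condition variable is used, so the following is only a sketch of how a `threading.Condition` is typically paired with shared state, not a reconstruction of the original program. The `door_closed` flag, the `events` list, and the `close_door` function are assumptions introduced for illustration.

```python
import threading

condition = threading.Condition()
door_closed = False  # shared state, only touched while holding the condition's lock
events = []          # records the order in which things happened


def lock_door():
    with condition:
        # wait_for re-checks the predicate after every wake-up, so the thread
        # proceeds only once the door has actually been closed
        condition.wait_for(lambda: door_closed)
        events.append("locked")
        print("Lock Thread: Door locked")


def close_door():
    global door_closed
    with condition:
        door_closed = True
        events.append("closed")
        print("Main Thread: Door closed")
        condition.notify_all()  # wake any thread waiting on the condition


lock_thread = threading.Thread(target=lock_door)
lock_thread.start()
close_door()
lock_thread.join(timeout=5)
```

Checking a predicate rather than relying on the notification alone means the waiting thread behaves correctly whether it starts waiting before or after `close_door` runs.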