This report compares NVIDIA Dynamo with other frameworks, including vLLM and NVIDIA Triton Inference Server, for large language model (LLM) inference workloads. The key findings show Dynamo delivering higher throughput and lower latency than vLLM in multi-GPU setups. Dynamo's disaggregated serving approach also allows more flexible performance tuning, especially for large models and variable workloads. Compared with Triton Inference Server, Dynamo is optimized for low-latency generative AI/LLM workloads, while Triton excels at multi-model inference serving. The report also covers Dynamo's technical architecture, which accelerates inference through components such as disaggregated serving, smart routing, and distributed KV cache management. Benchmarking methodologies and key performance metrics are also discussed, emphasizing the importance of standardized evaluation for
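Throughput and latency comparisons like these are usually derived from a few timestamps recorded per request. The sketch below is a generic illustration, not the report's methodology. It assumes each streamed request records its send time, first-token time, last-token time, and output token count (the `RequestTiming` and `summarize` names are hypothetical), and it computes aggregate output throughput, time to first token (TTFT) at p50/p99 using nearest-rank percentiles, and mean inter-token latency (ITL).

```python
import math
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestTiming:
    """Wall-clock timestamps (seconds) for one streamed LLM request (hypothetical record)."""
    start: float        # request sent
    first_token: float  # first output token received
    end: float          # last output token received
    output_tokens: int


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(requests: List[RequestTiming]) -> Dict[str, float]:
    # Throughput is measured over the whole benchmark window, not per request.
    wall = max(r.end for r in requests) - min(r.start for r in requests)
    ttft = [r.first_token - r.start for r in requests]
    # Inter-token latency: decode time spread over the tokens after the first one.
    itl = [
        (r.end - r.first_token) / (r.output_tokens - 1)
        for r in requests
        if r.output_tokens > 1
    ]
    return {
        "throughput_tok_per_s": sum(r.output_tokens for r in requests) / wall,
        "ttft_p50_s": percentile(ttft, 50),
        "ttft_p99_s": percentile(ttft, 99),
        "itl_mean_s": statistics.fmean(itl) if itl else float("nan"),
    }
```

Reporting tail percentiles alongside throughput matters because a serving system can raise aggregate tokens per second while individual users wait longer for their first token.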
A modification of the LangChain SQL Q&A tutorial https://python.langchain.com/docs/tutorials/sql_qa/.
The changes are:
- uses Pydantic to type the state and inputs/outputs
- uses DuckDB on the palmerpenguins dataset
- uses an NVIDIA NIM for the LLM
- instead of the linear sequence write_query -> run_query -> gen_answer, this graph adds an LLM that checks the write_query output for validity and for whether it answers the question, leading to a more dynamic graph.
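The checker loop described in the last item can be sketched without LangChain. This is a minimal illustration under stated assumptions, not the gist's code: Python's built-in `sqlite3` stands in for DuckDB, a deterministic `stub_llm` function stands in for the NVIDIA NIM, a dataclass replaces the Pydantic state, and `check_query` is a hypothetical name for the checker node. The `while` loop plays the role of the conditional edge that sends a rejected query back to `write_query`.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class State:
    """Graph state; the gist types this with Pydantic, a dataclass stands in here."""
    question: str
    query: str = ""
    feedback: str = ""
    attempts: int = 0
    result: List[tuple] = field(default_factory=list)
    answer: str = ""


LLM = Callable[[str, State], str]  # (task, state) -> text; stand-in for the NIM call


def write_query(state: State, llm: LLM) -> None:
    state.query = llm("write", state)
    state.attempts += 1


def check_query(state: State, llm: LLM) -> bool:
    # The checker replies "OK" or with feedback that the next write_query attempt can use.
    verdict = llm("check", state)
    state.feedback = "" if verdict == "OK" else verdict
    return verdict == "OK"


def run_query(state: State, conn: sqlite3.Connection) -> None:
    state.result = conn.execute(state.query).fetchall()


def gen_answer(state: State, llm: LLM) -> None:
    state.answer = llm("answer", state)


def run_graph(question: str, llm: LLM, conn: sqlite3.Connection, max_attempts: int = 3) -> State:
    state = State(question=question)
    # Conditional edge: loop back to write_query until the checker accepts the query.
    while True:
        write_query(state, llm)
        if check_query(state, llm):
            break
        if state.attempts >= max_attempts:
            raise RuntimeError(f"no acceptable query after {max_attempts} attempts")
    run_query(state, conn)
    gen_answer(state, llm)
    return state


def stub_llm(task: str, state: State) -> str:
    """Deterministic fake LLM used only to exercise the control flow."""
    if task == "write":
        # The first draft ignores the species; the retry uses the checker's feedback.
        if not state.feedback:
            return "SELECT COUNT(*) FROM penguins"
        return "SELECT COUNT(*) FROM penguins WHERE species = 'Adelie'"
    if task == "check":
        return "OK" if "Adelie" in state.query else "The query does not filter on species."
    return f"There are {state.result[0][0]} Adelie penguins."


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE penguins (species TEXT, island TEXT)")
    # Three toy rows for illustration, not the real palmerpenguins data.
    conn.executemany(
        "INSERT INTO penguins VALUES (?, ?)",
        [("Adelie", "Torgersen"), ("Adelie", "Dream"), ("Gentoo", "Biscoe")],
    )
    return conn


if __name__ == "__main__":
    final = run_graph("How many Adelie penguins are there?", stub_llm, make_demo_db())
    print(final.attempts, final.answer)
```

The `max_attempts` cap is worth keeping in any real version: an LLM checker can keep rejecting queries, and without a bound the graph never terminates.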
```python
from typing import Any

from dagster import (
    ConfigurableResource,
    ConfigurableIOManager,
    InputContext,
    OutputContext,
    asset,
    Definitions,
    ResourceDependency,
    EnvVar,
)
from pydantic import Field

# https://docs.dagster.io/concepts/resources#resources-that-depend-on-other-resources
class myResource(ConfigurableResource):
    username: str = Field(description="the username")
    password: str = Field(description="the password")
```
```python
import os
from dagster import define_asset_job, load_assets_from_package_module, repository, with_resources, op, job, ScheduleDefinition
from my_dagster_project import assets
from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v2.api.metrics_api import MetricsApi
from datadog_api_client.v2.model.metric_intake_type import MetricIntakeType
from datadog_api_client.v2.model.metric_payload import MetricPayload
from datadog_api_client.v2.model.metric_point import MetricPoint
from datetime import datetime
```
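The imports above (`MetricsApi`, `MetricPayload`, `MetricPoint`, `MetricIntakeType`) target Datadog's v2 metrics intake. As a dependency-free sketch, assuming the documented JSON body of the v2 `POST /api/v2/series` endpoint, here is the shape of payload those model classes describe. The helper `build_series_payload` and the metric name are hypothetical; with the client library, the equivalent model object would be sent via `MetricsApi(api_client).submit_metrics(body=...)`.

```python
import json
import time
from typing import Iterable, Optional

# Numeric intake types accepted by the v2 series endpoint
# (mirrors MetricIntakeType: UNSPECIFIED, COUNT, RATE, GAUGE).
INTAKE_TYPES = {"unspecified": 0, "count": 1, "rate": 2, "gauge": 3}


def build_series_payload(
    metric: str,
    value: float,
    metric_type: str = "gauge",
    tags: Optional[Iterable[str]] = None,
    timestamp: Optional[float] = None,
) -> dict:
    """Build a single-point body for POST /api/v2/series (hypothetical helper)."""
    # Timestamps are POSIX seconds, truncated to an integer.
    ts = int(time.time() if timestamp is None else timestamp)
    return {
        "series": [
            {
                "metric": metric,
                "type": INTAKE_TYPES[metric_type],
                "points": [{"timestamp": ts, "value": float(value)}],
                "tags": list(tags or []),
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical metric name, e.g. the runtime of a scheduled Dagster job.
    body = build_series_payload("dagster.job.runtime_seconds", 42.5, tags=["env:dev"])
    print(json.dumps(body, indent=2))
```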
```python
import datetime
import pins
import os
import seaborn as sns
from dagster import asset, asset_check, AssetCheckResult
from posit import connect  # install with: uv pip install posit-sdk
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
```
```python
from dagster import asset

# Add the same owner to every asset built with this decorator,
# without users having to pass it themselves.
def bi_team_asset(**asset_decorator_kwargs):
    def _wrapper(f):
        # The asset keeps the wrapped function's name; owners are injected for every asset.
        @asset(**asset_decorator_kwargs, owners=["[email protected]"], name=f.__name__)
        def _impl(**kwargs):
            return f(**kwargs)
        return _impl
    return _wrapper
```
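The wrapper pattern above can be tried without Dagster. In this sketch, `fake_asset` is a hypothetical stand-in for `dagster.asset` that only records its keyword arguments and the decorated function's parameter names, and the owner address is a placeholder. It also shows one refinement worth considering: Dagster infers an asset's upstream inputs from the function's parameter names, so a wrapper whose signature is a bare `**kwargs` hides them. `functools.wraps` sets `__wrapped__`, which `inspect.signature` follows, so the original parameters stay visible. Whether your Dagster version's introspection honors `__wrapped__` the same way is an assumption to verify.

```python
import functools
import inspect
from typing import Any, Callable, Dict

# Records what each decoration "registered"; stands in for Dagster's asset definitions.
REGISTRY: Dict[str, Dict[str, Any]] = {}


def fake_asset(**kwargs: Any) -> Callable[[Callable], Callable]:
    """Hypothetical stand-in for dagster.asset: records kwargs and parameter names."""
    def _decorate(fn: Callable) -> Callable:
        name = kwargs.get("name", fn.__name__)
        REGISTRY[name] = {**kwargs, "params": list(inspect.signature(fn).parameters)}
        return fn
    return _decorate


def bi_team_asset(**asset_decorator_kwargs: Any) -> Callable[[Callable], Callable]:
    def _wrapper(f: Callable) -> Callable:
        # functools.wraps runs first, so fake_asset sees f's name and signature.
        @fake_asset(**asset_decorator_kwargs, owners=["bi-team@example.com"], name=f.__name__)
        @functools.wraps(f)
        def _impl(*args: Any, **kwargs: Any) -> Any:
            return f(*args, **kwargs)
        return _impl
    return _wrapper


@bi_team_asset(group_name="reporting")
def revenue(orders):
    return sum(orders)
```

Users write `@bi_team_asset(...)` exactly as they would `@asset(...)`, and the owner metadata is applied centrally; passing `owners` or `name` themselves would raise a `TypeError` for a duplicated keyword, which is arguably the desired guard.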