Sean Lopp slopp

Dynamo

Executive Summary

This report compares NVIDIA Dynamo with other frameworks, including vLLM and NVIDIA Triton Inference Server, for large language model (LLM) inference workloads. The key findings highlight Dynamo's performance advantage in multi-GPU setups, where it achieves higher throughput and lower latency than vLLM. Dynamo's disaggregated serving approach also allows more flexible performance tuning, especially for large models and variable workloads. Compared with Triton Inference Server, Dynamo is optimized for low-latency generative AI/LLM workloads, while Triton excels at multi-model inference serving. The report also examines Dynamo's technical architecture, which accelerates inference through components such as disaggregated serving, smart routing, and distributed KV cache management. Benchmarking methodologies and key performance metrics are also discussed, emphasizing the importance of standardized evaluation for

A modification of the LangChain SQL Q&A tutorial https://python.langchain.com/docs/tutorials/sql_qa/.

The changes are:

uses Pydantic to type the state and inputs/outputs
uses DuckDB on the Palmer Penguins dataset
uses an NVIDIA NIM for the LLM
instead of the fixed sequence write_query -> run_query -> gen_answer, the graph adds an LLM that checks the write_query output for validity and for whether it answers the question, which makes the graph more dynamic
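A minimal, stdlib-only sketch of that control flow, assuming a retry limit so the checker cannot send the graph around the loop forever. The node names write_query, run_query, and gen_answer come from the description above; `check_query`, `max_attempts`, and all stub bodies are hypothetical stand-ins for the LLM and DuckDB calls, and a dataclass stands in for the Pydantic state. This is not the tutorial's LangGraph code.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    # stdlib stand-in for the Pydantic state model
    question: str
    query: str = ""
    feedback: str = ""
    result: str = ""
    answer: str = ""
    attempts: int = 0
    history: list = field(default_factory=list)


def write_query(state: State) -> State:
    # hypothetical stub for the LLM that drafts SQL; it uses checker feedback if any
    state.attempts += 1
    if state.feedback:
        state.query = "SELECT species, COUNT(*) FROM penguins GROUP BY species"
    else:
        state.query = "SELECT COUNT(*) FROM penguins"
    return state


def check_query(state: State) -> bool:
    # hypothetical stub for the checker LLM: is the SQL valid and on-topic?
    ok = "GROUP BY species" in state.query
    state.feedback = "" if ok else "The question asks for a count per species."
    return ok


def run_query(state: State) -> State:
    # hypothetical stub; the real app runs the SQL against DuckDB
    state.result = f"rows from: {state.query}"
    return state


def gen_answer(state: State) -> State:
    # hypothetical stub for the LLM that phrases the final answer
    state.answer = f"Answer to {state.question!r} based on {state.result}"
    return state


def run_graph(question: str, max_attempts: int = 3) -> State:
    state = State(question=question)
    node = "write_query"
    while node != "END":
        state.history.append(node)
        if node == "write_query":
            state = write_query(state)
            node = "check_query"
        elif node == "check_query":
            if check_query(state):
                node = "run_query"
            elif state.attempts >= max_attempts:
                node = "END"  # give up instead of looping forever
            else:
                node = "write_query"  # retry with the checker's feedback
        elif node == "run_query":
            state = run_query(state)
            node = "gen_answer"
        else:  # gen_answer
            state = gen_answer(state)
            node = "END"
    return state
```

With these stubs, the first draft fails the check, the graph returns to write_query once, and the second draft continues to run_query and gen_answer. In LangGraph the same branch would typically be a conditional edge out of the checker node.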

This simple Streamlit app uses the Google Maps and Places APIs, along with a hosted NVIDIA NIM wrapper of the Llama model, to help you find coffee shops near an address.
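One plausible way to do the lookup is to geocode the address and then search for cafes near the resulting coordinates. The sketch below assumes the legacy Geocoding and Places Nearby Search JSON endpoints and uses only the standard library; the helper names are hypothetical, and the LLM step (passing the results to the Llama NIM) is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
NEARBY_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def geocode_url(address: str, api_key: str) -> str:
    # Geocoding API: street address -> coordinates
    return f"{GEOCODE_URL}?{urlencode({'address': address, 'key': api_key})}"


def nearby_cafes_url(lat: float, lng: float, api_key: str, radius_m: int = 1000) -> str:
    # Places Nearby Search (legacy): places of type "cafe" within radius_m meters
    params = {"location": f"{lat},{lng}", "radius": radius_m, "type": "cafe", "key": api_key}
    return f"{NEARBY_URL}?{urlencode(params)}"


def parse_location(data: dict) -> tuple:
    # take the first geocoding match; fail loudly on ZERO_RESULTS, bad keys, etc.
    if data.get("status") != "OK" or not data.get("results"):
        raise ValueError(f"geocoding failed: {data.get('status')}")
    loc = data["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]


def get_json(url: str) -> dict:
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def cafes_near(address: str, api_key: str, radius_m: int = 1000) -> list:
    # two network calls: geocode, then nearby search; returns (name, vicinity) pairs
    lat, lng = parse_location(get_json(geocode_url(address, api_key)))
    places = get_json(nearby_cafes_url(lat, lng, api_key, radius_m))
    return [(p["name"], p.get("vicinity", "")) for p in places.get("results", [])]
```

Only the URL builders and the parser run offline; `cafes_near` makes two real HTTP requests and needs a key with both APIs enabled.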

To run

Install dependencies

See https://loppsided.blog/posts/2024-12-27-a-basic-tool-calling-agent/

See https://loppsided.blog/posts/2024-12-26-a-rag-to-chat-with-your-favorite-blogger/

Torch Experiment

Goal

First attempt at using PyTorch for some kind of "deep" learning
Take advantage of Modal to access serverless Python compute, including GPUs

	from typing import Any
	from dagster import ConfigurableResource, ConfigurableIOManager, InputContext, OutputContext, asset, Definitions, ResourceDependency, EnvVar
	from pydantic import Field

	# https://docs.dagster.io/concepts/resources#resources-that-depend-on-other-resources

	class myResource(ConfigurableResource):
	    username: str = Field(description="the username")
	    password: str = Field(description="the password")

	import os

	from dagster import define_asset_job, load_assets_from_package_module, repository, with_resources, op, job, ScheduleDefinition
	from my_dagster_project import assets
	from datadog_api_client import ApiClient, Configuration
	from datadog_api_client.v2.api.metrics_api import MetricsApi
	from datadog_api_client.v2.model.metric_intake_type import MetricIntakeType
	from datadog_api_client.v2.model.metric_payload import MetricPayload
	from datadog_api_client.v2.model.metric_point import MetricPoint
	from datetime import datetime

	import datetime

	import pins
	import os
	import seaborn as sns
	from dagster import asset, asset_check, AssetCheckResult
	from posit import connect # install as uv pip install posit-sdk
	from sklearn.linear_model import LogisticRegression
	from sklearn.model_selection import train_test_split
	from sklearn.pipeline import Pipeline

	from dagster import asset

	# add owners to every asset built with this decorator, without users having to pass them
	def bi_team_asset(**asset_decorator_kwargs):
	    def _wrapper(f):
	        # placeholder owner address; substitute the BI team's real email
	        @asset(**asset_decorator_kwargs, owners=["bi-team@example.com"], name=f.__name__)
	        def _impl(**kwargs):
	            return f(**kwargs)

	        return _impl

	    return _wrapper
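The same decorator-factory pattern, shown without Dagster so it runs anywhere: a hypothetical `register` decorator stands in for `@asset`, the factory forwards the caller's keyword arguments and adds a fixed owner, and `functools.wraps` keeps the wrapped function's name and docstring.

```python
import functools

REGISTRY: dict = {}


def register(name: str, **metadata):
    # hypothetical stand-in for a framework decorator such as @asset
    def _decorate(fn):
        REGISTRY[name] = {"fn": fn, **metadata}
        return fn

    return _decorate


def team_decorator(**user_kwargs):
    # factory: forward the caller's kwargs and always add the team's owner
    def _wrapper(f):
        @register(name=f.__name__, owners=["bi-team@example.com"], **user_kwargs)
        @functools.wraps(f)
        def _impl(*args, **kwargs):
            return f(*args, **kwargs)

        return _impl

    return _wrapper


@team_decorator(group="finance")
def revenue():
    return 42
```

Because the factory passes `owners` itself, a caller who also passes `owners=` gets a `TypeError` for a duplicate keyword argument rather than silently overriding the team default.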