---
marp: true
theme: gaia
_class: lead
paginate: true
backgroundColor:
backgroundImage: url('https://marp.app/assets/hero-background.svg')
---

GTC 2024

The AI Conference from NVIDIA

Agenda

Why NVIDIA GTC (GPU Technology Conference)?
What's new from NVIDIA?
NIM (NVIDIA Inference Microservices)
Interesting GTC sessions
Triton Inference Server
Architecting for the New Language Model Stack [S62702]
Summaries of other talks (OpenAI, Together AI, AI careers)

Why NVIDIA GTC (GPU Technology Conference)?

For AI, what are the options? Google TPU, Groq LPU, AMD ROCm™

Ollama supports AMD GPUs
AMD's GPU firmware is not open, and tinygrad is struggling with it
Groq does not offer a way to run fine-tuned LLMs; it is not model-agnostic yet
NVIDIA GPUs hold a dominant position

What's new from NVIDIA?

Getting serious about cloud services (https://build.nvidia.com/explore/discover)
Pricing: $4,500 per GPU per year, or $1 per hour of use
Docker & Kubernetes: for maximum performance you can't ignore NVIDIA's firmware and driver layer
Benchmarking LLMs with Performance Analyzer (perf_analyzer)
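A quick break-even check for the two prices above. This is a sketch that assumes both prices refer to the same per-GPU offering and that the hourly rate applies per GPU-hour; neither detail is stated on the slide.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year

ANNUAL_PER_GPU = 4500.0  # $ per GPU per year (from the slide)
HOURLY_PER_GPU = 1.0     # $ per hour of use (from the slide; assumed per GPU)


def cheaper_plan(gpu_hours_per_year: float) -> str:
    """Return which pricing option costs less for one GPU used this many hours a year."""
    hourly_cost = gpu_hours_per_year * HOURLY_PER_GPU
    return "hourly" if hourly_cost < ANNUAL_PER_GPU else "annual"


# Usage above this many hours per year makes the annual price cheaper
break_even_hours = ANNUAL_PER_GPU / HOURLY_PER_GPU
break_even_utilization = break_even_hours / HOURS_PER_YEAR

print(f"Break-even: {break_even_hours:,.0f} h/year ({break_even_utilization:.0%} utilization)")
# prints "Break-even: 4,500 h/year (51% utilization)"
```

In other words, a GPU that is busy less than about half the year is cheaper on hourly billing under these assumptions.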

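NIM (NVIDIA Inference Microservices)

import os

import streamlit as st
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# load_dotenv() copies NVIDIA_API_KEY from a local .env file into os.environ,
# where langchain_nvidia_ai_endpoints picks it up; fail early if it is missing
load_dotenv()
if not os.getenv("NVIDIA_API_KEY"):
    raise RuntimeError("NVIDIA_API_KEY is not set (add it to .env)")

# Embedding: one embedder for indexed passages, one for user queries
document_embedder = NVIDIAEmbeddings(model="nvolveqa_40k", model_type="passage")
query_embedder = NVIDIAEmbeddings(model="nvolveqa_40k", model_type="query")

# LLM: prompt -> model -> plain string
prompt_template = ChatPromptTemplate.from_messages(
    [("system", "You are a helpful AI assistant"), ("user", "{input}")]
)
llm = ChatNVIDIA(model="mixtral_8x7b")
chain = prompt_template | llm | StrOutputParser()

# Streamlit chat box (run with `streamlit run`); returns None until the user sends a message
user_input = st.chat_input("Can you tell me what NVIDIA is known for?")
if user_input:
    st.write(chain.invoke({"input": user_input}))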
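The NIM snippet creates separate passage and query embedders, which are the two halves of a retrieval step. A minimal sketch of the ranking half, assuming the vectors already exist: with the setup above they would come from `document_embedder.embed_documents(passages)` and `query_embedder.embed_query(question)` (LangChain's standard `Embeddings` methods). Cosine similarity is an assumed choice of metric, and the toy vectors in the usage note are made up.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_passages(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    passages: Sequence[str],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Score every passage against the query and return the top_k (score, passage) pairs."""
    scored = [(cosine(query_vec, vec), text) for vec, text in zip(passage_vecs, passages)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

For example, `rank_passages([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], ["gpu", "cpu", "both"], top_k=2)` ranks `"gpu"` first and `"both"` second. The top passages would then be pasted into the `{input}` of the prompt for a RAG-style answer.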

| March 18 | March 19 | March 20/21 |
|---|---|---|
| GTC 2024 Keynote [S62542] | Generally Capable Agents in Open-Ended Worlds [S62816] | ⭐Transforming AI [S63046] |
| ⭐Deploying, Optimizing, and Benchmarking Large Language Models With Triton Inference Server [S62531] | ⭐Retrieval Augmented Generation: Overview of Design Systems, Data, and Customization [S62744] | ⭐Architecting for the New Language Model Stack [S62702] |

Check the triton-inference-server tutorials (https://github.com/triton-inference-server/tutorials)

Triton Inference Server

# Dockerfile: the Triton image plus the Python packages the Hugging Face model needs
FROM nvcr.io/nvidia/tritonserver:23.10-py3
RUN pip install transformers==4.34.0 protobuf==3.20.3 sentencepiece==0.1.99 accelerate==0.23.0 einops==0.6.1

# Copy the model directory into a local Triton model repository, then build the image
mkdir -p model_repository
cp -r hermes_2_pro/ model_repository/
docker build -t triton_transformer_server .

# Start Triton with the repository mounted; HTTP inference is served on port 8000
docker run --gpus all -it --rm --net=host \
--shm-size=1G --ulimit memlock=-1 \
--ulimit stack=67108864 \
-v ${PWD}/model_repository:/opt/tritonserver/model_repository \
triton_transformer_server tritonserver --model-repository=model_repository

# The model name (hermes_2_pro) is part of the URL path: /v2/models/<model_name>/infer
curl -X POST localhost:8000/v2/models/hermes_2_pro/infer \
-d '{"inputs": [{"name":"text_input","datatype":"BYTES","shape":[1],"data":["I am going"]}]}'
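The same request can be sent from Python with only the standard library. A sketch, assuming the server started by the `docker run` command is reachable at `http://localhost:8000`; it uses the v2 (KServe) HTTP routes that the curl call relies on, plus `/v2/health/ready` for readiness. The output tensor name depends on the model's config, which the slide does not show, so the code returns the raw `outputs` list.

```python
import json
import urllib.request

TRITON_URL = "http://localhost:8000"  # assumption: --net=host as in the docker run
MODEL_NAME = "hermes_2_pro"


def build_infer_request(prompt: str) -> dict:
    """v2 inference request body, same shape as the curl example."""
    return {
        "inputs": [
            {"name": "text_input", "datatype": "BYTES", "shape": [1], "data": [prompt]}
        ]
    }


def is_ready(base_url: str = TRITON_URL, timeout: float = 2.0) -> bool:
    """True if the server answers the v2 readiness endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v2/health/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP error status, ...
        return False


def infer(prompt: str, base_url: str = TRITON_URL, model: str = MODEL_NAME) -> dict:
    """POST the prompt to /v2/models/<model>/infer and return the decoded JSON response."""
    body = json.dumps(build_infer_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v2/models/{model}/infer",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    if is_ready():
        # Output tensor names come from the model's config.pbtxt
        print(infer("I am going")["outputs"])
    else:
        print("Triton is not ready at", TRITON_URL)
```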

The language model is the new stack

Example: single loop vs. double loop

Language model stack: old vs. new

Self-reflection with code
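One common pattern behind "self-reflection with code": the model writes code, the code is executed, and any error is fed back into the next prompt so the model can fix its own mistake. A minimal sketch, assuming a hypothetical `generate(prompt) -> str` callable in place of a real LLM call; `fake_llm` is a stub for illustration only. It uses `exec`, so a real system should run generated code in a sandbox.

```python
import traceback
from typing import Callable, List, Optional, Tuple


def run_snippet(code: str) -> Optional[str]:
    """Execute code in a fresh namespace; return the traceback text on failure, None on success."""
    try:
        exec(code, {})  # unsafe for untrusted code: sandbox this in practice
    except Exception:
        return traceback.format_exc()
    return None


def self_reflect(
    task: str, generate: Callable[[str], str], max_rounds: int = 3
) -> Tuple[Optional[str], List[Tuple[str, Optional[str]]]]:
    """Generate code, run it, and feed errors back until it runs or rounds run out."""
    prompt = task
    history: List[Tuple[str, Optional[str]]] = []
    for _ in range(max_rounds):
        code = generate(prompt)
        error = run_snippet(code)
        history.append((code, error))
        if error is None:
            return code, history
        # Reflection step: show the model its own code and the error it caused
        prompt = f"{task}\n\nYour previous code:\n{code}\n\nIt failed with:\n{error}\nPlease fix it."
    return None, history


def fake_llm(prompt: str) -> str:
    """Hypothetical stub: buggy on the first try, fixed once it sees an error."""
    if "failed with" in prompt:
        return "result = sum([1, 2, 3])\nassert result == 6"
    return "result = sum([1, 2, '3'])"  # TypeError: int + str


code, history = self_reflect("Sum 1, 2 and 3", fake_llm)
print(f"succeeded after {len(history)} attempts")
```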

Foundation Model for Robots: GR00T

Transforming AI panel talk

Speakers: Jensen Huang (NVIDIA), Ashish Vaswani (Essential AI), Noam Shazeer (Character AI), Aidan Gomez (Cohere), and others

LLMs let software understand textual prompts and generate content, such as images, from them, marking the beginning of a new industrial revolution.
Future: adaptive computation and Universal Transformers. With reasoning capability, models won't need as much data, so data quality matters more.
Evals: measuring progress matters, and that means observing whether tasks are actually finished.
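Adaptive computation means the model decides how many processing steps each input gets; Universal Transformers do this with Adaptive Computation Time (ACT, Graves 2016). A minimal sketch of ACT's halting rule for a single position, assuming the per-step halting probabilities are given (a real model predicts them with a learned layer) and using scalar states in place of hidden vectors.

```python
from typing import List, Sequence


def act_weights(halting_probs: Sequence[float], eps: float = 0.01) -> List[float]:
    """ACT halting weights for one position.

    Steps run until the cumulative halting probability would reach 1 - eps
    (or the step budget runs out); the final step gets the remainder, so the
    weights always sum to 1.
    """
    weights: List[float] = []
    cumulative = 0.0
    for n, h in enumerate(halting_probs):
        is_last = n == len(halting_probs) - 1
        if cumulative + h >= 1 - eps or is_last:
            weights.append(1.0 - cumulative)  # remainder R
            break
        weights.append(h)
        cumulative += h
    return weights


def act_output(states: Sequence[float], halting_probs: Sequence[float], eps: float = 0.01) -> float:
    """Weighted mix of the intermediate states; steps after halting are ignored."""
    weights = act_weights(halting_probs, eps)
    return sum(w * s for w, s in zip(weights, states))


print(act_weights([0.3, 0.4, 0.5, 0.9]))  # halts after 3 of 4 possible steps
```

Inputs whose halting probabilities rise quickly stop early; harder ones use more steps, which is the "adaptive" part.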

OpenAI

Speaker: Brad Lightcap, Chief Operating Officer, OpenAI

Start small, then tackle bigger problems
Use language models of different sizes for different tasks
Monitor models and agents, and swap them as needed
Aim for agents that can reason through complex actions
Example: AI for patient care, from data to treatment
Adapt interfaces as the way users interact changes
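The "different model sizes per task, swap as needed" advice can be pictured as a small routing table. A minimal sketch; the task names and model names (`small-model`, `medium-model`, `large-model`) are hypothetical placeholders, and the monitoring that would tell you when to swap is left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ModelRouter:
    """Maps task types to model names; unknown tasks fall back to a default model."""

    routes: Dict[str, str] = field(default_factory=dict)
    default: str = "large-model"  # hypothetical name

    def pick(self, task: str) -> str:
        """Model to call for this task."""
        return self.routes.get(task, self.default)

    def swap(self, task: str, new_model: str) -> str:
        """Point a task at a different model and return the model it replaces."""
        previous = self.pick(task)
        self.routes[task] = new_model
        return previous


# Cheap tasks go to small models; anything unlisted uses the large default
router = ModelRouter(routes={"classify": "small-model", "summarize": "medium-model"})
```

Keeping the mapping in one place means swapping a model is a one-line change rather than a code change at every call site.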

Together

Speaker: Percy Liang, Co-Founder, Together AI

We can think of foundation models as infrastructure.
The Center for Research on Foundation Models (CRFM)
- site: https://crfm.stanford.edu/blog.html
- HELM (Holistic Evaluation of Language Models), a framework for evaluating foundation models: https://crfm.stanford.edu/helm/lite/latest/
Anything new? KTO (Kahneman-Tversky Optimization) training
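KTO aligns a model from binary feedback (each completion is labeled desirable or undesirable) instead of DPO-style preference pairs, using a prospect-theory value function. A minimal sketch of the batch loss from the KTO paper (Ethayarajh et al., 2024), assuming log-probabilities are already summed over completion tokens and that the reference point `z0` (an estimate of the policy/reference KL divergence, treated as a constant) is passed in rather than estimated from the batch. The `beta`, `lambda_d`, and `lambda_u` defaults are illustrative.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    desirable: Sequence[bool],
    z0: float,
    beta: float = 0.1,
    lambda_d: float = 1.0,
    lambda_u: float = 1.0,
) -> float:
    """Mean KTO loss over a batch of (prompt, completion) examples."""
    losses = []
    for lp, lr, good in zip(policy_logps, ref_logps, desirable):
        r = lp - lr  # implied reward: log(pi_theta(y|x) / pi_ref(y|x))
        if good:
            # Desirable: push the reward above the reference point z0
            losses.append(lambda_d - lambda_d * sigmoid(beta * (r - z0)))
        else:
            # Undesirable: push the reward below the reference point z0
            losses.append(lambda_u - lambda_u * sigmoid(beta * (z0 - r)))
    if not losses:
        raise ValueError("empty batch")
    return sum(losses) / len(losses)
```

Raising the policy's log-probability lowers the loss for a desirable completion and raises it for an undesirable one, which is the behavior the labels ask for.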

Navigating AI Careers in Europe [SE62721]

AI is democratizing tech: you don't need to know the internals of LLMs to use them.
Everything will have some AI in it, so developer jobs are not secure.
All employees, technical or not, should be upskilled in AI and keep their skills current. If employees' skills become outdated, the company soon becomes outdated too, which cripples its performance compared to other companies.
Regarding coding: we're not at AGI yet, but the way coding is done is changing. The developer role will not stay the same; it will look more like orchestration, and validating business requirements will likely still matter.

jaigouk/recap.md