Leo liaocs2008

LLM Speed on V100

It is good to see LLM speed reports online, such as those for CPU and GPU. For a more comprehensive picture, this document records LLM inference measurements on a V100 16GB using text-generation-webui.
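As a rough illustration of what a speed measurement boils down to, the sketch below times a single generation call and divides the number of new tokens by the elapsed wall-clock time. It is not how text-generation-webui computes its own figures. `generate` and `fake_generate` are hypothetical stand-ins for a real backend, assumed to return the list of newly generated token ids.

```python
import time
from typing import Callable, List


def tokens_per_second(generate: Callable[[str], List[int]], prompt: str) -> float:
    """Time one generation call and return generated tokens per second.

    `generate` is a hypothetical callable (an assumption, not a real API):
    it takes a prompt and returns the newly generated token ids.
    """
    start = time.perf_counter()
    new_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(new_tokens) / elapsed


def fake_generate(prompt: str) -> List[int]:
    """Dummy backend for illustration: waits 0.1 s, then returns 20 token ids."""
    time.sleep(0.1)
    return list(range(20))
```

Calling `tokens_per_second(fake_generate, "some prompt")` exercises the helper without a GPU. With a real model, the callable would wrap the actual generation call, and averaging over several runs would likely give steadier numbers.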

Test Setup

We test the following prompts:

From the SQuAD dataset:

How many student news papers are found at Notre Dame?

	#!/usr/bin/python
	"""
	imageMe is a super simple image gallery server.

	Run imageme.py from the top level of an image directory to generate gallery
	index HTML and run a SimpleHTTPServer on the localhost.

	Imported as a module, use imageme.serve_dir(your_path) to do the same for any
	directory programmatically. When run as entry point, imageme.serve_dir('.') is
	what's called.

	### Graph Models
	clefourrier/graphormer-base-pcqm4mv1

	### Time Series Models
	huggingface/autoformer-tourism-monthly
	huggingface/informer-tourism-monthly
	huggingface/time-series-transformer-tourism-monthly

	### Reinforcement Learning Models
	edbeeching/decision-transformer-gym-hopper-medium

	# after running the script, use google photos their built-in timestamp based sorting function

	i=0
	find ./ -name "*.jpg" | sort | while read filename; do
	i=$((i+1))

	time=`date --date="-${i} seconds" '+%Y:%m:%d-%T'`
	#echo $time

	jhead -mkexif $filename

	from selenium import webdriver
	from selenium.webdriver.chrome.options import Options
	from webdriver_manager.chrome import ChromeDriverManager
	from bs4 import BeautifulSoup
	import pandas as pd
	import time


	options = Options()
	options.add_argument('--headless')

	"""
	start at x, each step either walk +1 or -1. If we arrive at 0 or n, then we stop.
	question: what's the probability to stop at 0 and n with starting position x?


	theoretically: P[stop at 0] = (n-x)/n, P[stop at n] = x/n


	simulation results in the format "x, (P[stop at 0], P[stop at n])":
	2 (0.98032, 0.01968)

	## note, run with py27

	import itertools
	import numpy as np
	from lpdec.codes import BinaryLinearBlockCode
	from lpdec.codes.ldpc import ArrayLDPCCode
	from lpdec.channels import *
	from lpdec.decoders.iterative import IterativeDecoder

	# model + meanfile, https://drive.google.com/drive/folders/1GF_hyCkw46SVxQhP0NSVB4WJraYzlhUp?usp=sharing
	# I0208 21:55:14.949070 18811 caffe.cpp:309] Loss: 1.65299
	# I0208 21:55:14.949090 18811 caffe.cpp:321] loss3/loss3 = 1.65299 (* 1 = 1.65299 loss)
	# I0208 21:55:14.949101 18811 caffe.cpp:321] loss3/top-1 = 0.607879
	# I0208 21:55:14.949111 18811 caffe.cpp:321] loss3/top-5 = 0.838562
	#

	name: "eyeriss"

	layer {

	<!DOCTYPE HTML>
	<html lang="en-US">
	<head>
	<meta charset="UTF-8">
	<meta http-equiv="refresh" content="0; url=https://liaocs2008.github.io/">
	<script type="text/javascript">
	window.location.href = "https://liaocs2008.github.io/"
	</script>
	<title>Siyu Liao</title>
	</head>

	"""
	First Pass:
	( 5 1 4 2 8 ) –> ( 1 5 4 2 8 ), Here, algorithm compares the first two elements, and swaps since 5 > 1.
	( 1 5 4 2 8 ) –> ( 1 4 5 2 8 ), Swap since 5 > 4
	( 1 4 5 2 8 ) –> ( 1 4 2 5 8 ), Swap since 5 > 2
	( 1 4 2 5 8 ) –> ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5), algorithm does not swap them.

	Second Pass:
	( 1 4 2 5 8 ) –> ( 1 4 2 5 8 )
	( 1 4 2 5 8 ) –> ( 1 2 4 5 8 ), Swap since 4 > 2