layout | title | author
---|---|---
post | vLLM Now Supports AMD GPUs | vLLM team and EmbeddedLLM
TL;DR:
- With help from the EmbeddedLLM team, vLLM can now run on ROCm-enabled AMD GPUs.
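For readers who want to try it, here is a minimal sketch assuming a working ROCm build of vLLM; the model name is only an example, and the Python API is the same as on the CUDA path:

```python
from vllm import LLM, SamplingParams

# On a ROCm-enabled AMD GPU, vLLM's Python API is unchanged.
llm = LLM(model="facebook/opt-125m")  # example model; swap in your own
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```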
import pandas as pd

# Load the two CSV files
commit_data = pd.read_csv('git_log_summary.csv')                # author, email, date, and total lines changed
author_org_data = pd.read_csv('git_log_grouped_by_author.csv')  # author, email, org, and other fields

# Merge the two dataframes on author and email
merged_data = pd.merge(commit_data, author_org_data[['Author', 'Email', 'Organization']],
                       on=['Author', 'Email'], how='left')

# Fill missing organization names with "Community"
merged_data['Organization'] = merged_data['Organization'].fillna('Community')
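A natural follow-up, sketched here under the assumption that the lines-changed column is named `Total Lines Changed` (the actual header is not shown above), is to total contributions per organization:

```python
# Hypothetical column name; adjust to match the actual CSV header.
per_org = (merged_data.groupby('Organization')['Total Lines Changed']
           .sum()
           .sort_values(ascending=False))
print(per_org.head(10))
```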
{
  "question_id": "58210e39b3fd4441a2bd4a518bb44c2d",
  "prompt": "What is the difference between OpenCL and CUDA?",
  "openai_scores_raw_choices_nested": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "{\n  \"topic_modeling\": \"Technical Comparison\",\n  \"score_reason\": \"This prompt requires the AI to accurately compare and contrast two distinct technologies, OpenCL and CUDA. It assesses the AI's factual accuracy and knowledge of these technologies, as well as its ability to articulate the differences between them.\",\n  \"score_value\": 9\n}"
      }
    }
  ]
}
abstract: | Speculative decoding is a pivotal technique to accelerate the inference of large language models (LLMs) by employing a smaller draft model to predict the target model's outputs. However, its efficacy can be limited by the low predictive accuracy of the draft model, particularly when faced with diverse text inputs and a significant capability gap between the draft and target models. We introduce online speculative decoding to address this challenge. The main idea is to continually update (multiple) draft model(s) on observed user query data.
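As a toy illustration of the draft-and-verify loop the abstract describes (a greedy sketch, not the paper's implementation; `draft_next` and `target_next` are hypothetical callables returning each model's greedy next token):

```python
def speculative_decode(prompt, draft_next, target_next, k=4, max_new=64):
    """Greedy speculative decoding sketch.

    The draft model proposes k tokens per round; the target keeps the
    longest agreeing prefix and substitutes its own token at the first
    mismatch. Real systems verify all k tokens in one batched target
    forward pass; target_next is called per token here for clarity.
    """
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1) Cheap draft pass: propose k tokens autoregressively.
        ctx, proposal = list(out), []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Verification: accept draft tokens while the target agrees.
        for t in proposal:
            expected = target_next(out)
            if expected == t:
                out.append(t)         # accepted draft token, nearly free
            else:
                out.append(expected)  # target's correction; end the round
                break
    return out
```

Online speculative decoding then continually fine-tunes the draft model(s) on the corrections collected at these mismatch points, so the acceptance rate improves on the live query distribution.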
<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <teiHeader xml:lang="en">
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">Ambry</title>
      </titleStmt>
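Such GROBID TEI headers are easy to query with the standard library; a minimal sketch, with a hypothetical file name:

```python
import xml.etree.ElementTree as ET

NS = {'tei': 'http://www.tei-c.org/ns/1.0'}
root = ET.parse('paper.grobid.tei.xml').getroot()    # hypothetical file name
title = root.find('.//tei:titleStmt/tei:title', NS)  # namespaced lookup
print(title.text)                                    # -> "Ambry"
```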
WANalytics is proposed, a system that pushes computation to edge data centers, automatically optimizing workflow execution plans and replicating data when needed, which delivers substantial gains for three standard benchmarks: TPC-CH, Berkeley Big Data, and BigBench.

Large organizations today operate data centers around the globe where massive amounts of data are produced and consumed by local users. Despite their geographically diverse origin, such data must be analyzed/mined as a whole. We call the problem of supporting rich DAGs of computation across geographically distributed data Wide-Area Big-Data (WABD). To the best of our knowledge, WABD is not supported by currently deployed systems nor sufficiently studied in the literature; it is addressed today by continuously copying raw data to a central location for analysis. We observe from production workloads that WABD is important for large organizations, and that centralized solutions incur substantial cross-data-center data transfer costs.
This issue occurs when running a Python function on a Ray cluster with `@ray.remote` and the function runs on the head node instead of a worker node.
Functions are scheduled on any node that has available CPUs, so it is normal for a task to land on the head node. If you'd like to avoid scheduling functions on the head node, set its CPU count to 0 when starting Ray: `ray start --head --num-cpus=0`. Alternatively, you can use the node affinity scheduling strategy to keep tasks off the head node, as in the sketch below.
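For illustration, a minimal sketch of both options, assuming the head was started with `--num-cpus=0`; the way a worker is picked here (the first alive node advertising CPUs) is an assumption about the cluster layout:

```python
import ray
from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy

# Assumes the head was started with: ray start --head --num-cpus=0
ray.init(address="auto")

@ray.remote(num_cpus=1)
def work():
    import socket
    return socket.gethostname()  # report where the task actually ran

# With --num-cpus=0 on the head, only worker nodes advertise CPU resources.
worker = next(n for n in ray.nodes()
              if n["Alive"] and n["Resources"].get("CPU", 0) > 0)

# Node affinity pins the task to that worker; soft=False makes it strict.
result = work.options(
    scheduling_strategy=NodeAffinitySchedulingStrategy(node_id=worker["NodeID"], soft=False)
).remote()
print(ray.get(result))
```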