Simon Mo (simon-mo)
import pandas as pd

# Load the two CSV files
commit_data = pd.read_csv('git_log_summary.csv')  # Contains Author, Email, date, and total lines changed
author_org_data = pd.read_csv('git_log_grouped_by_author.csv')  # Contains Author, Email, Organization, and other fields

# Merge the two dataframes on author and email
merged_data = pd.merge(commit_data, author_org_data[['Author', 'Email', 'Organization']], on=['Author', 'Email'], how='left')

# Fill missing organization names with "Community"
merged_data['Organization'] = merged_data['Organization'].fillna('Community')
{
"question_id": "58210e39b3fd4441a2bd4a518bb44c2d",
"prompt": "What is the difference between OpenCL and CUDA?",
"openai_scores_raw_choices_nested": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "{\n \"topic_modeling\": \"Technical Comparison\",\n \"score_reason\": \"This prompt requires the AI to accurately compare and contrast two distinct technologies, OpenCL and CUDA. It assesses the AI's factual accuracy and knowledge of these technologies, as well as its ability to articulate the differences between them.\",\n \"score_value\": 9\n}",
---
layout: post
title: vLLM Now Supports AMD GPUs
author: vLLM team and EmbeddedLLM
---

TL;DR:

  • With help from the EmbeddedLLM team, vLLM can now run on top of ROCm-enabled AMD GPUs.
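The Python API is the same regardless of the GPU backend, so existing vLLM scripts should work unchanged on a ROCm build. A minimal sketch (the model choice and sampling settings here are illustrative):

from vllm import LLM, SamplingParams

# On a ROCm build of vLLM, the same API targets AMD GPUs.
llm = LLM(model="facebook/opt-125m")  # illustrative model choice
sampling = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What is speculative decoding?"], sampling)
print(outputs[0].outputs[0].text)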
simon-mo / osd.md
Last active October 31, 2023 08:25

abstract: |
  Speculative decoding is a pivotal technique to accelerate the inference of large language models (LLMs) by employing a smaller draft model to predict the target model’s outputs. However, its efficacy can be limited by the low predictive accuracy of the draft model, particularly when faced with diverse text inputs and a significant capability gap between the draft and target models. We introduce online speculative decoding to address this challenge. The main idea is to continually update (multiple) draft model(s) on observed user
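The draft-then-verify loop at the core of speculative decoding fits in a few lines. Below is a schematic sketch with hypothetical draft_model/target_model interfaces and simple exact-match acceptance; the paper's acceptance rule and the online draft updates are more involved:

# Schematic sketch of one speculative decoding step.
# draft_model.next_token and target_model.verify are hypothetical interfaces.
def speculative_step(target_model, draft_model, tokens, k=4):
    # The cheap draft model proposes k tokens autoregressively.
    drafted = []
    for _ in range(k):
        drafted.append(draft_model.next_token(tokens + drafted))
    # The expensive target model checks all k drafts in one forward pass.
    checked = target_model.verify(tokens, drafted)
    # Keep drafted tokens while they match the target's predictions;
    # the first mismatch is replaced by the target's own token.
    out = list(tokens)
    for d, c in zip(drafted, checked):
        out.append(c)
        if d != c:
            break  # stop at the first disagreement
    return out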

<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
xmlns:xlink="http://www.w3.org/1999/xlink">
<teiHeader xml:lang="en">
<fileDesc>
<titleStmt>
<title level="a" type="main">Ambry</title>
</titleStmt>

WANalytics: Analytics for a Geo-Distributed Data-Intensive World

WANalytics is proposed: a system that pushes computation to edge data centers, automatically optimizing workflow execution plans and replicating data when needed. It delivers substantial gains on three standard benchmarks: TPC-CH, Berkeley Big Data, and BigBench.

Large organizations today operate data centers around the globe where massive amounts of data are produced and consumed by local users. Despite their geographically diverse origin, such data must be analyzed/mined as a whole. We call the problem of supporting rich DAGs of computation across geographically distributed data Wide-Area Big-Data (WABD). To the best of our knowledge, WABD is not supported by currently deployed systems nor sufficiently studied in the literature; it is addressed today by continuously copying raw data to a central location for analysis. We observe from production workloads that WABD is important for large organizations, and that centralized solutions incur subs

When will you run into this issue:

This issue occurs when you run a Python function on a Ray cluster with @ray.remote and the function is scheduled on the head node instead of a worker node.

Here's how to fix it:

Functions are scheduled on any node that has available CPUs, so it is normal for them to land on the head node. If you'd like to avoid scheduling functions on the head node, set its --num-cpus to 0 when starting Ray: ray start --head --num-cpus=0. Alternatively, you can use the node affinity scheduling strategy to keep tasks off the head node, as in the sketch below.
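A minimal sketch of the node-affinity route (the node-selection logic here is illustrative; in practice you would filter ray.nodes() for the specific worker you want):

import ray
from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy

ray.init(address="auto")

# Illustrative node selection: pick some alive node's ID. In practice,
# filter ray.nodes() for a worker node (e.g. by its resources).
worker_node_id = next(n["NodeID"] for n in ray.nodes() if n["Alive"])

@ray.remote(
    scheduling_strategy=NodeAffinitySchedulingStrategy(
        node_id=worker_node_id,
        soft=False,  # hard constraint: do not fall back to other nodes
    )
)
def my_task():
    return "ran on the pinned node"

print(ray.get(my_task.remote()))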

Explanation:

By default, Ray distributes workloads across nodes with available CPUs. The head node typically advertises CPU resources, so it may run the function by default. Setting the head node's --num-cpus to 0 when starting Ray prevents workloads from being scheduled there. Alternatively, the node affinity scheduling strategy lets you pin tasks to specific worker nodes explicitly.