Bairen Yi (byronyi)
Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (reinforcement learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

git bisect start
# good: [9b302513f6d82f0ca989b3bb1f5ffc592ed866b7] [AArch64] Add missing intrinsics for vrnd
git bisect good 9b302513f6d82f0ca989b3bb1f5ffc592ed866b7
# bad: [c9ff39a3f9840c84453f23a37386a3dc374f055a] Add "assert require" for the test added in df9158c9a45a6902c2b0394f9bd6512e3e441f31
git bisect bad c9ff39a3f9840c84453f23a37386a3dc374f055a
# good: [002dd47bdd674fad8186650f07458b1e062545df] [clang] Fix typos in the default logic for CLANG_DEFAULT_UNWINDLIB
git bisect good 002dd47bdd674fad8186650f07458b1e062545df
# bad: [bb6732cf622522f17dad948279ba4f68e3bd55e1] [MC] Add parseEOL() overload and migrate some parseToken(AsmToken::EndOfStatement) to parseEOL()
git bisect bad bb6732cf622522f17dad948279ba4f68e3bd55e1
# good: [06a8a867d1591cfdab65037eabd7e865113dc7a6] [rs4gc/tests] Remove use of internal debug flags
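The log above records manual `git bisect good`/`git bisect bad` marking. When a scripted check can decide good vs. bad, `git bisect run` automates the whole search. A self-contained toy sketch (throwaway repo, hypothetical "regression" in commit 4) showing the pattern:

```shell
# Toy demonstration of `git bisect run`: build a 5-commit repo where the
# regression lands in commit 4, then let git find it automatically.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5; do
    echo "$i" > value
    git add value
    git commit -qm "commit $i"
done
first=$(git rev-list --max-parents=0 HEAD)   # oldest commit is known good
git bisect start HEAD "$first"
# The test command exits 0 (good) while value < 4, non-zero (bad) after,
# standing in for a real build-and-test script.
out=$(git bisect run sh -c 'test "$(cat value)" -lt 4' 2>&1)
echo "$out" | grep "is the first bad commit"
git bisect reset
```

In a real LLVM bisect like the one above, the command passed to `git bisect run` would rebuild the compiler and run the failing test, returning 0 on success.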
#define EIGEN_USE_THREADS
#include "tensorflow/core/common_runtime/optimization_registry.h"
#include "tensorflow/core/framework/common_shape_fns.h"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/rendezvous.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/graph/algorithm.h"
#include "tensorflow/core/graph/edgeset.h"

This is a test gist

Hello gist!

@byronyi
byronyi / compile_tensorflow_serving.sh
Created October 7, 2017 01:42 — forked from jorgemf/compile_tensorflow_serving.sh
Compile TensorFlow Serving with CUDA support (October 2017)
#!/bin/bash
TENSORFLOW_COMMIT=9e76bf324f6bac63137a02bb6e6ec9120703ea9b # August 16, 2017
TENSORFLOW_SERVING_COMMIT=267d682bf43df1c8e87332d3712c411baf162fe9 # August 18, 2017
MODELS_COMMIT=78007443138108abf5170b296b4d703b49454487 # July 25, 2017
if [ -z "$TENSORFLOW_SERVING_REPO_PATH" ]; then
    TENSORFLOW_SERVING_REPO_PATH="serving"
fi
INITIAL_PATH=$(pwd)
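When testing whether `TENSORFLOW_SERVING_REPO_PATH` is set, the expansion inside `[ -z ... ]` should be quoted; unquoted, a value containing spaces splits into several words and the test errors out. A minimal sketch of the safe pattern (the path with spaces is a hypothetical example value):

```shell
# Quoted -z test: works even when the value contains spaces.
TENSORFLOW_SERVING_REPO_PATH="my serving dir"   # hypothetical value with spaces
if [ -z "$TENSORFLOW_SERVING_REPO_PATH" ]; then
    TENSORFLOW_SERVING_REPO_PATH="serving"      # fall back to the default
fi
echo "using repo path: $TENSORFLOW_SERVING_REPO_PATH"
```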
@byronyi
byronyi / ubuntu-vm.xml
Last active September 20, 2017 12:39 — forked from calerogers/ubuntu-vm.xml
<domain type='kvm'>
  <name>ubuntu-4b</name>
  <uuid>7dfbcb8a-77da-11e6-a116-408d5cb4b9e6</uuid>
  <memory unit='KiB'>12582912</memory>
  <currentMemory unit='KiB'>12582912</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-2.5'>hvm</type>
    <loader readonly='no' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/ubuntu-4b_VARS.fd</nvram>
2017-08-03 00:02:21.913337: I tensorflow/core/distributed_runtime/rpc/grpc_remote_worker.cc:176] done callback, req: step_id: 102800540342059661
rendezvous_key: "/job:ps/replica:0/task:0/cpu:0;57cb0fff5df077fe;/job:worker/replica:0/task:0/cpu:0;edge_68_report_uninitialized_variables/boolean_mask/Gather;0:0"
dma_ok: true
response send_start_micros: 1501689741913046
2017-08-03 00:02:21.913389: I tensorflow/core/common_runtime/executor.cc:1611] 0x7fff4c108eb0 Async kernel done: report_uninitialized_variables/boolean_mask/Gather_S15 = _Recv[client_terminated=false, recv_device="/job:worker/replica:0/task:0/cpu:0", send_device="/job:ps/replica:0/task:0/cpu:0", send_device_incarnation=6326167691039111166, tensor_name="edge_68_report_uninitialized_variables/boolean_mask/Gather", tensor_type=DT_STRING, _device="/job:worker/replica:0/task:0/cpu:0"]()
2017-08-03 00:02:21.913411: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 4 kernel_name: "report_uninitialized_variables
2017-08-02 23:42:44.550529: I tensorflow/core/common_runtime/bfc_allocator.cc:56] Creating bin of max chunk size 256B
2017-08-02 23:42:44.550587: I tensorflow/core/common_runtime/bfc_allocator.cc:56] Creating bin of max chunk size 512B
2017-08-02 23:42:44.550607: I tensorflow/core/common_runtime/bfc_allocator.cc:56] Creating bin of max chunk size 1.0KiB
2017-08-02 23:42:44.550614: I tensorflow/core/common_runtime/bfc_allocator.cc:56] Creating bin of max chunk size 2.0KiB
2017-08-02 23:42:44.550620: I tensorflow/core/common_runtime/bfc_allocator.cc:56] Creating bin of max chunk size 4.0KiB
2017-08-02 23:42:44.550627: I tensorflow/core/common_runtime/bfc_allocator.cc:56] Creating bin of max chunk size 8.0KiB
2017-08-02 23:42:44.550634: I tensorflow/core/common_runtime/bfc_allocator.cc:56] Creating bin of max chunk size 16.0KiB
2017-08-02 23:42:44.550641: I tensorflow/core/common_runtime/bfc_allocator.cc:56] Creating bin of max chunk size 32.0KiB
2017-08-02 23:42:44.550647: I tensorflow/core/common_runtime/bfc_a
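The BFC allocator log above shows its bins growing by powers of two from a 256-byte minimum (256B, 512B, 1.0KiB, 2.0KiB, ...). A quick sketch recomputing the first eight bin sizes:

```shell
# Reproduce the bin sizes from the BFC allocator log: each bin's max
# chunk size doubles, starting at 256 bytes.
size=256
for bin in 0 1 2 3 4 5 6 7; do
    echo "bin $bin: max chunk size ${size}B"
    size=$((size * 2))
done
```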
package(default_visibility = ["//visibility:public"])

licenses(["notice"])  # OpenIB.org BSD license (MIT variant)

exports_files(["COPYING"])

cc_library(
    name = "rdmacm",
    hdrs = [
        "include/rdma/rdma_cma.h",

package(default_visibility = ["//visibility:public"])

licenses(["notice"])  # OpenIB.org BSD license (MIT variant)

exports_files(["COPYING"])

cc_library(
    name = "ibverbs",
    hdrs = [
        "include/infiniband/sa.h",