Pete Tanski pdtgct

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (reinforcement learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

Get a Free SSL Certificate With Let’s Encrypt

Let’s Encrypt is a free, automated, and open Certificate Authority.

Install the tools needed to use Let's Encrypt certificates with Certbot:

  sudo apt-get update
  sudo apt-get install software-properties-common

Web Service Fronting

Multiple Web properties on a single IP address

Hosting multiple websites on a single public IP address on the standard HTTP(S) ports is relatively easy, since popular web servers such as Apache, Nginx and lighttpd all support virtual hosts.
For Web Services which bundle their own HTTP server, things get more complicated, unless their HTTP stack can be shared somehow. More often than not, the application's HTTP stack listens directly on a dedicated TCP port.

Hosting multiple services on a single IP then requires using a fronting server listening on the standard HTTP port, and routing to the right backend service based on the host name or the path sent by the client.
Path-based routing is cumbersome: it usually requires either that the service be aware of the path prefix, or that the HTTP fronting server rewrite all absolute URLs in requests and responses.
Hostname-based routing is more straightforward. The fronting server can just look at the [HTTP/1.1 Host header](https://tools
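Here is a rough illustration of hostname-based routing. This minimal Python sketch shows the lookup a fronting server performs. It normalises the Host header value by lower-casing it and dropping any `:port` suffix, then maps the result to a backend. The `BACKENDS` table, its addresses and the `route`/`normalize_host` helpers are hypothetical. Real fronting servers apply their own rules for missing headers, defaults and IPv6 literals.

```python
from typing import Optional

# Hypothetical routing table: normalised host name -> backend address.
BACKENDS = {
    "example.com": "127.0.0.1:8080",
    "www.example.com": "127.0.0.1:8080",
    "example.net": "127.0.0.1:9090",
}


def normalize_host(host_header: str) -> str:
    """Lower-case a Host header value and strip an optional :port suffix."""
    host = host_header.strip().lower()
    if host.startswith("["):
        # IPv6 literal such as "[::1]:8443": keep everything up to the "]".
        end = host.find("]")
        return host if end == -1 else host[: end + 1]
    return host.split(":", 1)[0]


def route(host_header: Optional[str], default: Optional[str] = None) -> Optional[str]:
    """Return the backend for a request's Host header, or `default` if unknown."""
    if not host_header:
        return default
    return BACKENDS.get(normalize_host(host_header), default)
```

Host names are case-insensitive and clients may append a port. Normalising first therefore lets a single table entry match `Example.COM` and `example.com:80` alike.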

How to convert a Flash/After Effects animation into a JSMovieclip animation

Intro

JSMovieclip is a small JavaScript framework. It lets you play and control animations, much like a MovieClip object in AS3. It uses a sprite that contains all the frames of the animation.

Purpose: We'll use TexturePacker to create a sprite for JSMovieclip from a Flash animation.

Requirements: Flash/After Effects, TexturePacker, and the JSMovieclip script.
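A common way to show one frame of a sprite in the browser is to shift the whole sheet behind a frame-sized element with CSS `background-position`. The minimal Python sketch below computes those offsets. It assumes a grid sheet of uniformly sized frames laid out row by row, left to right. That layout is an assumption, not necessarily what JSMovieclip or TexturePacker use: trimmed or rotated frames need the per-frame coordinates from the exported data instead. `frame_offset` and `css_background_position` are hypothetical helpers, not part of JSMovieclip's API.

```python
from typing import Tuple


def frame_offset(frame: int, frame_width: int, frame_height: int, columns: int) -> Tuple[int, int]:
    """Pixel (x, y) of a 0-based frame in a uniform, row-major grid sprite sheet."""
    if frame < 0 or columns <= 0:
        raise ValueError("frame must be >= 0 and columns must be > 0")
    row, col = divmod(frame, columns)
    return col * frame_width, row * frame_height


def css_background_position(frame: int, frame_width: int, frame_height: int, columns: int) -> str:
    """CSS background-position that reveals `frame` through a frame-sized element."""
    x, y = frame_offset(frame, frame_width, frame_height, columns)
    # The sheet moves left/up to bring the frame into view, so negate the offsets.
    return f"{-x}px {-y}px"
```

For example, in a sheet of 100x50 frames with 4 columns, frame 5 sits in the second row and second column, at offset (100, 50).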

Generate the list of UIKit declarations marked UI_APPEARANCE_SELECTOR yourself:

$ cd /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS*.sdk/System/Library/Frameworks/UIKit.framework/Headers
$ grep UI_APPEARANCE_SELECTOR ./*     | \
  sed 's/NS_AVAILABLE_IOS(.*)//g'     | \
  sed 's/NS_DEPRECATED_IOS(.*)//g'    | \
  sed 's/API_AVAILABLE(.*)//g'        | \
  sed 's/API_UNAVAILABLE(.*)//g'      | \
  sed 's/UI_APPEARANCE_SELECTOR//g'
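The grep/sed pipeline above can be mirrored in Python, which may be easier to extend. This is a minimal sketch, assuming you feed it the lines of the UIKit headers yourself, for example by reading each `*.h` file in that directory. Unlike `grep ./*`, it does not prefix results with file names. As with sed's greedy `.*`, each macro pattern runs to the last `)` on the line. The final `.strip()` is an addition the shell version does not perform. `appearance_selectors` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

MARKER = "UI_APPEARANCE_SELECTOR"

# Availability macros removed by the sed stages, in the same order.
# Like sed's greedy `.*`, each pattern extends to the LAST ")" on the line.
AVAILABILITY_MACROS = [
    re.compile(r"NS_AVAILABLE_IOS\(.*\)"),
    re.compile(r"NS_DEPRECATED_IOS\(.*\)"),
    re.compile(r"API_AVAILABLE\(.*\)"),
    re.compile(r"API_UNAVAILABLE\(.*\)"),
]


def appearance_selectors(lines: Iterable[str]) -> List[str]:
    """Keep lines mentioning UI_APPEARANCE_SELECTOR, with the macros removed."""
    result = []
    for line in lines:
        if MARKER not in line:
            continue  # equivalent of the grep stage
        for pattern in AVAILABILITY_MACROS:
            line = pattern.sub("", line)  # the four macro-stripping sed stages
        # Final sed stage, plus trimming of surrounding whitespace.
        result.append(line.replace(MARKER, "").strip())
    return result
```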

	- Understand the Task: Grasp the main objective, goals, requirements, constraints, and expected output.
	- Minimal Changes: If an existing prompt is provided, improve it only if it's simple. For complex prompts, enhance clarity and add missing elements without altering the original structure.
	- Reasoning Before Conclusions: Encourage reasoning steps before any conclusions are reached. ATTENTION! If the user provides examples where the reasoning happens afterward, REVERSE the order! NEVER START EXAMPLES WITH CONCLUSIONS!
	- Reasoning Order: Call out reasoning portions of the prompt and conclusion parts (specific fields by name). For each, determine the ORDER in which this is done, and whether it needs to be reversed.
	- Conclusion, classifications, or results should ALWAYS appear last.
	- Examples: Include high-quality examples if helpful, using placeholders [in brackets] for complex elements.
	- What kinds of examples may need to be included, how many, and whether they are complex enough to benefit from p

	import aiohttp
	import asyncio
	import os
	import re
	from loguru import logger
	from playwright.async_api import async_playwright


	async def check_site(browser, domain, status):
	    page = await browser.new_page()

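	from contextlib import contextmanager

	from torch.nn import init
	from bitsandbytes.nn.modules import Linear8bitLt, Linear4bit

	def noop(x=None, *args, **kwargs):
	    "Do nothing"
	    return x

	@contextmanager
	def no_kaiming():
	    # Temporarily replace Kaiming-uniform weight init with a no-op,
	    # e.g. when layer weights are about to be overwritten anyway
	    old_iku = init.kaiming_uniform_
	    init.kaiming_uniform_ = noop
	    try:
	        yield
	    finally:
	        init.kaiming_uniform_ = old_iku  # always restore the original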

	frontend default
	    # hdr(host) inspects an HTTP header, which requires HTTP mode
	    # (often set in a defaults section instead)
	    mode http
	    bind :80

	    # ACL for "example.com" and "www.example.com"
	    acl ACL_example.com hdr(host) -i example.com www.example.com
	    use_backend be_example.com if ACL_example.com

	    # ACL for "example.net"
	    acl ACL_example.net hdr(host) -i example.net
	    use_backend be_example.net if ACL_example.net

	import gc

	import torch

	## MEM utils ##
	def mem_report():
	    '''Report the memory usage of tensor storages in PyTorch.
	    Both CPU and GPU tensors are reported.'''

	    def _mem_report(tensors, mem_type):
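The key accounting step in a report like `mem_report` is counting each underlying storage once. Views and slices share their base tensor's storage, so summing sizes per tensor would double-count memory. Below is a minimal, torch-free sketch of that step, assuming one `(device, data_ptr, nbytes)` record per tensor. In recent PyTorch versions such records could be built, for each object `t` in `gc.get_objects()` with `torch.is_tensor(t)`, as `(t.device.type, t.untyped_storage().data_ptr(), t.untyped_storage().nbytes())`. `summarize_storages` is a hypothetical helper, not part of the original gist.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per tensor: (device type, storage data pointer, storage size in bytes).
StorageRecord = Tuple[str, int, int]


def summarize_storages(records: Iterable[StorageRecord]) -> Dict[str, float]:
    """Total storage per device in MiB, counting each storage only once."""
    seen = set()
    totals: Dict[str, int] = defaultdict(int)
    for device, data_ptr, nbytes in records:
        # Key on the device too: CPU and GPU pointers live in different spaces.
        key = (device, data_ptr)
        if key in seen:
            continue  # another view of a storage that was already counted
        seen.add(key)
        totals[device] += nbytes
    return {device: total / 2**20 for device, total in totals.items()}
```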