Luna LunNova

Thank you to tiny corp for pointing out some problems running BERT training with Tinygrad on AMD GPUs in this Tweet. We had a few engineers at AMD take a look at the problem and they were quickly able to reproduce it.

What they found was an issue related to CWSR (compute wave save restore), which is a mechanism that allows our driver and firmware to preempt and reschedule long-running compute waves on our GPUs. The GFXv11 GPU line requires a workaround to set COMPUTE_PGM_RSRC1.PRIV=1 when dispatching a compute kernel. Normally this is handled by the AQL DISPATCH packet. However, since the Tinygrad implementation leverages a custom runtime, it requires this workaround in its PM4-based dispatch. This patch is specific to GFXv11 GPUs. Other GPUs do not require it and should not use this workaround. The following KFDTest patch can be used as a reference: https://github.com/ROCm/ROCT-Thunk-Interface/commit/507637ed5b82197eecbf483cdc1234939766549a
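As a rough illustration of the workaround described above, the following Python sketch sets the PRIV bit in a COMPUTE_PGM_RSRC1 value, and only for GFXv11 targets. The helper name `patch_rsrc1` and the `gfx_major` parameter are hypothetical. The bit position (bit 20) is an assumption taken from the LLVM AMDGPU usage documentation, not from the Tinygrad source or the KFDTest patch, so check it against the referenced commit before relying on it.

```python
# Assumption: PRIV is bit 20 of COMPUTE_PGM_RSRC1 (per the LLVM AMDGPU docs).
PRIV_BIT = 20
PRIV_MASK = 1 << PRIV_BIT


def patch_rsrc1(rsrc1: int, gfx_major: int) -> int:
    """Hypothetical helper: return the 32-bit rsrc1 value to program.

    PRIV is set only for GFXv11; other GPU generations are left
    untouched, since they should not use this workaround.
    """
    if gfx_major == 11:
        rsrc1 |= PRIV_MASK
    return rsrc1 & 0xFFFFFFFF  # keep the value within a 32-bit register
```

A PM4-based runtime would apply something like this to the value taken from the kernel descriptor before writing it into the dispatch packet. How that write is done is runtime-specific and not shown here.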

While inv

This is Felix Kuehling, long-time KFD driver architect. I started looking into the TinyGrad source code yesterday, focusing on ops_kfd.py, ops_hsa.py and driver/hsa.py, to understand how TinyGrad talks to our HW and to help with the ongoing debugging effort from the top down. This analysis is based on this commit: https://github.com/tinygrad/tinygrad/tree/3de855ea50d72238deac14fc05cda2a611497778

I'm intrigued by the use of Python for low-level programming. I think I can learn something from your use of ctypes and clang2py for fast prototyping and test development. I want to share some observations based on my initial review.

ops_kfd looks pretty new, and I see many problems with it based on my long experience working on KFD. I think it's interesting, but probably not relevant for the most pressing problems at hand, so I'll cover that last.

ops_hsa uses ROCr APIs to manage GPU memory, create a user mode AQL queue for GPU kernel dispatch, async SDMA copies, and signal-based synchronization with barrier packets

TDP and Turbo Parameter Modification with MSR on Non-Overclockable CPUs

Disclaimer

Modifying MSRs may void your CPU's (or system board's) warranty. Proceed with caution. I am not responsible for any damage caused by following this article.
MSR addresses vary significantly between CPUs. Check your CPU's MSR address using Intel's documentation.
This has only been tested on the Intel i7-8550U (Kaby Lake R).
This article is a translation of this article. If you can read Korean, I recommend reading that article instead.

Introduction

In my experience, to get a properly working multi-WAN configuration using mwan3, starting from scratch, you should:

Important: this works well on OpenWRT 15.05.1. Newer versions introduced some breaking changes. For example, the WAN interfaces now have IPv6 capability and are named with letters ("wan, wanb, wanc, ..." instead of "wan, wan2, wan3, ..."), so wanb6 means the IPv6 interface of the second WAN: https://github.com/openwrt/packages/blob/master/net/mwan3/files/etc/config/mwan3

The official documentation seems to be very detailed and up to date, and I recommend reading it first: https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3. I also recommend taking a look at my config file below, since I think my approach to policies works well.

First of all: activate conntrack. The docs say it is important and necessary for mwan3 to work properly, and a reboot is needed afterwards:

Programmer Collective Nouns

An Enumerable of Rubyists @raganwald
An Indentation of Pythonistas @raganwald
A fold of Haskellers! @ReinH
A Din of Twitterers @raganwald
A callback of JavaScripters @irvingreid
An NCC-1701 of Java Programmers @raganwald
A relation of SQLers @raganwald

	# This is just a scratchpad after I hacked what I needed in an irb session
	require 'octokit'

	Octokit.configure do |c|
	  c.login = 'searls'
	  c.password = 'c0d3b4ssssss!'
	end

	client = Octokit::Client.new
	repos = client.repos #Note, for an org's repos, see `client.orgs.first.rels[:repos].get.data`

	upstream plex-upstream {
		# change plex-server.example.com:32400 to the hostname:port of your plex server.
		# this can be "localhost:32400", for instance, if Plex is running on the same server as nginx.
		server plex-server.example.com:32400;
	}

	server {
	listen 80;

	# server names for this server.

	var parser = document.createElement('a');
	parser.href = "http://example.com:3000/pathname/?search=test#hash";

	parser.protocol; // => "http:"
	parser.hostname; // => "example.com"
	parser.port; // => "3000"
	parser.pathname; // => "/pathname/"
	parser.search; // => "?search=test"
	parser.hash; // => "#hash"
	parser.host; // => "example.com:3000"

	/* Support for executing system calls in the context of the game process. */

	static const int injection_size = 4;

	static const char nop_code_bytes[injection_size] = {
	/* This is the byte pattern used to pad function
	addresses to multiples of 16 bytes. It consists
	of RET and a sequence of NOPs. The NOPs are not
	supposed to be used, so they can be overwritten. */
	0xC3, 0x90, 0x90, 0x90