
Upgrading Kubernetes Cluster with Kops, and Things to Watch Out For

Alright! I'd like to apologize for more than a year of inactivity. Embarrassingly, I dropped the good habit entirely. Anyway, today I'd like to share a not-so-advanced and much shorter walkthrough on how to upgrade Kubernetes with kops.

At Buffer, we host our own K8s (short for Kubernetes) cluster on AWS EC2 instances, since we started our journey before AWS EKS came along. To do this effectively, we use kops, an amazing tool that handles pretty much every aspect of cluster management: creation, upgrades, updates, and deletion. It has never failed us.
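For orientation, here is a minimal sketch of the generic kops upgrade sequence (`kops upgrade cluster`, `kops update cluster`, `kops rolling-update cluster`, then `kops validate cluster`), wrapped in Python so every command can be previewed before anything runs. It is an assumption that this matches the exact procedure used at Buffer, and the cluster name and S3 state-store bucket are hypothetical placeholders. Without `--yes`, the first three kops commands only report what they would change, which makes a cheap first test.

```python
import shlex
import subprocess


def kops_upgrade_commands(cluster_name: str, state_store: str, apply: bool = False) -> list[list[str]]:
    """Build the usual kops upgrade sequence as argument lists.

    With apply=False no --yes flag is added, so kops only previews changes.
    """
    common = ["--name", cluster_name, "--state", state_store]
    yes = ["--yes"] if apply else []
    return [
        ["kops", "upgrade", "cluster", *common, *yes],        # bump the Kubernetes version in the cluster spec
        ["kops", "update", "cluster", *common, *yes],         # push the new spec to cloud resources
        ["kops", "rolling-update", "cluster", *common, *yes], # replace nodes so they pick up the change
        ["kops", "validate", "cluster", *common],             # check nodes and system pods are healthy
    ]


def run_all(commands: list[list[str]], execute: bool = False) -> None:
    """Print each command; only invoke kops when execute=True."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if execute:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical cluster name and state store; replace with your own.
    cmds = kops_upgrade_commands("k8s.example.com", "s3://example-kops-state")
    run_all(cmds)  # preview only: prints the commands, runs nothing
```

Keeping `execute=False` as the default means the script is harmless to run against a production state store until you deliberately flip both `apply` and `execute`.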

How to start?

Okay, upgrading a cluster always makes people nervous, especially a production cluster. Trust me, I've been there! As the saying goes, hope is not a strategy. So instead of hoping things will go smoothly, I always assume that shit will hit the fan if you skip testing. Plus, good luck explaining to people

tl;dr

You can run the real deal, big boi R1 671B, locally off a fast NVMe SSD, even without enough RAM+VRAM to hold the 200+ GB of weights. No, it is not swap, and it won't wear out your SSD.
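Why this is not swap can be shown with a small memory-mapping sketch. It assumes the inference runtime memory-maps the weight file read-only (llama.cpp does this by default, but this section does not say which runtime was used). The OS pages in only the parts of the file that are actually touched, and those clean, file-backed pages can simply be dropped under memory pressure instead of being written out to a swap area. The temporary file below is a hypothetical stand-in for a weight file.

```python
import mmap
import os
import tempfile


def read_slice(path: str, offset: int, length: int) -> bytes:
    """Read bytes from a file through a read-only memory map.

    Only the pages covering [offset, offset + length) are faulted in from
    disk; the rest of the file never has to fit in RAM.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    # Hypothetical 1 MiB stand-in for a multi-hundred-GB weight file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 4096)
        path = tmp.name
    try:
        print(read_slice(path, 1000, 4))  # b'\xe8\xe9\xea\xeb'
    finally:
        os.remove(path)
```

The same access pattern at 200+ GB scale is what makes throughput depend on SSD read speed rather than on how much RAM you have.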

8k context @ ~1.3 tok/sec single generation
16k context @ ~0.93 tok/sec single generation
2k context @ ~2.08 tok/sec with 8 parallel slots @ ~0.26 tok/sec each concurrently
2k context @ ~2.13 tok/sec single generation after disabling GPU!
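The parallel-slot figure is an aggregate: 8 slots at ~0.26 tok/sec each add up to the ~2.08 tok/sec listed above. A tiny sketch of that arithmetic, plus how long a reply takes at a given rate; the 1000-token reply length is a hypothetical example, not a measured number.

```python
def aggregate_tok_per_sec(slots: int, per_slot: float) -> float:
    """Total throughput when `slots` generations run concurrently."""
    return slots * per_slot


def seconds_for(tokens: int, tok_per_sec: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady rate."""
    return tokens / tok_per_sec


if __name__ == "__main__":
    print(f"{aggregate_tok_per_sec(8, 0.26):.2f}")  # 2.08
    print(f"{seconds_for(1000, 1.3):.0f}")          # 769 (8k context, single generation)
```

In other words, parallel slots raise total throughput only marginally here, while each individual reply gets roughly 8x slower.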

Notes and example generations below.

tmux Cheat Sheet

Table of Contents

General Usage
Shortcuts
Commands
Scripting tmux
Configuring tmux

	apiVersion: v1
	kind: ConfigMap
	metadata:
	  name: nginx-conf
	data:
	  nginx.conf: |
	    user nginx;
	    worker_processes 3;
	    error_log /var/log/nginx/error.log;
	    events {
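The ConfigMap above simply stores the whole nginx.conf as one string under `data`. As a hedged sketch, the same manifest can be generated from Python with the standard library, since `kubectl apply -f` accepts JSON as well as YAML (`kubectl create configmap nginx-conf --from-file=nginx.conf` is the usual CLI shortcut). The `nginx_configmap` helper and the sample config string are hypothetical; a real nginx.conf also needs a complete `events` block.

```python
import json


def nginx_configmap(name: str, nginx_conf: str) -> dict:
    """Build a ConfigMap manifest that stores nginx.conf as a single data key."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {"nginx.conf": nginx_conf},
    }


if __name__ == "__main__":
    # Hypothetical, incomplete sample; not a working nginx configuration.
    sample = "user nginx;\nworker_processes 3;\nerror_log /var/log/nginx/error.log;\n"
    print(json.dumps(nginx_configmap("nginx-conf", sample), indent=2))
```

Pipe the output to `kubectl apply -f -` to create or update the ConfigMap; JSON keeps the multi-line config safely escaped without worrying about YAML block indentation.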

	import argparse
	from mlx_lm import load, generate

	# Parse CLI arguments
	parser = argparse.ArgumentParser()
	parser.add_argument("--prompt", type=str, default="hello", help="Custom prompt text")
	parser.add_argument("--max-tokens", type=int, default=1024, help="Maximum number of tokens to generate")
	args = parser.parse_args()

	# Load model