0x524c 0x524c

  • Brazil
  • X @0x524c
Upgrading Kubernetes Cluster with Kops, and Things to Watch Out For

Alright! I'd like to apologize for being inactive for over a year. Embarrassingly, I completely dropped the good habit. Anyway, today I'd like to share a shorter, not-so-advanced walkthrough on how to upgrade Kubernetes with kops.

At Buffer, we host our own k8s (short for Kubernetes) cluster on AWS EC2 instances, since we started our journey before AWS EKS came along. To do this effectively, we use kops. It's an amazing tool that handles pretty much every aspect of cluster management: creation, upgrades, updates, and deletion. It has never failed us.

How to start?

Okay, upgrading a cluster always makes people nervous, especially a production cluster. Trust me, I've been there! As the saying goes, hope is not a strategy. So instead of hoping things will go smoothly, I always assume that shit will hit the fan if you skip testing. Plus, good luck explaining to people

@0x524c
0x524c / nginx_deployment.yaml
Created June 21, 2024 13:46 — forked from petitviolet/nginx_deployment.yaml
Sample Nginx configuration on Kubernetes, using a ConfigMap to configure nginx.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf
data:
  nginx.conf: |
    user nginx;
    worker_processes 3;
    error_log /var/log/nginx/error.log;
    events {
@0x524c
0x524c / 8-concurrent-generations.md
Aggregate throughput just over 2 tok/sec on R1 671B with 8 concurrent generations.

tl;dr;

You can run the real deal big boi R1 671B locally off a fast NVMe SSD, even without enough RAM+VRAM to hold the 200+GB weights. No, it is not swap, and it won't kill your SSD's read/write cycle lifetime.

  • 8k context @ ~1.3 tok/sec single generation
  • 16k context @ ~0.93 tok/sec single generation
  • 2k context @ ~2.08 tok/sec with 8 parallel slots @ ~0.26 tok/sec each concurrently
  • 2k context @ ~2.13 tok/sec single generation after disabling GPU!
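
To make the numbers above concrete, here is a small sketch that reproduces the aggregate figure from the per-slot rate and estimates wall-clock time for a generation. The 1000-token response length is an illustrative assumption, not something measured in the original notes, and the function names are made up for this example.

```python
# Approximate figures quoted in the list above.
PER_SLOT_TOK_PER_SEC = 0.26    # each of the 8 parallel slots, 2k context
PARALLEL_SLOTS = 8
SINGLE_8K_TOK_PER_SEC = 1.3    # single generation, 8k context

# Hypothetical response length, chosen only for illustration.
EXAMPLE_TOKENS = 1000


def aggregate_throughput(per_slot: float, slots: int) -> float:
    """Total tokens/sec when every slot generates concurrently."""
    return per_slot * slots


def seconds_for(tokens: int, tok_per_sec: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady rate."""
    return tokens / tok_per_sec


total = aggregate_throughput(PER_SLOT_TOK_PER_SEC, PARALLEL_SLOTS)
minutes = seconds_for(EXAMPLE_TOKENS, SINGLE_8K_TOK_PER_SEC) / 60

print(f"aggregate across {PARALLEL_SLOTS} slots: {total:.2f} tok/sec")
print(f"{EXAMPLE_TOKENS} tokens at 8k context: about {minutes:.1f} min")
```

In other words, the parallel setup trades per-request latency for total throughput: each slot is slow on its own, but the eight together land at roughly the same aggregate rate as the fastest single-generation run.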

Notes and example generations below.

@0x524c
0x524c / tmux-cheat-sheet.md
Created February 18, 2025 03:31 — forked from michaellihs/tmux-cheat-sheet.md
tmux Cheat Sheet
@0x524c
0x524c / prompt.py
Created October 7, 2025 03:42 — forked from do-me/prompt.py
A single line to try out mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit on macOS with mlx
import argparse

from mlx_lm import load, generate

# Parse CLI arguments
parser = argparse.ArgumentParser()
parser.add_argument("--prompt", type=str, default="hello", help="Custom prompt text")
parser.add_argument("--max-tokens", type=int, default=1024, help="Maximum number of tokens to generate")
args = parser.parse_args()

# Load model