Romil Bhardwaj romilbhardwaj
@romilbhardwaj
romilbhardwaj / boto3_ec2_bench.py
Created September 6, 2022 17:53
Script to benchmark time to SSH into an EC2 instance using boto3
import boto3
import paramiko
import time
# AmzLinux Deep learning AMI: ami-0407d462d1919a19a
# Amazon linux AMI: ami-0c2ab3b8efb09f272
private_key_path = '<PATH_TO_PEM>' # Set to the path of your private key
aws_keypair = '<KEYPAIR_NAME>' # Should already exist in your AWS account
ami = 'ami-0407d462d1919a19a'
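
The preview above is cut off before any timing logic, so the following is only a minimal sketch of how "time until SSH is reachable" could be measured, not the gist's actual code. It uses only the standard library and treats "port 22 accepts a TCP connection" as a stand-in for a successful SSH login, which is an assumption (a real benchmark would also complete the paramiko handshake). The helper name wait_for_port and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns the number of seconds waited. Raises TimeoutError if the
    port is still unreachable after `timeout` seconds.
    Hypothetical helper; not part of boto3 or paramiko.
    """
    start = time.monotonic()
    while True:
        try:
            # create_connection raises OSError while sshd is not yet listening.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            if time.monotonic() - start >= timeout:
                raise TimeoutError(
                    f"{host}:{port} not reachable after {timeout}s")
            time.sleep(interval)
```

In a benchmark, the clock would typically start just before the instance is requested, and wait_for_port would be called with the instance's public IP once it is known.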
@romilbhardwaj
romilbhardwaj / SkyPilotLocalGPUs.md
Last active December 21, 2023 02:56
Using local GPUs with SkyPilot + Kubernetes
@romilbhardwaj
romilbhardwaj / setup.sh
Created January 5, 2024 17:23
Sky Local Up prerequisite installation
sudo usermod -aG docker $USER
# Install KIND
# For AMD64 / x86_64
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
# For ARM64
[ $(uname -m) = aarch64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-arm64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
# Install Kubectl
@romilbhardwaj
romilbhardwaj / RKE2Guide.md
Created January 26, 2024 16:33
RKE2 Installation Guide

Rancher Installation Guide for SkyPilot

Rancher makes it easy to set up a k8s cluster: it gives you one-line commands to run on your worker nodes to connect them to the cluster.

Their official docs are a little overwhelming, so here's a compressed guide:

  1. On the master node (let's call it master), run sudo docker run --privileged -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher . This will launch the Rancher web UI.
  2. To access the UI from your laptop, you may need to forward port 443 to your laptop. Do so with ssh -L 8888:localhost:443 test (replace test with the SSH host of your master node).
  3. Open localhost:8888 in your browser. Complete the initial setup and set any password for the web UI. It will also ask for an address accessible from all nodes. This field will be prepopulated with https://localhost:8888. Change it to the IP address of the master node (find it with ip a; it must be reachable from the other nodes) and don't specify a port, e.g., https://10.35.0.21/.
  4. You'll see the rancher UI. Click on Create clus
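
The address rule in step 3 (use the node's IP, not localhost, and no port) is easy to get wrong, so here is a small illustrative check. It is a sketch only: check_server_url is a hypothetical helper, not part of Rancher, and it encodes just the constraints stated in the step above.

```python
import ipaddress
from urllib.parse import urlsplit


def check_server_url(url):
    """Return a list of problems with a candidate Rancher server URL.

    Hypothetical helper: it only encodes the rules from step 3
    (https, no explicit port, a non-loopback IP address).
    """
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("scheme should be https")
    if parts.port is not None:
        problems.append("do not specify a port")
    host = parts.hostname or ""
    if host == "localhost":
        problems.append("localhost is not reachable from other nodes")
    else:
        try:
            if ipaddress.ip_address(host).is_loopback:
                problems.append("loopback address is not reachable from other nodes")
        except ValueError:
            problems.append("host is not an IP address")
    return problems


print(check_server_url("https://localhost:8888"))
print(check_server_url("https://10.35.0.21/"))
```

The prepopulated value fails both the port and the localhost checks, while the example address from step 3 passes with an empty list.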
@romilbhardwaj
romilbhardwaj / policy.yaml
Created August 21, 2024 21:26
Kyverno Policy for removing smarter-devices/fuse resource when running on DWS.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: mutate-dws-pod-template
  annotations:
    policies.kyverno.io/title: Remove incompatible resources from PodTemplates
    policies.kyverno.io/category: Other
    policies.kyverno.io/severity: low
    kyverno.io/kyverno-version: 1.10.1
    policies.kyverno.io/minversion: 1.6.0
https://github.com/skypilot-org/skypilot/blob/master/examples/torch_ddp_benchmark/torch_ddp_benchmark.yaml
2x A100:8 nodes on GCP.
$ sky launch -c a100 examples/torch_ddp_benchmark/torch_ddp_benchmark.yaml
With gVNIC
(head, rank=0, pid=7056) -----------------------------------
(head, rank=0, pid=7056) PyTorch distributed benchmark suite
(head, rank=0, pid=7056) -----------------------------------
(head, rank=0, pid=7056)
(head, rank=0, pid=7056) * PyTorch version: 2.4.1+cu121