Written by @xxrjun
Thanks to 銘祐 for allocating more space for me!
This document records the capacity expansion process.
#!/bin/bash
# verify_mnnvl_health.sh
# Local verification for MNNVL compute tray & NVLink switch health.
# - Runs all checks to completion; no early exit.
# - Safe output mode ON by default (redacts IP/MAC/long HEX/IDs).
# - Prints a final Summary and returns nonzero if any check failed.
#
# Reference: https://docs.nvidia.com/multi-node-nvlink-systems/mnnvl-user-guide/verifying.html
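The header above describes three behaviors: every check runs to completion, output is redacted by default, and a final summary decides the exit status. The preview does not show the script body, so the following is only a minimal sketch of that pattern. The names `SAFE_OUTPUT`, `redact`, `run_check`, and `print_summary` are hypothetical and are not taken from `verify_mnnvl_health.sh`, and the redaction patterns are a guess at what "IP/MAC/long HEX" might cover.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "run everything, summarize at the end" checker.
# None of these names come from the original script.

SAFE_OUTPUT="${SAFE_OUTPUT:-1}"   # assumed switch: redaction is on by default
FAILED=0
RESULTS=""

# Redact MAC addresses, IPv4 addresses, and hex/digit runs of 16+ characters.
redact() {
  if [ "$SAFE_OUTPUT" = "1" ]; then
    sed -E \
      -e 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/<MAC>/g' \
      -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/<IP>/g' \
      -e 's/(0x)?[0-9A-Fa-f]{16,}/<HEX>/g'
  else
    cat
  fi
}

# Run one named check, print its (redacted) output, and record the result.
# A failing check never aborts the run; it only increments FAILED.
run_check() {
  local name="$1"; shift
  local out rc
  out="$("$@" 2>&1)" && rc=0 || rc=$?
  if [ -n "$out" ]; then
    printf '%s\n' "$out" | redact
  fi
  if [ "$rc" -eq 0 ]; then
    RESULTS="${RESULTS}PASS  ${name}
"
  else
    RESULTS="${RESULTS}FAIL  ${name} (exit ${rc})
"
    FAILED=$((FAILED + 1))
  fi
}

# Print the summary; the return status is nonzero if any check failed.
print_summary() {
  echo "==== Summary ===="
  printf '%s' "$RESULTS"
  [ "$FAILED" -eq 0 ]
}
```

A caller would register checks such as `run_check "GPUs visible" nvidia-smi -L` and end the script with `print_summary; exit $?`. Collecting results instead of exiting on the first failure means one broken component does not hide problems in the others.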
#!/usr/bin/env bash
# original source: https://lambda.ai/blog/how-to-serve-deepseek-r1-v3-on-gh200
# runbootstrap: setup known_hosts and authorized_keys (for passwordless ssh, you need to have your public key in ~/.ssh/id_rsa.pub or ~/.ssh/id_ed25519.pub)
# runip: run command on a specific ip (set ip=host:port or user@host:port)
# runk: run command on the k-th ip in the ips file
# runhead: run command on the first ip in the ips file
# runips: run command on multiple ips in parallel (set ips="..." or read from ips_file)
# runall: run command on all ips in the ips file
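The preview lists the helper names but not their bodies, so here is one way the dispatch helpers could be written. This is a sketch under assumptions, not the original implementation: `IPS_FILE`, `SSH_BIN`, and `_run_on` are hypothetical names, the ips file is assumed to hold one `host`, `host:port`, or `user@host:port` entry per line, and the port is assumed to default to 22. `runbootstrap` is omitted, and IPv6 targets are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the run* helpers; not the original script.

IPS_FILE="${IPS_FILE:-ips.txt}"   # assumed file name: one target per line
SSH_BIN="${SSH_BIN:-ssh}"         # overridable, e.g. SSH_BIN=echo for a dry run

# Split "user@host:port" or "host:port" into ssh arguments and run the command.
_run_on() {
  local target="$1"; shift
  local host="${target%%:*}"
  local port=22
  case "$target" in
    *:*) port="${target##*:}" ;;
  esac
  $SSH_BIN -p "$port" "$host" "$@"
}

# runip: run a command on the target named by $ip.
runip() {
  _run_on "${ip:?set ip=host:port or user@host:port}" "$@"
}

# runk: run a command on the k-th (1-based) line of the ips file.
runk() {
  local k="$1"; shift
  _run_on "$(sed -n "${k}p" "$IPS_FILE")" "$@"
}

# runhead: run a command on the first entry of the ips file.
runhead() {
  runk 1 "$@"
}

# runips: run a command on every target in $ips (or the ips file) in parallel.
runips() {
  local t
  for t in ${ips:-$(cat "$IPS_FILE")}; do
    _run_on "$t" "$@" &
  done
  wait
}

# runall: run a command on all entries of the ips file.
runall() {
  local ips
  ips="$(cat "$IPS_FILE")"
  runips "$@"
}
```

With `SSH_BIN=echo`, each helper prints the ssh arguments it would use instead of connecting, which is a cheap way to check target parsing before touching real hosts. For example, `ip=bob@h1:2200 runip uptime` would print `-p 2200 bob@h1 uptime`.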