DeepSeek-V3’s engineers optimized GPU performance at a low level by tailoring kernels and memory-access patterns to NVIDIA’s hardware. A key strategy was warp specialization: they dedicated a subset of GPU warps (groups of 32 threads) to communication tasks, allowing computation to overlap with data transfers (DeepSeek-V3 Technical Report). In practice, only about 20 of each GPU’s Streaming Multiprocessors (SMs) were reserved to handle all cross-node communication – enough to saturate both InfiniBand (IB) and NVLink bandwidth – while the remaining SMs focused purely on computation (DeepSeek-V3 Technical Report).
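DeepSeek’s production kernels dedicate whole SMs to cross-node transfers over IB and NVLink and are hand-tuned at the PTX level, so they cannot be reproduced in a few lines. As a rough illustration of the underlying idea, the sketch below specializes warps within a single thread block: warp 0 acts as a data mover that stages the next tile into shared memory while the remaining warps compute on the tile staged in the previous iteration, so data movement overlaps with arithmetic. The kernel name `scale_tiles`, the tile size, and the double-buffering scheme are illustrative assumptions, not code from DeepSeek-V3.

```cuda
#include <cuda_runtime.h>

constexpr int TILE = 256;       // elements per tile (illustrative choice)
constexpr int WARP_SIZE = 32;

// Warp-specialized kernel: warp 0 moves data, warps 1..N-1 compute.
// Assumes blockDim.x is a multiple of 32 and at least 64.
__global__ void scale_tiles(const float* __restrict__ in,
                            float* __restrict__ out,
                            int num_tiles, float alpha) {
    __shared__ float buf[2][TILE];                       // double buffer
    const int warp_id = threadIdx.x / WARP_SIZE;
    const int lane    = threadIdx.x % WARP_SIZE;
    const int compute_threads = blockDim.x - WARP_SIZE;  // threads in compute warps

    // Data-mover warp preloads tile 0 so compute warps have work immediately.
    if (warp_id == 0) {
        for (int i = lane; i < TILE; i += WARP_SIZE)
            buf[0][i] = in[i];
    }
    __syncthreads();

    for (int t = 0; t < num_tiles; ++t) {
        const int cur = t & 1;
        if (warp_id == 0) {
            // Stage the *next* tile while the other warps compute on the current one.
            if (t + 1 < num_tiles) {
                const float* src = in + (size_t)(t + 1) * TILE;
                for (int i = lane; i < TILE; i += WARP_SIZE)
                    buf[cur ^ 1][i] = src[i];
            }
        } else {
            // Compute warps: scale the tile staged during the previous iteration.
            float* dst = out + (size_t)t * TILE;
            const int tid = threadIdx.x - WARP_SIZE;     // 0 .. compute_threads-1
            for (int i = tid; i < TILE; i += compute_threads)
                dst[i] = alpha * buf[cur][i];
        }
        __syncthreads();  // next tile staged and current tile fully consumed
    }
}
```

A launch like `scale_tiles<<<1, 128>>>(d_in, d_out, num_tiles, 2.0f)` (a hypothetical call) would give one mover warp and three compute warps; real warp-specialized kernels typically replace the plain loads and `__syncthreads()` with asynchronous copies and finer-grained barriers.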
```cuda
#include <iostream>
#include <cuda.h>

// CUDA kernel: element-wise addition of two float arrays.
__global__ void add_arrays(float *a, float *b, float *c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}
```
```conf
# ~/.tmux.conf

# General settings
set -g default-terminal "screen-256color"  # Use 256-color terminal
set -g history-limit 5000                  # Increase scrollback buffer size
set -g base-index 0                        # Start window indexes at 0
set -g mouse on                            # Enable mouse control (pane selection, resizing, scrolling)

# Restore original prefix key
set-option -g prefix C-b                   # Set prefix to Ctrl-b
```
```python
# Script to provide a summary of the repositories by listing each one's name
# along with its status (public, public with changes, or private).
import os
import subprocess
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

def execute_command(command, cwd):
    """Execute a shell command in the specified directory and return its output."""
    result = subprocess.run(command, cwd=cwd, shell=True,
                            capture_output=True, text=True)
    return result.stdout.strip()
```
```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model_and_tokenizer(model_id):
    """
    Load the tokenizer and model for the specified model ID.
    The model uses float16 to reduce memory usage and improve performance.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    return model, tokenizer
```
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from accelerate import Accelerator
import os
import argparse

def main():
    # Parse command line arguments
    args = parse_args()
```
```python
import os
import json
import sys
import torch
import glob

def load_parameters(directory):
    """Load model parameters from a JSON file."""
    with open(os.path.join(directory, 'params.json'), 'r') as file:
        return json.load(file)
```