JetPack 3.2 includes CUDA 9 and cuDNN 7, so it is necessary to compile it from source.
sudo apt-get install openjdk-8-jdk
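The note above does not say what is being compiled; assuming the target is a TensorFlow build (OpenJDK 8 is the usual Bazel prerequisite on Jetson), a minimal post-build sanity check could look like the sketch below. This is an illustration, not part of the original note.

```python
# Hedged sanity check, assuming the source build is TensorFlow 1.x (CUDA 9 era).
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
# A successful CUDA-enabled build should list a GPU device here.
print([d.name for d in device_lib.list_local_devices()])
```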
# --------------------------------------------------------
# Camera sample code for Tegra X2/X1
#
# This program captures and displays video from an
# IP CAM, USB webcam, or the Tegra onboard camera.
# Refer to the following blog post for how to set up
# and run the code:
# https://jkjung-avt.github.io/tx2-camera-with-python/
#
# Written by JK Jung <[email protected]>
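As a taste of what the full script (linked above) does, here is a minimal sketch that opens the onboard camera through a GStreamer pipeline with OpenCV. The pipeline string is an assumption; the exact element names (e.g. `nvcamerasrc` vs `nvarguscamerasrc`) depend on the JetPack version, and the real script also handles IP and USB cameras.

```python
# Minimal sketch (not the full gist): open the Tegra onboard camera through
# GStreamer and show frames with OpenCV.
import cv2

gst_str = ('nvcamerasrc ! video/x-raw(memory:NVMM), width=1280, height=720, '
           'format=I420, framerate=30/1 ! nvvidconv ! '
           'video/x-raw, format=BGRx ! videoconvert ! '
           'video/x-raw, format=BGR ! appsink')
cap = cv2.VideoCapture(gst_str, cv2.CAP_GSTREAMER)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Tegra camera', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```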
# Prerequisites
# 1. MSVC 2017 C++ Build Tools
# 2. CMake 3.0 or newer
# 3. 64-bit Windows
# 4. Anaconda / Miniconda (64-bit)
# Prerequisites for CUDA
# 1. CUDA 8.0 or newer
# 2. NVTX (ships with CUDA as the "Visual Studio Integration" component; if it fails to install,
#    you can extract the CUDA installer exe and find the NVTX installer under CUDAVisualStudioIntegration)
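Before starting a build, a quick check of the environment against the list above can save time. The sketch below is not part of the original notes; it only assumes a Python interpreter from Anaconda/Miniconda and that `nvcc` is on PATH.

```python
# Hedged environment check for the prerequisites above (illustrative only).
import platform
import struct
import subprocess

# Prerequisites 3/4: 64-bit Windows and a 64-bit Python (Anaconda / Miniconda).
print('OS:', platform.system(), platform.machine())
print('Python is 64-bit:', struct.calcsize('P') * 8 == 64)

# CUDA prerequisite: nvcc should be on PATH and report version 8.0 or newer.
try:
    print(subprocess.check_output(['nvcc', '--version'], text=True))
except (OSError, subprocess.CalledProcessError):
    print('nvcc not found -- is the CUDA toolkit installed and on PATH?')
```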
## Sublime Text 3 Serial key build is 3103 | |
—– BEGIN LICENSE —– | |
Michael Barnes | |
Single User License | |
EA7E-821385 | |
8A353C41 872A0D5C DF9B2950 AFF6F667 | |
C458EA6D 8EA3C286 98D1D650 131A97AB | |
AA919AEC EF20E143 B361B1E7 4C8B7F04 | |
B085E65E 2F5F5360 8489D422 FB8FC1AA |
This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).
Matrix multiplication is a mathematical operation that defines the product of two matrices.
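To keep the running example concrete, here is a hedged NumPy sketch (not the post's AVX2 kernel, which works at the level of vector intrinsics) contrasting the textbook triple-loop definition with an optimized BLAS call. The huge speed gap between the two is exactly what the hand-tuned kernel aims to close.

```python
import numpy as np

def matmul_naive(A, B):
    """Textbook definition: C[i, j] = sum_k A[i, k] * B[k, j]."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

A = np.random.rand(64, 64)
B = np.random.rand(64, 64)
# The naive loop and the optimized BLAS routine agree on the result;
# they differ enormously in speed on large matrices.
assert np.allclose(matmul_naive(A, B), A @ B)
```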
0x8545
: Original 84
-> 85
0x08FF19
: Original 75
-> EB
0x1932C7
: Original 75
-> 74
(remove UNREGISTERED in title bar, so no need to use a license)# Authors: Mathieu Blondel, Vlad Niculae | |
# License: BSD 3 clause
import numpy as np

def _gen_pairs(gen, max_iter, max_inner, random_state, verbose):
    rng = np.random.RandomState(random_state)
    # if tuple, interpret as randn
A quick guide on how to set up X11 forwarding on macOS when using Docker containers that require a DISPLAY. Works on both Intel and M1 Macs!
This guide was tested on:
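One way to verify the forwarding from inside a container is to open any small X client. The sketch below uses Python's tkinter, which is an assumption (the guide itself may use a different test) and requires python3-tk to be installed in the image.

```python
# Hedged DISPLAY smoke test: opens a tiny X window via the forwarded display.
import os
import tkinter as tk

print('DISPLAY =', os.environ.get('DISPLAY'))
root = tk.Tk()
tk.Label(root, text='X11 forwarding works!').pack(padx=20, pady=20)
root.after(3000, root.destroy)  # auto-close after 3 seconds
root.mainloop()
```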
On an Orin NX 16 GB there was not enough memory to compile, so the swap had to be increased.
/etc/systemd/nvzramconfig.sh
change:
```
# Calculate memory to use for zram (1/2 of ram)
totalmem=`LC_ALL=C free | grep -e "^Mem:" | sed -e 's/^Mem: *//' -e 's/ *.*//'`
mem=$((("${totalmem}" / 2 / "${NRDEVICES}") * 1024))
```
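For reference, the stock formula above splits half of RAM across the zram devices. A worked example is below; the 16 GB total and NRDEVICES=4 are illustrative assumptions, so check `free` and the script on your own board.

```python
# Worked example of the zram size formula above (illustrative numbers only).
totalmem_kb = 16 * 1024 * 1024        # `free` reports kB
nrdevices = 4                         # value of NRDEVICES in the script
mem_bytes = (totalmem_kb // 2 // nrdevices) * 1024
print(mem_bytes / (1024 ** 3), 'GiB per zram device')    # -> 2.0 GiB
print(mem_bytes * nrdevices / (1024 ** 3), 'GiB total')  # -> 8.0 GiB
```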
# This is a modified version of TRL's `SFTTrainer` example (https://github.com/huggingface/trl/blob/main/examples/scripts/sft_trainer.py),
# adapted to run with DeepSpeed ZeRO-3 and Mistral-7B-v0.1. The settings below were run on 1 node of 8 x A100 (80GB) GPUs.
#
# Usage:
# - Install the latest transformers & accelerate versions: `pip install -U transformers accelerate`
# - Install deepspeed: `pip install deepspeed==0.9.5`
# - Install TRL from main: `pip install git+https://github.com/huggingface/trl.git`
# - Clone the repo: `git clone https://github.com/huggingface/trl.git`
# - Copy this Gist into trl/examples/scripts
# - Run from the root of the trl repo with: `accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero3.yaml --gradient_accumulation_steps 8 examples/scripts/sft_trainer.py`
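For orientation, here is a heavily trimmed sketch of what such an SFT script does with TRL's `SFTTrainer`. The argument names reflect the TRL API of that era, and the dataset id and hyperparameters are illustrative assumptions, not values taken from the gist; DeepSpeed ZeRO-3 itself is configured by the `accelerate launch` config file rather than in the Python code.

```python
# Trimmed illustration only -- the real script is the TRL example referenced above.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Illustrative instruction-tuning dataset with a "text" column.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

training_args = TrainingArguments(
    output_dir="sft-mistral-7b",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # matches the launch command above
    bf16=True,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",  # loaded internally from the hub
    args=training_args,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer.train()
```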