@melMass
Last active December 10, 2024 20:46

AI Framework Tensor Shape Reference Guide

This guide is a quick reference for the tensor shapes commonly used by popular AI frameworks and libraries, organized by data type.

Important

These are common conventions, not enforced rules

Note

  • Batch dimension (B) is typically the first dimension when present
  • Shape conventions can vary between specific functions or models within the same framework
  • Many frameworks support both channel-first and channel-last layouts through configuration (e.g. the data_format argument of Keras layers)
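Switching between channel-last and channel-first layouts is an axis permutation. A minimal NumPy sketch, assuming the channel axis is last in channel-last data and sits directly after the batch axis in channel-first data; `to_channels_first` and `to_channels_last` are hypothetical helper names, not part of any framework API.

```python
import numpy as np


def to_channels_first(x: np.ndarray, has_batch: bool = True) -> np.ndarray:
    """Hypothetical helper: (B, ..., C) -> (B, C, ...), or (..., C) -> (C, ...) if unbatched."""
    return np.moveaxis(x, -1, 1 if has_batch else 0)


def to_channels_last(x: np.ndarray, has_batch: bool = True) -> np.ndarray:
    """Hypothetical helper, inverse of to_channels_first: (B, C, ...) -> (B, ..., C)."""
    return np.moveaxis(x, 1 if has_batch else 0, -1)


# Image batch in TensorFlow/Keras layout (B, H, W, C) -> PyTorch layout (B, C, H, W)
images_tf = np.zeros((8, 224, 224, 3), dtype=np.float32)
images_torch = to_channels_first(images_tf)
print(images_torch.shape)  # (8, 3, 224, 224)

# The same helper handles video: (B, Fr, H, W, C) -> (B, C, Fr, H, W)
video_tf = np.zeros((2, 16, 112, 112, 3), dtype=np.float32)
print(to_channels_first(video_tf).shape)  # (2, 3, 16, 112, 112)

# A round trip restores the original layout
assert to_channels_last(images_torch).shape == images_tf.shape
```

`np.moveaxis` returns a view, so wrap the result in `np.ascontiguousarray` if a downstream library requires contiguous memory.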

Common Notation

  • B: Batch size
  • C: Channels
  • T: Time steps / Sequence length
  • H: Height
  • W: Width
  • D: Depth
  • V: Vertices
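  • F: Faces (3D meshes); in the TensorFlow spectrogram row, F denotes frequency bins instead
  • S: Samples (audio)
  • Fr: Frames
  • M: Mel bands (for spectrograms)
  • N: Number of points (point clouds)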

Image Data

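| Framework | Image Shape | Batch Shape | Channel Ordering | Notes |
| --- | --- | --- | --- | --- |
| NumPy | (H, W), (H, W, C) | (B, H, W, C) | RGB (by convention) | C=1/3/4 for gray/RGB/RGBA; the actual order depends on the loader (e.g. BGR if read with OpenCV) |
| PyTorch | (C, H, W) | (B, C, H, W) | RGB | C=1/3/4 for gray/RGB/RGBA |
| TensorFlow | (H, W, C) | (B, H, W, C) | RGB | C=1/3/4 for gray/RGB/RGBA |
| Keras | (H, W, C) | (B, H, W, C) | RGB | Same as TensorFlow (default data_format="channels_last") |
| OpenCV | (H, W), (H, W, C) | N/A | BGR | C=3 for BGR (C=4 for BGRA with cv2.IMREAD_UNCHANGED); grayscale is 2D |
| PIL | (H, W), (H, W, C) | N/A | RGB | Image.size is (W, H), but np.asarray(img) is (H, W) or (H, W, C) |
| Matplotlib | (H, W), (H, W, C) | N/A | RGB | C=1/3/4 for gray/RGB/RGBA |
| Scikit-image | (H, W), (H, W, C) | N/A | RGB | C=1/3/4 for gray/RGB/RGBA |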
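The most frequent image conversions implied by this table are OpenCV BGR to RGB, the PIL size/shape mismatch, and (H, W, C) uint8 to a PyTorch-style (B, C, H, W) float batch. In this sketch plain NumPy arrays stand in for decoded images, so it runs without OpenCV, PIL, or PyTorch; the [0, 1] scaling mirrors what torchvision's `ToTensor` does but is not a call to it.

```python
import numpy as np

# Stand-in for an image decoded by OpenCV: (H, W, C) uint8 in BGR order
h, w = 480, 640
bgr = np.zeros((h, w, 3), dtype=np.uint8)
bgr[..., 0] = 255  # pure blue: in BGR order, channel 0 is blue

# BGR -> RGB: reverse the channel axis (copy to get a contiguous array)
rgb = np.ascontiguousarray(bgr[..., ::-1])
print(rgb[0, 0].tolist())  # [0, 0, 255], blue is now the last channel

# PIL reports Image.size as (W, H), while the array shape is (H, W, C)
pil_style_size = (rgb.shape[1], rgb.shape[0])
print(pil_style_size, rgb.shape)  # (640, 480) (480, 640, 3)

# NumPy/TF (H, W, C) uint8 -> PyTorch-style (C, H, W) float32 in [0, 1]
chw = rgb.transpose(2, 0, 1).astype(np.float32) / 255.0
print(chw.shape)  # (3, 480, 640)

# Add the batch dimension: (B, C, H, W)
batch = np.stack([chw, chw])
print(batch.shape)  # (2, 3, 480, 640)

# Grayscale is 2D (H, W) in OpenCV/PIL; PyTorch models usually expect (1, H, W)
gray = np.zeros((h, w), dtype=np.uint8)
print(gray[np.newaxis].shape)  # (1, 480, 640)
```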

Audio Data

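| Framework | Raw Audio Shape | Spectrogram Shape | Notes |
| --- | --- | --- | --- |
| Librosa | (S,) | (M, Fr) | librosa.load returns mono (S,) by default, (C, S) with mono=False; M=128 mel bands by default |
| Torchaudio | (C, S) | (C, M, Fr) | C=1 for mono, C=2 for stereo; M=128 mel bands by default |
| TensorFlow | (S,) | (Fr, F) | tf.signal.stft output; F = fft_length // 2 + 1 frequency bins (e.g. 129 for fft_length=256) |
| Soundfile | (S, C) | N/A | C=2 for stereo; mono files return (S,) unless always_2d=True |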
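The audio layouts differ mainly by a transpose, and spectrogram shapes follow from simple arithmetic. The sketch below only computes shapes and does not call librosa or TensorFlow; the helper names are hypothetical, and the frame-count formulas assume librosa's default `center=True` padding with an even `n_fft`, and `tf.signal.stft` with `pad_end=False`.

```python
import numpy as np

sr = 16_000
stereo_sf = np.zeros((2 * sr, 2), dtype=np.float32)  # soundfile layout: (S, C)

# soundfile (S, C) -> torchaudio (C, S)
stereo_ta = stereo_sf.T
print(stereo_ta.shape)  # (2, 32000)

# Downmix to mono: librosa-style (S,) and torchaudio-style (1, S)
mono = stereo_ta.mean(axis=0)
print(mono.shape, mono[np.newaxis].shape)  # (32000,) (1, 32000)


def stft_bins(n_fft: int) -> int:
    # A real-valued FFT of length n_fft has n_fft // 2 + 1 unique frequency bins
    return n_fft // 2 + 1


def frames_centered(num_samples: int, hop: int) -> int:
    # Frame count when the signal is padded by n_fft // 2 on both sides
    # (librosa's default center=True, assuming an even n_fft)
    return 1 + num_samples // hop


def frames_unpadded(num_samples: int, frame_length: int, frame_step: int) -> int:
    # Frame count without end padding (tf.signal.stft with pad_end=False)
    return 1 + (num_samples - frame_length) // frame_step


num_samples = mono.shape[0]
# librosa-style mel spectrogram (M, Fr): n_mels=128 and hop_length=512 are librosa's defaults
librosa_shape = (128, frames_centered(num_samples, hop=512))
# tf.signal.stft-style spectrogram (Fr, F) with frame_length=fft_length=256, frame_step=128
tf_shape = (frames_unpadded(num_samples, frame_length=256, frame_step=128), stft_bins(256))
print(librosa_shape, tf_shape)  # (128, 63) (249, 129)
```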

3D Mesh Data

| Framework | Vertices Shape | Faces Shape | Batch Shape | Additional Attributes | Notes |
| --- | --- | --- | --- | --- | --- |
| PyTorch3D | (V, 3) | (F, 3) | (B, V, 3) | Textures: (F, H, W, 3) or (V, 3); normals: (V, 3); UV coords: (V, 2) | V = vertices, F = faces; supports packed/padded representations |
| Open3D | (V, 3) | (F, 3) | N/A | Vertex normals: (V, 3); vertex colors: (V, 3); triangle normals: (F, 3) | Primarily for single-mesh operations |
| Trimesh | (V, 3) | (F, 3) | N/A | Vertex normals: (V, 3); face normals: (F, 3); UV coords: (V, 2) | Focuses on watertight meshes |
| Kaolin | (V, 3) | (F, 3) | (B, V, 3) | Face UVs: (F, 3, 2); vertex normals: (V, 3); face normals: (F, 3) | NVIDIA's 3D deep learning library |
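Faces store integer indices into the vertex array, so NumPy fancy indexing turns (F, 3) indices into (F, 3, 3) corner coordinates. This sketch derives face normals (F, 3) and area-weighted vertex normals (V, 3) for a two-triangle square; area weighting is one common choice, and the libraries in the table may weight vertex normals differently.

```python
import numpy as np

# One unit square split into two counter-clockwise triangles
verts = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])                                   # (V, 3) float positions
faces = np.array([[0, 1, 2],
                  [0, 2, 3]])        # (F, 3) integer vertex indices

# Gather corner positions: (F, 3) indices -> (F, 3, 3) coordinates
tri = verts[faces]
print(tri.shape)  # (2, 3, 3)

# Unnormalized face normals (F, 3); their length is twice the triangle area
face_n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])

# Scatter-add each face normal onto its three vertices -> (V, 3)
vertex_n = np.zeros_like(verts)
for k in range(3):
    np.add.at(vertex_n, faces[:, k], face_n)

# Normalize to unit length (every vertex here belongs to at least one face)
face_n /= np.linalg.norm(face_n, axis=1, keepdims=True)
vertex_n /= np.linalg.norm(vertex_n, axis=1, keepdims=True)
print(face_n)    # both faces point along +z
print(vertex_n)  # every vertex normal is (0, 0, 1)
```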

Video Data

| Framework | Common Shape | Common Batch Shape | Notes |
| --- | --- | --- | --- |
| PyTorch | (C, Fr, H, W) or (Fr, C, H, W) | (B, C, Fr, H, W) or (B, Fr, C, H, W) | Flexible ordering; channel-first is common for consistency with image processing, while some models/datasets use the frame-first convention |
| TensorFlow | (Fr, H, W, C) or (H, W, C, Fr) | (B, Fr, H, W, C) or (B, H, W, C, Fr) | Flexible ordering; channel-last is common for consistency with image processing, while some models use other orderings for specific architectures |
| Torchvision | (C, Fr, H, W) | (B, C, Fr, H, W) | Video models follow PyTorch's channel-first convention; torchvision.io.read_video returns (Fr, H, W, C) by default, and transforms can change the layout |
| OpenCV | (H, W, C) | N/A | VideoCapture reads one frame at a time; each frame is BGR (C=3) |
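Frames read one at a time (as with OpenCV) must be stacked and permuted before a channel-first video model can consume them. In this NumPy sketch, a list of zero arrays stands in for the frames that a real loop would collect from `ret, frame = cap.read()`; the 16-frame, 112x112 clip size is an arbitrary assumption.

```python
import numpy as np

# Stand-in for 16 frames from cv2.VideoCapture: each (H, W, 3) uint8 in BGR order
frames = [np.zeros((112, 112, 3), dtype=np.uint8) for _ in range(16)]

# Stack to (Fr, H, W, C): the TensorFlow layout, also read_video's default layout
clip = np.stack(frames)[..., ::-1]  # reverse channels: BGR -> RGB
print(clip.shape)  # (16, 112, 112, 3)

# (Fr, H, W, C) -> (C, Fr, H, W): channel-first, common for PyTorch 3D-conv models
clip_cfhw = np.ascontiguousarray(clip.transpose(3, 0, 1, 2))
print(clip_cfhw.shape)  # (3, 16, 112, 112)

# (Fr, H, W, C) -> (Fr, C, H, W): the frame-first alternative
clip_fchw = clip.transpose(0, 3, 1, 2)
print(clip_fchw.shape)  # (16, 3, 112, 112)

# Batch of clips: (B, C, Fr, H, W)
batch = np.stack([clip_cfhw, clip_cfhw])
print(batch.shape)  # (2, 3, 16, 112, 112)
```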

Point Cloud Data

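| Framework | Points Shape | Batch Shape | Notes |
| --- | --- | --- | --- |
| PyTorch3D | (N, 3) | (B, N, 3) | N = number of points; Pointclouds offers padded and packed representations for clouds of different sizes |
| Open3D | (N, 3) | N/A | xyz coordinates (np.asarray(pcd.points)) |
| NumPy | (N, 3) | (B, N, 3) | Basic representation; a stacked batch requires the same N for every cloud |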
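Clouds with different point counts cannot be stacked directly into (B, N, 3). PyTorch3D's `Pointclouds` exposes `points_padded()` and `points_packed()` for this; the NumPy sketch below only mimics those two layouts (zero padding plus a mask, and concatenation plus a per-point cloud index) and is not that API.

```python
import numpy as np

rng = np.random.default_rng(0)
clouds = [rng.random((n, 3)) for n in (5, 3, 4)]  # list of (N_i, 3) clouds

# Padded: (B, N_max, 3) plus a (B, N_max) mask marking real points
n_max = max(c.shape[0] for c in clouds)
padded = np.zeros((len(clouds), n_max, 3))
mask = np.zeros((len(clouds), n_max), dtype=bool)
for i, c in enumerate(clouds):
    padded[i, : len(c)] = c
    mask[i, : len(c)] = True
print(padded.shape, mask.sum(axis=1))  # (3, 5, 3) [5 3 4]

# Packed: (sum of N_i, 3) plus the cloud index of every point
packed = np.concatenate(clouds, axis=0)
cloud_idx = np.repeat(np.arange(len(clouds)), [len(c) for c in clouds])
print(packed.shape, cloud_idx)  # (12, 3) [0 0 0 0 0 1 1 1 2 2 2 2]

# Per-cloud centroids (B, 3) from the padded form, ignoring padded slots
centroids = (padded * mask[..., None]).sum(axis=1) / mask.sum(axis=1, keepdims=True)
assert np.allclose(centroids, [c.mean(axis=0) for c in clouds])
```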

Text/NLP Data

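| Framework | Text Shape | Batch Shape | Notes |
| --- | --- | --- | --- |
| PyTorch | (T,) | (B, T) | T = sequence length (e.g. token IDs) |
| TensorFlow | (T,) | (B, T) | - |
| HuggingFace | (T,) | (B, T) | Tokenizers typically return input_ids together with an attention_mask of the same shape |
| SpaCy | (T,) | N/A | Doc objects: a sequence of T tokens rather than a tensor |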
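Sequences of different lengths become a (B, T) batch by padding, with a same-shaped mask marking real tokens. This sketch mirrors the `input_ids`/`attention_mask` pair that Hugging Face tokenizers return with `padding=True`, but `pad_batch` is a hypothetical helper, the token IDs are made up, and a padding ID of 0 is an assumption (real tokenizers define their own).

```python
import numpy as np

# Token ID sequences of different lengths, each of shape (T_i,); the IDs are made up
sequences = [[5, 17, 9], [5, 17, 42, 8, 9], [5, 9]]
PAD_ID = 0  # assumption: 0 is the padding token ID


def pad_batch(seqs, pad_id):
    """Hypothetical helper: right-pad to (B, T_max) and build a matching attention mask."""
    t_max = max(len(s) for s in seqs)
    input_ids = np.full((len(seqs), t_max), pad_id, dtype=np.int64)
    attention_mask = np.zeros((len(seqs), t_max), dtype=np.int64)
    for i, s in enumerate(seqs):
        input_ids[i, : len(s)] = s       # real tokens first
        attention_mask[i, : len(s)] = 1  # 1 = attend, 0 = padding
    return input_ids, attention_mask


input_ids, attention_mask = pad_batch(sequences, PAD_ID)
print(input_ids.shape)  # (3, 5)
print(attention_mask)
# [[1 1 1 0 0]
#  [1 1 1 1 1]
#  [1 1 0 0 0]]
```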