This guide is a reference for the tensor shapes commonly used by AI frameworks across images, audio, 3D meshes, video, point clouds, and text.
Important: These are common conventions, not enforced rules.

Note:
- Batch dimension (B) is typically the first dimension when present
- Shape conventions can vary between functions and models within the same framework
- Many frameworks support both channel-first and channel-last layouts through configuration options (for example, the `data_format` argument in Keras)
Dimension abbreviations used in the tables below:

- B: Batch size
- C: Channels
- T: Time steps / Sequence length
- H: Height
- W: Width
- D: Depth
- V: Vertices
- F: Faces
- S: Samples
- Fr: Frames
- M: Mel bands (for spectrograms)
- N: Number of points (for point clouds)
Framework | Image Shape | Batch Shape | Channel Ordering | Notes |
---|---|---|---|---|
NumPy | (H, W), (H, W, C) | (B, H, W, C) | RGB | C=1/3/4 for gray/RGB/RGBA |
PyTorch | (C, H, W) | (B, C, H, W) | RGB | C=1/3/4 for gray/RGB/RGBA |
TensorFlow | (H, W, C) | (B, H, W, C) | RGB | C=1/3/4 for gray/RGB/RGBA |
Keras | (H, W, C) | (B, H, W, C) | RGB | Same as TF |
OpenCV | (H, W), (H, W, C) | N/A | BGR | C=3 for BGR, grayscale is 2D |
PIL | (H, W), (H, W, C) | N/A | RGB | size property is (W, H), but array is (H, W, C) |
Matplotlib | (H, W), (H, W, C) | N/A | RGB | C=1/3/4 for gray/RGB/RGBA |
Scikit-image | (H, W), (H, W, C) | N/A | RGB | C=1/3/4 for gray/RGB/RGBA |
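
The layouts in the table above differ mainly in where the channel axis sits and in the order of the channels. As a minimal sketch using only NumPy (the helper names `bgr_to_rgb` and `hwc_to_chw` are illustrative and not part of any library), an OpenCV-style BGR image in (H, W, C) layout could be converted to a PyTorch-style RGB batch in (B, C, H, W) layout roughly like this:

```python
import numpy as np


def bgr_to_rgb(image_hwc: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an (H, W, 3) image (BGR <-> RGB)."""
    return image_hwc[..., ::-1]


def hwc_to_chw(image_hwc: np.ndarray) -> np.ndarray:
    """Move channels from the last axis to the first: (H, W, C) -> (C, H, W)."""
    return np.transpose(image_hwc, (2, 0, 1))


# Dummy 4x6 BGR image standing in for an OpenCV frame (H=4, W=6, C=3).
bgr = np.zeros((4, 6, 3), dtype=np.uint8)
bgr[..., 0] = 255  # fill the blue channel (index 0 in BGR order)

rgb = bgr_to_rgb(bgr)
chw = hwc_to_chw(rgb)
batch = chw[np.newaxis, ...]  # add a leading batch axis: (1, C, H, W)

print(rgb.shape, chw.shape, batch.shape)
```

Both helpers return views rather than copies, so a caller that needs contiguous memory would typically wrap the result in `np.ascontiguousarray`.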
Framework | Raw Audio Shape | Spectrogram Shape | Notes |
---|---|---|---|
Librosa | (S,) | (M, Fr) | M=128 mel bands by default |
Torchaudio | (C, S) | (C, M, Fr) | C=1 for mono, C=2 for stereo, M=128 mel bands by default |
TensorFlow | (S,) | (Fr, F) | Here F = frequency bins (not faces): `fft_length // 2 + 1` from `tf.signal.stft`, e.g. 129 when `fft_length=256` |
Soundfile | (S,) for mono, (S, C) for multi-channel | N/A | C=2 for stereo; `always_2d=True` returns (S, 1) for mono |
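
Moving audio between these libraries usually means moving the channel axis and knowing what spectrogram shape to expect. The sketch below uses only NumPy; the helper names are hypothetical, and the frame count assumes a centered STFT in which the signal is padded by `n_fft // 2` on both sides (other padding modes give different counts):

```python
import numpy as np


def to_channels_first(audio: np.ndarray) -> np.ndarray:
    """Return audio as (C, S), accepting (S,) mono or (S, C) multi-channel input."""
    if audio.ndim == 1:
        return audio[np.newaxis, :]  # (S,) -> (1, S)
    return audio.T  # (S, C) -> (C, S)


def spectrogram_shape(num_samples: int, n_fft: int, hop_length: int) -> tuple[int, int]:
    """Expected (frequency bins, frames) for a centered STFT (assumed padding)."""
    freq_bins = n_fft // 2 + 1
    frames = 1 + num_samples // hop_length
    return freq_bins, frames


# One second of silent stereo audio at 16 kHz in samples-first layout (S, C).
stereo = np.zeros((16000, 2), dtype=np.float32)
channels_first = to_channels_first(stereo)

print(channels_first.shape)                 # (2, 16000)
print(spectrogram_shape(16000, 256, 128))   # (129, 126)
```

A frames-first layout such as (Fr, F) is the transpose of the (bins, frames) pair returned here.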
Framework | Vertices Shape | Faces Shape | Batch Shape | Additional Attributes | Notes |
---|---|---|---|---|---|
PyTorch3D | (V, 3) | (F, 3) | (B, V, 3) | Textures: (F, H, W, 3) or (V, 3); Normals: (V, 3); UV coords: (V, 2) | V = vertices, F = faces; supports packed/padded representations |
Open3D | (V, 3) | (F, 3) | N/A | Vertex normals: (V, 3); Vertex colors: (V, 3); Triangle normals: (F, 3) | Primarily for single mesh operations |
Trimesh | (V, 3) | (F, 3) | N/A | Vertex normals: (V, 3); Face normals: (F, 3); UV coords: (V, 2) | Focuses on watertight meshes |
Kaolin | (V, 3) | (F, 3) | (B, V, 3) | Face uvs: (F, 3, 2); Vertex normals: (V, 3); Face normals: (F, 3) | NVIDIA's 3D DL library |
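
Meshes in a batch rarely share the same vertex count, which is why the table distinguishes packed and padded representations. The following NumPy sketch illustrates the idea only; `pad_vertices` and `pack_vertices` are hypothetical helpers and do not reproduce any library's API:

```python
import numpy as np


def pad_vertices(meshes: list[np.ndarray], pad_value: float = 0.0):
    """Stack per-mesh (V_i, 3) arrays into one (B, V_max, 3) array.

    Also returns each mesh's true vertex count so padding rows can be ignored.
    """
    counts = np.array([m.shape[0] for m in meshes])
    padded = np.full((len(meshes), counts.max(), 3), pad_value, dtype=np.float32)
    for i, verts in enumerate(meshes):
        padded[i, : verts.shape[0]] = verts
    return padded, counts


def pack_vertices(meshes: list[np.ndarray]):
    """Concatenate meshes into (sum(V_i), 3) plus the start offset of each mesh."""
    counts = np.array([m.shape[0] for m in meshes])
    packed = np.concatenate(meshes, axis=0)
    offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))
    return packed, offsets


# Two dummy meshes: 4 vertices (tetrahedron-sized) and 8 vertices (cube-sized).
meshes = [np.ones((4, 3), dtype=np.float32), np.ones((8, 3), dtype=np.float32)]

padded, counts = pad_vertices(meshes)
packed, offsets = pack_vertices(meshes)

print(padded.shape, counts.tolist())   # (2, 8, 3) [4, 8]
print(packed.shape, offsets.tolist())  # (12, 3) [0, 4]
```

Padding wastes memory when sizes vary widely but gives a regular (B, V, 3) tensor; packing is compact but needs the offsets to recover individual meshes.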
Framework | Common Shape | Common Batch Shape | Notes |
---|---|---|---|
PyTorch | (C, Fr, H, W) or (Fr, C, H, W) | (B, C, Fr, H, W) or (B, Fr, C, H, W) | Flexible ordering, commonly channel-first for consistency with image processing. Some models/datasets use a frame-first convention |
TensorFlow | (Fr, H, W, C) or (H, W, C, Fr) | (B, Fr, H, W, C) or (B, H, W, C, Fr) | Flexible ordering, commonly channel-last for consistency with image processing. Some models use a different ordering for specific architectures |
Torchvision | (C, Fr, H, W) | (B, C, Fr, H, W) | Typically follows PyTorch's channel-first convention, but transformations can modify this |
OpenCV | (H, W, C) | N/A | Returns individual frames in BGR (C=3). VideoCapture reads frame by frame |
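
Frames read one at a time in (H, W, C) layout have to be stacked and transposed before they match the video layouts listed above. A minimal NumPy sketch, with dummy arrays standing in for decoded frames:

```python
import numpy as np

# Eight dummy frames standing in for frames read one at a time, each (H, W, C).
frames = [np.zeros((32, 48, 3), dtype=np.uint8) for _ in range(8)]

video_fhwc = np.stack(frames, axis=0)                # (Fr, H, W, C), channel-last
video_cfhw = np.transpose(video_fhwc, (3, 0, 1, 2))  # (C, Fr, H, W), channel-first
video_fchw = np.transpose(video_fhwc, (0, 3, 1, 2))  # (Fr, C, H, W), frame-first
batch = video_cfhw[np.newaxis, ...]                  # (1, C, Fr, H, W)

print(video_fhwc.shape, video_cfhw.shape, video_fchw.shape, batch.shape)
```

The transposes only reorder axes. If the frames come from OpenCV, the channel order is still BGR and would need to be reversed separately.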
Framework | Points Shape | Batch Shape | Notes |
---|---|---|---|
PyTorch3D | (N, 3) | (B, N, 3) | N = number of points |
Open3D | (N, 3) | N/A | xyz coordinates |
NumPy | (N, 3) | (B, N, 3) | Basic representation |
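
A (B, N, 3) batch requires every cloud to have the same N. One common workaround, sketched here with a hypothetical `resample_points` helper, is to randomly resample each cloud to a fixed number of points before stacking:

```python
import numpy as np


def resample_points(points: np.ndarray, num_points: int, rng: np.random.Generator) -> np.ndarray:
    """Return exactly num_points rows from an (N, 3) cloud.

    Samples without replacement when N >= num_points, with replacement otherwise.
    """
    n = points.shape[0]
    idx = rng.choice(n, size=num_points, replace=n < num_points)
    return points[idx]


rng = np.random.default_rng(0)
# Three dummy clouds with different point counts.
clouds = [rng.random((500, 3)), rng.random((1200, 3)), rng.random((80, 3))]

batch = np.stack([resample_points(c, 256, rng) for c in clouds], axis=0)
print(batch.shape)  # (3, 256, 3)
```

Random resampling is only one option; padding with a mask, as is often done for text and meshes, is another.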
Framework | Text Shape | Batch Shape | Notes |
---|---|---|---|
PyTorch | (T,) | (B, T) | T = sequence length |
TensorFlow | (T,) | (B, T) | - |
HuggingFace | (T,) | (B, T) | Often includes attention masks |
SpaCy | (T,) | N/A | Document objects |
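
Going from individual (T,) sequences to a (B, T) batch requires padding, and the attention mask records which positions hold real tokens. A minimal NumPy sketch follows; `pad_sequences`, the padding ID of 0, and the token IDs are assumptions for illustration rather than any tokenizer's actual output:

```python
import numpy as np


def pad_sequences(sequences: list[list[int]], pad_id: int = 0):
    """Right-pad token ID lists to a common length.

    Returns input_ids and attention_mask, both shaped (B, T_max).
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    attention_mask = np.zeros((len(sequences), max_len), dtype=np.int64)
    for i, seq in enumerate(sequences):
        input_ids[i, : len(seq)] = seq
        attention_mask[i, : len(seq)] = 1
    return input_ids, attention_mask


# Hypothetical token IDs; a real tokenizer would produce these.
token_ids = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
input_ids, attention_mask = pad_sequences(token_ids)

print(input_ids.shape)          # (2, 5)
print(attention_mask.tolist())  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```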