melMass/image shapes.md

Last active December 10, 2024 20:46

AI Framework Tensor Shape Reference Guide

This guide lists the tensor shapes conventionally used for images, audio, 3D meshes, video, point clouds, and text across common AI frameworks.

Important

These are common conventions, not enforced rules

Note

Batch dimension (B) is typically the first dimension when present
Some frameworks allow flexible dimension ordering through configuration
Shape conventions might vary based on specific functions or models within frameworks
Many frameworks support both channel-first and channel-last formats with configuration options

Common Notation

B: Batch size
C: Channels
T: Time steps / Sequence length
H: Height
W: Width
D: Depth
V: Vertices
F: Faces
S: Samples
Fr: Frames
M: Mel bands (for spectrograms)

Image Data

Framework	Image Shape	Batch Shape	Channel Ordering	Notes
NumPy	(H, W), (H, W, C)	(B, H, W, C)	RGB	C=1/3/4 for gray/RGB/RGBA
PyTorch	(C, H, W)	(B, C, H, W)	RGB	C=1/3/4 for gray/RGB/RGBA
TensorFlow	(H, W, C)	(B, H, W, C)	RGB	C=1/3/4 for gray/RGB/RGBA
Keras	(H, W, C)	(B, H, W, C)	RGB	Same as TF
OpenCV	(H, W), (H, W, C)	N/A	BGR	C=3 for BGR, grayscale is 2D
PIL	(H, W), (H, W, C)	N/A	RGB	size property is (W, H), but array is (H, W, C)
Matplotlib	(H, W), (H, W, C)	N/A	RGB	C=3/4 for RGB/RGBA; grayscale is 2D (imshow rejects a trailing C=1 axis)
Scikit-image	(H, W), (H, W, C)	N/A	RGB	C=1/3/4 for gray/RGB/RGBA
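The two conversions this table implies most often are moving the channel axis (channel-last to channel-first) and reversing the channel order (RGB to BGR). A minimal NumPy-only sketch follows; the helper names and the 4x6 test image are illustrative assumptions, not part of any framework's API.

```python
import numpy as np

def hwc_to_chw(image: np.ndarray) -> np.ndarray:
    """Channel-last to channel-first: (H, W, C) -> (C, H, W)."""
    return np.transpose(image, (2, 0, 1))

def chw_to_hwc(image: np.ndarray) -> np.ndarray:
    """Channel-first to channel-last: (C, H, W) -> (H, W, C)."""
    return np.transpose(image, (1, 2, 0))

def swap_rgb_bgr(image: np.ndarray) -> np.ndarray:
    """Reverse the last axis of an (H, W, 3) image: RGB <-> BGR."""
    return image[..., ::-1]

# Hypothetical 4x6 image that is pure red in RGB order.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[..., 0] = 255

chw = hwc_to_chw(rgb)        # (3, 4, 6), PyTorch-style single image
batch = chw[np.newaxis]      # (1, 3, 4, 6), batch dimension first
bgr = swap_rgb_bgr(rgb)      # red now sits in the last channel

print(chw.shape, batch.shape, bgr[0, 0].tolist())
```

Both transposes return views rather than copies, so call `np.ascontiguousarray` on the result if a downstream library requires contiguous memory.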

Audio Data

Framework	Raw Audio Shape	Spectrogram Shape	Notes
Librosa	(S,)	(M, Fr)	M=128 mel bands by default
Torchaudio	(C, S)	(C, M, Fr)	C=1 for mono, C=2 for stereo, M=128 mel bands by default
TensorFlow	(S,)	(Fr, bins)	bins = fft_length // 2 + 1 frequency bins from tf.signal.stft, e.g. 129 for fft_length=256
Soundfile	(S,), (S, C)	N/A	Mono is 1D unless always_2d=True; C=2 for stereo
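Moving audio between a samples-first layout (S, C) and a channels-first layout (C, S) is a transpose, and the number of frequency bins in a one-sided spectrum follows from the FFT length. The sketch below uses NumPy only; the helper names, the 16 kHz buffer length, and the 256-sample frame are assumptions chosen for illustration.

```python
import numpy as np

def samples_first_to_channels_first(audio: np.ndarray) -> np.ndarray:
    """(S, C) -> (C, S); a 1D mono signal (S,) becomes (1, S)."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio[np.newaxis, :]
    return audio.T

def to_mono(audio_cs: np.ndarray) -> np.ndarray:
    """(C, S) -> (S,) by averaging over the channel axis."""
    return audio_cs.mean(axis=0)

def num_frequency_bins(fft_length: int) -> int:
    """Size of a one-sided spectrum for a real signal."""
    return fft_length // 2 + 1

# Hypothetical one-second stereo buffer in samples-first layout.
stereo_sc = np.zeros((16000, 2), dtype=np.float32)
stereo_cs = samples_first_to_channels_first(stereo_sc)  # (2, 16000)
mono = to_mono(stereo_cs)                               # (16000,)

# One 256-sample frame gives 256 // 2 + 1 one-sided bins.
spectrum = np.fft.rfft(mono[:256])

print(stereo_cs.shape, mono.shape, spectrum.shape)
```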

3D Mesh Data

Framework	Vertices Shape	Faces Shape	Batch Shape	Additional Attributes	Notes
PyTorch3D	(V, 3)	(F, 3)	(B, V, 3)	Textures: (F, H, W, 3) or (V, 3); Normals: (V, 3); UV coords: (V, 2)	V = vertices, F = faces; supports packed/padded representations
Open3D	(V, 3)	(F, 3)	N/A	Vertex normals: (V, 3); Vertex colors: (V, 3); Triangle normals: (F, 3)	Primarily for single mesh operations
Trimesh	(V, 3)	(F, 3)	N/A	Vertex normals: (V, 3); Face normals: (F, 3); UV coords: (V, 2)	Focuses on watertight meshes
Kaolin	(V, 3)	(F, 3)	(B, V, 3)	Face UVs: (F, 3, 2); Vertex normals: (V, 3); Face normals: (F, 3)	NVIDIA's 3D DL library
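The shapes in this table fit together through indexing: using an (F, 3) integer face array to index a (V, 3) vertex array yields an (F, 3, 3) array of triangle corners, from which per-face normals of shape (F, 3) follow. A minimal NumPy sketch is shown below; `face_normals` is a hypothetical helper, not the API of any listed library, and it assumes counter-clockwise winding.

```python
import numpy as np

def face_normals(vertices: np.ndarray, faces: np.ndarray) -> np.ndarray:
    """Unit normals per triangle: vertices (V, 3), faces (F, 3) -> (F, 3)."""
    tri = vertices[faces]              # (F, 3, 3): three corners per face
    edge1 = tri[:, 1] - tri[:, 0]      # (F, 3)
    edge2 = tri[:, 2] - tri[:, 0]      # (F, 3)
    normals = np.cross(edge1, edge2)   # (F, 3), unnormalized
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    # Clip to avoid dividing by zero on degenerate triangles.
    return normals / np.clip(lengths, 1e-12, None)

# Unit square in the z=0 plane, split into two triangles.
vertices = np.array(
    [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=np.float64
)
faces = np.array([[0, 1, 2], [0, 2, 3]], dtype=np.int64)

normals = face_normals(vertices, faces)
print(vertices.shape, faces.shape, normals.shape)
```

Both triangles here are wound counter-clockwise when viewed from +z, so both normals point along +z.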

Video Data

Framework	Common Shape	Common Batch Shape	Notes
PyTorch	(C, Fr, H, W), (Fr, C, H, W)	(B, C, Fr, H, W), (B, Fr, C, H, W)	Flexible ordering, commonly channel-first for consistency with image processing. Some models/datasets use the frame-first convention
TensorFlow	(Fr, H, W, C), (H, W, C, Fr)	(B, Fr, H, W, C), (B, H, W, C, Fr)	Flexible ordering, commonly channel-last for consistency with image processing. Some models use a different ordering for specific architectures
Torchvision	(C, Fr, H, W)	(B, C, Fr, H, W)	Video models typically follow PyTorch's channel-first convention; io.read_video returns (Fr, H, W, C) by default, and transformations can modify the ordering
OpenCV	(H, W, C)	N/A	Returns individual frames in BGR (C=3). VideoCapture reads frame by frame
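Since frame-by-frame readers yield (H, W, C) arrays, building any of the video layouts above comes down to stacking frames and then permuting axes. The sketch below uses NumPy only; the function names and the five blank 4x6 frames are illustrative assumptions.

```python
import numpy as np

def to_channels_first(video: np.ndarray) -> np.ndarray:
    """(Fr, H, W, C) -> (C, Fr, H, W)."""
    return np.transpose(video, (3, 0, 1, 2))

def to_frames_first_chw(video: np.ndarray) -> np.ndarray:
    """(Fr, H, W, C) -> (Fr, C, H, W)."""
    return np.transpose(video, (0, 3, 1, 2))

# Hypothetical clip: five decoded frames, each (H, W, C) = (4, 6, 3).
frames = [np.zeros((4, 6, 3), dtype=np.uint8) for _ in range(5)]

video = np.stack(frames, axis=0)           # (5, 4, 6, 3), channel-last
cfhw = to_channels_first(video)            # (3, 5, 4, 6)
fchw = to_frames_first_chw(video)          # (5, 3, 4, 6)
batched = cfhw[np.newaxis]                 # (1, 3, 5, 4, 6)

print(video.shape, cfhw.shape, fchw.shape, batched.shape)
```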

Point Cloud Data

Framework	Points Shape	Batch Shape	Notes
PyTorch3D	(N, 3)	(B, N, 3)	N = number of points
Open3D	(N, 3)	N/A	xyz coordinates
NumPy	(N, 3)	(B, N, 3)	Basic representation
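A (B, N, 3) batch requires every cloud to have the same N. One common workaround for clouds of different sizes, sketched here with NumPy only, is to pad each cloud to the largest N and keep the true lengths alongside. The name `pad_point_clouds` and the zero padding value are assumptions, not a library API.

```python
import numpy as np

def pad_point_clouds(clouds, pad_value=0.0):
    """Pad a list of (N_i, 3) arrays to one (B, N_max, 3) batch.

    Returns the padded batch and an int array of the original N_i.
    """
    max_n = max(len(cloud) for cloud in clouds)
    batch = np.full((len(clouds), max_n, 3), pad_value, dtype=np.float32)
    lengths = np.zeros(len(clouds), dtype=np.int64)
    for i, cloud in enumerate(clouds):
        batch[i, : len(cloud)] = cloud
        lengths[i] = len(cloud)
    return batch, lengths

# Two hypothetical clouds with 5 and 3 points.
clouds = [np.ones((5, 3), dtype=np.float32), np.ones((3, 3), dtype=np.float32)]
batch, lengths = pad_point_clouds(clouds)

print(batch.shape, lengths.tolist())
```

Downstream code should use `lengths` to mask the padded rows so they do not contribute to distances or losses.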

Text/NLP Data

Framework	Text Shape	Batch Shape	Notes
PyTorch	(T,)	(B, T)	T = sequence length
TensorFlow	(T,)	(B, T)	-
HuggingFace	(T,)	(B, T)	Often includes attention masks
SpaCy	(T,)	N/A	Document objects
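Going from single sequences of shape (T,) to a (B, T) batch usually means padding to a common length, and the attention mask mentioned above records which positions are real tokens. The following NumPy-only sketch shows the idea; the token ids and the padding id of 0 are made-up values, and `pad_sequences` is a hypothetical helper rather than a tokenizer API.

```python
import numpy as np

def pad_sequences(sequences, pad_id=0):
    """Pad variable-length token id lists to a (B, T_max) array.

    Returns (input_ids, attention_mask); the mask is 1 for real
    tokens and 0 for padding.
    """
    max_t = max(len(seq) for seq in sequences)
    input_ids = np.full((len(sequences), max_t), pad_id, dtype=np.int64)
    attention_mask = np.zeros((len(sequences), max_t), dtype=np.int64)
    for i, seq in enumerate(sequences):
        input_ids[i, : len(seq)] = seq
        attention_mask[i, : len(seq)] = 1
    return input_ids, attention_mask

# Hypothetical token ids for two sentences of length 4 and 2.
sequences = [[101, 7, 8, 102], [101, 102]]
input_ids, attention_mask = pad_sequences(sequences)

print(input_ids.shape)
print(attention_mask.tolist())
```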
