Metal Programming Guide

The Metal framework supports GPU-accelerated advanced 3D graphics rendering and data-parallel computation workloads. Metal provides a modern and streamlined API for fine-grained, low-level control of the organization, processing and submission of graphic and computation commands, as well as the management of the associated data and resources for these commands. A primary goal of Metal is to minimize the CPU overhead incurred by executing GPU workloads.

- Command submission model
- Memory management model
- Use of independently compiled code for graphics shader and data-parallel computation functions
- Use of the Metal API

Fundamental Metal Concepts

Metal provides a single, unified programming interface and language for both graphics and data-parallel computation workloads. Metal enables you to integrate graphics and computation tasks much more efficiently without needing to use separate APIs and shader languages.

Metal framework provides

- Low-overhead interface
  - You control the asynchronous behavior of the GPU, which lets multiple threads create and commit command buffers in parallel.
- Memory and resource management
- Integrated support for both graphics and compute operations
- Precompiled shaders

Command Organization and Execution Model

MTLDevice

The MTLDevice protocol defines the interface that represents a single GPU.
A command queue consists of a queue of command buffers, and a command queue organizes the order of execution of those command buffers.
A command buffer contains encoded commands that are intended for execution on a particular device.
A command encoder appends rendering, computing, and blitting commands onto a command buffer, and those command buffers are eventually committed for execution on the device.

MTLCommandQueue

The MTLCommandQueue protocol defines an interface for command queues, primarily supporting methods for creating command buffer objects.
The MTLCommandBuffer protocol defines an interface for command buffers and provides methods for creating command encoders, enqueueing command buffers for execution, checking status, and other operations.
Encoder types
- The MTLRenderCommandEncoder protocol encodes graphics (3D) rendering commands for a single rendering pass.
- The MTLComputeCommandEncoder protocol encodes data-parallel computation workloads.
- The MTLBlitCommandEncoder protocol encodes simple copy operations between buffers and textures, as well as utility operations like mipmap generation.
At any point in time, only a single command encoder can be active and append commands into a command buffer.
Each command encoder must be ended before another command encoder can be created for use with the same command buffer.
The one exception to the “one active command encoder for each command buffer” rule is the MTLParallelRenderCommandEncoder protocol, discussed in Encoding a Single Rendering Pass Using Multiple Threads.
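The "one active encoder at a time" rule can be sketched as follows. This is a minimal illustration, assuming a Metal-capable GPU is present; the buffer sizes and the names `fillEncoder` and `copyEncoder` are made up for the example.

```swift
import Metal

// Assumption: a Metal-capable GPU is available on this machine.
guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let commandBuffer = queue.makeCommandBuffer(),
      let source = device.makeBuffer(length: 16, options: .storageModeShared),
      let destination = device.makeBuffer(length: 16, options: .storageModeShared)
else { fatalError("Metal is not available on this machine") }

// First encoder: fill the source buffer with the byte value 7.
guard let fillEncoder = commandBuffer.makeBlitCommandEncoder() else {
    fatalError("Could not create the first blit encoder")
}
fillEncoder.fill(buffer: source, range: 0..<16, value: 7)
fillEncoder.endEncoding() // must end before another encoder is created

// Second encoder on the same command buffer: copy source into destination.
guard let copyEncoder = commandBuffer.makeBlitCommandEncoder() else {
    fatalError("Could not create the second blit encoder")
}
copyEncoder.copy(from: source, sourceOffset: 0,
                 to: destination, destinationOffset: 0, size: 16)
copyEncoder.endEncoding()

// Nothing runs on the GPU until the command buffer is committed.
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

let bytes = destination.contents().bindMemory(to: UInt8.self, capacity: 16)
let copied = Array(UnsafeBufferPointer(start: bytes, count: 16))
print(copied)
```

Both encoders append to the same command buffer, but the second is requested only after the first has called endEncoding().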

The Device Object Represents a GPU

The MTLDevice object represents a GPU that can execute commands. The MTLDevice protocol has methods to create new command queues, to allocate buffers from memory, to create textures, and to make queries about the device’s capabilities.

let defaultDevice = MTLCreateSystemDefaultDevice() // requires import Metal; nil if no Metal-capable GPU is available

Transient and Non-transient Objects in Metal

Command buffer and command encoder objects are transient and designed for a single use. They are very inexpensive to allocate and deallocate, so their creation methods return autoreleased objects.
The following objects are not transient. Reuse these objects in performance sensitive code, and avoid creating them repeatedly.
- Command queues
- Data buffers
- Textures
- Sampler states
- Libraries
- Compute states
- Render pipeline states
- Depth/stencil states
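One way to apply this split is to create the long-lived objects once and the transient ones per frame. The sketch below is only an illustration: `FrameEncoder` and `scratch` are hypothetical names, not part of Metal, and the per-frame work is a placeholder fill.

```swift
import Metal

// Hypothetical helper class for illustration; not part of the Metal API.
final class FrameEncoder {
    // Non-transient objects: created once and reused for every frame.
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let scratch: MTLBuffer

    init?() {
        guard let device = MTLCreateSystemDefaultDevice(),
              let commandQueue = device.makeCommandQueue(),
              let scratch = device.makeBuffer(length: 256, options: .storageModePrivate)
        else { return nil }
        self.device = device
        self.commandQueue = commandQueue
        self.scratch = scratch
    }

    // Transient objects: a new command buffer and encoder for each frame.
    func encodeFrame() -> MTLCommandBuffer? {
        guard let commandBuffer = commandQueue.makeCommandBuffer(),
              let encoder = commandBuffer.makeBlitCommandEncoder()
        else { return nil }
        encoder.fill(buffer: scratch, range: 0..<scratch.length, value: 0)
        encoder.endEncoding()
        commandBuffer.commit()
        return commandBuffer
    }
}

// Usage: the queue and buffer are reused; each frame gets a fresh command buffer.
if let frameEncoder = FrameEncoder() {
    for _ in 0..<3 {
        frameEncoder.encodeFrame()?.waitUntilCompleted()
    }
}
```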

Command Queue

A command queue accepts an ordered list of command buffers that the GPU will execute.
All command buffers sent to a single queue are guaranteed to execute in the order in which the command buffers were enqueued.
In general, command queues are expected to be long-lived, so they should not be repeatedly created and destroyed.

let commandQueue = defaultDevice?.makeCommandQueue()

Command Buffer

A command buffer stores encoded commands until the buffer is committed for execution by the GPU.
A single command buffer can contain many different kinds of encoded commands, depending on the number and type of encoders that are used to build it.
In a typical app, an entire frame of rendering is encoded into a single command buffer, even if rendering that frame involves multiple rendering passes, compute processing functions, or blit operations.
Command buffers are transient single-use objects and do not support reuse.
Once a command buffer has been committed for execution, the only valid operations are to wait for the command buffer to be scheduled or completed—through synchronous calls or handler blocks discussed in Registering Handler Blocks for Command Buffer Execution—and to check the status of the command buffer execution.
Command buffers also represent the only independently trackable unit of work by the app, and they define the coherency boundaries established by the Metal memory model, as detailed in Resource Objects: Buffers and Textures.

Creating a Command Buffer

let commandBuffer = commandQueue?.makeCommandBuffer()
let unretainedCommandBuffer = commandQueue?.makeCommandBufferWithUnretainedReferences() // only for extremely performance-critical apps; the app must keep referenced resources alive itself

Executing Commands

A command buffer does not begin execution until it is committed.
Once committed, command buffers are executed in the order in which they were enqueued.
The enqueue method reserves a place for the command buffer on the command queue, but does not commit the command buffer for execution. When this command buffer is eventually committed, it is executed after any previously enqueued command buffers within the associated command queue.
The commit method causes the command buffer to be executed as soon as possible, but after any previously enqueued command buffers in the same command queue are committed. If the command buffer has not previously been enqueued, commit makes an implied enqueue call.

commandBuffer?.enqueue()
commandBuffer?.commit()
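The difference between enqueue order and commit order can be sketched with two empty command buffers. This is an illustrative example that assumes a Metal-capable GPU; the names `first` and `second` are arbitrary.

```swift
import Metal

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let first = queue.makeCommandBuffer(),
      let second = queue.makeCommandBuffer()
else { fatalError("Metal is not available on this machine") }

first.label = "first"
second.label = "second"

// Reserve positions on the queue: `first` will execute before `second`.
first.enqueue()
second.enqueue()

// Committing out of order is allowed. `second` cannot start executing
// until `first`, which was enqueued earlier, has been committed as well.
second.commit()
first.commit()

// Wait only after both buffers are committed; waiting on `second`
// before committing `first` would block forever.
first.waitUntilCompleted()
second.waitUntilCompleted()
print(first.status == .completed, second.status == .completed)
```

Explicit enqueue calls are mostly useful when several threads encode command buffers in parallel and the execution order must be fixed before encoding finishes.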

Registering Handler Blocks for Command Buffer Execution

The MTLCommandBuffer methods listed below monitor command execution. Scheduled and completed handlers are invoked in execution order on an undefined thread. Any code you execute in these handlers should complete quickly; if expensive or blocking work needs to be done, defer that work to another thread.

// Register handlers before calling commit()
commandBuffer?.addCompletedHandler({ commandBuffer in })
commandBuffer?.addScheduledHandler({ commandBuffer in })

Monitoring Command Buffer Execution Status

The read-only status property contains a MTLCommandBufferStatus enum value listed in Command Buffer Status Codes that reflects the current scheduling stage in the lifetime of this command buffer.
If execution finishes successfully, the value of the read-only error property is nil. If execution fails, then status is set to MTLCommandBufferStatusError, and the error property may contain a value listed in Command Buffer Error Codes that indicates the cause of the failure.
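A completed handler is a natural place to inspect status and error. The following is a minimal sketch, assuming a Metal-capable GPU; the semaphore is just one way to wait for the handler in a command-line example and is not required by Metal.

```swift
import Metal
import Dispatch

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let commandBuffer = queue.makeCommandBuffer()
else { fatalError("Metal is not available on this machine") }

let finished = DispatchSemaphore(value: 0)

// Handlers must be registered before commit().
commandBuffer.addScheduledHandler { _ in
    // Runs on an undefined thread; keep the work here short.
}
commandBuffer.addCompletedHandler { buffer in
    if buffer.status == .error {
        // The error property may describe the cause of the failure.
        print("GPU work failed:", buffer.error?.localizedDescription ?? "unknown error")
    }
    finished.signal()
}

commandBuffer.commit()
finished.wait() // block until the completed handler has run

print(commandBuffer.status == .completed, commandBuffer.error == nil)
```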

Command Encoder

A command encoder is a transient object that you use once to write commands and state into a single command buffer in a format that the GPU can execute.
Many command encoder object methods append commands onto the command buffer.
While a command encoder is active, it has the exclusive right to append commands for its command buffer.
Once you finish encoding commands, call the endEncoding method. To write further commands, create a new command encoder.

Creating a Command Encoder Object

Because a command encoder appends commands into a specific command buffer, you create a command encoder by requesting one from the MTLCommandBuffer object you want to use it with. Use the following MTLCommandBuffer methods to create command encoders of each type:
- The renderCommandEncoderWithDescriptor: method creates a MTLRenderCommandEncoder object for graphics rendering to an attachment in a MTLRenderPassDescriptor.
- The computeCommandEncoder method creates a MTLComputeCommandEncoder object for data-parallel computations.
- The blitCommandEncoder method creates a MTLBlitCommandEncoder object for memory operations.
- The parallelRenderCommandEncoderWithDescriptor: method creates a MTLParallelRenderCommandEncoder object that enables several MTLRenderCommandEncoder objects to run on different threads while still rendering to an attachment that is specified in a shared MTLRenderPassDescriptor.

Render Command Encoder

Graphics rendering can be described in terms of a rendering pass. A MTLRenderCommandEncoder object represents the rendering state and drawing commands associated with a single rendering pass. A MTLRenderCommandEncoder requires an associated MTLRenderPassDescriptor (described in Creating a Render Pass Descriptor) that includes the color, depth, and stencil attachments that serve as destinations for rendering commands.

Use a render command encoder to:
- Specify graphics resources, such as buffer and texture objects, that contain vertex, fragment, or texture image data
- Specify a MTLRenderPipelineState object that contains compiled rendering state, including vertex and fragment shaders
- Specify fixed-function state, including viewport, triangle fill mode, scissor rectangle, depth and stencil tests, and other values
- Draw 3D primitives
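As a rough sketch of how a render command encoder is tied to a render pass descriptor, the example below encodes a pass that only clears an offscreen texture. The 64x64 size, the pixel format, and the clear color are arbitrary choices for illustration, and a Metal-capable GPU is assumed.

```swift
import Metal

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let commandBuffer = queue.makeCommandBuffer()
else { fatalError("Metal is not available on this machine") }

// Offscreen color target; size and format are arbitrary for this sketch.
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .rgba8Unorm, width: 64, height: 64, mipmapped: false)
textureDescriptor.usage = [.renderTarget]
textureDescriptor.storageMode = .private
guard let target = device.makeTexture(descriptor: textureDescriptor) else {
    fatalError("Could not create the render target")
}

// The render pass descriptor names the attachments that receive the output.
let passDescriptor = MTLRenderPassDescriptor()
passDescriptor.colorAttachments[0].texture = target
passDescriptor.colorAttachments[0].loadAction = .clear
passDescriptor.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
passDescriptor.colorAttachments[0].storeAction = .store

guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor) else {
    fatalError("Could not create the render command encoder")
}
// Fixed-function state: a viewport covering the whole target.
encoder.setViewport(MTLViewport(originX: 0, originY: 0, width: 64, height: 64, znear: 0, zfar: 1))
// A pipeline state, resources, and draw calls would be encoded here.
encoder.endEncoding()

commandBuffer.commit()
commandBuffer.waitUntilCompleted()
```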

Compute Command Encoder

For data-parallel computing, the MTLComputeCommandEncoder protocol provides methods to encode commands in the command buffer that can specify the compute function and its arguments (for example, texture, buffer, and sampler state) and dispatch the compute function for execution.
To create a compute command encoder object, use the computeCommandEncoder method of MTLCommandBuffer (makeComputeCommandEncoder() in Swift).

let commandEncoder = commandBuffer?.makeComputeCommandEncoder()
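A complete compute pass might look like the sketch below. It is an assumption-laden illustration rather than the guide's own sample: the kernel `doubleValues`, its `count` argument, and the choice to compile the shader from a source string at runtime are all made up for the example. The indices in `[[buffer(0)]]` and `[[buffer(1)]]` match the `index:` arguments passed to the encoder.

```swift
import Metal

// Hypothetical kernel: doubles each element and guards against extra threads.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void doubleValues(device float *values [[buffer(0)]],
                         constant uint &count [[buffer(1)]],
                         uint index [[thread_position_in_grid]]) {
    if (index < count) { values[index] *= 2.0f; }
}
"""

let input: [Float] = (0..<1000).map { Float($0) }
var count = UInt32(input.count)

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let library = try? device.makeLibrary(source: kernelSource, options: nil),
      let function = library.makeFunction(name: "doubleValues"),
      let pipeline = try? device.makeComputePipelineState(function: function),
      let buffer = device.makeBuffer(bytes: input,
                                     length: input.count * MemoryLayout<Float>.stride,
                                     options: .storageModeShared),
      let commandBuffer = queue.makeCommandBuffer(),
      let encoder = commandBuffer.makeComputeCommandEncoder()
else { fatalError("Metal setup failed") }

// Specify the compute function and its arguments.
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(buffer, offset: 0, index: 0)
encoder.setBytes(&count, length: MemoryLayout<UInt32>.size, index: 1)

// Round the grid up to whole threadgroups; the kernel ignores surplus threads.
let width = pipeline.threadExecutionWidth
let groups = (input.count + width - 1) / width
encoder.dispatchThreadgroups(MTLSize(width: groups, height: 1, depth: 1),
                             threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
encoder.endEncoding()

commandBuffer.commit()
commandBuffer.waitUntilCompleted()

let pointer = buffer.contents().bindMemory(to: Float.self, capacity: input.count)
let output = Array(UnsafeBufferPointer(start: pointer, count: input.count))
print(output.prefix(4))
```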

Multiple Threads, Command Buffers, and Command Encoders

Most apps use a single thread to encode the rendering commands for a single frame in a single command buffer.
At the end of each frame, you commit the command buffer, which both schedules and begins command execution.

Resource Objects: Buffers and Textures

Metal resource objects (MTLResource) store unformatted memory and formatted image data. There are two types of MTLResource objects:
- MTLBuffer represents an allocation of unformatted memory that can contain any type of data. Buffers are often used for vertex, shader, and compute state data.
- MTLTexture represents an allocation of formatted image data with a specified texture type and pixel format. Texture objects are used as source textures for vertex, fragment, or compute functions, as well as to store graphics rendering output (that is, as an attachment).

Buffers Are Typeless Allocations of Memory

A MTLBuffer object represents an allocation of memory that can contain any type of data.

Creating a Buffer Object

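The following MTLDevice methods create and return a MTLBuffer object:
- newBufferWithLength:options: creates a buffer with a new storage allocation.
- newBufferWithBytes:length:options: creates a buffer by copying data from existing storage into a new storage allocation.
- newBufferWithBytesNoCopy:length:options:deallocator: creates a buffer that reuses an existing storage allocation and does not allocate any new storage.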
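In Swift, buffer creation goes through the makeBuffer family of MTLDevice methods. The sketch below assumes a Metal-capable GPU and uses shared storage so the CPU can read and write the contents directly; the vertex values are placeholders.

```swift
import Metal

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not available on this machine")
}

let vertices: [Float] = [0, 1, -1, -1, 1, -1]
let byteCount = vertices.count * MemoryLayout<Float>.stride

guard
    // A new allocation of a given length.
    let empty = device.makeBuffer(length: 1024, options: .storageModeShared),
    // A new allocation initialized by copying existing bytes.
    let vertexBuffer = device.makeBuffer(bytes: vertices, length: byteCount,
                                         options: .storageModeShared)
else { fatalError("Buffer allocation failed") }

// Buffers are typeless; the app decides how to interpret the bytes.
let floats = vertexBuffer.contents().bindMemory(to: Float.self, capacity: vertices.count)
floats[0] = 0.5 // CPU write into the shared allocation
let roundTrip = Array(UnsafeBufferPointer(start: floats, count: vertices.count))
print(empty.length, vertexBuffer.length, roundTrip)
```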

Textures Are Formatted Image Data

A MTLTexture object represents an allocation of formatted image data that can be used as a resource for a vertex shader, fragment shader, or compute function, or as an attachment to be used as a rendering destination.
A MTLTexture object can have one of the following structures:
- A 1D, 2D, or 3D image
- An array of 1D or 2D images
- A cube of six 2D images

Creating a Texture Object with a Texture Descriptor

MTLTextureDescriptor defines the properties that are used to create a MTLTexture object, including its image size (width, height, and depth), pixel format, arrangement (array or cube type) and number of mipmaps.
The MTLTextureDescriptor properties are only used during the creation of a MTLTexture object. After you create a MTLTexture object, property changes in its MTLTextureDescriptor object no longer have any effect on that texture.
To create one or more textures from a descriptor:
1. Create a custom MTLTextureDescriptor object that contains texture properties that describe the texture data.
2. Create a texture from the MTLTextureDescriptor object by calling the newTextureWithDescriptor: method of a MTLDevice object. After texture creation, call the replaceRegion:mipmapLevel:slice:withBytes:bytesPerRow:bytesPerImage: method to load the texture image data.
3. To create more MTLTexture objects, you can reuse the same MTLTextureDescriptor object, modifying the descriptor’s property values as needed.
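These steps can be sketched in Swift as follows. The 2x2 size and pixel values are arbitrary, a Metal-capable GPU is assumed, and the texture keeps the default storage mode so that the CPU can upload and read back its contents.

```swift
import Metal

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not available on this machine")
}

// Step 1: describe the texture.
let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .rgba8Unorm, width: 2, height: 2, mipmapped: false)

// Step 2: create the texture and load its image data.
guard let texture = device.makeTexture(descriptor: descriptor) else {
    fatalError("Texture allocation failed")
}
// Four RGBA8 pixels; within each pixel the lowest address holds red.
let pixels: [UInt8] = [
    255, 0, 0, 255,    0, 255, 0, 255,
    0, 0, 255, 255,    255, 255, 255, 255,
]
let region = MTLRegionMake2D(0, 0, 2, 2)
texture.replace(region: region, mipmapLevel: 0, withBytes: pixels, bytesPerRow: 2 * 4)

// Step 3: reuse the descriptor with modified properties for another texture.
// The change has no effect on the texture that already exists.
descriptor.width = 8
descriptor.height = 8
guard let larger = device.makeTexture(descriptor: descriptor) else {
    fatalError("Texture allocation failed")
}

var readBack = [UInt8](repeating: 0, count: pixels.count)
texture.getBytes(&readBack, bytesPerRow: 2 * 4, from: region, mipmapLevel: 0)
print(texture.width, larger.width, readBack == pixels)
```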

Working with Texture Slices

A slice is a single 1D, 2D, or 3D texture image and all its associated mipmaps.
All texture objects have at least one slice; cube and array texture types may have several slices.

Pixel Formats for Textures

MTLPixelFormat specifies the organization of color, depth, and stencil data storage in individual pixels of a MTLTexture object. There are three varieties of pixel formats: ordinary, packed, and compressed.
For example, MTLPixelFormatRGBA8Unorm is a 32-bit format with eight bits for each color component; the lowest address contains red, followed by green, blue, and alpha at successively higher addresses.

Creating a Sampler States Object for Texture Lookup

A MTLSamplerState object defines the addressing, filtering, and other properties that are used when a graphics or compute function performs texture sampling operations on a MTLTexture object.

let desc = MTLSamplerDescriptor()
desc.minFilter = MTLSamplerMinMagFilter.linear
desc.magFilter = MTLSamplerMinMagFilter.linear
desc.sAddressMode = MTLSamplerAddressMode.repeat
desc.tAddressMode = MTLSamplerAddressMode.repeat
// all properties below have default values
desc.mipFilter = MTLSamplerMipFilter.notMipmapped
desc.maxAnisotropy = 1
desc.normalizedCoordinates = true
desc.lodMinClamp = 0.0
desc.lodMaxClamp = Float.greatestFiniteMagnitude
// create the MTLSamplerState from the device obtained earlier
let sampler = defaultDevice?.makeSamplerState(descriptor: desc)

Functions and Libraries

`MTLFunction` Represents a Shader or Compute Function

A MTLFunction object represents a single function that is written in the Metal shading language and executed on the GPU as part of a graphics or compute pipeline.

defaultLibrary.makeFunction(name: "vertexPassThrough")

To pass data or state between the Metal runtime and a graphics or compute function written in the Metal shading language, you assign an argument index for textures, buffers, and samplers.
- The argument index identifies which texture, buffer, or sampler is being referenced by both the Metal runtime and Metal shading code.
For a rendering pass, you specify a MTLFunction object for use as a vertex or fragment shader in a MTLRenderPipelineDescriptor object, as detailed in Creating a Render Pipeline State.
For a compute pass, you specify a MTLFunction object when creating a MTLComputePipelineState object for a target device, as described in Specify a Compute State and Resources for a Compute Command Encoder.

A Library Is a Repository of Functions

A MTLLibrary object represents a repository of one or more MTLFunction objects. A single MTLFunction object represents one Metal function that has been written with the shading language.
In the Metal shading language source code, any function that uses a Metal function qualifier (vertex, fragment, or kernel) can be represented by a MTLFunction object in a library.
A Metal function without one of these function qualifiers cannot be directly represented by a MTLFunction object, although it can be called by another function within the shader.
The MTLFunction objects in a library can be created from either of these sources:
- Metal shading language code that was compiled into a binary library format during the app build process.
- A text string containing Metal shading language source code that is compiled by the app at runtime.

Creating a Library from Compiled Code

For the best performance, compile your Metal shading language source code into a library file during your app's build process in Xcode, which avoids the costs of compiling function source during the runtime of your app.
To create a MTLLibrary object from a library binary, call one of the following methods of MTLDevice:
- newDefaultLibrary retrieves a library built for the main bundle that contains all shader and compute functions in an app’s Xcode project.
- newLibraryWithFile:error: takes the path to a library file and returns a MTLLibrary object that contains all the functions stored in that library file.
- newLibraryWithData:error: takes a binary blob containing code for the functions in a library and returns a MTLLibrary object.

Creating a Library from Source Code

To create a MTLLibrary from a string of Metal shading language source code that may contain several functions, call one of the following methods of MTLDevice.
These methods compile the source code when the library is created. To specify the compiler options to use, set the properties in a MTLCompileOptions object.
- newLibraryWithSource:options:error: synchronously compiles source code from the input string to create MTLFunction objects and then returns a MTLLibrary object that contains them.
- newLibraryWithSource:options:completionHandler: asynchronously compiles source code from the input string to create MTLFunction objects and passes the resulting MTLLibrary object to completionHandler, a block of code that is invoked when object creation is completed.
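A minimal sketch of the synchronous variant in Swift follows. The shader source, the helper `tint`, and the chosen language version are assumptions made for the example; only functions with a vertex, fragment, or kernel qualifier are expected to show up in the library's function names.

```swift
import Metal

// Hypothetical shader source compiled at runtime.
let shaderSource = """
#include <metal_stdlib>
using namespace metal;

struct VertexOut { float4 position [[position]]; };

// No function qualifier: callable from other functions only.
static float4 tint(float4 color) { return color * float4(1.0, 0.5, 0.5, 1.0); }

vertex VertexOut vertexPassThrough(uint vertexID [[vertex_id]],
                                   const device float4 *positions [[buffer(0)]]) {
    VertexOut out;
    out.position = positions[vertexID];
    return out;
}

fragment float4 fragmentPassThrough() {
    return tint(float4(1.0));
}
"""

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not available on this machine")
}

// Compiler options are set on a MTLCompileOptions object.
let options = MTLCompileOptions()
options.languageVersion = .version2_0

let library: MTLLibrary
do {
    // Compiles synchronously; throws if the source does not compile.
    library = try device.makeLibrary(source: shaderSource, options: options)
} catch {
    fatalError("Shader compilation failed: \(error)")
}

let names = library.functionNames.sorted()
print(names)
```

Runtime compilation is convenient for experiments, but the build-time path described earlier avoids paying the compile cost while the app runs.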

Getting a Function from a Library

The newFunctionWithName: method of MTLLibrary returns a MTLFunction object with the requested name. If the name of a function that uses a Metal shading language function qualifier is not found in the library, then newFunctionWithName: returns nil.

defaultLibrary.makeFunction(name: "vertexPassThrough")

Determining Function Details at Runtime

Because the actual contents of a MTLFunction object are defined by a graphics shader or compute function that may be compiled before the MTLFunction object was created, its source code might not be directly available to the app. You can query the following MTLFunction properties at run time:
- name, a string with the name of the function.
- functionType, which indicates whether the function is declared as a vertex, fragment, or compute function.
- vertexAttributes, an array of MTLVertexAttribute objects that describe how vertex attribute data is organized in memory and how it is mapped to vertex function arguments.
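Querying these properties might look like the sketch below. The shader source and the names `VertexIn` and `noop` are invented for the example, and the library is compiled from a string only to keep the snippet self-contained.

```swift
import Metal

let source = """
#include <metal_stdlib>
using namespace metal;

struct VertexIn { float4 position [[attribute(0)]]; };
struct VertexOut { float4 position [[position]]; };

vertex VertexOut vertexPassThrough(VertexIn in [[stage_in]]) {
    VertexOut out;
    out.position = in.position;
    return out;
}

kernel void noop() {}
"""

guard let device = MTLCreateSystemDefaultDevice(),
      let library = try? device.makeLibrary(source: source, options: nil),
      let vertexFunction = library.makeFunction(name: "vertexPassThrough"),
      let kernelFunction = library.makeFunction(name: "noop")
else { fatalError("Metal setup failed") }

// name and functionType are available for every function.
print(vertexFunction.name, vertexFunction.functionType == .vertex)
print(kernelFunction.name, kernelFunction.functionType == .kernel)

// vertexAttributes describes the stage_in inputs of a vertex function.
for attribute in vertexFunction.vertexAttributes ?? [] {
    print(attribute.name, attribute.attributeIndex, attribute.isActive)
}

// Asking for a name that is not in the library yields nil.
let missing = library.makeFunction(name: "doesNotExist")
print(missing == nil)
```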

Graphics Rendering: Render Command Encoder

Metal Graphics Rendering Pipeline

A MTLRenderCommandEncoder object represents a single rendering command encoder.
A MTLParallelRenderCommandEncoder object enables a single rendering pass to be broken into a number of separate MTLRenderCommandEncoder objects, each of which may be assigned to a different thread.

Creating and Using a Render Command Encoder

1. Create a MTLRenderPassDescriptor object to define a collection of attachments that serve as the rendering destination for the graphics commands in the command buffer for that rendering pass. Typically, you create a MTLRenderPassDescriptor object once and reuse it each time your app renders a frame. See Creating a Render Pass Descriptor.
2. Create a MTLRenderCommandEncoder object by calling the renderCommandEncoderWithDescriptor: method of MTLCommandBuffer with the specified render pass descriptor. See Using the Render Pass Descriptor to Create a Render Command Encoder.
3. Create a MTLRenderPipelineState object to define the state of the graphics rendering pipeline (including shaders, blending, multisampling, and visibility testing) for one or more draw calls. To use this render pipeline state for drawing primitives, call the setRenderPipelineState: method of MTLRenderCommandEncoder. For details, see Creating a Render Pipeline State.
4. Set textures, buffers, and samplers to be used by the render command encoder, as described in Specifying Resources for a Render Command Encoder.
5. Call MTLRenderCommandEncoder methods to specify additional fixed-function state, including the depth and stencil state, as explained in Fixed-Function State Operations.
6. Finally, call MTLRenderCommandEncoder methods to draw graphics primitives, as described in Drawing Geometric Primitives.
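Put together, the sequence above could be sketched as a single offscreen triangle draw. Everything specific here is an assumption for illustration: the shader names, the 64x64 target, and compiling the shaders from a string at runtime instead of at build time.

```swift
import Metal

let shaders = """
#include <metal_stdlib>
using namespace metal;

struct VertexOut { float4 position [[position]]; };

vertex VertexOut vertexPassThrough(uint vertexID [[vertex_id]],
                                   const device float2 *positions [[buffer(0)]]) {
    VertexOut out;
    out.position = float4(positions[vertexID], 0.0, 1.0);
    return out;
}

fragment float4 solidColor() { return float4(1.0, 0.0, 0.0, 1.0); }
"""

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let library = try? device.makeLibrary(source: shaders, options: nil)
else { fatalError("Metal setup failed") }

// Render pass descriptor with one offscreen color attachment.
let targetDescriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .rgba8Unorm, width: 64, height: 64, mipmapped: false)
targetDescriptor.usage = [.renderTarget]
targetDescriptor.storageMode = .private
let passDescriptor = MTLRenderPassDescriptor()
passDescriptor.colorAttachments[0].texture = device.makeTexture(descriptor: targetDescriptor)
passDescriptor.colorAttachments[0].loadAction = .clear
passDescriptor.colorAttachments[0].storeAction = .store

// Render pipeline state: shaders plus the attachment's pixel format.
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = library.makeFunction(name: "vertexPassThrough")
pipelineDescriptor.fragmentFunction = library.makeFunction(name: "solidColor")
pipelineDescriptor.colorAttachments[0].pixelFormat = .rgba8Unorm

let triangle: [SIMD2<Float>] = [[0, 0.5], [-0.5, -0.5], [0.5, -0.5]]

guard let pipeline = try? device.makeRenderPipelineState(descriptor: pipelineDescriptor),
      let vertexBuffer = device.makeBuffer(bytes: triangle,
                                           length: triangle.count * MemoryLayout<SIMD2<Float>>.stride,
                                           options: .storageModeShared),
      let commandBuffer = queue.makeCommandBuffer(),
      let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor)
else { fatalError("Render setup failed") }

encoder.setRenderPipelineState(pipeline)                   // pipeline state
encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0) // resources
encoder.setViewport(MTLViewport(originX: 0, originY: 0,    // fixed-function state
                                width: 64, height: 64, znear: 0, zfar: 1))
encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3) // draw
encoder.endEncoding()

commandBuffer.commit()
commandBuffer.waitUntilCompleted()
```

In a real app the pipeline state and pass descriptor would be created once and reused, since only the command buffer and encoder are transient.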

Creating a Render Pass Descriptor

Creating a Render Pipeline State

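Before encoding draw calls with a MTLRenderCommandEncoder object, specify a MTLRenderPipelineState object to define the graphics state for any draw calls. See https://developer.apple.com/documentation/metal/mtlrenderpipelinestate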

trilliwon/metal-study-log.md
