The OpenCL Runtime
Command queues
cl_command_queue clCreateCommandQueueWithProperties (cl_context context, cl_device_id device, const cl_queue_properties *properties, cl_int *errcode_ret)
creates a host or device command-queue on a specific device. device must be a device associated with context.
errcode_ret is set to CL_SUCCESS if the command-queue is created successfully. Otherwise, it is set to one of the following errors:
CL_INVALID_CONTEXT if context is not a valid context.
CL_INVALID_DEVICE if device is not a valid device or is not associated with context.
CL_INVALID_VALUE if values specified in properties are not valid.
CL_INVALID_QUEUE_PROPERTIES if values specified in properties are valid but are not supported by the device.
CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.
CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
cl_int clRetainCommandQueue (cl_command_queue command_queue)
increments the command_queue reference count.
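Returns CL_SUCCESS if the function is executed successfully. Otherwise, it returns one of the following errors: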
CL_INVALID_COMMAND_QUEUE if command_queue is not a valid command-queue.
CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.
CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
cl_int clReleaseCommandQueue (cl_command_queue command_queue)
decrements the command_queue reference count.
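Returns CL_SUCCESS if the function is executed successfully. Otherwise, it returns one of the following errors: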
CL_INVALID_COMMAND_QUEUE if command_queue is not a valid command-queue.
CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.
CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
After the command_queue reference count becomes zero and all commands queued to command_queue have finished (e.g., kernel executions, memory object updates, etc.), the command-queue is deleted.
clReleaseCommandQueue performs an implicit flush to issue any previously queued OpenCL commands in command_queue.
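The retain/release rules above can be modelled with a small toy object. This is only a sketch: toy_queue and its functions are hypothetical stand-ins, not part of OpenCL, and the initial reference count of 1 assumes that creation performs an implicit retain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a command-queue; not the real cl_command_queue. */
typedef struct {
    unsigned ref_count; /* assumed to start at 1 when the queue is created */
    unsigned pending;   /* commands queued but not yet finished */
    bool     deleted;   /* set once the toy object has been destroyed */
} toy_queue;

toy_queue toy_create(void)
{
    toy_queue q = {1u, 0u, false};
    return q;
}

/* Deletion needs both conditions: no references and no unfinished commands. */
void toy_try_delete(toy_queue *q)
{
    if (q->ref_count == 0 && q->pending == 0)
        q->deleted = true;
}

void toy_retain(toy_queue *q)
{
    assert(!q->deleted);
    q->ref_count++;
}

void toy_release(toy_queue *q)
{
    assert(q->ref_count > 0);
    q->ref_count--;
    toy_try_delete(q);
}

/* Called by the toy "runtime" whenever one queued command completes. */
void toy_command_finished(toy_queue *q)
{
    assert(q->pending > 0);
    q->pending--;
    toy_try_delete(q);
}
```

In this model a queue whose count has reached zero still survives until its last pending command finishes, which mirrors the deletion condition stated above.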
cl_int clGetCommandQueueInfo (cl_command_queue command_queue, cl_command_queue_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret)
can be used to query information about a command-queue.
command_queue specifies the command-queue being queried. param_name specifies the information to query. param_value is a pointer to memory where the appropriate result being queried is returned. If param_value is NULL, it is ignored.
param_value_size is used to specify the size in bytes of memory pointed to by param_value. This size must be >= size of return type as described in table 5.2. If param_value is NULL, it is ignored. param_value_size_ret returns the actual size in bytes of data being queried by param_value. If param_value_size_ret is NULL, it is ignored.
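The size contract described above leads to the usual two-call pattern: ask for the size first, then fetch the value. The sketch below imitates that contract with a hypothetical toy_get_info function so it runs without an OpenCL device. The status codes and the value 7 are made up, and returning an error for a too-small buffer is an assumption about how the real query behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; not the real CL_* values. */
enum { TOY_SUCCESS = 0, TOY_INVALID_VALUE = -1 };

/* Toy query that always reports one unsigned int and follows the
 * param_value_size / param_value / param_value_size_ret rules. */
int toy_get_info(size_t param_value_size, void *param_value,
                 size_t *param_value_size_ret)
{
    const unsigned int answer = 7u; /* made-up result */

    if (param_value != NULL) {
        /* The caller's buffer must be at least as large as the return type. */
        if (param_value_size < sizeof answer)
            return TOY_INVALID_VALUE;
        memcpy(param_value, &answer, sizeof answer);
    }
    /* A NULL param_value_size_ret is simply ignored. */
    if (param_value_size_ret != NULL)
        *param_value_size_ret = sizeof answer;
    return TOY_SUCCESS;
}
```

A caller would first pass param_value as NULL to learn the required size, allocate that many bytes, and then call again with the buffer.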
5.2 Buffer Objects
A buffer object stores a one-dimensional collection of elements. Elements of a buffer object can be a scalar data type (such as an int, float), vector data type, or a user-defined structure.
A buffer object is created using the following function:
cl_mem clCreateBuffer (cl_context context, cl_mem_flags flags, size_t size, void *host_ptr, cl_int *errcode_ret)
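flags is a bit-field that specifies allocation and usage information, such as read-only (ro), write-only (wo), or read-write (rw) access. The possible values are: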
CL_MEM_READ_WRITE
CL_MEM_WRITE_ONLY
CL_MEM_READ_ONLY
CL_MEM_USE_HOST_PTR
CL_MEM_ALLOC_HOST_PTR
CL_MEM_COPY_HOST_PTR
CL_MEM_HOST_WRITE_ONLY
CL_MEM_HOST_READ_ONLY
CL_MEM_HOST_NO_ACCESS
size is the size in bytes of the buffer memory object to be allocated.
host_ptr is a pointer to the buffer data that may already be allocated by the application. The size of the buffer that host_ptr points to must be >= size bytes.
errcode_ret will return an appropriate error code. If errcode_ret is NULL, no error code is returned.
CL_INVALID_CONTEXT if context is not a valid context.
CL_INVALID_VALUE if values specified in flags are not valid as defined in table 5.3.
CL_INVALID_BUFFER_SIZE if size is 0.
CL_INVALID_HOST_PTR if host_ptr is NULL and CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR are set in flags, or if host_ptr is not NULL but CL_MEM_COPY_HOST_PTR or CL_MEM_USE_HOST_PTR are not set in flags.
CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate memory for buffer object.
CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.
CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
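The size and host_ptr rules above can be written as a small pre-check. This is a sketch only: toy_check_buffer_args and the TOY_* names are hypothetical, the two flag bit values are assumed to mirror those in CL/cl.h, and the order in which the checks run is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins; bit values assumed to match CL_MEM_USE_HOST_PTR and
 * CL_MEM_COPY_HOST_PTR in CL/cl.h. */
#define TOY_MEM_USE_HOST_PTR  (1u << 3)
#define TOY_MEM_COPY_HOST_PTR (1u << 5)

typedef enum {
    TOY_OK,
    TOY_INVALID_BUFFER_SIZE,
    TOY_INVALID_HOST_PTR
} toy_status;

/* Checks only the size and host_ptr rules; everything else is ignored. */
toy_status toy_check_buffer_args(unsigned long flags, size_t size,
                                 const void *host_ptr)
{
    const int wants_host_ptr =
        (flags & (TOY_MEM_USE_HOST_PTR | TOY_MEM_COPY_HOST_PTR)) != 0;

    if (size == 0)
        return TOY_INVALID_BUFFER_SIZE;
    /* The flags ask for a host pointer but none was given. */
    if (host_ptr == NULL && wants_host_ptr)
        return TOY_INVALID_HOST_PTR;
    /* A host pointer was given but no flag says how to use it. */
    if (host_ptr != NULL && !wants_host_ptr)
        return TOY_INVALID_HOST_PTR;
    return TOY_OK;
}
```

For example, passing a non-NULL pointer with flags set to 0 is rejected, while a NULL pointer with flags set to 0 passes the check.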
cl_mem clCreateSubBuffer (cl_mem buffer,
cl_mem_flags flags,
cl_buffer_create_type buffer_create_type,
const void *buffer_create_info,
cl_int *errcode_ret)
can be used to create a new buffer object (referred to as a sub-buffer object) from an existing buffer object.
5.2.2 Reading, Writing and Copying Buffer Objects
cl_int clEnqueueReadBuffer (cl_command_queue command_queue,
cl_mem buffer,
cl_bool blocking_read,
size_t offset,
size_t size,
void *ptr,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list,
cl_event *event)
cl_int clEnqueueWriteBuffer (cl_command_queue command_queue,
cl_mem buffer,
cl_bool blocking_write,
size_t offset,
size_t size,
const void *ptr,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list,
cl_event *event)
buffer refers to a valid buffer object.
blocking_read and blocking_write indicate if the read and write operations are blocking or non-blocking.
If blocking_read is CL_TRUE i.e. the read command is blocking, clEnqueueReadBuffer does not return until the buffer data has been read and copied into memory pointed to by ptr.
If blocking_read is CL_FALSE i.e. the read command is non-blocking, clEnqueueReadBuffer queues a non-blocking read command and returns. The contents of the buffer that ptr points to cannot be used until the read command has completed. The event argument returns an event object which can be used to query the execution status of the read command. When the read command has completed, the contents of the buffer that ptr points to can be used by the application.
cl_int clEnqueueReadBufferRect (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, const size_t *buffer_origin, const size_t *host_origin, const size_t *region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueWriteBufferRect (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, const size_t *buffer_origin, const size_t *host_origin, const size_t *region, size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch, size_t host_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
buffer_origin defines the (x, y, z) offset in the memory region associated with buffer. host_origin defines the (x, y, z) offset in the memory region pointed to by ptr. region defines the (width in bytes, height in rows, depth in slices) of the 2D or 3D rectangle being read or written. buffer_row_pitch is the length of each row in bytes to be used for the memory region associated with buffer. buffer_slice_pitch is the length of each 2D slice in bytes to be used for the memory region associated with buffer. host_row_pitch is the length of each row in bytes to be used for the memory region pointed to by ptr. host_slice_pitch is the length of each 2D slice in bytes to be used for the memory region pointed to by ptr.
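The origin and pitch parameters above combine into a byte offset of the form origin[2] * slice_pitch + origin[1] * row_pitch + origin[0]. The sketch below applies that formula in a host-only emulation of a rectangular read. rect_offset and toy_read_rect are hypothetical helpers, not OpenCL calls, and all pitches are assumed to be given explicitly (non-zero).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offset of an (x, y, z) origin: x in bytes, y in rows, z in slices. */
size_t rect_offset(const size_t origin[3], size_t row_pitch, size_t slice_pitch)
{
    return origin[2] * slice_pitch + origin[1] * row_pitch + origin[0];
}

/* Host-only emulation of a rectangular read: copies region[0] bytes per row,
 * region[1] rows per slice and region[2] slices from src (playing the role of
 * the buffer) into dst (playing the role of the memory pointed to by ptr). */
void toy_read_rect(const unsigned char *src, unsigned char *dst,
                   const size_t src_origin[3], const size_t dst_origin[3],
                   const size_t region[3],
                   size_t src_row_pitch, size_t src_slice_pitch,
                   size_t dst_row_pitch, size_t dst_slice_pitch)
{
    assert(src != NULL && dst != NULL);
    const size_t s0 = rect_offset(src_origin, src_row_pitch, src_slice_pitch);
    const size_t d0 = rect_offset(dst_origin, dst_row_pitch, dst_slice_pitch);

    for (size_t z = 0; z < region[2]; ++z) {
        for (size_t y = 0; y < region[1]; ++y) {
            const size_t s = s0 + z * src_slice_pitch + y * src_row_pitch;
            const size_t d = d0 + z * dst_slice_pitch + y * dst_row_pitch;
            memcpy(dst + d, src + s, region[0]);
        }
    }
}
```

Reading a 2 x 2 byte region at origin (1, 1, 0) from a 4 x 4 byte source (row pitch 4) into a tightly packed destination (row pitch 2) picks source bytes 5, 6, 9 and 10.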
cl_int clEnqueueCopyBuffer (cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer, size_t src_offset, size_t dst_offset, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
enqueues a command to copy a buffer object identified by src_buffer to another buffer object identified by dst_buffer.
Rect version: cl_int clEnqueueCopyBufferRect (cl_command_queue command_queue, omitted...)
cl_int clEnqueueFillBuffer (cl_command_queue command_queue, cl_mem buffer, const void *pattern, size_t pattern_size, size_t offset, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
enqueues a command to fill a buffer object with a pattern of a given pattern size.
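What "fill with a pattern" means can be shown with a host-only emulation. toy_fill is a hypothetical helper, not an OpenCL call. It assumes that offset and size must both be multiples of pattern_size, and it uses -1 as a made-up error value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Repeats pattern over buf[offset .. offset + size).
 * Returns 0 on success and -1 on any rejected argument. */
int toy_fill(unsigned char *buf, size_t buf_size,
             const void *pattern, size_t pattern_size,
             size_t offset, size_t size)
{
    assert(buf != NULL);
    if (pattern == NULL || pattern_size == 0)
        return -1;
    /* Assumed rule: the range must start and end on pattern boundaries. */
    if (offset % pattern_size != 0 || size % pattern_size != 0)
        return -1;
    /* The range must lie inside the buffer. */
    if (offset > buf_size || size > buf_size - offset)
        return -1;

    for (size_t pos = offset; pos < offset + size; pos += pattern_size)
        memcpy(buf + pos, pattern, pattern_size);
    return 0;
}
```

With a 2-byte pattern, offset 2 and size 4, bytes 2 to 5 of the buffer receive two copies of the pattern and the rest stays untouched.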
void * clEnqueueMapBuffer (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_map, cl_map_flags map_flags, size_t offset, size_t size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event, cl_int *errcode_ret)
**important?
enqueues a command to map a region of the buffer object given by buffer into the host address space and returns a pointer to this mapped region.
5.3 Image Objects
A 1D image, 1D image buffer, 1D image array, 2D image, 2D image array and 3D image object can be created using the following function:
cl_mem clCreateImage (cl_context context, cl_mem_flags flags, const cl_image_format *image_format, const cl_image_desc *image_desc, void *host_ptr, cl_int *errcode_ret)
5.3.1.1 Image Format Descriptor
typedef struct _cl_image_format {
cl_channel_order image_channel_order;
cl_channel_type image_channel_data_type;
} cl_image_format;
typedef struct _cl_image_desc {
cl_mem_object_type image_type;
size_t image_width;
size_t image_height;
size_t image_depth;
size_t image_array_size;
size_t image_row_pitch;
size_t image_slice_pitch;
cl_uint num_mip_levels;
cl_uint num_samples;
cl_mem mem_object;
} cl_image_desc;
5.3.2 Querying List of Supported Image Formats
cl_int clGetSupportedImageFormats (cl_context context, cl_mem_flags flags, cl_mem_object_type image_type, cl_uint num_entries, cl_image_format *image_formats, cl_uint *num_image_formats)
can be used to get the list of image formats supported by an OpenCL implementation when the following information about an image memory object is specified:
Context
Image type – 1D, 2D, or 3D image, 1D image buffer, 1D or 2D image array.
Image object allocation information
clGetSupportedImageFormats returns a union of image formats supported by all devices in the context.
5.3.3 Reading, Writing and Copying Image Objects
cl_int clEnqueueReadImage (cl_command_queue command_queue, omitted...)
cl_int clEnqueueWriteImage (cl_command_queue command_queue, omitted...)
cl_int clEnqueueCopyImage (cl_command_queue command_queue, omitted...)
cl_int clEnqueueFillImage (cl_command_queue command_queue, omitted...)
5.3.5 Copying between Image and Buffer Objects
cl_int clEnqueueCopyImageToBuffer (cl_command_queue command_queue, omitted...)
cl_int clEnqueueCopyBufferToImage (cl_command_queue command_queue, omitted...)
5.3.6 Mapping Image Objects
void * clEnqueueMapImage (cl_command_queue command_queue, cl_mem image, cl_bool blocking_map, cl_map_flags map_flags, const size_t *origin, const size_t *region, size_t *image_row_pitch, size_t *image_slice_pitch, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event, cl_int *errcode_ret)
5.3.7 Image Object Queries
cl_int clGetImageInfo (cl_mem image, cl_image_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret)
image specifies the image object being queried.
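5.4 Pipes
A pipe is a memory object that stores data organized as a FIFO. Pipe objects can only be accessed using built-in functions that read from and write to a pipe. Pipe objects are not accessible from the host. A pipe object encapsulates the following information: the packet size in bytes, the maximum capacity in packets, information about the number of packets currently in the pipe, and the data packets.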
5.4.1 Creating Pipe Objects
A pipe object is created using the following function:
cl_mem clCreatePipe (cl_context context,
cl_mem_flags flags,
cl_uint pipe_packet_size,
cl_uint pipe_max_packets,
const cl_pipe_properties *properties,
cl_int *errcode_ret)
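To make the FIFO behaviour of a pipe concrete, here is a host-only model built around the two sizing arguments pipe_packet_size and pipe_max_packets. It is a sketch for intuition only: real pipes are not accessible from the host, and toy_pipe with its functions is hypothetical, including the choice of 0 and -1 as return values.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host-side model of a pipe; not an OpenCL object. */
typedef struct {
    unsigned packet_size;  /* bytes per packet */
    unsigned max_packets;  /* capacity in packets */
    unsigned count;        /* packets currently stored */
    unsigned head;         /* slot index of the oldest packet */
    unsigned char *data;   /* max_packets * packet_size bytes */
} toy_pipe;

int toy_pipe_init(toy_pipe *p, unsigned packet_size, unsigned max_packets)
{
    assert(p != NULL);
    if (packet_size == 0 || max_packets == 0)
        return -1;
    p->data = malloc((size_t)packet_size * max_packets);
    if (p->data == NULL)
        return -1;
    p->packet_size = packet_size;
    p->max_packets = max_packets;
    p->count = 0;
    p->head = 0;
    return 0;
}

/* Appends one packet at the tail; fails with -1 when the pipe is full. */
int toy_pipe_write(toy_pipe *p, const void *packet)
{
    if (p->count == p->max_packets)
        return -1;
    const unsigned slot = (p->head + p->count) % p->max_packets;
    memcpy(p->data + (size_t)slot * p->packet_size, packet, p->packet_size);
    p->count++;
    return 0;
}

/* Removes the oldest packet; fails with -1 when the pipe is empty. */
int toy_pipe_read(toy_pipe *p, void *packet)
{
    if (p->count == 0)
        return -1;
    memcpy(packet, p->data + (size_t)p->head * p->packet_size, p->packet_size);
    p->head = (p->head + 1) % p->max_packets;
    p->count--;
    return 0;
}

void toy_pipe_free(toy_pipe *p)
{
    free(p->data);
    p->data = NULL;
}
```

Packets come out in the order they were written, and a write to a full pipe or a read from an empty one fails instead of blocking.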
5.4.2 Pipe Object Queries
To get information that is common to all memory objects, use the clGetMemObjectInfo function described in section 5.5.5.
To get information specific to a pipe object created with clCreatePipe, use the following function:
cl_int clGetPipeInfo (cl_mem pipe,
cl_pipe_info param_name,
size_t param_value_size,
void *param_value,
size_t *param_value_size_ret)
5.5 Querying, Unmapping, Migrating, Retaining and Releasing Memory Objects
5.5.1 Retaining and Releasing Memory Objects
cl_int clRetainMemObject (cl_mem memobj)
cl_int clReleaseMemObject (cl_mem memobj)
cl_int clSetMemObjectDestructorCallback (cl_mem memobj,
void (CL_CALLBACK *pfn_notify)(cl_mem memobj, void *user_data),
void *user_data)
cl_int clEnqueueUnmapMemObject ()
5.5.4 Migrating Memory Objects
cl_int clEnqueueMigrateMemObjects (cl_command_queue command_queue, omitted...)
cl_int clGetMemObjectInfo (cl_mem memobj, omitted...)
5.6 Shared Virtual Memory
OpenCL 2.0 adds support for shared virtual memory (a.k.a. SVM). SVM allows the host and kernels executing on devices to directly share complex, pointer-containing data structures such as trees and linked lists. It also eliminates the need to marshal data between the host and devices. As a result, SVM substantially simplifies OpenCL programming and may improve performance.
Coarse-grained sharing: Coarse-grain sharing may be used for memory and virtual pointer sharing between multiple devices as well as between the host and one or more devices.
Fine-grained sharing: Shared virtual memory where memory consistency is maintained at a granularity smaller than a buffer. How fine-grained SVM is used depends on whether the device supports SVM atomic operations.
As an illustration of fine-grain SVM using SVM atomic operations to maintain memory consistency, consider the following example. The host and a set of devices can simultaneously access and update a shared work-queue data structure holding work-items to be done. The host can use atomic operations to insert new work-items into the queue while the devices use similar atomic operations to remove work-items for processing.
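The shared work-queue idea can be sketched on the host alone with C11 atomics. This is not SVM code: work_queue and the wq_* functions are hypothetical, and the design is simplified to one inserting thread (standing in for the host), any number of removing threads (standing in for devices), and a fixed-capacity array that is never reused.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define WQ_CAPACITY 64

/* Hypothetical shared work queue; slots are written once and never recycled. */
typedef struct {
    int items[WQ_CAPACITY];
    atomic_size_t head; /* index of the next item to remove */
    atomic_size_t tail; /* index of the next free slot */
} work_queue;

void wq_init(work_queue *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Single producer: write the item, then publish it with a release store. */
int wq_push(work_queue *q, int item)
{
    const size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    if (t >= WQ_CAPACITY)
        return 0; /* queue exhausted */
    q->items[t] = item;
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return 1;
}

/* Any number of consumers: claim a slot by advancing head with a CAS. */
int wq_pop(work_queue *q, int *out)
{
    assert(q != NULL && out != NULL);
    size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    for (;;) {
        /* The acquire load pairs with the producer's release store. */
        const size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (h >= t)
            return 0; /* nothing to do right now */
        if (atomic_compare_exchange_weak_explicit(&q->head, &h, h + 1,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
            *out = q->items[h]; /* slot h now belongs to this consumer */
            return 1;
        }
        /* On failure h was reloaded with the current head; try again. */
    }
}
```

Because a slot is never overwritten after it is published, a consumer that wins the compare-exchange can read its item without further synchronisation. A production queue would also need wraparound and multiple producers.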
void * clSVMAlloc ()
clSVMAlloc returns a valid non-NULL shared virtual memory address if the SVM buffer is successfully allocated. Otherwise, like malloc, it returns a NULL pointer value.
void clSVMFree ()