Created
March 27, 2020 23:01
| /** | |
| * \ingroup opencl | |
| * \defgroup opencl_kernel_generator OpenCL Kernel Generator | |
| * | |
| * The OpenCL kernel generator is used to combine multiple matrix operations into a | |
| * single OpenCL kernel. This is much simpler than writing multi-operation kernels by | |
| * hand. | |
| * | |
| * Because global GPU memory loads and stores are relatively slow compared to | |
| * calculations in a kernel, using one kernel for multiple operations is faster than | |
| * using one kernel per operation. | |
| * | |
| * The kernel generator uses lazy evaluation. Each operation is represented by | |
| * an object derived from `operation_cl`. Such an object holds arguments of the | |
| * operations as well as meta information needed to generate calculations on the | |
| * arguments. Arguments to operations can be other operations, scalars | |
| * or `matrix_cl` objects. An operation is evaluated when it is assigned to a | |
| * `matrix_cl` or to a left-hand-side operation, or when `.eval()` is called. | |
| * | |
| * ## Defining a new kernel generator operation | |
| * | |
| * New kernel generator classes must satisfy the conditions below: | |
| * | |
| * 1. The class must be derived from a class inheriting from `operation_cl`. | |
| * Optionally, if the operation should support being assigned to, it can be | |
| * derived from a class inheriting `operation_cl_lhs` instead. | |
| * 2. Its parent's template arguments should be set to the derived type, the | |
| * scalar type and the types of any expression arguments. | |
| * 3. Member type `Scalar` should be defined as the scalar type of the result of | |
| * the operation. | |
| * 4. Member function `generate` should have the signature | |
| * ```cpp | |
| * inline kernel_parts generate(const std::string& i, const std::string& j, | |
| * const std::string& var_name_arg) | |
| * ``` | |
| * 5. Member function `view()` should return the correct `matrix_cl_view` after | |
| * applying the operation. For instance, `transpose()` returns an `UPPER` view | |
| * if its input was a `matrix_cl` with a `LOWER` view. | |
| * 6. Member function `deep_copy` should make a copy of the expression. | |
| * Arguments that are operations should be copied by calling their `deep_copy`. | |
| * | |
| * The following functions can optionally be defined. Defaults are implemented in | |
| * `operation_cl`: | |
| * - `void modify_argument_indices(std::string& i, std::string& j)`: | |
| *   - Modifies which indices are passed to the arguments' `generate()`. | |
| *   - Default: No-op. | |
| * - `void set_args(std::set<const operation_cl_base*>& generated, | |
| * cl::Kernel& kernel, int& arg_num)`: | |
| *   - Sets additional kernel arguments. | |
| *   - Default: Calls `set_args()` on arguments. | |
| * - `int rows()`: | |
| *   - Returns the number of rows of the result. | |
| *   - Default: Returns the maximum of the arguments' rows. | |
| * - `int cols()`: | |
| *   - Returns the number of columns of the result. | |
| *   - Default: Returns the maximum of the arguments' columns. | |
| * - `int thread_rows()`: | |
| *   - Returns the number of threads required for this operation in the rows direction. | |
| *   - Default: Returns `rows()`. | |
| * - `int thread_cols()`: | |
| *   - Returns the number of threads required for this operation in the cols direction. | |
| *   - Default: Returns `cols()`. | |
| * - `int bottom_diagonal()`: | |
| *   - Returns the index of the bottom nonzero diagonal of the result (0 is the | |
| *     diagonal, positive values are superdiagonals, negative values are | |
| *     subdiagonals). | |
| *   - Default: Returns the minimum of the arguments' `bottom_diagonal()`. | |
| * - `int top_diagonal()`: | |
| *   - Returns the index of the top nonzero diagonal of the result (0 is the | |
| *     diagonal, positive values are superdiagonals, negative values are | |
| *     subdiagonals). | |
| *   - Default: Returns the maximum of the arguments' `top_diagonal()`. | |
| * | |
| * If an operation should support being assigned to, it should also define the | |
| * following: | |
| * | |
| * 1. Member function `generate_lhs` with the same signature as `generate` | |
| * that returns the generated code when the operation is assigned to. | |
| * | |
| * The following functions can optionally be defined for operations that support | |
| * being assigned to. Defaults are implemented in `operation_cl_lhs`: | |
| * - `void set_view(int bottom_diagonal, int top_diagonal, | |
| * int bottom_zero_diagonal, int top_zero_diagonal)`: | |
| *   - Sets the view of the underlying `matrix_cl` depending on where the extreme | |
| *     sub-/super-diagonals are written. | |
| *   - Default: Calls `set_view` on arguments with the same arguments. | |
| * - `void check_assign_dimensions(int rows, int cols)`: | |
| *   - If the operation size can be modified, it should be set to the given size. | |
| *     Otherwise it should check that this operation's size matches the given size. | |
| *   - Default: Calls `check_assign_dimensions` on arguments with the same arguments. | |
| * | |
| * A new operation should also have a user-facing function that accepts | |
| * arguments to the operation and returns the operation object. Arguments should | |
| * be passed through the function `as_operation_cl` so that they are wrapped in | |
| * operations if they are not already operations. If the operation defines | |
| * `modify_argument_indices`, this function should make copies of the arguments by | |
| * calling `deep_copy()` on them internally. | |
| */ |
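The lazy-evaluation idea described in the comment above can be illustrated with a small standalone sketch. It is not Stan Math code: every name ending in `_sketch` is hypothetical, the CRTP base `operation_cl`, `view()`, `deep_copy`, `set_args` and all actual OpenCL calls are left out, and the emitted kernel text is only schematic. What it tries to show is how each expression object holds its arguments, how `generate(i, j, var_name)` recursively emits code for them, how the default `rows()`/`cols()` take the maximum over the arguments, and how a whole expression ends up in a single kernel body.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for `kernel_parts`: only the fused kernel body.
struct kernel_parts_sketch {
  std::string body;
};

// Leaf expression: loads one element of a named buffer (the role a
// `matrix_cl` argument plays in the description above).
struct load_sketch {
  std::string name;
  int rows_;
  int cols_;
  int rows() const { return rows_; }
  int cols() const { return cols_; }
  kernel_parts_sketch generate(const std::string& i, const std::string& j,
                               const std::string& var_name) const {
    return {"double " + var_name + " = " + name + "[" + i + " + rows * " + j
            + "];\n"};
  }
};

// Elementwise binary expression. It stores its arguments by value and
// computes nothing until generate() is called.
template <typename Lhs, typename Rhs>
struct binary_sketch {
  Lhs a;
  Rhs b;
  char op;
  // Mirrors the documented defaults: maximum over the arguments.
  int rows() const { return std::max(a.rows(), b.rows()); }
  int cols() const { return std::max(a.cols(), b.cols()); }
  kernel_parts_sketch generate(const std::string& i, const std::string& j,
                               const std::string& var_name) const {
    // Emit code for both arguments first, then combine their results.
    kernel_parts_sketch pa = a.generate(i, j, var_name + "_a");
    kernel_parts_sketch pb = b.generate(i, j, var_name + "_b");
    return {pa.body + pb.body + "double " + var_name + " = " + var_name
            + "_a " + op + " " + var_name + "_b;\n"};
  }
};

// User-facing functions: accept arguments, return the operation object.
template <typename Lhs, typename Rhs>
binary_sketch<Lhs, Rhs> add_sketch(Lhs a, Rhs b) {
  return {a, b, '+'};
}

template <typename Lhs, typename Rhs>
binary_sketch<Lhs, Rhs> elt_multiply_sketch(Lhs a, Rhs b) {
  return {a, b, '*'};
}

// Wraps the generated body in one schematic kernel. The parameter list is
// deliberately elided, so the result is not compilable OpenCL C.
template <typename Expr>
std::string fused_kernel_source(const Expr& expr) {
  assert(expr.rows() > 0 && expr.cols() > 0);  // nothing to launch otherwise
  const kernel_parts_sketch parts = expr.generate("i", "j", "v");
  return "kernel void fused(/* buffers, rows */) {\n"
         "int i = get_global_id(0);\n"
         "int j = get_global_id(1);\n"
         + parts.body + "out[i + rows * j] = v;\n}\n";
}
```

With three `load_sketch` leaves `A`, `B` and `C`, calling `fused_kernel_source(add_sketch(elt_multiply_sketch(A, B), C))` yields one kernel whose body loads each input once, multiplies, adds and stores. That single pass over global memory is the saving the comment attributes to combining operations into one kernel.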