// Gist by @brycelelbach (last active September 12, 2019 20:35)
/******************************************************************************
* Copyright (c) 2011, Duane Merrill. All rights reserved.
* Copyright (c) 2011-2018, NVIDIA CORPORATION. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of the NVIDIA CORPORATION nor the
* names of its contributors may be used to endorse or promote products
* derived from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE FOR ANY
* DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
******************************************************************************/
/******************************************************************************
* Evaluates different tuning configurations of DeviceReduce.
*
* The best way to use this program:
* (1) Find the best all-around single-block tune for a given arch.
*     For example, 100 samples [1 .. 512], 100 timing iterations per config per sample:
*         ./bin/tune_device_reduce_sm200_nvvm_5.0_abi_i386 --i=100 --s=100 --n=512 --single --device=0
* (2) Update the single-block tune in device_reduce.cuh.
* (3) Find the best all-around multi-block tune for a given arch.
*     For example, 100 samples [single-block tile-size .. 50,331,648], 100 timing iterations per config per sample:
*         ./bin/tune_device_reduce_sm200_nvvm_5.0_abi_i386 --i=100 --s=100 --device=0
* (4) Update the multi-block tune in device_reduce.cuh.
*
******************************************************************************/
// Ensure printing of CUDA runtime errors to console
#define CUB_STDERR
#include <vector>
#include <algorithm>
#include <stdio.h>
#include <cub/cub.cuh>
#include "../test/test_util.h"
using namespace cub;
using namespace std;
//---------------------------------------------------------------------
// Globals, constants and typedefs
//---------------------------------------------------------------------
#ifndef TUNE_ARCH
#define TUNE_ARCH 100
#endif
int g_max_items = 48 * 1024 * 1024;
int g_samples = 100;
int g_timing_iterations = 2;
bool g_verbose = false;
bool g_single = false;
bool g_verify = true;
CachingDeviceAllocator g_allocator;
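// Hypothetical sketch (not this program's actual argument handling): a minimal
// stand-in parser for the "--name=value" flags used in the usage notes at the
// top of this file. The flag-to-setting mapping is an assumption inferred from
// those examples: --i -> timing iterations, --s -> samples, --n -> max items,
// --single -> single-block mode. The defaults mirror the globals above.
#include <cstdlib>
#include <cstring>

struct HypotheticalTuneOptions
{
    int  max_items         = 48 * 1024 * 1024;  // same default as g_max_items
    int  samples           = 100;               // same default as g_samples
    int  timing_iterations = 2;                 // same default as g_timing_iterations
    bool single            = false;             // same default as g_single
};

inline HypotheticalTuneOptions HypotheticalParseTuneOptions(int argc, const char *const *argv)
{
    HypotheticalTuneOptions opts;
    for (int a = 1; a < argc; ++a)
    {
        const char *arg = argv[a];
        if (std::strncmp(arg, "--i=", 4) == 0)
            opts.timing_iterations = std::atoi(arg + 4);
        else if (std::strncmp(arg, "--s=", 4) == 0)
            opts.samples = std::atoi(arg + 4);
        else if (std::strncmp(arg, "--n=", 4) == 0)
            opts.max_items = std::atoi(arg + 4);
        else if (std::strcmp(arg, "--single") == 0)
            opts.single = true;
        // Any other flag (e.g. --device=0) is ignored by this sketch
    }
    return opts;
}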
//---------------------------------------------------------------------
// Host utility subroutines
//---------------------------------------------------------------------
/**
* Initialize problem
*/
template <typename T>
void Initialize(
    GenMode gen_mode,
    T       *h_in,
    int     num_items)
{
    for (int i = 0; i < num_items; ++i)
    {
        InitValue(gen_mode, h_in[i], i);
    }
}
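// Hypothetical sketch (not part of this program or of CUB): Initialize() fills
// one problem of num_items values, while the usage notes at the top of this file
// sweep num_items over a range, e.g. 100 samples in [1 .. 512] with --single, or
// [single-block tile-size .. 50,331,648] otherwise. How this program actually
// picks those sizes is not shown in this excerpt; geometric spacing is assumed
// here so that small and large problems get similar coverage. Rounding can
// repeat a size near the bottom of a narrow range such as [1 .. 512].
#include <cassert>
#include <cmath>
#include <vector>

inline std::vector<int> HypotheticalGeometricSampleSizes(
    int min_items,
    int max_items,
    int num_samples)
{
    assert(min_items >= 1 && max_items >= min_items && num_samples >= 2);

    // Constant ratio between consecutive sizes: (max / min)^(1 / (num_samples - 1))
    double ratio = std::pow(double(max_items) / double(min_items), 1.0 / double(num_samples - 1));

    std::vector<int> sizes;
    sizes.reserve(num_samples);
    for (int i = 0; i < num_samples; ++i)
    {
        long items = std::lround(double(min_items) * std::pow(ratio, double(i)));
        // Clamp to guard against floating-point drift at either end of the range
        if (items < min_items) items = min_items;
        if (items > max_items) items = max_items;
        sizes.push_back(static_cast<int>(items));
    }
    sizes.back() = max_items;  // make the upper bound exact
    return sizes;
}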
/**
* Sequential reduction
*/
template <typename T, typename ReductionOp>
T Reduce(
    T           *h_in,
    ReductionOp reduction_op,
    int         num_items)
{
    // Requires num_items >= 1: the first element seeds the running result
    T retval = h_in[0];
    for (int i = 1; i < num_items; ++i)
        retval = reduction_op(retval, h_in[i]);
    return retval;
}
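// Hypothetical sketch (not part of this program or of CUB): Reduce() above folds
// left to right, while a parallel reduction such as DeviceReduce combines partial
// results in a different order. For an associative operator on integers both
// orders give the same answer, so an exact comparison against Reduce() is valid;
// floating-point addition is not associative, so a float check would need a
// tolerance. The pairwise (tree) order below is only a stand-in for whatever
// order the device actually uses; both helper names are illustrative.
#include <cassert>
#include <cmath>

// Pairwise (tree-order) reduction over h_in[begin, end); requires end > begin
template <typename T, typename ReductionOp>
T HypotheticalPairwiseReduce(const T *h_in, int begin, int end, ReductionOp reduction_op)
{
    assert(end > begin);
    if (end - begin == 1)
        return h_in[begin];
    int mid = begin + (end - begin) / 2;
    T left  = HypotheticalPairwiseReduce(h_in, begin, mid, reduction_op);
    T right = HypotheticalPairwiseReduce(h_in, mid, end, reduction_op);
    return reduction_op(left, right);
}

// Relative-tolerance comparison for floating-point results
inline bool HypotheticalNearlyEqual(double a, double b, double rel_tol)
{
    double scale = std::fmax(std::fabs(a), std::fabs(b));
    if (scale < 1.0)
        scale = 1.0;  // behave like an absolute tolerance near zero
    return std::fabs(a - b) <= rel_tol * scale;
}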
//---------------------------------------------------------------------
// Full tile test generation
//---------------------------------------------------------------------
/**
* Wrapper structure for generating and
* running different tuning configurations
*/
template <
    typename T,
    typename OffsetT,
    typename ReductionOp>
struct Schmoo
{