# Gist frogcjn/c74c6955a90cfb4b2f75cec7de11bb4e, last active January 19, 2025 12:51
# Prompt templates for EquivBench.
true_label: "YES"
false_label: "NO"
true_keywords:
  - "YES"
  - "equivalent"
false_keywords:
  - "NO"
  - "inequivalent"
DCE:
  ZERO: |
    You are here to judge if two programs are functionally equivalent.
    Here equivalence means that, when run on the same input, the two programs always have the same program state at all corresponding points reachable by program execution.
    [Program 1]:
    {program_1_code}
    [Program 2]:
    {program_2_code}
    What I expect from your answer:
    Do not output any thoughts, just the answer.
    Whether these two programs are equivalent or not. You should output {true_label} or {false_label} in the end.
  ZERO_COT: |
    You are here to judge if two programs are functionally equivalent.
    Here equivalence means that, when run on the same input, the two programs always have the same program state at all corresponding points reachable by program execution.
    [Program 1]:
    {program_1_code}
    [Program 2]:
    {program_2_code}
    What I expect from your answer:
    1. Output any thoughts you have about how you reach the answer. You should give a clear reasoning explanation: which aspects of the code indicate whether the programs preserve the same computation?
    2. Whether these two programs are equivalent or not. You should output {true_label} or {false_label} in the end.
  FEW: "What is a good name for a company that makes {product}?"
  FEW_COT: ""
TVM:
  ZERO: |
    I have two CUDA kernels. I need to determine if these two kernels are functionally equivalent, that is, whether they produce identical results for all valid inputs.
    Your task: Inspect the given CUDA kernel source codes. Determine if they are functionally equivalent. Both kernels should, in principle, compute the same mathematical result (neglecting floating-point rounding errors) on all valid inputs, despite differing low-level optimizations.
    If equivalent, explain how their transformations differ (e.g., different block/thread configurations, different loop split factors) and why these differences do not change the final result.
    If not equivalent, identify the parts of the code or transformations that alter the semantics, leading to potentially different outputs.
    What I will provide:
    1. Two CUDA kernel implementations generated by TVM with different schedules.
    Kernel A:
    {program_1_code}
    Kernel B:
    {program_2_code}
    What I expect from your answer:
    Do not output any thoughts, just the answer.
    Whether these two kernels are equivalent or not. You should output {true_label} or {false_label} in the end.
  ZERO_COT: |
    I have two CUDA kernels. I need to determine if these two kernels are functionally equivalent, that is, whether they produce identical results for all valid inputs.
    Your task: Inspect the given CUDA kernel source codes. Determine if they are functionally equivalent. Both kernels should, in principle, compute the same mathematical result (neglecting floating-point rounding errors) on all valid inputs, despite differing low-level optimizations.
    If equivalent, explain how their transformations differ (e.g., different block/thread configurations, different loop split factors) and why these differences do not change the final result.
    If not equivalent, identify the parts of the code or transformations that alter the semantics, leading to potentially different outputs.
    What I will provide:
    1. Two CUDA kernel implementations generated by TVM with different schedules.
    Kernel A:
    {program_1_code}
    Kernel B:
    {program_2_code}
    What I expect from your answer:
    1. Output any thoughts you have about how you reach the answer. You should give a clear reasoning explanation: which aspects of the code indicate whether the kernels preserve the same computation?
    2. Whether these two kernels are equivalent or not. You should output {true_label} or {false_label} in the end.
  FEW: "What is a good name for a company that makes {product}?"
  FEW_COT: ""
STOKE:
  ZERO: "What is a good name for a company that makes {product}?"
  ZERO_COT: ""
  FEW: "What is a good name for a company that makes {product}?"
  FEW_COT: ""
OJ_V:
  ZERO: "What is a good name for a company that makes {product}?"
  ZERO_COT: ""
  FEW: "What is a good name for a company that makes {product}?"
  FEW_COT: ""
OJ_A:
  ZERO: "What is a good name for a company that makes {product}?"
  ZERO_COT: ""
  FEW: "What is a good name for a company that makes {product}?"
  FEW_COT: ""
OJ_VA:
  ZERO: "What is a good name for a company that makes {product}?"
  ZERO_COT: ""
  FEW: "What is a good name for a company that makes {product}?"
  FEW_COT: ""
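Each template is meant to be filled in, and the model's reply mapped back to a label using `true_keywords` and `false_keywords`. Below is a minimal Python sketch of both steps, under stated assumptions. It hard-codes a trimmed copy of the DCE `ZERO` template instead of parsing this file, fills the placeholders with `str.format`, and classifies a reply by the last whole-word occurrence of any keyword. `CONFIG`, `build_prompt`, and `parse_answer` are hypothetical names. The last-keyword rule is an assumption and not necessarily how EquivBench scores answers.

```python
import re
from typing import Optional

# Hypothetical, trimmed in-memory copy of the YAML above. A real loader
# would parse the file (e.g. with PyYAML) instead of hard-coding it.
CONFIG = {
    "true_label": "YES",
    "false_label": "NO",
    "true_keywords": ["YES", "equivalent"],
    "false_keywords": ["NO", "inequivalent"],
    "DCE": {
        "ZERO": (
            "You are here to judge if two programs are functionally equivalent.\n"
            "[Program 1]:\n{program_1_code}\n"
            "[Program 2]:\n{program_2_code}\n"
            "You should output {true_label} or {false_label} in the end.\n"
        ),
    },
}


def build_prompt(config: dict, category: str, variant: str,
                 program_1: str, program_2: str) -> str:
    """Fill the {placeholders} of one template using str.format."""
    template = config[category][variant]
    return template.format(
        program_1_code=program_1,
        program_2_code=program_2,
        true_label=config["true_label"],
        false_label=config["false_label"],
    )


def parse_answer(config: dict, response: str) -> Optional[bool]:
    """Return True/False from the last keyword in the response, or None.

    Keywords are matched as whole words (\\b boundaries), so "equivalent"
    does not match inside "inequivalent". Matching is case-sensitive, so
    the upper-case labels "YES"/"NO" do not fire on words like "not".
    """
    hits = []  # (start position, verdict) for every keyword occurrence
    for verdict, key in ((True, "true_keywords"), (False, "false_keywords")):
        for kw in config[key]:
            pattern = r"\b" + re.escape(kw) + r"\b"
            for m in re.finditer(pattern, response):
                hits.append((m.start(), verdict))
    if not hits:
        return None  # no recognizable label in the reply
    # The prompts ask for the label "in the end", so trust the last hit.
    return max(hits, key=lambda h: h[0])[1]


if __name__ == "__main__":
    prompt = build_prompt(CONFIG, "DCE", "ZERO", "int x = 1;", "int x = 1;")
    print(prompt)
    print(parse_answer(CONFIG, "Both programs keep x == 1.\nYES"))  # True
```

Whole-word matching matters here because a plain substring search would also find "equivalent" inside every "inequivalent". A reply such as "not equivalent" with no trailing label would still be misread, which is one reason to ask for the label at the end.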