Performance measurements are done on a large number of invocations of the same routine. Since the first iteration is almost always significantly slower than the subsequent ones, the minimum (or the average, geometric mean, and so on) of the measured execution times is usually used for final projections.
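As an illustration, the following minimal sketch reduces a list of per-iteration execution times to the candidate projections mentioned above. The function name `summarize` and the sample timings are hypothetical; the timings are assumed to be in seconds.

```python
import statistics

def summarize(times):
    """Reduce per-iteration execution times (seconds) to candidate projections."""
    if not times:
        raise ValueError("need at least one measurement")
    return {
        "min": min(times),                             # least disturbed run
        "mean": statistics.fmean(times),               # arithmetic average
        "geomean": statistics.geometric_mean(times),   # less sensitive to large outliers
    }

# Hypothetical timings: the first (cold) iteration is much slower than the rest.
times = [0.050, 0.010, 0.011, 0.010, 0.012]
print(summarize(times))
```

Note how the slow first iteration pulls the arithmetic mean up more than the geometric mean, while the minimum ignores it entirely.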
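An alternative to calling the kernel many times is to perform a single “warm-up” run before measuring.
The warm-up run might be helpful for kernels with a small amount of computation, as it helps to amortize potential one-time costs, such as bringing data into the caches, lazy object creation, and delayed initialization in the OpenCL runtime.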
NOTE: You need to base your performance conclusions on reproducible data. If the warm-up run does not help or the execution time still varies, try running a large number of iterations and then averaging the results. For time values that vary widely, consider using the geometric mean.
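One way to apply this advice is sketched below: check how widely the timings vary and fall back to the geometric mean when the spread is large. The function `choose_projection` and its `max_spread` threshold (a ratio of the slowest to the fastest run) are hypothetical choices for illustration, not values from this guide.

```python
import statistics

def choose_projection(times, max_spread=1.5):
    """Return (name, value) of the statistic to report for the given timings.

    max_spread is a hypothetical threshold on max(times) / min(times); above it,
    the timings are treated as ranging too much and the geometric mean is used.
    """
    if len(times) < 2:
        raise ValueError("run more iterations for a reproducible estimate")
    spread = max(times) / min(times)
    if spread > max_spread:
        return "geomean", statistics.geometric_mean(times)
    return "mean", statistics.fmean(times)

print(choose_projection([1.0, 1.1, 1.2]))  # small spread: arithmetic mean
print(choose_projection([1.0, 4.0]))       # large spread: geometric mean
```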
Consider the following:
Refer to the “OpenCL™ Optimizations Tutorial” SDK sample for code examples of performing a warm-up run before starting performance measurements.