Check-list for OpenCL™ Optimizations
Use Array Notation with int32 Indices: A[i][j]
Use Floating Point for Calculations
Note on Local Memory Use
Use Branching Accurately
Map Memory Objects (USE_HOST_PTR)
Prefer Buffers Over Images
Use Lower Math Precision
Use Restrict Qualifier for Kernel Arguments
See Also
OpenCL™ Device Fission for CPU Performance