Coding for Intel® Xeon Phi™ Coprocessors
Introduction for OpenCL™ Coding on Intel® Xeon Phi™ Coprocessors
Threading: Achieving Parallelism Between Work-Groups
Vectorization: SIMD Processing Within a Work-Group
Work-Group Size Considerations for Intel® Xeon Phi™ Coprocessors
Efficient Data Layout
Note on the Non-Uniform Control Flow
Saturating the Memory Bandwidth
Utilizing the Blocking Technique
Hardware Prefetching Overview
Utilizing Software Prefetching
Global Memory Size for Intel® Xeon Phi™ Coprocessors