Hardware Prefetching Overview

The Intel® Xeon Phi™ coprocessor includes a hardware prefetcher, enabled by default, that prefetches from GDDR memory into the L2 cache and operates within a single memory page. It is triggered by an L2 miss to a new 4 KB page. Once a forward or backward stream direction is detected, the hardware prefetcher issues up to four cache-line prefetch requests of the same type as the triggering access. Because the prefetcher is sequential, only sequential L2 cache-line misses trigger it. This differs from the Intel® Xeon® CPU, which has a stride prefetcher capable of detecting regular forward or backward strides of up to 2 KB. On the Intel Xeon Phi coprocessor, strided access patterns therefore require software prefetching.
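
As an illustration of software prefetching for a strided loop, here is a minimal sketch in C. It is not taken from Intel's documentation. It assumes a GCC- or Clang-compatible compiler that provides the `__builtin_prefetch` builtin; with the Intel compiler the `_mm_prefetch` intrinsic serves a similar purpose. The function name `strided_sum` and the constant `PREFETCH_DISTANCE` are hypothetical, and the distance of 8 iterations is an arbitrary starting point that would need tuning on real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning constant: how many loop iterations ahead to prefetch.
 * The right value depends on memory latency and the work done per iteration. */
#define PREFETCH_DISTANCE 8

/* Sum every `stride`-th element of `data`, which holds `n` elements.
 * A sequential hardware prefetcher would not follow this access pattern,
 * so each iteration requests the element it will need later. */
double strided_sum(const double *data, size_t n, size_t stride)
{
    assert(stride > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i += stride) {
        size_t ahead = i + (size_t)PREFETCH_DISTANCE * stride;
        if (ahead < n) {
            /* Arguments: address, 0 = prefetch for read, 3 = keep in all cache levels. */
            __builtin_prefetch(&data[ahead], 0, 3);
        }
        sum += data[i];
    }
    return sum;
}
```

A call such as `strided_sum(buffer, count, 512)` reads one element every 4 KB when the elements are 8-byte doubles, so each access lands on a different page. The prefetch is only a hint: it does not change the result, and whether it improves performance has to be measured.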