Intel® C++ Compiler 18.0 Developer Guide and Reference
Intel Graphics Technology enables you to isolate a part of the level 3 cache and use it as high-bandwidth memory that program code can address explicitly. This memory is called shared local memory (SLM). SLM is useful for storing data that multiple threads in a group access frequently. Data allocated in SLM is not affected by level 3 cache misses. Each thread group is assigned its own portion of SLM, which you can access programmatically and share among the threads of that group. A thread group is a set of threads that share certain hardware-defined characteristics, such as the same hardware thread group ID and the same synchronization domain for the thread group barrier, among others.
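The following sketch is a CPU-side analogy only, written in standard C++11 with std::thread. It does not use the Intel Graphics Technology offload syntax, and every name in it (GroupBarrier, run_groups) is hypothetical. It models the sharing pattern described above: each thread group owns a private scratch buffer standing in for its SLM portion, threads in the group exchange data through that buffer, and a per-group barrier stands in for the thread group barrier, so no thread reads a slot before its owner has written it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the hardware thread group barrier: a reusable
// barrier built from a mutex and a condition variable (C++11 only).
class GroupBarrier {
public:
    explicit GroupBarrier(std::size_t count) : count_(count) {}

    // Blocks until all count_ threads of the group have arrived.
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t arrival_generation = generation_;
        if (++waiting_ == count_) {
            // Last thread in: reset the counter and release the whole group.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != arrival_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t count_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
};

// Hypothetical driver: runs num_groups groups of threads_per_group threads.
// Each group gets its own scratch buffer (the analogue of its SLM portion)
// and its own barrier. Returns, for every thread, the value written by its
// right-hand neighbor in the same group.
std::vector<int> run_groups(std::size_t num_groups, std::size_t threads_per_group) {
    assert(threads_per_group > 0 && "a group needs at least one thread");

    std::vector<std::vector<int>> scratch(num_groups, std::vector<int>(threads_per_group));
    std::vector<std::unique_ptr<GroupBarrier>> barriers;
    for (std::size_t g = 0; g < num_groups; ++g)
        barriers.push_back(std::unique_ptr<GroupBarrier>(new GroupBarrier(threads_per_group)));

    std::vector<int> result(num_groups * threads_per_group);
    std::vector<std::thread> workers;
    for (std::size_t g = 0; g < num_groups; ++g) {
        for (std::size_t t = 0; t < threads_per_group; ++t) {
            workers.emplace_back([&, g, t] {
                const std::size_t id = g * threads_per_group + t;
                scratch[g][t] = static_cast<int>(id);   // write own slot
                barriers[g]->arrive_and_wait();         // wait for the whole group
                const std::size_t neighbor = (t + 1) % threads_per_group;
                result[id] = scratch[g][neighbor];      // read a neighbor's slot
            });
        }
    }
    for (std::thread& w : workers) w.join();
    return result;
}
```

For example, run_groups(2, 4) returns {1, 2, 3, 0, 5, 6, 7, 4}: each thread reads the value its neighbor in the same group wrote, and neither group touches the other group's buffer. On the GPU, the analogous exchange would go through SLM rather than ordinary heap memory.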
Compared with main memory, SLM offers higher bandwidth, lower latency, and better performance for gather and scatter operations. Each half-slice provides up to 64 KB of SLM, half the size of the level 3 cache. The compiler reserves 16 bytes of those 64 KB, so slightly less than 64 KB is actually available.
A half-slice is the basic hardware building block. HD Graphics configurations differ in their number of half-slices: 1 for GT1, 2 for GT2, 4 for GT3, and so on.
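The capacity figures above reduce to simple arithmetic. The sketch below computes the usable SLM per half-slice (64 KB minus the 16 reserved bytes) and the raw SLM total for the GT1, GT2, and GT3 configurations listed; both are upper bounds, since the text gives "up to 64 KB". The names (GtConfig, half_slices, and so on) are hypothetical helpers, not compiler APIs, and configurations beyond GT3 are not modeled.

```cpp
#include <cstddef>

// Figures from the text: up to 64 KB of SLM per half-slice,
// of which the compiler reserves 16 bytes.
constexpr std::size_t kSlmPerHalfSliceBytes = 64 * 1024;  // 65536
constexpr std::size_t kCompilerReservedBytes = 16;

// Usable SLM in one half-slice after the reservation: 65536 - 16 = 65520.
constexpr std::size_t usable_slm_per_half_slice() {
    return kSlmPerHalfSliceBytes - kCompilerReservedBytes;
}

// Hypothetical enum covering only the configurations the text lists.
enum class GtConfig { GT1, GT2, GT3 };

// Half-slice counts from the text: 1 for GT1, 2 for GT2, 4 for GT3.
constexpr std::size_t half_slices(GtConfig c) {
    return c == GtConfig::GT1 ? 1 : (c == GtConfig::GT2 ? 2 : 4);
}

// Raw SLM capacity summed over all half-slices, before the reservation.
constexpr std::size_t total_slm_bytes(GtConfig c) {
    return half_slices(c) * kSlmPerHalfSliceBytes;
}

static_assert(usable_slm_per_half_slice() == 65520, "64 KB minus 16 bytes");
```

Because every function is constexpr, the figures are checked at compile time; for instance, total_slm_bytes(GtConfig::GT3) evaluates to 4 x 65536 = 262144 bytes.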
To use shared local memory, you need to understand:
- the extensions to the programming model
- the syntax for using shared local memory with Intel® Cilk™ Plus
- the semantics and restrictions
Intel® Cilk™ Plus is a deprecated feature in the Intel® C++ Compiler 18.0. An alternative for offloading to processor graphics is planned for a future release. For more information, see Migrate Your Application to use OpenMP* or Intel® TBB Instead of Intel® Cilk™ Plus.