SIMD Data Layout Templates (SDLT) is a C++11 template library providing containers that represent arrays of Plain Old Data objects (structs whose data members contain no pointers or references and that have no virtual functions) using layouts that enable generation of efficient SIMD (single instruction, multiple data) vector code. SDLT uses standard ISO C++11 code and does not require a special language or compiler to function. It does, however, take advantage of performance features (such as OpenMP* SIMD extensions and pragma ivdep) that may not be available in all compilers. It is designed to promote scalable SIMD vector programming. To use the library, specify SIMD loops and data layouts using the explicit vector programming model and SDLT containers, and let the compiler generate efficient SIMD code.
Many library interfaces employ generic programming, in which interfaces are defined by requirements on types and not specific types. The C++ Standard Template Library (STL) is an example of generic programming. Generic programming enables SDLT to be flexible yet efficient. The generic interfaces enable you to customize components to your specific needs.
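As a small illustration of generic programming in general (this is not SDLT code), the hypothetical function template average_x below is written against requirements rather than a specific type. It assumes only that the container provides size() and operator[], and that its elements expose a member named x:

```cpp
#include <cassert>
#include <cstddef>

struct Point3s
{
    float x;
    float y;
    float z;
};

// Hypothetical helper for illustration only: accepts any container type
// that provides size() and operator[] with elements exposing a member x.
template <typename Container>
float average_x(const Container& c)
{
    assert(c.size() > 0 && "average_x requires a non-empty container");
    float total = 0.0f;
    for (std::size_t i = 0; i < c.size(); ++i) {
        total += c[i].x;
    }
    return total / static_cast<float>(c.size());
}
```

The same template can be instantiated with std::vector<Point3s>, std::array<Point3s, N>, or any other type that meets those requirements, without the function naming any of them.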
The net result is that SDLT lets you specify a preferred SIMD data layout far more conveniently than restructuring your code around a new data structure to achieve effective vectorization, and it can improve performance at the same time.
C++ programs often represent an algorithm in terms of high-level objects. Many algorithms have a set of data that they need to process, and that dataset is commonly represented as an array of plain old data objects. Developers often store that array in a container from the C++ Standard Template Library, such as std::vector. For example:
struct Point3s
{
    float x;
    float y;
    float z;
    // helper methods
};
std::vector<Point3s> inputDataSet(count);
std::vector<Point3s> outputDataSet(count);
for (int i = 0; i < count; ++i) {
    Point3s inputElement = inputDataSet[i];
    Point3s result = // transformation of inputElement that is independent of other iterations
                     // can keep algorithm high level using object helper methods
    outputDataSet[i] = result;
}
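For concreteness, the placeholder transformation in this loop can be filled in. The following is a minimal compilable sketch, assuming a hypothetical helper method scaled and a hypothetical wrapper function scale_all; neither name comes from SDLT or from the original listing:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point3s
{
    float x;
    float y;
    float z;

    // Hypothetical helper method, used only for this illustration.
    Point3s scaled(float factor) const
    {
        return Point3s{x * factor, y * factor, z * factor};
    }
};

// Array of Structures version: the x, y, and z of one element are adjacent
// in memory, so consecutive x values are sizeof(Point3s) bytes apart.
void scale_all(const std::vector<Point3s>& inputDataSet,
               std::vector<Point3s>& outputDataSet,
               float factor)
{
    assert(inputDataSet.size() == outputDataSet.size());
    const int count = static_cast<int>(inputDataSet.size());
    for (int i = 0; i < count; ++i) {
        Point3s inputElement = inputDataSet[i];
        // The transformation depends only on element i.
        Point3s result = inputElement.scaled(factor);
        outputDataSet[i] = result;
    }
}
```

The loop body stays at the level of whole objects; only the storage behind inputDataSet and outputDataSet determines how the members are laid out in memory.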
When possible, a compiler may attempt to vectorize the loop above. However, the overhead of loading the Array of Structures (AOS) dataset into vector registers may outweigh any performance gain from vectorizing. Programs that fit this scenario can be good candidates for an SDLT container with a SIMD-friendly internal memory layout. SDLT containers provide accessor objects to import and export primitives between the underlying memory layout and the object's original representation. For example:
SDLT_PRIMITIVE(Point3s, x, y, z)
sdlt::soa1d_container<Point3s> inputDataSet(count);
sdlt::soa1d_container<Point3s> outputDataSet(count);
auto inputData = inputDataSet.const_access();
auto outputData = outputDataSet.access();
#pragma forceinline recursive
#pragma omp simd
for (int i = 0; i < count; ++i) {
    Point3s inputElement = inputData[i];
    Point3s result = // transformation of inputElement that is independent of other iterations
                     // can keep algorithm high level using object helper methods
    outputData[i] = result;
}
When a local variable inside the loop is initialized from or stored to an accessor using that loop's index, the compiler's vectorizer can access the underlying SIMD-friendly data format and, when possible, perform unit-stride loads. If the compiler can prove that nothing outside the loop can access the loop's local object, it can optimize its private representation of that object as Structure of Arrays (SOA). In our example, the container's underlying memory layout is also SOA, so unit-stride loads can be generated. The container also allocates aligned memory, and its accessor objects provide the compiler with the correct alignment information so that it can optimize code generation accordingly.
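To make the SOA idea concrete without depending on SDLT itself, the following plain C++ sketch stores each member in its own contiguous array and gathers or scatters whole objects by index. The class Point3sSoA and the names load, store, and scale_all are hypothetical and are not SDLT's implementation or interface. Unlike an SDLT container, this sketch makes no alignment guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point3s
{
    float x;
    float y;
    float z;
};

// Hypothetical stand-in for an SOA container: one contiguous array per member.
class Point3sSoA
{
public:
    explicit Point3sSoA(std::size_t count) : xs(count), ys(count), zs(count) {}

    std::size_t size() const { return xs.size(); }

    // Gather the three members of element i into a local object.
    Point3s load(std::size_t i) const
    {
        assert(i < size());
        return Point3s{xs[i], ys[i], zs[i]};
    }

    // Scatter a local object's members back into the per-member arrays.
    void store(std::size_t i, const Point3s& p)
    {
        assert(i < size());
        xs[i] = p.x;
        ys[i] = p.y;
        zs[i] = p.z;
    }

private:
    std::vector<float> xs, ys, zs;
};

// The loop body still works on whole Point3s objects, but consecutive
// iterations read and write adjacent floats in xs, ys, and zs.
void scale_all(const Point3sSoA& in, Point3sSoA& out, float factor)
{
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        Point3s p = in.load(i);
        Point3s r{p.x * factor, p.y * factor, p.z * factor};
        out.store(i, r);
    }
}
```

In this layout, element i and element i+1 of any one member are sizeof(float) bytes apart, which is the access pattern that unit-stride vector loads and stores require.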
This documentation is for SDLT version 2, which extends version 1 by introducing support for n-dimensional containers.
Backwards Compatibility
Public interfaces of version 2 are fully backward compatible with interfaces of version 1.
The backward compatibility includes:
Limitations on backward compatibility include:
Backward compatibility does not cover the internal implementation. The internal implementation of SDLT v1 was updated and unified with parts introduced in v2, so backward compatibility is not guaranteed for code that depends on internal interfaces.
Deprecated Interface | Deprecated in Version | Replaced By |
---|---|---|
sdlt::fixed_offset<> | v2 | sdlt::fixed<> |
sdlt::aligned_offset<> | v2 | sdlt::aligned<> |
Product and Performance Information |
---|
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201 |