OpenCL™ Applications Developer Guide for Intel® Core™ and Intel® Xeon® Processors
Legal Information
Getting Help and Support
Introduction
About This Document
OpenCL™ Standard
Basic Concepts
Using Data Parallelism
Check-list for OpenCL™ Optimizations
Use Array Notation with int32 Indices: A[i][j]
Use Floating Point for Calculations
Note on Local Memory Use
Use Branching Accurately
Map Memory Objects (USE_HOST_PTR)
Prefer Buffers over Images
Use Lower Math Precision
Use Restrict Qualifier for Kernel Arguments
Tips and Tricks for Kernel Development
Why Is Optimizing Kernels Important?
Avoid Spurious Operations in Kernels
Avoid Handling Edge Conditions in Kernels
Use the Preprocessor for Constants
Prefer (32-bit) Signed Integer Data Types
Prefer Row-Wise Data Accesses
Use Built-In Functions
Avoid Extracting Vector Components
Task-Parallel Programming Model Hints
Common Mistakes in OpenCL™ Applications
Application-Level Optimizations
Avoid Needless Synchronization
Reuse Compilation Results with clCreateProgramWithBinary
Debugging OpenCL™ Kernels on Linux* OS
Enabling Debugging in OpenCL™ CPU Compiler and Runtime
Start a Debugging Session
Conditional Breakpoints on Work Items
Performance Debugging with Intel® SDK for OpenCL™ Applications
Performance Debugging Introduction
Host-Side Timing
Profiling Operations Using OpenCL™ Profiling Events
Comparing OpenCL™ and Native Code Performance
Getting Credible Performance Numbers
Tools for OpenCL™ Development
Coding for the Intel® Architecture Processors
Introduction for OpenCL™ Coding on Intel® Architecture Processors
Vectorization Basics for Intel® Architecture Processors
Vectorization: SIMD Processing Within a Work Group
Benefitting from Implicit Vectorization
Vectorizer Knobs
Targeting a Different CPU Architecture
Using Vector Data Types
Writing Kernels to Directly Target the Intel® Architecture Processors
Work-Group Size Considerations
Threading: Achieving Work-Group Level Parallelism
Efficient Data Layout
Using the Blocking Technique
Intel® Turbo Boost Technology Support
Global Memory Size