opencl_node enables execution of OpenCL* kernels on devices that have OpenCL support, such as integrated and discrete graphics processing units or CPUs. opencl_node simplifies the integration of OpenCL kernels into programs built on the Intel TBB Flow Graph and also handles memory buffer management in such programs.
template < typename JP, typename Factory, typename... Ports >
class opencl_node < tuple < Ports... >, JP, Factory >;

// As a Factory - default_opencl_factory is used
template < typename JP, typename... Ports >
class opencl_node < tuple < Ports... >, JP >;

// As a Factory - default_opencl_factory is used
// As a JP - queueing policy is used
template < typename... Ports >
class opencl_node < tuple < Ports... > >;
#define TBB_PREVIEW_FLOW_GRAPH_NODES 1
#define TBB_PREVIEW_FLOW_GRAPH_FEATURES 1
#include "tbb/flow_graph_opencl_node.h"
The "flow_graph_opencl_node.h" header is not included in "tbb/tbb.h" and "tbb/flow_graph.h". Due to this, you should include "flow_graph_opencl_node.h" directly.
opencl_node provides a simple interface for working with devices that have OpenCL support. opencl_node is implemented as a streaming_node that uses opencl_factory. opencl_factory hides all the complexity associated with kernel execution, synchronization of memory with the device, and result retrieval. opencl_factory implements the Factory concept and follows the streaming_node execution workflow.
opencl_factory uses two separate entities (Device Filter and Device Selector) for device initialization and execution management:
Entity | Usage model
---|---
Device Filter | Used to initialize the opencl_factory with a set of devices that will be available for kernel execution. All filtered devices must belong to the same OpenCL Platform. The default device filter includes all available devices from the first OpenCL Platform found.
Device Selector | Used to select the device for a particular opencl_node instance to execute its kernels. The default device selector selects the first device from the list of available devices constructed by the device filter. The selection is done for every kernel execution.
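For illustration, the following minimal sketch of a user-defined device selector simply returns the first device provided by the factory, which matches the default behavior; a more elaborate selector that prefers a GPU device is shown in the example later in this section.

// Minimal device selector sketch: pick the first device built by the device filter
class first_device_selector {
public:
    template <typename DeviceFilter>
    tbb::flow::opencl_device operator()( tbb::flow::opencl_factory<DeviceFilter>& f ) {
        // f.devices() returns the opencl_device_list constructed by the device filter
        return *( f.devices().begin() );
    }
};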
Among other things, opencl_node provides a set of helper classes that simplify working with OpenCL entities (devices, memory buffers, kernels, ND-ranges):
Entity | Description
---|---
opencl_buffer/opencl_subbuffer | A template class that is an abstraction over a strongly typed linear array. OpenCL kernels work with special memory objects allocated on the host; opencl_buffer and opencl_subbuffer encapsulate the logic of the memory transactions between the host and the target device.
opencl_device | A class that is an abstraction over an OpenCL device. It is used mainly inside Device Selector and Device Filter and has a public API to query information about devices.
opencl_device_list | A container of opencl_device instances.
opencl_program | A class that you can use to specify the type of the kernel, read the kernel from a file, or build the kernel.
opencl_program_type | An enumeration that provides the set of kernel types supported by opencl_node.
opencl_range | Enables opencl_node to support ranges (see the OpenCL clEnqueueNDRangeKernel() function for details).
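As a brief illustration of how these helpers are used together (the buffer size, the file name vector_operations.cl, and the kernel name cuber are taken from the example below):

// Create a strongly typed buffer and fill it on the host
tbb::flow::opencl_buffer<cl_int> buf( 10 );
std::fill( buf.begin(), buf.end(), 2 );

// Read and build kernels from an OpenCL C source file
tbb::flow::opencl_program<> program( "vector_operations.cl" );
auto kernel = program.get_kernel( "cuber" );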
Argument binding
By default, opencl_node binds the first input port to the first kernel argument, the second input port to the second kernel argument, and so on. The set_args method of opencl_node lets you bind a specific value to each kernel argument if that value does not change from run to run. To specify that a value should instead be taken from an input port, use the port_ref helper function; see its description and usage examples in the streaming_node section of the documentation.
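For example, the following sketch takes the value for the first kernel argument from input port 0 and binds the constant 10 to the second argument. The kernel name "scale" and its signature are hypothetical; the graph g, program, gpu_selector, opencl_vector typedef, and vector_size are reused from the example that follows.

// Hypothetical kernel: __kernel void scale( __global int* buf, int factor )
tbb::flow::opencl_node< std::tuple< opencl_vector > > scaler( g, program.get_kernel("scale"), gpu_selector );

// The first kernel argument comes from input port 0; the second is a fixed value
scaler.set_args( tbb::flow::port_ref<0>, 10 );
scaler.set_range({{ vector_size }});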
The example below shows the cuber and squarer example modified to offload part of the computation to a GPU.
Kernels

__kernel void cuber( __global int* buf ) {
    int idx = get_global_id(0);
    buf[idx] = buf[idx] * buf[idx] * buf[idx];
}

__kernel void squarer( __global int* buf ) {
    int idx = get_global_id(0);
    buf[idx] = buf[idx] * buf[idx];
}

Example
#define TBB_PREVIEW_FLOW_GRAPH_NODES 1
#define TBB_PREVIEW_FLOW_GRAPH_FEATURES 1
#include "tbb/flow_graph_opencl_node.h"

#include <algorithm>
#include <iostream>
#include <tuple>

typedef tbb::flow::opencl_buffer<cl_int> opencl_vector;

class gpu_device_selector {
public:
    template <typename DeviceFilter>
    tbb::flow::opencl_device operator()(tbb::flow::opencl_factory<DeviceFilter>& f) {
        // Set your GPU device if available to execute kernel on
        const tbb::flow::opencl_device_list &devices = f.devices();

        tbb::flow::opencl_device_list::const_iterator it = std::find_if(
            devices.cbegin(), devices.cend(),
            [](const tbb::flow::opencl_device &d) {
                cl_device_type type;
                d.info(CL_DEVICE_TYPE, type);
                return CL_DEVICE_TYPE_GPU == type;
            });

        if (it == devices.cend()) {
            std::cout << "Info: could not find any GPU devices. Choosing the first available device (default behaviour)." << std::endl;
            return *(f.devices().begin());
        } else {
            // Return GPU device from factory
            return *it;
        }
    }
};

int main() {
    const int vector_size = 10;
    int sum = 0;

    tbb::flow::graph g;
    tbb::flow::broadcast_node< opencl_vector > broadcast(g);

    // GPU computation part
    gpu_device_selector gpu_selector;

    tbb::flow::opencl_program<> program("vector_operations.cl");
    tbb::flow::opencl_node< std::tuple< opencl_vector > > vector_cuber(g, program.get_kernel("cuber"), gpu_selector);
    tbb::flow::opencl_node< std::tuple< opencl_vector > > vector_squarer(g, program.get_kernel("squarer"), gpu_selector);

    // Define kernel argument and problem size
    vector_cuber.set_args(tbb::flow::port_ref<0>);
    vector_cuber.set_range({{ vector_size }});
    vector_squarer.set_args(tbb::flow::port_ref<0>);
    vector_squarer.set_range({{ vector_size }});

    // Computation results join
    tbb::flow::join_node< std::tuple< opencl_vector, opencl_vector > > join_data(g);

    // CPU computation part
    tbb::flow::function_node< std::tuple< opencl_vector, opencl_vector > > summer(
        g, 1,
        [&](const std::tuple< opencl_vector, opencl_vector > &res) {
            opencl_vector vect_cubed = std::get<0>(res);
            opencl_vector vect_squared = std::get<1>(res);

            for (int i = 0; i < vector_size; i++) {
                sum += vect_cubed[i] + vect_squared[i];
            }
        });

    // Graph topology
    make_edge( broadcast, vector_cuber );
    make_edge( broadcast, vector_squarer );
    make_edge( vector_cuber, tbb::flow::input_port<0>(join_data) );
    make_edge( vector_squarer, tbb::flow::input_port<1>(join_data) );
    make_edge( join_data, summer );

    // Data initialization
    opencl_vector vec(vector_size);
    std::fill( vec.begin(), vec.end(), 2 );

    // Run the graph
    broadcast.try_put(vec);
    g.wait_for_all();

    std::cout << "Sum is " << sum << "\n";

    return 0;
}
namespace tbb {
namespace flow {

template < typename... Args >
class opencl_node;

template < typename JP, typename Factory, typename... Ports >
class opencl_node < tuple < Ports... >, JP, Factory >
    : public streaming_node < tuple < Ports... >, JP, Factory > {
public:
    typedef implementation-dependent kernel_type;

    opencl_node( graph &g, const kernel_type& kernel );
    opencl_node( graph &g, const kernel_type& kernel, Factory &f );
    template < typename DeviceSelector >
    opencl_node( graph &g, const kernel_type& kernel, DeviceSelector d, Factory &f );
};

template < typename JP, typename... Ports >
class opencl_node < tuple < Ports... >, JP >
    : public opencl_node < tuple < Ports... >, JP, opencl_info::default_opencl_factory > {
public:
    typedef implementation-dependent kernel_type;

    opencl_node( graph &g, const kernel_type& kernel );
    template < typename DeviceSelector >
    opencl_node( graph &g, const kernel_type& kernel, DeviceSelector d );
};

template < typename... Ports >
class opencl_node < tuple < Ports... > >
    : public opencl_node < tuple < Ports... >, queueing, opencl_info::default_opencl_factory > {
public:
    typedef implementation-dependent kernel_type;

    opencl_node( graph &g, const kernel_type& kernel );
    template < typename DeviceSelector >
    opencl_node( graph &g, const kernel_type& kernel, DeviceSelector d );
};

} // namespace flow
} // namespace tbb
The following table provides additional information on the members of this template class. All other public members are described in the base streaming_node class.
Member | Description
---|---
typename... Ports | The node's incoming/outgoing data types.
typename JP | Join Policy. See the description of the class join_node for details.
typename Factory | The device-specific Factory type. opencl_factory is used for this node by default.
opencl_node( graph &g, const kernel_type& kernel ); opencl_node( graph &g, const kernel_type& kernel, Factory &f ); template < typename DeviceSelector > opencl_node( graph &g, const kernel_type& kernel, DeviceSelector d ); template < typename DeviceSelector > opencl_node( graph &g, const kernel_type& kernel, DeviceSelector d, Factory &f ) | Main constructors. See streaming_node for details.
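As a brief usage sketch, assuming the graph g, program, opencl_vector typedef, and gpu_device_selector from the example above, the constructors can be used as follows:

// Default device selector and default_opencl_factory
tbb::flow::opencl_node< std::tuple< opencl_vector > > n1( g, program.get_kernel("cuber") );

// The same node, but with a user-provided device selector
tbb::flow::opencl_node< std::tuple< opencl_vector > > n2( g, program.get_kernel("cuber"), gpu_device_selector() );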