gfx_factory Class

Summary

The gfx_factory class implements the Factory concept of streaming_node to simplify the use of Intel® Graphics Technology for general-purpose computing in programs based on Intel® Threading Building Blocks (Intel® TBB).

CAUTION

The current implementation of gfx_factory does not allow memory buffer objects to be used concurrently. As a result, several streaming nodes customized with gfx_factory cannot be connected with each other directly.

Syntax

class gfx_factory;

Header

#define TBB_PREVIEW_FLOW_GRAPH_NODES 1
#define TBB_PREVIEW_FLOW_GRAPH_FEATURES 1
#include "tbb/gfx_factory.h"

Description

gfx_factory is responsible for low-level aspects of using Intel® processor graphics (further referred to as the target) from an Intel® TBB flow graph: uploading input data to the target, running a kernel there, and passing the results back to the graph.

gfx_factory is implemented on top of the API provided by the Intel® C++ Compiler to organize queued offload of user-defined kernel functions and data sharing between the CPU and the processor graphics. Intel® C++ Compiler 16.0 or newer is required in order to use gfx_factory.

For additional details about the underlying API, refer to the Intel® C++ Compiler User and Reference Guide section Optimization and Programming Guide > Intel® Graphics Technology > Programming for Intel® Graphics Technology > Overview: API-Based Offloading.

Kernel function

A kernel function to use with gfx_factory is a separate user-defined function with data-parallel sections written using Intel® Cilk™ Plus. The function has to be annotated with __declspec(target(gfx_kernel)) to be converted to a kernel entry point for processor graphics execution:

Example

static __declspec(target(gfx_kernel))
void vector_square(int *v, size_t n) {
    cilk_for(size_t i = 0; i < n; ++i) {
        v[i] = v[i] * v[i];
    }
}

GFX buffer

gfx_factory requires the use of the gfx_buffer template class, which is an abstraction over a data array. This class is responsible for sharing an array of data between the host and the target while compute offload kernels are being executed on processor graphics.

template <typename T>
class gfx_buffer {
public:

  typedef implementation-defined iterator;
  typedef implementation-defined const_iterator;

  typedef std::size_t size_type;

  gfx_buffer();
  gfx_buffer(size_type size);

  T* data();
  const T* data() const;

  size_type size() const;

  const_iterator cbegin() const;
  const_iterator cend() const;
  iterator begin();
  iterator end();

  T& operator[](size_type pos);
  const T& operator[](size_type pos) const;
};

The following table provides additional information on the members of this template class.

| Member | Description |
|---|---|
| iterator; const_iterator; | Implementation-defined iterator types. |
| gfx_buffer(); | The constructor to create an empty gfx_buffer. |
| gfx_buffer(size_type size); | The constructor to create a gfx_buffer of a certain size. The elements are value-initialized by calling T() for each. |
| T* data(); const T* data() const; | Return a pointer to the data storage array. |
| size_type size() const; | Return the number of elements in the buffer. |
| iterator begin(); const_iterator cbegin() const; | Return an iterator to the first element of the container. |
| iterator end(); const_iterator cend() const; | Return an iterator to the element following the last element of the container. |
| T& operator[](size_type pos); const T& operator[](size_type pos) const; | Return a reference to the element at the specified position. |

Device selector

streaming_node requires a device selector: a functor that selects a device for offloading a particular computation. However, since the underlying API only works with Intel processor graphics, it has no option to select a particular device. Because of this, you have to use the dummy device selector provided by the factory: gfx_factory::dummy_device_selector().

Example

See a simple vector squaring example below.

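#define TBB_PREVIEW_FLOW_GRAPH_NODES 1
#define TBB_PREVIEW_FLOW_GRAPH_FEATURES 1

#include <algorithm>  // std::all_of, std::fill
#include <iostream>

#include <cilk/cilk.h>

#include "tbb/flow_graph.h"
#include "tbb/gfx_factory.h"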

static __declspec(target(gfx_kernel))
void vector_square(int *v, size_t n) {
    cilk_for(size_t i = 0; i < n; ++i) {
        v[i] = v[i] * v[i];
    }
}

int main() {
    using namespace tbb::flow;

    typedef tuple< gfx_buffer<int>, size_t > kernel_args;
    typedef streaming_node< kernel_args, queueing, gfx_factory > gfx_node;

    graph g;
    gfx_factory factory(g);

    gfx_node squaring(g, vector_square, gfx_factory::dummy_device_selector(), factory);


    function_node< gfx_buffer<int> > 
        validation(g, unlimited, 
            [](const gfx_buffer<int>& buffer) {
                bool is_correct = std::all_of(buffer.cbegin(), buffer.cend(),
                                                  [](int i) {return i == 4; });
                if (is_correct) {
                    std::cout << "Results are correct." << std::endl;
                }
            });

    make_edge(output_port<0>(squaring), validation);

    const size_t array_size = 1000000;
    gfx_buffer<int> buffer(array_size);
    std::fill(buffer.begin(), buffer.end(), 2);

    squaring.set_args(port_ref<0, 1>);
    input_port<0>(squaring).try_put(buffer);
    input_port<1>(squaring).try_put(array_size);

    g.wait_for_all();
}

Public members

The gfx_factory class implements the Factory Concept defined by streaming_node.

For details, see the streaming_node reference.

namespace tbb {
namespace flow {

class gfx_factory {
public:

  typedef implementation-defined device_type;
  typedef implementation-defined kernel_type;

  gfx_factory(tbb::flow::graph& g);

  template <typename ...Args>
  void send_data(device_type device, Args&... args);

  template <typename ...Args>
  void send_kernel(device_type device, const kernel_type& kernel, Args&... args);

  template <typename FinalizeFn, typename ...Args>
  void finalize(device_type device, FinalizeFn fn, Args&... args);

  class dummy_device_selector;
};

}
}

The following table provides additional information on the members of this class.

| Member | Description |
|---|---|
| device_type; kernel_type; | Implementation-defined types. |
| gfx_factory(tbb::flow::graph& g); | Main constructor. Store a reference to the graph for synchronization between the graph and the device. |
| template <typename ...Args> void send_data(device_type device, Args&... args); | Share data with the device. |
| template <typename ...Args> void send_kernel(device_type device, const kernel_type& kernel, Args&... args); | Put kernel into the in-order offload queue. |
| template <typename FinalizeFn, typename ...Args> void finalize(device_type device, FinalizeFn fn, Args&... args); | Finalize the kernel run if no node successors exist. |
| class dummy_device_selector; | Dummy device selector functor. Has to be passed to the streaming_node constructor. |

See Also