Tutorial 8. Using CmBuffer
In the previous tutorials, we used CmSurface to store image data. This tutorial shows how to use CmBuffer to store generic data and how to access that data with oword-block reads and writes. The excerpts below come from the nbody example.
Host Program: Set up CmBuffers before enqueue
// CmBuffer represents a 1D surface in video memory.
// This function creates a CmBuffer in memory with linear layout.
CmBuffer *surf1 = nullptr;
cm_result_check(device->CreateBuffer(num_bodies * ELEMS_BODY * sizeof(float), surf1));
cm_result_check(surf1->WriteSurface((unsigned char *)h_pos, nullptr));
// Gets the input surface index.
SurfaceIndex *input_surface_idx1 = nullptr;
cm_result_check(surf1->GetIndex(input_surface_idx1));
CmBuffer *surf2 = nullptr;
cm_result_check(device->CreateBuffer(num_bodies * ELEMS_BODY * sizeof(float), surf2));
cm_result_check(surf2->WriteSurface((unsigned char *)h_vel, nullptr));
// Gets the input surface index.
SurfaceIndex *input_surface_idx2 = nullptr;
cm_result_check(surf2->GetIndex(input_surface_idx2));
CmBuffer *surf3 = nullptr;
cm_result_check(device->CreateBuffer(num_bodies * ELEMS_BODY * sizeof(float), surf3));
// Gets the output surface index.
SurfaceIndex *output_surface_idx1 = nullptr;
cm_result_check(surf3->GetIndex(output_surface_idx1));
CmBuffer *surf4 = nullptr;
cm_result_check(device->CreateBuffer(num_bodies * ELEMS_BODY * sizeof(float), surf4));
// Gets the output surface index.
SurfaceIndex *output_surface_idx2 = nullptr;
cm_result_check(surf4->GetIndex(output_surface_idx2));
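All four buffers are created with the same size expression, num_bodies * ELEMS_BODY * sizeof(float). Since the kernel accesses them with oword-block reads and writes, it can be useful to confirm on the host that this size is a whole number of owords (one oword is 16 bytes). The following is a minimal sketch in plain C++ that does not use the CM runtime. It assumes ELEMS_BODY is 4, and the names kElemsBody, kOwordBytes, buffer_bytes and is_oword_multiple are hypothetical helpers, not part of the nbody sample.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value for illustration; the real constant is defined by the nbody sample.
constexpr std::size_t kElemsBody = 4;    // assumption: floats stored per body
constexpr std::size_t kOwordBytes = 16;  // one oword is 16 bytes

// Size in bytes of one buffer, mirroring the expression passed to CreateBuffer.
inline std::size_t buffer_bytes(std::size_t num_bodies) {
    assert(num_bodies > 0);
    return num_bodies * kElemsBody * sizeof(float);
}

// True when the byte count is a whole number of owords.
inline bool is_oword_multiple(std::size_t bytes) {
    return bytes % kOwordBytes == 0;
}
```

Under these assumptions, any body count gives an oword-multiple size whenever one body occupies 16 bytes, because each body then fills exactly one oword.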
Host Program: Read output buffers after enqueue
// Reads the output surface content to the system memory using the CPU.
// The amount of data copied equals the size of the surface.
// It is a blocking call. The function will not return until the copy
// operation is completed.
// The dependent event "sync_event" ensures that the reading of the surface
// will not happen until its state becomes CM_STATUS_FINISHED.
cm_result_check(surf3->ReadSurface((unsigned char *)new_pos, sync_event));
cm_result_check(surf4->ReadSurface((unsigned char *)new_vel, sync_event));
Kernel Program: Buffer reads and writes
Here we only show block reads and block writes, each of which starts from a single address. CM also provides various scattered reads and writes that take a vector of addresses.
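The difference between the two access styles can be illustrated on the host. The sketch below is plain C++, not CM kernel code, and the functions block_read and scattered_read are hypothetical names introduced only for this illustration. A block access takes one starting position and covers contiguous elements, while a scattered access takes one position per element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Block access: one starting element index, then `count` contiguous elements.
inline std::vector<float> block_read(const std::vector<float> &buf,
                                     std::size_t start, std::size_t count) {
    assert(start + count <= buf.size());
    auto first = buf.begin() + static_cast<std::ptrdiff_t>(start);
    return std::vector<float>(first, first + static_cast<std::ptrdiff_t>(count));
}

// Scattered access: one element index per output element, in any order.
inline std::vector<float> scattered_read(const std::vector<float> &buf,
                                         const std::vector<std::size_t> &idx) {
    std::vector<float> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) {
        assert(i < buf.size());
        out.push_back(buf[i]);
    }
    return out;
}
```

For the nbody data, where each thread consumes a contiguous run of bodies, the block form is the natural fit.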
Read example
for (int i = 0; i < BODIES_CHUNK; i += BODIES_PER_RW) {
    read(INPOS, (thisMB_ID * BODIES_CHUNK + i) * BODY_SIZE,
         chunk.select<ELEMS_RW, 1>(ELEMS_BODY * i));
}
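To make the offset arithmetic in this loop concrete, here is a host-side stand-in written in plain C++. It is a sketch under stated assumptions, not the kernel itself: it assumes that the second argument of read is a byte offset, that BODY_SIZE is the size of one body in bytes, and that the constants have the values shown below. The function read_chunk is a hypothetical helper, and std::memcpy stands in for the oword-block read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed values for illustration only; the nbody sample defines the real ones.
constexpr int ELEMS_BODY = 4;                                            // floats per body
constexpr int BODY_SIZE = ELEMS_BODY * static_cast<int>(sizeof(float));  // bytes per body
constexpr int BODIES_PER_RW = 8;                                         // bodies per block access
constexpr int ELEMS_RW = BODIES_PER_RW * ELEMS_BODY;                     // floats per block access
constexpr int BODIES_CHUNK = 32;                                         // bodies per thread

// Copies the chunk owned by `thread_id` out of a byte buffer,
// ELEMS_RW floats per iteration, using the same offsets as the kernel loop.
inline std::vector<float> read_chunk(const std::vector<unsigned char> &surface,
                                     int thread_id) {
    std::vector<float> chunk(BODIES_CHUNK * ELEMS_BODY);
    for (int i = 0; i < BODIES_CHUNK; i += BODIES_PER_RW) {
        // Byte offset of body (thread_id * BODIES_CHUNK + i) in the buffer.
        std::size_t byte_offset =
            static_cast<std::size_t>(thread_id * BODIES_CHUNK + i) * BODY_SIZE;
        assert(byte_offset % 16 == 0);  // stays oword-aligned under these assumptions
        assert(byte_offset + ELEMS_RW * sizeof(float) <= surface.size());
        // Destination index ELEMS_BODY * i matches chunk.select<ELEMS_RW, 1>(ELEMS_BODY * i).
        std::memcpy(chunk.data() + ELEMS_BODY * i, surface.data() + byte_offset,
                    ELEMS_RW * sizeof(float));
    }
    return chunk;
}
```

With these values, each iteration moves 128 bytes (8 owords), and four iterations fill one thread's chunk of 32 bodies. The write loop uses the same offsets with the copy direction reversed.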
Write example
for (int i = 0; i < BODIES_CHUNK; i += BODIES_PER_RW) {
    write(OUTPOS, (thisMB_ID * BODIES_CHUNK + i) * BODY_SIZE,
          chunk.select<ELEMS_RW, 1>(ELEMS_BODY * i));
}