CM (C for Media) Tutorial

Revision 6.0

Primary Author(s): Gang Chen and Guei-Yuan Lueh (with sources from current and past CM members)

Contributor(s): Kaiyu Chen, Michael Liao, Fang Liu, Wei Pan, Yuting Yang

Special thanks to our previous teammates in the UK, David Stuttard, Tim Renouf, Tim Corringham, and Stephen Thomas, who were the main developers of this LLVM-based compiler.

Intro: Comparing CM with CUDA/OpenCL

Similarities

  • Program for GPU acceleration: host program on CPU; kernel program on GPU.
  • Host API provided in C++. GPU kernel programming in a subset of C++.

Differences

  • Explicit SIMD programming model that allows varying SIMD width.
  • One CM thread is equivalent to one CUDA warp or one OpenCL subgroup and operates on a block of pixels instead of a single pixel.
    • A thread can access the entire vector register file instead of one lane of it, which lets developers achieve the most efficient register usage.
  • Predefined vector and matrix types.
    • Natural representation of media data
    • Parallelism expressed through vector and matrix operations
  • Exposes GEN hardware media-acceleration functions.
  • CM is well suited to applications that:
    • Need cross SIMD-lane operations
    • Need mixed SIMD width
    • Need to change data layout (e.g., transpose)
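
The differences above can be made concrete with a small sketch. The code below is ordinary host-side C++, not CM: Vec, Mat, add, lane_sum, lower_half, and transpose are hypothetical stand-ins invented for this illustration and are not part of the CM language or its API. It only mimics the idea that a single thread owns a whole block of data and applies whole-vector operations, cross-lane operations, width changes, and layout changes to it.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for illustration only; these are NOT the CM types.
template <typename T, std::size_t N>
using Vec = std::array<T, N>;

template <typename T, std::size_t R, std::size_t C>
using Mat = std::array<std::array<T, C>, R>;

// Explicit SIMD: one "thread" adds every lane of two vectors at once.
template <typename T, std::size_t N>
Vec<T, N> add(const Vec<T, N>& a, const Vec<T, N>& b) {
    Vec<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = a[i] + b[i];
    return out;
}

// Cross-lane operation: a horizontal sum over all lanes of one vector.
template <typename T, std::size_t N>
T lane_sum(const Vec<T, N>& v) {
    T total{};
    for (const T& x : v) total += x;
    return total;
}

// Mixed SIMD width: continue working on only the lower half of the lanes.
template <typename T, std::size_t N>
Vec<T, N / 2> lower_half(const Vec<T, N>& v) {
    static_assert(N % 2 == 0, "lane count must be even");
    Vec<T, N / 2> out{};
    for (std::size_t i = 0; i < N / 2; ++i) out[i] = v[i];
    return out;
}

// Data-layout change: transpose an R x C block into a C x R block.
template <typename T, std::size_t R, std::size_t C>
Mat<T, C, R> transpose(const Mat<T, R, C>& m) {
    Mat<T, C, R> out{};
    for (std::size_t r = 0; r < R; ++r)
        for (std::size_t c = 0; c < C; ++c)
            out[c][r] = m[r][c];
    return out;
}
```

In a per-lane model such as a CUDA thread or an OpenCL work-item, the horizontal sum and the transpose would require cooperation between lanes; in this block-owning model they are plain operations on data the thread already holds. The loops here are only an emulation and say nothing about how a CM compiler would map such operations to hardware.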