This lesson is in the early stages of development (Alpha version)

Advanced features

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • Some pointers on further OpenMP features

Objectives
  • Understand further material and topics that can be used.

Using OpenMP on GPUs

GPUs are very efficient at parallel workloads in which data can be offloaded to the device and processed there, since communication between the GPU and main memory is limited by the interconnect (e.g. PCI Express).

NVIDIA GPUs are widely available and are usually programmed with NVIDIA’s own CUDA technology. This produces code that only works within NVIDIA’s ecosystem, limiting choice for the programmer and portability for others who want to use your code. Recent versions of OpenMP (since 4.0) have supported offload functionality.

Information from NVIDIA suggests this is possible but still a work in progress. Compilers have to be built with support for CUDA offload (LLVM/Clang is one such compiler).
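As a sketch of what building such code might look like (assuming a Clang installation built with the NVPTX offload target; the source filename jacobi.c is hypothetical and flag spellings can vary between compiler versions):

# Host-only build: the loop runs as an ordinary parallel for.
clang -O2 -fopenmp jacobi.c -o jacobi_cpu -lm

# GPU offload build (requires Clang built with NVPTX/CUDA support).
# -DGPU selects the target pragma; -fopenmp-targets names the device triple.
clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DGPU jacobi.c -o jacobi_gpu -lm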

Example code is shown below. Note that preprocessor directives such as #ifdef cannot appear inside a line-continued #pragma, so the host and GPU variants are written as two separate pragmas:

#ifdef GPU
#pragma omp target teams distribute parallel for reduction(max:error) \
        collapse(2) schedule(static,1)
#else
#pragma omp parallel for reduction(max:error)
#endif
for( int j = 1; j < n-1; j++ )
{
  for( int i = 1; i < m-1; i++ )
  {
    Anew[j][i] = 0.25 * ( A[j][i+1] + A[j][i-1] + A[j-1][i] + A[j+1][i] );
    error = fmax( error, fabs(Anew[j][i] - A[j][i]) );
  }
}
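A small, self-contained check (a sketch, separate from the Jacobi example above) shows whether the OpenMP runtime can see any offload devices. With no GPU, or with a compiler built without offload support, target regions simply fall back to running on the host:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* omp_get_num_devices() reports how many offload targets the
       runtime can see; 0 means target regions will run on the host. */
    int ndev = omp_get_num_devices();
    printf("Offload devices available: %d\n", ndev);

    int on_host = -1;
    /* map(from:...) copies the result back if the region ran on a device. */
    #pragma omp target map(from:on_host)
    {
        /* omp_is_initial_device() is 1 when executing on the host. */
        on_host = omp_is_initial_device();
    }
    printf("Target region ran on the %s\n", on_host ? "host" : "device");
    return 0;
}

Compile with -fopenmp; without offload support the program still runs correctly and reports that the target region executed on the host.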

If you are interested, come and talk to us and we can see how we can help.

Further material

Key Points

  • OpenMP is still an evolving interface to parallel code.