GPU Overdrive

Karthik Nadig

GPUs are designed for high-performance graphics computing, but what if we could harness that power for general-purpose computing? CUDA is a parallel computing architecture developed by NVIDIA that enables exactly that. Developers can write programs in a C-like language (C for CUDA), OpenCL or DirectX Compute, and the code is compiled to run on GPUs that support CUDA. There is an entire website dedicated to general-purpose GPU computing, GPGPU.org.

Let's see how it affects the way code is written. Consider matrix multiplication: on a CPU, the basic algorithm for two N×N matrices takes O(N³) operations. Strassen's algorithm reduces this to O(N^log₂7), roughly O(N^2.81), but for now let's look at the basic algorithm. On a GPU each thread needs only one loop, because the two outer loops are replaced by threads running in parallel. Here is how it's done.

On a CPU

// A is N x P, B is P x M, C is N x M; C must start out zeroed
for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
        for (int k = 0; k < P; k++)
            C[i][j] += A[i][k] * B[k][j];

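Of the three loops, only the innermost one (the loop over k, on line 4 of the listing) is needed in the GPU version. The loops over i and j are replaced by one thread per element of C.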
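For readers who want to try the CPU version as is, here is a minimal self-contained sketch in plain C. It makes a few assumptions that are not in the listing above: the matrices are stored as flat row-major arrays of double, the dimensions are passed as parameters, and the function name matmul_cpu is made up for illustration. It also accumulates into a local variable, so C does not have to be zeroed beforehand.

```c
#include <stddef.h>

/* Hypothetical helper: multiply A (n x p) by B (p x m) into C (n x m).
 * All three matrices are flat, row-major arrays. */
void matmul_cpu(const double *A, const double *B, double *C,
                size_t n, size_t p, size_t m)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            double sum = 0.0;   /* start each output element from zero */
            for (size_t k = 0; k < p; k++)
                sum += A[i * p + k] * B[k * m + j];
            C[i * m + j] = sum;
        }
    }
}
```

The body of the two outer loops is exactly the work that a single GPU thread would do for its own (i, j) pair.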

On a GPU (using C for CUDA)

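// A is N x P, B is P x M, C is N x M; all stored row-major in device memory
// N, M and P are assumed to be compile-time constants
__global__ void fnMultiply(const float *A, const float *B, float *C)
{
    // each thread computes one element C[i][j]; i,j come from the thread index
    int i = threadIdx.x;
    int j = threadIdx.y;
    float sum = 0.0f;
    for (int k = 0; k < P; k++)
        sum += A[i * P + k] * B[k * M + j];
    C[i * M + j] = sum;
}
int main()
{
    // allocating A, B and C on the device and copying the data is omitted here
    // call by creating one block of N*M threads
    fnMultiply<<<1, dim3(N, M)>>>(A, B, C);
    return 0;
}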

In the GPU case there are N×M threads computing in parallel, assuming the hardware supports N×M threads in a single block. The example above uses one thread block of size N×M, but the number of threads per block is capped by the hardware, so larger matrices have to be spread over a grid of several blocks. Using multiple thread blocks together with shared memory can improve the overall performance further. See the samples in the CUDA SDK for an optimized version.
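The index arithmetic behind a grid of several thread blocks can be tried out without a GPU. What follows is a plain C emulation, not CUDA code: it runs the "threads" one after another on the CPU and does not use shared memory. The names matmul_grid_emulated, thread_body and block_dim are made up for this sketch. In real CUDA code the loop variables by, bx, ty and tx would correspond to the built-in blockIdx.y, blockIdx.x, threadIdx.y and threadIdx.x.

```c
#include <stddef.h>

/* Stand-in for one GPU thread: computes the single element C[i][j]. */
static void thread_body(const double *A, const double *B, double *C,
                        size_t n, size_t p, size_t m, size_t i, size_t j)
{
    if (i >= n || j >= m)
        return;                 /* this thread falls outside the matrix */
    double sum = 0.0;
    for (size_t k = 0; k < p; k++)
        sum += A[i * p + k] * B[k * m + j];
    C[i * m + j] = sum;
}

/* Serial emulation of launching a 2-D grid of block_dim x block_dim blocks.
 * A is n x p, B is p x m, C is n x m, all flat row-major arrays. */
void matmul_grid_emulated(const double *A, const double *B, double *C,
                          size_t n, size_t p, size_t m, size_t block_dim)
{
    /* round up so that partial blocks at the edges are still covered */
    size_t grid_rows = (n + block_dim - 1) / block_dim;
    size_t grid_cols = (m + block_dim - 1) / block_dim;

    for (size_t by = 0; by < grid_rows; by++)
        for (size_t bx = 0; bx < grid_cols; bx++)
            for (size_t ty = 0; ty < block_dim; ty++)
                for (size_t tx = 0; tx < block_dim; tx++)
                    /* global index = block index * block size + thread index */
                    thread_body(A, B, C, n, p, m,
                                by * block_dim + ty, bx * block_dim + tx);
}
```

The bounds check in thread_body matters whenever N or M is not a multiple of the block size: the grid is rounded up, so some threads in the edge blocks have no element to compute and must return without writing anything.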

Tags: CUDA, GPGPU, GPU, NVIDIA, Supercomputing

This entry was posted on Monday, August 17th, 2009 at 11:25.
