http://developer.nvidia.com/object/cuda_3_2_downloads.html#Linux
under "CUDA Toolkit for RedHat Enterprise Linux 5.5" (make sure to pick the architecture that matches your machine!)
Also grab the latest SDK, labeled "GPU Computing SDK code samples".
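If you're not sure which architecture your machine is, `uname -m` reports its hardware name. Here is a minimal sketch, assuming that `x86_64` means you want the 64-bit package and `i386` through `i686` the 32-bit one. The helper name `arch_bits` is hypothetical.

```shell
#!/bin/sh
# Map the output of `uname -m` to the toolkit architecture to download.
# Assumption: x86_64 -> 64-bit package, i386..i686 -> 32-bit package.
arch_bits() {
  case "$1" in
    x86_64) echo 64 ;;
    i[3-6]86) echo 32 ;;
    *) echo unknown ;;
  esac
}

machine=$(uname -m)
echo "uname -m reports '$machine': pick the $(arch_bits "$machine")-bit download"
```

The filenames used in the rest of this guide (`..._linux_64_...`, `NVIDIA-Linux-x86_64-...`) are the 64-bit variants.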
Now open a shell, cd into your Downloads directory, and run the CUDA toolkit installer like so:
> chmod u+x cudatoolkit_3.2.16_linux_64_rhel5.5.run
> ./cudatoolkit_3.2.16_linux_64_rhel5.5.run
Make sure to specify a sensible installation directory: either the default (installing there typically requires root), or e.g. "~/local" if you're doing a test install.
Same goes for the SDK:
> chmod u+x gpucomputingsdk_3.2.16_linux.run
> ./gpucomputingsdk_3.2.16_linux.run
Again, specify where to put the SDK. I went with ~/local/cuda/sdk, but keeping the SDK inside the toolkit directory may cause trouble if you later update the CUDA toolkit but want to keep the same SDK. Also, when prompted, point the SDK installer to the directory you just chose for the CUDA toolkit (e.g. /usr/local/cuda or ~/local/cuda).
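To actually use the toolkit (e.g. to call `nvcc`, or to run programs linked against the CUDA runtime), its `bin` and library directories generally need to be on your search paths. Here is a minimal sketch for your shell startup file, assuming the `~/local/cuda` test location from above and a 64-bit install with its libraries in `lib64` (a 32-bit install would use `lib` instead). `CUDA_HOME` is a name picked for this sketch, not something the installer defines.

```shell
# Hypothetical variable: where the toolkit was installed
# (adjust if you used the default /usr/local/cuda).
CUDA_HOME="${CUDA_HOME:-$HOME/local/cuda}"

# Put nvcc and the other toolkit binaries on the PATH.
export PATH="$CUDA_HOME/bin:$PATH"

# Let the dynamic linker find the 64-bit CUDA runtime libraries.
# Any existing LD_LIBRARY_PATH is appended without leaving a stray colon.
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

After sourcing this, `nvcc --version` should mention release 3.2.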
Now, the first thing you want to build is "deviceQuery", which will tell you whether your device driver matches your toolkit. It's really important that these two match: otherwise most of the simple "hello world" programs will quietly skip all the code meant for the device and run only the host code! So, starting from your SDK root installation directory, e.g. ~/local/cuda/sdk:
> make
This builds a shared library needed by most of the SDK tools. Next,
> cd C/src/deviceQuery
> make
Now deviceQuery should be ready to use.
> cd ../../bin/linux/release
> ./deviceQuery
If your driver is up to date, this should output something like:
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "Quadro FX 3800"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 1.3
Total amount of global memory: 1073020928 bytes
Multiprocessors x Cores/MP = Cores: 24 (MP) x 8 (Cores/MP) = 192 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.20 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device is using TCC driver mode: No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 1, Device = Quadro FX 3800
PASSED
Press <Enter> to Quit...
-----------------------------------------------------------
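Otherwise (e.g. if the driver and runtime versions reported by deviceQuery differ, or no CUDA device is found), you may have to install an updated device driver. Drivers can be found at: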
http://www.nvidia.com/Download/index.aspx
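If you'd rather not eyeball the output, deviceQuery's result can be checked from a script: a good run ends with a line reading `PASSED`, and the one-line summary reports both the driver and runtime versions. Here is a minimal sketch, assuming the output format shown above; the function names are hypothetical, and the strict equality test simply mirrors the advice above that the two versions should match. Redirecting stdin from `/dev/null` should keep the final "Press..." prompt from waiting for a keypress.

```shell
#!/bin/sh
# Succeeds if the captured deviceQuery output has a line that is exactly "PASSED".
device_query_passed() {
  printf '%s\n' "$1" | grep -qx 'PASSED'
}

# Succeeds if the summary line reports identical driver and runtime versions,
# e.g. "... CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, ..."
versions_match() {
  drv=$(printf '%s\n' "$1" | sed -n 's/.*CUDA Driver Version = \([0-9.]*\).*/\1/p')
  rt=$(printf '%s\n' "$1" | sed -n 's/.*CUDA Runtime Version = \([0-9.]*\).*/\1/p')
  [ -n "$drv" ] && [ "$drv" = "$rt" ]
}

# Run from the directory containing the deviceQuery binary (C/bin/linux/release).
if [ -x ./deviceQuery ]; then
  out=$(./deviceQuery </dev/null)
  if device_query_passed "$out" && versions_match "$out"; then
    echo "deviceQuery passed; driver and runtime versions match"
  else
    echo "deviceQuery failed or versions differ: consider updating the driver"
  fi
fi
```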
NB! You need root permissions to install updated drivers, and if you're not comfortable with something that could break your X graphics settings, you may want to consult a system administrator.
That said, the installer asks very few questions and generally seems well-behaved. You will, however, first need to shut down X, which will obviously close this browser window along with every other X application!
There are many ways to shut down X; one is to switch to runlevel 3 (multi-user mode without a graphical login):
> sudo init 3
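Before running the driver installer, you may want to confirm that no X server is left running. Here is a minimal sketch, assuming the server process shows up under the command name `X` or `Xorg`; the helper name `x_in_list` is hypothetical.

```shell
#!/bin/sh
# Succeeds if a list of command names (one per line) contains an X server.
# Assumption: the server runs as "X" or "Xorg".
x_in_list() {
  printf '%s\n' "$1" | grep -qxE 'X|Xorg'
}

# `ps -eo comm=` prints the command name of every process, without a header.
if x_in_list "$(ps -eo comm=)"; then
  echo "X still seems to be running; stop it before installing the driver"
else
  echo "no X server found"
fi
```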
Once you're logged in at the text console, cd to your Downloads directory and (as usual) run:
> chmod u+x NVIDIA-Linux-x86_64-260.19.44.run
> sudo ./NVIDIA-Linux-x86_64-260.19.44.run
Once the driver installation has finished, start X again:
> startx
and you should now be able to build and run CUDA SDK samples as well as tutorials such as these:
http://llpanorama.wordpress.com/2008/05/21/my-first-cuda-program/
http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/
(note that the latter requires you to build 'cutil', which can be found in sdk/C/common)
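To confirm that the freshly installed driver is the one actually loaded, you can compare the kernel module version with the installer you ran (260.19.44 above). Here is a minimal sketch, assuming `/proc/driver/nvidia/version` exists and its first line has the usual form `NVRM version: NVIDIA UNIX x86_64 Kernel Module  <version>  <build date>`. The function name `loaded_driver_version` is hypothetical.

```shell
#!/bin/sh
# Extract the driver version from the text of /proc/driver/nvidia/version.
loaded_driver_version() {
  printf '%s\n' "$1" | sed -n 's/.*Kernel Module *\([0-9.]*\).*/\1/p'
}

expected=260.19.44   # version of the NVIDIA-Linux-x86_64-*.run installer used above
if [ -r /proc/driver/nvidia/version ]; then
  loaded=$(loaded_driver_version "$(cat /proc/driver/nvidia/version)")
  if [ "$loaded" = "$expected" ]; then
    echo "driver $loaded is loaded"
  else
    echo "loaded driver is '$loaded', expected $expected"
  fi
fi
```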
Good luck!