Overview

Training deep learning models is computationally expensive: updating a model’s trainable parameters requires a very large number of calculations, mostly matrix operations. A CPU is not well suited to this workload because it offers limited parallelism, so training even simple DNNs can take a long time.

An Nvidia GPU is a great alternative for training DNNs: it contains thousands of cores designed for parallel execution, which makes training much faster. To make use of that power, we also need to install CUDA, a parallel computing platform and programming model designed for high-performance computing.

In this article, we’ll explain how to set up the Nvidia driver and install the matching CUDA version.

Check compatibility

Before installing CUDA, you should first verify that you have a CUDA-capable GPU. To do so, you can run:

lspci | grep -i nvidia

If the command prints nothing, update the PCI hardware database that Linux maintains by running:

sudo /sbin/update-pciids

and rerun the previous lspci command. If your graphics card is from NVIDIA and it is listed in https://developer.nvidia.com/cuda-gpus, then your GPU is CUDA-capable.
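The check above can also be wrapped in a small script. The following is a minimal sketch, assuming lspci comes from the pciutils package; nvidia_devices is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report whether lspci lists an NVIDIA device.
# nvidia_devices is a hypothetical helper, not part of any NVIDIA tool.

# Reads lspci-style text on stdin and prints only the lines mentioning NVIDIA.
nvidia_devices() {
    grep -i 'nvidia' || true   # grep exits 1 when nothing matches; treat that as "no lines"
}

if command -v lspci >/dev/null 2>&1; then
    found="$(lspci | nvidia_devices)"
    if [ -n "$found" ]; then
        echo "NVIDIA device(s) detected:"
        echo "$found"
    else
        echo "No NVIDIA device listed; try 'sudo /sbin/update-pciids' and rerun."
    fi
else
    echo "lspci not found; install the pciutils package first."
fi
```

A listed device only shows that the card is from NVIDIA; whether it is CUDA-capable still has to be confirmed against the list linked above.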

Install Nvidia Driver

In order to use CUDA, we should first install the recommended Nvidia driver. To do so, we remove any existing Nvidia drivers, refresh the package lists, and install the recommended version using the ubuntu-drivers autoinstall command:

# Quote the pattern so the shell does not expand it against files in the current directory
sudo apt-get remove --purge '^nvidia-.*'
sudo apt update
sudo ubuntu-drivers autoinstall # Installs the recommended version for your system

# Reboot the system for the changes to take effect
sudo reboot
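To preview which driver autoinstall would choose, ubuntu-drivers devices lists the candidate drivers and flags one of them as recommended. The sketch below picks that line out. It assumes the usual "driver : <package> - ... recommended" layout of the command's output, and recommended_driver is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the driver package that ubuntu-drivers marks as recommended.
# Assumption: lines look like
#   driver   : nvidia-driver-560 - third-party non-free recommended

# Reads `ubuntu-drivers devices` output on stdin and prints the recommended package name.
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices 2>/dev/null | recommended_driver
else
    echo "ubuntu-drivers not found (it ships in the ubuntu-drivers-common package)."
fi
```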

Now, you can verify the installation by running nvidia-smi. It should print a table listing the driver version, the highest CUDA version that the driver supports, and your GPU(s).
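For a scripted check, nvidia-smi can print selected fields instead of the full table. The sketch below reads the driver version through the --query-gpu and --format options; driver_major is a hypothetical helper that keeps only the major version, which is the number this article later compares against the CUDA installer.

```shell
#!/usr/bin/env bash
# Sketch: read the installed driver version in a script-friendly form.

# Prints the major version of a dotted version string ("560" for "560.35.05").
driver_major() {
    printf '%s\n' "${1%%.*}"
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU; keep the first, since all GPUs share the same driver.
    version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    echo "Driver version: $version (major: $(driver_major "$version"))"
else
    echo "nvidia-smi not found; the driver is probably not installed."
fi
```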

Install CUDA toolkit

In order to start a fresh installation and avoid conflicts, we should first remove any CUDA toolkit that may already be installed on our system. We can do this by running:

dpkg -l | grep cuda # Check if there are any CUDA packages installed
sudo dpkg --purge $(dpkg -l | grep cuda | awk '{print $2}') # Remove all cuda packages

# Clean up potential residual files
sudo apt autoremove
sudo apt clean

# Remove the NVIDIA CUDA repository (-f avoids an error if no such file exists)
sudo rm -f /etc/apt/sources.list.d/cuda*.list
sudo apt update

# Verify removal
which nvcc
sudo rm -rf /usr/local/cuda* # [Optional] If the path to nvcc is still shown

sudo reboot
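After the cleanup, it can be useful to confirm that no CUDA packages are left behind. The sketch below filters dpkg -l output on the package name column only, so a package whose description merely mentions CUDA is not reported. leftover_cuda_packages is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list packages whose NAME contains "cuda", together with their dpkg status.
# Status "ii" means installed; "rc" means removed with config files left behind.

# Reads `dpkg -l` output on stdin and prints "<status> <package>" per matching line.
leftover_cuda_packages() {
    # Header lines start with an uppercase letter, "|" or "+", so they are skipped.
    awk '$1 ~ /^[a-z][a-zA-Z]/ && $2 ~ /cuda/ { print $1, $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    remaining="$(dpkg -l | leftover_cuda_packages)"
    if [ -z "$remaining" ]; then
        echo "No CUDA packages remain."
    else
        echo "Still present:"
        echo "$remaining"
    fi
fi
```

Entries in the "rc" state are the ones that dpkg --purge clears.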

Now, to install the new version, we can follow the download guide at https://developer.nvidia.com/cuda-downloads.

Important: we should first verify that the CUDA version we are installing is compatible with our Nvidia driver (560 in our case). To avoid mistakes, follow the detailed installation guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#local-repo-installation-for-ubuntu.

To install the CUDA toolkit 12.6 on Ubuntu 20.04, we should follow the steps below:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda-repo-ubuntu2004-12-6-local_12.6.3-560.35.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-6-local_12.6.3-560.35.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-6

Note: the installer file name includes the driver version it was built against (…-560.35.05-…), which matches our Nvidia driver. If you have a different driver, the appropriate CUDA version might differ.
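This comparison can be scripted. The sketch below extracts the driver version from the installer file name and compares it with the installed driver. Both helper names are hypothetical, and treating "installed driver at least as new as the one named in the installer" as the compatibility rule is a simplifying assumption; NVIDIA's CUDA release notes remain the authority on supported driver versions.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed driver with the one named in the CUDA installer.

# Extracts "560.35.05" from a name such as
# cuda-repo-ubuntu2004-12-6-local_12.6.3-560.35.05-1_amd64.deb
# (if the name has a different shape, sed prints it back unchanged).
installer_driver_version() {
    printf '%s\n' "$1" | sed -E 's/.*_[0-9.]+-([0-9.]+)-[0-9]+_[a-z0-9]+\.deb$/\1/'
}

# Succeeds when version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installer="cuda-repo-ubuntu2004-12-6-local_12.6.3-560.35.05-1_amd64.deb"
bundled="$(installer_driver_version "$installer")"

if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if [ -z "$installed" ]; then
        echo "Could not read the installed driver version."
    elif version_ge "$installed" "$bundled"; then
        echo "Driver $installed is at least $bundled."
    else
        echo "Driver $installed is older than $bundled; check the CUDA release notes."
    fi
fi
```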

Now, to verify the installation, we can run:

nvcc --version

Important: at this point the CUDA toolkit should be installed; however, you will probably get the following output:

Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit

Do NOT run sudo apt install nvidia-cuda-toolkit: it would install CUDA a second time, probably in a different version, and cause conflicts. Instead, we should complete the following post-installation actions:

# Open ~/.bashrc in an editor
vim ~/.bashrc

# Add the following variables (change the version to match yours)
export PATH=/usr/local/cuda-12.6/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Reload the configuration in the current shell
. ~/.bashrc

Now, the command nvcc --version should print the installed CUDA version.
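The ${VAR:+:${VAR}} form used in the export lines appends ":$VAR" only when the variable is already set and non-empty. This avoids a trailing colon, which matters for LD_LIBRARY_PATH because the dynamic linker treats an empty entry as the current directory. The sketch below isolates that expansion in a hypothetical helper, prepend_dir, to show both cases.

```shell
#!/usr/bin/env bash
# Sketch: the same parameter expansion as in the export lines, isolated for illustration.

# Prints <dir> followed by ":<current>" only when <current> is non-empty.
prepend_dir() {
    local dir="$1" current="$2"
    printf '%s\n' "${dir}${current:+:${current}}"
}

prepend_dir /usr/local/cuda-12.6/lib64 ""              # prints /usr/local/cuda-12.6/lib64
prepend_dir /usr/local/cuda-12.6/bin "/usr/bin:/bin"   # prints /usr/local/cuda-12.6/bin:/usr/bin:/bin
```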

Test CUDA is working

NVIDIA maintains a GitHub repository of CUDA samples that we can use to test whether our CUDA installation is working correctly. To do so, we can run the following:

git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/6_Performance/LargeKernelParameter
make
./LargeKernelParameter

If CUDA is working correctly, we should get output similar to the following (the exact timings will vary):

Kernel 4KB parameter limit - time (us):75.7728
Kernel 32,764 byte parameter limit - time (us):55.2884
Test passed!
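Since the timings differ between machines, a scripted check should look only for the final "Test passed!" line. The following is a minimal sketch, assuming it is run from the directory where the sample was built; sample_passed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the sample succeeded from its output alone.

# Reads the sample's output on stdin; succeeds if it contains "Test passed!".
sample_passed() {
    grep -q 'Test passed!'
}

if [ -x ./LargeKernelParameter ]; then
    if ./LargeKernelParameter | sample_passed; then
        echo "CUDA sample ran successfully."
    else
        echo "CUDA sample did not report success."
    fi
fi
```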

Conclusion

In this article, we have demonstrated how to correctly set up and test the Nvidia driver and the CUDA toolkit on an Ubuntu machine. We have also highlighted some of the problems that one can encounter during this setup.
If you find this article useful, please let us know :)

A Guide to Setting Up Nvidia Driver and CUDA Toolkit On Ubuntu (20.04)
