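Installing NVIDIA Docker Tooling
1. Installing the NVIDIA driver
First, edit /etc/modprobe.d/blacklist.conf to blacklist the open-source driver, which could otherwise be loaded alongside the NVIDIA driver. Add the following line: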
blacklist nouveau
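The edit above can also be scripted. The following is a minimal sketch, assuming a POSIX shell; `ensure_line` is a hypothetical helper name (not part of any tool) that appends the line only when it is missing, so running it twice does not duplicate the entry. The `update-initramfs -u` step in the usage comment is an assumption about Ubuntu systems where nouveau is loaded from the initramfs.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# ensure_line is a hypothetical helper written for this example.
ensure_line() {
    line=$1
    file=$2
    # -x: match the whole line, -F: treat the pattern as a fixed string, -q: no output
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (run as root), with the path from the text above:
#   ensure_line 'blacklist nouveau' /etc/modprobe.d/blacklist.conf
#   update-initramfs -u   # assumption: rebuild the initramfs so the blacklist applies at early boot
```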
Remove any previously installed NVIDIA drivers, then refresh the package sources:
$ sudo apt-get remove --purge 'nvidia-*'
$ sudo apt-get update
Look up a suitable driver version; here we choose 'nvidia-driver-535 - distro non-free':
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:*************************************************
vendor : NVIDIA Corporation
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-525-server - distro non-free
driver : nvidia-driver-525-open - distro non-free
driver : nvidia-driver-525 - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-535-open - distro non-free recommended
driver : nvidia-driver-535 - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
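When the list is long, the driver names can be filtered out of this output. This is a rough sketch that assumes the "driver : NAME - ..." layout shown above; `list_drivers` and `recommended_driver` are hypothetical helper names. Note that this guide installs nvidia-driver-535 rather than the entry flagged "recommended".

```shell
#!/bin/sh
# Print the package name from every "driver :" line read on stdin.
list_drivers() {
    awk '$1 == "driver" && $2 == ":" { print $3 }'
}

# Print only the package flagged "recommended", if there is one.
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3 }'
}

# Usage:
#   ubuntu-drivers devices | list_drivers
#   ubuntu-drivers devices | recommended_driver
```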
Install it with the following commands, then reboot:
$ sudo apt-get install nvidia-driver-535
$ sudo reboot
After the system restarts, run the following command to check that the driver is installed:
$ nvidia-smi
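For a scripted check, nvidia-smi can print just the driver version through its `--query-gpu` and `--format` options. The sketch below assumes the 535 driver chosen earlier; `driver_major` is a hypothetical helper that keeps only the part of the version string before the first dot.

```shell
#!/bin/sh
# Extract the major version from a driver version string such as 535.54.03.
# driver_major is a hypothetical helper written for this example.
driver_major() {
    printf '%s\n' "${1%%.*}"
}

# Usage on a machine where the driver is loaded:
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   [ "$(driver_major "$ver")" = "535" ] && echo "driver 535 is active"
```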
2. Installing CUDA
Based on the CUDA version reported by nvidia-smi, find the matching release in the cuda-toolkit-archive; here we choose 12.2.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2004-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
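After the packages are installed, the toolkit binaries are usually not on PATH yet. The sketch below assumes the install prefix is /usr/local/cuda-12.2, which is the usual location for these packages but is not stated in this guide; adjust CUDA_HOME if your layout differs.

```shell
#!/bin/sh
# Assumption: the toolkit lives under /usr/local/cuda-12.2 unless CUDA_HOME is already set.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda-12.2}
export CUDA_HOME
export PATH="$CUDA_HOME/bin:$PATH"
# Prepend lib64, keeping any existing LD_LIBRARY_PATH entries.
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Report the compiler version if nvcc is now visible.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin" >&2
fi
```

To make the setting persistent, the three export lines could be added to a shell startup file such as ~/.bashrc.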
3. Docker configuration
3.1. Installing Docker
# Docker installation docs: https://docs.docker.com/engine/install/
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
sudo systemctl --now enable docker
3.2. Installing nvidia-container-toolkit
To make the NVIDIA driver and CUDA available inside Docker containers, install nvidia-container-toolkit with the following commands:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
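Once Docker has restarted, it is worth confirming that the runtime was registered and that a container can see the GPU. The sketch below assumes that `nvidia-ctk runtime configure` wrote its entry to /etc/docker/daemon.json (its default target); `has_nvidia_runtime` is a hypothetical helper that only looks for the "nvidia" key in that file and does not validate the JSON.

```shell
#!/bin/sh
# Succeed if the given Docker daemon config mentions an "nvidia" runtime.
# has_nvidia_runtime is a hypothetical helper; it defaults to /etc/docker/daemon.json.
has_nvidia_runtime() {
    grep -q '"nvidia"' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}

# Usage:
#   has_nvidia_runtime && echo "nvidia runtime is configured"
#   # End-to-end check: run nvidia-smi inside a throwaway container.
#   sudo docker run --rm --gpus all ubuntu nvidia-smi
```

If the container prints the same GPU table as nvidia-smi on the host, the toolkit is working.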