Configuring NVIDIA cuDNN and CUDA on Windows

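waifu2x has two processing modes, CPU and cuDNN. The default CPU mode is slow, while cuDNN mode runs on the graphics card and processes images much faster. cuDNN is generally not included with the system, so we have to set it up ourselves.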

Tools and software needed (download links for some of them are given in the article):

  • An NVIDIA graphics card (a GTX 950M is used for demonstration here)
  • TechPowerUp GPU-Z
  • Visual Studio (preferably 2015 or later)
  • The latest NVIDIA graphics driver
  • CUDA
  • cuDNN

Step 1: Check whether the graphics card supports CUDA

Use TechPowerUp GPU-Z for this.

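Look at the part marked in the screenshot (the CUDA checkbox in GPU-Z's "Computing" row). If it is ticked, your card supports CUDA and you can move on to the next step.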

Step 2: Update the graphics driver

To use the latest CUDA and cuDNN, it is safer to upgrade the driver first.

Go to the NVIDIA (China) website and download the driver for your card model.
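If you are unsure of the exact card model or which driver is currently installed, nvidia-smi (shipped with the NVIDIA driver) can report both. The Python sketch below assumes nvidia-smi is reachable on PATH; with some driver versions it may instead live in C:\Program Files\NVIDIA Corporation\NVSMI. The helper names parse_gpu_query and query_gpus are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_query(csv_text):
    """Parse 'name, driver_version' lines (one per GPU) into tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma: the driver version itself contains no comma
        name, driver = [field.strip() for field in line.rsplit(",", 1)]
        gpus.append((name, driver))
    return gpus


def query_gpus():
    """Ask nvidia-smi for GPU name and driver version; [] if it is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_query(result.stdout)
```

Calling query_gpus() returns one (name, driver version) pair per card, which tells you which model to pick on the download page and whether the installed driver is already recent.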

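Before updating, it is a good idea to back up the current graphics driver. Then run the driver installer and choose "NVIDIA Graphics Driver".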

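Both options install the same graphics driver; the first one ("NVIDIA Graphics Driver and GeForce Experience") is skipped only because it also installs GeForce Experience.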

Step 3: Install CUDA

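Go to the NVIDIA (China) website and download the CUDA Toolkit. The latest version is recommended; make sure you pick the build for your operating system.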

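The installer is large. After downloading, run it and note the extraction path (you may need it later). Once extraction finishes, choose the Custom installation option.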

Then tick the components you want (the screenshot below can serve as a reference).

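I did not tick Visual Studio Integration, because on my system the installation failed whenever it was ticked. That is just my setup, though: you can tick it first, and if the installation fails, untick it and follow the Visual Studio Integration troubleshooting section below.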

It later turned out that my Visual Studio version was simply too old.

Open cmd and run "nvcc -V" to check whether the installation succeeded.
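The same check can be scripted, for example to confirm that the CUDA release matches the cuDNN build you download in Step 4. This is a minimal Python sketch, assuming nvcc is on PATH (the installer normally adds the CUDA bin directory) and that its output contains a line such as "Cuda compilation tools, release 10.1, V10.1.105"; the helper names are hypothetical.

```python
import re
import shutil
import subprocess

# nvcc -V prints e.g. "Cuda compilation tools, release 10.1, V10.1.105"
_RELEASE_RE = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_release(output):
    """Return (major, minor) from `nvcc -V` output, or None if not found."""
    match = _RELEASE_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc -V` and parse it; return None when nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=True)
    return parse_nvcc_release(result.stdout)
```

A result of None means nvcc was not found, which usually points to a failed install or a PATH that has not been refreshed (open a new cmd window after installing).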

If nothing went wrong, the following verification step can usually be skipped.

Open the CUDA Samples directory in Visual Studio (mine is D:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.1), find the solution file that matches your Visual Studio version, and build it.
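For those who prefer the command line, the following Python sketch picks the solution file for a given Visual Studio version and builds it with MSBuild. The "<name>_vs<year>.sln" naming (for example Samples_vs2017.sln at the top level, or deviceQuery_vs2017.sln under 1_Utilities\deviceQuery) is an assumption based on the CUDA 10.x samples layout, so check your own directory; find_vs_solution, msbuild_command and build are hypothetical helper names.

```python
import os
import subprocess


def find_vs_solution(directory, name, vs_year):
    """Return the path of <name>_vs<vs_year>.sln in directory, or None.

    The naming pattern is an assumption based on the CUDA 10.x samples.
    """
    path = os.path.join(directory, "%s_vs%d.sln" % (name, vs_year))
    return path if os.path.isfile(path) else None


def msbuild_command(solution, configuration="Release", platform="x64"):
    """Command line for MSBuild; run it from a Developer Command Prompt
    so that msbuild.exe is on PATH."""
    return [
        "msbuild",
        solution,
        "/p:Configuration=" + configuration,
        "/p:Platform=" + platform,
    ]


def build(solution):
    """Build the solution, raising CalledProcessError if MSBuild fails."""
    subprocess.run(msbuild_command(solution), check=True)
```

Building only the deviceQuery and bandwidthTest solutions this way is much faster than building the whole sample collection.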

Finally, run the compiled deviceQuery.exe and bandwidthTest.exe from cmd to check that the GPU is working normally. The output on my machine was:

>bandwidthTest.exe
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 950M
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     1.7

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     1.7

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     68.7

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
>deviceQuery.exe
deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 950M"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2048 MBytes (2147483648 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            928 MHz (0.93 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 4 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS
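Reading these outputs by eye is enough, but the check can also be scripted. This Python sketch runs a compiled sample, reads the final "Result = ..." line and extracts the compute capability from deviceQuery's output (5.0 for the GTX 950M above). The helper names are hypothetical, and the location of the built executables (typically bin\win64\Release under the CUDA Samples directory for 10.x) is an assumption.

```python
import re
import subprocess

# deviceQuery prints e.g. "CUDA Capability Major/Minor version number:    5.0"
_CAPABILITY_RE = re.compile(r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)")
# Both samples end with a line such as "Result = PASS"
_RESULT_RE = re.compile(r"^Result = (\w+)", re.MULTILINE)


def sample_result(output):
    """Return the word after 'Result =' (e.g. 'PASS'), or None if absent."""
    match = _RESULT_RE.search(output)
    return match.group(1) if match else None


def compute_capabilities(device_query_output):
    """Return one (major, minor) tuple per device listed by deviceQuery."""
    return [(int(a), int(b)) for a, b in _CAPABILITY_RE.findall(device_query_output)]


def run_sample(exe_path, timeout=120):
    """Run a compiled sample and return its stdout. An empty stdin is passed
    so the process cannot block waiting for a key press."""
    completed = subprocess.run(
        [exe_path], input="", capture_output=True, text=True, timeout=timeout
    )
    return completed.stdout
```

Anything other than "PASS" from either sample is worth investigating before moving on to cuDNN.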

Installation failure caused by Visual Studio Integration

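Leave Visual Studio Integration unticked, then open the CUDA extraction directory. Do not close the CUDA installer while doing this, since the extracted files may be deleted when it exits.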

Copy the CUDAVisualStudioIntegration folder inside it to another location and keep it.

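Check the detailed error message in Visual Studio and locate the directory it refers to. In my case the error pointed to C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations\CUDA 10.1.props, so the directory was C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations.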

Copy all files from extras\visual_studio_integration\MSBuildExtensions in the folder you saved earlier into that directory, and the problem is solved.
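The copy can also be scripted. A minimal Python sketch, assuming the CUDAVisualStudioIntegration folder was saved as described above and the target is the BuildCustomizations directory from your own error message; copy_msbuild_extensions is a hypothetical helper, and writing under C:\Program Files (x86) normally requires an elevated (administrator) prompt.

```python
import os
import shutil


def copy_msbuild_extensions(integration_dir, build_customizations_dir):
    """Copy every file in <integration_dir>/extras/visual_studio_integration/MSBuildExtensions
    into build_customizations_dir and return the copied file names."""
    src = os.path.join(integration_dir, "extras", "visual_studio_integration", "MSBuildExtensions")
    if not os.path.isdir(build_customizations_dir):
        # shutil.copy2 would otherwise treat a missing directory path as a file name
        raise FileNotFoundError(build_customizations_dir)
    copied = []
    for name in sorted(os.listdir(src)):
        path = os.path.join(src, name)
        if os.path.isfile(path):
            # copy2 keeps timestamps; same-named files in the target are overwritten
            shutil.copy2(path, build_customizations_dir)
            copied.append(name)
    return copied
```

The returned list makes it easy to confirm that the .props file named in the error message was among the copied files.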

If you want Visual Studio to offer an option for creating new CUDA projects, see here.

Step 4: Install cuDNN

Go to the official website and download the cuDNN version that matches your CUDA version.

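Copy everything under cuda\ in the downloaded archive into the CUDA installation directory, merging its bin, include and lib folders into the existing ones. Mine is D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1.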
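The merge can be done by hand in Explorer; the Python sketch below does the same thing, copying every entry of the extracted cuda folder into the CUDA directory while keeping the files already there. It assumes Python 3.8+ (for copytree's dirs_exist_ok) and, when no target is given, falls back to the CUDA_PATH environment variable, which the CUDA installer normally sets (worth checking with "echo %CUDA_PATH%" in cmd). install_cudnn is a hypothetical name.

```python
import os
import shutil


def install_cudnn(extracted_cuda_dir, cuda_home=None):
    """Merge everything inside the extracted cuDNN 'cuda' folder into cuda_home.

    When cuda_home is None, fall back to the CUDA_PATH environment variable
    (assumed to be set by the CUDA installer). Requires Python 3.8+.
    """
    cuda_home = cuda_home or os.environ.get("CUDA_PATH")
    if not cuda_home:
        raise ValueError("no CUDA directory given and CUDA_PATH is not set")
    for name in os.listdir(extracted_cuda_dir):
        src = os.path.join(extracted_cuda_dir, name)
        dst = os.path.join(cuda_home, name)
        if os.path.isdir(src):
            # Merge into the existing bin/include/lib folders instead of replacing them
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
```

As with the Visual Studio fix, a CUDA directory under Program Files usually needs an administrator prompt.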

That completes the installation. We can now use waifu2x in cudnn mode and enjoy the speed-up cuDNN brings.
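Before switching waifu2x to cudnn mode, a quick check that the files landed in the right place may save some debugging. This sketch reads the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros (from include\cudnn_version.h on cuDNN 8 and later, include\cudnn.h on earlier versions) and lists the cudnn64_*.dll files in bin. The helper names are hypothetical.

```python
import glob
import os
import re


def _version_from_header(text):
    """Extract (major, minor, patch) from cuDNN #define lines, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % macro, text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def cudnn_version(cuda_home):
    """Return the installed cuDNN version under cuda_home, or None."""
    for header in ("cudnn_version.h", "cudnn.h"):
        path = os.path.join(cuda_home, "include", header)
        if os.path.isfile(path):
            with open(path, encoding="utf-8", errors="ignore") as f:
                version = _version_from_header(f.read())
            if version is not None:
                return version
    return None


def cudnn_dlls(cuda_home):
    """List cudnn64_*.dll file names found in cuda_home/bin."""
    pattern = os.path.join(cuda_home, "bin", "cudnn64_*.dll")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))
```

If cudnn_version returns None or cudnn_dlls returns an empty list, the archive was most likely extracted into the wrong directory.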


This site follows the Creative Commons "CC BY 4.0" license; please credit the source when reposting.