YashasSamaga/D0_NOTICE.md

Last active January 19, 2022 16:51

[UNOFFICIAL] Summary of the CUDA backend in OpenCV DNN

D0_NOTICE.md

DISCLAIMER

This gist is unofficial. It was created for personal use, but I have kept it public in case it is of use to others. This document is not updated regularly and may not reflect the current status of the CUDA backend.

D1_Requirements.md

Internal Dependencies

The minimum set of dependencies required to use the CUDA backend in OpenCV DNN is:

- cudev
- opencv_core
- opencv_dnn
- opencv_imgproc

You might also require the following to read/write/display images and videos:

- opencv_imgcodecs
- opencv_highgui
- opencv_videoio

You will require the following to run the tests:

- opencv_ts
- opencv_videoio

You also have to set BUILD_TESTS and BUILD_PERF_TESTS.

External Dependencies

The CUDA backend requires CUDA Toolkit (min: 9.2) and cuDNN (min: 7.5) to be installed on the system. CMake will automatically detect CUDA Toolkit and cuDNN when the following options are set:

- WITH_CUDA
- WITH_CUDNN

The CUDA backend is enabled by setting the following option:

- OPENCV_DNN_CUDA
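
One way to confirm that a finished build actually picked up CUDA and cuDNN is to look at OpenCV's build-information text. The following is a minimal sketch, not an official check: the helper name `cuda_build_summary` is hypothetical, and it assumes the text contains `NVIDIA CUDA:` and `cuDNN:` lines laid out like the CMake configuration summary.

```python
import re


def cuda_build_summary(build_info: str) -> dict:
    """Extract the CUDA and cuDNN status lines from a build-information dump.

    Assumes lines shaped like 'NVIDIA CUDA:   YES (ver ...)', optionally
    prefixed with '--' as in CMake output. Missing keys map to None.
    """
    summary = {}
    for key in ("NVIDIA CUDA", "cuDNN"):
        pattern = rf"^[ \t-]*{re.escape(key)}:\s*(.+?)\s*$"
        match = re.search(pattern, build_info, re.MULTILINE)
        summary[key] = match.group(1) if match else None
    return summary
```

With OpenCV's Python bindings installed, this could be called as `cuda_build_summary(cv2.getBuildInformation())`. A value of `None` or one starting with `NO` would suggest the corresponding option was not enabled or not detected.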

Running tests

1. Clone the opencv_extra repository
2. `cd opencv_extra/testdata/dnn`
3. `python3 download_models.py`
4. `cd path/to/opencv/repository`
5. `cd build`
6. `export OPENCV_TEST_DATA_PATH=/path/to/opencv_extra/testdata`
7. Run `bin/opencv_test_dnn`
8. Refer to this guide to use perf tests to compare performance between versions

Usage

The CUDA backend can be selected by choosing one of the following backend/target options:

| Backend | Target |
| --- | --- |
| `DNN_BACKEND_CUDA` | `DNN_TARGET_CUDA` |
| `DNN_BACKEND_CUDA` | `DNN_TARGET_CUDA_FP16` |

A device with compute capability (CC) 5.3 or higher is required to use DNN_TARGET_CUDA_FP16. Note that not all CUDA devices offer high FP16 throughput, so DNN_TARGET_CUDA_FP16 may perform worse than DNN_TARGET_CUDA. You can check whether your device supports high FP16 throughput in the CUDA Programming Guide.
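
As a rough illustration of the rule above, the sketch below picks the FP16 target only when the device's compute capability allows it. The helper name `select_cuda_target` is hypothetical; `dnn` is assumed to be the `cv2.dnn` module, `net` a `cv2.dnn.Net`, and the compute capability is assumed to be supplied by the caller as a `(major, minor)` tuple.

```python
def select_cuda_target(net, dnn, compute_capability, prefer_fp16=True):
    """Select the CUDA backend and a matching target on `net`.

    `dnn` is expected to expose DNN_BACKEND_CUDA, DNN_TARGET_CUDA and
    DNN_TARGET_CUDA_FP16 (as cv2.dnn does). Returns the chosen target.
    """
    net.setPreferableBackend(dnn.DNN_BACKEND_CUDA)
    # FP16 requires CC 5.3+; tuples compare element-wise, so (7, 5) >= (5, 3).
    if prefer_fp16 and tuple(compute_capability) >= (5, 3):
        target = dnn.DNN_TARGET_CUDA_FP16
    else:
        target = dnn.DNN_TARGET_CUDA
    net.setPreferableTarget(target)
    return target
```

A call might look like `select_cuda_target(net, cv2.dnn, (7, 5))`. Since a supported device does not guarantee high FP16 throughput, it is still worth timing both targets before settling on one.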

Examples

D2_SupportMatrix.md

Support Matrix

The CUDA backend uses OpenCV's CPU backend as a fallback for unsupported layers and partially supported layers with unsupported configurations.

| Layer | Status | Note |
| --- | --- | --- |
| Slice | ✔️ | |
| Split | ✔️ | |
| Concat | ✔️ | |
| Reshape | ✔️ | |
| Flatten | ✔️ | |
| Resize, Interp (nearest neighbor, bilinear) | ✔️ | |
| CropAndResize | ✔️ | |
| Convolution 1D | ✔️ (OpenCV 4.5.2) | |
| Convolution 2D | ✔️ | |
| Convolution 3D | ✔️ | |
| Deconvolution 2D | broken | |
| Deconvolution 3D | broken | |
| MaxPooling 1D | ✔️ (OpenCV 4.5.2) | |
| MaxPooling 2D | ✔️ | |
| MaxPooling 3D | ✔️ | |
| AveragePooling 1D | ✔️ (OpenCV 4.5.2) | |
| AveragePooling 2D | ✔️ | |
| AveragePooling 3D | ✔️ | |
| MaxPoolingWithIndices 2D | ✔️ | |
| MaxPoolingWithIndices 3D | ✔️ | |
| MaxUnpool 2D | ✔️ | |
| MaxUnpool 3D | ✔️ | |
| ROI Pooling | ✔️ | |
| PSROI Pooling | ❌ | |
| LRN | ✔️ | |
| InnerProduct (constant weights) | ✔️ | |
| MatMul (runtime blobs) | ✔️ (OpenCV 4.5.3) | |
| Softmax | ✔️ | |
| LogSoftmax | ✔️ | |
| MVN | ✔️ (OpenCV 4.5.0) | |
| ReLU (with configurable negative slope) | ✔️ | |
| ReLU6 (with configurable ceil and floor) | ✔️ | |
| Channelwise Parametric ReLU | ✔️ | |
| Sigmoid | ✔️ | |
| TanH | ✔️ | |
| Swish | ✔️ | |
| Mish | ✔️ | |
| ELU | ✔️ | |
| BNLL | ✔️ | |
| Abs | ✔️ | |
| Power (configurable exp, scale and shift) | ✔️ | |
| Batch Normalization | ✔️ | |
| Const | ✔️ | |
| Crop | ✔️ | |
| Eltwise (sum, product, div, max) | ✔️ | |
| Weighted Eltwise (sum) | ✔️ | |
| Shortcut (sum) | ✔️ (OpenCV 4.3.0) | |
| Permute | ✔️ | |
| ShuffleChannel | ✔️ | |
| PriorBox | ✔️ | |
| Reorg | ✔️ | |
| Region | ✔️ | `scale_xy` parameter added in OpenCV 4.4.0 |
| DetectionOutput | ✔️ (OpenCV 4.5.0) | |
| Normalization (L1, L2) | ✔️ | |
| Shift | ✔️ | |
| Padding (constant padding, reflection101 padding) | ✔️ | |
| Proposal | ❌ | |
| Scale | ✔️ | |
| DataAugmentation | ❌ | |
| Correlation | ❌ | |
| Accum | ❌ | |
| FlowWarp | ❌ | |
| LSTM Layer | ❌ | |
| RNN Layer | ❌ | |

KiaDavari commented May 14, 2021

yeah, I see.
Sure @YashasSamaga.
Thank you so much for helping

JulienMaille commented May 25, 2021

@YashasSamaga quick question: is DNN_BACKEND_CUDA the only way to run dnn on Nvidia? Put another way, which device will be selected when using this:

net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_OPENCL);

Author

YashasSamaga commented May 25, 2021

@JulienMaille You can use the OpenCL backend on most of NVIDIA's GPUs. You can select an OpenCL device in code or using an environment variable (OPENCV_OPENCL_DEVICE). I am not sure what the default device would be (I always explicitly select the device).

JulienMaille commented May 25, 2021

May I ask how you select a device at runtime? cf. opencv/opencv#20160 (comment)

Author

YashasSamaga commented May 26, 2021

@JulienMaille https://stackoverflow.com/questions/33417451/how-can-i-change-the-device-on-which-opencl-code-will-be-executed-with-umat-in-o
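
For the environment-variable route mentioned earlier in this thread, a minimal sketch follows. It assumes the `OPENCV_OPENCL_DEVICE` value has the form `<platform>:<device type>:<device name or ID>` with empty fields meaning "any"; that format is not stated in this thread, so verify it against the OpenCV documentation for your version. The helper name `set_opencl_device` is hypothetical.

```python
import os


def set_opencl_device(platform="", device_type="GPU", device=""):
    """Set OPENCV_OPENCL_DEVICE for the current process and return the value.

    Assumed format: '<platform>:<device type>:<device name or ID>'.
    Call this before OpenCV first touches OpenCL (ideally before importing
    cv2), since the variable is presumably read once during initialization.
    """
    spec = f"{platform}:{device_type}:{device}"
    os.environ["OPENCV_OPENCL_DEVICE"] = spec
    return spec
```

For example, `set_opencl_device("NVIDIA")` would request any GPU on a platform whose name matches `NVIDIA`, under the format assumption above.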

robmang commented Jul 16, 2021

Hi, again thanks for the great work.

Previously I successfully built and ran with:

Opencv4.4.0
CUDA 10.0
cuDNN 7.5.1,
on windows and ubuntu20.

I would now like to update to Opencv4.5.3. What is the recommended version of CUDA and cuDNN to build opencv with?

Author

YashasSamaga commented Jul 16, 2021

> I would now like to update to Opencv4.5.3. What is the recommended version of CUDA and cuDNN to build opencv with?

For best performance, I would recommend using cuDNN 7.6.5 unless you are using a new device that is not supported by it. If you use a lot of depthwise convolutions in your model, you might see huge benefits from cuDNN 8.2 if your model was performing worse than CPU inference.

robmang commented Jul 16, 2021

@YashasSamaga thanks for your quick reply.
I think I'll have to go with cuDNN 8.2 for more recent devices.

One of the reasons I'm updating is that I need it to run on an Nvidia 3080, which has a compute capability of 8.6, but at the time I compiled with -D CUDA_ARCH_BIN=5.3,6.0,6.1,7.0,7.2,7.5 -D CUDA_ARCH_PTX=7.5

robmang commented Jul 19, 2021

I'm getting the following error while trying to build opencv 4.5.3 with cuda support:

/usr/bin/ld: ../../lib/libopencv_dnn.so.4.5.3: undefined reference to `cudnnGetConvolutionBackwardDataAlgorithm'

Could this be because I'm building opencv 4.5.3 with CUDA 11.4 (and cuDNN 8.2)? Should I be building it with CUDA 11.2?

--   NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS FAST_MATH)
--     NVIDIA GPU arch:             53 60 61 62 70 72 75 80 86
--     NVIDIA PTX archs:            86
-- 
--   cuDNN:                         YES (ver 8.2.2)

Author

YashasSamaga commented Jul 19, 2021

@robmang Please try purging all previous OpenCV installations and rebuild from a clean state.

cudnnGetConvolutionBackwardDataAlgorithm is an API in cuDNN 7 which is no longer used in OpenCV with cuDNN 8. Your CMake output shows that cuDNN 8 was detected correctly. OpenCV codebase has conditional compilation branches that avoid the use of cudnnGetConvolutionBackwardDataAlgorithm in cuDNN 8.

robmang commented Jul 22, 2021

> @robmang Please try purging all previous OpenCV installations and rebuild from a clean state.
>
> cudnnGetConvolutionBackwardDataAlgorithm is an API in cuDNN 7 which is no longer used in OpenCV with cuDNN 8. Your CMake output shows that cuDNN 8 was detected correctly. OpenCV codebase has conditional compilation branches that avoid the use of cudnnGetConvolutionBackwardDataAlgorithm in cuDNN 8.

@YashasSamaga, thank you!
It would have taken me quite a while to resolve the issue.

npvu1510 commented Sep 4, 2021

Please help me. Why is OpenCV DNN on GPU slower than on CPU when I use YOLOv4 to detect objects in an image?
OpenCV 4.5.1, CUDA 11.2, cuDNN 8.1.0, GPU: 1660 Ti
Sorry, my English is bad.

Author

YashasSamaga commented Sep 4, 2021

> Please help me. Why is OpenCV DNN on GPU slower than on CPU when I use YOLOv4 to detect objects in an image?
> OpenCV 4.5.1, CUDA 11.2, cuDNN 8.1.0, GPU: 1660 Ti
> Sorry, my English is bad.

Can you share the code you used?

npvu1510 commented Sep 4, 2021

It's here.
The GPU is many times slower than the CPU. I have built and installed OpenCV successfully and there are no errors.

import time

import cv2

CONFIG_FILE = './yolov4.cfg'
WEIGHTS_FILE = './yolov4.weights'

image = cv2.imread('test.jpg')

# Load the Darknet model and request the CUDA backend with the FP16 target
net = cv2.dnn.readNet(CONFIG_FILE, WEIGHTS_FILE)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)

# Names of the unconnected output layers, which forward() expects
output_layer_names = net.getUnconnectedOutLayersNames()

blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (608, 608), swapRB=True, crop=False)
net.setInput(blob)

# Time a single forward pass
start = time.time()
layerOutputs = net.forward(output_layer_names)
end = time.time()
print("[FORWARD] took {:.6f} seconds".format(end - start))

Author

YashasSamaga commented Sep 4, 2021

@PhanVu1510 OpenCV DNN performs lazy initialization in the first forward pass. The first forward pass includes the time to allocate memory, create handles, etc. Initializing the CUDA backend happens to be really slow compared to initializing the CPU backends. Therefore, it looks like the CUDA backend is slower than the CPU backend.

Ignore the first forward call and measure time from the second forward pass onwards.

Example code: https://gist.github.com/YashasSamaga/e2b19a6807a13046e399f4bc3cca3a49
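
The measurement advice above can be sketched as a small helper that discards warm-up calls before timing. This is only an illustration and not the code behind the linked gist; the name `time_forward` and its parameters are hypothetical.

```python
import time


def time_forward(forward, warmup=1, runs=10):
    """Return the mean seconds per call of `forward`, excluding warm-up calls.

    `forward` is any zero-argument callable, for example a lambda wrapping
    net.forward(...). The warm-up calls absorb one-time initialization.
    """
    for _ in range(warmup):
        forward()  # not timed: pays for lazy initialization
    start = time.perf_counter()
    for _ in range(runs):
        forward()
    return (time.perf_counter() - start) / runs
```

With a configured network this might be used as `time_forward(lambda: net.forward(output_layer_names))`. Averaging over several runs also smooths out run-to-run noise.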

npvu1510 commented Sep 4, 2021

Thank you!!!
It achieves 30 frames per second for 416x416. Is there any other way to increase FPS on my GPU?
I want to use 800x800, but it only reaches 9-10 FPS.

Author

YashasSamaga commented Sep 4, 2021

@PhanVu1510 You can try pipelining to gain more FPS. You can also trade latency for throughput. Batched inference will give you higher throughput with higher latency. You can also use multiple cv::dnn::Net objects to do inference in parallel. This will help minimize GPU idle time. Again, this gives higher throughput at the cost of higher latency. If your application is not latency-critical, you should try using multiple Net objects and batched inference. You might see anything from an increase of a few dozen percent to a doubling of the FPS.
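
To make the pipelining idea concrete, here is a minimal two-stage sketch in which a worker thread preprocesses frames while the main thread runs inference. It is an assumption-laden illustration: `run_pipeline`, `preprocess` and `infer` are hypothetical names, and any real overlap depends on the underlying calls releasing Python's global interpreter lock.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the stream


def run_pipeline(frames, preprocess, infer, max_queue=4):
    """Overlap preprocessing with inference; results keep input order."""
    blobs = queue.Queue(maxsize=max_queue)  # bounded to cap memory use

    def producer():
        try:
            for frame in frames:
                blobs.put(preprocess(frame))
        finally:
            blobs.put(_STOP)  # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    results = []
    while True:
        blob = blobs.get()
        if blob is _STOP:
            break
        results.append(infer(blob))
    worker.join()
    return results
```

In practice `preprocess` could wrap `cv2.dnn.blobFromImage` and `infer` could call `net.setInput` followed by `net.forward`. A larger `max_queue` lets preprocessing run further ahead at the cost of added latency, which is the same trade-off described above.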

JulienMaille commented Sep 4, 2021

Has anyone benchmarked 3080 GPUs? Last time I tried, the first convolution took 30+ seconds!

npvu1510 commented Sep 25, 2021

Hi,
I want to use the YOLOv4 P5 Darknet model at 896x896 (mentioned here), but I don't know how to configure it for 1 class. Can you help me?
Thank you
