Created
September 8, 2019 16:35
Use a pipe to read ffmpeg-decoded video frames with NVIDIA GPU hardware acceleration
import subprocess as sp

import cv2
import numpy as np
import tensorflow as tf  # written against the TensorFlow 1.x API (tf.placeholder, tf.Session)

# Decode an RTSP stream with the NVIDIA CUVID H.264 decoder and write raw
# yuv420p frames to stdout ("-"). Audio and subtitles are dropped (-an, -sn)
# and the output is resampled to 15 fps.
ffmpeg_cmd_1 = ["./ffmpeg", "-y",
                "-hwaccel", "nvdec",
                "-c:v", "h264_cuvid",
                "-vsync", "0",
                "-max_delay", "500000",
                "-reorder_queue_size", "10000",
                "-i", "rtsp://admin:[email protected]/Streaming/Channels/1",
                "-f", "rawvideo",
                "-pix_fmt", "yuv420p",
                "-preset", "slow",
                "-an", "-sn",
                "-vf", "fps=15",
                "-"]
# Second stream, read with the same settings.
ffmpeg_cmd_2 = ["./ffmpeg", "-y",
                "-hwaccel", "nvdec",
                "-c:v", "h264_cuvid",
                "-vsync", "0",
                "-max_delay", "500000",
                "-reorder_queue_size", "10000",
                "-i", "rtsp://admin:[email protected]/Streaming/Channels/1",
                "-f", "rawvideo",
                "-preset", "slow",
                "-an", "-sn",
                "-pix_fmt", "yuv420p",
                "-vf", "fps=15",
                "-"]

# Each process writes its raw frames to a pipe that the main loop reads from.
ffmpeg1 = sp.Popen(ffmpeg_cmd_1, stdout=sp.PIPE, bufsize=10)
ffmpeg2 = sp.Popen(ffmpeg_cmd_2, stdout=sp.PIPE, bufsize=10)


class YUV2RGB_GPU():
    """Convert batches of full-resolution Y, U, V planes to BGR images with TensorFlow."""

    def __init__(self, w=1920, h=1080):
        # Cap this process at roughly 3% of the GPU memory.
        config = tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.03))
        # The batch dimension is None so that one frame per stream can be fed in a single call.
        self.y = tf.placeholder(shape=(None, h, w), dtype=tf.float32)
        self.u = tf.placeholder(shape=(None, h, w), dtype=tf.float32)
        self.v = tf.placeholder(shape=(None, h, w), dtype=tf.float32)
        r = self.y + 1.371 * (self.v - 128)
        # Green subtracts both chroma terms.
        g = self.y - 0.338 * (self.u - 128) - 0.698 * (self.v - 128)
        b = self.y + 1.732 * (self.u - 128)
        # Stack in B, G, R order, which is the channel order OpenCV displays.
        result = tf.stack([b, g, r], axis=-1)
        self.result = tf.clip_by_value(result, 0, 255)
        self.sess = tf.Session(config=config)

    def convert(self, y, u, v):
        results = self.sess.run(self.result, feed_dict={self.y: y, self.u: u, self.v: v})
        return results.astype(np.uint8)


C = YUV2RGB_GPU()
ffmpegs = [ffmpeg1, ffmpeg2]
w = 1920
h = 1080
k = w * h                # bytes in the Y plane
frame_size = k * 3 // 2  # yuv420p: Y plane plus quarter-size U and V planes
while True:
    ys = []
    us = []
    vs = []
    for i in ffmpegs:
        x = i.stdout.read(frame_size)  # read the bytes of a single frame
        if len(x) < frame_size:  # the stream ended or ffmpeg exited
            break
        y = np.frombuffer(x[0:k], dtype=np.uint8).reshape((h, w))
        u = np.frombuffer(x[k:k+k//4], dtype=np.uint8).reshape((h//2, w//2))
        v = np.frombuffer(x[k+k//4:], dtype=np.uint8).reshape((h//2, w//2))
        # Upsample the half-resolution chroma planes to the size of the Y plane.
        u = cv2.resize(u, (w, h))
        v = cv2.resize(v, (w, h))
        ys.append(y)
        us.append(u)
        vs.append(v)
    if len(ys) < len(ffmpegs):  # stop when any stream failed to deliver a full frame
        break
    image = C.convert(ys, us, vs)          # one BGR image per stream
    image = np.concatenate(image, axis=0)  # place the frames one above the other
    image = cv2.resize(image, None, fx=1/2, fy=1/2)
    cv2.imshow("im", image)
    if cv2.waitKey(20) & 0xFF == ord('q'):
        break
cv2.destroyAllWindows()
for i in ffmpegs:
    i.terminate()

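
For checking the frame layout without a camera or a GPU, the plane splitting and colour conversion can be sketched in plain NumPy. This is only a CPU reference under a few assumptions: the function names `split_yuv420p` and `yuv_to_bgr` are made up for illustration, the chroma planes are upsampled by nearest-neighbour repetition rather than `cv2.resize`, and the coefficients are the ones used in the script, with the green channel subtracting both chroma terms as in the usual YUV to RGB conversion.

```python
import numpy as np


def split_yuv420p(raw, w, h):
    """Split one planar yuv420p frame into full-resolution Y, U, V planes.

    Hypothetical helper: expects even w and h and exactly w*h*3/2 bytes.
    """
    k = w * h
    expected = k * 3 // 2  # Y plane plus two quarter-size chroma planes
    if len(raw) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(raw)))
    buf = np.frombuffer(raw, dtype=np.uint8)
    y = buf[:k].reshape(h, w)
    u = buf[k:k + k // 4].reshape(h // 2, w // 2)
    v = buf[k + k // 4:].reshape(h // 2, w // 2)
    # Nearest-neighbour upsampling: repeat every chroma sample 2x2 times.
    u = u.repeat(2, axis=0).repeat(2, axis=1)
    v = v.repeat(2, axis=0).repeat(2, axis=1)
    return y, u, v


def yuv_to_bgr(y, u, v):
    """Convert full-resolution planes to a uint8 BGR image of shape (h, w, 3)."""
    y = y.astype(np.float32)
    u = u.astype(np.float32) - 128
    v = v.astype(np.float32) - 128
    r = y + 1.371 * v
    g = y - 0.338 * u - 0.698 * v
    b = y + 1.732 * u
    bgr = np.stack([b, g, r], axis=-1)
    return np.clip(bgr, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # A synthetic 4x2 mid-grey frame: 8 Y bytes, 2 U bytes, 2 V bytes.
    frame = bytes([128] * 12)
    image = yuv_to_bgr(*split_yuv420p(frame, 4, 2))
    print(image.shape, image.dtype)
```

A neutral grey frame (all bytes 128) is a convenient sanity check, because the chroma terms vanish and every output channel should equal the luma value.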
- Since we redirect the ffmpeg decoder's output to stdout, the decoded bytes have already been copied to CPU memory. Converting color channels is a time-consuming operation compared with reshaping, but as you pointed out, numpy operations run only on the CPU. So we could transfer the bytes to the GPU first and then use tf.io.decode_raw and tf.image.resize to replace the corresponding numpy and OpenCV operations for further acceleration.
- Stacking is for acceleration, because the GPU processes batches of images more efficiently.
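
The idea of moving the decoding and resizing into TensorFlow could look roughly like the sketch below. It assumes TensorFlow 2.x with eager execution (the script itself uses the 1.x Session API), the function name `yuv420p_to_bgr_tf` is hypothetical, and nearest-neighbour resizing is used for the chroma planes. TensorFlow chooses where each op runs, so parts of this, such as the byte decoding, may still execute on the CPU.

```python
import tensorflow as tf


def yuv420p_to_bgr_tf(raw, w, h):
    """Turn the bytes of one yuv420p frame into a uint8 BGR tensor (h, w, 3).

    Hypothetical helper; expects even w and h and w*h*3/2 bytes.
    """
    k = w * h
    # Decode the byte string into a flat uint8 tensor, then work in float32.
    flat = tf.cast(tf.io.decode_raw(raw, tf.uint8), tf.float32)
    y = tf.reshape(flat[:k], (h, w, 1))
    u = tf.reshape(flat[k:k + k // 4], (h // 2, w // 2, 1))
    v = tf.reshape(flat[k + k // 4:k + k // 2], (h // 2, w // 2, 1))
    # Upsample the chroma planes to full resolution and centre them on zero.
    u = tf.image.resize(u, (h, w), method="nearest") - 128.0
    v = tf.image.resize(v, (h, w), method="nearest") - 128.0
    r = y + 1.371 * v
    g = y - 0.338 * u - 0.698 * v
    b = y + 1.732 * u
    bgr = tf.concat([b, g, r], axis=-1)
    return tf.cast(tf.clip_by_value(bgr, 0.0, 255.0), tf.uint8)


if __name__ == "__main__":
    # A synthetic 4x2 mid-grey frame stands in for bytes read from the ffmpeg pipe.
    frame = bytes([128] * 12)
    print(yuv420p_to_bgr_tf(frame, 4, 2).shape)
```

With this approach only the raw bytes cross from Python into TensorFlow, and the reshaping, upsampling, and colour conversion happen inside the framework instead of in numpy and OpenCV.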
@zEdS15B3GCwq
Thanks for clarifying my questions, much appreciated. I've learned a lot from your code.
I was looking for a way to keep FFMPEG's output on the GPU for further processing. Unfortunately, that doesn't seem to be possible without customising FFMPEG's code.
Thanks for posting this. I'm trying to put this code to use, but as I've just started to learn about GPU processing, I don't understand some points. I'd appreciate it if you could explain them to me.
I'm confused about how many times the data is transferred between the GPU and CPU. I have the impression that the frame data are #1 decoded in GPU, #2 reshaped with numpy in CPU, #3 converted to RGB by TF in GPU, #4 resized and displayed in CPU. Or is numpy able to handle data in the GPU directly?
Why are the frames loaded from two ffmpeg sources and stacked before invoking convert?