@CasiaFan
Created September 8, 2019 16:35
Use a pipe to read ffmpeg-decoded video frames with NVIDIA GPU hardware acceleration
import subprocess as sp
import cv2
import numpy as np
from PIL import Image
import tensorflow as tf
ffmpeg_cmd_1 = ["./ffmpeg", "-y",
                "-hwaccel", "nvdec",
                "-c:v", "h264_cuvid",
                "-vsync", "0",
                "-max_delay", "500000",
                "-reorder_queue_size", "10000",
                "-i", "rtsp://admin:[email protected]/Streaming/Channels/1",
                "-f", "rawvideo",
                "-pix_fmt", "yuv420p",
                "-preset", "slow",
                "-an", "-sn",
                "-vf", "fps=15",
                "-"]
ffmpeg_cmd_2 = ["./ffmpeg", "-y",
                "-hwaccel", "nvdec",
                "-c:v", "h264_cuvid",
                "-vsync", "0",
                "-max_delay", "500000",
                "-reorder_queue_size", "10000",
                "-i", "rtsp://admin:[email protected]/Streaming/Channels/1",
                "-f", "rawvideo",
                "-preset", "slow",
                "-an", "-sn",
                "-pix_fmt", "yuv420p",
                "-vf", "fps=15",
                "-"]
ffmpeg1 = sp.Popen(ffmpeg_cmd_1, stdout=sp.PIPE, bufsize=10)
ffmpeg2 = sp.Popen(ffmpeg_cmd_2, stdout=sp.PIPE, bufsize=10)
class YUV2RGB_GPU():
    def __init__(self, w=1920, h=1080):
        # Cap this process at a small fraction of GPU memory.
        config = tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.03))
        # Batch dimension is left open so frames from several streams can be fed at once.
        self.y = tf.placeholder(shape=(None, h, w), dtype=tf.float32)
        self.u = tf.placeholder(shape=(None, h, w), dtype=tf.float32)
        self.v = tf.placeholder(shape=(None, h, w), dtype=tf.float32)
        r = self.y + 1.371 * (self.v - 128)
        # Green subtracts both chroma contributions.
        g = self.y - 0.338 * (self.u - 128) - 0.698 * (self.v - 128)
        b = self.y + 1.732 * (self.u - 128)
        # Channels are stacked in BGR order, which is what cv2.imshow expects.
        result = tf.stack([b, g, r], axis=-1)
        self.result = tf.clip_by_value(result, 0, 255)
        self.sess = tf.Session(config=config)

    def convert(self, y, u, v):
        results = self.sess.run(self.result, feed_dict={self.y: y, self.u: u, self.v: v})
        return results.astype(np.uint8)
C = YUV2RGB_GPU()
ffmpegs = [ffmpeg1, ffmpeg2]
w = 1920
h = 1080
k = w * h  # bytes in the Y plane of one frame
frame_bytes = k * 3 // 2  # yuv420p: full-size Y plane plus quarter-size U and V planes
while True:
    ys = []
    us = []
    vs = []
    for i in ffmpegs:
        x = i.stdout.read(frame_bytes)  # read the bytes of a single frame
        if len(x) < frame_bytes:  # stream ended or ffmpeg exited
            break
        y = np.frombuffer(x[0:k], dtype=np.uint8).reshape((h, w))
        u = np.frombuffer(x[k:k+k//4], dtype=np.uint8).reshape((h//2, w//2))
        v = np.frombuffer(x[k+k//4:], dtype=np.uint8).reshape((h//2, w//2))
        # Upsample the chroma planes to full resolution on the CPU.
        u = cv2.resize(u, (w, h))
        v = cv2.resize(v, (w, h))
        ys.append(y)
        us.append(u)
        vs.append(v)
    if len(ys) < len(ffmpegs):  # at least one stream returned an incomplete frame
        break
    image = C.convert(ys, us, vs)  # one batched GPU call for all streams
    image = np.concatenate(image, axis=0)  # stack the frames vertically
    image = cv2.resize(image, None, fx=1/2, fy=1/2)
    cv2.imshow("im", image)
    if cv2.waitKey(20) & 0xFF == ord('q'):
        break
cv2.destroyAllWindows()
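The byte layout the read loop relies on can be checked without a camera or a GPU. The following is a minimal CPU-only sketch, not part of the original gist: `split_yuv420p` and `yuv_to_bgr` are hypothetical helper names, the chroma planes are upsampled by simple pixel repetition rather than `cv2.resize`, and the coefficients have the same magnitudes as in the class above, with both chroma terms subtracted for green.

```python
import numpy as np


def split_yuv420p(buf, w, h):
    """Split one raw yuv420p frame into Y (h, w) and U, V (h//2, w//2) planes."""
    k = w * h
    expected = k * 3 // 2  # full-size Y plane plus two quarter-size chroma planes
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    y = np.frombuffer(buf, dtype=np.uint8, count=k).reshape(h, w)
    u = np.frombuffer(buf, dtype=np.uint8, count=k // 4, offset=k).reshape(h // 2, w // 2)
    v = np.frombuffer(buf, dtype=np.uint8, count=k // 4, offset=k + k // 4).reshape(h // 2, w // 2)
    return y, u, v


def yuv_to_bgr(y, u, v):
    """CPU reference conversion (hypothetical helper, nearest-neighbour chroma upsampling)."""
    y = y.astype(np.float32)
    # Repeat every chroma sample 2x2 so U and V match the Y resolution.
    u = np.repeat(np.repeat(u, 2, axis=0), 2, axis=1).astype(np.float32) - 128.0
    v = np.repeat(np.repeat(v, 2, axis=0), 2, axis=1).astype(np.float32) - 128.0
    r = y + 1.371 * v
    g = y - 0.338 * u - 0.698 * v
    b = y + 1.732 * u
    return np.clip(np.stack([b, g, r], axis=-1), 0, 255).astype(np.uint8)


# A synthetic 4x2 frame: Y = 100 everywhere, neutral chroma (U = V = 128).
frame = bytes([100]) * 8 + bytes([128]) * 4
y, u, v = split_yuv420p(frame, 4, 2)
bgr = yuv_to_bgr(y, u, v)
```

With neutral chroma every output channel equals the luma value, which makes a quick sanity check. For a 1920x1080 stream the same arithmetic gives 1920 * 1080 * 3 / 2 = 3,110,400 bytes per frame.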
@zEdS15B3GCwq

Thanks for posting this. I'm trying to put this code to use, but as I've just started to learn about GPU processing, I don't understand some points. I'd appreciate it if you could explain them to me.

  1. I'm confused about how many times the data is transferred between the GPU and CPU. I have the impression that the frame data are #1 decoded in GPU, #2 reshaped with numpy in CPU, #3 converted to RGB by TF in GPU, #4 resized and displayed in CPU. Or is numpy able to handle data in the GPU directly?

  2. Why are the frames loaded from two ffmpeg sources and stacked before invoking convert?

@CasiaFan
Author

CasiaFan commented Dec 20, 2019

  1. Since we redirect the ffmpeg decoder's output to stdout, the decoded bytes have already been transferred to the CPU by the time Python reads them. Converting color channels is a time-consuming operation compared with reshaping. But as you pointed out, numpy operations run only on the CPU. So we could transfer the bytes to the GPU first and then use tf.io.decode_raw and tf.image.resize to replace the corresponding numpy and opencv operations for further acceleration.
  2. Stacking is for acceleration, because the GPU processes a batch of images more efficiently than one image at a time.
    @zEdS15B3GCwq
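A rough sketch of the approach described in point 1, in which the raw bytes are decoded, upsampled and color-converted with TensorFlow ops instead of numpy and opencv. This is not the gist's code: it assumes TensorFlow 2.x with eager execution (the gist itself uses the 1.x session API), the function name `yuv420p_to_bgr` is hypothetical, and whether the ops actually run on the GPU depends on the installed build and device placement.

```python
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution


def yuv420p_to_bgr(raw, w, h):
    """Decode one yuv420p frame (Python bytes) into a uint8 BGR tensor of shape (h, w, 3)."""
    k = w * h
    # decode_raw turns the byte string into a 1-D uint8 tensor of k * 3 // 2 values.
    data = tf.io.decode_raw(tf.constant(raw), tf.uint8)
    y = tf.cast(tf.reshape(data[:k], (h, w, 1)), tf.float32)
    u = tf.cast(tf.reshape(data[k:k + k // 4], (h // 2, w // 2, 1)), tf.float32)
    v = tf.cast(tf.reshape(data[k + k // 4:], (h // 2, w // 2, 1)), tf.float32)
    # Upsample chroma to full resolution (bilinear by default), then center it on zero.
    u = tf.image.resize(u, (h, w)) - 128.0
    v = tf.image.resize(v, (h, w)) - 128.0
    r = y + 1.371 * v
    g = y - 0.338 * u - 0.698 * v
    b = y + 1.732 * u
    bgr = tf.concat([b, g, r], axis=-1)
    return tf.cast(tf.clip_by_value(bgr, 0.0, 255.0), tf.uint8)


# Synthetic 4x2 frame with Y = 100 and neutral chroma (U = V = 128).
frame = bytes([100]) * 8 + bytes([128]) * 4
out = yuv420p_to_bgr(frame, 4, 2)
```

The bytes still cross from the CPU to the GPU once per frame, since ffmpeg writes them to a pipe; the sketch only moves the reshaping and resizing onto the same device as the color conversion.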

@zEdS15B3GCwq

Thanks for clarifying my questions, much appreciated. I've learned a lot from your code.

I was looking for a way to keep FFMPEG's output on the GPU for further processing. Unfortunately, it seems that that's not possible without customising FFMPEG's code.
