@CasiaFan
Created September 8, 2019 16:35
Use a pipe to read FFmpeg-decoded video frames with NVIDIA GPU hardware acceleration
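import subprocess as sp

import cv2
import numpy as np
# The script uses the TensorFlow 1.x graph API (placeholders, Session). Under
# TF 2.x that API lives in tf.compat.v1 and needs v2 behaviour switched off.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()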
def build_ffmpeg_cmd(url):
    """FFmpeg command that decodes an H.264 RTSP stream with NVIDIA's decoder
    and writes raw yuv420p frames to stdout."""
    return ["./ffmpeg", "-y",
            "-hwaccel", "nvdec",
            "-c:v", "h264_cuvid",            # NVIDIA hardware H.264 decoder
            "-vsync", "0",                   # passthrough: keep each frame's timestamp
            "-max_delay", "500000",          # max demuxing delay, in microseconds
            "-reorder_queue_size", "10000",  # RTSP packets buffered for reordering
            "-i", url,
            "-f", "rawvideo",                # headerless frames, no container
            "-pix_fmt", "yuv420p",           # Y plane, then quarter-size U and V planes
            "-an", "-sn",                    # drop audio and subtitle streams
            "-vf", "fps=15",                 # resample to 15 frames per second
            "-"]                             # "-" means write to stdout


# Placeholder stream addresses: substitute each camera's credentials and IP.
ffmpeg1 = sp.Popen(build_ffmpeg_cmd("rtsp://admin:<password>@<camera-ip>/Streaming/Channels/1"),
                   stdout=sp.PIPE, bufsize=10)
ffmpeg2 = sp.Popen(build_ffmpeg_cmd("rtsp://admin:<password>@<camera-ip>/Streaming/Channels/1"),
                   stdout=sp.PIPE, bufsize=10)
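class YUV2RGB_GPU:
    """Batched YUV -> BGR conversion on the GPU with a TF1-style graph.

    Channels are returned in B, G, R order so the result can go straight to cv2.imshow.
    """

    def __init__(self, w=1920, h=1080):
        # Cap TensorFlow at ~3% of GPU memory instead of letting it reserve nearly all of it.
        config = tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.03))
        # The batch dimension is None so one run() can convert frames from several streams.
        self.y = tf.placeholder(shape=(None, h, w), dtype=tf.float32)
        self.u = tf.placeholder(shape=(None, h, w), dtype=tf.float32)
        self.v = tf.placeholder(shape=(None, h, w), dtype=tf.float32)
        r = self.y + 1.371 * (self.v - 128)
        # Both chroma terms are subtracted for the green channel.
        g = self.y - 0.338 * (self.u - 128) - 0.698 * (self.v - 128)
        b = self.y + 1.732 * (self.u - 128)
        result = tf.stack([b, g, r], axis=-1)
        self.result = tf.clip_by_value(result, 0, 255)
        self.sess = tf.Session(config=config)

    def convert(self, y, u, v):
        results = self.sess.run(self.result, feed_dict={self.y: y, self.u: u, self.v: v})
        return results.astype(np.uint8)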
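w, h = 1920, 1080         # must match the cameras' output resolution
k = w * h                 # bytes in the Y plane
frame_bytes = k * 3 // 2  # Y plane + two quarter-size chroma planes (= w*h*6//4)

C = YUV2RGB_GPU(w, h)
ffmpegs = [ffmpeg1, ffmpeg2]
while True:
    ys, us, vs = [], [], []
    for proc in ffmpegs:
        x = proc.stdout.read(frame_bytes)  # read the bytes of a single frame
        if len(x) < frame_bytes:           # stream ended or the pipe broke
            break
        y = np.frombuffer(x[0:k], dtype=np.uint8).reshape((h, w))
        u = np.frombuffer(x[k:k + k // 4], dtype=np.uint8).reshape((h // 2, w // 2))
        v = np.frombuffer(x[k + k // 4:], dtype=np.uint8).reshape((h // 2, w // 2))
        # Upsample the half-resolution chroma planes to full size (bilinear by default).
        u = cv2.resize(u, (w, h))
        v = cv2.resize(v, (w, h))
        ys.append(y)
        us.append(u)
        vs.append(v)
    if len(ys) < len(ffmpegs):  # at least one stream stopped delivering frames
        break
    # Convert all streams in one batch, then stack them vertically for display.
    image = C.convert(ys, us, vs)
    image = np.concatenate(image, axis=0)
    image = cv2.resize(image, None, fx=1/2, fy=1/2)
    cv2.imshow("im", image)
    if cv2.waitKey(20) & 0xFF == ord('q'):
        break

for proc in ffmpegs:
    proc.terminate()
cv2.destroyAllWindows()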
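The loop reads `w*h*6//4` (that is, `w*h*3//2`) bytes per frame because a yuv420p frame stores a full-resolution Y plane followed by U and V planes subsampled 2x horizontally and vertically. A minimal NumPy-only sketch of that split, using hypothetical helper names `split_yuv420p` and `upsample_nearest`, can be checked on a tiny synthetic frame without FFmpeg or a camera:

```python
import numpy as np


def split_yuv420p(buf, w, h):
    """Split one raw yuv420p frame into its Y, U and V planes.

    Hypothetical helper: same byte layout the display loop slices by hand.
    """
    k = w * h  # luma bytes
    if len(buf) != k * 3 // 2:
        raise ValueError(f"expected {k * 3 // 2} bytes, got {len(buf)}")
    y = np.frombuffer(buf, dtype=np.uint8, count=k).reshape(h, w)
    # U and V are each subsampled 2x in both directions: (h/2) * (w/2) = k/4 bytes.
    u = np.frombuffer(buf, dtype=np.uint8, count=k // 4, offset=k).reshape(h // 2, w // 2)
    v = np.frombuffer(buf, dtype=np.uint8, count=k // 4, offset=k + k // 4).reshape(h // 2, w // 2)
    return y, u, v


def upsample_nearest(plane):
    """Hypothetical nearest-neighbour 2x upsampling: each sample becomes a 2x2 block."""
    return plane.repeat(2, axis=0).repeat(2, axis=1)


# A 4x2 frame: 8 luma bytes, then 2 U bytes, then 2 V bytes.
buf = bytes(range(8)) + bytes([10, 11]) + bytes([20, 21])
y, u, v = split_yuv420p(buf, 4, 2)
print(y.shape, u.shape, v.shape)  # (2, 4) (1, 2) (1, 2)
print(upsample_nearest(u))
```

`upsample_nearest` is only a simpler stand-in for the gist's `cv2.resize` call, whose default bilinear interpolation gives smoother color edges.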

CasiaFan commented Dec 20, 2019

  1. Since we redirect the FFmpeg decoder's output to stdout, the output bytes have already been transferred to the CPU. Converting color channels is a time-consuming operation compared with reshaping. But as you pointed out, NumPy operations only run on the CPU. So we could transfer the bytes to the GPU first and then use tf.io.decode_raw and tf.image.resize to replace the corresponding NumPy and OpenCV operations for further acceleration.
  2. Stacking the frames into a batch is for acceleration, because the GPU processes batches of images more efficiently than single images.
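Point 1 can be sketched in TensorFlow 2.x eager mode (an assumption; the gist itself uses the TF1 graph API). The hypothetical function `yuv420p_bytes_to_bgr` takes one raw frame per stream as a batch of byte strings, which also covers point 2: `tf.io.decode_raw` replaces `np.frombuffer`, `tf.image.resize` replaces `cv2.resize`, and the color math uses the gist's 1.371/0.338/0.698/1.732 coefficients with both chroma terms subtracted for green. TensorFlow places each op on a GPU when one is visible and the op has a GPU kernel; the frame bytes themselves still arrive over the CPU pipe.

```python
import numpy as np
import tensorflow as tf  # assumes TF 2.x with eager execution


def yuv420p_bytes_to_bgr(raw, w, h):
    """Hypothetical helper: batch of raw yuv420p frames -> uint8 BGR images.

    raw: 1-D tf.string tensor, one equal-length byte string per frame.
    Returns a (batch, h, w, 3) uint8 tensor in B, G, R order.
    """
    k = w * h
    # decode_raw adds a trailing dimension: (batch,) strings -> (batch, w*h*3//2) bytes.
    flat = tf.cast(tf.io.decode_raw(raw, tf.uint8), tf.float32)
    y = tf.reshape(flat[:, :k], (-1, h, w, 1))
    u = tf.reshape(flat[:, k:k + k // 4], (-1, h // 2, w // 2, 1))
    v = tf.reshape(flat[:, k + k // 4:], (-1, h // 2, w // 2, 1))
    # Upsample the quarter-size chroma planes to full resolution.
    u = tf.image.resize(u, (h, w), method="bilinear")
    v = tf.image.resize(v, (h, w), method="bilinear")
    r = y + 1.371 * (v - 128.0)
    g = y - 0.338 * (u - 128.0) - 0.698 * (v - 128.0)
    b = y + 1.732 * (u - 128.0)
    bgr = tf.concat([b, g, r], axis=-1)
    return tf.cast(tf.clip_by_value(bgr, 0.0, 255.0), tf.uint8)


# Two identical 4x2 grey frames (Y=100, U=V=128) stand in for two camera streams.
frame = bytes([100] * 8 + [128] * 4)
bgr = yuv420p_bytes_to_bgr(tf.constant([frame, frame]), w=4, h=2)
print(tuple(bgr.shape))  # (2, 2, 4, 3)
```

In the display loop, the per-stream reads could then become `raw = [p.stdout.read(w * h * 3 // 2) for p in ffmpegs]` followed by `image = yuv420p_bytes_to_bgr(tf.constant(raw), w, h).numpy()`. Whether this beats the NumPy/OpenCV path depends on the host-to-device copy cost, so it is worth measuring.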

@zEdS15B3GCwq

Thanks for clarifying my questions, much appreciated. I've learned a lot from your code.

I was looking for a way to keep FFmpeg's output on the GPU for further processing. Unfortunately, it seems that's not possible without customising FFmpeg's code.
