
@AndreVallestero
Last active March 18, 2021 15:16
A faster, lower latency frame grabber.
'''
Optimized to be 6 times faster using the following techniques:
- Reuse bitmaps, handles, and device contexts
- Use the application framebuffer instead of the compositor framebuffer (entire desktop)
This is not the fastest method. That would be to copy the data directly from the GPU back buffer:
- https://web.archive.org/web/20121205062922/http://www.ring3circus.com/blog/2007/11/22/case-study-fraps/
'''
import win32gui
import win32ui
from win32con import SRCCOPY
from numpy import frombuffer, uint8


class FrameGrabber:
    def __init__(self, x: float, y: float, w: float, h: float, windowTitle: str = ""):
        # Capture a named window if a title is given, otherwise the whole desktop
        self.hwnd = win32gui.FindWindow(None, windowTitle) if windowTitle else win32gui.GetDesktopWindow()
        win_x1, win_y1, win_x2, win_y2 = win32gui.GetWindowRect(self.hwnd)
        win_w = win_x2 - win_x1
        win_h = win_y2 - win_y1
        # Values strictly between 0 and 1 are fractions of the window size; anything else is pixels
        self.pos = (
            round(x * win_w if 0 < x < 1 else x),
            round(y * win_h if 0 < y < 1 else y)
        )
        self.w = round(w * win_w if 0 < w < 1 else w)
        self.h = round(h * win_h if 0 < h < 1 else h)
        # Create the device contexts and bitmap once, then reuse them on every grab()
        self.hwnddc = win32gui.GetWindowDC(self.hwnd)
        self.hdcSrc = win32ui.CreateDCFromHandle(self.hwnddc)
        self.hdcDest = self.hdcSrc.CreateCompatibleDC()
        self.bmp = win32ui.CreateBitmap()
        self.bmp.CreateCompatibleBitmap(self.hdcSrc, self.w, self.h)
        self.hdcDest.SelectObject(self.bmp)

    def grab(self):
        # Copy the region from the window DC into the reusable bitmap
        self.hdcDest.BitBlt((0, 0), (self.w, self.h), self.hdcSrc, self.pos, SRCCOPY)
        # frombuffer replaces the deprecated binary mode of fromstring; pixels are BGRA.
        # The result is a read-only view, so call img.copy() before modifying it in place.
        img = frombuffer(self.bmp.GetBitmapBits(True), dtype=uint8)
        img = img.reshape((self.h, self.w, 4))
        # To convert to RGB, use cv2.cvtColor(img, cv2.COLOR_BGRA2RGB)
        # This is often unnecessary if simple image filtering is being done
        return img

    def __del__(self):
        # Release the GDI resources created in __init__
        self.hdcSrc.DeleteDC()
        self.hdcDest.DeleteDC()
        win32gui.ReleaseDC(self.hwnd, self.hwnddc)
        win32gui.DeleteObject(self.bmp.GetHandle())
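Two pieces of the grabber's logic can be tried without Windows or pywin32. Below is a minimal sketch, assuming only NumPy: `resolve` mirrors the fraction-or-pixels rule from `__init__`, and `bgra_bytes_to_rgb` is a hypothetical OpenCV-free alternative to the `cv2.cvtColor` call mentioned in `grab()`, assuming the same row-major (h, w, 4) BGRA layout that `grab()` reshapes to. Neither helper is part of the original gist.

```python
import numpy as np


def resolve(value: float, window_size: int) -> int:
    """Same rule as FrameGrabber.__init__: values strictly between 0 and 1
    are fractions of the window size; anything else is taken as pixels."""
    return round(value * window_size if 0 < value < 1 else value)


def bgra_bytes_to_rgb(raw: bytes, w: int, h: int) -> np.ndarray:
    """Hypothetical helper: turn a BGRA byte buffer laid out like grab()'s
    (h, w, 4) array into an (h, w, 3) RGB array without OpenCV."""
    bgra = np.frombuffer(raw, dtype=np.uint8).reshape((h, w, 4))
    # Take channels 2, 1, 0 (R, G, B) and drop alpha; .copy() gives a
    # writable, contiguous array instead of a read-only strided view
    return bgra[..., 2::-1].copy()


# Usage: resolve coordinates for a 1920x1080 window
print(resolve(0.25, 1920), resolve(0.5, 1080), resolve(512, 1920), resolve(1, 1920))
```

One consequence of the `0 < v < 1` rule is that `1` means one pixel, not the full window, so a full-width capture has to be requested in pixels (or with a value such as `0.999`).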
@Sentdex

Sentdex commented Jan 15, 2021

Thanks! Will compare it to one of our other updated scripts. I thought I was on the latest version of our screen grabber, but I wasn't.

Will also take a peek into the hardware acceleration. Optimizing this grabbing is super important since this is the first step and it dictates the processing speed of literally everything else after it, so whatever is the quickest way, I am all ears :D

@AndreVallestero
Author

> Optimizing this grabbing is super important since this is the first step and it dictates the processing speed of literally everything else after it, so whatever is the quickest way, I am all ears :D

For the absolute fastest way, I believe copying the data directly from the GPU back buffer is the fastest

I believe this is the technique that Fraps, NVIDIA ShadowPlay, and Windows Game DVR use for high-quality, high-frame-rate recording with minimal resource usage. I've never attempted this myself, but I might give it a go in Python if I have some free time over the weekend.

@AndreVallestero
Author

AndreVallestero commented Jan 15, 2021

> For the absolute fastest way, I believe copying the data directly from the GPU back buffer is the fastest

Did some more digging and it seems like someone has been able to do what I mentioned.

https://github.com/SerpentAI/D3DShot

Unfortunately it's a little out of date, and its Pillow 7 dependency fails to compile for me on Python 3.9 + Windows. Hopefully other people have better luck.

I downgraded to Python 3.8 to get D3DShot working, but unfortunately its performance was poor. Timing 1000 grabs of a 512x512 region with each approach:

from time import time

import d3dshot

# grab_screen_old: the older screen-grab function being compared against (not shown here)
start_time = time()
for _ in range(1000):
    img = grab_screen_old((0, 0, 512, 512))
print(time() - start_time)
# 7.05 s

start_time = time()
fg = FrameGrabber(0, 0, 512, 512, "Counter-Strike: Global Offensive")
for _ in range(1000):
    img = fg.grab()
print(time() - start_time)
# 1.68 s

start_time = time()
d = d3dshot.create(capture_output="numpy")
for _ in range(1000):
    img = d.screenshot((0, 0, 512, 512))
print(time() - start_time)
# 21.48 s
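The three timing loops above repeat the same pattern, so a small helper can make the comparison easier to rerun. This is a minimal sketch, not part of the original benchmark: `benchmark` and `frame_stats` are hypothetical names, it swaps `time()` for `time.perf_counter()` (a monotonic, higher-resolution clock), and it makes one untimed warm-up call so one-time setup is excluded. The loops above instead count some setup, such as `d3dshot.create(...)` and the `FrameGrabber` constructor.

```python
from time import perf_counter
from typing import Callable, Tuple


def frame_stats(total_s: float, n: int) -> Tuple[float, float]:
    """Convert the total time for n grabs into (ms per frame, frames per second)."""
    fps = n / total_s if total_s > 0 else float("inf")
    return total_s / n * 1000, fps


def benchmark(grab: Callable[[], object], n: int = 1000) -> Tuple[float, float, float]:
    """Time n calls of grab(); return (total seconds, ms per frame, FPS)."""
    grab()  # untimed warm-up call, so one-time setup cost is not counted
    start = perf_counter()
    for _ in range(n):
        grab()
    total = perf_counter() - start
    return (total, *frame_stats(total, n))


# Per-frame cost of the totals posted above (1000 grabs each)
for name, total in [("grab_screen_old", 7.05), ("FrameGrabber", 1.68), ("D3DShot", 21.48)]:
    ms, fps = frame_stats(total, 1000)
    print(f"{name}: {ms:.2f} ms/frame, ~{fps:.0f} FPS")
```

On Windows, something like `benchmark(FrameGrabber(0, 0, 512, 512).grab)` would time the grabber on its own. Converted this way, the posted totals work out to about 7.05, 1.68, and 21.48 ms per frame, or roughly 142, 595, and 47 frames per second.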

I haven't taken the time to look through D3DShot's implementation, but I'm very surprised that it performs so poorly. I'll keep working on my own hardware-accelerated implementation and report back with my findings.

@AndreVallestero
Author

I did some testing and found that the fastest approach to this problem was indeed to copy from the GPU back-buffer. There's an article that goes into great detail on the subject here:

https://web.archive.org/web/20121205062922/http://www.ring3circus.com/blog/2007/11/22/case-study-fraps/

However, doing this entirely in Python would likely incur a performance hit, and it would be quite difficult given the need for granular access to low-level system and hardware calls. In the future I may write a C library with Python bindings that does this work, but until then, I'm putting this project on pause until I have more time on my hands.
