Gist rygorous/9124356. Last active October 8, 2016 16:59.
On "Understanding Sources of Inefficiency in General-Purpose Chips"
My problems with the paper:
- There is no comparison of resulting video quality. The encode time (and power
expended) needed to produce an H.264 bit stream depends *dramatically* on the desired
quality level; e.g. for x264 (state-of-the-art SW encoder, already in 2010 when the paper
was written), the difference between the fastest and best-quality settings is close to
2 orders of magnitude in both speed and power use. This is not negligible!
[NOTE: This excludes quality presets like "placebo", which are more demanding still.
Even just comparing between different settings usable for real-time encoding, we still have
at least an order of magnitude difference.]
- They have their encoder, which is apparently based on JM 8.6 (*not* a good encoder!); for
the SW implementation they use an H.264 encoder by Intel that I do not know (but it is
running on a P4 2.8GHz); and for the ASIC they have an ASIC from 2006. These are three
different impls, at three different quality targets, and the paper does not account for
those differences.
- You can be fairly certain that the ASIC is targeting reasonable quality and using more or
less current algorithms. The same cannot be said for their solution; as a result, we do
know how perf/W improved from their changes, but we do not actually know
1. how the resulting perf/W actually compares against the ASIC
(resulting quality may be way worse, or better; we have no idea.)
2. whether the perf/W gains were actually relevant; an efficient HW impl of a sub-par
algorithm will beat the corresponding SW version, but how big would the gains be
had the SW version (without the added instrs etc.) been better to begin with?
I do agree that this kind of HW/SW codesign is interesting. I just wanted to point out
that, for the application they've chosen, their perf metrics indicate that they are using
sub-par algorithms (which are inefficient, but also amenable to a HW implementation
that has better perf/W due to lower overhead). This exaggerates the gains they get from
specialized instructions in this case. Furthermore, because they do not evaluate the
quality of the resulting video (and because both encode time and power scale with the
quality of encoding!), their comparisons with the ASIC/SW implementations are essentially
meaningless.
In short, while I like the idea, I'm very doubtful about the execution and all the
conclusions drawn from it.