rygorous · October 8, 2016 16:59
diff --git a/gistfile1.txt b/gistfile1.txt
 My problems with the paper:
 - There is no comparison of resulting video quality. The encode time (and power expended)
  needed to produce an H.264 bit stream depends *dramatically* on the desired quality level;
  e.g. for x264 (state of the art SW encoder, already in 2010 when the paper was written), the
  difference between the fastest and best quality settings is close to 2 orders of magnitude
  in both speed and power use. This is not negligible!
  [NOTE: This is excluding quality-presets like "placebo", which are more demanding still.
  Even just comparing between different settings usable for real-time encoding, we still have
  at least an order of magnitude difference.]
 - They have their own encoder, which is apparently based on JM 8.6 (*not* a good encoder!); for
  the SW implementation they use an H.264 encoder by Intel that I do not know (but running
  on a P4 2.8GHz); and for the ASIC they have an ASIC from 2006. These are three different
  impls, at three different quality targets, and the paper does not account for the differences.
 - You can be fairly certain that the ASIC is targeting reasonable quality and using more or
  less current algorithms. The same cannot be said for their solution; as a result, we do
  know how perf/W improved from their changes, but we do not actually know
  1. how the resulting perf/W actually compares against the ASIC
     (resulting quality may be way worse, or better, we have no idea.)
  2. whether the perf/W gains were actually relevant; an efficient HW impl of a subpar
     algorithm will beat the corresponding SW version, but how big would the gains be
     had the SW version (without the added instrs etc.) been better to begin with?
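
To make the missing comparison concrete, here is a minimal sketch of what "accounting for quality" could look like. It is not from the paper. PSNR is just one common objective metric that I am picking for illustration, and the `Measurement` fields, the `perf_per_watt_ratio` helper and its tolerances are all hypothetical. The idea is that a perf/W ratio is only reported when both encoders land at roughly the same bitrate and quality.

```python
import math
from dataclasses import dataclass


def psnr(reference, encoded, max_value=255):
    """PSNR in dB between two equally sized frames (flat sequences of samples)."""
    if len(reference) != len(encoded) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


@dataclass
class Measurement:
    """One encoder run (hypothetical record, not a format used by the paper)."""
    fps: float       # encode throughput, frames per second
    watts: float     # average power draw during the encode
    psnr_db: float   # quality of the decoded output vs. the source
    kbps: float      # bitrate of the produced stream

    @property
    def frames_per_joule(self):
        return self.fps / self.watts


def perf_per_watt_ratio(a, b, psnr_tol_db=0.5, bitrate_tol=0.05):
    """Return a's perf/W relative to b's, or None if the runs are not comparable.

    The tolerances are arbitrary illustration values (assumption).
    """
    same_quality = abs(a.psnr_db - b.psnr_db) <= psnr_tol_db
    same_bitrate = abs(a.kbps - b.kbps) <= bitrate_tol * max(a.kbps, b.kbps)
    if not (same_quality and same_bitrate):
        return None  # different operating points: the ratio would be meaningless
    return a.frames_per_joule / b.frames_per_joule
```

With something like this in place, two runs at different quality targets simply yield `None` instead of a headline number, which is the situation the paper's ASIC/SW comparison is in.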

 I do agree that this kind of HW/SW codesign is interesting. I just wanted to point out
 that, for the application they've chosen, their perf metrics indicate that they are using
 subpar algorithms (which are inefficient, but also amenable to a HW implementation
 that has better perf/W due to lower overhead). This exaggerates the gains they get from
 specialized instructions in this case. Furthermore, because they do not evaluate the
 quality of the resulting video (and because both encode time and power scale with the
 quality of encoding!), their comparisons with the ASIC/SW implementations are essentially
 meaningless.

 In short, while I like the idea, I'm very doubtful about the execution, and all the
 conclusions drawn from it.