xlab · August 13, 2013 18:42
diff --git a/gistfile1.txt b/gistfile1.txt
 John Carmack on shadow volumes...

 I received this in email from John on May 23rd, 2000.

 - Mark Kilgard


 I solved this in a way that is so elegant you just won't believe it.  Here
 is a description that I posted to a private mailing list:

 ----------------------------------------------------------

 I first implemented stencil shadow volumes over two years ago in the
 post-Q2 research period.  They looked great until you flew the viewpoint
 into one of the volumes, and depending on the exact test you used, either
 most of the screen went into negative shadow, or most of the shadows
 disappeared.
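The failure described above can be reproduced with a toy model of a single pixel. This is only a sketch under stated assumptions: the view ray is reduced to a list of (distance, facing) crossings with shadow-volume faces, and the names `zpass_count`, `crossings`, `scene_depth`, and `near` are illustrative, not from the original email.

```python
# Toy model (an assumption for illustration): one pixel's view ray, with
# shadow-volume faces listed as (distance_from_eye, facing) crossings.
# A face behind the eye or in front of the near plane is never rasterized.

def zpass_count(crossings, scene_depth, near=0.1):
    """Conventional stencil count: only depth-pass fragments change it."""
    count = 0
    for dist, facing in crossings:
        if dist < near:
            continue  # clipped away at the near plane
        if dist < scene_depth:  # depth test passes
            count += 1 if facing == "front" else -1
    return count

# Eye outside a volume that spans distances 2..6 along the ray.
outside = [(2.0, "front"), (6.0, "back")]
# Eye inside the same volume: its front face lies behind the eye.
inside = [(-1.0, "front"), (6.0, "back")]

print(zpass_count(outside, scene_depth=4.0))  # surface inside the volume
print(zpass_count(inside, scene_depth=4.0))   # surface inside the volume
print(zpass_count(inside, scene_depth=8.0))   # surface beyond the volume
```

With the eye outside, the shadowed surface gets a count of 1. With the eye inside, the same shadowed surface gets 0 (the shadow disappears), and a lit surface beyond the volume gets -1 (the "negative shadow").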

 The classic shadow volume works that stencil shadows are derived from
 usually suggest "inverting the test when the view is inside a shadow
 volume".  That is not a robust solution, because a non-zero near clip plane
 will give situations where the plane is not cleanly on one side or the
 other of the view point.  It is also non-trivial to make the "inside a
 shadow volume" determination, especially after silhouette optimizations.

 The conventional wisdom has been that you will need to clip the shadow
 volumes to the view plane and cap with triangles, treating the shadow
 volumes as if they were polyhedrons.

 I implemented the easy cases of this, choosing to project the silhouette
 points to either the far plane of the light's effect or the view plane.
 For the clear-cut cases, this worked fine, allowing you to walk in front of
 a shadowed object, or look directly at it with the light behind it.
 Intermediate cases, where some of the vertexes should project onto the
 light plane and some should project onto the view plane could also be
 handled, but the cost of all the testing was starting to pile up.

 Unfortunately, there are cases when an occluding triangle projects a shadow
 volume that will clip to something other than a triangular prism.  There
 are cases where real, honest volume clipping must take place.

 Anything that requires finding convex hulls in realtime is starting to
 sound like a Bad Idea.

 I sweated over this for a while, with the code getting grosser and grosser,
 but then I had an idea for a different direction.

 It should be possible to let the shadow volumes get clipped off at the view
 plane like they always do, then find the clipped off areas in image space
 and correct them.

 The way to find if a volume has been clipped off is to render the shadow
 volume with depth testing disabled, incrementing for the front faces and
 decrementing for the back faces.  If the stencil buffer ends up with the
 original value, the shadow volume is well formed in front of the view volume.
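A minimal sketch of this check, using a toy per-pixel model in which a view ray is a list of (distance, facing) crossings with the faces of one shadow volume. The function and parameter names are hypothetical, and the volume is assumed not to reach the far plane.

```python
def clip_check(crossings, near=0.1):
    """Count with depth testing disabled: +1 per front face, -1 per back face.

    Returns 0 when every front face along the ray is matched by a back
    face, i.e. nothing was lost at the near plane for this pixel.
    """
    count = 0
    for dist, facing in crossings:
        if dist < near:
            continue  # fragment clipped at the near plane, never drawn
        count += 1 if facing == "front" else -1
    return count

well_formed = [(2.0, "front"), (6.0, "back")]   # volume fully in front
clipped = [(-1.0, "front"), (6.0, "back")]      # front face behind the eye

print(clip_check(well_formed))  # stencil returns to its original value
print(clip_check(clipped))      # non-zero: the volume was clipped here
```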

 My first attempt to utilize this involved a whole bunch of passes to
 determine if it was well formed and combine it with the standard volume
 stencil operations.  It was an interesting experiment with masking and
 anding in the stencil buffer to perform two operations, but it turned out
 that, while it worked for simple shapes, complex shapes needed more
 information from the volume clipping than just "well formed" or not.

 The next iteration involved attempting to "preload" the standard stencil
 shadow algorithm by the number of clipped away planes.  I first drew the
 shadow volumes with depth test disabled, incrementing for back sides and
 decrementing for front sides.  This finishes with a positive value in the
 stencil buffer for each plane that is clipped away at the view plane.  The
 normal depth tested shadow volume is drawn next, with the change polarity
 reversed, decrementing for back sides and incrementing for front sides.
 The areas not equal to the initial clear value are in shadow.

 That works all the time.
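The two-step "preload" scheme can be sketched with the same kind of toy per-pixel model: a view ray is a list of (distance, facing) crossings, and fragments nearer than the near plane are dropped. All names here are illustrative assumptions.

```python
def preload_count(crossings, scene_depth, near=0.1):
    """Preload pass (no depth test) followed by the depth-tested pass."""
    visible = [(d, f) for d, f in crossings if d >= near]
    count = 0
    # Pass 1, depth test disabled: increment for back sides, decrement
    # for front sides. This leaves +1 per front plane lost at the view plane.
    for dist, facing in visible:
        count += 1 if facing == "back" else -1
    # Pass 2, depth tested, polarity reversed: only fragments that pass
    # the depth test change the count.
    for dist, facing in visible:
        if dist < scene_depth:
            count += -1 if facing == "back" else 1
    return count  # non-zero means the pixel is in shadow

inside = [(-1.0, "front"), (6.0, "back")]   # eye inside the volume
outside = [(2.0, "front"), (6.0, "back")]   # eye outside the volume

print(preload_count(inside, scene_depth=4.0))   # shadowed surface
print(preload_count(inside, scene_depth=8.0))   # lit surface
print(preload_count(outside, scene_depth=4.0))  # shadowed surface
```

In this model the shadowed surfaces end up at 1 and the lit surface at 0, whether or not the eye is inside the volume.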

 Later, I realized something else.  The algorithm was now basically:

 Draw back sides, incrementing both with depth pass and depth fail.
 Draw front sides, decrementing both with depth pass and depth fail.
 Draw back sides, decrementing with depth pass and doing nothing with depth
 fail.
 Draw front sides, incrementing with depth pass and doing nothing with depth
 fail.

 Rearrange the passes and you get:
 Draw back sides, incrementing both with depth pass and depth fail.
 Draw back sides, decrementing with depth pass and doing nothing with depth
 fail.
 Draw front sides, decrementing both with depth pass and depth fail.
 Draw front sides, incrementing with depth pass and doing nothing with depth
 fail.

 It is then obvious that they partially cancel each other out and can be
 combined into:

 Draw back sides, doing nothing with depth pass and incrementing with depth
 fail.
 Draw front sides, doing nothing with depth pass and decrementing with depth
 fail.
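The cancellation can be checked per fragment: a back side that is always incremented and decremented on depth pass is, in effect, incremented only on depth fail, and likewise for front sides. Here is a sketch that compares the four-pass and two-pass forms in a toy per-pixel model (a view ray as a list of (distance, facing) crossings; all names are illustrative assumptions).

```python
def four_pass_count(crossings, scene_depth, near=0.1):
    """The four passes listed above, applied to one pixel."""
    count = 0
    for dist, facing in crossings:
        if dist < near:
            continue  # clipped at the view plane
        passes = dist < scene_depth
        if facing == "back":
            count += 1            # increment on depth pass and depth fail
            if passes:
                count -= 1        # decrement on depth pass only
        else:
            count -= 1            # decrement on depth pass and depth fail
            if passes:
                count += 1        # increment on depth pass only
    return count

def zfail_count(crossings, scene_depth, near=0.1):
    """The combined form: only depth-fail fragments change the count."""
    count = 0
    for dist, facing in crossings:
        if dist < near:
            continue
        if dist >= scene_depth:   # depth test fails
            count += 1 if facing == "back" else -1
    return count

inside = [(-1.0, "front"), (6.0, "back")]  # eye inside the volume
for depth in (4.0, 8.0):
    print(depth, four_pass_count(inside, depth), zfail_count(inside, depth))
```

In OpenGL terms, the combined form corresponds to putting the increment or decrement in the depth-fail argument of `glStencilOp` and keeping the stencil value on depth pass, while culling selects which sides are drawn.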

 I was shocked.  I went from feeling pretty clever with my unbalanced
 preloading algorithm (which I would only apply on surfaces that were likely
 to intersect the view plane) to just feeling dumb that I had never seen the
 trivial solution before.  Thinking about operating on depth test fails is a
 bit non-intuitive, but if you work it through a couple times, what is going
 on makes pretty good sense.

 Shadows done this way have none of the "fragile" feel that geometric
 algorithms tend to give.  You can use them for major occluders in the world
 and noclip fly right through them without any problems at all.

 Stencil shadows still aren't cheap by any means.  It can cost 3x the
 triangle count of the source model (although <2x with some optimizations is
 reasonable) per shadowing light, and it can have pathological fill rate
 utilization in some cases, like a light shining out horizontally through a
 jail cell door.  Still, they are quick operations even if there are a lot
 of them.  The vertexes are just bare xyz points without texcoords or color,
 and the fill rate is only to the depth/stencil buffer.

 There are lots of subtleties to actually using this, like making sure your
 shadow volumes are capped on both ends if they need to be (you can often
 optimize away the caps based on culling information), making sure that none
 of the shadow volumes get clipped off by your far clipping plane (which
 would unbalance the count), and all the normal picky silhouette
 optimization issues.
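The far-plane caveat can be seen in a toy per-pixel model: a view ray is a list of (distance, facing) crossings, and fragments outside the near..far range are never drawn. This is a sketch only, and the names and distances are assumptions chosen for illustration.

```python
def zfail_count(crossings, scene_depth, near=0.1, far=100.0):
    """Depth-fail count for one pixel, with both clip planes modeled."""
    count = 0
    for dist, facing in crossings:
        if dist < near or dist > far:
            continue  # clipped: this fragment never reaches the stencil
        if dist >= scene_depth:  # depth test fails
            count += 1 if facing == "back" else -1
    return count

# A shadowed surface at distance 4, inside a volume that starts at 2.
capped_in_range = [(2.0, "front"), (50.0, "back")]     # back cap inside far
capped_too_far = [(2.0, "front"), (1000.0, "back")]    # back cap beyond far

print(zfail_count(capped_in_range, scene_depth=4.0))  # 1: in shadow
print(zfail_count(capped_too_far, scene_depth=4.0))   # 0: shadow is lost
```

When the back cap falls beyond the far plane, the increment that should mark the pixel as shadowed never happens, which is the unbalanced count mentioned above.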

 Depth buffer based shadows still sound like they have a lot of advantages:

 Not much in the way of coding subtleties required.

 The performance is more level (fixed fill rate overhead) and theoretically
 somewhat faster (only one extra drawing of the surface into the shadow
 buffer) in most cases.

 They avoid the silhouette finding work that still needs to be done with the
 shadow volumes (a per-face dot product and some copying), and don't require
 any connectivity information.
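For reference, the silhouette work mentioned here can be sketched as follows. This is one common approach, not necessarily the one used in the original code: it assumes a closed triangle mesh with consistent outward winding and a point light, and every name in it is hypothetical.

```python
def silhouette_edges(vertices, triangles, light):
    """Edges shared by one light-facing and one light-averted triangle."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Per-face dot product: does the face normal point toward the light?
    facing = []
    for i0, i1, i2 in triangles:
        v0, v1, v2 = vertices[i0], vertices[i1], vertices[i2]
        normal = cross(sub(v1, v0), sub(v2, v0))
        facing.append(dot(normal, sub(light, v0)) > 0)

    # Connectivity: map each undirected edge to the faces that use it.
    edge_faces = {}
    for f, (i0, i1, i2) in enumerate(triangles):
        for a, b in ((i0, i1), (i1, i2), (i2, i0)):
            edge_faces.setdefault((min(a, b), max(a, b)), []).append(f)

    return sorted(edge for edge, faces in edge_faces.items()
                  if len(faces) == 2 and facing[faces[0]] != facing[faces[1]])

# A tetrahedron with outward-facing winding, lit from below (negative z).
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tris = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
print(silhouette_edges(verts, tris, light=(0.2, 0.2, -5.0)))
```

Only the bottom face sees this light, so its three edges form the silhouette.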

 Unfortunately, the quality just isn't good enough unless you use extremely
 high resolution shadow maps (or possibly many offset passes with a lower
 resolution map, although the bias issues become complex), and you need to
 tweak the biases and ranges in many scenes.   For comparison, Pixar will
 commonly use 2k or 4k shadow maps, focused in on a very narrow field of
 view (they assume projections outside the map are NOT in shadow, which
 works for movie sets but not for architectural walkthroughs), along with 16
 jittered samples of the shadow map for each pixel and occasional hand
 tweaking of the bias.
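A rough sketch of the kind of lookup described here: a depth comparison with a bias, averaged over 16 jittered samples, with anything outside the map treated as not in shadow. The map layout, parameter names, and default values are assumptions for illustration and are not taken from any particular renderer.

```python
import math
import random

def shadow_fraction(shadow_map, u, v, receiver_depth, bias=0.01,
                    grid=4, radius=1.5, rng=None):
    """Fraction of jittered samples (grid*grid of them) that are shadowed.

    shadow_map is a square list of rows of depths seen from the light;
    (u, v) are map coordinates in 0..1; radius is in texels.
    """
    rng = rng or random.Random(0)
    size = len(shadow_map)
    shadowed = 0
    for gy in range(grid):
        for gx in range(grid):
            # One random offset inside each cell of a grid x grid pattern.
            du = ((gx + rng.random()) / grid - 0.5) * 2 * radius
            dv = ((gy + rng.random()) / grid - 0.5) * 2 * radius
            x = math.floor(u * size + du)
            y = math.floor(v * size + dv)
            if not (0 <= x < size and 0 <= y < size):
                continue  # outside the map: assumed NOT in shadow
            if receiver_depth - bias > shadow_map[y][x]:
                shadowed += 1
    return shadowed / (grid * grid)

flat_map = [[1.0] * 8 for _ in range(8)]  # an occluder at depth 1 everywhere

print(shadow_fraction(flat_map, 0.5, 0.5, receiver_depth=2.0))
print(shadow_fraction(flat_map, 0.5, 0.5, receiver_depth=1.005))
print(shadow_fraction(flat_map, 0.5, 0.5, receiver_depth=1.005, bias=0.0))
```

The second and third calls differ only in the bias: a receiver sitting just behind the stored depth is lit with the bias and fully self-shadowed without it, which is the tuning problem noted above.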

 I still want to research the options for cropping and skewing shadow depth
 buffer projection planes, but I am now positive that the stencil shadow
 architecture works out.


 John Carmack