This simple example shows that shared memory will persist between kernel launches. I believe this to be a lucky coincidence. This property is not guaranteed in general.
From the CUDA Programming Guide:
The __shared__ qualifier, optionally used together with __device__, declares a variable that:
* Resides in the shared memory space of a thread block,
* Has the lifetime of the block,
* Is only accessible from all the threads within the block.