These are notes from a one-day project to test a hunch. The idea is to train a convolutional neural network to remove speckle from sar (synthetic aperture radar) imagery using only one other observation – with its own speckles – as the target. This method does not come close to state-of-the-art despeckling, and it can be biased by the skewed distribution of the noise in a way that makes it useless for quantitative research. However, I hadn’t noticed it in the literature and I think it’s kind of funny, so I’m writing it up.
Everything here is about Sentinel-1 L1 GRD-HD data, since that’s what I used (because it’s free).
Sar observations contain speckle, a form of interference related to the sparkles in reflected laser light. By some definitions speckle is not noise, since it’s physically real outside the sensor and contains information, but we will treat it as noise. Speckle is (close enough to) independent between radar chirps, a.k.a. looks, and even its distribution is hard to model. The data we’re using is multilook and has already had some speckle averaged out of it.
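For intuition only (real GRD data follows it at best loosely), the textbook first-order model treats speckle as multiplicative, gamma-distributed noise on intensity, and averaging looks tightens it:

import numpy as np

rng = np.random.default_rng(0)
clean = np.full((256, 256), 0.1)       # hypothetical "true" backscatter intensity
looks = 4                              # number of looks averaged together
speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=clean.shape)  # mean 1, variance 1/looks
observed = clean * speckle             # relative noise shrinks as looks grows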
Speckle looks like this:
The vh polarization from the most recent S1A pass over Ulsan shipyard. This is (and all further images will be) reprojected to EPSG:3857 at a resolution of 15 × 15 m with cubic resampling, then arbitrarily scaled for display. It’s slightly upsampled to ensure that downsampling doesn’t hide speckle.
In sweeping terms, given some data (x) corrupted by noise (e), to train a model to provide a clean version (ŷ), you want at least one of:
1. A statistical definition of what good data looks like, e.g. using FFT or TV (sketched just after this list). Then you can train a model to maximize the quality score while minimizing the difference from the input. It also helps to know about the noise; for example, if you know that it’s additive Gaussian, you can say “and the modeled noise that you’re removing (ê I guess) should have the same local mean and variance across the whole image” as an extra loss.
2. Actual clean observations, a.k.a. labels, targets, ground truth, etc. (y).
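To make route 1 slightly more concrete, here is a minimal sketch of that kind of loss, assuming PyTorch and using total variation as the stand-in quality score (the weighting is made up):

import torch

def route1_loss(model_out, x, tv_weight=0.1):
    # stay close to the input...
    fidelity = torch.mean(torch.abs(model_out - x))
    # ...while scoring well on the statistical definition of "good", here low total variation
    tv = (torch.mean(torch.abs(model_out[..., 1:, :] - model_out[..., :-1, :]))
          + torch.mean(torch.abs(model_out[..., :, 1:] - model_out[..., :, :-1])))
    return fidelity + tv_weight * tv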
From what I’ve read, 1 is tough in sar, particularly because of speckle’s weird distributions. People do take route 1 and succeed, but it’s hard work. We are not interested in hard work here; we want something that converges overnight.
And 2 is tough because the obvious approach is to average more looks, but Earth’s surface changes over the timescale of repeat visits. The “raw” backscatter data already contains all the looks available.
If we stack the latest 16 vh observations of Ulsan and take an interquartile mean (which is not ideal when we don’t know the noise distribution, but I won’t tell anyone if you won’t), the result is much smoother:
I enjoy looking at the speckly one, finding something just on the edge of interpretability (topography in the hills, for example, or the circular fuel tanks in the lower left), and checking it against this smoothed version. The diff of this and the single observation looks like:
Normalized, gray-centered difference; brighter here means brighter in the long-term IQM. The difference is after scaling for display, so don’t go drawing any big conclusions about speckle distribution.
The black ships are in the single observation but not in the long-term average; a few bright ships were evidently there for more than a quarter of the average’s timespan, but not in the most recent pass. Some other changes look like construction or huge parking lots filling or emptying. And of course the speckle overlies everything.
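Computationally, by the way, the interquartile mean is nothing fancy. Roughly, assuming the 16 observations are already coregistered into a (16, H, W) numpy stack:

import numpy as np

def interquartile_mean(stack):
    # stack: (n_obs, H, W); average only the middle half of each pixel's time series
    s = np.sort(stack, axis=0)
    n = s.shape[0]
    return s[n // 4 : n - n // 4].mean(axis=0)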
The problem with using this pair (the single observation and the long-term average) as an x and y for training is that we can’t expect the model to learn to travel through time. There’s information in the smooth version that is not available (or guessable) in the single version. Looking at it from the other end, we don’t want the final model to put ships in the y spots given x as input; that would be a form of overfitting. We want the ships despeckled in place.
Some components of ship motion are predictable on some timescales and in some situations, but here ship position has an unlearnable and effectively random component, and asking the model to learn it is likely to cause problems as it tries to generalize from basically meaningless relationships.
As for the ships, so for tide flats, changes in soil moisture, reservoir levels, rails being laid, and so on.
Boiling this down, to make a smoother target image, we have to add more information to it – information that will not appear in the model’s inputs. This is likely to lead to the model learning false patterns. For example, if seasonal changes in vegetation mean that x is significantly brighter than y, say because x was collected in spring but y is from a full year, we’re wrongly teaching the model to brighten all vegetation.
(Incidentally, I do think there’s another and very different approach to despeckling here, and I even dropped a tiny hint at it, but it’s not ready for a writeup.)
So let’s go to the other extreme. Instead of trying to extend the timespan that the y covers in order to make it smoother, let’s narrow it to the shortest possible: only the next observation before or after x.
“But wait”, you say, “that means y is full of noise, which is exactly what we’re trying to avoid! You just argued that y shouldn’t contain totally unpredictable features, and speckle is unpredictable!”
Fair points. But speckle is way more pervasive than ships. Ships appear in particular contexts that it’s plausible a network would try to learn (i.e., growing a deep ship module). Speckle is also fully independent from x to y, so there’s nothing in the input to predict it from, and there’s far too much of it for any network of plausible size to memorize: as a matter of scale, it can’t be (over)fit.
What we end up with, intuitively, is a network that can only ever make it halfway from x to y – it can erase x’s speckle, but can’t learn to create y’s speckle. But halfway to that nominal goal of y is what we want: x despeckled to the point where we only see what x and y have in common, which is as close as we can reasonably come to a ground truth in this framework.
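Slightly more formally, and hand-waving over the actual speckle statistics: switch to an L2 loss for a moment, and note that because y’s speckle is independent of x and (under the normalization, roughly) zero-mean, it only adds a constant to the expected loss,

$$\mathbb{E}_{x,y}\,\lVert f(x) - y \rVert^2 \;=\; \mathbb{E}_{x}\,\lVert f(x) - \mathbb{E}[\,y \mid x\,] \rVert^2 \;+\; \mathrm{const},$$

so the minimizer is the same as if we had trained against the unobtainable clean E[y | x]. With the L1 loss I actually use below, the same argument lands on a conditional median instead of a mean.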
This is certainly not a new idea. Noisy labels are used all the time for training artificial neural networks, even sometimes on purpose when there are clean labels available. And I’m sure people have done this kind of noisy-to-noisy training to despeckle sar backscatter images before. But I haven’t seen it, so I was eager to try, and frankly surprised it worked right off the bat.
I trained the network for about 100 epochs on a dataset of 1024×2 pairs, then another 100 or so on 1024×32 pairs, all with L1 loss. (That’s not a super thoughtful training schedule – that’s me seeing if it tends to improve, then leaving it overnight.) Here’s the vh:
Here’s the IQM again, for comparison:
There are definitely spots that the IQM renders more smoothly or clearly, but I think the model output is actually less speckly overall, which surprised me. In any case, I feel comfortable saying that on this sample, at least, the network makes something about as smooth and detail-preserving as the IQM does, without time travel.
This is a vanilla CNN without even skip connections, let alone stuff like pixelwise attention that I suspect would contribute a lot. It’s literally the first one I tried. It uses pixel shuffles to “downsample” (as discussed here, since it’s basically just the core of a pansharpening network, because that’s what I had handy).
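For a sense of scale, the architecture is roughly in the spirit of this sketch (PyTorch; the depth and channel counts are placeholders, not the actual network):

import torch.nn as nn

class TinyDespeckler(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.PixelUnshuffle(2),                 # "downsample" by folding pixels into channels
            nn.Conv2d(2 * 4, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 2 * 4, 3, padding=1),
            nn.PixelShuffle(2),                   # back to full resolution, 2 channels (vv, vh)
        )

    def forward(self, x):                         # x: (N, 2, H, W), normalized vv/vh
        return self.net(x)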
Right now I’m training on two observations (this one and this one, reprojected as above and cropped to this footprint), chosen for a mix of landcovers and to avoid large water bodies.
It’s training in both directions, with the 07-32 observation as x and the 08-04 observation as y in half the pairs, and vice versa in the other half.
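Pair construction, then, is symmetric; something like this sketch (chip size, sampling, and names are all placeholders):

import numpy as np

def make_pairs(obs_a, obs_b, n_chips, chip=256, rng=None):
    # obs_a, obs_b: (2, H, W) arrays (vv, vh) on the same footprint, already normalized
    rng = rng or np.random.default_rng()
    _, H, W = obs_a.shape
    for _ in range(n_chips):
        r = int(rng.integers(0, H - chip))
        c = int(rng.integers(0, W - chip))
        a = obs_a[:, r:r + chip, c:c + chip]
        b = obs_b[:, r:r + chip, c:c + chip]
        yield a, b   # first observation as x, second as y
        yield b, a   # and vice versa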
I’m normalizing with np.log1p (then scaling to µ = 0, σ = 1), which is implicitly an estimate of the speckle distribution; insofar as that estimate is wrong, which it must be, the outputs presumably have biases and therefore wouldn’t be radiometrically rigorous even if they were perfectly despeckled in a narrow sense.
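Concretely, the normalization is just this (a sketch; in practice µ and σ get computed once over the training footprint and reused):

import numpy as np

def normalize(backscatter, mu=None, sigma=None):
    # log1p squashes the heavy right tail before standardizing
    logged = np.log1p(backscatter)
    mu = float(np.mean(logged)) if mu is None else mu
    sigma = float(np.std(logged)) if sigma is None else sigma
    return (logged - mu) / sigma, mu, sigma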
Water is a real problem here. Because its texture changes completely and unlearnably from pass to pass, in principle it really oughtn’t appear in the training data. But then the model has no idea how to handle it, and in particular its outlier neighborhood in vv/vh space. I don’t see how to fix this in a way as scrappy as the rest of what I’m describing here; I think this is the line beyond which you have to do actual hard work.
I’ll leave you with something I’ve been wondering.
Each x and y has both polarizations, vv and vh. I hope and expect that the network is using these bands (please don’t be mad I called them bands), whose speckles are independent, to fill each other in. There are classical despeckling methods that do this – schematically, some fancier version of:
from scipy.ndimage import median_filter  # stand-in for blur_or_median_or_whatever

mean = (vv + vh) / 2                               # vv, vh: coregistered backscatter arrays
vv_out = mean + median_filter(vv - mean, size=5)   # smooth only the per-band residual
vh_out = mean + median_filter(vh - mean, size=5)   # then same for vh
And you can do much the same thing with PCA, or take any of a range of more sophisticated statistical approaches. So is a small network doing noisy-to-noisy training learning to do some simplified form of this? Beats me. A starting point would be to set one polarization to pure noise and see how much it affects the output.
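That check would be cheap. Continuing from the sketches above (x is a normalized (2, H, W) chip with vv first, and run_model is a hypothetical wrapper around the trained network):

import numpy as np

rng = np.random.default_rng(0)
x_ablated = x.copy()
# shuffle vh's pixels: same marginal stats, but no spatial signal left
x_ablated[1] = rng.permutation(x[1].ravel()).reshape(x[1].shape)
effect = np.abs(run_model(x) - run_model(x_ablated)).mean()  # run_model is hypothetical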
Forgot – here’s a normalized diff of the model’s ŷ v. the x:
Again, this is after nonlinear scaling, so the fact that it correlates with brightness might or might not mean anything real.