NC - 2011 - Recent advances and trends in visual tracking A review

Introduction

Requirements for tracking algorithms

Robustness:
- Robustness means that the tracking algorithm should be able to follow the object of interest even under complicated conditions. Typical difficulties include cluttered backgrounds, partial or full illumination changes, occlusions, and complex object motion.
Adaptivity:
- In addition to the various changes of the environment in which an object is located, the object itself also undergoes changes. This requires a steady adaptation mechanism that keeps the tracking system in line with the actual object appearance.
  - Robustness concerns external conditions; adaptivity is the ability to handle changes in the object itself.
Real-time processing:
- A system that must handle live video streams needs high processing speed, so it requires a fast, optimized implementation and a choice of high-performance algorithms. The required processing speed depends on the speed of the observed object, but a smooth output video for human viewers needs a frame rate of at least 15 frames per second.

How does visual tracking work?

In short, most visual tracking methods include image input, appearance feature description, context information integration, decision, and model update, as shown in Fig. 1.

First, we need a description for the object to be tracked.
- This can be, for example, a template image of the object, or a shape, texture, or color model.
- Building such an initial object description is a very critical and hard task, because the quality of the description directly relates to the quality of the tracking process.
Second, objects are usually embedded in a certain context.
- In visual tracking, many temporary but potentially very strong links exist between the tracked object and the rest of the image. (This interference needs to be eliminated.)
- Appropriate integration of such context information into a tracking framework will substantially benefit visual tracking research.
Moreover, even having a good object description available a priori or established during runtime, adaptivity to appearance changes is necessary to achieve tracking robustness.
- To handle such variations, the object model needs to be adjusted to the new circumstances from time to time.
- The major problem of building such an adaptive tracking system is the degradation of the appearance model caused by the inaccuracy in the estimation of the foreground and background.
- Most commonly the foreground and background are divided by a bounding box or a region around the location of the object. No matter how tight the region is, such a partition is too rough because some background regions are treated as a part of the foreground, especially when the location of the object is not precise or the object is occluded. This problem is called the Drifting Problem [9].
  - This problem also exists for small targets. It affects patch-based HVS methods such as MPCM as well as PCA and sparse representation methods.

Particle Filtering

Due to the great success of Particle Filtering [12], also known as sequential Monte Carlo methods (SMC), visual tracking has been formulated as a problem of Bayesian inference in state space.
Compared with the regular exhaustive search-based methods, the main advantage of the use of a particle filter is the reduction of sampling patches during tracking.
Another benefit of the particle filter is that the sampling effort can be kept constant, independent of the size of the object to track. This is not the case when the search region around the object is simply expanded by a fixed factor.
Despite its great success, Particle Filtering often suffers from the sample impoverishment problem [12], which is due to its suboptimal sampling technique. Introducing more advanced Monte Carlo sampling methods would therefore greatly improve visual tracking performance.
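
The point about constant sampling effort can be checked with a little arithmetic. The sketch below is only an illustration under assumed settings: the search factor of 2, the stride of 1 pixel, and the budget of 300 particles are arbitrary choices, and `exhaustive_candidates` is a hypothetical helper, not something from the paper.

```python
def exhaustive_candidates(width, height, factor=2.0, stride=1):
    """Count window positions inside a search region `factor` times the object size."""
    slack_x = int(width * (factor - 1.0))   # horizontal room to slide the window
    slack_y = int(height * (factor - 1.0))  # vertical room to slide the window
    return (slack_x // stride + 1) * (slack_y // stride + 1)


NUM_PARTICLES = 300  # fixed budget, chosen arbitrarily for illustration

small = exhaustive_candidates(20, 20)    # 21 * 21 = 441 candidate windows
large = exhaustive_candidates(100, 100)  # 101 * 101 = 10201 candidate windows
```

Under these assumptions the exhaustive search grows from 441 to 10201 patches as the object grows from 20x20 to 100x100 pixels, while the particle budget stays at 300 either way.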

Feature descriptors for visual tracking

In general, the most desirable property of a visual feature is its uniqueness so that the objects can be easily distinguished in the feature space.
- In practice this is unattainable, which is the motivation for RIPT and SR-MPCM.
Visual detection is difficult because the object appearance may vary due to many factors, including viewpoint, occlusion, illumination, texture, and articulation.
- Small targets are also highly variable, so this holds for small targets too.
This has motivated the invention of different image features that capture different characteristic properties.
- RIPT and SR-MPCM follow this idea, but small targets seem to be characterized by a pop-out feature, in other words contrast.
Some existing methods for object detection base their detectors on a single type of features. Others try to integrate multiple feature types to improve performance.
- This sentence could be reused when writing up SR-MPCM.
In fact, any feature descriptor used for visual detection can be adapted for visual tracking.
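- So features designed for single-frame detection can also be used for tracking.
- This is also why the first three feature categories below involve no temporal information; only the fourth category covers spatio-temporal features.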

Gradient features

Generally speaking, there are two categories of gradient features.

One main category of gradient-based methods uses shape or contour to represent objects.
Another main category uses a statistical summarization of the gradients.
- SIFT, SURF, HOG
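
As a rough illustration of what "statistical summarization of the gradients" means, the sketch below builds a magnitude-weighted histogram of gradient orientations over a tiny image. It is a simplified, single-cell version of the idea behind HOG, not HOG itself: there is no cell grid, no block normalization, and the 9 unsigned bins are an assumed setting.

```python
import math


def orientation_histogram(image, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees)."""
    hist = [0.0] * bins
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # central difference along x
            gy = image[r + 1][c] - image[r - 1][c]  # central difference along y
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            idx = min(int(angle / (180.0 / bins)), bins - 1)
            hist[idx] += magnitude
    return hist


# A vertical step edge: intensity changes only along x.
image = [[0, 0, 10, 10] for _ in range(4)]
hist = orientation_histogram(image)
```

For this step edge all gradient energy falls into the first bin (orientation 0 degrees), so the histogram summarizes the edge direction without storing the contour itself.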

Color features

Color descriptors are robust against certain photometric changes.
- Why?
- The apparent color of an object is influenced primarily by two physical factors
  - (1) the spectral power distribution of the illuminant
  - (2) the surface reflectance properties of the object.
Color descriptors can be categorized into:
- novel histogram-based color descriptors
- SIFT-based color descriptors

Histogram-based color descriptors

In the HSV color space, it is known that the hue becomes unstable near the grey axis.
The certainty of the hue is inversely proportional to the saturation.
Therefore, the hue histogram is made more robust by weighting each sample of the hue by its saturation.
- So hue is multiplied by saturation. Since the two are inversely related, does the product have some kind of invariance property?
The H color model is therefore scale-invariant and shift-invariant with respect to light intensity.
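
The saturation weighting can be sketched as follows. This is a minimal illustration using the standard-library `colorsys` module; the 8 bins and the three sample pixels are assumptions, and the check at the end covers only scaling of the light intensity.

```python
import colorsys


def weighted_hue_histogram(pixels, bins=8):
    """Hue histogram in which each pixel votes with its saturation.

    `pixels` is an iterable of (r, g, b) floats in [0, 1].
    """
    hist = [0.0] * bins
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r, g, b)
        idx = min(int(h * bins), bins - 1)
        hist[idx] += s  # grey pixels (s = 0) have an unreliable hue and add nothing
    total = sum(hist)
    return [x / total for x in hist] if total > 0 else hist


pixels = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.0, 0.0, 1.0)]  # red, grey, blue
hist = weighted_hue_histogram(pixels)

# Halving the light intensity scales every channel by 0.5.
dimmed = [(0.5 * r, 0.5 * g, 0.5 * b) for r, g, b in pixels]
hist_dimmed = weighted_hue_histogram(dimmed)
```

The grey pixel contributes no vote, and `hist` and `hist_dimmed` come out the same, because both hue and saturation are unchanged when all channels are scaled by a common factor.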

SIFT descriptor

The SIFT descriptor is not invariant to light color changes, because the intensity channel is a combination of the R, G and B channels.
The improved CSIFT descriptor is scale-invariant with respect to light intensity.

Texture features

Texture is a measure of the intensity variation of a surface, which quantifies properties such as smoothness and regularity.
- e.g., Gabor filters and LBP
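
To make the LBP idea concrete, here is a minimal sketch of the basic 8-neighbour operator for a single pixel. The clockwise bit order starting at the top-left corner and the `>=` comparison are assumed conventions; real implementations differ in both.

```python
def lbp_code(image, row, col):
    """8-neighbour LBP code of the pixel at (row, col); image is a list of lists."""
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left corner (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:  # threshold against the center pixel
            code |= 1 << bit
    return code


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch, 1, 1)  # 241

# Adding a constant brightness offset leaves the code unchanged.
brighter = [[v + 10 for v in r] for r in patch]
code_brighter = lbp_code(brighter, 1, 1)

# Lowering the top-left neighbour from 6 to 5 flips one bit.
noisy = [[5, 5, 2], [7, 6, 1], [9, 8, 7]]
code_noisy = lbp_code(noisy, 1, 1)  # 240
```

The code survives a uniform brightness change, but a one-level change in a neighbour that sits at the threshold flips a bit, which is the thresholding sensitivity noted in the summary further down.
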
Weber’s Law
Human perception of a pattern depends not only on the change of a stimulus (such as sound or lighting) but also on the original intensity of the stimulus.
- The smallest perceivable change in a stimulus is proportional to the initial stimulus.
Weber's law also incorporates the Just Noticeable Difference (JND), the smallest change in a stimulus that can be perceived. As stated above, the JND is proportional to the initial stimulus, so it is the ratio of the JND to the initial stimulus, rather than the JND itself, that stays constant for a given sense.
Weber contrast
Fechner's law
- Perceived loudness/brightness is proportional to logarithm of the actual intensity measured with an accurate nonhuman instrument.
- The relationship between stimulus and perception is logarithmic.
- This logarithmic relationship means that if a stimulus varies as a geometric progression (i.e., multiplied by a fixed factor), the corresponding perception is altered in an arithmetic progression (i.e., in additive constant amounts).
  - The underlying effect is multiplicative, but only an additive effect is perceived.
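
The three relations above can be written compactly as follows. The symbols are notation chosen for these notes, not taken from the paper: $I$ is the stimulus intensity, $\Delta I$ the JND, $I_b$ the background intensity, $I_0$ the threshold intensity, $k$ and $c$ sense-dependent constants, and $r$ the fixed multiplication factor.

```latex
\begin{align*}
  \frac{\Delta I}{I} &= k
    && \text{(Weber's law: the JND is proportional to the intensity)} \\
  C_W &= \frac{I - I_b}{I_b}
    && \text{(Weber contrast of a target against its background)} \\
  S &= c \ln \frac{I}{I_0}
    && \text{(Fechner's law: perception is logarithmic in the stimulus)} \\
  I_n = I_0 r^n \;&\Rightarrow\; S_n = n \, c \ln r
    && \text{(geometric stimulus, arithmetic perception)}
\end{align*}
```

The last line is the multiplicative-to-additive effect from the note above: each multiplication of the stimulus by $r$ adds the same constant $c \ln r$ to the perceived magnitude.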

Spatio-temporal features

Multiple features fusion

Biological features

Summary

Tremendous progress has been made in this area. However, no single feature descriptor is robust and efficient enough to deal with all kinds of situations.
- The HOG descriptor focuses on edges and structures and ignores flat areas, so it fails to deal with noisy edge regions.
- The drawback of LBP is that it is sensitive to noise, because it relies on a thresholding operation when comparing neighboring pixels.
- Color features represent the global information of images, which are relatively independent of the viewing angle, translation, and rotation of the objects and regions of interest. However, objects with the same color histogram may be completely different in texture, thus color histogram cannot provide enough information.
How to combine various kinds of features into a coherent framework needs much more study.
- This is the motivation of this section: a coherent framework is needed.
Besides, a deeper understanding of human vision principles would also greatly benefit feature descriptor research.
- This is the motivation behind the popularity of HVS-based methods.

Online learning based tracking methods

For visual tracking, handling appearance variations of a target object is a fundamental and challenging task.
In general, there are two types of appearance variations:
intrinsic variation
- Pose variation and/or shape deformation of a target object are considered as the intrinsic appearance variations
extrinsic variation
- Extrinsic variations are due to changes in illumination, camera motion, camera viewpoint, and occlusion.
- These variations can only be handled with adaptive methods which are able to incrementally update their representations. (This is the motivation for studying on-line algorithms.)
- Thus there is an essential need for on-line algorithms that are able to learn continuously.
Generally, on-line algorithms can be divided into two categories:
- Generative methods
- Discriminative methods

Generative online learning methods

Generative methods, which are used to learn the appearance of an object, have been exploited to handle the variability of a target.
The object model is often updated online to adapt to appearance changes.
**Drawback:** generative methods easily fail in cluttered backgrounds.

Discriminative online learning methods

Discriminative methods for classification have also been exploited to handle appearance changes during visual tracking, where a classifier is trained and updated online to distinguish the object from the background.
This approach is also termed tracking-by-detection, in which a target object identified by the user in the first frame is described by a set of features.
A separate set of features describes the background, and a binary classifier separates target from background in successive frames.
- This also describes the PCA and sparse representation methods used for infrared small targets.
Motion constraints restrict the space of boxes to be searched for the target.
**Drawback:** A major shortcoming of discriminative methods is their noise sensitivity.
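
The tracking-by-detection loop described above can be sketched with a tiny online classifier. This is only an illustration: online logistic regression stands in for whatever classifier a real tracker uses, the two-dimensional feature vectors are made up, and `OnlineLogisticClassifier` is a hypothetical name.

```python
import math


class OnlineLogisticClassifier:
    """Binary target/background classifier updated one sample at a time."""

    def __init__(self, n_features, rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.rate = rate

    def score(self, x):
        """Probability that feature vector x belongs to the target."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        """One stochastic-gradient step; label is 1 (target) or 0 (background)."""
        error = label - self.score(x)
        self.weights = [w + self.rate * error * xi for w, xi in zip(self.weights, x)]
        self.bias += self.rate * error


clf = OnlineLogisticClassifier(n_features=2)
target_feats = [[0.9, 0.1], [0.8, 0.2]]      # made-up features from the target box
background_feats = [[0.1, 0.9], [0.2, 0.7]]  # made-up features from the surroundings
for _ in range(50):  # stands in for 50 frames of online updates
    for x in target_feats:
        clf.update(x, 1)
    for x in background_feats:
        clf.update(x, 0)

# In a new frame, pick the candidate box the classifier scores highest.
candidates = [[0.15, 0.8], [0.85, 0.15], [0.5, 0.5]]
best = max(candidates, key=clf.score)
```

Every update trusts the label it is given, so a single mislabeled patch pushes the weights in the wrong direction, which is one way to read the noise sensitivity mentioned above.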

Drifting Problem

What is the Drifting Problem

Despite its high efficiency, online adaptation faces one key problem: each update of the tracker may introduce an error which can eventually lead to tracking failure.
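
A toy example of how small update errors accumulate: the sketch below uses a running-average template update, which is an assumed update rule chosen for illustration, and a four-cell "patch" whose last cell is background because the box is slightly off.

```python
def update_template(template, patch, rate):
    """Blend the current patch into the template (running-average update)."""
    return [(1.0 - rate) * t + rate * p for t, p in zip(template, patch)]


template = [1.0, 1.0, 1.0, 1.0]      # object appearance in the first frame
contaminated = [1.0, 1.0, 1.0, 0.0]  # box slightly off: last cell is background
for _ in range(20):
    template = update_template(template, contaminated, rate=0.1)

drifted_cell = template[3]  # 0.9 ** 20, roughly 0.12
```

After 20 updates the last template cell has moved from 1.0 to about 0.12, so the model now treats background as part of the object even though each individual update changed it only slightly.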

Combined method

In fact, how to combine generative and discriminative machine learning methods into a coherent framework is a classic question in the machine learning field and needs more research.
Besides, how to achieve a better balance between adaptivity and stability when using online learning methods is still an open problem.
- Both of these points also seem highly relevant to small-target work.

Integration of context information

Numerous psychophysics studies have shown the importance of context for human object recognition and detection.
- So context is very important. To achieve great improvements, context information must be integrated into the visual tracking framework.
Objects are always embedded in a certain context.

Monte Carlo sampling

Assumption

Visual tracking usually can be formulated as a graphical model and involves a searching process for inferring the motion of an object from uncertain and ambiguous observations.
If the state posterior density is Gaussian:
- The Kalman Filter [114], Extended Kalman Filter [114], or Unscented Kalman Filter [115] can be used to find the optimal/suboptimal solution.
However, most real tracking problems are nonlinear and non-Gaussian.
Particle Filtering [12] was proposed to deal with this situation by Monte Carlo simulation.
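
For the Gaussian case mentioned above, one predict/update cycle of a Kalman filter fits in a few lines. This is a minimal one-dimensional sketch with a random-walk motion model; the model, the variances, and the measurements are all assumptions made for illustration.

```python
def kalman_step(mean, var, measurement, process_var, measurement_var):
    """One predict/update cycle of a 1-D Kalman filter with a random-walk motion model."""
    # Predict: the state is assumed to stay put, but its uncertainty grows.
    pred_mean = mean
    pred_var = var + process_var
    # Update: blend prediction and measurement according to their variances.
    gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


mean, var = 0.0, 1.0  # Gaussian prior on the object position
for z in [1.0, 1.0, 1.0]:
    mean, var = kalman_step(mean, var, z, process_var=0.0, measurement_var=1.0)
# mean is now 0.75 and var is 0.25
```

The whole posterior is carried by two numbers, a mean and a variance. That is exactly what stops working once the posterior becomes multi-modal or otherwise non-Gaussian.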

Particle Filtering

Key idea

The key idea of Particle Filtering is to represent the required posterior density function by a set of random samples with associated weights.
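
The weighted-sample representation can be sketched as a one-dimensional bootstrap particle filter. This is a simplified illustration, not the formulation of [12]: the random-walk motion model, the Gaussian likelihood, the multinomial resampling, and all numeric settings are assumptions.

```python
import math
import random


def particle_filter_step(particles, measurement, motion_std, meas_std, rng):
    """One bootstrap-filter step: propagate, weight by likelihood, resample."""
    # Propagate each sample through a random-walk motion model.
    moved = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight each sample by a Gaussian observation likelihood.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    # The posterior mean is estimated from the weighted sample set.
    estimate = sum(w * p for w, p in zip(weights, moved))
    # Multinomial resampling returns to an equally weighted sample set.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, estimate


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]  # broad prior
for z in [2.0, 2.1, 1.9, 2.0]:  # made-up position measurements
    particles, estimate = particle_filter_step(particles, z, 0.5, 1.0, rng)
```

No parametric form is assumed for the posterior here. The samples and their weights are the representation, and any statistic such as `estimate` is computed from them.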

Drawback

Although Particle Filtering has achieved considerable success in the tracking literature, it has a serious weakness: the suboptimal sampling mechanism in its importance sampling step leads to the well-known sample impoverishment problem.
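
One way to see the problem is to look at how few samples carry weight when the likelihood is much narrower than the proposal. The sketch below is a hedged illustration: the effective sample size `1 / sum(w_i^2)` is a common diagnostic rather than something prescribed in these notes, and the uniform proposal and the narrow Gaussian likelihood are made-up settings.

```python
import math
import random


def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2) for normalized weights; equals N for uniform weights."""
    return 1.0 / sum(w * w for w in weights)


rng = random.Random(1)
particles = [rng.uniform(-10.0, 10.0) for _ in range(1000)]
# A sharply peaked likelihood around 0 (hypothetical measurement model).
raw = [math.exp(-0.5 * (p / 0.1) ** 2) for p in particles]
total = sum(raw)
weights = [w / total for w in raw]

n_eff = effective_sample_size(weights)
# Resampling duplicates the few heavy samples and discards the rest.
survivors = rng.choices(particles, weights=weights, k=len(particles))
distinct = len(set(survivors))
```

Out of 1000 samples only a small fraction has non-negligible weight, so after resampling the set consists of many copies of a few distinct values. That loss of diversity is what sample impoverishment refers to.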

Conclusion and future directions

Visual tracking is a kind of motion analysis at the object level, which consists of two major components:
- object representation
- temporal filtering

  @article{Yang2011RecentAA,
  title={Recent advances and trends in visual tracking: A review},
  author={Hanxuan Yang and Ling Shao and Feng Zheng and Liang Wang and Zhan Song},
  journal={Neurocomputing},
  year={2011},
  volume={74},
  pages={3823-3831}
}
