2009 - ICCV - Robust visual tracking using L1 minimization

摘要

一句话介绍本文方法：
- a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework.
怎么处理 occlusion 和 corruption？
- occlusion, corruption and other challenging issues are addressed seamlessly through a set of trivial templates.
怎么 detect the target？
- Specifically, to find the tracking target at a new frame, each target candidate is sparsely represented in the space spanned by target templates and trivial templates.
- The sparsity is achieved by solving an L1-regularized least squares problem.
- Then the candidate with the smallest projection error is taken as the tracking target.
怎么 tracking？
- After that, tracking is continued using a Bayesian state inference framework in which a particle filter is used for propagating sample distributions over time.
- 意思是本文方法是 detection before tracking 咯，那就是基于前景目标建模方法里面的判别式的咯？
Two additional components further improve the robustness of our approach:
- 1. the nonnegativity constraints that help filter out clutter that is similar to tracked targets in reversed intensity patterns
- 1. a dynamic template update scheme that keeps track of the most representative templates throughout the tracking procedure.
  - This is done by adjusting template weights by using the coefficients in the sparse representation

Introduction

一句话概况本文方法
- develop a robust visual tracking frame work by casting the tracking problem as ﬁnding a sparse approximation in a template subspace
怎么处理 occlusion 和 corruption？
- 作者说是受 SRC 那篇 TPAMI 的启发的
- handling occlusion using trivial templates, such that each trivial template has only one non- zero element (see Fig. 1)
- Then, during tracking, a target candidate is represented as a linear combination of the template set composed of both target templates (obtained from previous frames) and trivial templates.
如何用 sparse coefficient vector 来辨别 target 还是 occlusion？
- The number of target templates are far fewer than the number of trivial templates.
- 如果是 target
  - Intuitively, a good target candidate can be efficiently represented by the target templates.
  - This leads to a sparse coefficient vector, since coefficients corresponding to trivial templates (named trivial coefficients) tend to be zeros.
- 如果是 occlusion
  - In the case of occlusion (and/or other unpleasant issues such as noise corruption or background clutter), a limited number of trivial coefficients will be activated, but the whole coefficient vector remains sparse.
- A bad target candidate, on the contrary, often leads to a dense representation (e.g., Fig. 3).
  - Candidates similar to trivial templates have sparse representations, but they are easily filtered out for their large dissimilarities to target templates.

Related Work

Tracking
- Early works uses a Kalman filter to provide solutions that are optimal for a linear Gaussian model.
- The particle filter, also known as the sequential Monte Carlo method, recursively constructs the posterior probability density function of the state space using Monte Carlo integration.
怎么看待 Tracking
- Tracking can be considered as finding the minimum distance from the tracked object to the subspace represented by the training data or previous tracking results
  - 就是 generative model
- Tracking can also be considered as a classification problem and a classifier is trained to distinguish the object from the background
  - 就是 discriminative model

Particle Filter

particle filter is a Bayesian sequential importance sampling technique for estimating the posterior distribution of state variables characterizing a dynamic system.
It provides a convenient framework for estimating and propagating the posterior probability density function of state variables regardless of the underlying distribution.
It consists of essentially two steps:
- prediction
- update
数学预备
- Let $x_t$ denote the state variable describing the affine motion parameters of an object at time $t$, The predicting distribution of $x_t$ given all available observations $z_{1:t-1} = { z_1, z_2, \cdots, z_{t-1}}$ up to time $t-1$, denoted by $p(x_t|z_{1:t-1})$ is recursively computed as
- 公式（1）其实就是以前学过的全概率公式，只不过以前学的是离散的，这个是连续的
- At time $t$, the observation $z_t$ is available and the state vector is updated using the Bayes rule
  - $p(x_t|z_{1:t})$ 是 posterior，后验概率
  - $p(z_t|x_t)$ denotes the observation likelihood
  - $p(x_t|z_{1:t-1})$ 是 state vector, 由公式（1）得到
  - $p(z_t|z_{1:t-1})$ 是啥？
具体到 particle filter
- 后验概率 $p(x_t|z_{1:t})$ approximated by a finite set of $N$ samples ${ x^i_t }_{i = 1,\cdots, N}$ with importance weights $w_t^i$
- The candidate samples $x^i_t$ are drawn from an importance distribution $q(x_t|x_{1:t-1},z_{1:t})$ and the weights of the samples are updated as
- The samples are resampled to generate a set of equally weighted particles according to their importance weights to avoid degeneracy.
  - 怎么产生 samples 的？
- In the case of the bootstrap filter $q(x_t|x_{1:t-1},z_{1:t}) = p(z_t|x_t)$ and the weights become the observation likelihood $p(z_t|x_t)$
具体到 tracking framework
- apply an affine image warping to model the object motion of two consecutive frames.
- The state variable $x_t$ is modeled by the six parameters of the affine transformation parameters $x_t = (\alpha_1, \alpha_2, \alpha_3, \alpha_4, t_x, t_y)$
  - ${\alpha_1, \alpha_2, \alpha_3, \alpha_4}$ are the deformation parameters
  - $(t_x, t_y)$ are the 2D translation parameters
- $z_t$ 是 region of interest
- state transition distribution $p(x_t|x_{t-1})$ 是 a Gaussian distribution
- assume the six parameters of the affine transformation are independent
- observation model $p(z_t|x_t)$ reflects the similarity between a target candidate and the target templates
  - $p(z_t|x_t)$ 在上面也被叫做 observation likelihood
  - 在本文中 $p(z_t|x_t)$ is formulated from the error approximated by the target templates using $\ell_1$ minimization.

$\ell_1$ Minimization Tracking

Sparse Representation of a Tracking Target

a tracking result $\textbf{y} \in \mathbb{R}^d$ approximately lies in the linear span of $\textbf{T}$

where $\textbf{a} = (a_1,a_2,\cdots,a_n)^T \in \mathbb{R}^n$ is called a target coefficient vector
To incorporate the effect of occlusion and noise, 公式（4） is rewritten as
- $\epsilon$ is the error vector – a fraction of its entries are nonzero.
- The nonzero entries of $\epsilon$ indicate the pixels in $\textbf{y}$ that are corrupted or occluded.
use trivial templates $\textbf{I} = [\textbf{i}_1, \textbf{i}_2, ..., \textbf{i}_d] \in \mathbb{R}^{d \times d}$ to capture the occlusion as
- a trivial template $\textbf{i}_i \in \mathbb{R}^d$ is a vector with only one nonzero entry，也就是说公式（6）中的 $\textbf{I}$ 是一个单位矩阵（identity matrix）
- $\mathbf{e} = (e_1,e_2,\cdots, e_d)^T \in \mathbb{R}^d$ is called a trivial coefficient vector

Nonnegativity Constraints

作者认为 a tracking target can almost always represented by the target templates dominated by nonnegative coefficients
- the templates that are most similar to the tracking target are positively related to the target
- 这些 templates 要保持更新
- nonnegative coefficients 还有一个好处是能够 filter out clutter that is similar to target templates at reversed intensity patterns, which often happens when shadows are involved
如果直接对公式（6）的系数施加非负约束的话，就强迫 trivial templates 也要正相关的，但实际上 trivial templates 也可能是负相关的，只有 target templates 才是正相关，作者对此的解决办法是添加 negative trivial templates，model (6) 可以重写为

Achieving Sparseness through $\ell_1$ Minimization

求解稀疏问题
- 前 10 个 templates 是 target templates，后面 360 个 templates 是 trivial templates
- 如果是 good target candidate，那么就跟右上方这幅画一样，target templates 对应的系数非常突出
- 如果是 bad target candidate，那么就跟右下方一样，trivial templates 对应的系数非常突出
find the tracking result by ﬁnding the smallest residual after projecting on the target template subspace, i.e., $\Vert \textbf{y} - \textbf{Ta} \Vert_2$

Template Update

为什么要 template update
- A fixed appearance template is not sufficient to handle recent changes in the video
- If we do not update the template, the template cannot capture the appearance variations due to illumination or pose changes.
update too often 也不行
- If we update the template too often, small errors are introduced each time the template is updated.
- The errors are accumulated and the tracker drifts from the target
还给每个 target template 赋了一个权重，说实话，我没看懂
- $\ell_1$ favors the template with larger norm because of the regularization part $\Vert \textbf{c} \Vert_1$
- The larger norm of $\Vert \textbf{t}_i \Vert$ is, the smaller coefficient $a_i$ is needed in the approximation $\Vert \textbf{y-Bc}\Vert_2$
- 作者 exploit the characteristic by introducing a weight $w_i = \Vert \textbf{t}_i \Vert_2$ associated with each template $\textbf{t}_i$
- Intuitively, the larger the weight is, the more important the template is.
  - 感觉这个没有道理啊，照理说，所有样本都应该归一化才对啊
最初的 target templates 是怎么产生的？
- At initialization, the first target template is manually selected from the first frame and applied zero-mean-unit-norm normalization.
  - 不是做了 unit norm 了么，为啥 $\Vert \textbf{t}_i \Vert$ 还会不一样？
- The rest target templates are created by perturbating one pixel in four possible directions at the corner points of the first template in the first frame.

updating scheme

template replacement
- If the tracking result $\textbf{y}$ is not similar to the current template set $\textbf{T}$, it will replace the least important template in $\textbf{T}$ and be initialized to have the median weight of the current templates.
- 怎么判断是不是 similar 啊？
template updating
weight updating
- The weight of each template increases when the appearance of the tracking result and template is close enough and decreases otherwise.
- tempalte 跟当前的 target 相近，就增加权重

我自己的总结

说白了就是一个 template 做字典，然后求解 $\ell_1$ minimization 感觉不难啊
对我来说新的信息量就是 trivial templates 的使用，以及 weight update，不过 weight update 感觉没看懂
还有的问题就是，你的 particle filter 怎么最后没有用啊？

@article{Mei2009RobustVT,
  title={Robust visual tracking using ℓ1 minimization},
  author={Xue Mei and Haibin Ling},
  journal={2009 IEEE 12th International Conference on Computer Vision},
  year={2009},
  pages={1436-1443}
}

YimianDai/Mei2009RobustVT.md

2009 - ICCV - Robust visual tracking using L1 minimization

摘要

Introduction

Related Work

Particle Filter

$\ell_1$ Minimization Tracking

Sparse Representation of a Tracking Target

Nonnegativity Constraints

Achieving Sparseness through $\ell_1$ Minimization

Template Update

updating scheme

我自己的总结