@YimianDai
Created July 30, 2019 18:12
2015-TCSVT-Background Prior-Based Salient Object Detection via Deep Reconstruction Residual

Abstract

Assumption behind earlier methods

  • Based on the assumption that foreground salient regions are distinctive within a certain context, most conventional approaches rely on a number of hand-designed features and their distinctiveness is measured using local or global contrast.
  • Small target detection works the same way: the target assumption is that the target is distinctive within a certain context.
  • Both the local contrast measure and the IPI model rely on hand-designed features.

Shortcomings of earlier methods

  • Although these approaches have been shown to be effective in dealing with simple images, their limited capability may cause difficulties when dealing with more complicated images.

Introduction

Underlying hypothesis and the problems to solve:

Based on the underlying hypothesis that the salient stimulus is distinct from its contextual stimuli, most existing saliency detection models need to solve two key problems:

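    1. extract effective features to represent the image
    2. develop an optimal mechanism to measure the distinctiveness over the extracted features.
    • So HVS-based small target detection can also be split into two parts: feature extraction (often just a mean) and a distinctiveness measure. This is very visible in MPCM, where feature extraction computes means and the distinctiveness measure computes products of differences and the like. Current HVS methods mix the two together; lifting them into an explicit two-part framework would make it possible to push them further.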
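
The two-part split noted above can be sketched in a few lines. This is only an illustration under assumptions: `extract_features` and `distinctiveness` are hypothetical names, the feature is a plain block mean, and the contrast rule (centre minus the largest neighbour, clipped at zero) is a simplified stand-in rather than the actual MPCM formula.

```python
import numpy as np

def extract_features(image, cell):
    """Feature extraction: mean intensity of each non-overlapping cell x cell block."""
    h, w = image.shape
    rows, cols = h // cell, w // cell
    blocks = image[:rows * cell, :cols * cell].reshape(rows, cell, cols, cell)
    return blocks.mean(axis=(1, 3))

def distinctiveness(features):
    """Distinctiveness: centre feature minus the largest 8-neighbour feature, clipped at 0."""
    padded = np.pad(features, 1, mode="edge")
    rows, cols = features.shape
    neighbours = [
        padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
        for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)
    ]
    return np.maximum(features - np.max(neighbours, axis=0), 0.0)

# Toy input (assumed): a bright 3x3 "target" on a flat 9x9 background.
image = np.zeros((9, 9))
image[3:6, 3:6] = 1.0
saliency = distinctiveness(extract_features(image, cell=3))
```

Keeping the two functions separate means either one can be swapped out, for example replacing the block mean with a learned representation, without touching the other.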

Drawbacks of hand-designed features

All these feature representations are hand designed and require significant amounts of domain knowledge. However, hand-designed features in general suffer from poor generalization capability for different images, especially due to a lack of thorough understanding of the biological mechanisms and principles of human visual attention as well as weak human intuition involved.

Why not use sparse coding and low-rank matrix recovery?

Nevertheless, due to the shallow-structured architectures used, these methods still have limited representational power and are insufficient to capture high-level information and latent patterns of complex image data.

Aim of this paper

investigate the feasibility of learning more powerful representation directly from the raw image data themselves in an unsupervised way for the task of saliency detection.

Conventional saliency detection approaches

The saliency or distinctiveness is typically measured by image contrast computation over features, where various contrast measures have been presented. Depending on the extent of the context in which the contrast is calculated, these approaches can be classified into local-contrast-based methods and global-contrast-based methods.

  • Local-contrast-based methods estimate the saliency of an image pixel or an image patch by calculating the contrast against its local neighborhood, and some representative local methods include the center-surround difference [5], [6], [12], [13], incremental coding length [10], and self-resemblance [14].
  • Global-contrast-based methods characterize the saliency of an image region as the uniqueness in the entire image. Previous works found in the literature have proposed a variety of approaches to model the global contrast from different perspectives. To be specific, in [15] and [16], the global contrast is derived in the frequency domain with the hypothesis that salient regions are normally less frequent.
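
To make the distinction between the two families concrete, here is a minimal sketch. It assumes the simplest possible definitions (absolute difference from a neighbourhood mean versus from the whole-image mean); the function names are hypothetical and none of the cited methods [5]–[16] is reproduced here.

```python
import numpy as np

def local_contrast(image, radius=1):
    """Absolute difference between each pixel and the mean of its (2r+1)^2 neighbourhood."""
    padded = np.pad(image, radius, mode="edge")
    h, w = image.shape
    size = 2 * radius + 1
    total = np.zeros_like(image, dtype=float)
    for dr in range(size):
        for dc in range(size):
            total += padded[dr:dr + h, dc:dc + w]
    return np.abs(image - total / size**2)

def global_contrast(image):
    """Absolute difference between each pixel and the mean of the whole image."""
    return np.abs(image - image.mean())

# Toy input (assumed): a uniform 4x4 object on a flat 8x8 background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
local_map = local_contrast(image)
global_map = global_contrast(image)
```

In this toy case the local map is zero in the interior of the object and non-zero only near its edge, while the global map responds uniformly across the object.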

Drawbacks of conventional methods

In spite of extensive efforts, local and global contrast-based approaches still suffer from some drawbacks.

  • First, these approaches normally can highlight only object boundaries but fail to detect the whole target region uniformly as shown in the examples given in Fig. 1. This problem may be alleviated in some global-contrast-based methods while the results yielded are still unsatisfactory.
    • Usable in the motivation section for my weighted approach.
  • Second, although the salient objects often present high contrast, the inverse might not necessarily be true [11]. In many complex images (as shown in the third example of Fig. 1), the background contains small-scale high-contrast patterns, which may cause previous contrast-based methods to fail in such cases.
    • Usable where I explain the cause of the rare structure effect.

The fundamental reason conventional saliency detection falls short

Essentially, the true aim of salient object detection is to find objects that are distinctive from the image background. It needs to calculate the contrast between the objects and the image background and then select those with high contrast as salient objects.

However, the local and global contrast-based methods do not identify which regions form the image background. They blindly assume the neighboring regions or the entire image to be the background and then calculate the contrast between each location and the assumed background. As their assumed background may not be the real one, the determined contrast also becomes incorrect, which in turn reduces the performance of saliency detection.

Why current background prior-based methods still fall short

They do not account for the four cases below. Put bluntly, manually added priors never cover everything.

    1. The entire image boundary is a large and smoothly connected region (see the first row of Fig. 1).
    • The target has no well-defined boundary; the transition between target and background is gradual.
    2. The regions defined within the image boundary look different, whereas they may share certain latent pattern (see the second row of Fig. 1).
    • That is, the target and some components of the background actually share the same patterns, such as shape.
    3. The background is complex (for example, containing small-scale high-contrast patterns) and the regions of image boundary are different, as shown in the third row of Fig. 1.
    • The background itself is complex and contains high-contrast components. This also occurs in small target detection and is even more pronounced there: because small targets are weak, background components stand out even when their contrast is not particularly high.
    • The essence of the rare structure effect is mishighlighting the small-scale high-contrast background regions in the saliency maps.
    4. Salient objects significantly touch the image boundary and parts of them are wrongly considered as the background, as shown in the fourth row of Fig. 1.
    • That is, the target is fairly large and contains abrupt edges inside it, which violates the homogeneity assumption.

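Contribution of this paper

This paper is still a background prior-based method. Its novelty lies in using a stacked denoising autoencoder (SDAE) to model the image background. The benefits are:

  • No hand-designed features are needed; a more powerful representation is learned directly from the raw data in an unsupervised way.
  • It enables capturing the latent pattern of the input data hierarchically.
  • This lets the algorithm handle the second of the four cases above, and it is also my motivation for using representation learning for small target detection.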

How does this paper measure saliency?

the measure of contrast between the salient objects and the background is formulated as the reconstruction residuals in the deep-structured SDAE.
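
The idea of reading saliency off a reconstruction residual can be illustrated with a toy model. The sketch below is an assumption-heavy stand-in, not the paper's network: it trains a single-layer denoising autoencoder (not a stacked one) in plain NumPy on synthetic "background" vectors, and `train_dae`, `residual`, and all hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(data, hidden=4, noise=0.1, lr=0.5, epochs=500, seed=0):
    """Fit a one-hidden-layer denoising autoencoder with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    w1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        # Masking noise: randomly zero out a fraction of the inputs.
        corrupted = data * (rng.random(data.shape) > noise)
        h = sigmoid(corrupted @ w1 + b1)
        out = sigmoid(h @ w2 + b2)
        # Backpropagate the squared error measured against the clean data.
        d_out = (out - data) * out * (1.0 - out) / n
        d_h = (d_out @ w2.T) * h * (1.0 - h)
        w2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * (corrupted.T @ d_h); b1 -= lr * d_h.sum(axis=0)
    return w1, b1, w2, b2

def residual(x, params):
    """Squared reconstruction error; a large value means 'unlike the background'."""
    w1, b1, w2, b2 = params
    recon = sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)
    return np.sum((x - recon) ** 2, axis=-1)

# Synthetic data (assumed): dark background patches and one bright patch.
rng = np.random.default_rng(1)
background = 0.2 + 0.02 * rng.standard_normal((50, 16))
params = train_dae(background)
res_bg = residual(background, params)
res_sal = residual(np.full(16, 0.9), params)
```

Because the model only ever sees background vectors, it reconstructs them well and reconstructs the bright patch poorly, so the bright patch receives the larger residual.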

The difference between this paper and conventional saliency detection is essentially the same as the difference between IPI and HVS methods

Unlike the previous works [18], [19], which mainly focused on the way to calculate the similarity or distinctiveness between a certain image patch and the image boundary, this paper pays more attention to modeling the background regions.

Proposed approach

Why a deep architecture?

This deep architecture allows the SAE to learn more complex mapping from the input to hidden representations and capture the latent patterns which reflect the most homogametic property shared among the training data.

The paper actually uses an SDAE rather than a plain SAE, as introduced in the following article:

  • P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” J. Mach. Learn. Res., vol. 11, pp. 3371–3408, Jan. 2010.

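Assumption of this paper

  • To obtain a more accurate object-background contrast, assume that the image boundary is mostly background.
    • Assuming that the image boundary is mostly background is essentially a center bias. It also helps with small targets: a small target may appear anywhere, but because it is small, the image as a whole is mostly background.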
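
One simple way to turn the boundary assumption into training data is to collect the patches that touch the image border. The sketch below assumes non-overlapping square patches on a grayscale image; `boundary_patches` is a hypothetical helper, and the paper's actual sampling scheme is not described in these notes.

```python
import numpy as np

def boundary_patches(image, patch):
    """Collect non-overlapping patch x patch blocks that touch the image border."""
    h, w = image.shape[:2]
    rows, cols = h // patch, w // patch
    patches = []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                block = image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
                patches.append(block.reshape(-1))
    return np.stack(patches)

# Toy input (assumed): an 8x8 image cut into 2x2 patches.
image = np.arange(64, dtype=float).reshape(8, 8)
train_set = boundary_patches(image, patch=2)
```

Each row of `train_set` is one flattened border patch, ready to be used as a background sample for a background model.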

The algorithm

As shown in Fig. 2, the algorithm consists of three parts:

  • multiscale inputs generation
  • salient region detection via deep reconstruction residual
  • post-processing

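My questions

  • Why do both the encoder and the decoder of the autoencoder use the sigmoid function? Shouldn't decoding be the inverse transform of encoding?

  • I still need to read carefully how exactly the SDAE is built.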
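
On the first question, it may help to write out the standard denoising autoencoder of Vincent et al. (2010), cited above. The symbols below follow that article; whether the paper under review uses exactly the same notation and loss is an assumption.

```latex
\begin{aligned}
\tilde{x} &\sim q_D(\tilde{x} \mid x) && \text{(corrupt the input)} \\
y &= s(W\tilde{x} + b) && \text{(encode)} \\
z &= s(W'y + b') && \text{(decode)} \\
\theta^{\ast} &= \arg\min_{W,\,b,\,W',\,b'} \sum_{x} L(x, z) && \text{(compare with the clean } x\text{)}
\end{aligned}
```

Here $s$ is the sigmoid. The decoder is not an analytic inverse of the encoder: it is a second parametric map whose weights are learned so that $z$ approximates $x$. A sigmoid on the output simply keeps the reconstruction in $[0, 1]$, which matches inputs normalized to that range.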

@article{Han2014BackgroundPB,
  title={Background prior-based salient object detection via deep reconstruction residual},
  author={Han, Junwei and Zhang, Dingwen and Hu, Xintao and Guo, Lei and Ren, Jinchang and Wu, Feng},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  volume={25},
  number={8},
  pages={1309--1321},
  year={2015},
  publisher={IEEE}
}