@shagunsodhani
Created April 10, 2017 11:55
Notes for paper "Seeing the Arrow of Time"

Seeing the Arrow of Time

Introduction

  • Given a video, can a machine learning system detect the arrow of time, i.e. determine whether the video is running forwards or backwards?
  • Link to the paper

Datasets

  • YouTube Dataset

    • 180 short videos (6-10 seconds) were manually selected using keywords like "dance", "steam trains" etc.
    • 155 forward videos and 25 reverse videos - highly imbalanced dataset!!
  • Tennis Ball Dataset

    • 13 HD videos of tennis balls rolling and colliding on the floor were recorded.

Methods

Baseline

  • Spatio-temporal Oriented Energy (SOE) is used as an off-the-shelf feature extractor.
  • Each video is split into 2x2 spatial subregions and the SOE features of the subregions are concatenated to obtain the final features.
  • These final features are fed to a linear SVM classifier; accuracy varies from 48% to 60%.
  • One reason for poor performance could be the difficulty in generalising motion over different sub-regions.
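The 2x2 split-and-concatenate step above can be sketched as follows. This is only an illustration: `region_feature` is a hypothetical placeholder descriptor (simple frame-difference statistics), not the SOE features used in the paper, and the video is assumed to be a grayscale array of shape (frames, height, width). The linear SVM step is left out.

```python
import numpy as np

def region_feature(region):
    # Placeholder descriptor (an assumption, NOT SOE): statistics of the
    # frame-to-frame intensity differences inside one spatial subregion.
    diffs = np.diff(region, axis=0)
    return np.array([np.abs(diffs).mean(), diffs.std()])

def grid_features(video, rows=2, cols=2):
    # video: array of shape (frames, height, width).
    features = []
    for strip in np.array_split(video, rows, axis=1):       # split along height
        for region in np.array_split(strip, cols, axis=2):  # split along width
            features.append(region_feature(region))
    # Concatenate per-region descriptors into one vector per video.
    return np.concatenate(features)
```

The resulting vector keeps one block of features per subregion, so the classifier sees where in the frame a motion pattern occurred.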

Statistical Flow Method

  • Idea is to capture local regions of motion in a video to examine what type of motion is a good feature for detecting the arrow of time.

  • Flow words are object-motion descriptors based on SIFT-like descriptors; they capture motion occurring in small patches of the video.

  • These motion descriptors are quantized to obtain a discrete set of flow words.

  • The entire video sequence can then be encoded as a bag of flow words, which becomes the feature vector for the learning system.
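A minimal sketch of the quantize-then-histogram idea, assuming the motion descriptors have already been extracted as fixed-length vectors and that a codebook of flow words is available (for example from clustering). The names `quantize`, `bag_of_flow_words` and `codebook` are hypothetical and not taken from the paper.

```python
import numpy as np

def quantize(descriptors, codebook):
    # Assign each descriptor to its nearest codeword (Euclidean distance).
    # descriptors: (n_descriptors, dim), codebook: (n_words, dim)
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def bag_of_flow_words(descriptors, codebook):
    # Normalised histogram of codeword occurrences for one video.
    words = quantize(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The histogram discards the order and position of the descriptors, which is what makes it a "bag" representation.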

    Training

    • For each video, 4 descriptor histograms were extracted:
      • (A): the native direction of the video
      • (B): this video mirrored in the left-right direction
      • (C): the original video time-flipped
      • (D): the time-flipped left-right-mirrored version
    • Train an SVM using the 4 histograms and combine their scores as A + B - C - D, expecting a positive result for forward clips and a negative result for backward clips.
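The four variants and the A + B - C - D combination can be sketched as below. Here `score_fn` is a hypothetical stand-in for "extract the flow-word histogram and apply the trained SVM", and the video is assumed to be an array of shape (frames, height, width).

```python
import numpy as np

def four_variants(video):
    # video: array of shape (frames, height, width).
    a = video                   # (A) native direction
    b = video[:, :, ::-1]       # (B) mirrored left-right
    c = video[::-1]             # (C) time-flipped
    d = video[::-1, :, ::-1]    # (D) time-flipped and mirrored
    return a, b, c, d

def combined_score(video, score_fn):
    # score_fn is assumed to return a higher value for forward-looking clips.
    a, b, c, d = four_variants(video)
    return score_fn(a) + score_fn(b) - score_fn(c) - score_fn(d)
```

By construction, time-flipping the input swaps A with C and B with D, so the combined score changes sign for any choice of `score_fn`.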

    Result

    • Performance varies from 75% to 90%

Motion-Causation Method

  • Idea is to capture motion causing other motions, as it is more common for one motion to cause multiple motions than for multiple motions to collapse into one motion.

  • The system looks at the regions in the video from frame to frame with the expectation that, in the forwards-time direction, there would be more occurrences of one region splitting in two than of two regions joining to become one.
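One loose way to illustrate the split-versus-merge count, assuming each frame has already been segmented into moving regions given as integer label maps (0 for background). The overlap rule used here is a simplification chosen for illustration and is not the paper's procedure; the function names are hypothetical.

```python
import numpy as np

def count_multi_overlaps(src, dst):
    # Count regions in src that overlap two or more distinct regions in dst.
    count = 0
    for label in np.unique(src):
        if label == 0:
            continue  # skip background
        touched = np.unique(dst[src == label])
        touched = touched[touched != 0]
        if len(touched) >= 2:
            count += 1
    return count

def split_merge_balance(label_maps):
    # Positive when splits (one region -> several) outnumber merges.
    splits = merges = 0
    for prev, nxt in zip(label_maps[:-1], label_maps[1:]):
        splits += count_multi_overlaps(prev, nxt)
        merges += count_multi_overlaps(nxt, prev)
    return splits - merges
```

Under the expectation stated above, a positive balance would hint at forward playback and a negative balance at reversed playback.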

    Result

    • Performance varies from 70% to 73%.
    • Though it underperforms the flow-word method, it can complement that method, as motion-causation considers the spatial location of motions while the flow-word method considers motion in each frame separately.

AR Method

  • Idea is to model the problem as that of inferring causal direction in cause-effect models.

  • The assumption is that some image motions can be modelled as autoregressive (AR) models with additive non-Gaussian noise.

  • In such a scenario, noise added at some point in time is independent of the past values of the time series but not of future values.

  • This allows independence tests to be performed for determining the direction of time.
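The asymmetry can be illustrated on a synthetic 1-D series. The sketch below fits an AR(1) model in both time directions and compares how strongly the residuals depend on the regressor. As a loud assumption, it uses the correlation between squared residuals and squared regressor values as a crude dependence proxy in place of a proper independence test, and it uses AR order 1 with uniform noise; none of these choices come from the paper.

```python
import numpy as np

def residual_dependence(series):
    # Least-squares AR(1) fit: present ~ coef * past (series is mean-centred).
    past, present = series[:-1], series[1:]
    coef = np.dot(past, present) / np.dot(past, past)
    residual = present - coef * past
    # Crude dependence proxy (an assumption): correlation of squared values.
    return abs(np.corrcoef(residual ** 2, past ** 2)[0, 1])

def guess_direction(series):
    series = np.asarray(series, dtype=float)
    series = series - series.mean()
    forward = residual_dependence(series)
    backward = residual_dependence(series[::-1])
    # The direction whose residuals look more independent is taken as true.
    return "forward" if forward < backward else "backward"

def simulate_ar1(n=5000, coef=0.5, seed=0):
    # Synthetic AR(1) series driven by non-Gaussian (uniform) noise.
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-1.0, 1.0, size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = coef * x[t - 1] + noise[t]
    return x
```

With Gaussian noise the two fits would look equally independent, which is why the non-Gaussian assumption matters for this approach.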

    Result

    • There is a tradeoff between the accuracy achieved by the system and the number of videos it can classify, depending on the value of delta used for the p-test.

Comment

  • The paper poses a new and interesting research problem but uses a very small dataset, which makes the results inconclusive in my opinion.