Motion Interchange Patterns for Action Recognition in Unconstrained Videos

Published in the European Conference on Computer Vision (ECCV), Firenze, Italy, 2012

Recommended citation: Orit Kliper-Gross, Yaron Gurovich, Tal Hassner, and Lior Wolf. Motion Interchange Patterns for Action Recognition in Unconstrained Videos. European Conference on Computer Vision (ECCV), Firenze, Italy, 2012.

Abstract

Action recognition in videos is an active research field, fueled by an acute need spanning several application domains. Still, existing systems fall short of the applications’ needs in real-world scenarios, where the quality of the video is less than optimal and the viewpoint is uncontrolled and often not static. In this paper, we consider the key elements of motion encoding and focus on capturing local changes in motion directions. In addition, we decouple image edges from motion edges using a suppression mechanism, and compensate for global camera motion with an especially fitted registration scheme. Combined with a standard bag-of-words technique, our method achieves state-of-the-art performance on the most recent and challenging benchmarks.
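The clip-level descriptor mentioned above is produced with a standard bag-of-words technique. As a generic illustration of that pooling step only, and not of the paper's specific configuration, the sketch below quantizes local descriptors against a codebook and returns a normalized histogram; the feature dimensionality, codebook size, and the `bag_of_words` helper itself are made-up placeholders.

```python
# Generic bag-of-words pooling sketch (NumPy), shown only to illustrate the
# "standard bag-of-words technique" mentioned in the abstract. The codebook
# size and feature dimensionality below are arbitrary placeholders.

import numpy as np

def bag_of_words(features, codebook):
    """Quantize local descriptors against a codebook and return a
    normalized histogram of codeword assignments.

    features : (n, d) array of local descriptors extracted from a clip
    codebook : (k, d) array of codewords (e.g. k-means centers)
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)          # hard assignment to nearest word
    hist = np.bincount(assignments, minlength=len(codebook)).astype(np.float64)
    return hist / max(hist.sum(), 1.0)       # L1-normalized clip descriptor

# Example with random placeholder data: 500 local descriptors of dimension 64
# pooled against a 128-word codebook.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(500, 64))
    words = rng.normal(size=(128, 64))
    print(bag_of_words(feats, words).shape)   # (128,)
```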

Download paper here

BibTeX

The Action Similarity Labeling Challenge (ASLAN)


The Motion Interchange Patterns (MIP) representation. (a) Our encoding is based on comparing two SSD scores computed between three patches from three consecutive frames. Relative to the location of the patch in the current frame, the location of the patch in the previous (next) frame is said to be in direction i (j); the angle between directions i and j is denoted alpha. (b) Illustration of the different motion patterns captured by different i and j values. Blue arrows represent motion from the patch at position i in the previous frame; red arrows represent motion to the patch at position j in the next frame. Shaded diagonal strips indicate equal alpha values.
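To make the comparison concrete, below is a minimal NumPy sketch of a single MIP comparison channel at one pixel. It assumes 3x3 patches, eight compass offsets of 4 pixels, and a trinary code obtained by comparing the two SSD scores against a threshold `theta`; these parameter values, the sign convention, and the `mip_trit` helper are illustrative assumptions, not the released MATLAB implementation.

```python
# A minimal NumPy sketch of one MIP comparison channel, for illustration only.
# Assumed for this sketch: 3x3 patches, offsets of 4 pixels in the 8 compass
# directions, and a trinary code {-1, 0, +1} decided by comparing the two SSD
# scores against a threshold `theta`.

import numpy as np

OFFSETS = [(-4, -4), (-4, 0), (-4, 4), (0, 4),
           (4, 4), (4, 0), (4, -4), (0, -4)]  # 8 compass directions

def patch(frame, y, x, r=1):
    """3x3 patch centered at (y, x); caller must keep it inside the frame."""
    return frame[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)

def mip_trit(prev_f, curr_f, next_f, y, x, i, j, theta=1000.0):
    """Trinary code for one pixel and one (i, j) channel.

    SSD1 compares the patch at offset i in the previous frame with the patch
    at (y, x) in the current frame; SSD2 compares the current patch with the
    patch at offset j in the next frame.
    """
    dyi, dxi = OFFSETS[i]
    dyj, dxj = OFFSETS[j]
    curr = patch(curr_f, y, x)
    ssd1 = np.sum((patch(prev_f, y + dyi, x + dxi) - curr) ** 2)
    ssd2 = np.sum((curr - patch(next_f, y + dyj, x + dxj)) ** 2)
    if ssd1 - ssd2 > theta:   # the next-frame direction j matches better
        return 1
    if ssd2 - ssd1 > theta:   # the previous-frame direction i matches better
        return -1
    return 0                  # no clear preference between the two

# Example: encode one pixel of a random three-frame clip for all 64 (i, j)
# pairs; channels sharing the same angle alpha lie on the shaded diagonals
# of panel (b).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prev_f, curr_f, next_f = (rng.integers(0, 256, (64, 64)) for _ in range(3))
    codes = np.array([[mip_trit(prev_f, curr_f, next_f, 32, 32, i, j)
                       for j in range(8)] for i in range(8)])
    print(codes)
```

Sweeping all 64 (i, j) pairs at every pixel yields the per-pixel channel codes, which are then pooled into a clip descriptor by the bag-of-words stage.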

Results

Below are some results of the MIP representation applied to recent action recognition benchmarks. For additional results and further details, please see the paper.

| System | No CSML Accuracy | No CSML AUC | With CSML Accuracy | With CSML AUC |
| --- | --- | --- | --- | --- |
| LTP [Yeffet & Wolf 09] | 55.45 ± 0.6% | 57.2 | 58.50 ± 0.7% | 62.4 |
| HOG [Laptev et al. 08] | 58.55 ± 0.8% | 61.59 | 60.15 ± 0.6% | 64.2 |
| HOF [Laptev et al. 08] | 56.82 ± 0.6% | 58.56 | 58.62 ± 1.0% | 61.8 |
| HNF [Laptev et al. 08] | 58.67 ± 0.9% | 62.16 | 57.20 ± 0.8% | 60.5 |
| MIP single channel (alpha=0) | 58.27 ± 0.6% | 61.7 | 61.52 ± 0.8% | 66.5 |
| MIP single best channel (alpha=1) | 61.45 ± 0.8% | 66.1 | 63.55 ± 0.8% | 69.0 |
| MIP w/o suppression | 61.67 ± 0.9% | 66.4 | 63.17 ± 1.1% | 68.4 |
| MIP w/o motion compensation | 62.27 ± 0.8% | 66.4 | 63.57 ± 1.0% | 69.5 |
| MIP w/o both | 60.43 ± 1.0% | 64.8 | 63.08 ± 0.9% | 68.2 |
| MIP on stabilized clips | 59.73 ± 0.77% | 62.9 | 62.30 ± 0.77% | 66.4 |
| MIP | 62.23 ± 0.8% | 67.5 | 64.62 ± 0.8% | 70.4 |
| HOG+HOF+HNF | 60.88 ± 0.8% | 65.3 | 63.12 ± 0.9% | 68.0 |
| HOG+HOF+HNF with OSSML [Kliper-Gross et al. 11] | 62.52 ± 0.8% | 66.6 | 64.25 ± 0.7% | 69.1 |
| MIP+HOG+HOF+HNF | 64.27 ± 1.0% | 69.2 | 65.45 ± 0.8% | 71.92 |

Table 1. Comparison to previous results on the ASLAN benchmark. Average accuracy and standard error on the ASLAN benchmark are given for a list of methods (see text for details). All HOG, HOF, and HNF results are taken from [Kliper-Gross et al. 11, Kliper-Gross et al. 12].

| System | Original clips | Stabilized clips |
| --- | --- | --- |
| HOG/HOF [Laptev et al. 08] | 20.44% | 21.96% |
| C2 [Jhuang et al. 07] | 22.83% | 23.18% |
| Action Bank [Sadanand & Corso 12] | 26.90% | N/A |
| MIP | 29.17% | N/A |

Table 2. Comparison to previous results on the HMDB51 database. Since our method contains a motion compensation component, we tested it on the more challenging unstabilized videos. Our method significantly outperforms the best results obtained by previous work.

| System | Splits | LOgO |
| --- | --- | --- |
| HOG/HOF [Laptev et al. 08] | 47.9% | N/A |
| Action Bank [Sadanand & Corso 12] | 57.9% | N/A |
| MIP | 68.51% | 72.68% |

Table 3. Comparison to previous results on the UCF50 database. Our method significantly outperforms all reported methods.

MIP video descriptor for action recognition - Code

MATLAB code for computing the Motion Interchange Patterns (MIP) video descriptor is now available for download here. Please see the ReadMe.txt for information on how to install and run the code.

If you use this code in your own work, please cite our paper.


Copyright and disclaimer
Copyright 2012, Orit Kliper-Gross, Yaron Gurovich, Tal Hassner, and Lior Wolf