Conditional models for contextual human motion recognition

Cristian Sminchisescu, Atul Kanaujia, Dimitri Metaxas

Research output: Contribution to journal, Article

152 Citations (Scopus)

Abstract

We describe algorithms for recognizing human motion in monocular video sequences, based on discriminative conditional random fields (CRFs) and maximum entropy Markov models (MEMMs). Existing approaches to this problem typically use generative structures like the hidden Markov model (HMM). Therefore, they have to make simplifying, often unrealistic assumptions about the conditional independence of observations given the motion class labels, and they cannot accommodate rich overlapping features of the observation or long-term contextual dependencies among observations at multiple timesteps. This makes them prone to myopic failures in recognizing many human motions, because even the transition between simple human activities naturally has temporal segments of ambiguity and overlap. The correct interpretation of these sequences requires more holistic, contextual decisions, where the estimate of an activity at a particular timestep can be constrained by longer windows of observations, both prior and posterior to that timestep. This would not be computationally feasible with an HMM, which would require enumerating a number of observation sequences exponential in the size of the context window. In this work we follow a different philosophy: instead of restrictively modeling the complex image generation process (the observation), we work with models that can take it as an unrestricted input and hence condition on it. Conditional models like the proposed CRFs seamlessly represent contextual dependencies and have computationally attractive properties: they support efficient, exact recognition using dynamic programming, and their parameters can be learned using convex optimization.
We introduce conditional graphical models as complementary tools for human motion recognition and present an extensive set of experiments that show not only how these can successfully classify diverse human activities like walking, jumping, running, picking or dancing, but also how they can discriminate among subtle motion styles like normal walks and wander walks.
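
The abstract's point about exact recognition by dynamic programming over windows of observations can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear-chain CRF with hand-picked (not learned) weights, a single hypothetical scalar observation per frame, and illustrative names (`windowed_features`, `viterbi_decode`, `half_window`) that do not come from the paper.

```python
def windowed_features(obs, t, half_window):
    """Observations in [t - half_window, t + half_window], zero-padded at the ends."""
    return [obs[s] if 0 <= s < len(obs) else 0.0
            for s in range(t - half_window, t + half_window + 1)]


def viterbi_decode(obs, labels, unary_w, trans_w, half_window=1):
    """Highest-scoring label sequence of a linear-chain model (max-sum DP).

    unary_w[y] weights the windowed observation features for label y;
    trans_w[p][y] scores moving from label p to label y.
    """
    n = len(labels)
    # Per-frame label scores, each computed from a window of observations.
    unary = []
    for t in range(len(obs)):
        f = windowed_features(obs, t, half_window)
        unary.append([sum(w * x for w, x in zip(unary_w[y], f)) for y in range(n)])

    # Forward pass: best score of any path ending in each label.
    score = list(unary[0])
    back = []
    for t in range(1, len(obs)):
        new_score, ptr = [], []
        for y in range(n):
            best = max(range(n), key=lambda p: score[p] + trans_w[p][y])
            new_score.append(score[best] + trans_w[best][y] + unary[t][y])
            ptr.append(best)
        score = new_score
        back.append(ptr)

    # Backward pass: follow the stored pointers from the best final label.
    y = max(range(n), key=lambda k: score[k])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    path.reverse()
    return [labels[k] for k in path]


if __name__ == "__main__":
    labels = ["walk", "run"]
    # Hypothetical per-frame cue (negative: walk-like, positive: run-like);
    # frame 2 is locally ambiguous and leans slightly toward "run".
    obs = [-1.0, -1.0, 0.3, -1.0, -1.0, 1.0, 1.0, 1.0]
    unary_w = [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]  # assumed, not learned
    trans_w = [[0.5, -0.5], [-0.5, 0.5]]             # assumed: favor staying
    print(viterbi_decode(obs, labels, unary_w, trans_w, half_window=1))
    # ['walk', 'walk', 'walk', 'walk', 'walk', 'run', 'run', 'run']
```

The cost of decoding is linear in the sequence length and quadratic in the number of labels, regardless of the window size, because the window only enters through the per-frame scores. With `half_window=0` and zero transition weights the same toy data labels frame 2 as "run", which is the kind of locally driven decision the abstract calls myopic.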

Original language: English (US)
Pages (from-to): 210-220
Number of pages: 11
Journal: Computer Vision and Image Understanding
Volume: 104
Issue number: 2-3 SPEC. ISS.
DOIs: https://doi.org/10.1016/j.cviu.2006.07.014
State: Published - Nov 1 2006

Fingerprint

Hidden Markov models
Convex optimization
Dynamic programming
Labels
Entropy
Experiments

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering
  • Computer Vision and Pattern Recognition

Keywords

  • Conditional models
  • Discriminative models
  • Feature selection
  • Hidden Markov models
  • Human motion recognition
  • Markov random fields
  • Multiclass logistic regression
  • Optimization

Cite this

Sminchisescu, Cristian; Kanaujia, Atul; Metaxas, Dimitri. Conditional models for contextual human motion recognition. In: Computer Vision and Image Understanding. 2006; Vol. 104, No. 2-3 SPEC. ISS., pp. 210-220.
@article{91b8fb7266d04d1b8edcf4860dc9e651,
title = "Conditional models for contextual human motion recognition",
keywords = "Conditional models, Discriminative models, Feature selection, Hidden Markov models, Human motion recognition, Markov random fields, Multiclass logistic regression, Optimization",
author = "Cristian Sminchisescu and Atul Kanaujia and Dimitri Metaxas",
year = "2006",
month = "11",
day = "1",
doi = "10.1016/j.cviu.2006.07.014",
language = "English (US)",
volume = "104",
pages = "210--220",
journal = "Computer Vision and Image Understanding",
issn = "1077-3142",
publisher = "Academic Press Inc.",
number = "2-3 SPEC. ISS.",

}

TY - JOUR

T1 - Conditional models for contextual human motion recognition

AU - Sminchisescu, Cristian

AU - Kanaujia, Atul

AU - Metaxas, Dimitri

PY - 2006/11/1

Y1 - 2006/11/1

KW - Conditional models

KW - Discriminative models

KW - Feature selection

KW - Hidden Markov models

KW - Human motion recognition

KW - Markov random fields

KW - Multiclass logistic regression

KW - Optimization

UR - http://www.scopus.com/inward/record.url?scp=33749993686&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33749993686&partnerID=8YFLogxK

U2 - 10.1016/j.cviu.2006.07.014

DO - 10.1016/j.cviu.2006.07.014

M3 - Article

VL - 104

SP - 210

EP - 220

JO - Computer Vision and Image Understanding

JF - Computer Vision and Image Understanding

SN - 1077-3142

IS - 2-3 SPEC. ISS.

ER -