Bilinear and multilinear models have been successful in decomposing static image ensembles into perceptually orthogonal sources of variation, e.g., separating style from content. For the appearance of human motion such as gait, facial expression, and gesturing, however, most such activities produce nonlinear manifolds in the image space. The question we address in this paper is how to separate style and content on manifolds representing dynamic objects. We learn a decomposable generative model that explicitly separates the intrinsic body configuration (content), as a function of time, from the appearance (style) of the person performing the action, as a time-invariant parameter. The framework is based on decomposing the style parameters in the space of nonlinear functions that map between a learned unified nonlinear embedding of multiple content manifolds and the visual input space.
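One way to write down a model of the kind the abstract describes is sketched below. The notation is an assumption made for illustration ($x_t$, $b^s$, $\psi$, $\mathcal{A}$, and the dimensions $d$, $J$, $N$ do not appear in the abstract), so it should be read as a plausible form rather than the paper's exact formulation.

```latex
% Assumed notation, for illustration only:
%   x_t          : point on the unified content embedding at time t (content)
%   b^s \in R^J  : time-invariant style vector of person s (style)
%   \psi(x) \in R^N : fixed nonlinear kernel map evaluated on the embedding
%   B^s \in R^{d x N} : coefficients of the mapping for style s
%   \mathcal{A} \in R^{d x J x N} : third-order tensor coupling style and content
%   \times_n     : contraction of mode n of the tensor with a vector
\begin{align}
  y_t^s &= B^s \, \psi(x_t)
        && \text{nonlinear map from the embedding to the image space} \\
  B^s   &= \mathcal{A} \times_2 b^s
        && \text{mapping coefficients depend only on style} \\
  y_t^s &= \mathcal{A} \times_2 b^s \times_3 \psi(x_t)
        && \text{style and content enter as separate factors}
\end{align}
```

In this form, time enters only through $x_t$ and the person only through $b^s$, which is the separation the abstract refers to.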
|Original language|English (US)|
|Journal|Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition|
|State|Published - 2004|
|Event|Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004 - Washington, DC, United States. Duration: Jun 27 2004 → Jul 2 2004|
ASJC Scopus subject areas
- Computer Vision and Pattern Recognition