A Dual-Source Approach for 3D Human Pose Estimation from Single Images

Umar Iqbal, Andreas Doering, Hashim Yasin, Björn Krüger, Andreas Weber, and Juergen Gall
In: Computer Vision and Image Understanding (2018), vol. 172, pp. 37-49
 

Abstract

In this work we address the challenging problem of 3D human pose estimation from single images. Recent approaches learn deep neural networks to regress 3D pose directly from images. One major challenge for such methods, however, is the collection of large amounts of training data. Particularly, collecting a large number of unconstrained images that are annotated with accurate 3D poses is impractical. We therefore propose to use two independent training sources. The first source consists of accurate 3D motion capture data, and the second source consists of unconstrained images with annotated 2D poses. To incorporate both sources, we propose a dual-source approach that combines 2D pose estimation with efficient 3D pose retrieval. To this end, we first convert the motion capture data into a normalized 2D pose space, and separately learn a 2D pose estimation model from the image data. During inference, we estimate the 2D pose and efficiently retrieve the nearest 3D poses. We then jointly estimate a mapping from the 3D pose space to the image and reconstruct the 3D pose. We provide a comprehensive evaluation of the proposed method and experimentally demonstrate the effectiveness of our approach, even when the skeleton structures of the two sources differ substantially.
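The retrieval step described in the abstract (projecting motion capture poses into a normalized 2D pose space and looking up the nearest 3D poses for an estimated 2D pose) can be illustrated with a minimal sketch. This is not the authors' implementation: the orthographic projection, the mean-centering and unit-norm normalization, the Euclidean distance, the brute-force search, and all function names are assumptions chosen only for illustration. The abstract does not specify any of these details.

```python
import math

def normalize_2d(pose):
    """Center a list of (x, y) joints on their mean and scale to unit norm.

    Assumed normalization; it makes poses invariant to translation and scale.
    """
    n = len(pose)
    cx = sum(x for x, _ in pose) / n
    cy = sum(y for _, y in pose) / n
    centered = [(x - cx, y - cy) for x, y in pose]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered))
    if scale == 0:
        return centered
    return [(x / scale, y / scale) for x, y in centered]

def project_orthographic(pose_3d):
    """Drop the depth coordinate of each (x, y, z) joint (assumed camera model)."""
    return [(x, y) for x, y, _ in pose_3d]

def pose_distance(a, b):
    """Euclidean distance between two 2D poses with the same joint order."""
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(a, b)))

def retrieve_nearest(query_2d, mocap_poses_3d, k=2):
    """Return indices of the k mocap poses whose 2D projections best match the query."""
    q = normalize_2d(query_2d)
    scored = []
    for i, pose_3d in enumerate(mocap_poses_3d):
        candidate = normalize_2d(project_orthographic(pose_3d))
        scored.append((pose_distance(q, candidate), i))
    scored.sort()  # brute-force search; fine for a toy example
    return [i for _, i in scored[:k]]

# Toy "motion capture" set with three joints per pose (hypothetical data).
mocap = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 1)],  # joints along the x axis
    [(0, 0, 0), (1, 1, 0), (2, 0, 0)],  # bent pose
    [(0, 0, 0), (0, 1, 0), (0, 2, 1)],  # joints along the y axis
]

# A 2D "estimate": the bent pose, scaled by 3 and shifted by 10 pixels.
query = [(10, 10), (13, 13), (16, 10)]
nearest = retrieve_nearest(query, mocap, k=2)
```

Because the query is a scaled and shifted copy of the bent pose, normalization maps both to the same point in the 2D pose space, so the bent pose is ranked first. The retrieved 3D poses would then feed the reconstruction stage mentioned in the abstract, which this sketch does not cover.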

Keywords: 3D human pose estimation, 3d reconstruction, Articulated pose estimation, Motion capture

BibTeX

@ARTICLE{iqbal-2018,
    author = {Iqbal, Umar and Doering, Andreas and Yasin, Hashim and Kr{\"u}ger, Bj{\"o}rn and Weber, Andreas and Gall,
              Juergen},
     pages = {37--49},
     title = {A Dual-Source Approach for 3D Human Pose Estimation from Single Images},
   journal = {Computer Vision and Image Understanding},
    volume = {172},
      year = {2018},
  keywords = {3D human pose estimation, 3d reconstruction, Articulated pose estimation, Motion capture},
  abstract = {In this work we address the challenging problem of 3D human pose estimation from single images.
              Recent approaches learn deep neural networks to regress 3D pose directly from images. One major
              challenge for such methods, however, is the collection of large amounts of training data.
              Particularly, collecting a large number of unconstrained images that are annotated with accurate 3D
              poses is impractical. We therefore propose to use two independent training sources. The first source
              consists of accurate 3D motion capture data, and the second source consists of unconstrained images
              with annotated 2D poses. To incorporate both sources, we propose a dual-source approach that
              combines 2D pose estimation with efficient 3D pose retrieval. To this end, we first convert the
              motion capture data into a normalized 2D pose space, and separately learn a 2D pose estimation model
              from the image data. During inference, we estimate the 2D pose and efficiently retrieve the nearest
              3D poses. We then jointly estimate a mapping from the 3D pose space to the image and reconstruct the
              3D pose. We provide a comprehensive evaluation of the proposed method and experimentally demonstrate
              the effectiveness of our approach, even when the skeleton structures of the two sources differ
              substantially.},
      issn = {1077-3142},
       url = {http://www.sciencedirect.com/science/article/pii/S1077314218300511},
       doi = {10.1016/j.cviu.2018.03.007}
}