Fast Registration of Photorealistic Avatars for VR Facial Animation

Chaitanya Patel1      Shaojie Bai2      Te-Li Wang2      Jason Saragih2      Shih-En Wei2
1Stanford University           2Meta Reality Labs

ECCV 2024

Paper | Code | Ava-256 Dataset




On consumer VR headsets, oblique mouth views and a large image domain gap hinder high-quality face registration. As shown above, subtle lip shapes and jaw movements are often barely visible. Under this setting, our method efficiently and accurately registers the facial expression and head pose of unseen identities in VR with their photorealistic avatars.



Abstract

Virtual Reality (VR) bears the promise of social interactions that can feel more immersive than other media. Key to this is the ability to accurately animate a photorealistic avatar of one's likeness while wearing a VR headset. Although high-quality registration of person-specific avatars to headset-mounted camera (HMC) images is possible in an offline setting, the performance of generic realtime models is significantly degraded. Online registration is also challenging due to oblique camera views and differences in image modality. In this work, we first show that the domain gap between the avatar and headset-camera images is one of the primary sources of difficulty: a transformer-based architecture achieves high accuracy on domain-consistent data, but degrades when the domain gap is re-introduced. Building on this finding, we develop a system design that decouples the problem into two parts: 1) an iterative refinement module that takes in-domain inputs, and 2) a generic avatar-guided image-to-image style transfer module that is conditioned on the current estimate of expression and head pose. These two modules reinforce each other: image style transfer becomes easier when close-to-ground-truth examples are shown, and better domain-gap removal in turn helps registration. Our system produces high-quality results efficiently, obviating the need for costly offline registration to generate personalized labels. We validate the accuracy and efficiency of our approach through extensive experiments on a commodity headset, demonstrating significant improvements over direct regression methods as well as offline registration.
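The two-module loop described above can be summarized in a few lines. The following is a minimal sketch in PyTorch under our own assumptions; all module and function names (StyleTransferNet, RefinementNet, register, render_avatar) are illustrative placeholders, not the released code.

import torch
import torch.nn as nn

class StyleTransferNet(nn.Module):
    # Maps an HMC camera image into the avatar's rendered-image domain,
    # conditioned on a render of the current expression/pose estimate.
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(6, 3, kernel_size=3, padding=1)  # toy stand-in

    def forward(self, hmc_img, cond_render):
        return self.net(torch.cat([hmc_img, cond_render], dim=1))

class RefinementNet(nn.Module):
    # Predicts an update to (expression, head pose) from in-domain inputs.
    def __init__(self, expr_dim=256, pose_dim=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(16, expr_dim + pose_dim)
        self.expr_dim = expr_dim

    def forward(self, in_domain_img, cur_render):
        feat = self.backbone(torch.cat([in_domain_img, cur_render], dim=1))
        delta = self.head(feat)
        return delta[:, :self.expr_dim], delta[:, self.expr_dim:]

def register(hmc_img, render_avatar, style_net, refine_net, expr, pose, iters=3):
    # Alternate avatar-guided style transfer and iterative refinement.
    for _ in range(iters):
        render = render_avatar(expr, pose)              # render current guess
        in_domain = style_net(hmc_img, render)          # remove domain gap
        d_expr, d_pose = refine_net(in_domain, render)  # in-domain update
        expr, pose = expr + d_expr, pose + d_pose
    return expr, pose

Each iteration renders the current guess, translates the HMC image toward the render's domain, and regresses an update, capturing the coupling the abstract describes: better estimates ease style transfer, and cleaner in-domain images ease refinement.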



Results

Iterative Refinement


Results on Videos of Unseen Identities [Use fullscreen mode]



Additional Results

Result 1 (png, 17MB)
Result 2 (png, 15MB)
Result 3 (png, 15MB)

Citation
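If you find our work useful, please consider citing it. The BibTeX entry below is assembled from the details listed at the top of this page; the entry key is illustrative.

@inproceedings{patel2024fast,
  title     = {Fast Registration of Photorealistic Avatars for VR Facial Animation},
  author    = {Patel, Chaitanya and Bai, Shaojie and Wang, Te-Li and Saragih, Jason and Wei, Shih-En},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2024}
}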