Revolutionizing the World of 3D Avatars: ByteDance AI Research Unveils Self-Supervised Learning Framework

TL;DR:

  • ByteDance AI Research presents an innovative self-supervised learning framework for creating high-quality stylized 3D avatars from a single selfie.
  • The algorithm predicts an avatar vector with continuous and discrete parameters to generate lifelike avatars using predefined 3D assets.
  • Self-supervised learning reduces the need for extensive manual annotations, making the process more cost-effective.
  • The architecture comprises Portrait Stylization, Self-supervised Avatar Parameterization, and Avatar Vector Conversion stages, which bridge domain gaps while preserving user identity.
  • A search-based conversion technique enhances avatar quality, and a human preference study validates the framework's effectiveness.

Main AI News:

In the fast-paced digital landscape, where socializing, shopping, and gaming have become an integral part of modern life, the demand for visually appealing and animated 3D avatars is soaring. Recognizing the significance of avatars that resonate with users, renowned avatar systems like Zepeto and Ready Player Me have embraced cartoonized and stylized looks, making them engaging and user-friendly. Nonetheless, manually choosing and modifying avatars can be a laborious task, particularly for newcomers, as it requires meticulous adjustments across multiple graphic elements. But here’s where ByteDance AI Research’s groundbreaking innovation comes into play.

The research team at ByteDance AI has recently introduced a self-supervised learning framework that harnesses the power of AI to create high-quality stylized 3D avatars from just a single selfie. The algorithm predicts an avatar vector: a complete configuration from which a graphics engine generates and renders a 3D avatar using predefined 3D assets. The avatar vector comprises parameters specific to those assets, which can be either continuous, such as head length, or discrete, such as hair type.
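
To make the notion of an avatar vector concrete, the sketch below shows one plausible way such a configuration could be represented in code. The parameter names, value ranges, and asset lists are illustrative assumptions, not ByteDance’s actual schema.

```python
from dataclasses import dataclass

# Hypothetical asset lists; a real engine exposes far more predefined assets.
HAIR_TYPES = ["short", "long", "curly", "ponytail"]
EYEBROW_TYPES = ["straight", "arched", "thick"]

@dataclass
class AvatarVector:
    """Illustrative avatar vector: a complete configuration for the engine."""
    # Continuous parameters, normalized to [0, 1]
    head_length: float = 0.5
    eye_distance: float = 0.5
    skin_tone: float = 0.5
    # Discrete parameters: indices into the predefined asset lists
    hair_type: int = 0
    eyebrow_type: int = 0

    def to_engine_config(self) -> dict:
        """Flatten into the kind of configuration a graphics engine consumes."""
        return {
            "head_length": self.head_length,
            "eye_distance": self.eye_distance,
            "skin_tone": self.skin_tone,
            "hair_asset": HAIR_TYPES[self.hair_type],
            "eyebrow_asset": EYEBROW_TYPES[self.eyebrow_type],
        }

if __name__ == "__main__":
    print(AvatarVector(head_length=0.62, hair_type=2).to_engine_config())
```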

The conventional approach would be to annotate a vast dataset of selfie images and train a model via supervised learning. However, this method requires an enormous amount of annotation to cover the wide array of assets, making it impractical and costly. ByteDance AI’s solution instead leverages self-supervised learning, using a differentiable imitator trained to replicate the renderings of the graphics engine. By applying identity and semantic segmentation losses, the system automatically matches the stylized avatar image to the input selfie, significantly reducing the need for manual annotations and the associated costs.
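
The training loop below is a simplified PyTorch sketch of that idea: a frozen, differentiable imitator stands in for the graphics engine, and identity and segmentation losses compare its rendering to the input selfie. Every network here is a tiny placeholder (for a face-identity embedder, a face parser, and the paper’s actual imitator), and the loss weighting is an arbitrary illustrative choice, not the paper’s.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny placeholder networks standing in for the real models.
predictor = nn.Sequential(  # selfie image -> relaxed avatar vector
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 32),
)
imitator = nn.Sequential(   # avatar vector -> rendered avatar image;
    nn.Linear(32, 3 * 32 * 32), nn.Unflatten(1, (3, 32, 32)), nn.Sigmoid(),
)                           # differentiable stand-in for the graphics engine
id_encoder = nn.Sequential( # stand-in for a pretrained face-identity embedder
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
segmenter = nn.Conv2d(3, 4, 1)  # stand-in for a pretrained face parser

for module in (imitator, id_encoder, segmenter):
    module.requires_grad_(False)  # only the predictor is trained here

optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def training_step(selfie: torch.Tensor) -> float:
    """One self-supervised step: no manual avatar-parameter labels required."""
    rendered = imitator(predictor(selfie))
    rendered = F.interpolate(rendered, size=selfie.shape[-2:])

    # Identity loss: the rendered avatar should embed close to the selfie.
    id_loss = 1.0 - F.cosine_similarity(
        id_encoder(selfie), id_encoder(rendered)).mean()
    # Segmentation loss: semantic regions (hair, skin, ...) should align.
    seg_loss = F.mse_loss(segmenter(rendered), segmenter(selfie))

    loss = id_loss + 0.5 * seg_loss  # illustrative weighting
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    print(training_step(torch.rand(2, 3, 32, 32)))  # dummy batch of "selfies"
```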

The system’s architecture comprises three stages: Portrait Stylization, Self-supervised Avatar Parameterization, and Avatar Vector Conversion. Throughout these stages, identity information such as hairstyle and skin tone is retained while the domain gap is gradually closed for a seamless transition. The Portrait Stylization stage handles the real-to-stylized visual appearance crossover while remaining in pixel space. A modified version of AgileGAN is used to keep expressions consistent while preserving user identity, leading to superior outcomes compared to conventional stylization techniques.
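
A high-level sketch of how the three stages might chain together is shown below. Every function is a hypothetical stub named after the paper’s stages, not any actual ByteDance API; the real system uses neural networks and a production graphics engine.

```python
import numpy as np

def portrait_stylization(selfie: np.ndarray) -> np.ndarray:
    """Stage 1: real -> stylized domain, still an image, identity-preserving."""
    return selfie  # placeholder; a modified AgileGAN performs this step

def avatar_parameterization(stylized: np.ndarray) -> np.ndarray:
    """Stage 2: stylized image -> relaxed avatar vector (all-continuous)."""
    return np.full(8, 0.5)  # placeholder vector of 8 relaxed parameters

def avatar_vector_conversion(relaxed: np.ndarray) -> np.ndarray:
    """Stage 3: relaxed vector -> strict vector with hard discrete choices."""
    return np.round(relaxed)  # placeholder; the paper uses search, not rounding

def graphics_engine_render(strict_vector: np.ndarray) -> str:
    """The non-differentiable engine assembles predefined 3D assets."""
    return f"avatar rendered from {strict_vector.tolist()}"

selfie = np.zeros((256, 256, 3))  # dummy input selfie
stylized = portrait_stylization(selfie)
relaxed = avatar_parameterization(stylized)
strict = avatar_vector_conversion(relaxed)
print(graphics_engine_render(strict))
```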

The Self-supervised Avatar Parameterization step transitions from pixel-based images to vector-based avatars. ByteDance AI’s researchers found that strictly enforcing parameter discreteness prevented the optimization from converging. To address this, they introduced a lenient formulation, the relaxed avatar vector, which encodes discrete parameters as continuous one-hot vectors. To keep training differentiable, they taught an imitator to mimic the behavior of the non-differentiable graphics engine. The Avatar Vector Conversion step then converts all discrete parameters back to strict one-hot vectors, crossing from the relaxed avatar vector space to the strict avatar vector space, and the graphics engine constructs and renders the final avatar from the strict vector. Notably, the team employed a search technique for this conversion that yielded superior results compared to direct quantization, as the toy example below illustrates.
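
The toy NumPy example below contrasts the two conversion strategies for a single discrete parameter: direct quantization simply snaps the relaxed weights to their argmax, while a search scores every strict one-hot candidate against the target rendering and keeps the best match. The rendering function and scoring loop are simplified assumptions about the paper’s cascaded relaxation-and-search pipeline, not its actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OPTIONS = 4                                            # e.g., four hair types
ASSET_APPEARANCE = rng.standard_normal((N_OPTIONS, 16))  # toy per-asset features

def render(weights: np.ndarray) -> np.ndarray:
    """Toy stand-in for rendering a (possibly blended) choice into features."""
    return weights @ ASSET_APPEARANCE

relaxed = np.array([0.05, 0.40, 0.45, 0.10])  # relaxed (continuous) weights
target = render(relaxed)                      # appearance the optimizer matched

# Direct quantization: snap to the single largest weight.
quantized = np.eye(N_OPTIONS)[np.argmax(relaxed)]

# Search: try every strict one-hot option and keep the one whose rendering
# lands closest to the target appearance.
candidates = np.eye(N_OPTIONS)
errors = [np.linalg.norm(render(c) - target) for c in candidates]
searched = candidates[int(np.argmin(errors))]

print("argmax choice :", int(np.argmax(quantized)))
print("search choice :", int(np.argmax(searched)))  # may differ from the argmax
```

When the relaxed weights blend several assets, the option whose rendering best matches the blended target is not necessarily the one with the largest weight, which is why a rendering-aware search can beat naive quantization.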

To validate the effectiveness of their method, ByteDance AI conducted a human preference study, comparing their outcomes with baseline approaches such as F2P and manually created avatars. The results indicated that their approach successfully preserved personal identity, obtaining scores significantly higher than those achieved by conventional techniques and comparable to the quality of hand-crafted avatars.

In addition to their technical achievements, the research team highlights the following contributions:

  • A novel self-supervised learning framework for generating high-quality stylized 3D avatars, incorporating both continuous and discrete parameters.
  • A groundbreaking method to bridge the substantial style domain gap in stylized 3D avatar creation through portrait stylization.
  • A cascaded relaxation and search pipeline that effectively addresses the convergence challenge in discrete avatar parameter optimization.

Conclusion:

ByteDance AI’s breakthrough in self-supervised learning for stylized 3D avatars holds significant implications for the market. The technology streamlines the avatar creation process, reducing the time and costs associated with manual annotations. With the ability to generate personalized, attractive, and engaging avatars from a single selfie, businesses in social media, gaming, virtual reality, and e-commerce stand to benefit. The market can now offer users a more immersive and enjoyable experience, enhancing customer engagement and loyalty. Additionally, the potential for automated avatar creation opens up opportunities for efficient avatar customization in various applications, paving the way for new revenue streams and improved user satisfaction. Overall, ByteDance AI’s cutting-edge solution has the potential to revolutionize the digital avatar market and redefine the way users interact with the digital world.

Source