Microsoft's New AI Can Make Photos Talk: Innovations and Risks Explored
Microsoft has unveiled an innovative artificial intelligence model designed to animate photographs using generated audio, producing stunning yet potentially risky results.
Advances in machine learning continue to expand what artificial intelligence can do. Microsoft's latest AI model, for example, can bring static images of people to life.
The model, named VASA-1, animates a human portrait so that it moves in sync with an audio recording. It turns an ordinary photo into a realistic animation of a person talking or singing.
Turning a simple photo into a realistic animation
Microsoft ran its experiments on portraits of non-existent people generated with StyleGAN2 and DALL-E 3. The model works on realistic photos of people as well as cartoon avatars, and the experiments even included animating the famous Mona Lisa.
The VASA-1 model goes beyond synchronizing lip movements; it captures the richness of facial expressions and natural head movements, enhancing the realism of the animations.
The model generates animations at a resolution of 512 x 512 pixels and a frame rate of 45 frames per second in offline mode. It can also generate video in real time at up to 40 frames per second, with a startup delay of just 170 ms, on a desktop computer equipped with an NVIDIA GeForce RTX 4090 graphics card.
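To put those figures in perspective, the short Python sketch below converts them into per-frame time budgets. It is only back-of-the-envelope arithmetic on the numbers quoted above, not a description of how VASA-1 works internally; the function names are hypothetical and chosen for illustration.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the article.
# The helper names are illustrative only; they are not part of any Microsoft API.

def frame_budget_ms(fps: float) -> float:
    """Time available to produce a single frame, in milliseconds."""
    return 1000.0 / fps


def latency_in_frames(latency_ms: float, fps: float) -> float:
    """How many frame intervals a given delay spans at a given frame rate."""
    return latency_ms * fps / 1000.0


offline_budget = frame_budget_ms(45)        # offline mode, 45 fps
realtime_budget = frame_budget_ms(40)       # real-time mode, 40 fps
delay_frames = latency_in_frames(170, 40)   # 170 ms delay at 40 fps
pixels_per_second = 512 * 512 * 40          # pixels generated per second in real time

print(f"Offline budget per frame:   {offline_budget:.1f} ms")
print(f"Real-time budget per frame: {realtime_budget:.1f} ms")
print(f"170 ms delay in frames:     {delay_frames:.1f}")
print(f"Pixels per second at 40 fps: {pixels_per_second:,}")
```

In other words, at 40 frames per second the system has 25 ms to produce each 512 x 512 frame, and the 170 ms delay corresponds to slightly fewer than seven frames.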
The potential risks of the new technology
Although Microsoft's research focuses on animating virtual portraits rather than creating misleading content, the company acknowledges that the technology could be misused to impersonate people.
Microsoft has publicly stated that it opposes using the VASA-1 model for deception or for creating harmful content with the likenesses of real people. For that reason, the company has decided not to release a demo, an API, or a finished product to the public. Microsoft is also interested in applying the technology to improve the detection of forged content.