Remember those FMV games from the ’90s, the ones that blended prerecorded clips with animated sprites and 3D models? Facebook is bringing them back in style, and improved tenfold. In a newly published preprint paper on arXiv.org (“Vid2Game: Controllable Characters Extracted from Real-World Videos”), scientists at Facebook AI Research describe a system capable of extracting controllable characters from real-world videos.
“Our method extracts a character from an uncontrolled video and enables us to control its motion,” the paper’s coauthors explain. “The model generates novel image sequences of that person … [and the] generated video can have an arbitrary background, and effectively capture both the dynamics and appearance of the person.”
According to the research, which will be presented at the computer graphics conference SIGGRAPH in August, the team ran a number of tests comparing its new algorithm to existing means of manipulating lifelike videos and images, many of which have been at least partially developed by Facebook and Google.
The team’s approach relies on two neural networks, or layers of mathematical functions modeled after biological neurons: Pose2Pose, a framework that maps a current pose and a single-instance control signal to the next pose, and Pose2Frame, which plops the current pose and new pose (along with a given background) on an output frame. The reanimation can be controlled by any “low-dimensional” signal, such as one from a joystick or keyboard, and the researchers say that the system is robust enough to position extracted characters in dynamic backgrounds.
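To make the division of labor concrete, here is a minimal Python sketch of the control loop as the article describes it: one step maps the current pose and a control signal to the next pose, and a second step turns the pose pair and a background into a frame. The names toy_pose2pose, toy_pose2frame, and rollout are hypothetical, and both "networks" are trivial placeholders rather than the trained models from the paper.

```python
from typing import List, Tuple

Pose = Tuple[float, float]      # toy stand-in: a pose reduced to a 2D position
Control = Tuple[float, float]   # low-dimensional signal, e.g. a joystick direction


def toy_pose2pose(pose: Pose, control: Control) -> Pose:
    """Placeholder for the learned Pose2Pose step: shift the pose by the control."""
    return (pose[0] + control[0], pose[1] + control[1])


def toy_pose2frame(current: Pose, nxt: Pose, background: str) -> str:
    """Placeholder for Pose2Frame: describe the frame instead of rendering pixels."""
    return f"{background}: character moves {current} -> {nxt}"


def rollout(start: Pose, controls: List[Control], background: str) -> List[str]:
    """Feed each predicted pose back in as the next step's current pose."""
    frames = []
    pose = start
    for control in controls:
        next_pose = toy_pose2pose(pose, control)
        frames.append(toy_pose2frame(pose, next_pose, background))
        pose = next_pose  # autoregressive: the output becomes the next input
    return frames


frames = rollout((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)], "beach")
```

The point of the sketch is the feedback structure: a player's inputs arrive one step at a time, and each generated pose seeds the next prediction.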
So how’s it work? First, an input video containing one or more characters is fed into a Pose2Pose network trained for a specific domain (e.g., dancing), which isolates the characters (along with estimated foreground spatial masks) and their motion, the latter taken as the trajectory of their centers of mass. (The masks are used to determine which regions of the background are replaced by synthesized image information.) Using these and combined pose data, Pose2Frame distinguishes character-dependent changes in the scene, like shadows, held items, and reflections, from those that are character-independent, and returns a pair of outputs that are linearly blended with any desired background.
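Two of the operations mentioned above are simple enough to sketch in plain Python: reading a center of mass off a foreground mask, and linearly blending a generated frame with a background. This is an illustration only. It assumes the "pair of outputs" is a generated image plus a soft mask with values between 0 and 1, and the function names center_of_mass and blend are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image: rows of pixel values in [0, 1]


def center_of_mass(mask: Image) -> Tuple[float, float]:
    """Return the (x, y) centroid of a mask, weighting each pixel by its value."""
    total = sum(sum(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    cx = sum(x * v for row in mask for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(mask) for v in row) / total
    return (cx, cy)


def blend(generated: Image, mask: Image, background: Image) -> Image:
    """Per-pixel linear blend: mask * generated + (1 - mask) * background."""
    return [
        [m * g + (1.0 - m) * b for g, m, b in zip(g_row, m_row, b_row)]
        for g_row, m_row, b_row in zip(generated, mask, background)
    ]


# Toy 2x2 example: the character occupies the right-hand column.
mask = [[0.0, 1.0], [0.0, 1.0]]
generated = [[0.5, 0.5], [0.5, 0.5]]
background = [[0.2, 0.2], [0.2, 0.2]]

center = center_of_mass(mask)
composite = blend(generated, mask, background)
```

Where the mask is 1 the generated pixel wins, where it is 0 the background shows through, and intermediate values mix the two. That is how soft effects such as shadows can be carried onto a new background.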
To train the AI system, the researchers sourced three videos, each between five and eight minutes long, of a tennis player outdoors, a person swinging a sword indoors, and a person walking. Compared with a neural network model fed a three-minute video of a dancer, they report that their approach successfully handled dynamic elements, such as other people and differences in camera angle, in addition to variations in character clothing.
“Each network addresses a computational problem not previously fully met, together paving the way for the generation of video games with realistic graphics,” they wrote. “In addition, controllable characters extracted from YouTube-like videos can find their place in the virtual worlds and augmented realities.”
Facebook isn’t the only company investigating AI systems that might aid in game design. Startup Promethean AI employs machine learning to help human artists create art for video games, and Nvidia researchers recently demonstrated a generative model that can create virtual environments using video snippets. Machine learning has also been used to rescue old game textures in retro titles like Final Fantasy VII and The Legend of Zelda: Twilight Princess, and to generate thousands of levels in games like Doom from scratch.