This talk is about my obsession with not wanting to present live. When COVID hit and conferences went virtual, I started exploring how to create a virtual version of myself that could deliver presentations. What began as a green screen experiment spiraled into a multi-year journey through virtual production, 3D scanning, motion capture, voice synthesis, and generative AI.
The green screen rabbit hole goes deep: five lights, a DSLR camera for better chroma subsampling (cleaner color edges when keying), understanding the inverse square law of light falloff, and even a specialized app to verify even lighting. The movie-industry mantra “we’ll fix it in post” applied here too, with ML-based rotoscoping eventually making background removal trivial compared to the manual approach. Virtual production – the technique behind The Mandalorian, using game engines and LED walls – inspired my home setup: a 360 camera for environment scanning, VR base-station trackers for camera position, and Unreal Engine for real-time compositing.
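For the “fix it in post” step, even a classical chroma key comes down to masking out the green and compositing what remains over a new plate. Here is a minimal sketch with OpenCV, assuming placeholder file names and hand-tuned HSV thresholds; ML rotoscoping or a real keyer would add spill suppression and soft edges:

```python
import cv2
import numpy as np

# Load the green-screen frame and the replacement background (placeholder paths).
frame = cv2.imread("greenscreen_frame.png")
background = cv2.imread("virtual_set.png")
background = cv2.resize(background, (frame.shape[1], frame.shape[0]))

# Key on a green range in HSV; the exact bounds depend on your lighting.
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
green_mask = cv2.inRange(hsv, np.array([35, 60, 60]), np.array([85, 255, 255]))
subject_mask = cv2.bitwise_not(green_mask)

# Composite: keep the subject where the mask is set, the backdrop elsewhere.
subject = cv2.bitwise_and(frame, frame, mask=subject_mask)
backdrop = cv2.bitwise_and(background, background, mask=green_mask)
cv2.imwrite("composite.png", cv2.add(subject, backdrop))
```

This is exactly the part that even, five-light illumination makes easy: the narrower the range of greens in the plate, the tighter those thresholds can be.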
Building a virtual body required navigating photogrammetry (hard), LiDAR phone scanning (easier), and eventually Unreal’s MetaHuman system that generates realistic 3D humans from a single photo and lets you mix faces. Motion capture meant VR trackers, inverse kinematics models for realistic joint movement, and eventually computer vision eliminating the need for physical trackers. Face tracking uses the iPhone’s existing face recognition mesh. Voice synthesis converts text to spectrograms to waveforms – but the one-to-many problem of intonation and emotion remains unsolved without manual annotation.
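The inverse kinematics piece is what turns a handful of tracker positions into plausible joint angles. A toy two-bone solver in 2D, using nothing more than the law of cosines, gives the flavor; the segment lengths and target below are made-up numbers, and production IK works in 3D with joint limits and pole vectors:

```python
import math

def two_bone_ik(target_x, target_y, upper_len, lower_len):
    """Return (shoulder_angle, elbow_bend) in radians that place the end of a
    two-segment arm at the target; elbow_bend is 0 when the arm is straight."""
    dist = math.hypot(target_x, target_y)
    # Clamp to the reachable range so acos stays defined.
    dist = max(abs(upper_len - lower_len), 1e-9, min(dist, upper_len + lower_len))

    # Law of cosines gives the interior angle at the elbow...
    cos_elbow = (upper_len**2 + lower_len**2 - dist**2) / (2 * upper_len * lower_len)
    elbow_bend = math.pi - math.acos(max(-1.0, min(1.0, cos_elbow)))

    # ...and the shoulder angle is the direction to the target, corrected by
    # the offset the bent elbow introduces.
    cos_offset = (upper_len**2 + dist**2 - lower_len**2) / (2 * upper_len * dist)
    shoulder = math.atan2(target_y, target_x) - math.acos(max(-1.0, min(1.0, cos_offset)))
    return shoulder, elbow_bend

# Place the "wrist" at (0.5, 0.3) with 0.3-long upper and lower segments.
print(two_bone_ik(0.5, 0.3, 0.3, 0.3))
```

Full-body solvers chain many of these joints together and choose among the mirror-image solutions, which is why a good IK model makes tracked motion look natural rather than robotic.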
The generative AI explosion accelerated everything. DALL-E, Midjourney, and crucially Stable Diffusion’s open-source release brought text-to-image generation to everyone. In-painting, out-painting, multi-view generation, and storyboard creation emerged. Text-to-video followed from Google, Meta, and open alternatives. Motion generation, background replacement, 3D model creation from 2D images, and entire world building from text prompts became real products. Deepfakes showed both the creative potential (live translation, therapeutic goodbye conversations) and the risks (fake LinkedIn profiles, political manipulation).
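To give a sense of how low the barrier became once Stable Diffusion was released as open source, here is a minimal text-to-image sketch using Hugging Face’s diffusers library; the model ID, prompt, and output path are just example choices:

```python
import torch
from diffusers import StableDiffusionPipeline

# Download the weights once, then run the denoising loop on the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Text in, image out: the prompt is effectively the whole program.
image = pipe("a presenter on a virtual stage, cinematic lighting").images[0]
image.save("virtual_presenter.png")
```

In-painting, out-painting, and image-to-image variants are driven through sibling pipelines in the same library, which is a large part of why the surrounding tooling grew so quickly.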
The tooling ecosystem around prompt engineering – search engines for prompts, marketplaces, auto-completion, negative prompts, reverse engineering – mirrors what we saw with configuration management in DevOps. Programming is becoming a conversation with your computer, expressing intent rather than writing explicit instructions.
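Negative prompts are a concrete example of that intent-driven style: you state what you do not want alongside what you do, instead of adjusting anything inside the model. A short sketch, again with diffusers, where the prompt text and parameter values are purely illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Same kind of pipeline as in the previous sketch.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# State the intent in both directions: desired traits and undesired ones.
image = pipe(
    prompt="portrait of a virtual presenter, studio lighting, photorealistic",
    negative_prompt="blurry, extra fingers, watermark, low resolution",
    guidance_scale=7.5,       # how strongly to follow the stated intent
    num_inference_steps=30,   # fewer steps trades quality for speed
).images[0]
image.save("presenter_v2.png")
```

The prompt search engines, marketplaces, and reverse-engineering tools all operate on strings like these, much as DevOps tooling grew up around configuration files.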
Watch on YouTube – available on the jedi4ever channel
This summary was generated using AI based on the auto-generated transcript.