No Priors: Artificial Intelligence | Technology | Startups

The Timeline for Realistic 4-D: Devi Parikh from Meta on Research Hurdles for Generative AI in Video and Multimodality

Thu Jul 20 2023

Generative AI | Video Generation | Audio Generation | AI's Impact on Creativity | Time Management

Description

The episode covers Devi Parikh's background in generative AI, the Make-A-Video project for video generation, the challenges and potential of video generation, controllability and multi-modality in video generation, advancements in audio generation, AI's impact on creativity, future applications, the importance of time management, and pursuing personal interests.

Insights

Advancements in Video Generation

Video generation has been slower compared to language models or image models due to infrastructure challenges, finding the right representations for videos, hierarchical architecture, and data limitations. A curriculum approach may be valuable for training video generation models. Advances in video understanding may have a significant impact on robotics and embodied agents.

Controllability and Multi-Modality in Video Generation

Controllability is important for generative models to be tools for creative expression. Multi-modal inputs can provide more control in video generation. Predictability and iterative editing mechanisms are desired for better control. Control capabilities will likely progress after core capabilities in text-to-video generation are achieved.

Advancements in Audio Generation and AI's Impact on Creativity

Sound effects and music can make audio content more expressive and delightful, but they are under-invested. The first application areas for AI-generated media could be in improving communication and AI agents. AI can benefit both established artists and individuals without artistic training by enabling more expressive creation.

AI Models as Tools for Creativity and Future Applications

Using multiple modalities as input allows for more control and creativity in AI models. Some artists view AI models as tools, while others see them as collaborators in the creative process. Areas of image, audio, and video generation that are underexplored include control and multi-modality.

Chapters

Devi Parikh's Background and Research Focus
Make-A-Video Project and Advancements in Video Generation
Challenges and Potential of Video Generation
Controllability and Multi-Modality in Video Generation
Advancements in Audio Generation and AI's Impact on Creativity
AI Models as Tools for Creativity and Future Applications
Importance of Time Management and Pursuing Personal Interests

Transcript

Devi Parikh's Background and Research Focus

00:06 - 07:05

Devi Parikh is a research director in generative AI at Meta and an associate professor at Georgia Tech.
She got started in computer vision during her undergrad at Rowan University.
Devi's interest in computer vision grew from wanting to see the outputs of algorithms and understand their workings.
Her research path has been driven by her desire to enable meaningful interactions between humans and machines.
Devi has transitioned from non-visual modalities to visual modalities, and then into natural language processing as a way of interaction.
She is now focused on AI for creativity, including generative modeling using transformer-based approaches and diffusion models.
Devi spent a year at Facebook AI Research (now Meta) during her transition from Virginia Tech to Georgia Tech, which turned into a long-term collaboration.
She now works in Meta's new generative AI group, focusing on large language models, image generation, video generation, and generating 3D content.

Make-A-Video Project and Advancements in Video Generation

06:40 - 13:44

New organization created to explore large language models, image generation, video generation, and more
The goal is to enable users to create content in addition to consuming it
The Make-A-Video project allows users to generate videos from text prompts
The approach leverages progress in image generation models and separates appearance and language from motion
Advantages include less learning for the model, diversity of visual concepts through images, and no need for video-text paired data
Make-A-Video starts with independent images and learns to make them temporally coherent as a video
Interpreting motion can be based on what has been seen in images and videos before
Future aspects of video generation include longer and more complex videos with memory capabilities
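
The idea of starting from independent images and then making them temporally coherent can be illustrated with a toy sketch. This is not Make-A-Video's method (which learns motion from video data); it is only an assumed stand-in in which per-frame "generation" is a fixed image plus independent noise and "coherence" is a simple moving average along the time axis. All function names and values here are hypothetical.

```python
import random

def independent_frames(base, num_frames, noise, seed=0):
    """Stand-in for per-frame image generation: same content, independent noise."""
    rng = random.Random(seed)
    return [[p + rng.uniform(-noise, noise) for p in base] for _ in range(num_frames)]

def temporal_smooth(frames, radius=1):
    """Average each pixel over a window of neighboring frames (time axis only)."""
    smoothed = []
    for t in range(len(frames)):
        lo, hi = max(0, t - radius), min(len(frames), t + radius + 1)
        window = frames[lo:hi]
        smoothed.append(
            [sum(f[i] for f in window) / len(window) for i in range(len(frames[t]))]
        )
    return smoothed

def flicker(frames):
    """Mean absolute change between consecutive frames; lower means more coherent."""
    diffs = [abs(a - b) for f0, f1 in zip(frames, frames[1:]) for a, b in zip(f0, f1)]
    return sum(diffs) / len(diffs)

base_image = [0.2, 0.5, 0.8, 0.4]          # hypothetical 4-pixel "image"
raw = independent_frames(base_image, num_frames=8, noise=0.1)
coherent = temporal_smooth(raw)
print("flicker before:", flicker(raw))
print("flicker after: ", flicker(coherent))
```

The spatial content of each frame is untouched by the time-axis step, which mirrors the separation of appearance from motion described above; a real system would replace the moving average with learned temporal layers.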

Challenges and Potential of Video Generation

13:14 - 19:49

The complexity of videos needs to be improved, with longer and more consistent scenes.
Progress in video generation has been slower compared to language models or image models.
Bottlenecks in video generation include infrastructure challenges, finding the right representations for videos, hierarchical architecture, and data limitations.
A curriculum approach may be valuable for training video generation models.
Video has always been a challenging aspect of technology advancements, including web streaming and computer vision.
Advances in video understanding may have a significant impact on robotics and embodied agents.

Controllability and Multi-Modality in Video Generation

19:23 - 26:11

Video understanding is relevant for embodied agents and robotics.
Robots' visual signals are a consequence of their actions.
Controllability is important for generative models to be tools for creative expression.
Text prompting is one way to control generative models, but it lacks a direct form of control.
Multi-modal inputs can provide more control in video generation.
Predictability and iterative editing mechanisms are desired for better control in video generation.
Control capabilities will likely progress after core capabilities in text-to-video generation are achieved.
Editing existing videos will also see advancements in the near future.
State of the art in text-to-audio systems works reasonably well one in five times.
Audio and music add expressiveness and delight to content but are under-invested.

Advancements in Audio Generation and AI's Impact on Creativity

25:56 - 32:25

Sound effects and music can make audio content more expressive and delightful, but they are under-invested.
There are large sound effect libraries available, but the state of the art hasn't caught up with generating new sound effects.
Audio allows for complex sequences of sounds and superimposition, unlike video.
The first application areas for AI-generated media could be in improving communication and AI agents.
Near-term applications include generating animated GIFs, mid-stream video editing, and marketing.
Unexpected use cases may emerge as the technology develops.
Social media platforms like Instagram and TikTok have shown that people are interested in creating high-quality imagery using generative technologies.
AI can benefit both established artists and individuals without artistic training by enabling more expressive creation.
Some artists already use AI as their primary tool for expression.
The speaker's personal experience as an artist influences their excitement about this technology.

AI Models as Tools for Creativity and Future Applications

31:58 - 38:52

The speaker is enthusiastic about new technology and enjoys trying out new models.
Using multiple modalities as input allows for more control and creativity.
Some artists view AI models as tools, while others see them as collaborators in the creative process.
Areas of image, audio, and video generation that are underexplored include control and multi-modality.
The speaker attended CVPR and found the Scholars & Big Models workshop interesting.
The impact of these technologies on social media will be significant, enhancing creative expression and communication.
AI agents on social media may change how people connect with each other.
Social expression modalities like lenses already have an impact on social media engagement.
Time management is important for productivity in AI research, with a focus on scheduling tasks.

Importance of Time Management and Pursuing Personal Interests

38:23 - 39:42

Time management is important and writing everything down on your calendar helps plan your time effectively.
Don't self-select, go for what you want and let the world decide if you're a good fit.