Kabir's Tech Dives

🎬 One-Minute Video Generation via Test-Time Transformer Training


Researchers introduced Test-Time Training (TTT) layers to enhance the ability of pre-trained Diffusion Transformers to generate longer, more complex videos from text. These novel layers, inspired by meta-learning, allow the model's hidden states to adapt during the video generation process. To validate their approach, they created a dataset of annotated Tom and Jerry cartoons for training and evaluation. Their model, incorporating TTT layers, outperformed existing methods in generating coherent, minute-long videos with multi-scene stories and dynamic motion, as judged by human evaluators. While promising, the generated videos still exhibit some artifacts, and the method's efficiency could be improved. The study demonstrates a step forward in creating longer, story-driven videos from textual descriptions.


Podcast:
https://kabir.buzzsprout.com


YouTube:
https://www.youtube.com/@kabirtechdives

Please subscribe and share.

Kabir's Tech Dives, by Kabir