Imagen Video: High-Definition AI Text-to-Video Generator

What is Imagen Video?

Imagen Video is a text-conditional video generation system from Google, built on the Imagen image generation system and based on a cascade of video diffusion models. Users enter a simple text description and receive a matching short video. Imagen Video also lets users apply various art styles for display, interaction, or other creative uses.

Price: Free
Tag: Text-to-Video
Release Time: 2022
Developers: Google


Features of Imagen Video

  • The output content is highly consistent with the text description
  • Maintains the powerful features of the original Imagen system, such as the ability to spell text accurately
  • Produces high-fidelity video: 1280 x 768 pixel resolution at 24 frames per second
  • A high degree of controllability and world knowledge, including the ability to generate videos and text animations in a variety of art styles and with 3D object understanding

How to use Imagen Video?

At present, Imagen Video is not publicly available. To learn more about it, click “Research Paper” on the homepage of the official Imagen Video website.

Imagen Video Paper

The related research paper is linked from the official Imagen Video website.

Imagen Video Technical Principle

The Imagen Video model consists of a frozen T5 text encoder, a base video diffusion model, and interleaved spatial and temporal super-resolution diffusion models. You can also find more information in the related paper.
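The cascade described above can be sketched as a sequence of shape transformations: a base model produces a short, low-resolution clip, and each super-resolution stage then multiplies either the frame count (temporal) or the frame size (spatial). The sketch below tracks only the video dimensions, not any actual diffusion model. The base clip size, the stage order, and the per-stage factors are assumptions chosen for illustration so that the result matches the 1280 x 768, 24 fps output listed under Features; the names `VideoShape`, `Stage`, and `run_cascade` are hypothetical. See the paper for the exact configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    """Dimensions of a video clip at one point in the cascade."""
    frames: int
    width: int
    height: int
    fps: int


@dataclass(frozen=True)
class Stage:
    """One super-resolution stage: 'TSR' (temporal) or 'SSR' (spatial)."""
    kind: str
    factor: int

    def apply(self, shape: VideoShape) -> VideoShape:
        if self.kind == "TSR":
            # Temporal super-resolution: more frames at a higher frame rate.
            return VideoShape(shape.frames * self.factor, shape.width,
                              shape.height, shape.fps * self.factor)
        if self.kind == "SSR":
            # Spatial super-resolution: larger frames, same frame count.
            return VideoShape(shape.frames, shape.width * self.factor,
                              shape.height * self.factor, shape.fps)
        raise ValueError(f"unknown stage kind: {self.kind!r}")


# Assumed output of the base video diffusion model (illustrative values).
BASE = VideoShape(frames=16, width=40, height=24, fps=3)

# Assumed interleaving of temporal and spatial stages (illustrative order).
CASCADE = [
    Stage("TSR", 2),
    Stage("SSR", 2),
    Stage("SSR", 4),
    Stage("TSR", 2),
    Stage("TSR", 2),
    Stage("SSR", 4),
]


def run_cascade(base: VideoShape, stages: list[Stage]) -> list[VideoShape]:
    """Return the clip shape after the base model and after each stage."""
    shapes = [base]
    for stage in stages:
        shapes.append(stage.apply(shapes[-1]))
    return shapes


if __name__ == "__main__":
    for shape in run_cascade(BASE, CASCADE):
        print(shape)
```

Under these assumptions the final shape is 128 frames at 1280 x 768 and 24 fps. In the real system each stage is itself a diffusion model conditioned on the text embedding from the frozen T5 encoder, which this sketch leaves out.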

Can we use Imagen Video now?

Not currently. The Google development team said: “While our internal tests have shown that most explicit and violent content can be filtered out, there are still social biases and stereotypes that are difficult to detect and filter. We decided not to release the Imagen Video model or its source code until these issues are mitigated.”

Are there any downsides to Imagen Video?

Video length is limited: the maximum output is currently about 5 seconds. Google has pointed to a possible solution, however. Phenaki, another Google project, can generate videos about two minutes long, and Google plans to combine the image quality of Imagen Video with the coherence and video length of Phenaki as a next step.
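The "about 5 seconds" figure follows from simple arithmetic: duration equals frame count divided by frame rate. The paper reports 128 output frames, and at the 24 frames per second listed under Features that comes to roughly 5.3 seconds. The minimal sketch below shows the calculation. The function names are hypothetical, and applying 24 fps to a two-minute clip is only an illustration of scale, since the frame rate of Phenaki is not stated here.

```python
import math


def clip_duration_seconds(frame_count: int, fps: float) -> float:
    """Duration of a clip given its frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


def frames_needed(duration_seconds: float, fps: float) -> int:
    """Number of frames required to fill a duration at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return math.ceil(duration_seconds * fps)


if __name__ == "__main__":
    # 128 frames at 24 fps, as reported for Imagen Video.
    print(round(clip_duration_seconds(128, 24), 2))  # 5.33

    # Hypothetical: a two-minute clip at the same 24 fps.
    print(frames_needed(120, 24))  # 2880
```

At the same frame rate, a two-minute video would need more than twenty times as many frames as a single Imagen Video clip, which gives a sense of why long-form generation is treated as a separate problem.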
