What is Gemini Omni Flash?
Gemini Omni Flash is Google DeepMind's first model in the Omni family, announced on May 19, 2026, at Google I/O. It generates short video clips with synchronized audio from any combination of inputs: text descriptions, still images, audio files, or existing video clips. Unlike earlier text-to-video tools, Omni Flash processes all these input types in a single forward pass through its transformer architecture, then lets you refine the output through conversation.
The model is available through the Gemini app, YouTube Shorts, YouTube Create, and Google Flow. Google positions it as the fastest path from concept to posted video content, particularly for creators who already live inside the Google ecosystem. A developer API has been announced but isn't publicly available yet.
What makes Omni Flash different from Google's own Veo 3.1 or competitors like Sora 2 is the editing loop. You don't regenerate from scratch each time. You tell it "change the lighting" or "add a dog in the background," and it modifies the existing clip while preserving everything else. Because each fix is an edit rather than a fresh generation, you spend fewer attempts, and fewer credits, getting to a final clip.
Key Features and Capabilities
Multimodal Input Processing
Most video generators accept a text prompt, sometimes with a reference image. Omni Flash takes text, images, audio, and video simultaneously as a unified scene description. You can feed it a photo of a product, a voiceover track, and a text instruction like "animate this product spinning on a white table with the voiceover playing," and it produces a coherent clip combining all inputs.
This isn't stitching separate outputs together. The model reasons across modalities in one pass, which means the audio timing matches the visual motion, and image elements maintain their identity throughout the clip.
Conversational Video Editing
This is the headline feature. After generating a clip, you can modify it through follow-up messages:
- "Make the background a sunset beach"
- "Slow down the camera pan"
- "Change the art style to watercolor"
- "Add a second character on the right"

Each instruction builds on the previous state. The model preserves what you haven't asked to change, so you're not rolling the dice on a completely new generation each time. For anyone who's burned through credits regenerating entire clips to fix one detail, this is the practical improvement that matters.
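One way to picture this is as a scene description that each instruction patches rather than replaces. The sketch below is a toy model only: `ClipState`, its fields, and `apply_edit` are hypothetical names invented for illustration, and they say nothing about how Omni Flash represents a clip internally or what its API will look like.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClipState:
    """Hypothetical scene description; not an Omni Flash data structure."""
    background: str
    camera: str
    art_style: str
    characters: tuple[str, ...]


def apply_edit(state: ClipState, **changes) -> ClipState:
    """Return a new state with only the named fields changed."""
    return replace(state, **changes)


clip = ClipState(
    background="city street",
    camera="fast pan",
    art_style="photorealistic",
    characters=("runner",),
)

# Each follow-up touches one attribute and inherits everything else.
clip = apply_edit(clip, background="sunset beach")
clip = apply_edit(clip, camera="slow pan")
clip = apply_edit(clip, art_style="watercolor")
clip = apply_edit(clip, characters=clip.characters + ("second character",))

print(clip)
```

The contrast is with a full regeneration, where every field would be re-sampled and the parts you already liked could change along with the one you wanted to fix.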
Synchronized Audio Generation
Omni Flash generates audio natively alongside the video. It's not a post-processing step bolted on after the visual is done. The audio is synchronized to the visual content during generation, so footsteps match walking, ambient sounds match the environment, and voiceover timing aligns with on-screen action.
Current limitation: the audio output is voice and ambient sound only. Custom music and sound effects aren't supported yet. You also can't edit speech in generated videos. Google deliberately withheld that capability, citing deepfake concerns during election cycles.
Personal Avatar Creation
You can create a persistent digital avatar of yourself. The onboarding process requires you to record yourself speaking a sequence of numbers on camera. This serves as an anti-deepfake verification step, confirming you're creating an avatar of yourself rather than someone else.
Once created, your avatar persists across generations. You can insert yourself into scenes, create explainer videos with your likeness, or produce content where your digital self presents information. The model restricts editing arbitrary voices or likenesses in uploaded content as a safety boundary.
Physics and World Understanding
The model demonstrates improved understanding of real-world physics: gravity, liquid behavior, object permanence, and motion dynamics. When you ask it to show a ball bouncing off a table, the trajectory and speed look physically plausible rather than floaty or disconnected from the environment.
This matters for practical content creation. Product demos, explainer animations, and scene compositions look more grounded because objects interact with their environment in expected ways.
SynthID Watermarking
Every video generated by Omni Flash carries an imperceptible SynthID watermark. The watermark is mandatory and can't be turned off. It is verifiable through the Gemini app, Chrome browser, and Google Search, making it possible to identify AI-generated content after it's been shared or reposted.
How to Use Gemini Omni Flash: Getting Started
Option 1: Gemini App (Easiest)
- Open the Gemini app (requires a Google AI Plus subscription at $7.99/month, or a higher tier)
- Start a new conversation
- Describe the video you want, or upload an image/video as a starting point
- Wait 60-90 seconds for generation
- Review the clip and send follow-up messages to refine
Option 2: YouTube Shorts (Free)
- Open YouTube on mobile
- Tap the "+" button for creation tools
- Look for Gemini Omni in the creation interface
- Type your prompt directly
- Generated clips go straight into Shorts format
This is the zero-cost entry point. You get access to Omni Flash's generation capabilities without any subscription, though the output is formatted specifically for Shorts (vertical, short-form).
Option 3: Google Flow (For Teams)
Google Flow is the workspace-oriented surface. Credit allocations depend on your subscription tier:
| Tier | Monthly Credits | Approximate Standard Clips |
|---|---|---|
| AI Plus ($7.99/month) | 200 | ~50 |
| AI Pro | 1,000 | ~250 |
| AI Ultra | 10,000-25,000 | ~2,500-6,250 |
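The clip estimates in the table all work out to roughly 4 credits per standard clip (200 / 50, 1,000 / 250, 10,000 / 2,500). That rate is inferred from the table rather than taken from a published figure, so the helper below is only a budgeting sketch under that assumption.

```python
# Inferred from the table above (200 credits ≈ 50 clips); not an official rate.
ASSUMED_CREDITS_PER_CLIP = 4

# Monthly credit allocations from the table; Ultra is listed as a range.
MONTHLY_CREDITS = {
    "AI Plus": (200, 200),
    "AI Pro": (1_000, 1_000),
    "AI Ultra": (10_000, 25_000),
}


def estimated_clips(credits: int, credits_per_clip: int = ASSUMED_CREDITS_PER_CLIP) -> int:
    """Whole clips a credit balance covers at the assumed rate."""
    return credits // credits_per_clip


for tier, (low, high) in MONTHLY_CREDITS.items():
    lo_clips, hi_clips = estimated_clips(low), estimated_clips(high)
    label = f"~{lo_clips}" if lo_clips == hi_clips else f"~{lo_clips}-{hi_clips}"
    print(f"{tier}: {label} standard clips per month")
```

The table doesn't say whether follow-up edits are charged at the same rate as a fresh generation, so treat these counts as a ceiling on new clips, not on total iterations.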
Option 4: Third-Party Platforms
Platforms like veol.ai provide access to Gemini Omni Flash with additional features: higher resolution output (up to 4K), flexible credit-based pricing starting at $0.15 per video, and a streamlined interface focused specifically on video generation workflows.
Option 5: Developer API (Coming Soon)
Google has confirmed the API will be available through both the Gemini API and Vertex AI, but it hasn't reached general availability yet. No public model ID, rate limits, or migration path from Veo has been officially documented. If you're building production integrations, continue using Veo 3.1 until the Omni API ships.
Gemini Omni Flash vs Sora 2 vs Veo 3.1 vs Kling
Here's how Omni Flash stacks up against the other major AI video generators available in 2026:

| Feature | Gemini Omni Flash | Sora 2 (OpenAI) | Veo 3.1 (Google) | Kling (Kuaishou) |
|---|---|---|---|---|
| Input types | Text + image + audio + video | Text + image | Text + image | Text + image |
| Max clip length | 10 seconds | 15-25 seconds | 8 seconds | 10 seconds |
| Conversational editing | Yes | No | No | No |
| Native audio | Yes (synced) | Yes | Yes | No |
| Avatar/likeness | Yes | No | No | No |
| Free tier | YouTube Shorts | No | No | Limited |
| Paid access | $7.99/mo (AI Plus) | $20/mo (ChatGPT Plus) | Bundled with Omni | Credit-based |
| API available | Coming soon | Yes | Yes | Yes |
| Best for | Social content, iteration | Narratives, characters | Cinematic shots | Asian market ads |
The honest breakdown:
Sora 2 still wins on character consistency across longer sequences. If you're making a short film where the same character appears in multiple shots, Sora handles that better. It also generates longer clips (up to 25 seconds on the Pro tier).
Veo 3.1 is the choice for deliberate, cinematic work where you want precise camera control. It's slower and more expensive per clip, but the output looks more like something a cinematographer planned.
Kling dominates in Asian markets, particularly for advertising workflows. Its credit-based pricing works well for agencies that need bursts of high-volume generation.
Omni Flash's advantage is iteration speed. The conversational editing means you spend fewer credits reaching your final output. For social media creators who need to produce volume quickly, that workflow difference adds up. The multimodal input also sets it apart: none of the other models compared here lets you feed in audio alongside images and text as a combined prompt.
Real-World Use Cases
YouTube Shorts and TikTok Content
The free YouTube Shorts integration makes Omni Flash the lowest-friction option for short-form creators. You can go from idea to published Short without leaving the YouTube app. The 10-second cap actually fits the Shorts format well.
Product Demos and Marketing
Feed the model a product photo, describe the scene you want, and get a demo clip. The physics understanding means products interact with surfaces and lighting in believable ways. Iterate through conversation until the angle and presentation match your brand guidelines.
Educational Explainers
The avatar feature combined with conversational editing makes explainer content faster to produce. Record your avatar once, then generate yourself presenting different topics without re-recording. Useful for course creators, internal training, and documentation.
Social Media Advertising
Quick iteration on ad creative. Generate a concept, test variations ("try it with a blue background," "make the text larger," "add motion to the logo"), and export the winner. The credit cost per iteration is lower than regenerating from scratch each time.
Storyboarding and Pre-visualization
For film and video production teams, Omni Flash works as a rapid pre-visualization tool. Describe scenes, iterate on composition and timing, and use the outputs to communicate creative direction before committing to expensive live shoots.
Pricing and Availability
Google's Official Tiers
| Access Method | Cost | What You Get |
|---|---|---|
| YouTube Shorts | Free | Video generation in Shorts format |
| Google AI Plus | $7.99/month | Gemini app + Google Flow (200 credits) |
| Google AI Pro | ~$20/month | Higher limits (1,000 credits) |
| Google AI Ultra | ~$50/month | Maximum allocation (10,000-25,000 credits) |
Third-Party Access
If you want more control over output resolution and a pay-per-use model without monthly subscriptions, platforms like veol.ai offer Gemini Omni Flash access with:
- Resolution options from 720p to 4K
- Credit-based pricing starting at $0.15 per standard video
- Free trial credits to test before committing
- Dedicated video generation interface
Developer API Pricing
Not yet published. Google has confirmed availability through Gemini API and Vertex AI but hasn't released pricing tables, rate limits, or quota details. If Veo 3.1 pricing ($0.50 per generation on Vertex AI) is any guide, similar or slightly higher rates for Omni Flash would be a reasonable guess given the additional capabilities, but that is speculation until Google publishes numbers.
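To put the quoted numbers side by side, the sketch below converts each option into an approximate cost per clip. The subscription figure assumes the ~50-clip monthly estimate given earlier for AI Plus, and the Veo 3.1 figure is only a stand-in for Omni Flash API pricing that doesn't exist yet.

```python
# Figures quoted in this article; the API row is a placeholder, since
# Omni Flash API pricing has not been published.
AI_PLUS_MONTHLY_USD = 7.99
AI_PLUS_CLIPS_PER_MONTH = 50        # approximate, from the Flow credit table
THIRD_PARTY_PER_CLIP_USD = 0.15     # veol.ai starting price
VEO_31_PER_GENERATION_USD = 0.50    # Vertex AI figure cited above


def subscription_cost_per_clip(monthly_usd: float, clips_used: int) -> float:
    """Effective per-clip cost of a flat subscription at a given usage level."""
    if clips_used <= 0:
        raise ValueError("clips_used must be positive")
    return monthly_usd / clips_used


per_clip = {
    "AI Plus (all 50 clips used)": subscription_cost_per_clip(
        AI_PLUS_MONTHLY_USD, AI_PLUS_CLIPS_PER_MONTH
    ),
    "AI Plus (10 clips used)": subscription_cost_per_clip(AI_PLUS_MONTHLY_USD, 10),
    "Third-party pay-per-use": THIRD_PARTY_PER_CLIP_USD,
    "Veo 3.1 rate as API proxy": VEO_31_PER_GENERATION_USD,
}

for option, usd in per_clip.items():
    print(f"{option}: ${usd:.2f} per clip")
```

At full usage the subscription and the pay-per-use starting price land within about a cent of each other per clip; at light usage, pay-per-use comes out cheaper. Resolution and feature differences between the options aren't captured here.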
Frequently Asked Questions
Is Gemini Omni Flash free to use?
Partially. You can use it for free through YouTube Shorts and YouTube Create. For full access through the Gemini app or Google Flow, you need at least a Google AI Plus subscription ($7.99/month). Third-party platforms like veol.ai offer pay-per-use pricing starting at $0.15 per video if you don't want a monthly commitment.
How long are the videos Gemini Omni Flash generates?
Currently capped at 10 seconds per clip. Google has stated this is a policy decision rather than a technical limitation, suggesting longer clips may come in future updates. For now, you can generate multiple 10-second clips and edit them together externally.
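One common way to do that stitching is ffmpeg's concat demuxer, which joins the files named in a text list without re-encoding them. The sketch below only writes the list file and builds the command. It assumes ffmpeg is installed and that the clips share the same codec and resolution; the function name and file names are placeholders.

```python
from pathlib import Path


def build_concat_command(clips: list[str], list_path: str, output: str) -> list[str]:
    """Write an ffmpeg concat list file and return the command that uses it."""
    # The concat demuxer expects one "file '<path>'" line per input.
    # Paths containing single quotes would need escaping, which this sketch skips.
    lines = [f"file '{Path(clip).resolve().as_posix()}'" for clip in clips]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow absolute paths in the list file
        "-i", list_path,
        "-c", "copy",     # stream copy: no re-encode
        output,
    ]
```

For example, `build_concat_command(["clip_01.mp4", "clip_02.mp4"], "clips.txt", "combined.mp4")` returns a list you can pass to `subprocess.run(cmd, check=True)` once the clip files exist. If the clips differ in encoding settings, stream copy can fail or glitch at the joins, and re-encoding is the safer route.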
Can Gemini Omni Flash edit existing videos?
Yes, that's one of its core features. You can upload an existing video clip and modify it through conversation: change the style, add elements, adjust the environment, or transform the visual aesthetic. The model preserves what you don't ask to change.
How does Gemini Omni Flash compare to Sora 2?
Omni Flash is better at multimodal input (combining text, images, audio, and video in one prompt) and iterative editing through conversation. Sora 2 is better at character consistency over longer sequences and generates clips up to 25 seconds. Omni Flash is cheaper to access ($7.99/mo vs $20/mo) and has a free tier through YouTube Shorts.
What are the limitations of Gemini Omni Flash?
The main limitations:
- 10-second clip cap
- No audio or speech editing (withheld for safety)
- Text rendering can be inaccurate
- Complex motion scenes may have consistency issues
- No custom music or sound effects (voice and ambient audio only)
- The developer API isn't available yet
Can I use Gemini Omni Flash for commercial purposes?
Yes, commercial use is permitted within paid subscription tiers, subject to Google's Generative AI Prohibited Use Policy. Content involving specific likenesses, third-party IP, or regulated industries may require additional verification. All outputs carry SynthID watermarks regardless of use case.
What resolution does Gemini Omni Flash output?
Through Google's official channels, the confirmed output is 720p. Third-party platforms like veol.ai support higher resolutions up to 4K through their own processing pipeline.
Is there an API for Gemini Omni Flash?
Not yet. Google announced API availability through Gemini API and Vertex AI but hasn't published documentation, pricing, or model IDs. As of May 2026, the stated timeline is "coming weeks." For production video generation via API, Veo 3.1 remains the current option.
Resources and Further Reading
If you want to start generating videos with Gemini Omni Flash right away, veol.ai offers a streamlined interface with flexible pricing and resolution options up to 4K.