Veo 3.1 by Google DeepMind: cinematic video with sound baked right in.
Veo 3.1 makes clips up to 8 seconds with native, synced audio — dialogue, effects and ambience, no separate sound pass. Start from text or from photos: set a first and last frame, or feed up to three reference images. Output is 720p or 1080p, up to 4K via upscale.
| Parameter | Value |
|---|---|
| Type | Video generation (text-to-video, image-to-video) |
| Animate a photo | Yes |
| Input frames | 1–2 (first/last) |
| References | up to 3 |
| Audio | yes, native |
| Clip length | up to 8 s |
| Resolution | 720p, 1080p; up to 4K via upscale |
| Prompt length | 1000 characters |
| Provider model | Google Veo 3.1 |
| Released | 2025-10-14 |
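
The limits in the table can be checked before a generation is submitted. The following is a minimal sketch in Python, not an actual Mixer AI or Google API: `VeoRequest` and `validate` are hypothetical names, and the constants are simply copied from the table. It also assumes that 4K is produced by a separate upscale step, so only 720p and 1080p are accepted as requested resolutions.

```python
from dataclasses import dataclass, field

# Limits copied from the spec table; all names below are hypothetical.
MAX_PROMPT_CHARS = 1000
MAX_FRAMES = 2          # a first frame and, optionally, a last frame
MAX_REFERENCES = 3
MAX_CLIP_SECONDS = 8
# Assumption: 4K comes from a separate upscale step, so it is not listed here.
NATIVE_RESOLUTIONS = {"720p", "1080p"}


@dataclass
class VeoRequest:
    """Hypothetical request shape used only for this illustration."""
    prompt: str
    frames: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)
    resolution: str = "720p"
    seconds: int = 8


def validate(req: VeoRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if len(req.prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if len(req.frames) > MAX_FRAMES:
        problems.append(f"at most {MAX_FRAMES} input frames (first/last)")
    if len(req.references) > MAX_REFERENCES:
        problems.append(f"at most {MAX_REFERENCES} reference images")
    if req.resolution not in NATIVE_RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(NATIVE_RESOLUTIONS)}")
    if not 0 < req.seconds <= MAX_CLIP_SECONDS:
        problems.append(f"clip length must be 1 to {MAX_CLIP_SECONDS} seconds")
    return problems


if __name__ == "__main__":
    request = VeoRequest(
        prompt="A fox runs through fresh snow at dawn",
        frames=["first.jpg", "last.jpg"],
    )
    print(validate(request))
```

Returning a list of problems rather than raising on the first one lets an interface show every violated limit at once.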
Veo 3.1 is a video model by Google DeepMind (Google Veo 3.1), available on Mixer AI on a pay-as-you-go basis, from 18 coins.
Pricing is pay-as-you-go with no plans, starting from 18 coins. The exact price is shown before you run a generation.
Photos can be animated: upload a photo as a frame or reference and the model turns it into video. Text-to-video also works.
No subscription is required. Mixer AI is pay-as-you-go: you top up a balance in coins and spend it only on the generations you want. It is available on the site and in the Telegram bot @addbeer_bot.