Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack
Foley-Omni supports text-to-audio (TTA), text-to-music (TTM), text-to-speech (TTS), video-to-audio (V2A), and visual text-to-speech (VisualTTS). It can also generate a complete video soundtrack in a single pass, jointly producing video-synchronized sound effects, speech, and music.
Featured Demo
Complete Video Soundtrack
All videos below are re-dubbed by Foley-Omni.
Foley-Omni on VEO3 Videos
V2ST-Bench
V2A
VisualTTS
GRID
Text-conditioned Generation
Clip-clops gallop as the wind blows and thunder cracks
A dog snoring loudly
Among other things on which she cast her eyes was a small crucifix of solid silver, standing on a cabinet near the window.
A classical sounding music piece which sounds like a music box being played through a tiny, distorted speaker of an ice cream truck. Low fidelity.