The State of the Race: HappyHorse Debuts, Gemini Omni Drops, and Seedance Hits $138M/Month

The video generation leaderboard has reshuffled twice in the past six weeks. First, HappyHorse 1.0, which Alibaba had quietly seeded in blind quality tests, went viral after beating American rivals, and Alibaba then confirmed it was behind the model. Then, at the end of May, xAI's Grok Imagine 1.5 Preview briefly claimed the top spot on Artificial Analysis's Image-to-Video Arena, edging Seedance 2.0 by six Elo points. Meanwhile, Google announced Gemini Omni Flash at I/O 2026 on May 19, and ByteDance's Volcano Engine raised its 2026 MaaS revenue target by 50%, driven almost entirely by Seedance.

Here's what happened, what each move means, and where things stand.

Seedance 2.0: the revenue story changes the model story

ByteDance's numbers are striking regardless of how you read them. Volcano Engine — ByteDance's cloud and AI services unit — raised its 2026 MaaS revenue target to 15 billion yuan (~$2.07B), up from an original 10B yuan (~$1.4B) target set just months earlier 1. The driver is Seedance 2.0, which crossed 1 billion yuan (~$138M) in monthly revenue with daily token consumption still climbing at roughly 40% month-over-month 2.
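The figures above hang together arithmetically. Below is a minimal back-of-envelope check, assuming Seedance revenue stays flat at 1 billion yuan per month and using the exchange rate implied by the article's own conversion (1B yuan ≈ $138M). Both are simplifying assumptions, not reported figures.

```python
# Back-of-envelope check of the Volcano Engine / Seedance figures quoted above.
# Assumption: the yuan-to-dollar rate is the one implied by "1B yuan ~ $138M".
USD_PER_YUAN = 138e6 / 1e9  # ~0.138

original_target_yuan = 10e9
revised_target_yuan = 15e9
seedance_monthly_yuan = 1e9


def pct_increase(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Naive annualized run-rate: assumes revenue stays flat every month,
# which ignores the reported month-over-month growth in usage.
seedance_annualized_yuan = seedance_monthly_yuan * 12

if __name__ == "__main__":
    print(f"Target raised by {pct_increase(original_target_yuan, revised_target_yuan):.0f}%")
    print(f"Revised target ~ ${revised_target_yuan * USD_PER_YUAN / 1e9:.2f}B")
    share = seedance_annualized_yuan / revised_target_yuan
    print(f"Flat Seedance run-rate: {seedance_annualized_yuan / 1e9:.0f}B yuan/year "
          f"({share:.0%} of the revised target)")
```

Under the flat-revenue assumption, Seedance alone would cover about 80% of the revised 15B yuan target, so reaching it implies either continued growth or contributions from other MaaS products.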

That trajectory matters because it's structural, not a launch spike. ByteDance embedded Seedance inside CapCut and Douyin early, and the model now accounts for an estimated 80% of China's AI video compute and has reached approximately 95% penetration in the country's short-drama production vertical 1. One analyst summary compared the cost economics directly: AI drama series that previously cost up to $280,000 for 80 episodes now run around $7,000 with Seedance 2.0 3.
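The cost comparison is easier to read per episode. This is a minimal sketch, assuming both quoted totals cover the same 80-episode series; the summary gives only the two totals, and $280,000 is an upper bound ("up to").

```python
# Per-episode view of the short-drama cost comparison quoted above.
# Assumption: both totals refer to a full 80-episode series.
EPISODES = 80
traditional_total_usd = 280_000  # upper bound from the analyst summary
seedance_total_usd = 7_000


def per_episode(total: float, episodes: int) -> float:
    """Average cost per episode."""
    return total / episodes


traditional_per_ep = per_episode(traditional_total_usd, EPISODES)
seedance_per_ep = per_episode(seedance_total_usd, EPISODES)
cost_ratio = traditional_total_usd / seedance_total_usd
reduction = 1 - seedance_total_usd / traditional_total_usd

if __name__ == "__main__":
    print(f"Traditional: ${traditional_per_ep:,.2f} per episode")
    print(f"Seedance 2.0: ${seedance_per_ep:,.2f} per episode")
    print(f"{cost_ratio:.0f}x cheaper ({reduction:.1%} reduction)")
```

At the top of the quoted range, that is roughly $3,500 versus $87.50 per episode, a 40x difference.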

The flip side is content moderation. Independent creators report frequent false-positive blocks on original AI characters that Seedance itself generated, with the moderation system citing community-guidelines violations. A detailed thread from a professional creator logged AI-moderation halts in the middle of a series, with no workaround 4. The model is functionally infrastructure for one production category; it is far less dependable for independent long-form work.

On the organizational side, ByteDance moved Seed Robotics into the same reporting line as Zhou Chang, who oversees Seedream, Seedance, multimodal interaction, and world models 5. That's worth watching — ByteDance appears to be treating robotics as a world-model data problem, not a hardware problem.

HappyHorse 1.0: Alibaba's stealth launch

Alibaba rolled out HappyHorse 1.0 in limited beta in late April/early May 6. The model went viral when it topped Artificial Analysis's Video Arena in blind quality tests before Alibaba publicly confirmed authorship — a deliberate reveal strategy that generated organic attention before a branded launch 7.

The model targets cinematic-style video with strong semantic understanding and instruction following, audiovisual synchronization, and multi-shot sequencing. Access is available globally through the HappyHorse website, via Alibaba Cloud Model Studio API, and directly within the Qwen App 6.

On the leaderboard snapshot from late May, HappyHorse sat at Elo 1,443 — third globally behind Grok Imagine 1.5 (1,473) and Seedance 2.0 (1,467), ahead of all Veo 3.x variants 7.
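Elo gaps at the top of the board are small. The sketch below converts them into head-to-head preference rates, assuming the arena's ratings follow the standard Elo logistic scale (base 10, 400-point divisor). That scale is an assumption here, since this piece does not describe Artificial Analysis's exact rating method.

```python
# Expected head-to-head preference implied by an Elo gap.
# Assumption: standard Elo scale (base 10, 400-point divisor).
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B in a blind vote."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Late-May snapshot values quoted in the text.
snapshot = {"Grok Imagine 1.5": 1473, "Seedance 2.0": 1467, "HappyHorse 1.0": 1443}

if __name__ == "__main__":
    leader = "Grok Imagine 1.5"
    for rival in ("Seedance 2.0", "HappyHorse 1.0"):
        p = expected_score(snapshot[leader], snapshot[rival])
        print(f"{leader} vs {rival}: {p:.1%} expected preference")
```

Under that assumption, a six-point lead is roughly a 51/49 split in blind votes, and even the 30-point gap to HappyHorse is only about 54/46. Rankings this close can flip as new votes arrive.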

How does it stack up against Kling in practice? A structured comparison shows a split based on output type:

Dimension	HappyHorse 1.0	Kling 2.x (current)
Max clip length	~15 sec	~30 sec, with stitching
Native audio	Music / SFX; voice via plugin	Native audio + dialogue sync
Camera control	Style presets, pose control	Dynamic brush, camera moves, keyframes
Best fit	Short social, stylized, anime	Cinematic, branded video, narrative
Max resolution	1080p	1080p

Both export at up to 1080p, with commercial licensing included in their subscription plans 8.

Alibaba also launched Wan2.7-Video alongside HappyHorse. It is a separate model that gives creators natural-language camera direction, character modification, and complex cinematography moves, which positions it closer to a director's tool than a generator 6. A third product arrived in the same announcement: HappyOyster, a real-time world model that lets users build and explore environments interactively. That is a different category from video generation, closer to real-time simulation.

Google: two video products, one I/O

Google announced Gemini Omni Flash at I/O 2026 on May 19 9. The distinction from Veo 3.x is architectural: Veo is a standalone video generation model; Gemini Omni is a unified multimodal model where text, image, audio, and video are native in the same system. Omni combines Gemini's reasoning with generation — "scenes respect physics, real-world knowledge, and references," as one summary described it 10.

Specific capabilities from the official announcement 9:

Conversational video editing — natural language instructions build iteratively; characters, physics, and scene state persist across turns
Multimodal input — any combination of image, text, video, or audio as reference
Physics grounding — improved simulation of gravity, kinetic energy, fluid dynamics
Avatars — users can create a digital likeness, with AI-generated lip sync to their voice
SynthID watermarking — all outputs carry an imperceptible watermark, verifiable through Gemini app, Chrome, and Search

Gemini Omni Flash launched to all Google AI Plus, Pro, and Ultra subscribers on May 29, and rolled out simultaneously to YouTube Shorts and YouTube Create at no cost 9. Developer API access was promised in the following weeks.

Veo 3.1 remains the separate dedicated video model available via the Gemini API and Vertex AI. In the Artificial Analysis leaderboard snapshot from late May, Veo 3.1 variants held positions 5 through 7 globally, behind Grok Imagine 1.5, Seedance 2.0, and HappyHorse 1.0 7.

Image: Gemini Omni and Gemini 3.5 were announced at Google I/O 2026 on May 19 9

Grok Imagine 1.5: the new #1 on Image-to-Video

xAI's Grok Imagine 1.5 Preview claimed the top position on Artificial Analysis's Image-to-Video Arena on May 31, with an Elo rating of 1,473, a 52-point jump over version 1.0 7. The differentiator xAI emphasizes: audio and video are generated simultaneously in a single pass, while, by xAI's account, other models generate video first and add audio afterward 11.

One caveat worth noting: this is the Image-to-Video Arena specifically, not the broader Text-to-Video benchmark where Seedance 2.0 has been dominant. Rankings on Artificial Analysis shift frequently as new votes arrive; exact Elo values should be treated as snapshots rather than stable rankings 12.

Kling: the enterprise counter

Kling (Kuaishou) has been building its commercial position more quietly. According to an April commentary, Kuaishou has built a ~$300M ARR enterprise business around the model, with clients in branded video and short-drama production 13. Kling 2.1 Master — the current top tier — received prompt understanding and motion consistency upgrades in July 2025, alongside a native audio generation layer and motion control beta for hand/body movements 14.

Kling has also appeared in productions that reached Western audiences without the tool's origin being obvious: "House of David" on Amazon Prime Video used the model, and Kuaishou's Zeng Yushen was quoted in press coverage noting that AI is enabling creator experimentation as production costs fall 15.

The broader context

Chinese labs claim seven of the top 10 spots on Artificial Analysis's video generation leaderboards. OpenAI is not in the top tier: Sora was reportedly running at $15M/day in inference cost while generating $2.1M in lifetime revenue before being folded into a consumer app 3.

That's partly a distribution story. ByteDance, Alibaba, and Kuaishou each have integrated platforms — CapCut, Douyin, Qwen App, Kuaishou — that generate demand volume and training data simultaneously. Standalone Western video AI startups are competing against companies where the generation model is a feature inside a much larger consumer stack.

The race is no longer primarily about raw quality on benchmarks. It's about where the model lives and what workflow it enables. HappyHorse sits inside the Qwen ecosystem. Seedance is part of CapCut's creator tools. Gemini Omni Flash launched directly into YouTube Shorts. For Grok Imagine 1.5, the key question is whether xAI can replicate that kind of distribution leverage.

Video Gen Model Tracker covers significant releases, benchmark shifts, and key announcements across Seedance, Kling, Veo, HappyHorse, and notable competitors. Sources are linked directly in the text.