Not impressive compared to the open-source video models out there, I anticipated s...

bbor · on Dec 9, 2024

I... can you explain, or point to some competitors...? To me this looks leagues ahead of everything else. But maybe I'm behind the game?

AFAIK based on HuggingFace trending[1], the competitors are:

- bytedance/animatediff-lightning: https://arxiv.org/pdf/2403.12706 (2.7M downloads in the past 30d, released in March)

- genmo/mochi-1-preview: https://github-production-user-asset-6210df.s3.amazonaws.com... (21k downloads, released in October)

- thudm/cogvideox-5b: https://huggingface.co/THUDM/CogVideoX-5b (128k downloads, released in August)

Is there a better place to go? I'm very much not plugged into this part of LLMs, partially because it's just so damn spooky...

EDIT: I now see the reply above referencing Hunyuan, which I didn't even know was its own model. Fair enough! I guess, like always, we'll just need to wait for release so people can run their own human-preference tests to definitively say which is better. Hunyuan does indeed seem good.

Geee · on Dec 9, 2024

What's the best open source video model right now?

minimaxir · on Dec 9, 2024

Hunyuan (https://replicate.com/tencent/hunyuan-video , $0.70/video) is the best but somewhat expensive. LTX (https://replicate.com/fofr/ltx-video , $0.10) is cheaper/faster but less capable.

Both are permissively licensed.

treesciencebot · on Dec 9, 2024

Hunyuan at other providers like fal.ai is cheaper than Sora for the same resolution (at 720p and 5 seconds, $20 gets you ~15 videos on Sora vs. almost 50 videos at fal). It is slower than Sora (~3 minutes for a 720p video) but faster than Replicate's Hunyuan (by 6-7x for the same settings).

https://fal.ai/models/fal-ai/hunyuan-video
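
For anyone who wants the per-video numbers, here is a rough sketch of the arithmetic behind that comparison. It only uses the approximate figures quoted above ("~15 videos" and "almost 50 videos" for $20 at 720p, 5 seconds); the labels and the helper name `cost_per_video` are just for illustration, and actual pricing may differ or have changed.

```python
# Per-video cost from the approximate figures quoted in this comment.
# Both counts are rough ("~15" and "almost 50"), so treat results as estimates.
BUDGET_USD = 20.0

videos_per_budget = {
    "Sora (720p, 5 s)": 15,            # "~15 videos for $20"
    "Hunyuan on fal (720p, 5 s)": 50,  # "almost 50 videos", so slightly optimistic
}


def cost_per_video(budget_usd: float, n_videos: int) -> float:
    """Average cost of one video when budget_usd buys n_videos videos."""
    return budget_usd / n_videos


for name, n in videos_per_budget.items():
    print(f"{name}: ${cost_per_video(BUDGET_USD, n):.2f} per video")
```

That works out to roughly $1.33 per video on Sora vs. about $0.40 on fal, i.e. a bit over 3x cheaper under these assumptions.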

cooper_ganglia · on Dec 9, 2024

Hunyuan is a recent one that has looked pretty good.

zeknife · on Dec 9, 2024

Like with music generation models, the main thing that might make "open source" models better is most likely that they have no concern about excluding copyrighted material from the training data, so they actually get a good starting point instead of using a dataset consisting of YouTube videos and stock footage.