MultiTalk: The Ultra-Realistic Digital Human for Group Conversations is Here! Essential for Karaoke, Covering Realistic Cartoon Animals, Easily Generate 15-Second Long Videos

🌹 Hello everyone! Welcome to PoLang's homepage. Thank you for your support and encouragement. On the road of AIGC exploration, I will walk with you. If you like it, please star and follow PoLang, or scan the code at the end of the article to join the discussion group!

1: Introduction to MultiTalk

In previous articles, we introduced the latest multi-person dialogue digital human framework, MultiTalk, which focuses on audio-driven multi-person dialogue, singing, interactive control, and cartoon-style digital human video generation, making digital human video creation more efficient and precise. We also introduced the accelerated version implemented by KJ in the ComfyUI-WanVideoWrapper plugin, but at that time it only supported single-person digital human video generation. Because of KJ's temporary hiatus, the multi-person workflow was not completed then. With his recent return, the multi-person digital human version has also arrived, so today's article focuses on MultiTalk multi-person digital human video generation. For more details, please refer to the previous articles:

  • MultiTalk: KJ’s super 6x acceleration, ultra-realistic top-tier karaoke digital human! Supports realistic cartoon and animal generalization for 15-second long videos
  • MultiTalk: Quick look! Ultra-realistic multi-person dialogue singing, easily generate digital human videos with animal and cartoon generalization, with stunning effects for up to 15 seconds
  • OmniAvatar: A new experience of Alibaba’s digital human! Rich expressions, gesture actions, and natural, smooth full-body movements, lifelike ultra-realistic video quality
  • github: https://github.com/MeiGen-AI/MultiTalk
  • Project homepage: https://meigen-ai.github.io/multi-talk/

2: Model and Environment Installation

This article uses the ComfyUI-WanVideoWrapper plugin for the experience. Models and workflows can be downloaded from the links at the end of the article!

  • ComfyUI-WanVideoWrapper: https://github.com/kijai/ComfyUI-WanVideoWrapper
  • Since MultiTalk is developed independently on the multitalk branch and has not yet been merged into the main branch, you need to switch branches locally: git switch multitalk
  • WanVideo_2_1_Multitalk_14B_fp8_e4m3fn: Download the model and place it in the ComfyUI/models/unet directory.
  • Wan21_T2V_14B_lightx2v_cfg_step_distill_lora: Download the model and place it in the ComfyUI/models/loras directory.
  • Wan21_Uni3C_controlnet_fp16: Download the model and place it in the ComfyUI/models/controlnet directory.
  • TencentGameMate: Additionally, the first run will automatically download the TencentGameMate model and place it in the ComfyUI/models/transformers/TencentGameMate/chinese-wav2vec2-base directory.
  • Moreover, the multi-person version uses the same model as the previous single-person version, but don't forget to update the ComfyUI-KJNodes plugin.
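The placement rules above can be checked with a small script before launching ComfyUI. This is only a sketch: the function name find_missing is hypothetical, the ComfyUI root path is an assumption, and since the article does not state the file extensions, the check matches any file whose name starts with the model name.

```python
from pathlib import Path

# Model name -> expected sub-directory, taken from the list above.
EXPECTED = {
    "models/unet": "WanVideo_2_1_Multitalk_14B_fp8_e4m3fn",
    "models/loras": "Wan21_T2V_14B_lightx2v_cfg_step_distill_lora",
    "models/controlnet": "Wan21_Uni3C_controlnet_fp16",
}


def find_missing(comfy_root):
    """Return the 'subdir/model' entries that are not present under comfy_root."""
    root = Path(comfy_root)
    missing = []
    for subdir, stem in EXPECTED.items():
        folder = root / subdir
        # Match any extension, because the file suffix is not given in the article.
        if not folder.is_dir() or not any(folder.glob(stem + "*")):
            missing.append(f"{subdir}/{stem}")
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; adjust it to your setup.
    for item in find_missing("ComfyUI"):
        print("missing:", item)
```

The wav2vec2 model is left out of the check on purpose, because it is downloaded automatically on the first run.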


3: Model Evaluation and Experience

The MultiTalk multi-person digital human video workflow is as follows. For best practices with MultiTalk, please refer to the summary section at the end of the article.

Core Nodes: This time, the main additions are the MultiTalk Wav2Vec Embeds node, which takes the multi-person audio parameters, and the multi-person facial recognition masks.

Among the parameters, multi_audio_type includes the para and add options. Here, para means the speakers are active in parallel, which suits a chorus, while add means each person speaks in turn, which suits alternating singing.
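To make the difference between the two options concrete, here is a minimal sketch of how two audio tracks could be laid out on a timeline in each mode. It is an illustration only, not the plugin's actual code: the function arrange_tracks is hypothetical, the tracks are plain lists of samples, and padding the shorter track with silence in para mode is an assumption.

```python
def arrange_tracks(track_a, track_b, mode):
    """Lay out two speakers' samples on a shared timeline.

    Returns one track per speaker, both of equal length.
    """
    if mode == "para":
        # Parallel: both speakers start at time zero (chorus).
        # Assumption: the shorter track is padded with silence.
        total = max(len(track_a), len(track_b))
        out_a = track_a + [0.0] * (total - len(track_a))
        out_b = track_b + [0.0] * (total - len(track_b))
    elif mode == "add":
        # Sequential: speaker A goes first, then speaker B.
        # Each speaker is silent while the other one is active.
        out_a = track_a + [0.0] * len(track_b)
        out_b = [0.0] * len(track_a) + track_b
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return out_a, out_b


chorus = arrange_tracks([1.0, 2.0], [3.0], "para")
turns = arrange_tracks([1.0, 2.0], [3.0], "add")
```

In para mode the result is as long as the longer track, while in add mode it is as long as both tracks combined, so alternating singing produces a longer video for the same inputs.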

01. IndexTTS Multi-Person Voice

02. Singing Digital Human

Multi-person chorus:

Alternating singing:

4: Recommended Online Experience

1. XianGong Cloud Mirror: We recommend the cloud mirror experience. New registrations receive 8 yuan in free credits. Registration link: https://www.xiangongyun.com/register/UJ6IVE

  • WanXiang, HunYuan, FramePack, and LTXV video comprehensive experience integration mirror: https://www.xiangongyun.com/image/detail/a5702c45-2b5a-4dfa-9f9b-086d885773ec?r=UJ6IVE
  • Kontext Image Editing LORA Alchemy Furnace: https://www.xiangongyun.com/image/detail/efd35158-dba3-47c9-a453-bbd3ee46310a?r=UJ6IVE

2. RunningHUB: We recommend the online RunningHUB platform for experiencing AI applications and workflows (registration gives 1000 points). More exciting workflows can be experienced online at: https://www.runninghub.cn/user-center/1890418187312222210/webapp?inviteCode=kol01-rh059 (Invitation code: kol01-rh059)

  • VACE14B – Ultimate Perfect Motion Pose Transfer: https://www.runninghub.cn/ai-detail/1922722674630606850/?inviteCode=kol01-rh059

Recommended Reading

  • Black Forest Kontext: 22 styles one-click painting artifact! The new king of consistent image editing, LORA thriving ecosystem
  • Black Forest Kontext: You need to master the precise area editing techniques! Just one sentence for image editing, the strongest voice in image editing
  • Black Forest Kontext: Core techniques for multi-image editing, greatly improving accuracy, precisely solving the problem of product character proportion imbalance
  • OmniAvatar: A new experience of Alibaba’s digital human! Rich expressions, gesture actions, and natural, smooth full-body movements, lifelike ultra-realistic video quality
  • Kontext-DEV: The strongest image editing from Black Forest is heavily open-sourced! A new era of consistent editing, GGUF low memory usage. Includes migration | e-commerce | furniture comprehensive collection

5: Article Summary

Usage tips for MultiTalk are summarized as follows:

  • The ultra-realistic multi-person dialogue digital humans are of very good quality, and MultiTalk ranks among the current top open-source digital human video models. KJ's implementation achieves a 5-6x speedup compared to the native LORA implementation. For single-person dialogue, refer to: MultiTalk: KJ's super 6x acceleration, ultra-realistic top-tier karaoke digital human! Supports realistic cartoon and animal generalization for 15-second long videos
  • MultiTalk also supports high-quality video generation at 480 and 720 resolution, and can generate videos up to 15 seconds long. Note, however, that higher resolutions and longer videos take more time and memory.
  • Currently, there are still issues with multiple mouths moving simultaneously; I believe this will be well resolved in KJ’s future updates.
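As a rough feel for why longer and higher-resolution videos cost more, the sketch below computes the number of frames and the relative pixel count per frame. The numbers are assumptions and not stated in this article: 25 frames per second, and 832x480 and 1280x720 as the frame sizes behind "480" and "720". Actual time and memory use depend on the model and settings and do not scale exactly with these figures.

```python
def frame_count(seconds, fps=25):
    """Number of frames for a clip of the given length (fps is an assumption)."""
    return int(seconds * fps)


def pixel_ratio(res_a, res_b):
    """How many times more pixels per frame res_a has than res_b."""
    (width_a, height_a), (width_b, height_b) = res_a, res_b
    return (width_a * height_a) / (width_b * height_b)


frames = frame_count(15)                        # frames in a 15-second clip
ratio = pixel_ratio((1280, 720), (832, 480))    # 720 vs 480, assumed sizes

print(f"{frames} frames, {ratio:.2f}x pixels per frame at 720 vs 480")
```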

Workflow and Model Downloads

  • Ultra-realistic multi-person dialogue digital human – KJ – MultiTalk: https://www.runninghub.cn/ai-detail/1940926946006167554/?inviteCode=kol01-rh059
  • KJ accelerated version – top-tier MultiTalk karaoke digital human: https://www.runninghub.cn/ai-detail/1936228380436238338/?inviteCode=kol01-rh059
  • LIBLIB download: https://www.liblib.art/modelinfo/03891861fc3c413c8324bfc2e9178ccb?from=personal_page&versionUuid=f09f581fdde54d778a656a31d421d666 (Registration: https://www.liblib.art/viphome?referralCode=vspjs7PH)
  • Wan2.1 WanXiang video model download: https://pan.quark.cn/s/2605bbea7d92?pwd=u6ru Extraction code: u6ru

    If interested, join the [AGI Technology Discussion Group]+V
