🌹Hello everyone! Welcome to PoLang’s homepage. Thank you for your support and encouragement. I will walk the road of AIGC exploration with you. If you like this content, please star and follow PoLang, or scan the code at the end of the article to join the discussion group!
1: Introduction to MultiTalk
In previous articles, we introduced MultiTalk, the latest multi-person dialogue digital human framework. It focuses on audio-driven multi-person dialogue, singing, interactive control, and cartoon-style digital human video generation, making digital human video creation more efficient and precise. We also covered the accelerated version that KJ implemented in the ComfyUI-WanVideoWrapper plugin, but at that time it supported only single-person digital human video generation. Because KJ took a temporary break, the multi-person workflow was not implemented then. With his recent return, the multi-person version has arrived, so today’s article focuses on MultiTalk multi-person digital human video generation. For more details, please refer to the previous articles:
- MultiTalk: KJ’s super 6x acceleration, ultra-realistic top-tier karaoke digital human! Supports realistic cartoon and animal generalization for 15-second long videos
- MultiTalk: Quick look! Ultra-realistic multi-person dialogue singing, easily generate digital human videos with animal and cartoon generalization, with stunning effects for up to 15 seconds
- OmniAvatar: A new experience of Alibaba’s digital human! Rich expressions, gesture actions, and natural, smooth full-body movements, lifelike ultra-realistic video quality
- GitHub: https://github.com/MeiGen-AI/MultiTalk
- Project homepage: https://meigen-ai.github.io/multi-talk/
2: Model and Environment Installation
This article uses the ComfyUI-WanVideoWrapper plugin for the walkthrough. Models and workflows can be downloaded from the links at the end of the article!
- ComfyUI-WanVideoWrapper: https://github.com/kijai/ComfyUI-WanVideoWrapper
- Since MultiTalk is developed separately on the multitalk branch and has not yet been merged into the main branch, you need to switch branches locally:
  git switch multitalk
- WanVideo_2_1_Multitalk_14B_fp8_e4m3fn: Download the model and place it in the ComfyUI/models/unet directory.
- Wan21_T2V_14B_lightx2v_cfg_step_distill_lora: Download the model and place it in the ComfyUI/models/loras directory.
- Wan21_Uni3C_controlnet_fp16: Download the model and place it in the ComfyUI/models/controlnet directory.
- TencentGameMate: On the first run, the TencentGameMate model is downloaded automatically and placed in the /ComfyUI/models/transformers/TencentGameMate/chinese-wav2vec2-base directory.
- The multi-person version uses the same model as the previous single-person version, but don’t forget to update the ComfyUI-KJNodes plugin.
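The placement rules above can be checked with a short script before launching ComfyUI. This is only a minimal sketch and not part of the plugin: the function name find_missing_models and the "ComfyUI" root path are hypothetical, and because the article does not give file extensions, files are matched by name prefix only.

```python
from pathlib import Path

# Model name stems and target subdirectories, as listed in this article.
# File extensions are not given, so files are matched by name prefix (an assumption).
EXPECTED_MODELS = {
    "WanVideo_2_1_Multitalk_14B_fp8_e4m3fn": "models/unet",
    "Wan21_T2V_14B_lightx2v_cfg_step_distill_lora": "models/loras",
    "Wan21_Uni3C_controlnet_fp16": "models/controlnet",
}


def find_missing_models(comfy_root):
    """Return (stem, directory) pairs for which no matching file was found."""
    root = Path(comfy_root)
    missing = []
    for stem, subdir in EXPECTED_MODELS.items():
        directory = root / subdir
        # A model counts as present if any file in the directory starts with its stem.
        found = directory.is_dir() and any(
            p.is_file() and p.name.startswith(stem) for p in directory.iterdir()
        )
        if not found:
            missing.append((stem, str(directory)))
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a placeholder path; point it at your own installation.
    for stem, directory in find_missing_models("ComfyUI"):
        print(f"missing: {stem} (expected in {directory})")
```

The wav2vec2 model is left out of the check on purpose, since it is downloaded automatically on the first run.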




3: Model Evaluation and Experience
The MultiTalk multi-person digital human workflow is as follows. For MultiTalk best practices, see the summary section at the end of the article.
Core Nodes: This time, the main additions are the multi-person audio parameters on the MultiTalk Wav2Vec Embeds node and the multi-person face recognition masks.
Among the parameters, multi_audio_type has two options, para and add. With para, all speakers’ audio runs in parallel, which suits a chorus; with add, each person speaks in turn, which suits alternating singing.
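The difference between the two multi_audio_type options can be pictured as two ways of laying out the speakers’ audio on one timeline. The following is an illustrative sketch only, not the node’s actual implementation: the function name arrange_tracks is hypothetical, audio is simplified to plain lists of samples, and padding with silence (0.0) is an assumption.

```python
def arrange_tracks(tracks, mode):
    """Lay out per-speaker audio tracks on a shared timeline.

    tracks: list of lists of float samples, one list per speaker.
    mode:   "para" -> all speakers start at time 0 (overlapping, e.g. a chorus)
            "add"  -> speakers are placed one after another (taking turns)
    Returns a list of equal-length tracks, padded with silence (0.0).
    """
    if mode == "para":
        total = max(len(t) for t in tracks)
        return [t + [0.0] * (total - len(t)) for t in tracks]
    if mode == "add":
        total = sum(len(t) for t in tracks)
        arranged, offset = [], 0
        for t in tracks:
            before = [0.0] * offset                      # silent while earlier speakers talk
            after = [0.0] * (total - offset - len(t))    # silent while later speakers talk
            arranged.append(before + t + after)
            offset += len(t)
        return arranged
    raise ValueError(f"unknown mode: {mode!r}")


speaker_a = [0.5, 0.5, 0.5]
speaker_b = [0.9, 0.9]

print(arrange_tracks([speaker_a, speaker_b], "para"))
# [[0.5, 0.5, 0.5], [0.9, 0.9, 0.0]]
print(arrange_tracks([speaker_a, speaker_b], "add"))
# [[0.5, 0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.0, 0.9, 0.9]]
```

In the para layout both mouths can move at the same moment, while in the add layout only one speaker is active at any point in time.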
01. IndexTTS Multi-Person Voice
02. Singing Digital Human
Multi-person chorus:
Alternating singing:
4: Recommended Online Experience
1. XianGong Cloud Mirror: We recommend the cloud mirror for trying this out; new registrations receive 8 yuan in free credits. Registration link: https://www.xiangongyun.com/register/UJ6IVE
- WanXiang, HunYuan, FramePack, and LTXV video comprehensive experience integration mirror: https://www.xiangongyun.com/image/detail/a5702c45-2b5a-4dfa-9f9b-086d885773ec?r=UJ6IVE
- Kontext Image Editing LORA Alchemy Furnace: https://www.xiangongyun.com/image/detail/efd35158-dba3-47c9-a453-bbd3ee46310a?r=UJ6IVE
2. RunningHUB: We recommend the online RunningHUB platform for trying AI applications and workflows (registration gives 1000 points). More exciting workflows can be tried online at: https://www.runninghub.cn/user-center/1890418187312222210/webapp?inviteCode=kol01-rh059 (Invitation code: kol01-rh059)
- VACE14B – Ultimate Perfect Motion Pose Transfer: https://www.runninghub.cn/ai-detail/1922722674630606850/?inviteCode=kol01-rh059

Recommended Reading
- Black Forest Kontext: 22 styles one-click painting artifact! The new king of consistent image editing, LORA thriving ecosystem
- Black Forest Kontext: You need to master the precise area editing techniques! Just one sentence for image editing, the strongest voice in image editing
- Black Forest Kontext: Core techniques for multi-image editing, greatly improving accuracy, precisely solving the problem of product character proportion imbalance
- OmniAvatar: A new experience of Alibaba’s digital human! Rich expressions, gesture actions, and natural, smooth full-body movements, lifelike ultra-realistic video quality
- Kontext-DEV: The strongest image editing from Black Forest is heavily open-sourced! A new era of consistent editing, GGUF low memory usage. Includes migration | e-commerce | furniture comprehensive collection
5: Article Summary
The key usage tips for MultiTalk are summarized below:
- The ultra-realistic multi-person dialogue digital humans are of very high quality, placing MultiTalk among the top open-source digital human video models today. KJ’s implementation achieves a 5-6x speedup compared to the native LORA implementation. For single-person dialogue, see: MultiTalk: KJ’s super 6x acceleration, ultra-realistic top-tier karaoke digital human! Supports realistic cartoon and animal generalization for 15-second long videos
- MultiTalk also supports high-quality video generation at 480 and 720 resolution, and can generate videos up to 15 seconds long. Note, however, that higher resolutions and longer videos take more time and memory.
- Currently, there are still issues with multiple mouths moving simultaneously; I believe this will be well resolved in KJ’s future updates.
Workflow and Model Downloads
- Ultra-realistic multi-person dialogue digital human – KJ – MultiTalk: https://www.runninghub.cn/ai-detail/1940926946006167554/?inviteCode=kol01-rh059
- KJ accelerated version – top-tier MultiTalk karaoke digital human: https://www.runninghub.cn/ai-detail/1936228380436238338/?inviteCode=kol01-rh059
- LIBLIB download: https://www.liblib.art/modelinfo/03891861fc3c413c8324bfc2e9178ccb?from=personal_page&versionUuid=f09f581fdde54d778a656a31d421d666 (Registration: https://www.liblib.art/viphome?referralCode=vspjs7PH)
- Wan2.1 WanXiang video model download: https://pan.quark.cn/s/2605bbea7d92?pwd=u6ru Extraction code: u6ru
If you are interested, join the [AGI Technology Discussion Group] via WeChat.
