🌹Hello everyone! Welcome to PoLang's homepage. Thank you for your support and encouragement. I will be with you every step of the way as we explore AIGC. If you enjoy this content, please star and follow PoLang, or scan the code at the end of the article to join the discussion group!

1: Introduction to MultiTalk

In yesterday's article, we introduced MultiTalk, a recent audio-driven framework for generating digital avatar videos. It covers multi-person conversations, singing, interactive control, and cartoon-style avatars, with the aim of making avatar video creation more efficient and precise. Its advantages include support for both single-person and multi-person generation, interactive character control, strong generalization across real people, animals, and cartoon subjects, and flexible output at 480 or 720 resolution with a maximum video length of 15 seconds.

However, MultiTalk has high memory requirements: the RunningHUB plugin has not yet been open-sourced, and the model needs a large amount of memory and time to quantize, which makes it hard to run locally on most consumer-grade graphics cards. Now community expert kijai has stepped in, releasing an fp8-quantized model (only 2.7G) and adding support to the WanVideo plugin. The workflow uses the latest lightx2v model with 4-step acceleration, together with the Uni3C ControlNet for camera movement, resulting in better video quality. Today's article therefore focuses on the latest KJ ComfyUI workflow. Note that the KJ version currently supports only single-person conversation or singing videos. For multi-person conversation, refer to the previous article: MultiTalk: Quick Look! Ultra-Realistic Multi-Person Singing Conversations, Easily Generate Digital Avatar Videos with Animal and Cartoon Generalization, Maximum 15-Second Effects are Stunning

GitHub: https://github.com/MeiGen-AI/MultiTalk
Project Homepage: https://meigen-ai.github.io/multi-talk/
Previous article: MultiTalk: Quick Look! Ultra-Realistic Multi-Person Singing Conversations, Easily Generate Digital Avatar Videos with Animal and Cartoon Generalization, Maximum 15-Second Effects are Stunning

2: Model and Environment Installation

This article uses the ComfyUI-WanVideoWrapper plugin. The models and workflow can be downloaded from the links at the end of the article.

ComfyUI-WanVideoWrapper: https://github.com/kijai/ComfyUI-WanVideoWrapper
Since MultiTalk support is developed on the separate multitalk branch and has not yet been merged into the main branch, you need to switch branches locally: git switch multitalk
WanVideo_2_1_Multitalk_14B_fp8_e4m3fn: Download the model and place it in the ComfyUI/models/unet directory.
Wan21_T2V_14B_lightx2v_cfg_step_distill_lora: Download the model and place it in the ComfyUI/models/loras directory.
Wan21_Uni3C_controlnet_fp16: Download the model and place it in the ComfyUI/models/controlnet directory.
TencentGameMate/chinese-wav2vec2-base: The first run automatically downloads this model and places it in the ComfyUI/models/transformers/TencentGameMate/chinese-wav2vec2-base directory.
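
The model placement described above can be sketched as a small Python script. This is only an illustration under stated assumptions: the file names and URLs are taken from the download links at the end of this article, the subfolders follow the list above, and the helper names (build_download_plan, download_missing) and the ComfyUI root path are hypothetical, not part of the plugin.

```python
from pathlib import Path
import urllib.request

# Base URL taken from the download links at the end of this article.
HF_BASE = "https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/"

# File name -> subfolder under ComfyUI/models, following the list above.
MODEL_FILES = {
    "WanVideo_2_1_Multitalk_14B_fp8_e4m3fn.safetensors": "unet",
    "Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32.safetensors": "loras",
    "Wan21_Uni3C_controlnet_fp16.safetensors": "controlnet",
}


def build_download_plan(comfy_root):
    """Return (url, destination_path) pairs for every model file."""
    models_dir = Path(comfy_root) / "models"
    return [
        (HF_BASE + name, models_dir / subfolder / name)
        for name, subfolder in MODEL_FILES.items()
    ]


def download_missing(comfy_root):
    """Download each file that is not already present on disk."""
    for url, dest in build_download_plan(comfy_root):
        if dest.exists():
            continue  # skip files that were downloaded earlier
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
```

Calling build_download_plan("ComfyUI") only prints nothing and downloads nothing; it just returns the planned paths, so you can check them before running download_missing("ComfyUI"). The wav2vec2 model is not included because, as noted above, the plugin fetches it on first run.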

MultiTalk: KJ Achieves Over 6x Acceleration with Ultra-Realistic Digital Singing Avatars! Supports Realistic Cartoon and Animal Generalization for 15-Second Long Videos

3: Model Evaluation and Experience

The MultiTalk workflow is shown below (workflow and model downloads are at the end of the article):

Core Node: The MultiTalk Wav2Vec Embeds node handles voice encoding, and the sampler is configured for lightx2v 5-step acceleration; note that cfg=1. Compared with yesterday's plugin, KJ's workflow is extremely fast: sampling takes only 295 seconds (about 5 minutes). The current node supports only single-person digital avatars; multi-person avatars will have to wait for KJ's subsequent updates. If you want better camera movement, you can feed in an input video to guide the Uni3C control. You can also combine an I2V LoRA with motion digital avatars.
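
To see why cfg=1 matters for speed, here is a minimal sketch of the standard classifier-free guidance formula. It is a general illustration, not the WanVideoWrapper code: the function names are hypothetical, and the assumption that a sampler skips the unconditional pass when cfg is 1 depends on the implementation.

```python
def cfg_combine(cond, uncond, cfg_scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


def model_calls(steps, cfg_scale):
    """Forward passes per sample, assuming the unconditional pass
    is skipped whenever cfg_scale is exactly 1."""
    passes_per_step = 1 if cfg_scale == 1 else 2
    return steps * passes_per_step


# With cfg=1 the guided output is just the conditional prediction,
# so the unconditional prediction has no effect on the result.
cond = [1.0, 2.0]
uncond = [0.5, 0.0]
guided = cfg_combine(cond, uncond, 1)
```

Under these assumptions, a 5-step run at cfg=1 needs 5 model calls, while the same 5 steps at any other cfg value would need 10. This is in line with the lightx2v LoRA being named a cfg and step distillation.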

Figure: node sampling time.

01. Realistic Digital Avatar

A woman stands in front of a hotel, picks up a microphone, and sings passionately. The image features a woman dressed in a beautiful, fitted traditional Vietnamese áo dài made of a soft, creamy white fabric. The dress has long sleeves and gracefully flows down to the floor, adorned with vibrant purple flower embellishments at the bottom and on the chest, adding a touch of elegance and charm. She is holding a silver microphone, suggesting she may be about to speak or perform, likely in a social or celebratory setting. The background reveals a luxurious dining area with modern decor, including a marble table neatly set with glasses and bottles, as well as a colorful assortment of fruits that enhance the festive atmosphere. The setting seems to be well lit, with soft, warm lighting that highlights the details of her outfit and the elegance of the surroundings. Overall, the image captures a moment filled with grace and sophistication, indicative of a special occasion possibly related to culture or celebration, where traditional attire merges with a contemporary setting.

02. Cartoon Digital Avatar

A big cat is singing loudly into the camera, moving its body and waving its hands. A cute, furry image, with big and expressive eyes and a rich range of expressions.

03. Dance Digital Avatar

Stationary camera view of a woman passionately singing into a professional microphone in a recording studio. She wears large black headphones and a dark cardigan over a gray top. Her long, wavy brown hair frames her face as she looks slightly upwards, her mouth open mid-song. The studio is equipped with various audio equipment, including a mixing console and a keyboard, with soundproofing panels on the walls. The lighting is warm and focused on her, creating a professional and intimate atmosphere. A close-up shot captures her expressive performance.

04. Street Digital Avatar

A woman was singing passionately on the street. The image showcases a young woman walking through a vibrant nighttime setting, illuminated by soft bokeh lights in the background, suggesting an urban atmosphere. She is wearing a stylish white top that is knotted at the waist, accentuating her midriff. The top has a feminine cut with short sleeves and a slightly loose fit, contributing to a casual yet trendy appearance. She pairs the top with light blue skinny jeans that fit snugly, highlighting her figure. The jeans feature a classic five-pocket design and a black belt embellished with a prominent Gucci logo buckle, which adds a touch of luxury to her outfit. Complementing her look, she carries a small handbag, which hints at a fashion-conscious style. Her long, dark hair flows smoothly down her back, and her posture exudes confidence as she strides forward. The backdrop blurred with distant figures indicates a bustling scene, with trees and city lights enhancing the lively ambiance. The overall vibe is youthful, chic, and modern, perfectly representing contemporary street style.

4: Recommended Online Experience

1. XianGong Cloud Mirror: We recommend the cloud mirror for trying this out; new registrations receive 8 yuan of free credit. Registration link: https://www.xiangongyun.com/register/UJ6IVE

WanXiang Wan & HunYuan & FramePack & LTXV integrated video mirror: https://www.xiangongyun.com/image/detail/a5702c45-2b5a-4dfa-9f9b-086d885773ec?r=UJ6IVE

2. RunningHUB: We recommend the RunningHUB platform for trying AI applications and workflows online (registration gives 1000 points). More workflows are available online: https://www.runninghub.cn/user-center/1890418187312222210/webapp?inviteCode=kol01-rh059 (Invitation code: kol01-rh059)

VACE14B – Ultimate Perfect Motion Pose Transfer: https://www.runninghub.cn/ai-detail/1922722674630606850/?inviteCode=kol01-rh059

5: Article Summary

Keep the following in mind when using MultiTalk:

MultiTalk supports single-person and multi-person digital avatar generation, interactive character control, and strong generalization, especially for cartoon and animal avatars. The current KJ version does not yet support multi-person generation, pending future updates from kijai.
It also supports 480 and 720 resolutions for high-quality video generation, with videos up to 15 seconds long. The KJ version uses lightx2v 5-step acceleration, and the sampling node takes about 5-6 minutes, which is much faster than previous versions.
When using Uni3C for camera movement, the width and height of the input image and the reference video must match, which is why the workflow adds an image resize step. You can also combine an I2V LoRA with motion digital avatars.
For multi-person karaoke digital avatars, refer to the article: MultiTalk: Quick Look! Ultra-Realistic Multi-Person Singing Conversations, Easily Generate Digital Avatar Videos with Animal and Cartoon Generalization, Maximum 15-Second Effects are Stunning
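
The Uni3C resolution requirement in the list above comes down to simple arithmetic. The sketch below shows one possible way to match an input image to the reference video size: scale until the image covers the target, then center-crop. The function name and the example sizes are hypothetical, and the resize node in the actual workflow may use a different strategy such as stretching or padding.

```python
def cover_resize_and_crop(src_w, src_h, dst_w, dst_h):
    """Scale the source so it covers the target, then center-crop.

    Returns (scaled_w, scaled_h, left, top). The crop box starts at
    (left, top) in the scaled image and is dst_w x dst_h in size.
    """
    scale = max(dst_w / src_w, dst_h / src_h)
    # max() guards against rounding leaving the image one pixel too small
    scaled_w = max(dst_w, round(src_w * scale))
    scaled_h = max(dst_h, round(src_h * scale))
    left = (scaled_w - dst_w) // 2
    top = (scaled_h - dst_h) // 2
    return scaled_w, scaled_h, left, top


# Example sizes chosen only for illustration: a square 1024x1024 image
# matched to an 832x480 reference video.
plan = cover_resize_and_crop(1024, 1024, 832, 480)
```

In this example the image is scaled to 832x832 and the crop starts 176 pixels from the top, so the output has exactly the same width and height as the reference video.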

Workflow and Model Downloads

KJ Accelerated Version – MultiTalk Ultra-Realistic Conversation Singing Digital Avatar: https://www.runninghub.cn/ai-detail/1936228380436238338/?inviteCode=kol01-rh059
Full-Power Accelerated Version – MultiTalk – Single-Person Conversation: https://www.runninghub.cn/ai-detail/1934779138240913410/?inviteCode=kol01-rh059
Full-Power Accelerated Version – MultiTalk – Multi-Person Conversation: https://www.runninghub.cn/ai-detail/1934780268601647106/?inviteCode=kol01-rh059
LIBLIB Workflow Download: https://www.liblib.art/modelinfo/03891861fc3c413c8324bfc2e9178ccb?versionUuid=f09f581fdde54d778a656a31d421d666
WanVideo_2_1_Multitalk_14B_fp8_e4m3fn: https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/WanVideo_2_1_Multitalk_14B_fp8_e4m3fn.safetensors
Wan21_T2V_14B_lightx2v_cfg_step_distill_lora: https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32.safetensors
Wan21_Uni3C_controlnet_fp16: https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/Wan21_Uni3C_controlnet_fp16.safetensors
TencentGameMate/chinese-wav2vec2-base: https://huggingface.co/TencentGameMate/chinese-wav2vec2-base
WanXiang Video Model Cloud Disk Download: https://pan.quark.cn/s/d6c08d7a3cab
