Identity leakage occurs when visual features from the driving video spill into the generated image, causing the face to lose resemblance to its intended subject. Shuai set out to understand why this happens. “We conducted two exploration experiments on existing frameworks, aiming to identify the cause behind the issue,” he explains. “Interestingly, we not only found the feature that leads to identity leakage, but also discovered that under specific conditions, taming identity leakage can actually help eliminate rendering artifacts.”

That realization became the foundation for FixTalk: mitigate the negative impact of identity leakage while maximizing its positive role in eliminating rendering artifacts. To achieve this, the team introduced two lightweight modules: an Enhanced Motion Indicator (EMI) and an Enhanced Detail Indicator (EDI). EMI decouples identity information from motion features to prevent identity leakage, while EDI reuses certain leaked identity information to fill in missing visual details, a combination that achieves superior performance compared to state-of-the-art methods.

Talking head generation has wide applications in entertainment, gaming, filmmaking, and digital communication. But as Shuai notes, its success depends on realism: “If the identity leakage and the rendering artifacts exist in our method, the performance is very poor, so people can easily know this is a fake. We need to fix the issues – make it more real!”

It feels like a touch of serendipity that Shuai is sitting here today as a Best Paper candidate, given that FixTalk’s story began this time last year at ECCV 2024 with a question.
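For readers curious what the EMI/EDI idea might look like in practice, here is a minimal, hypothetical sketch. It is not the authors’ code: the class names mirror the paper’s module names, but every internal choice (linear projections, residual subtraction, concatenation-based fusion) is an illustrative assumption about how one could decouple identity from motion features and then reuse the removed component for detail.

```python
# Hypothetical sketch of the FixTalk idea, NOT the authors' implementation.
# EMI strips an (assumed) identity component out of the driving-video motion
# feature; EDI routes that removed component back in to patch appearance details.
import torch
import torch.nn as nn

class EnhancedMotionIndicator(nn.Module):
    """EMI-style module: split a motion feature into an identity-free
    motion part and the identity residual that would otherwise leak."""
    def __init__(self, dim: int):
        super().__init__()
        # Assumed design: a learned head predicts the leaked identity component.
        self.identity_head = nn.Linear(dim, dim)

    def forward(self, motion_feat: torch.Tensor):
        leaked_identity = self.identity_head(motion_feat)
        clean_motion = motion_feat - leaked_identity  # decouple identity from motion
        return clean_motion, leaked_identity

class EnhancedDetailIndicator(nn.Module):
    """EDI-style module: reuse the leaked identity component to fill in
    missing visual details from the source subject."""
    def __init__(self, dim: int):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, appearance_feat: torch.Tensor, leaked_identity: torch.Tensor):
        return self.fuse(torch.cat([appearance_feat, leaked_identity], dim=-1))

# Toy usage: drive generation with identity-free motion, recover details.
dim = 256
emi, edi = EnhancedMotionIndicator(dim), EnhancedDetailIndicator(dim)
motion_feat = torch.randn(1, dim)      # feature from the driving video
appearance_feat = torch.randn(1, dim)  # feature from the source image
clean_motion, leaked_id = emi(motion_feat)
detail_feat = edi(appearance_feat, leaked_id)
```

The key point the sketch tries to capture is the paper’s dual use of the same signal: the identity residual is subtracted where it would corrupt motion, then reinjected where appearance detail is missing.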
RkJQdWJsaXNoZXIy NTc3NzU=