New AI Model Achieves 20% Reduction in "Copy-Paste" Artifacts for Image Generation


A new diffusion-based model named WithAnyone has been introduced, aiming to significantly enhance controllable and identity-consistent image generation while mitigating prevalent "copy-paste" artifacts. Researchers from Fudan University and StepFun detailed their work in a recently published arXiv paper, presenting a novel approach to generating diverse images of individuals without compromising identity fidelity. The paper was highlighted by "AK" on social media with the post "WithAnyone: Towards Controllable and ID Consistent Image Generation."

Existing text-to-image models often struggle with the "copy-paste" phenomenon, where generated images of a specific identity replicate a reference image too closely, limiting variation in pose, expression, or lighting. WithAnyone addresses this by introducing a large-scale paired dataset, MultiID-2M, which offers diverse references for each identity, and a new training paradigm incorporating a contrastive identity loss. Together, these allow the model to strike a better balance between identity fidelity and expressive diversity in generated images.
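The article does not give the exact loss formulation, but a contrastive identity loss of this kind typically pulls the face embedding of the generated image toward embeddings of other reference photos of the same identity while pushing it away from other identities. A minimal InfoNCE-style sketch, assuming embeddings from a frozen, pretrained face recognizer (the function and tensor names here are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def contrastive_identity_loss(gen_emb, ref_embs, neg_embs, temperature=0.07):
    """InfoNCE-style contrastive identity loss (illustrative sketch).

    gen_emb:  (D,)   face embedding of the generated image
    ref_embs: (P, D) embeddings of reference photos of the SAME identity
    neg_embs: (N, D) embeddings of OTHER identities in the batch
    Embeddings are assumed to come from a frozen, pretrained face
    recognizer; this is an assumption, not the paper's exact recipe.
    """
    gen_emb = F.normalize(gen_emb, dim=-1)
    ref_embs = F.normalize(ref_embs, dim=-1)
    neg_embs = F.normalize(neg_embs, dim=-1)

    pos = ref_embs @ gen_emb / temperature  # (P,) similarity to each positive
    neg = neg_embs @ gen_emb / temperature  # (N,) similarity to each negative

    # Each positive competes against the shared pool of negatives.
    logits = torch.cat(
        [pos.unsqueeze(1), neg.unsqueeze(0).expand(len(pos), -1)], dim=1
    )
    labels = torch.zeros(len(pos), dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(logits, labels)
```

Using multiple distinct reference photos as positives, rather than the target image itself, is what discourages pixel-level copying: the model is rewarded for matching the identity, not any one photo.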

The research outlines three key contributions: the MultiID-2M dataset, a comprehensive benchmark called MultiID-Bench for evaluating multi-identity generation, and the WithAnyone model itself. Quantitative and qualitative experiments demonstrate that WithAnyone substantially reduces these artifacts, improves control over attributes like pose and expression, and maintains high perceptual quality. User studies further validate the model's ability to achieve high identity fidelity alongside expressive controllable generation.
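The article does not spell out MultiID-Bench's metrics, but in this literature identity fidelity is commonly scored as cosine similarity between face-recognition embeddings of the generated and reference faces, and copy-paste behavior shows up as the generation sitting far closer to one particular reference photo than the references sit to each other. A rough sketch under those assumptions (a plausible heuristic, not the benchmark's actual protocol):

```python
import torch
import torch.nn.functional as F

def identity_fidelity(gen_emb: torch.Tensor, ref_embs: torch.Tensor) -> float:
    """Mean cosine similarity between the generated face embedding (D,)
    and the reference embeddings (P, D); higher means better fidelity."""
    gen = F.normalize(gen_emb, dim=-1)
    refs = F.normalize(ref_embs, dim=-1)
    return (refs @ gen).mean().item()

def copy_paste_indicator(gen_emb: torch.Tensor, ref_embs: torch.Tensor) -> float:
    """Hypothetical copy-paste heuristic: how much closer the generation is
    to its single nearest reference than the references are to one another.
    Values near zero suggest the identity was matched without replicating
    any one photo; large values suggest near-duplication of a reference."""
    gen = F.normalize(gen_emb, dim=-1)
    refs = F.normalize(ref_embs, dim=-1)
    nearest = (refs @ gen).max().item()
    sims = refs @ refs.T
    p = refs.shape[0]
    # Mean pairwise similarity among references, excluding self-pairs.
    mean_inter_ref = (sims.sum() - sims.trace()).item() / (p * (p - 1))
    return nearest - mean_inter_ref
```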

The MultiID-2M dataset comprises 500,000 group photos featuring one to five recognizable celebrities, each with hundreds of individual reference images, alongside 1.5 million unpaired group photos. This data supports a four-phase training pipeline that moves from reconstruction pre-training through paired tuning to quality tuning, specifically designed to discourage trivial replication and promote robust identity-conditioned synthesis. The model and associated resources, including checkpoints and MultiID-Bench, have been made publicly available on GitHub, enabling further research and development in the field.
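One way to read the paired-tuning idea: during training, the model is conditioned on reference photos of the same person that are different shots from the target image, so it cannot reconstruct the target simply by copying a reference. A minimal dataset sketch under that assumption (the class, file layout, and loader here are hypothetical, not the released MultiID-2M format):

```python
import random
from torch.utils.data import Dataset

class PairedIdentityDataset(Dataset):
    """Illustrative sketch of paired sampling for identity-conditioned
    training. Assumes a mapping identity -> list of image paths; this
    structure is hypothetical, not the released MultiID-2M format."""

    def __init__(self, photos_by_identity, load_image, refs_per_sample=3):
        # Only keep targets whose identity has enough OTHER photos to
        # serve as distinct references.
        self.items = [
            (identity, path)
            for identity, paths in photos_by_identity.items()
            for path in paths
            if len(paths) > refs_per_sample
        ]
        self.photos_by_identity = photos_by_identity
        self.load_image = load_image
        self.refs_per_sample = refs_per_sample

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        identity, target_path = self.items[idx]
        # References are other photos of the same person: the model must
        # carry the identity over, not copy the target's pixels.
        pool = [p for p in self.photos_by_identity[identity] if p != target_path]
        ref_paths = random.sample(pool, self.refs_per_sample)
        return {
            "target": self.load_image(target_path),
            "references": [self.load_image(p) for p in ref_paths],
            "identity": identity,
        }
```

Because every reference differs from the target in pose, expression, and lighting, the only signal the model can reliably transfer is the identity itself.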