Replacing the background of an image while simultaneously adjusting the foreground objects to match it is a challenging image-editing task. Current techniques rely heavily on manual interaction with image-editing software, which is tedious work for professional retouchers. Some exciting progress on image editing has been made to ease their workload, but few models focus on guaranteeing semantic consistency between the foreground and the background. To solve this problem, we propose ART (Auto-Retoucher), a framework that generates images with sufficient semantic and spatial consistency from a given input image. Inputs are first processed by semantic matting and scene parsing modules; a multi-task verifier model then produces two confidence scores, one for the content match between foreground and background and one for the foreground's location. We demonstrate that our jointly optimized verifier model successfully guides the foreground adjustment and improves global visual consistency.
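As a rough illustration of the two-score interface, the verifier can be thought of as a shared image encoder with two scoring heads. The PyTorch sketch below is ours and only mirrors that interface; the layer sizes, the sigmoid outputs, and the overall architecture are assumptions, not the actual ART model.

```python
# Minimal sketch of a two-head verifier (hypothetical architecture).
import torch
import torch.nn as nn

class Verifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder over the composited image (foreground pasted on background).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Head 1: content score (does the foreground semantically match the scene?).
        self.content_head = nn.Linear(64, 1)
        # Head 2: spatial score (is the foreground placed plausibly?).
        self.spatial_head = nn.Linear(64, 1)

    def forward(self, composite):
        h = self.encoder(composite)
        content_score = torch.sigmoid(self.content_head(h))
        spatial_score = torch.sigmoid(self.spatial_head(h))
        return content_score, spatial_score
```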
Our source code is available on GitHub.
Our paper is available here.
We create a large-scale auto-retouching dataset. The source images come from the Celebrity in Places dataset. We processed these images and divided the data into three categories: positive cases, content-level negative cases, and spatial negative cases. The foregrounds in this dataset are persons in different clothes. The backgrounds contain 16 different types of scenes (beach, office, desert, etc.), which fully meets our requirement for scene diversity.
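To make the three categories concrete, here is a hypothetical PyTorch `Dataset` that pairs each category with (content, spatial) labels for the two verifier heads. The directory layout, the class and helper names, and the exact label assignment per category are our assumptions, not the released data format.

```python
# Sketch of loading the three example categories (assumed folder layout:
# root/positive/, root/content_negative/, root/spatial_negative/).
import os
from PIL import Image
from torch.utils.data import Dataset

# category -> (content_label, spatial_label): positives are consistent on
# both tasks; each negative type breaks one kind of consistency (assumed).
LABELS = {
    "positive": (1.0, 1.0),
    "content_negative": (0.0, 1.0),   # mismatched scene, plausible placement
    "spatial_negative": (1.0, 0.0),   # matching scene, implausible placement
}

class RetouchDataset(Dataset):
    def __init__(self, root, transform=None):
        self.samples = []
        for category, labels in LABELS.items():
            folder = os.path.join(root, category)
            for name in os.listdir(folder):
                self.samples.append((os.path.join(folder, name), labels))
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, (content_y, spatial_y) = self.samples[idx]
        img = Image.open(path).convert("RGB")
        if self.transform:
            img = self.transform(img)
        return img, content_y, spatial_y
```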
*Figure: preprocessing examples. Columns: original image, foreground, filled background, scene parsing.*
*Figure: adjustment examples. Columns: original image, foreground, moving sequence, output image.*
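The moving sequence above shows how the verifier's spatial score can drive foreground placement. As a simplified stand-in for the model's actual adjustment procedure, the sketch below scores a grid of candidate positions with the verifier and keeps the best one; the `paste` helper, the mask convention, and the grid stride are illustrative assumptions.

```python
# Sketch of verifier-guided placement: evaluate candidate foreground
# positions and keep the one with the highest spatial score.
import torch

def paste(background, foreground, mask, x, y):
    """Composite the masked foreground onto the background at (x, y)."""
    out = background.clone()
    _, h, w = foreground.shape
    region = out[:, y:y + h, x:x + w]
    out[:, y:y + h, x:x + w] = torch.where(mask.bool(), foreground, region)
    return out

@torch.no_grad()
def best_placement(verifier, background, foreground, mask, stride=32):
    _, H, W = background.shape
    _, h, w = foreground.shape
    best = None
    for y in range(0, H - h + 1, stride):
        for x in range(0, W - w + 1, stride):
            comp = paste(background, foreground, mask, x, y)
            _, spatial = verifier(comp.unsqueeze(0))
            if best is None or spatial.item() > best[0]:
                best = (spatial.item(), x, y, comp)
    return best  # (score, x, y, composite)
```

A coarse grid search like this is only one possible strategy; gradient-based or sequential adjustment would fit the same two-score interface.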