Repurposing existing deep networks for caption and aesthetic-guided image cropping. (June 2022)
- Record Type:
- Journal Article
- Title:
- Repurposing existing deep networks for caption and aesthetic-guided image cropping. (June 2022)
- Main Title:
- Repurposing existing deep networks for caption and aesthetic-guided image cropping
- Authors:
- Horanyi, Nora
Xia, Kedi
Yi, Kwang Moo
Bojja, Abhishake Kumar
Leonardis, Aleš
Chang, Hyung Jin
- Abstract:
- Highlights: The core research question of this paper is: how can we find the image part described by a user, such that the output crop represents and preserves the caption information while also being aesthetically pleasing? We propose a caption- and aesthetics-guided framework for cropping images according to the user's intention. Our framework is the first to account for the user's intention directly from the provided image caption. We argue that the currently available image cropping and caption grounding datasets are not suitable for our description-based image cropping task. Therefore, we propose a novel dataset with multiple ground-truth bounding box annotations for each caption. The experiments in Section 4.2 show that we can achieve better performance than the baseline methods for caption-based image cropping by repurposing existing deep networks.
Abstract: We propose a novel optimization framework that crops a given image based on user description and aesthetics. Unlike existing image cropping methods, where one typically trains a deep network to regress to crop parameters or cropping actions, we propose to directly optimize for the cropping parameters by repurposing pre-trained networks on image captioning and aesthetic tasks, without any fine-tuning, thereby avoiding training a separate network. Specifically, we search for the best crop parameters that minimize a combined loss of the initial objectives of these networks. To make the optimization stable, we propose three strategies: (i) multi-scale bilinear sampling, (ii) annealing the scale of the crop region, thereby effectively reducing the parameter space, and (iii) aggregation of multiple optimization results. Through various quantitative and qualitative evaluations, we show that our framework can produce crops that are well-aligned to intended user descriptions and aesthetically pleasing.
- Is Part Of:
- Pattern recognition. Volume 126(2022)
- Journal:
- Pattern recognition
- Issue:
- Volume 126(2022)
- Issue Display:
- Volume 126, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 126
- Issue:
- 2022
- Issue Sort Value:
- 2022-0126-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06
- Subjects:
- Image cropping -- Aesthetics -- Deep network re-purposing -- Image captioning
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2021.108485
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22254.xml
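
Illustrative sketch (not part of the catalogue record): the abstract describes searching for crop parameters that minimize a combined caption and aesthetics loss, stabilized by multi-scale bilinear sampling, an annealed crop scale, and aggregation of several optimization runs. The Python code below is a minimal toy version of that idea under loudly stated assumptions, not the authors' implementation. The two loss functions are hypothetical stand-ins for the pre-trained captioning and aesthetics networks, the image is a synthetic bright blob, the optimizer is a normalized finite-difference gradient descent chosen for simplicity, and every name (`bilinear_crop`, `caption_loss`, `aesthetic_loss`, `combined_loss`, `optimize_crop`, `crop_by_aggregation`) and every constant is invented for illustration.

```python
import numpy as np

def make_toy_image(size=64, centre=(0.7, 0.3), sigma=0.15):
    """Synthetic grayscale image: one bright Gaussian blob (the 'described' object)."""
    ys, xs = np.mgrid[0:size, 0:size] / (size - 1)
    return np.exp(-((xs - centre[0]) ** 2 + (ys - centre[1]) ** 2) / (2 * sigma ** 2))

def bilinear_crop(image, cx, cy, scale, out_size):
    """Bilinearly sample an out_size x out_size crop centred at (cx, cy).
    Coordinates are in [0, 1]; `scale` is the crop side as a fraction of the image."""
    h, w = image.shape
    lin = np.linspace(-0.5, 0.5, out_size)
    xs = np.clip((cx + scale * lin) * (w - 1), 0, w - 1)
    ys = np.clip((cy + scale * lin) * (h - 1), 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = (xs - x0)[None, :]
    fy = (ys - y0)[:, None]
    top = image[np.ix_(y0, x0)] * (1 - fx) + image[np.ix_(y0, x1)] * fx
    bottom = image[np.ix_(y1, x0)] * (1 - fx) + image[np.ix_(y1, x1)] * fx
    return top * (1 - fy) + bottom * fy

def caption_loss(crop):
    # Hypothetical stand-in for a captioning network's loss: rewards bright content.
    return -crop.mean()

def aesthetic_loss(crop):
    # Hypothetical stand-in for an aesthetics network: penalizes left/right imbalance.
    half = crop.shape[1] // 2
    return (crop[:, :half].mean() - crop[:, half:].mean()) ** 2

def combined_loss(image, cx, cy, scale, weight=0.1, sizes=(8, 16, 32)):
    """Combined loss averaged over several sampling resolutions (multi-scale)."""
    total = 0.0
    for n in sizes:
        crop = bilinear_crop(image, cx, cy, scale, n)
        total += caption_loss(crop) + weight * aesthetic_loss(crop)
    return total / len(sizes)

def optimize_crop(image, start, steps=300, s_start=1.0, s_end=0.3, eps=5e-3):
    """Optimize only the crop centre; the scale follows a fixed annealing schedule."""
    c = np.array(start, dtype=float)
    for t in range(steps):
        # Anneal the scale over the first half of the run, then hold it fixed.
        frac = min(1.0, t / (0.5 * steps))
        scale = s_start + (s_end - s_start) * frac
        # Step length decays linearly from 0.05 to 0.001.
        lr = 0.05 + (0.001 - 0.05) * t / (steps - 1)
        grad = np.zeros(2)
        for i in range(2):
            d = np.zeros(2)
            d[i] = eps
            hi = combined_loss(image, *(c + d), scale)
            lo = combined_loss(image, *(c - d), scale)
            grad[i] = (hi - lo) / (2 * eps)  # central finite difference
        c = np.clip(c - lr * grad / (np.linalg.norm(grad) + 1e-12), 0.0, 1.0)
    return c

def crop_by_aggregation(image, n_runs=3, seed=0):
    """Run the optimization from several random starts and take the median centre."""
    rng = np.random.default_rng(seed)
    starts = rng.uniform(0.2, 0.8, size=(n_runs, 2))
    results = np.array([optimize_crop(image, s) for s in starts])
    return np.median(results, axis=0)

image = make_toy_image()
centre = crop_by_aggregation(image)
print("aggregated crop centre (x, y):", centre)
```

In this toy setting the scale is scheduled rather than optimized, which is one simple reading of how annealing can shrink the search space to the crop centre alone; the framework described in the abstract may handle this differently.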