Towards better long-tailed oracle character recognition with adversarial data augmentation. (August 2023)
- Record Type:
- Journal Article
- Title:
- Towards better long-tailed oracle character recognition with adversarial data augmentation. (August 2023)
- Main Title:
- Towards better long-tailed oracle character recognition with adversarial data augmentation
- Authors:
- Li, Jing
Wang, Qiu-Feng
Huang, Kaizhu
Yang, Xi
Zhang, Rui
Goulermas, John Y.
- Abstract:
- Highlights: To our knowledge, it is the first work to study intensively mixup methods within the GAN framework for adversarial data augmentation in the long-tailed oracle character recognition. We propose two novel mixup-based methods, including Repatch to generalize samples in the mixup generator and the TailMix mechanism to replenish samples in the tail classes for the adversarial generator. We extensively validate the proposed model on three benchmark oracle datasets (Oracle-AYNU, OBC306 and Oracle-20K). Experimental results show that our model can yield the state-of-the-art performance as opposed to other existing oracle recognition methods.
Abstract: Deciphering oracle bone script is of great significance to the study of ancient Chinese culture as well as archaeology. Although recent studies on oracle character recognition have made substantial progress, they still suffer from the long-tailed data situation that results in a noticeable performance drop on the tail classes. To mitigate this issue, we propose a generative adversarial framework to augment oracle characters in the problematic classes. In this framework, the generator produces synthetic data through convex combinations of all the available samples in the corresponding classes, and is further optimized through adversarial learning with the classifier and simultaneously the discriminator. Meanwhile, we introduce Repatch to generalize samples in the generator. Since tail classes do not have sufficient data for convex combinations, we propose the TailMix mechanism to generate suitable tail class samples from other classes. Experimental results show that our proposed algorithm obtains remarkable performance in oracle character recognition and achieves new state-of-the-art average (total) accuracy with 86.03% (89.46%), 86.54% (93.86%), 95.22% (96.17%) on the three datasets Oracle-AYNU, OBC306 and Oracle-20K, respectively.
- Is Part Of:
- Pattern recognition. Volume 140(2023)
- Journal:
- Pattern recognition
- Issue:
- Volume 140(2023)
- Issue Display:
- Volume 140, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 140
- Issue:
- 2023
- Issue Sort Value:
- 2023-0140-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-08
- Subjects:
- Oracle character recognition -- Long tail -- Data imbalance -- Data augmentation -- Mixup strategy -- Generative adversarial networks
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2023.109534
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 27043.xml