TextTricker: Loss-based and gradient-based adversarial attacks on text classification models. (June 2020)
- Record Type:
- Journal Article
- Title:
- TextTricker: Loss-based and gradient-based adversarial attacks on text classification models. (June 2020)
- Main Title:
- TextTricker: Loss-based and gradient-based adversarial attacks on text classification models
- Authors:
- Xu, Jincheng
Du, Qingfeng
- Abstract:
- Adversarial examples are generated by adding infinitesimal perturbations to legitimate inputs so that deep learning models are induced to make incorrect predictions. They have received increasing attention recently because of their significant value in evaluating and improving the robustness of neural networks. While adversarial attack algorithms have achieved notable advances on continuous image data, they cannot be directly applied to discrete symbols such as text, where all the semantic and syntactic constraints of the language are expected to be satisfied. In this paper, we propose a white-box adversarial attack algorithm, TextTricker, which supports both targeted and non-targeted attacks on text classification models. Our algorithm can be implemented either in a loss-based way, where word perturbations are performed according to the change in loss, or in a gradient-based way, where the expected gradients are computed in the continuous embedding space to restrict the perturbations towards a certain direction. We perform extensive experiments on two publicly available datasets and three state-of-the-art text classification models to evaluate our algorithm. The empirical results demonstrate that TextTricker performs notably better than baselines in attack success rate. Moreover, we discuss various aspects of TextTricker in detail to provide a deep investigation and offer suggestions for its practical use.
Highlights: We propose a novel algorithm, TextTricker, to attack text classification models. The algorithm can be implemented in a loss-based way or in a gradient-based way. The algorithm outperforms baselines on multiple models and datasets. We empirically offer suggestions for the practical use of TextTricker.
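The abstract contrasts two ways of ranking word substitutions. The sketch below is a minimal illustration under stated assumptions, not the authors' TextTricker implementation: the toy vocabulary `EMB`, the candidate lists `CANDIDATES`, and the linear mean-of-embeddings classifier (`W`, `B`) are all hypothetical. The loss-based scorer recomputes the loss for every candidate substitution; the gradient-based scorer uses a generic first-order estimate, (e_cand - e_orig) · dL/de_i, in the continuous embedding space. Both shown here are non-targeted (they try to raise the loss of the true label).

```python
import math

# Hypothetical toy 2-d word embeddings (illustrative only, not from the paper).
EMB = {
    "good": [1.0, 0.2], "great": [1.2, 0.1], "fine": [0.4, 0.3],
    "bad": [-1.0, 0.2], "poor": [-0.8, 0.1],
    "movie": [0.0, 1.0], "film": [0.1, 0.9],
}
# Hypothetical substitution candidates (e.g. synonyms) per word.
CANDIDATES = {"good": ["great", "fine"], "movie": ["film"]}
# Hypothetical linear classifier on the mean embedding: 0 = negative, 1 = positive.
W = [[-1.0, 0.0], [1.0, 0.0]]
B = [0.0, 0.0]

def mean_embedding(words):
    n = len(words)
    return [sum(EMB[w][d] for w in words) / n for d in range(2)]

def probs(words):
    # Softmax over the two class logits (max-shifted for numerical stability).
    h = mean_embedding(words)
    logits = [sum(W[c][d] * h[d] for d in range(2)) + B[c] for c in range(2)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss(words, label):
    # Cross-entropy of the given label.
    return -math.log(probs(words)[label])

def loss_based_choice(words, label):
    """Try every candidate substitution; keep the one that raises the loss most."""
    base = loss(words, label)
    best = None
    for i, w in enumerate(words):
        for cand in CANDIDATES.get(w, []):
            perturbed = words[:i] + [cand] + words[i + 1:]
            delta = loss(perturbed, label) - base
            if best is None or delta > best[0]:
                best = (delta, i, cand)
    return best  # (loss increase, position, replacement word)

def grad_wrt_embedding(words, label):
    # dL/dh = W^T (p - onehot); with mean pooling every position gets dL/dh / n.
    p = probs(words)
    err = [p[c] - (1.0 if c == label else 0.0) for c in range(2)]
    dh = [sum(W[c][d] * err[c] for c in range(2)) for d in range(2)]
    return [g / len(words) for g in dh]

def gradient_based_choice(words, label):
    """First-order estimate of the loss change: (e_cand - e_orig) . dL/de_i."""
    g = grad_wrt_embedding(words, label)
    best = None
    for i, w in enumerate(words):
        for cand in CANDIDATES.get(w, []):
            diff = [EMB[cand][d] - EMB[w][d] for d in range(2)]
            score = sum(diff[d] * g[d] for d in range(2))
            if best is None or score > best[0]:
                best = (score, i, cand)
    return best  # (estimated loss increase, position, replacement word)
```

For the sentence `["good", "movie"]` with true label 1 (positive), both scorers in this toy setup pick replacing "good" with "fine" at position 0. The loss-based variant needs one forward pass per candidate, while the gradient-based variant needs a single gradient computation followed by dot products, at the cost of being only an approximation. A targeted variant would instead look for substitutions that lower the loss of a chosen target label.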
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 92(2020)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 92(2020)
- Issue Display:
- Volume 92, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 92
- Issue:
- 2020
- Issue Sort Value:
- 2020-0092-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-06
- Subjects:
- Adversarial attacks -- Text classification -- The loss-based implementation -- The gradient-based implementation
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2020.103641
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13361.xml