Adversarial attacks on text classification models using layer‐wise relevance propagation. Issue 9 (20th July 2020)

Record Type:: Journal Article
Title:: Adversarial attacks on text classification models using layer‐wise relevance propagation. Issue 9 (20th July 2020)
Main Title:: Adversarial attacks on text classification models using layer‐wise relevance propagation
Authors:: Xu, Jincheng
Du, Qingfeng
Abstract:: Due to the nested nonlinear structure inside neural networks, most existing deep learning models are treated as black boxes, and they are highly vulnerable to adversarial attacks. On the one hand, adversarial examples shed light on the decision‐making process of these opaque models and so help interrogate their interpretability. On the other hand, interpretability can be used as a powerful tool to assist in the generation of adversarial examples by affording transparency on the relative contribution of each input feature to the final prediction. Recently, a post‐hoc explanatory method, layer‐wise relevance propagation (LRP), has shown significant value in instance‐wise explanations. In this paper, we attempt to optimize the recently proposed explanation‐based attack algorithms (EAAs) on text classification models with LRP. We empirically show that LRP provides good explanations and benefits existing EAAs notably. Apart from that, we propose a simple but effective LRP‐based EAA, LRPTricker. LRPTricker uses LRP to identify important words and subsequently performs typo‐based perturbations on these words to generate the adversarial texts. Extensive experiments show that LRPTricker is able to reduce the performance of text classification models significantly with infinitesimal perturbations while remaining highly scalable.
Is Part Of:: International journal of intelligent systems. Volume 35:Issue 9(2020)
Journal:: International journal of intelligent systems
Issue:: Volume 35:Issue 9(2020)
Issue Display:: Volume 35, Issue 9 (2020)
Year:: 2020
Volume:: 35
Issue:: 9
Issue Sort Value:: 2020-0035-0009-0000
Page Start:: 1397
Page End:: 1415
Publication Date:: 2020-07-20
Subjects:: adversarial attacks -- layer‐wise relevance propagation -- text classification
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
006.3
Journal URLs:: http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1098-111X
https://www.hindawi.com/journals/ijis
http://onlinelibrary.wiley.com/
DOI:: 10.1002/int.22260
Languages:: English
ISSNs:: 0884-8173
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 4542.310500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 13670.xml
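
Illustrative note:: The abstract describes a two-step attack: rank words by their relevance to the prediction, then apply typo-based perturbations to the most relevant ones. The sketch below illustrates only that general idea and is not the authors' LRPTricker implementation. The relevance scores are made-up numbers standing in for the output of an LRP pass over a real classifier, and the names `swap_typo`, `perturb_top_words`, and `budget` are hypothetical. The single typo type used here (swapping two adjacent characters) is an assumption; this record does not say which typo operations the paper uses.

```python
from typing import List, Sequence


def swap_typo(word: str) -> str:
    """Swap two adjacent interior characters; words under 4 letters are unchanged."""
    if len(word) < 4:
        return word
    mid = len(word) // 2
    chars = list(word)
    chars[mid - 1], chars[mid] = chars[mid], chars[mid - 1]
    return "".join(chars)


def perturb_top_words(
    words: Sequence[str], relevance: Sequence[float], budget: int = 1
) -> List[str]:
    """Apply a typo to the `budget` words with the highest relevance scores."""
    if len(words) != len(relevance):
        raise ValueError("words and relevance must have the same length")
    # Rank word positions from most to least relevant.
    ranked = sorted(range(len(words)), key=lambda i: relevance[i], reverse=True)
    targets = set(ranked[:budget])
    return [swap_typo(w) if i in targets else w for i, w in enumerate(words)]


# Hypothetical input: scores are invented for illustration, not real LRP output.
words = ["the", "movie", "was", "wonderful"]
relevance = [0.01, 0.20, 0.02, 0.90]
adversarial = perturb_top_words(words, relevance, budget=1)
print(" ".join(adversarial))
```

In a real attack the scores would come from running LRP on the target model, and the perturbed text would be fed back to the model to check whether its prediction changed.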