Towards robust explanations for deep neural networks. (January 2022)
- Record Type:
- Journal Article
- Title:
- Towards robust explanations for deep neural networks. (January 2022)
- Main Title:
- Towards robust explanations for deep neural networks
- Authors:
- Dombrowski, Ann-Kathrin
Anders, Christopher J.
Müller, Klaus-Robert
Kessel, Pan
- Abstract:
- Highlights: We investigate how to enhance the resilience of explanations against manipulation. Explanations visualize the relevance of each input feature for the network's prediction. We develop a theoretical framework and derive bounds on the maximal change of an explanation. Based on these insights, we present three different techniques to increase robustness: training with weight decay, smoothing activation functions, and minimizing the Hessian of the network. Application of our methods shows significantly improved resilience of explanations.
Abstract: Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks. But their usefulness can be compromised because they are susceptible to manipulations. With this work, we aim to enhance the resilience of explanations. We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model. Based on these theoretical insights, we present three different techniques to boost robustness against manipulation: training with weight decay, smoothing activation functions, and minimizing the Hessian of the network. Our experimental results confirm the effectiveness of these approaches.
- Is Part Of:
- Pattern recognition. Volume 121(2022)
- Journal:
- Pattern recognition
- Issue:
- Volume 121(2022)
- Issue Display:
- Volume 121, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 121
- Issue:
- 2022
- Issue Sort Value:
- 2022-0121-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-01
- Subjects:
- Explanation method -- Saliency map -- Adversarial attacks -- Manipulation -- Neural networks
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2021.108194
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25035.xml