Conditional information gain networks as sparse mixture of experts. (December 2021)
- Record Type:
- Journal Article
- Title:
- Conditional information gain networks as sparse mixture of experts. (December 2021)
- Main Title:
- Conditional information gain networks as sparse mixture of experts
- Authors:
- Bicici, Ufuk Can
Akarun, Lale - Abstract:
- Abstract: Deep neural network models owe their representational power and high performance in classification tasks to the high number of learnable parameters. Running deep neural network models in limited-resource environments is a problematic task. Models employing conditional computing aim to reduce the computational burden while retaining model performance on par with more complex neural network models. This paper, proposes a new model, Conditional Information Gain Networks as Sparse Mixture of Experts (sMoE-CIGNs). A CIGN model is a neural tree that allows conditionally skipping parts of the tree based on routing mechanisms inserted into the architecture. These routing mechanisms are based on differentiable Information Gain objectives. CIGN groups semantically similar samples in the leaves, enabling simpler classifiers to focus on differentiating between similar classes. This lets the CIGN model attain high classification performances with lighter models. We further improve the basic CIGN model by proposing a sparse mixture of experts model for difficult to classify samples that may get routed to suboptimal branches. If a sample has routing confidence higher than a specific threshold, the sample may be routed towards multiple child nodes. The classification decision can then be taken as a mixture of these expert decisions. We learn the optimal routing thresholds by Bayesian Optimization over a validation set by minimizing a weighted loss, including the classificationAbstract: Deep neural network models owe their representational power and high performance in classification tasks to the high number of learnable parameters. Running deep neural network models in limited-resource environments is a problematic task. Models employing conditional computing aim to reduce the computational burden while retaining model performance on par with more complex neural network models. This paper, proposes a new model, Conditional Information Gain Networks as Sparse Mixture of Experts (sMoE-CIGNs). A CIGN model is a neural tree that allows conditionally skipping parts of the tree based on routing mechanisms inserted into the architecture. These routing mechanisms are based on differentiable Information Gain objectives. CIGN groups semantically similar samples in the leaves, enabling simpler classifiers to focus on differentiating between similar classes. This lets the CIGN model attain high classification performances with lighter models. We further improve the basic CIGN model by proposing a sparse mixture of experts model for difficult to classify samples that may get routed to suboptimal branches. If a sample has routing confidence higher than a specific threshold, the sample may be routed towards multiple child nodes. The classification decision can then be taken as a mixture of these expert decisions. We learn the optimal routing thresholds by Bayesian Optimization over a validation set by minimizing a weighted loss, including the classification accuracy and the number of multiplication and accumulations (MAC). We show the effectiveness of the CIGN models enhanced with the Sparse Mixture of Experts approach with extensive tests on MNIST, Fashion MNIST, CIFAR 100 and UCI-USPS datasets, as well as comparisons with methods from the literature. sMoE-CIGN models can retain high generalization performance, on par with a thick unconditional model while keeping the operation burden at the same level with a much thinner model. 1 … (more)
- Is Part Of:
- Pattern recognition. Volume 120(2021)
- Journal:
- Pattern recognition
- Issue:
- Volume 120(2021)
- Issue Display:
- Volume 120, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 120
- Issue:
- 2021
- Issue Sort Value:
- 2021-0120-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-12
- Subjects:
- Machine learning -- Deep learning -- Conditional deep learning
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4 - Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203 ↗
http://www.sciencedirect.com/ ↗ - DOI:
- 10.1016/j.patcog.2021.108151 ↗
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 18489.xml