Adaptive synchronous strategy for distributed machine learning. Issue 12 (20th September 2022)
- Record Type:
- Journal Article
- Title:
- Adaptive synchronous strategy for distributed machine learning. Issue 12 (20th September 2022)
- Main Title:
- Adaptive synchronous strategy for distributed machine learning
- Authors:
- Tan, Miaoquan
Liu, Wai‐Xi
Luo, Junming
Chen, Haosen
Guo, Zhen‐Zheng
- Abstract:
- In distributed machine learning training, bulk synchronous parallel (BSP) and asynchronous parallel (ASP) are the two main synchronization methods used to achieve gradient aggregation. However, BSP needs longer training time due to the "stragglers" problem, while ASP sacrifices accuracy due to the "gradient staleness" problem. In this article, we propose a distributed training paradigm on the parameter server framework called adaptive synchronous strategy (A2S), which improves on the BSP and ASP paradigms by adaptively adopting different parallel training schemes for workers with different training speeds. Based on the stale value between the fastest and slowest worker, A2S adaptively adds a relaxed synchronous barrier for fast workers to alleviate gradient staleness, where a differentiated weighting gradient aggregation method is used to reduce the impact of slow gradients. Simultaneously, A2S adopts ASP training for slow workers to eliminate stragglers. Hence, A2S not only mitigates the "gradient staleness" and "stragglers" problems, but also obtains convergence stability and synchronous gain through synchronous and asynchronous parallel, respectively. In particular, we theoretically prove the convergence of A2S by deriving the regret bound. Moreover, experimental results show that A2S improves accuracy by up to 2.64% and accelerates training by up to 41% compared with the state‐of‐the‐art synchronization methods BSP, ASP, stale synchronous parallel (SSP), dynamic SSP, and Sync‐switch.
- Is Part Of:
- International journal of intelligent systems. Volume 37:Issue 12(2022)
- Journal:
- International journal of intelligent systems
- Issue:
- Volume 37:Issue 12(2022)
- Issue Display:
- Volume 37, Issue 12 (2022)
- Year:
- 2022
- Volume:
- 37
- Issue:
- 12
- Issue Sort Value:
- 2022-0037-0012-0000
- Page Start:
- 11713
- Page End:
- 11741
- Publication Date:
- 2022-09-20
- Subjects:
- ASP -- BSP -- distributed training -- parameter server -- synchronous strategy
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
006.3
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1098-111X
https://www.hindawi.com/journals/ijis
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/int.23060
- Languages:
- English
- ISSNs:
- 0884-8173
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.310500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25605.xml