Storm-based distributed sampling system for multi-source stream environment. (November 2018)
- Record Type:
- Journal Article
- Title:
- Storm-based distributed sampling system for multi-source stream environment. (November 2018)
- Main Title:
- Storm-based distributed sampling system for multi-source stream environment
- Authors:
- Cho, Wonhyeong
Gil, Myeong-Seon
Choi, Mi-Jung
Moon, Yang-Sae
- Abstract:
- As a large amount of data streams occur rapidly in many recent applications such as social network service, Internet of Things, and smart factory, sampling techniques have attracted many attentions to handle such data streams efficiently. In this article, we address the performance improvement of binary Bernoulli sampling in the multi-source stream environment. Binary Bernoulli sampling has the n:1 structure where n sites transmit data to 1 coordinator. However, as the number of sites increases or the input stream explosively increases, the binary Bernoulli sampling may cause a severe bottleneck in the coordinator. In addition, bidirectional communication over different networks among the coordinator and sites may incur excessive communication overhead. In this article, we propose a novel distributed processing model of binary Bernoulli sampling to solve these coordinator bottleneck and communication overhead problems. We first present a multiple-coordinator structure to solve the coordinator bottleneck. We then present a new sampling model with an integrated framework and shared memory to alleviate the communication overhead. To verify the effectiveness and scalability of the proposed model, we perform its actual implementation in Apache Storm, a real-time distributed stream processing system. Experimental results show that our Storm-based binary Bernoulli sampling improves performance by up to 1.8 times compared with the legacy method and maintains high performance even when the input stream largely increases. These results indicate that the proposed distributed processing model is an excellent approach that solves the performance degradation problem of binary Bernoulli sampling and verifies its superiority through the actual implementation on Apache Storm.
- Is Part Of:
- International journal of distributed sensor networks. Volume 14:Number 11(2018)
- Journal:
- International journal of distributed sensor networks
- Issue:
- Volume 14:Number 11(2018)
- Issue Display:
- Volume 14, Issue 11 (2018)
- Year:
- 2018
- Volume:
- 14
- Issue:
- 11
- Issue Sort Value:
- 2018-0014-0011-0000
- Page Start:
- Page End:
- Publication Date:
- 2018-11
- Subjects:
- Distributed stream sampling -- binary Bernoulli sampling -- multi-source stream -- data stream -- Apache Storm
Sensor networks -- Periodicals
Intelligent agents (Computer software) -- Periodicals
Multisensor data fusion -- Periodicals
681.2
- Journal URLs:
- http://www.informaworld.com/smpp/title~content=t714578688~db=all
http://www.metapress.com/openurl.asp?genre=journal&issn=1550-1329
http://dsn.sagepub.com/
http://www.tandfonline.com/
- DOI:
- 10.1177/1550147718812698
- Languages:
- English
- ISSNs:
- 1550-1329
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.186400
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 8937.xml