Plumb: Efficient stream processing of multi‐user pipelines. (11th October 2020)
- Record Type:
- Journal Article
- Title:
- Plumb: Efficient stream processing of multi‐user pipelines. (11th October 2020)
- Main Title:
- Plumb: Efficient stream processing of multi‐user pipelines
- Authors:
- Qadeer, Abdul
Heidemann, John
- Abstract:
- Operational services run 24×7 and require analytics pipelines to evaluate performance. In mature services such as domain name system (DNS), these pipelines often grow to many stages developed by multiple, loosely coupled teams. Such pipelines pose two problems: first, computation and data storage may be duplicated across components developed by different groups, wasting resources. Second, processing can be skewed, with structural skew occurring when different pipeline stages need different amounts of resources, and computational skew occurring when a block of input data requires increased resources. Duplication and structural skew both decrease efficiency, increasing cost, latency, or both. Computational skew can cause pipeline failure or deadlock when resource consumption balloons; we have seen cases where pessimal traffic increases CPU requirements 6‐fold. Detecting duplication is challenging when components from multiple teams evolve independently and require fault isolation. Skew management is hard due to dynamic workloads coupled with the conflicting goals of both minimizing latency and maximizing utilization. We propose Plumb, a framework to abstract stream processing as large‐block streaming (LBS) for a multi‐stage, multi‐user workflow. Plumb users express analytics as a DAG of processing modules, allowing Plumb to integrate and optimize workflows from multiple users. Many real‐world applications map to the LBS abstraction. Plumb detects and eliminates duplicate computation and storage, and it detects and addresses both structural and computational skew by tracking computation across the pipeline. We exercise Plumb using the analytics pipeline for B‐Root DNS. We compare Plumb to a hand‐tuned system, cutting latency to one‐third the original, and requiring 39% fewer container hours, while supporting more flexible, multi‐user analytics and providing greater robustness to DDoS‐driven demands.
- Is Part Of:
- Software, practice & experience. Volume 51:Number 2(2021)
- Journal:
- Software, practice & experience
- Issue:
- Volume 51:Number 2(2021)
- Issue Display:
- Volume 51, Issue 2 (2021)
- Year:
- 2021
- Volume:
- 51
- Issue:
- 2
- Issue Sort Value:
- 2021-0051-0002-0000
- Page Start:
- 385
- Page End:
- 408
- Publication Date:
- 2020-10-11
- Subjects:
- arbitrary operators -- binary UDF -- data and processing de‐duplication -- multi‐user pipelines -- unstructured data
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/spe.2909
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 15391.xml