A scalable dynamic programming scheme for the computation of optimal k-segments for ordered data. (October 2017)
- Record Type:
- Journal Article
- Title:
- A scalable dynamic programming scheme for the computation of optimal k-segments for ordered data. (October 2017)
- Main Title:
- A scalable dynamic programming scheme for the computation of optimal k-segments for ordered data
- Authors:
- Mahlknecht, Giovanni
Dignös, Anton
Gamper, Johann - Abstract:
- Abstract: The optimal k -segments of an ordered dataset of size n consists of k tuples that are obtained by merging consecutive tuples such that a given error metric is minimized. The problem is general and has been studied in various flavors, e.g., piecewise-constant approximation, parsimonious temporal aggregation, and v-optimal histograms. A well-known computation scheme for the optimal k -segments is based on dynamic programming, which computes a k × n error matrixE and a corresponding split point matrixJ of the same size. This yields O ( n · k ) space and O ( n 2 · k ) runtime complexity. In this paper, we propose three optimization techniques for the runtime complexity and one for the space complexity. First, diagonal pruning identifies regions of the error matrixE that need not to be computed since they cannot lead to a valid solution. Second, for those cells inE that are computed, we provide a heuristic to determine a better seed value, which in turn leads to a tighter lower bound for the potential split points to be considered for the calculation of the minimal error. Third, we show how the algorithm can be effectively parallelized . The space complexity is dominated by the split point matrixJ, which needs to be kept till the end. To tackle this problem, we replace the split point matrix by a dynamic split point graph, which eliminates entries that are not needed to retrieve the optimal solution. A detailed experimental evaluation shows the effectiveness of theAbstract: The optimal k -segments of an ordered dataset of size n consists of k tuples that are obtained by merging consecutive tuples such that a given error metric is minimized. The problem is general and has been studied in various flavors, e.g., piecewise-constant approximation, parsimonious temporal aggregation, and v-optimal histograms. A well-known computation scheme for the optimal k -segments is based on dynamic programming, which computes a k × n error matrixE and a corresponding split point matrixJ of the same size. This yields O ( n · k ) space and O ( n 2 · k ) runtime complexity. In this paper, we propose three optimization techniques for the runtime complexity and one for the space complexity. First, diagonal pruning identifies regions of the error matrixE that need not to be computed since they cannot lead to a valid solution. Second, for those cells inE that are computed, we provide a heuristic to determine a better seed value, which in turn leads to a tighter lower bound for the potential split points to be considered for the calculation of the minimal error. Third, we show how the algorithm can be effectively parallelized . The space complexity is dominated by the split point matrixJ, which needs to be kept till the end. To tackle this problem, we replace the split point matrix by a dynamic split point graph, which eliminates entries that are not needed to retrieve the optimal solution. A detailed experimental evaluation shows the effectiveness of the proposed solutions. Our optimization techniques significantly improve the runtime of state-of-the-art matrix implementations, and they guarantee a comparable performance of an implementation that uses the split point graph. The split point graph reduces the memory consumption up to two orders of magnitude and allows us to process large datasets for which the memory explodes if the matrix is used. Abstract : Highlights: Dynamic programming algorithm for optimal k-segments with improved runtime. Split-point graph as replacement for the dynamic programming matrix with small memory footprint. Parallelization of the dynamic programming algorithm. … (more)
- Is Part Of:
- Information systems. Volume 70(2017)
- Journal:
- Information systems
- Issue:
- Volume 70(2017)
- Issue Display:
- Volume 70, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 70
- Issue:
- 2017
- Issue Sort Value:
- 2017-0070-2017-0000
- Page Start:
- 2
- Page End:
- 17
- Publication Date:
- 2017-10
- Subjects:
- Optimal k-segments -- Dynamic programming -- Parsimonious temporal aggregation -- Piecewise constant approximation -- v-optimal histograms -- Data approximation
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7 - Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379 ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.is.2016.08.002 ↗
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 7042.xml