A stratified reservoir sampling algorithm in streams and large datasets. Issue 4 (3rd April 2022)
- Record Type:
- Journal Article
- Title:
- A stratified reservoir sampling algorithm in streams and large datasets. Issue 4 (3rd April 2022)
- Main Title:
- A stratified reservoir sampling algorithm in streams and large datasets
- Authors:
- Collins, David
Lu, Yan
- Abstract:
- In data stream mining, a stream is a dataset of unknown size with continuously incoming elements, which is typically large enough so that a computer processing it does not have enough memory to hold it in its entirety and each element can be read only once and only in order. Classical sampling methods such as simple random sampling (SRS), stratified sampling and cluster sampling cannot be used on the stream data since the entire set is not available all at once and data cannot be reread. Vitter's (1985) Algorithm R is a reservoir sampling method which can be used to select an SRS from a data stream. In this article, we propose Algorithm SR which extends Algorithm R to a stratified reservoir sampling method with optimal allocation. We prove that the proposed method is asymptotically equivalent to classical stratified random sampling with optimal allocation. Implementation results show that the proposed method is efficient and can outperform Algorithm R.
- Is Part Of:
- Communications in statistics. Volume 51:Issue 4(2022)
- Journal:
- Communications in statistics
- Issue:
- Volume 51:Issue 4(2022)
- Issue Display:
- Volume 51, Issue 4 (2022)
- Year:
- 2022
- Volume:
- 51
- Issue:
- 4
- Issue Sort Value:
- 2022-0051-0004-0000
- Page Start:
- 1767
- Page End:
- 1782
- Publication Date:
- 2022-04-03
- Subjects:
- Data stream mining -- Implementation -- Reservoir sampling -- Simulations -- Stratified random sampling -- Stratified reservoir sampling
Mathematical statistics -- Periodicals
Mathematical statistics -- Data processing -- Periodicals
Digital computer simulation -- Periodicals
519.5
- Journal URLs:
- http://www.tandfonline.com/toc/lssp20/current
http://www.tandfonline.com/
- DOI:
- 10.1080/03610918.2019.1682159
- Languages:
- English
- ISSNs:
- 0361-0918
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3363.431000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21192.xml
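
The abstract refers to Vitter's (1985) Algorithm R, the reservoir sampling method that the article extends. As background only, the following is a minimal Python sketch of the standard Algorithm R for drawing a simple random sample of size k from a stream read once and in order. The function name `reservoir_sample` and its parameters are illustrative assumptions, not names from the article, and the sketch does not implement the article's stratified Algorithm SR or its optimal allocation.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return a simple random sample of up to k items from a one-pass stream.

    Hypothetical helper illustrating the standard Algorithm R; only the
    reservoir (at most k items) is held in memory.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot uniformly from 0..i (inclusive); the new item
            # enters the reservoir with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    sample = reservoir_sample(range(1_000_000), k=5, rng=random.Random(42))
    print(sample)
```

If the stream holds fewer than k items, the sketch returns all of them in their original order.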