The top‐K tau‐path screen for monotone association in subpopulations. (30th June 2016)
- Record Type:
- Journal Article
- Title:
- The top‐K tau‐path screen for monotone association in subpopulations. (30th June 2016)
- Main Title:
- The top‐K tau‐path screen for monotone association in subpopulations
- Authors:
- Sampath, Srinath
Caloiaro, Adriano
Johnson, Wayne
Verducci, Joseph S.
- Abstract:
- A pair of variables that tend to rise and fall either together or in opposition are said to be monotonically associated. For certain phenomena, this tendency is causally restricted to a subpopulation, as, e.g., the severity of an allergic reaction trending with the concentration of an air pollutant. Previously, Yu et al. (Stat Methodol 2011, 8:97–111) devised a method of rearranging observations to test paired data to see if such an association might be present in a subpopulation. However, the computational intensity of the method limited its application to relatively small samples of data, and the test itself only judges if association is present in some subpopulation; it does not clearly identify the subsample that came from this subpopulation, especially when the whole sample tests positive. The present study adds a 'top‐K' feature (Sampath S, Verducci JS. Stat Anal Data Min 2013, 6:458–471), based on a multistage ranking model, that identifies a concise subsample that is likely to contain a high proportion of observations from the subpopulation in which the association is supported. Computational improvements incorporated into this top‐K tau‐path algorithm now allow the method to be extended to thousands of pairs of variables measured on sample sizes in the thousands. A description of the new algorithm, along with measures of computational complexity and practical efficiency, helps to gauge its potential use in different settings. Simulation studies catalog its accuracy in various settings, and an example from finance illustrates its step‐by‐step use. WIREs Comput Stat 2016, 8:206–218. doi: 10.1002/wics.1382. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Exploratory Data Analysis; Statistical and Graphical Methods of Data Analysis > Nonparametric Methods.
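The general idea of a tau-path (reordering the pairs so that Kendall's tau computed on the first k observations stays high while only a concordant core is included, then decays as outlying pairs are appended) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the greedy rule of repeatedly dropping the point with the lowest net concordance, the use of tau-a, and the function names `kendall_tau`, `net_concordance`, `tau_path_order` and `tau_path` are all hypothetical choices. It targets positive association (negate `ys` to look for association in opposition), runs in roughly O(n^3) time, and leaves out both the multistage top-K ranking model and the computational improvements the abstract describes.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / C(n, 2); tied pairs add 0."""
    n = len(xs)
    if n < 2:
        return 0.0
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def net_concordance(i, pool, xs, ys):
    """Concordant minus discordant pairs between point i and the other points in pool."""
    s = 0
    for j in pool:
        if j != i:
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (prod > 0) - (prod < 0)
    return s


def tau_path_order(xs, ys):
    """Hypothetical greedy backward elimination: repeatedly drop the point with the
    lowest net concordance; points dropped earliest end up last in the ordering."""
    remaining = list(range(len(xs)))
    dropped = []
    while len(remaining) > 2:
        worst = min(remaining, key=lambda i: net_concordance(i, remaining, xs, ys))
        remaining.remove(worst)
        dropped.append(worst)
    return remaining + dropped[::-1]


def tau_path(xs, ys):
    """Return the ordering and Kendall's tau of each prefix of size k = 2..n."""
    order = tau_path_order(xs, ys)
    taus = []
    for k in range(2, len(order) + 1):
        idx = order[:k]
        taus.append(kendall_tau([xs[i] for i in idx], [ys[i] for i in idx]))
    return order, taus


# Toy data: five concordant pairs plus one pair (x=6, y=0) running against the trend.
order, taus = tau_path([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 0])
print(order)  # [3, 4, 2, 1, 0, 5]
print(taus)   # [1.0, 1.0, 1.0, 1.0, 0.3333333333333333]
```

In this toy case the path stays at 1.0 through the first five observations and drops to 1/3 only when the discordant pair is appended, so the leading prefix behaves like the kind of concise subsample the abstract refers to; on real data the path is noisier and a formal test or ranking model would be needed to choose the cut.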
- Is Part Of:
- Wiley interdisciplinary reviews. Volume 8:Number 5(2016)
- Journal:
- Wiley interdisciplinary reviews
- Issue:
- Volume 8:Number 5(2016)
- Issue Display:
- Volume 8, Issue 5 (2016)
- Year:
- 2016
- Volume:
- 8
- Issue:
- 5
- Issue Sort Value:
- 2016-0008-0005-0000
- Page Start:
- 206
- Page End:
- 218
- Publication Date:
- 2016-06-30
- Subjects:
- algorithmic complexity -- big data, mixtures of copulas -- nonparametric correlation -- ranking models -- unsupervised classification
Mathematical statistics -- Data processing -- Periodicals
Science -- Data processing -- Periodicals
Social sciences -- Data processing -- Periodicals
Mathematical statistics -- Periodicals
519.50285
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1939-0068
http://www3.interscience.wiley.com/journal/122458798/home
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/wics.1382
- Languages:
- English
- ISSNs:
- 1939-5108
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 8965.xml