A Demonstration of Regression False Positive Selection in Data Mining. (July 2014)
- Record Type:
- Journal Article
- Title:
- A Demonstration of Regression False Positive Selection in Data Mining. (July 2014)
- Main Title:
- A Demonstration of Regression False Positive Selection in Data Mining
- Authors:
- Pinder, Jonathan P.
- Abstract:
- Business analytics courses, such as marketing research, data mining, forecasting, and advanced financial modeling, have substantial predictive modeling components. The predictive modeling in these courses requires students to estimate and test many linear regressions. As a result, false positive variable selection (type I errors) is nearly certain to occur. This article describes an in‐class demonstration that shows the frequency and impact of false positives on data mining regression‐based predictive modeling. In this demonstration, 500 randomly generated independent (X) variables are individually regressed against a single, randomly generated (Y) variable, and the resulting 500 p‐values are sorted and examined. This experiment is repeated and the distribution of the number of variables significant at the 5% level resulting from this simulation is presented and discussed. The demonstration provides a tangible example in which students see the reality and risks of incorrectly inferring statistical significance of independent regression variables. Students have expressed a deeper understanding and appreciation of the risks of type I errors through this demonstration. This demonstration is innovative because the scale of the simulation allows the students to experience the near certainty that the correlations shown in the results are truly random.
- Is Part Of:
- Decision sciences journal of innovative education. Volume 12:Number 3(2014:Jul.)
- Journal:
- Decision sciences journal of innovative education
- Issue:
- Volume 12:Number 3(2014:Jul.)
- Issue Display:
- Volume 12, Issue 3 (2014)
- Year:
- 2014
- Volume:
- 12
- Issue:
- 3
- Issue Sort Value:
- 2014-0012-0003-0000
- Page Start:
- 199
- Page End:
- 217
- Publication Date:
- 2014-07
- Subjects:
- Business education -- Periodicals
658.00711
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/dsji.12037
- Languages:
- English
- ISSNs:
- 1540-4595
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3537.150500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 3669.xml
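
The simulation summarized in the abstract can be sketched roughly as follows. This is only an illustration, not the article's own materials: the abstract does not state the sample size, the distribution of the random data, or the number of repetitions, so `n_obs=50`, standard normal draws, the seed, and 10 repetitions are all assumptions, and every function name here is hypothetical. With a 5% threshold and 500 unrelated X variables, about 500 × 0.05 = 25 "significant" variables would be expected per run purely by chance.

```python
import math
import random


def t_two_sided_p(t_stat, df, steps=100):
    """Two-sided p-value for a t statistic, integrating the t density with Simpson's rule."""
    a = abs(t_stat)
    if a == 0.0:
        return 1.0
    # Log of the normalizing constant of the Student t density.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = 2 * total * h / 3  # P(-a < T < a), using symmetry around zero
    return min(1.0, max(0.0, 1.0 - central))


def slope_p_value(x, y):
    """p-value for the slope in a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r2 = sxy * sxy / (sxx * syy)
    if r2 >= 1.0:  # perfect fit: the t statistic is unbounded
        return 0.0
    t_stat = math.sqrt(r2 * (n - 2) / (1.0 - r2))
    return t_two_sided_p(t_stat, n - 2)


def run_experiment(rng, n_vars=500, n_obs=50):
    """Regress one random Y on each of n_vars unrelated random X's; return sorted p-values."""
    # Assumption: standard normal data and n_obs=50; the abstract gives neither.
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    p_values = []
    for _ in range(n_vars):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        p_values.append(slope_p_value(x, y))
    return sorted(p_values)


def count_significant(p_values, alpha=0.05):
    """Number of variables that look significant at level alpha."""
    return sum(p < alpha for p in p_values)


if __name__ == "__main__":
    rng = random.Random(2014)  # arbitrary seed, chosen only for reproducibility
    counts = [count_significant(run_experiment(rng)) for _ in range(10)]
    print("False positives per run of 500 regressions:", counts)
```

Every X here is generated independently of Y, so each variable counted by `count_significant` is a false positive by construction; repeating `run_experiment` gives the distribution of that count across runs.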