Data mining for business analytics : concepts, techniques, and applications in R. (2017)
- Record Type:
- Book
- Title:
- Data mining for business analytics : concepts, techniques, and applications in R. (2017)
- Main Title:
- Data mining for business analytics : concepts, techniques, and applications in R
- Further Information:
- Note: Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, Kenneth C. Lichtendahl.
- Authors:
- Shmueli, Galit, 1971-
Bruce, Peter C, 1953-
Yahav, Inbal
Lichtendahl, Kenneth C, 1969-
Patel, Nitin R (Nitin Ratilal)
- Contents:
- Contents Foreword by Gareth James xix Foreword by Ravi Bapna xxi Preface to the R Edition xxiii Acknowledgments xxvii PART I PRELIMINARIES CHAPTER 1 Introduction 3 1.1 What Is Business Analytics? 3 1.2 What Is Data Mining? 5 1.3 Data Mining and Related Terms 5 1.4 Big Data 6 1.5 Data Science 7 1.6 Why Are There So Many Different Methods?
8 1.7 Terminology and Notation 9 1.8 Road Maps to This Book 11 Order of Topics 11 CHAPTER 2 Overview of the Data Mining Process 15 2.1 Introduction 15 2.2 Core Ideas in Data Mining 16 Classification 16 Prediction 16 Association Rules and Recommendation Systems 16 Predictive Analytics 17 Data Reduction and Dimension Reduction 17 Data Exploration and Visualization 17 Supervised and Unsupervised Learning 18 2.3 The Steps in Data Mining 19 2.4 Preliminary Steps 21 Organization of Datasets 21 Predicting Home Values in the West Roxbury Neighborhood 21 Loading and Looking at the Data in R 22 Sampling from a Database 24 Oversampling Rare Events in Classification Tasks 25 Preprocessing and Cleaning the Data 26 2.5 Predictive Power and Overfitting 33 Overfitting 33 Creation and Use of Data Partitions 35 2.6 Building a Predictive Model 38 Modeling Process 39 2.7 Using R for Data Mining on a Local Machine 43 2.8 Automating Data Mining Solutions 43 Data Mining Software: The State of the Market (by Herb Edelstein) 45 Problems 49 PART II DATA EXPLORATION AND DIMENSION REDUCTION CHAPTER 3 Data Visualization 55 3.1 Uses of Data Visualization 55 Base R or ggplot? 
57 3.2 Data Examples 57 Example 1: Boston Housing Data 57 Example 2: Ridership on Amtrak Trains 59 3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots 59 Distribution Plots: Boxplots and Histograms 61 Heatmaps: Visualizing Correlations and Missing Values 64 3.4 Multidimensional Visualization 67 Adding Variables: Color, Size, Shape, Multiple Panels, and Animation 67 Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering 70 Reference: Trend Lines and Labels 74 Scaling up to Large Datasets 74 Multivariate Plot: Parallel Coordinates Plot 75 Interactive Visualization 77 3.5 Specialized Visualizations 80 Visualizing Networked Data 80 Visualizing Hierarchical Data: Treemaps 82 Visualizing Geographical Data: Map Charts 83 3.6 Summary: Major Visualizations and Operations, by Data Mining Goal 86 Prediction 86 Classification 86 Time Series Forecasting 86 Unsupervised Learning 87 Problems 88 CHAPTER 4 Dimension Reduction 91 4.1 Introduction 91 4.2 Curse of Dimensionality 92 4.3 Practical Considerations 92 Example 1: House Prices in Boston 93 4.4 Data Summaries 94 Summary Statistics 94 Aggregation and Pivot Tables 96 4.5 Correlation Analysis 97 4.6 Reducing the Number of Categories in Categorical Variables 99 4.7 Converting a Categorical Variable to a Numerical Variable 99 4.8 Principal Components Analysis 101 Example 2: Breakfast Cereals 101 Principal Components 106 Normalizing the Data 107 Using Principal Components for Classification and Prediction 109 4.9 Dimension Reduction Using Regression Models 111 4.10 Dimension Reduction Using Classification and Regression Trees 111 Problems 112 PART III PERFORMANCE EVALUATION CHAPTER 5 Evaluating Predictive Performance 117 5.1 Introduction 117 5.2 Evaluating Predictive Performance 118 Naive Benchmark: The Average 118 Prediction Accuracy Measures 119 Comparing Training and Validation Performance 121 Lift Chart 121 5.3 Judging Classifier Performance 122 Benchmark: The Naive Rule 124 Class Separation 124 The 
Confusion (Classification) Matrix 124 Using the Validation Data 126 Accuracy Measures 126 Propensities and Cutoff for Classification 127 Performance in Case of Unequal Importance of Classes 131 Asymmetric Misclassification Costs 133 Generalization to More Than Two Classes 135 5.4 Judging Ranking Performance 136 Lift Charts for Binary Data 136 Decile Lift Charts 138 Beyond Two Classes 139 Lift Charts Incorporating Costs and Benefits 139 Lift as a Function of Cutoff 140 5.5 Oversampling 140 Oversampling the Training Set 144 Evaluating Model Performance Using a Non-oversampled Validation Set 144 Evaluating Model Performance if Only Oversampled Validation Set Exists 144 Problems 147 PART IV PREDICTION AND CLASSIFICATION METHODS CHAPTER 6 Multiple Linear Regression 153 6.1 Introduction 153 6.2 Explanatory vs. Predictive Modeling 154 6.3 Estimating the Regression Equation and Prediction 156 Example: Predicting the Price of Used Toyota Corolla Cars 156 6.4 Variable Selection in Linear Regression 161 Reducing the Number of Predictors 161 How to Reduce the Number of Predictors 162 Problems 169 CHAPTER 7 k-Nearest Neighbors (kNN) 173 7.1 The k-NN Classifier (Categorical Outcome) 173 Determining Neighbors 173 Classification Rule 174 Example: Riding Mowers 175 Choosing k 176 Setting the Cutoff Value 179 k-NN with More Than Two Classes 180 Converting Categorical Variables to Binary Dummies 180 7.2 k-NN for a Numerical Outcome 180 7.3 Advantages and Shortcomings of k-NN Algorithms 182 Problems 184 CHAPTER 8 The Naive Bayes Classifier 187 8.1 Introduction 187 Cutoff Probability Method 188 Conditional Probability 188 Example 1: Predicting Fraudulent Financial Reporting 188 8.2 Applying the Full (Exact) Bayesian Classifier 189 Using the “Assign to the Most Probable Class” Method 190 Using the Cutoff Probability Method 190 Practical Difficulty with the Complete (Exact) Bayes Procedure 190 Solution: Naive Bayes 191 The Naive Bayes Assumption of Conditional Independence 192 Using the 
Cutoff Probability Method 192 Example 2: Predicting Fraudulent Financial Reports, Two Predictors 193 Example 3: Predicting Delayed Flights 194 8.3 Advantages and Shortcomings of the Naive Bayes Classifier 199 Problems 202 CHAPTER 9 Classification and Regression Trees 205 9.1 Introduction 205 9.2 Classification Trees 207 Recursive Partitioning 207 Example 1: Riding Mowers 207 Measures of Impurity 210 Tree Structure 214 Classifying a New Record 214 9.3 Evaluating the Performance of a Classification Tree 215 Example 2: Acceptance of Personal Loan 215 9.4 Avoiding Overfitting 216 Stopping Tree Growth: Conditional Inference Trees 221 Pruning the Tree 222 Cross-Validation 222 Best-Pruned Tree 224 9.5 Classification Rules from Trees 226 9.6 Classification Trees for More Than Two Classes 227 9.7 Regression Trees 227 Prediction 228 Measuring Impurity 228 Evaluating Performance 229 9.8 Improving Prediction: Ra … (more)
- Edition:
- 1st
- Publisher Details:
- Hoboken, New Jersey : John Wiley & Sons, Inc
- Publication Date:
- 2017
- Extent:
- 1 online resource
- Subjects:
- 658.05
Business -- Data processing
Data mining
R (Computer program language)
Business mathematics -- Computer programs
- Languages:
- English
- ISBNs:
- 9781118879337
9781118956632
- Related ISBNs:
- 9781118879368
- Notes:
- Note: Description based on CIP data; resource not viewed.
- Access Rights:
- Legal Deposit; Only available on premises controlled by the deposit library and to one user at any one time; The Legal Deposit Libraries (Non-Print Works) Regulations (UK).
- Access Usage:
- Restricted: Printing from this resource is governed by The Legal Deposit Libraries (Non-Print Works) Regulations (UK) and UK copyright law currently in force.
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD.DS.194581
- Ingest File:
- 02_231.xml