A data scientist's guide to acquiring, cleaning and managing data in R. ([2018])
- Record Type:
- Book
- Title:
- A data scientist's guide to acquiring, cleaning and managing data in R. ([2018])
- Main Title:
- A data scientist's guide to acquiring, cleaning and managing data in R
- Further Information:
- Note: By Samuel Buttrey, Lyn R. Whitaker.
- Authors:
- Buttrey, Samuel
Whitaker, Lyn R
- Contents:
- About the Authors xv Preface xvii Acknowledgments xix About the Companion Website xxi 1 R 1 1.1 Introduction 1 1.1.1 What Is R? 1 1.1.2 Who Uses R and Why? 
2 1.1.3 Acquiring and Installing R 2 1.1.4 Starting and Quitting R 3 1.2 Data 3 1.2.1 Acquiring Data 3 1.2.2 Cleaning Data 4 1.2.3 The Goal of Data Cleaning 4 1.2.4 Making Your Work Reproducible 5 1.3 The Very Basics of R 5 1.3.1 Top Ten Quick Facts You Need to Know about R 5 1.3.2 Vocabulary 8 1.3.3 Calculating and Printing in R 11 1.4 Running an R Session 12 1.4.1 Where Your Data Is Stored 13 1.4.2 Options 13 1.4.3 Scripts 14 1.4.4 R Packages 14 1.4.5 RStudio and Other GUIs 15 1.4.6 Locales and Character Sets 15 1.5 Getting Help 16 1.5.1 At the Command Line 16 1.5.2 The Online Manuals 16 1.5.3 On the Internet 17 1.5.4 Further Reading 17 1.6 How to Use This Book 17 1.6.1 Syntax and Conventions in This Book 17 1.6.2 The Chapters 18 2 R Data, Part 1: Vectors 21 2.1 Vectors 21 2.1.1 Creating Vectors 21 2.1.2 Sequences 22 2.1.3 Logical Vectors 23 2.1.4 Vector Operations 24 2.1.5 Names 27 2.2 Data Types 27 2.2.1 Some Less-Common Data Types 28 2.2.2 What Type of Vector Is This? 28 2.2.3 Converting from One Type to Another 29 2.3 Subsets of Vectors 31 2.3.1 Extracting 31 2.3.2 Vectors of Length 0 34 2.3.3 Assigning or Replacing Elements of a Vector 35 2.4 Missing Data (NA) and Other Special Values 36 2.4.1 The Effect of NAs in Expressions 37 2.4.2 Identifying and Removing or Replacing NAs 37 2.4.3 Indexing with NAs 39 2.4.4 NaN and Inf Values 40 2.4.5 NULL Values 40 2.5 The table() Function 40 2.5.1 Two- and Higher-Way Tables 42 2.5.2 Operating on Elements of a Table 42 2.6 Other Actions on Vectors 45 2.6.1 Rounding 45 2.6.2 Sorting and Ordering 45 2.6.3 Vectors as Sets 46 2.6.4 Identifying Duplicates and Matching 47 2.6.5 Finding Runs of Duplicate Values 49 2.7 Long Vectors and Big Data 50 2.8 Chapter Summary and Critical Data Handling Tools 50 3 R Data, Part 2: More Complicated Structures 53 3.1 Introduction 53 3.2 Matrices 53 3.2.1 Extracting and Assigning 54 3.2.2 Row and Column Names 56 3.2.3 Applying a Function to Rows or Columns 57 3.2.4 Missing Values in Matrices 59 
3.2.5 Using a Matrix Subscript 60 3.2.6 Sparse Matrices 61 3.2.7 Three- and Higher-Way Arrays 62 3.3 Lists 62 3.3.1 Extracting and Assigning 64 3.3.2 Lists in Practice 65 3.4 Data Frames 67 3.4.1 Missing Values in Data Frames 69 3.4.2 Extracting and Assigning in Data Frames 69 3.4.3 Extracting Things That Aren’t There 72 3.5 Operating on Lists and Data Frames 74 3.5.1 Split, Apply, Combine 75 3.5.2 All-Numeric Data Frames 77 3.5.3 Convenience Functions 78 3.5.4 Re-Ordering, De-Duplicating, and Sampling from Data Frames 79 3.6 Date and Time Objects 80 3.6.1 Formatting Dates 80 3.6.2 Common Operations on Date Objects 82 3.6.3 Differences between Dates 83 3.6.4 Dates and Times 83 3.6.5 Creating POSIXt Objects 85 3.6.6 Mathematical Functions for Date and Times 86 3.6.7 Missing Values in Dates 88 3.6.8 Using Apply Functions with Dates and Times 89 3.7 Other Actions on Data Frames 90 3.7.1 Combining by Rows or Columns 90 3.7.2 Merging Data Frames 91 3.7.3 Comparing Two Data Frames 94 3.7.4 Viewing and Editing Data Frames Interactively 94 3.8 Handling Big Data 94 3.9 Chapter Summary and Critical Data Handling Tools 96 4 R Data, Part 3: Text and Factors 99 4.1 Character Data 100 4.1.1 The length() and nchar() Functions 100 4.1.2 Tab, New-Line, Quote, and Backslash Characters 100 4.1.3 The Empty String 101 4.1.4 Substrings 102 4.1.5 Changing Case and Other Substitutions 103 4.2 Converting Numbers into Text 103 4.2.2 Scientific Notation 106 4.2.3 Discretizing a Numeric Variable 107 4.3 Constructing Character Strings: Paste in Action 109 4.3.1 Constructing Column Names 109 4.3.2 Tabulating Dates by Year and Month or Quarter Labels 111 4.3.3 Constructing Unique Keys 112 4.3.4 Constructing File and Path Names 112 4.4 Regular Expressions 112 4.4.1 Types of Regular Expressions 113 4.4.2 Tools for Regular Expressions in R 113 4.4.3 Special Characters in Regular Expressions 114 4.4.4 Examples 114 4.4.5 The regexpr() Function and Its Variants 121 4.4.6 Using Regular Expressions in 
Replacement 123 4.4.7 Splitting Strings at Regular Expressions 124 4.4.8 Regular Expressions versus Wildcard Matching 125 4.4.9 Common Data Cleaning Tasks Using Regular Expressions 126 4.4.10 Documenting and Debugging Regular Expressions 127 4.5 UTF-8 and Other Non-ASCII Characters 128 4.5.1 Extended ASCII for Latin Alphabets 128 4.5.2 Non-Latin Alphabets 129 4.5.3 Character and String Encoding in R 130 4.6 Factors 131 4.6.1 What Is a Factor? 131 4.6.2 Factor Levels 132 4.6.3 Converting and Combining Factors 134 4.6.4 Missing Values in Factors 136 4.6.5 Factors in Data Frames 137 4.7 R Object Names and Commands as Text 137 4.7.1 R Object Names as Text 137 4.7.2 R Commands as Text 138 4.8 Chapter Summary and Critical Data Handling Tools 140 5 Writing Functions and Scripts 143 5.1 Functions 143 5.1.1 Function Arguments 144 5.1.2 Global versus Local Variables 148 5.1.3 Return Values 149 5.1.4 Creating and Editing Functions 151 5.2 Scripts and Shell Scripts 153 5.2.1 Line-by-Line Parsing 155 5.3 Error Handling and Debugging 156 5.3.1 Debugging Functions 156 5.3.2 Issuing Error and Warning Messages 158 5.3.3 Catching and Processing Errors 159 5.4 Interacting with the Operating System 161 5.4.1 File and Directory Handling 162 5.4.2 Environment Variables 162 5.5 Speeding Things Up 163 5.5.1 Profiling 163 5.5.2 Vectorizing Functions 164 5.5.3 Other Techniques to Speed Things Up 165 5.6 Chapter Summary and Critical Data Handling Tools 167 5.6.1 Programming Style 168 5.6.2 Common Bugs 169 5.6.3 Objects, Classes, and Methods 170 6 Getting Data into and out of R 171 6.1 Reading Tabular ASCII Data into Data Frames 171 6.1.1 Files with Delimiters 172 6.1.2 Column Classes 173 6.1.3 Common Pitfalls in Reading Tables 175 6.1.4 An Example of When read.table() Fails 177 6.1.5 Other Uses of the scan() Function 181 6.1.6 Writing Delimited Files 182 6.1.7 Reading and Writing Fixed-Width Files 183 6.1.8 A Note on End-of-Line Characters 183 6.2 Reading Large, Non-Tabular, or Non-ASCII Data 
184 6.2.1 Opening and Closing Files 184 6.2.2 Reading and Writing Lines 185 6.2.3 Reading and Writing UTF-8 and Other Encodings 187 6.2.4 The Null Character 187 6.2.5 Binary Data 188 6.2.6 Reading Problem Files in Action 190 6.3 Reading Da … (more)
- Publisher Details:
- Hoboken, NJ, USA : Wiley
- Publication Date:
- 2018
- Copyright Date:
- 2018
- Extent:
- 1 online resource
- Subjects:
- 005.74/3
Database design -- Computer programs
Data editing -- Computer programs
Input design, Computer
Electronic data processing -- Data preparation
Database management
R (Computer program language)
COMPUTERS / Databases / General
Electronic books
- Languages:
- English
- ISBNs:
- 9781119080060
1119080061
- Related ISBNs:
- 9781119080077
111908007X
9781119080022
1119080029
- Notes:
- Note: Online resource; title from PDF title page (EBSCO, viewed October 31, 2017)
- Access Rights:
- Legal Deposit; Only available on premises controlled by the deposit library and to one user at any one time; The Legal Deposit Libraries (Non-Print Works) Regulations (UK).
- Access Usage:
- Restricted: Printing from this resource is governed by The Legal Deposit Libraries (Non-Print Works) Regulations (UK) and UK copyright law currently in force.
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD.DS.209454
- Ingest File:
- 01_143.xml