Introduction to data science : data analysis and prediction algorithms with R (2019)
- Record Type:
- Book
- Title:
- Introduction to data science : data analysis and prediction algorithms with R (2019)
- Main Title:
- Introduction to data science : data analysis and prediction algorithms with R
- Further Information:
- Note: Rafael A. Irizarry.
- Authors:
- Irizarry, Rafael A
- Contents:
- I R
1. Installing R and RStudio Installing R Installing RStudio
2. Getting Started with R and RStudio Why R? The R console Scripts RStudio The panes Key bindings Running commands while editing scripts Changing global options Installing R packages
3. R Basics Case study: US Gun Murders The very basics Objects The workspace Functions Other prebuilt objects Variable names Saving your workspace Motivating scripts Commenting your code Exercises Data types Data frames Examining an object The accessor: $ Vectors: numerics, characters, and logical Factors Lists Matrices Exercises Vectors Creating vectors Names Sequences Subsetting Coercion Not availables (NA) Exercises Sorting sort order max and which.max rank Beware of recycling Exercise Vector arithmetics Rescaling a vector Two vectors Exercises Indexing Subsetting with logicals Logical operators which match %in% Exercises Basic plots plot hist boxplot image Exercises
4. Programming basics Conditional expressions Defining functions Namespaces For-loops Vectorization and functionals Exercises
5. The tidyverse Tidy data Exercises Manipulating data frames Adding a column with mutate Subsetting with filter Selecting columns with select Exercises The pipe: %>% Exercises Summarizing data summarize pull Group then summarize with group_by Sorting data frames Nested sorting The top n Exercises Tibbles Tibbles display better Subsets of tibbles are tibbles Tibbles can have complex entries Tibbles can be grouped Create a tibble using tibble instead of data frame The dot operator do The purrr package Tidyverse conditionals Case when between Exercises
6. Importing data Paths and the working directory The filesystem Relative and full paths The working directory Generating path names Copying files using paths The readr and readxl packages readr readxl Exercises Downloading files R-base importing functions scan Text versus binary files Unicode versus ASCII Organizing Data with Spreadsheets Exercises
II Data Visualization
7. Introduction to data visualization
8. ggplot2 The components of a graph ggplot objects Geometries Aesthetic mappings Layers Tinkering with arguments Global versus local aesthetic mappings Scales Labels and titles Categories as colors Annotation, shapes, and adjustments Add-on packages Putting it all together Quick plots with qplot Grids of plots Exercises
9. Visualizing data distributions Variable types Case study: describing student heights Distribution function Cumulative distribution functions Histograms Smoothed density Interpreting the y-axis Densities permit stratification Exercises The normal distribution Standard units Quantile-quantile plots Percentiles Boxplots Stratification Case study: describing student heights (continued) Exercises ggplot2 geometries Barplots Histograms Density plots Boxplots QQ-plots Images Quick plots Exercises
10. Data visualization in practice Case study: new insights on poverty Hans Rosling’s quiz Scatterplots Faceting facet_wrap Fixed scales for better comparisons Time series plots Labels instead of legends Data transformations Log transformation Which base? Transform the values or the scale? Visualizing multimodal distributions Comparing multiple distributions with boxplots and ridge plots Boxplots Ridge plots Example: 1970 versus 2010 income distributions Accessing computed variables Weighted densities The ecological fallacy and importance of showing the data Logistic transformation Show the data
11. Data visualization principles Encoding data using visual cues Know when to include 0 Do not distort quantities Order categories by a meaningful value Show the data Ease comparisons Use common axes Align plots vertically to see horizontal changes and horizontally to see vertical changes Consider transformations Visual cues to be compared should be adjacent Use color Think of the color blind Plots for two variables Slope charts Bland-Altman plot Encoding a third variable Avoid pseudo-three-dimensional plots Avoid too many significant digits Know your audience Exercises Case study: impact of vaccines on battling infectious diseases Exercises
12. Robust summaries Outliers Median The inter quartile range (IQR) Tukey’s definition of an outlier Median absolute deviation Exercises Case study: self-reported student heights
III Statistics with R
13. Introduction to Statistics with R
14. Probability Discrete probability Relative frequency Notation Probability distributions Monte Carlo simulations for categorical data Setting the random seed With and without replacement Independence Conditional probabilities Addition and multiplication rules Multiplication rule Multiplication rule under independence Addition rule Combinations and permutations Monte Carlo example Examples Monty Hall problem Birthday problem …
- Edition:
- 1st
- Publisher Details:
- Boca Raton : Chapman & Hall/CRC
- Publication Date:
- 2019
- Extent:
- 1 online resource
- Subjects:
- 005.362
R (Computer program language)
Information visualization
Data mining
Statistics -- Data processing
Probabilities -- Data processing
Computer algorithms
Quantitative research
- Languages:
- English
- ISBNs:
- 9781000708035
9781000707731
9781000707885
9780429341830
- Related ISBNs:
- 9780367357986
- Notes:
- Note: Description based on CIP data; resource not viewed.
- Access Rights:
- Legal Deposit; Only available on premises controlled by the deposit library and to one user at any one time; The Legal Deposit Libraries (Non-Print Works) Regulations (UK).
- Access Usage:
- Restricted: Printing from this resource is governed by The Legal Deposit Libraries (Non-Print Works) Regulations (UK) and UK copyright law currently in force.
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD.DS.472314
- Ingest File:
- 02_620.xml