Data Analysis with R - Second Edition
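Book information
| Author | Tony Fischetti |
| Publisher | Packt Publishing |
| ISBN | 9781788397339 |
| Publication date | 2018-03-28 |
| Word count | approx. 709,000 |
| Category | Imported books, foreign-language original editions, computers, Internet |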
Book description
Learn, by example, the fundamentals of data analysis as well as several intermediate to advanced methods and techniques, ranging from classification and regression to Bayesian methods and MCMC, which can be put to immediate use.
About This Book
- Analyze your data using R, the most powerful statistical programming language
- Learn how to implement applied statistics using practical use cases
- Use popular R packages to work with unstructured and structured data
Who This Book Is For
Budding data scientists and data analysts who are new to the concept of data analysis, or who want to build efficient analytical models in R, will find this book useful. No prior exposure to data analysis is needed, although a fundamental understanding of the R programming language is required to get the best out of this book.
What You Will Learn
- Gain a thorough understanding of statistical reasoning and sampling theory
- Employ hypothesis testing to draw inferences from your data
- Learn Bayesian methods for estimating parameters
- Train regression, classification, and time series models
- Handle missing data gracefully using multiple imputation
- Identify and manage problematic data points
- Learn how to scale your analyses to larger data with Rcpp, data.table, dplyr, and parallelization
- Put best practices into effect to make your job easier and facilitate reproducibility
In Detail
Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data with real-world examples.
Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.
Style and approach
An easy-to-follow, step-by-step guide that will help you get to grips with real-world applications of data analysis with R.
Table of contents
Title Page
Copyright and Credits
Data Analysis with R Second Edition
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
RefresheR
Navigating the basics
Arithmetic and assignment
Logicals and characters
Flow of control
Getting help in R
Vectors
Subsetting
Vectorized functions
Advanced subsetting
Recycling
Functions
Matrices
Loading data into R
Working with packages
Exercises
Summary
The Shape of Data
Univariate data
Frequency distributions
Central tendency
Spread
Populations, samples, and estimation
Probability distributions
Visualization methods
Exercises
Summary
Describing Relationships
Multivariate data
Relationships between a categorical and continuous variable
Relationships between two categorical variables
The relationship between two continuous variables
Covariance
Correlation coefficients
Comparing multiple correlations
Visualization methods
Categorical and continuous variables
Two categorical variables
Two continuous variables
More than two continuous variables
Exercises
Summary
Probability
Basic probability
A tale of two interpretations
Sampling from distributions
Parameters
The binomial distribution
The normal distribution
The three-sigma rule and using z-tables
Exercises
Summary
Using Data To Reason About The World
Estimating means
The sampling distribution
Interval estimation
How did we get 1.96?
Smaller samples
Exercises
Summary
Testing Hypotheses
The null hypothesis significance testing framework
One and two-tailed tests
Errors in NHST
A warning about significance
A warning about p-values
Testing the mean of one sample
Assumptions of the one sample t-test
Testing two means
Assumptions of the independent samples t-test
Testing more than two means
Assumptions of ANOVA
Testing independence of proportions
What if my assumptions are unfounded?
Exercises
Summary
Bayesian Methods
The big idea behind Bayesian analysis
Choosing a prior
Who cares about coin flips
Enter MCMC – stage left
Using JAGS and runjags
Fitting distributions the Bayesian way
The Bayesian independent samples t-test
Exercises
Summary
The Bootstrap
What's... uhhh... the deal with the bootstrap?
Performing the bootstrap in R (more elegantly)
Confidence intervals
A one-sample test of means
Bootstrapping statistics other than the mean
Busting bootstrap myths
What have we left out?
Exercises
Summary
Predicting Continuous Variables
Linear models
Simple linear regression
Simple linear regression with a binary predictor
A word of warning
Multiple regression
Regression with a non-binary predictor
Kitchen sink regression
The bias-variance trade-off
Cross-validation
Striking a balance
Linear regression diagnostics
Second Anscombe relationship
Third Anscombe relationship
Fourth Anscombe relationship
Advanced topics
Exercises
Summary
Predicting Categorical Variables
k-Nearest neighbors
Using k-NN in R
Confusion matrices
Limitations of k-NN
Logistic regression
Generalized Linear Model (GLM)
Using logistic regression in R
Decision trees
Random forests
Choosing a classifier
The vertical decision boundary
The diagonal decision boundary
The crescent decision boundary
The circular decision boundary
Exercises
Summary
Predicting Changes with Time
What is a time series?
What is forecasting?
Uncertainty
Difficulties in forecasting
Creating and plotting time series
Components of time series
Time series decomposition
White noise
Autocorrelation
Smoothing
Simple exponential smoothing for forecasting
Accuracy assessment
Double exponential smoothing
Triple exponential smoothing
ETS and the state space model
Interventions for improvement
What we didn't cover
Citations for the climate change data
Exercises
Summary
Sources of Data
Relational databases
Why didn't we just do that in SQL?
Using JSON
XML
Other data formats
Online repositories
Exercises
Summary
Dealing with Missing Data
Analysis with missing data
Visualizing missing data
Types of missing data
So which one is it?
Unsophisticated methods for dealing with missing data
Complete case analysis
Pairwise deletion
Mean substitution
Hot deck imputation
Regression imputation
Stochastic regression imputation
Multiple imputation
So how does mice come up with the imputed values?
Methods of imputation
Multiple imputation in practice
Exercises
Summary
Dealing with Messy Data
Checking unsanitized data
Checking for out-of-bounds data
Checking the data type of a column
Checking for unexpected categories
Checking for outliers, entry errors, or unlikely data points
Chaining assertions
Regular expressions
What are regular expressions?
Getting started
Regex for data normalization
More normalization
Other tools for messy data
OpenRefine
Fuzzy matching
Exercises
Summary
Dealing with Large Data
Wait to optimize
Using a bigger and faster machine
Be smart about your code
Allocation of memory
Vectorization
Using optimized packages
Using another R implementation
Using parallelization
Getting started with parallel R
An example of (some) substance
Using Rcpp
Being smarter about your code
Exercises
Summary
Working with Popular R Packages
The data.table package
The i in DT[i, j, by]
What in the world are by reference semantics?
The j in DT[i, j, by]
Using both i and j
Using the by argument for grouping
Joining data tables
Reshaping, melting, and pivoting data
Using dplyr and tidyr to manipulate data
Functional programming as a main tidyverse principle
Loading data for use in dplyr
Manipulating rows
Selecting and renaming columns
Computing on columns
Grouping in dplyr
Joining data
Reshaping data with tidyr
Exercises
Summary
Reproducibility and Best Practices
R scripting
RStudio
Running R scripts
An example script
Scripting and reproducibility
R projects
Version control
Package version management
Communicating results
Exercises
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
