Python Data Science Essentials
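Book Information
| Author | Alberto Boschetti |
| Publisher | Packt Publishing |
| ISBN | 9781785287893 |
| Publication date | 2015-04-30 |
| Word count | 1.848 million |
| Category | Imported books, foreign-language original editions, computers, networking |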
Book Description
If you are an aspiring data scientist and you have at least a working knowledge of data analysis and Python, this book will get you started in data science. Data analysts with experience of R or MATLAB will also find the book to be a comprehensive reference to enhance their data manipulation and machine learning skills.
Table of Contents
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. First Steps
Introducing data science and Python
Installing Python
Python 2 or Python 3?
Step-by-step installation
A glance at the essential Python packages
NumPy
SciPy
pandas
Scikit-learn
IPython
Matplotlib
Statsmodels
Beautiful Soup
NetworkX
NLTK
Gensim
PyPy
The installation of packages
Package upgrades
Scientific distributions
Anaconda
Enthought Canopy
PythonXY
WinPython
Introducing IPython
The IPython Notebook
Datasets and code used in the book
Scikit-learn toy datasets
The MLdata.org public repository
LIBSVM data examples
Loading data directly from CSV or text files
Scikit-learn sample generators
Summary
2. Data Munging
The data science process
Data loading and preprocessing with pandas
Fast and easy data loading
Dealing with problematic data
Dealing with big datasets
Accessing other data formats
Data preprocessing
Data selection
Working with categorical and textual data
A special type of data – text
Data processing with NumPy
NumPy's n-dimensional array
The basics of NumPy ndarray objects
Creating NumPy arrays
From lists to unidimensional arrays
Controlling the memory size
Heterogeneous lists
From lists to multidimensional arrays
Resizing arrays
Arrays derived from NumPy functions
Getting an array directly from a file
Extracting data from pandas
NumPy fast operation and computations
Matrix operations
Slicing and indexing with NumPy arrays
Stacking NumPy arrays
Summary
3. The Data Science Pipeline
Introducing EDA
Feature creation
Dimensionality reduction
The covariance matrix
Principal Component Analysis (PCA)
A variation of PCA for big data – RandomizedPCA
Latent Factor Analysis (LFA)
Linear Discriminant Analysis (LDA)
Latent Semantical Analysis (LSA)
Independent Component Analysis (ICA)
Kernel PCA
Restricted Boltzmann Machine (RBM)
The detection and treatment of outliers
Univariate outlier detection
EllipticEnvelope
OneClassSVM
Scoring functions
Multilabel classification
Binary classification
Regression
Testing and validating
Cross-validation
Using cross-validation iterators
Sampling and bootstrapping
Hyper-parameters' optimization
Building custom scoring functions
Reducing the grid search runtime
Feature selection
Univariate selection
Recursive elimination
Stability and L1-based selection
Summary
4. Machine Learning
Linear and logistic regression
Naive Bayes
The k-Nearest Neighbors
Advanced nonlinear algorithms
SVM for classification
SVM for regression
Tuning SVM
Ensemble strategies
Pasting by random samples
Bagging with weak ensembles
Random Subspaces and Random Patches
Sequences of models – AdaBoost
Gradient tree boosting (GTB)
Dealing with big data
Creating some big datasets as examples
Scalability with volume
Keeping up with velocity
Dealing with variety
A quick overview of Stochastic Gradient Descent (SGD)
A peek into Natural Language Processing (NLP)
Word tokenization
Stemming
Word Tagging
Named Entity Recognition (NER)
Stopwords
A complete data science example – text classification
An overview of unsupervised learning
Summary
5. Social Network Analysis
Introduction to graph theory
Graph algorithms
Graph loading, dumping, and sampling
Summary
6. Visualization
Introducing the basics of matplotlib
Curve plotting
Using panels
Scatterplots
Histograms
Bar graphs
Image visualization
Selected graphical examples with pandas
Boxplots and histograms
Scatterplots
Parallel coordinates
Advanced data learning representation
Learning curves
Validation curves
Feature importance
GBT partial dependence plot
Summary
Index
