Available:
Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010207100 | QD75.4.C45 V37 2009 | Open Access Book | Book |
30000010231388 | QD75.4.C45 V37 2009 | Open Access Book | Book |
Summary
Using formal descriptions, graphical illustrations, practical examples, and R software tools, Introduction to Multivariate Statistical Analysis in Chemometrics presents simple yet thorough explanations of the most important multivariate statistical methods for analyzing chemical data. It covers principal component analysis, regression analysis, classification methods, and clustering.
Written by a chemometrician and a statistician, the book reflects both the practical approach of chemometrics and the more formal orientation of statistics. To make the statistical methods easier to understand, the authors apply them to real data examples from chemistry. They also examine the results of the different methods, comparing traditional approaches with their robust counterparts. In addition, the authors use the freely available R software environment to implement the methods, encouraging readers to work through the examples and adapt the procedures to their own problems.
Focusing on the practicality of the methods and the validity of the results, this book offers concise mathematical descriptions of many multivariate methods and employs graphical schemes to visualize key concepts. It effectively imparts a basic understanding of how to apply statistical methods to multivariate scientific data.
Table of Contents
Preface | p. ix |
Acknowledgments | p. xi |
Authors | p. xiii |
Chapter 1 Introduction | p. 1 |
1.1 Chemoinformatics-Chemometrics-Statistics | p. 1 |
1.2 This Book | p. 3 |
1.3 Historical Remarks about Chemometrics | p. 4 |
1.4 Bibliography | p. 6 |
1.5 Starting Examples | p. 8 |
1.5.1 Univariate versus Bivariate Classification | p. 8 |
1.5.2 Nitrogen Content of Cereals Computed from NIR Data | p. 9 |
1.5.3 Elemental Composition of Archaeological Glasses | p. 10 |
1.6 Univariate Statistics: A Reminder | p. 12 |
1.6.1 Empirical Distributions | p. 12 |
1.6.2 Theoretical Distributions | p. 16 |
1.6.3 Central Value | p. 19 |
1.6.4 Spread | p. 20 |
1.6.5 Statistical Tests | p. 22 |
References | p. 25 |
Chapter 2 Multivariate Data | p. 31 |
2.1 Definitions | p. 31 |
2.2 Basic Preprocessing | p. 33 |
2.2.1 Data Transformation | p. 34 |
2.2.2 Centering and Scaling | p. 35 |
2.2.3 Normalization | p. 36 |
2.2.4 Transformations for Compositional Data | p. 37 |
2.3 Covariance and Correlation | p. 38 |
2.3.1 Overview | p. 38 |
2.3.2 Estimating Covariance and Correlation | p. 40 |
2.4 Distances and Similarities | p. 44 |
2.5 Multivariate Outlier Identification | p. 47 |
2.6 Linear Latent Variables | p. 50 |
2.6.1 Overview | p. 50 |
2.6.2 Projection and Mapping | p. 51 |
2.6.3 Example | p. 53 |
2.7 Summary | p. 56 |
References | p. 58 |
Chapter 3 Principal Component Analysis | p. 59 |
3.1 Concepts | p. 59 |
3.2 Number of PCA Components | p. 63 |
3.3 Centering and Scaling | p. 64 |
3.4 Outliers and Data Distribution | p. 66 |
3.5 Robust PCA | p. 67 |
3.6 Algorithms for PCA | p. 69 |
3.6.1 Mathematics of PCA | p. 69 |
3.6.2 Jacobi Rotation | p. 71 |
3.6.3 Singular Value Decomposition | p. 72 |
3.6.4 NIPALS | p. 73 |
3.7 Evaluation and Diagnostics | p. 75 |
3.7.1 Cross Validation for Determination of the Number of Principal Components | p. 75 |
3.7.2 Explained Variance for Each Variable | p. 77 |
3.7.3 Diagnostic Plots | p. 78 |
3.8 Complementary Methods for Exploratory Data Analysis | p. 81 |
3.8.1 Factor Analysis | p. 82 |
3.8.2 Cluster Analysis and Dendrogram | p. 82 |
3.8.3 Kohonen Mapping | p. 84 |
3.8.4 Sammon's Nonlinear Mapping | p. 87 |
3.8.5 Multiway PCA | p. 89 |
3.9 Examples | p. 91 |
3.9.1 Tissue Samples from Human Mummies and Fatty Acid Concentrations | p. 91 |
3.9.2 Polycyclic Aromatic Hydrocarbons in Aerosol | p. 96 |
3.10 Summary | p. 99 |
References | p. 101 |
Chapter 4 Calibration | p. 103 |
4.1 Concepts | p. 103 |
4.2 Performance of Regression Models | p. 108 |
4.2.1 Overview | p. 108 |
4.2.2 Overfitting and Underfitting | p. 110 |
4.2.3 Performance Criteria | p. 112 |
4.2.4 Criteria for Models with Different Numbers of Variables | p. 114 |
4.2.5 Cross Validation | p. 115 |
4.2.6 Bootstrap | p. 118 |
4.3 Ordinary Least-Squares Regression | p. 119 |
4.3.1 Simple OLS | p. 119 |
4.3.2 Multiple OLS | p. 124 |
4.3.2.1 Confidence Intervals and Statistical Tests in OLS | p. 126 |
4.3.2.2 Hat Matrix and Full Cross Validation in OLS | p. 129 |
4.3.3 Multivariate OLS | p. 129 |
4.4 Robust Regression | p. 131 |
4.4.1 Overview | p. 131 |
4.4.2 Regression Diagnostics | p. 133 |
4.4.3 Practical Hints | p. 137 |
4.5 Variable Selection | p. 137 |
4.5.1 Overview | p. 137 |
4.5.2 Univariate and Bivariate Selection Methods | p. 139 |
4.5.3 Stepwise Selection Methods | p. 140 |
4.5.4 Best-Subset Regression | p. 141 |
4.5.5 Variable Selection Based on PCA or PLS Models | p. 143 |
4.5.6 Genetic Algorithms | p. 143 |
4.5.7 Cluster Analysis of Variables | p. 146 |
4.5.8 Example | p. 146 |
4.6 Principal Component Regression | p. 148 |
4.6.1 Overview | p. 148 |
4.6.2 Number of PCA Components | p. 150 |
4.7 Partial Least-Squares Regression | p. 150 |
4.7.1 Overview | p. 150 |
4.7.2 Mathematical Aspects | p. 154 |
4.7.3 Kernel Algorithm for PLS | p. 157 |
4.7.4 NIPALS Algorithm for PLS | p. 158 |
4.7.5 SIMPLS Algorithm for PLS | p. 160 |
4.7.6 Other Algorithms for PLS | p. 161 |
4.7.7 Robust PLS | p. 162 |
4.8 Related Methods | p. 163 |
4.8.1 Canonical Correlation Analysis | p. 163 |
4.8.2 Ridge and Lasso Regression | p. 166 |
4.8.3 Nonlinear Regression | p. 168 |
4.8.3.1 Basis Expansions | p. 168 |
4.8.3.2 Kernel Methods | p. 169 |
4.8.3.3 Regression Trees | p. 170 |
4.8.3.4 Artificial Neural Networks | p. 171 |
4.9 Examples | p. 172 |
4.9.1 GC Retention Indices of Polycyclic Aromatic Compounds | p. 172 |
4.9.1.1 Principal Component Regression | p. 173 |
4.9.1.2 Partial Least-Squares Regression | p. 177 |
4.9.1.3 Robust PLS | p. 178 |
4.9.1.4 Ridge Regression | p. 179 |
4.9.1.5 Lasso Regression | p. 181 |
4.9.1.6 Stepwise Regression | p. 182 |
4.9.1.7 Summary | p. 184 |
4.9.2 Cereal Data | p. 185 |
4.10 Summary | p. 188 |
References | p. 190 |
Chapter 5 Classification | p. 195 |
5.1 Concepts | p. 195 |
5.2 Linear Classification Methods | p. 197 |
5.2.1 Linear Discriminant Analysis | p. 197 |
5.2.1.1 Bayes Discriminant Analysis | p. 197 |
5.2.1.2 Fisher Discriminant Analysis | p. 200 |
5.2.1.3 Example | p. 204 |
5.2.2 Linear Regression for Discriminant Analysis | p. 205 |
5.2.2.1 Binary Classification | p. 205 |
5.2.2.2 Multicategory Classification with OLS | p. 206 |
5.2.2.3 Multicategory Classification with PLS | p. 207 |
5.2.3 Logistic Regression | p. 207 |
5.3 Kernel and Prototype Methods | p. 209 |
5.3.1 SIMCA | p. 209 |
5.3.2 Gaussian Mixture Models | p. 212 |
5.3.3 k-NN Classification | p. 214 |
5.4 Classification Trees | p. 217 |
5.5 Artificial Neural Networks | p. 221 |
5.6 Support Vector Machine | p. 223 |
5.7 Evaluation | p. 228 |
5.7.1 Principles and Misclassification Error | p. 228 |
5.7.2 Predictive Ability | p. 229 |
5.7.3 Confidence in Classification Answers | p. 230 |
5.8 Examples | p. 231 |
5.8.1 Origin of Glass Samples | p. 231 |
5.8.1.1 Linear Discriminant Analysis | p. 231 |
5.8.1.2 Logistic Regression | p. 233 |
5.8.1.3 Gaussian Mixture Models | p. 234 |
5.8.1.4 k-NN Methods | p. 235 |
5.8.1.5 Classification Trees | p. 236 |
5.8.1.6 Artificial Neural Networks | p. 237 |
5.8.1.7 Support Vector Machines | p. 238 |
5.8.1.8 Overall Comparison | p. 238 |
5.8.2 Recognition of Chemical Substructures from Mass Spectra | p. 240 |
5.9 Summary | p. 246 |
References | p. 247 |
Chapter 6 Cluster Analysis | p. 251 |
6.1 Concepts | p. 251 |
6.2 Distance and Similarity Measures | p. 254 |
6.3 Partitioning Methods | p. 260 |
6.4 Hierarchical Clustering Methods | p. 263 |
6.5 Fuzzy Clustering | p. 266 |
6.6 Model-Based Clustering | p. 267 |
6.7 Cluster Validity and Clustering Tendency Measures | p. 270 |
6.8 Examples | p. 272 |
6.8.1 Chemotaxonomy of Plants | p. 272 |
6.8.2 Glass Samples | p. 278 |
6.9 Summary | p. 279 |
References | p. 281 |
Chapter 7 Preprocessing | p. 283 |
7.1 Concepts | p. 283 |
7.2 Smoothing and Differentiation | p. 283 |
7.3 Multiplicative Signal Correction | p. 284 |
7.4 Mass Spectral Features | p. 287 |
7.4.1 Logarithmic Intensity Ratios | p. 288 |
7.4.2 Averaged Intensities of Mass Intervals | p. 288 |
7.4.3 Intensities Normalized to Local Intensity Sum | p. 288 |
7.4.4 Modulo-14 Summation | p. 289 |
7.4.5 Autocorrelation | p. 289 |
7.4.6 Spectra Type | p. 289 |
7.4.7 Example | p. 289 |
References | p. 291 |
Appendix 1 Symbols and Abbreviations | p. 293 |
Appendix 2 Matrix Algebra | p. 297 |
A.2.1 Definitions | p. 297 |
A.2.2 Addition and Subtraction of Matrices | p. 298 |
A.2.3 Multiplication of Vectors | p. 298 |
A.2.4 Multiplication of Matrices | p. 299 |
A.2.5 Matrix Inversion | p. 300 |
A.2.6 Eigenvectors | p. 301 |
A.2.7 Singular Value Decomposition | p. 302 |
References | p. 303 |
Appendix 3 Introduction to R | p. 305 |
A.3.1 General Information on R | p. 305 |
A.3.2 Installing R | p. 305 |
A.3.3 Starting R | p. 305 |
A.3.4 Working Directory | p. 306 |
A.3.5 Loading and Saving Data | p. 306 |
A.3.6 Important R Functions | p. 306 |
A.3.7 Operators and Basic Functions | p. 307 |
Mathematical and Logical Operators, Comparison | p. 307 |
Special Elements | p. 308 |
Mathematical Functions | p. 308 |
Matrix Manipulation | p. 308 |
Statistical Functions | p. 308 |
A.3.8 Data Types | p. 309 |
Missing Values | p. 309 |
A.3.9 Data Structures | p. 309 |
A.3.10 Selection and Extraction from Data Objects | p. 310 |
Examples for Creating Vectors | p. 310 |
Examples for Selecting Elements from a Vector or Factor | p. 310 |
Examples for Selecting Elements from a Matrix, Array, or Data Frame | p. 310 |
Examples for Selecting Elements from a List | p. 310 |
A.3.11 Generating and Saving Graphics | p. 311 |
Functions Relevant for Graphics | p. 311 |
Relevant Plot Parameters | p. 311 |
Statistical Graphics | p. 311 |
Saving Graphic Output | p. 311 |
References | p. 312 |
Index | p. 313 |