Title:
Analyzing linguistic data : a practical introduction to statistics using R
Personal Author:
R. H. Baayen
Publication Information:
New York, NY : Cambridge University Press, 2008
Physical Description:
xiii, 353 p. : ill. ; 26 cm.
ISBN:
9780521709187

Item Barcode:
30000010197246
Call Number:
P138.5 B33 2008
Material Type:
Open Access Book
Item Category 1:
Book

Summary

Statistical analysis is a useful skill for linguists and psycholinguists, allowing them to understand the quantitative structure of their data. This textbook provides a straightforward introduction to the statistical analysis of language. Designed for linguists with a non-mathematical background, it clearly introduces the basic principles and methods of statistical analysis, using 'R', the leading computational statistics programme. The reader is guided step-by-step through a range of real data sets, allowing them to analyse acoustic data, construct grammatical trees for a variety of languages, quantify register variation in corpus linguistics, and measure experimental data using state-of-the-art models. The visualization of data plays a key role, both in the initial stages of data exploration and later on when the reader is encouraged to criticize various models. Containing over 40 exercises with model answers, this book will be welcomed by all linguists wishing to learn more about working with and presenting quantitative data.


Author Notes

R. H. Baayen is Professor of Quantitative Linguistics at the University of Alberta, Edmonton.


Table of Contents

Preface
1 An introduction to R
1.1 R as a calculator
1.2 Getting data into and out of R
1.3 Accessing information in data frames
1.4 Operations on data frames
1.4.1 Sorting a data frame by one or more columns
1.4.2 Changing information in a data frame
1.4.3 Extracting contingency tables from data frames
1.4.4 Calculations on data frames
1.5 Session management
2 Graphical data exploration
2.1 Random variables
2.2 Visualizing single random variables
2.3 Visualizing two or more variables
2.4 Trellis graphics
3 Probability distributions
3.1 Distributions
3.2 Discrete distributions
3.3 Continuous distributions
3.3.1 The normal distribution
3.3.2 The t, F, and χ² distributions
4 Basic statistical methods
4.1 Tests for single vectors
4.1.1 Distribution tests
4.1.2 Tests for the mean
4.2 Tests for two independent vectors
4.2.1 Are the distributions the same?
4.2.2 Are the means the same?
4.2.3 Are the variances the same?
4.3 Paired vectors
4.3.1 Are the means or medians the same?
4.3.2 Functional relations: linear regression
4.3.3 What does the joint density look like?
4.4 A numerical vector and a factor: analysis of variance
4.4.1 Two numerical vectors and a factor: analysis of covariance
4.5 Two vectors with counts
4.6 A note on statistical significance
5 Clustering and classification
5.1 Clustering
5.1.1 Tables with measurements: principal components analysis
5.1.2 Tables with measurements: factor analysis
5.1.3 Tables with counts: correspondence analysis
5.1.4 Tables with distances: multidimensional scaling
5.1.5 Tables with distances: hierarchical cluster analysis
5.2 Classification
5.2.1 Classification trees
5.2.2 Discriminant analysis
5.2.3 Support vector machines
6 Regression modeling
6.1 Introduction
6.2 Ordinary least squares regression
6.2.1 Nonlinearities
6.2.2 Collinearity
6.2.3 Model criticism
6.2.4 Validation
6.3 Generalized linear models
6.3.1 Logistic regression
6.3.2 Ordinal logistic regression
6.4 Regression with breakpoints
6.5 Models for lexical richness
6.6 General considerations
7 Mixed models
7.1 Modeling data with fixed and random effects
7.2 A comparison with traditional analyses
7.2.1 Mixed-effects models and quasi-F
7.2.2 Mixed-effects models and Latin Square designs
7.2.3 Regression with subjects and items
7.3 Shrinkage in mixed-effects models
7.4 Generalized linear mixed models
7.5 Case studies
7.5.1 Primed lexical decision latencies for Dutch neologisms
7.5.2 Self-paced reading latencies for Dutch neologisms
7.5.3 Visual lexical decision latencies of Dutch eight-year-olds
7.5.4 Mixed-effects models in corpus linguistics
Appendix A Solutions to the exercises
Appendix B Overview of R functions
References
Index
Index of data sets
Index of R
Index of topics
Index of authors