Title:
DNA, words and models : statistics of exceptional words
Personal Author:
Publication Information:
Cambridge, UK : Cambridge University Press, 2005
Physical Description:
xx, 138 p. : ill. ; 24 cm.
ISBN:
9780521847292

Item Barcode:
30000010218789
Call Number:
QH438.4.M3 R62 2005
Material Type:
Open Access Book
Item Category 1:
Book

Summary

An important problem in computational biology is identifying short DNA sequences (mathematically, 'words') associated with a biological function. One approach is to determine whether a particular word behaves as expected by chance or is statistically significant, for example because of its frequency or location. This book introduces the mathematical and statistical ideas used in solving this so-called exceptional word problem. It begins with a detailed description of the principal models used in sequence analysis; Markovian models are central here because they capture compositional information about the sequence being analysed. An introduction follows to several statistical methods for finding words that are exceptional with respect to the chosen model. The second half of the book is illustrated with numerous examples drawn from the analysis of bacterial genomes, making it a practical guide for users who face real data and need to choose an appropriate procedure.
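
As a rough illustration of the kind of comparison the summary describes, the sketch below counts the occurrences of a word in a sequence and computes the count expected under a Bernoulli model (independent letters, with probabilities estimated from the sequence itself). It is a minimal sketch under stated assumptions, not the book's own procedure: the function names and the toy sequence are hypothetical, and judging significance would additionally require the distribution of the count under the chosen model, which is not computed here.

```python
from collections import Counter


def count_overlapping(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    size = len(word)
    return sum(1 for i in range(len(seq) - size + 1) if seq[i:i + size] == word)


def expected_count_bernoulli(seq, word):
    """Expected count of `word` if letters were independent.

    Letter probabilities are estimated by their observed frequencies
    in `seq` (an assumption of this sketch).
    """
    n = len(seq)
    freq = Counter(seq)
    prob = 1.0
    for letter in word:
        prob *= freq[letter] / n
    # There are n - len(word) + 1 possible start positions.
    return (n - len(word) + 1) * prob


if __name__ == "__main__":
    # Made-up toy sequence, for illustration only.
    seq = "ACGTGCTGGTGGACGTTAGCTGGTGGCA"
    word = "GCTGGTGG"  # the Chi site of E. coli
    observed = count_overlapping(seq, word)
    expected = expected_count_bernoulli(seq, word)
    print(f"observed={observed}, expected under Bernoulli model={expected:.6f}")
```

A large gap between the observed and expected counts only suggests that a word may be exceptional; the conclusion depends on the model, and a Markov model would use counts of shorter words rather than single letters.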


Table of Contents

List of figures
List of tables
Preface
Preliminary notions and notations
1 Introduction
1.1 The context
1.2 Randomness and models
1.3 A bit of biology
2 Simple models for biological sequences
2.1 Why a model?
2.2 Permutation model
2.3 Bernoulli model
3 Introduction to Markov chain models
3.1 Assumptions
3.2 Markov chain of order 1
3.3 Markov chain of order m
3.4 Estimation of the parameters
4 Taking heterogeneities into account
4.1 Phased chains
4.2 Piecewise homogeneous Markov chains
4.3 Translation conditional models
5 Statistical properties of word occurrences
5.1 Count
5.2 Positions and distances
5.3 Distribution along the sequence
6 Words with unexpected frequencies
6.1 Exact distribution and approximations
6.2 Influence of the model
6.3 Over-representation of Chi sites in E. coli and H. influenzae
6.4 Under-representation of palindromes of length 6 in E. coli and in the phage Lambda
7 Words with unexpected locations
7.1 Chi sites in the genome of H. influenzae
7.2 Distribution of palindromes in E. coli's genome
7.3 Detection of promoter sites in B. subtilis
The last word
References
Index