Title:
Neural networks and learning machines
Personal Author:
Haykin, Simon
Edition:
3rd ed.
Publication Information:
New Jersey : Prentice Hall, 2009
Physical Description:
xxx, 906 p. : ill. (some col.) ; 24 cm.
ISBN:
9780131471399
Available:
Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010204502 | QA76.87 H395 2009 | Open Access Book | Book |
Summary
For graduate-level neural network courses offered in the departments of Computer Engineering, Electrical Engineering, and Computer Science.
Neural Networks and Learning Machines, Third Edition is renowned for its thoroughness and readability. This well-organized and completely up-to-date text remains the most comprehensive treatment of neural networks from an engineering perspective, and is also ideal for professional engineers and research scientists. MATLAB codes used for the computer experiments in the text are available for download at http://www.pearsonhighered.com/haykin/. Refocused, revised, and renamed to reflect the duality of neural networks and learning machines, this edition recognizes that the subject matter is richer when these topics are studied together. Ideas drawn from neural networks and machine learning are hybridized to perform improved learning tasks beyond the capability of either independently.
Table of Contents
Preface | p. x |
Introduction | p. 1 |
1 What is a Neural Network? | p. 1 |
2 The Human Brain | p. 6 |
3 Models of a Neuron | p. 10 |
4 Neural Networks Viewed As Directed Graphs | p. 15 |
5 Feedback | p. 18 |
6 Network Architectures | p. 21 |
7 Knowledge Representation | p. 24 |
8 Learning Processes | p. 34 |
9 Learning Tasks | p. 38 |
10 Concluding Remarks | p. 45 |
Notes and References | p. 46 |
Chapter 1 Rosenblatt's Perceptron | p. 47 |
1.1 Introduction | p. 47 |
1.2 Perceptron | p. 48 |
1.3 The Perceptron Convergence Theorem | p. 50 |
1.4 Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment | p. 55 |
1.5 Computer Experiment: Pattern Classification | p. 60 |
1.6 The Batch Perceptron Algorithm | p. 62 |
1.7 Summary and Discussion | p. 65 |
Notes and References | p. 66 |
Problems | p. 66 |
Chapter 2 Model Building through Regression | p. 68 |
2.1 Introduction | p. 68 |
2.2 Linear Regression Model: Preliminary Considerations | p. 69 |
2.3 Maximum a Posteriori Estimation of the Parameter Vector | p. 71 |
2.4 Relationship Between Regularized Least-Squares Estimation and MAP Estimation | p. 76 |
2.5 Computer Experiment: Pattern Classification | p. 77 |
2.6 The Minimum-Description-Length Principle | p. 79 |
2.7 Finite Sample-Size Considerations | p. 82 |
2.8 The Instrumental-Variables Method | p. 86 |
2.9 Summary and Discussion | p. 88 |
Notes and References | p. 89 |
Problems | p. 89 |
Chapter 3 The Least-Mean-Square Algorithm | p. 91 |
3.1 Introduction | p. 91 |
3.2 Filtering Structure of the LMS Algorithm | p. 92 |
3.3 Unconstrained Optimization: A Review | p. 94 |
3.4 The Wiener Filter | p. 100 |
3.5 The Least-Mean-Square Algorithm | p. 102 |
3.6 Markov Model Portraying the Deviation of the LMS Algorithm from the Wiener Filter | p. 104 |
3.7 The Langevin Equation: Characterization of Brownian Motion | p. 106 |
3.8 Kushner's Direct-Averaging Method | p. 107 |
3.9 Statistical LMS Learning Theory for Small Learning-Rate Parameter | p. 108 |
3.10 Computer Experiment I: Linear Prediction | p. 110 |
3.11 Computer Experiment II: Pattern Classification | p. 112 |
3.12 Virtues and Limitations of the LMS Algorithm | p. 113 |
3.13 Learning-Rate Annealing Schedules | p. 115 |
3.14 Summary and Discussion | p. 117 |
Notes and References | p. 118 |
Problems | p. 119 |
Chapter 4 Multilayer Perceptrons | p. 122 |
4.1 Introduction | p. 123 |
4.2 Some Preliminaries | p. 124 |
4.3 Batch Learning and On-Line Learning | p. 126 |
4.4 The Back-Propagation Algorithm | p. 129 |
4.5 XOR Problem | p. 141 |
4.6 Heuristics for Making the Back-Propagation Algorithm Perform Better | p. 144 |
4.7 Computer Experiment: Pattern Classification | p. 150 |
4.8 Back Propagation and Differentiation | p. 153 |
4.9 The Hessian and Its Role in On-Line Learning | p. 155 |
4.10 Optimal Annealing and Adaptive Control of the Learning Rate | p. 157 |
4.11 Generalization | p. 164 |
4.12 Approximations of Functions | p. 166 |
4.13 Cross-Validation | p. 171 |
4.14 Complexity Regularization and Network Pruning | p. 175 |
4.15 Virtues and Limitations of Back-Propagation Learning | p. 180 |
4.16 Supervised Learning Viewed as an Optimization Problem | p. 186 |
4.17 Convolutional Networks | p. 201 |
4.18 Nonlinear Filtering | p. 203 |
4.19 Small-Scale Versus Large-Scale Learning Problems | p. 209 |
4.20 Summary and Discussion | p. 217 |
Notes and References | p. 219 |
Problems | p. 221 |
Chapter 5 Kernel Methods and Radial-Basis Function Networks | p. 230 |
5.1 Introduction | p. 230 |
5.2 Cover's Theorem on the Separability of Patterns | p. 231 |
5.3 The Interpolation Problem | p. 236 |
5.4 Radial-Basis-Function Networks | p. 239 |
5.5 K-Means Clustering | p. 242 |
5.6 Recursive Least-Squares Estimation of the Weight Vector | p. 245 |
5.7 Hybrid Learning Procedure for RBF Networks | p. 249 |
5.8 Computer Experiment: Pattern Classification | p. 250 |
5.9 Interpretations of the Gaussian Hidden Units | p. 252 |
5.10 Kernel Regression and Its Relation to RBF Networks | p. 255 |
5.11 Summary and Discussion | p. 259 |
Notes and References | p. 261 |
Problems | p. 263 |
Chapter 6 Support Vector Machines | p. 268 |
6.1 Introduction | p. 268 |
6.2 Optimal Hyperplane for Linearly Separable Patterns | p. 269 |
6.3 Optimal Hyperplane for Nonseparable Patterns | p. 276 |
6.4 The Support Vector Machine Viewed as a Kernel Machine | p. 281 |
6.5 Design of Support Vector Machines | p. 284 |
6.6 XOR Problem | p. 286 |
6.7 Computer Experiment: Pattern Classification | p. 289 |
6.8 Regression: Robustness Considerations | p. 289 |
6.9 Optimal Solution of the Linear Regression Problem | p. 293 |
6.10 The Representer Theorem and Related Issues | p. 296 |
6.11 Summary and Discussion | p. 302 |
Notes and References | p. 304 |
Problems | p. 307 |
Chapter 7 Regularization Theory | p. 313 |
7.1 Introduction | p. 313 |
7.2 Hadamard's Conditions for Well-Posedness | p. 314 |
7.3 Tikhonov's Regularization Theory | p. 315 |
7.4 Regularization Networks | p. 326 |
7.5 Generalized Radial-Basis-Function Networks | p. 327 |
7.6 The Regularized Least-Squares Estimator: Revisited | p. 331 |
7.7 Additional Notes of Interest on Regularization | p. 335 |
7.8 Estimation of the Regularization Parameter | p. 336 |
7.9 Semisupervised Learning | p. 342 |
7.10 Manifold Regularization: Preliminary Considerations | p. 343 |
7.11 Differentiable Manifolds | p. 345 |
7.12 Generalized Regularization Theory | p. 348 |
7.13 Spectral Graph Theory | p. 350 |
7.14 Generalized Representer Theorem | p. 352 |
7.15 Laplacian Regularized Least-Squares Algorithm | p. 354 |
7.16 Experiments on Pattern Classification Using Semisupervised Learning | p. 356 |
7.17 Summary and Discussion | p. 359 |
Notes and References | p. 361 |
Problems | p. 363 |
Chapter 8 Principal-Components Analysis | p. 367 |
8.1 Introduction | p. 367 |
8.2 Principles of Self-Organization | p. 368 |
8.3 Self-Organized Feature Analysis | p. 372 |
8.4 Principal-Components Analysis: Perturbation Theory | p. 373 |
8.5 Hebbian-Based Maximum Eigenfilter | p. 383 |
8.6 Hebbian-Based Principal-Components Analysis | p. 392 |
8.7 Case Study: Image Coding | p. 398 |
8.8 Kernel Principal-Components Analysis | p. 401 |
8.9 Basic Issues Involved in the Coding of Natural Images | p. 406 |
8.10 Kernel Hebbian Algorithm | p. 407 |
8.11 Summary and Discussion | p. 412 |
Notes and References | p. 415 |
Problems | p. 418 |
Chapter 9 Self-Organizing Maps | p. 425 |
9.1 Introduction | p. 425 |
9.2 Two Basic Feature-Mapping Models | p. 426 |
9.3 Self-Organizing Map | p. 428 |
9.4 Properties of the Feature Map | p. 437 |
9.5 Computer Experiment I: Disentangling Lattice Dynamics Using SOM | p. 445 |
9.6 Contextual Maps | p. 447 |
9.7 Hierarchical Vector Quantization | p. 450 |
9.8 Kernel Self-Organizing Map | p. 454 |
9.9 Computer Experiment II: Disentangling Lattice Dynamics Using Kernel SOM | p. 462 |
9.10 Relationship Between Kernel SOM and Kullback-Leibler Divergence | p. 464 |
9.11 Summary and Discussion | p. 466 |
Notes and References | p. 468 |
Problems | p. 470 |
Chapter 10 Information-Theoretic Learning Models | p. 475 |
10.1 Introduction | p. 476 |
10.2 Entropy | p. 477 |
10.3 Maximum-Entropy Principle | p. 481 |
10.4 Mutual Information | p. 484 |
10.5 Kullback-Leibler Divergence | p. 486 |
10.6 Copulas | p. 489 |
10.7 Mutual Information as an Objective Function to be Optimized | p. 493 |
10.8 Maximum Mutual Information Principle | p. 494 |
10.9 Infomax and Redundancy Reduction | p. 499 |
10.10 Spatially Coherent Features | p. 501 |
10.11 Spatially Incoherent Features | p. 504 |
10.12 Independent-Components Analysis | p. 508 |
10.13 Sparse Coding of Natural Images and Comparison with ICA Coding | p. 514 |
10.14 Natural-Gradient Learning for Independent-Components Analysis | p. 516 |
10.15 Maximum-Likelihood Estimation for Independent-Components Analysis | p. 526 |
10.16 Maximum-Entropy Learning for Blind Source Separation | p. 529 |
10.17 Maximization of Negentropy for Independent-Components Analysis | p. 534 |
10.18 Coherent Independent-Components Analysis | p. 541 |
10.19 Rate Distortion Theory and Information Bottleneck | p. 549 |
10.20 Optimal Manifold Representation of Data | p. 553 |
10.21 Computer Experiment: Pattern Classification | p. 560 |
10.22 Summary and Discussion | p. 561 |
Notes and References | p. 564 |
Problems | p. 572 |
Chapter 11 Stochastic Methods Rooted in Statistical Mechanics | p. 579 |
11.1 Introduction | p. 580 |
11.2 Statistical Mechanics | p. 580 |
11.3 Markov Chains | p. 582 |
11.4 Metropolis Algorithm | p. 591 |
11.5 Simulated Annealing | p. 594 |
11.6 Gibbs Sampling | p. 596 |
11.7 Boltzmann Machine | p. 598 |
11.8 Logistic Belief Nets | p. 604 |
11.9 Deep Belief Nets | p. 606 |
11.10 Deterministic Annealing | p. 610 |
11.11 Analogy of Deterministic Annealing with Expectation-Maximization Algorithm | p. 616 |
11.12 Summary and Discussion | p. 617 |
Notes and References | p. 619 |
Problems | p. 621 |
Chapter 12 Dynamic Programming | p. 627 |
12.1 Introduction | p. 627 |
12.2 Markov Decision Process | p. 629 |
12.3 Bellman's Optimality Criterion | p. 631 |
12.4 Policy Iteration | p. 635 |
12.5 Value Iteration | p. 637 |
12.6 Approximate Dynamic Programming: Direct Methods | p. 642 |
12.7 Temporal-Difference Learning | p. 643 |
12.8 Q-Learning | p. 648 |
12.9 Approximate Dynamic Programming: Indirect Methods | p. 652 |
12.10 Least-Squares Policy Evaluation | p. 655 |
12.11 Approximate Policy Iteration | p. 660 |
12.12 Summary and Discussion | p. 663 |
Notes and References | p. 665 |
Problems | p. 668 |
Chapter 13 Neurodynamics | p. 672 |
13.1 Introduction | p. 672 |
13.2 Dynamic Systems | p. 674 |
13.3 Stability of Equilibrium States | p. 678 |
13.4 Attractors | p. 684 |
13.5 Neurodynamic Models | p. 686 |
13.6 Manipulation of Attractors as a Recurrent Network Paradigm | p. 689 |
13.7 Hopfield Model | p. 690 |
13.8 The Cohen-Grossberg Theorem | p. 703 |
13.9 Brain-State-In-A-Box Model | p. 705 |
13.10 Strange Attractors and Chaos | p. 711 |
13.11 Dynamic Reconstruction of a Chaotic Process | p. 716 |
13.12 Summary and Discussion | p. 722 |
Notes and References | p. 724 |
Problems | p. 727 |
Chapter 14 Bayesian Filtering for State Estimation of Dynamic Systems | p. 731 |
14.1 Introduction | p. 731 |
14.2 State-Space Models | p. 732 |
14.3 Kalman Filters | p. 736 |
14.4 The Divergence Phenomenon and Square-Root Filtering | p. 744 |
14.5 The Extended Kalman Filter | p. 750 |
14.6 The Bayesian Filter | p. 755 |
14.7 Cubature Kalman Filter: Building on the Kalman Filter | p. 759 |
14.8 Particle Filters | p. 765 |
14.9 Computer Experiment: Comparative Evaluation of Extended Kalman and Particle Filters | p. 775 |
14.10 Kalman Filtering in Modeling of Brain Functions | p. 777 |
14.11 Summary and Discussion | p. 780 |
Notes and References | p. 782 |
Problems | p. 784 |
Chapter 15 Dynamically Driven Recurrent Networks | p. 790 |
15.1 Introduction | p. 790 |
15.2 Recurrent Network Architectures | p. 791 |
15.3 Universal Approximation Theorem | p. 797 |
15.4 Controllability and Observability | p. 799 |
15.5 Computational Power of Recurrent Networks | p. 804 |
15.6 Learning Algorithms | p. 806 |
15.7 Back Propagation Through Time | p. 808 |
15.8 Real-Time Recurrent Learning | p. 812 |
15.9 Vanishing Gradients in Recurrent Networks | p. 818 |
15.10 Supervised Training Framework for Recurrent Networks Using Nonlinear Sequential State Estimators | p. 822 |
15.11 Computer Experiment: Dynamic Reconstruction of Mackey-Glass Attractor | p. 829 |
15.12 Adaptivity Considerations | p. 831 |
15.13 Case Study: Model Reference Applied to Neurocontrol | p. 833 |
15.14 Summary and Discussion | p. 835 |
Notes and References | p. 839 |
Problems | p. 842 |
Bibliography | p. 845 |
Index | p. 889 |