Title:
Neural networks and learning machines
Personal Author:
Haykin, Simon
Edition:
3rd ed.
Publication Information:
New Jersey : Prentice Hall, 2009
Physical Description:
xxx, 906 p. : ill. (some col.) ; 24 cm.
ISBN:
9780131471399
Available:
Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010204502 | QA76.87 H395 2009 | Open Access Book | Book |
Summary
For graduate-level neural network courses offered in the departments of Computer Engineering, Electrical Engineering, and Computer Science.
Neural Networks and Learning Machines, Third Edition is renowned for its thoroughness and readability. This well-organized and completely up-to-date text remains the most comprehensive treatment of neural networks from an engineering perspective, and is also ideal for professional engineers and research scientists. MATLAB codes used for the computer experiments in the text are available for download at http://www.pearsonhighered.com/haykin/. Refocused, revised, and renamed to reflect the duality of neural networks and learning machines, this edition recognizes that the subject matter is richer when these topics are studied together. Ideas drawn from neural networks and machine learning are hybridized to perform improved learning tasks beyond the capability of either independently.
Table of Contents
Preface | p. x |
Introduction | p. 1 |
1 What is a Neural Network? | p. 1 |
2 The Human Brain | p. 6 |
3 Models of a Neuron | p. 10 |
4 Neural Networks Viewed As Directed Graphs | p. 15 |
5 Feedback | p. 18 |
6 Network Architectures | p. 21 |
7 Knowledge Representation | p. 24 |
8 Learning Processes | p. 34 |
9 Learning Tasks | p. 38 |
10 Concluding Remarks | p. 45 |
Notes and References | p. 46 |
Chapter 1 Rosenblatt's Perceptron | p. 47 |
1.1 Introduction | p. 47 |
1.2 Perceptron | p. 48 |
1.3 The Perceptron Convergence Theorem | p. 50 |
1.4 Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment | p. 55 |
1.5 Computer Experiment: Pattern Classification | p. 60 |
1.6 The Batch Perceptron Algorithm | p. 62 |
1.7 Summary and Discussion | p. 65 |
Notes and References | p. 66 |
Problems | p. 66 |
Chapter 2 Model Building through Regression | p. 68 |
2.1 Introduction | p. 68 |
2.2 Linear Regression Model: Preliminary Considerations | p. 69 |
2.3 Maximum a Posteriori Estimation of the Parameter Vector | p. 71 |
2.4 Relationship Between Regularized Least-Squares Estimation and MAP Estimation | p. 76 |
2.5 Computer Experiment: Pattern Classification | p. 77 |
2.6 The Minimum-Description-Length Principle | p. 79 |
2.7 Finite Sample-Size Considerations | p. 82 |
2.8 The Instrumental-Variables Method | p. 86 |
2.9 Summary and Discussion | p. 88 |
Notes and References | p. 89 |
Problems | p. 89 |
Chapter 3 The Least-Mean-Square Algorithm | p. 91 |
3.1 Introduction | p. 91 |
3.2 Filtering Structure of the LMS Algorithm | p. 92 |
3.3 Unconstrained Optimization: A Review | p. 94 |
3.4 The Wiener Filter | p. 100 |
3.5 The Least-Mean-Square Algorithm | p. 102 |
3.6 Markov Model Portraying the Deviation of the LMS Algorithm from the Wiener Filter | p. 104 |
3.7 The Langevin Equation: Characterization of Brownian Motion | p. 106 |
3.8 Kushner's Direct-Averaging Method | p. 107 |
3.9 Statistical LMS Learning Theory for Small Learning-Rate Parameter | p. 108 |
3.10 Computer Experiment I: Linear Prediction | p. 110 |
3.11 Computer Experiment II: Pattern Classification | p. 112 |
3.12 Virtues and Limitations of the LMS Algorithm | p. 113 |
3.13 Learning-Rate Annealing Schedules | p. 115 |
3.14 Summary and Discussion | p. 117 |
Notes and References | p. 118 |
Problems | p. 119 |
Chapter 4 Multilayer Perceptrons | p. 122 |
4.1 Introduction | p. 123 |
4.2 Some Preliminaries | p. 124 |
4.3 Batch Learning and On-Line Learning | p. 126 |
4.4 The Back-Propagation Algorithm | p. 129 |
4.5 XOR Problem | p. 141 |
4.6 Heuristics for Making the Back-Propagation Algorithm Perform Better | p. 144 |
4.7 Computer Experiment: Pattern Classification | p. 150 |
4.8 Back Propagation and Differentiation | p. 153 |
4.9 The Hessian and Its Role in On-Line Learning | p. 155 |
4.10 Optimal Annealing and Adaptive Control of the Learning Rate | p. 157 |
4.11 Generalization | p. 164 |
4.12 Approximations of Functions | p. 166 |
4.13 Cross-Validation | p. 171 |
4.14 Complexity Regularization and Network Pruning | p. 175 |
4.15 Virtues and Limitations of Back-Propagation Learning | p. 180 |
4.16 Supervised Learning Viewed as an Optimization Problem | p. 186 |
4.17 Convolutional Networks | p. 201 |
4.18 Nonlinear Filtering | p. 203 |
4.19 Small-Scale Versus Large-Scale Learning Problems | p. 209 |
4.20 Summary and Discussion | p. 217 |
Notes and References | p. 219 |
Problems | p. 221 |
Chapter 5 Kernel Methods and Radial-Basis Function Networks | p. 230 |
5.1 Introduction | p. 230 |
5.2 Cover's Theorem on the Separability of Patterns | p. 231 |
5.3 The Interpolation Problem | p. 236 |
5.4 Radial-Basis-Function Networks | p. 239 |
5.5 K-Means Clustering | p. 242 |
5.6 Recursive Least-Squares Estimation of the Weight Vector | p. 245 |
5.7 Hybrid Learning Procedure for RBF Networks | p. 249 |
5.8 Computer Experiment: Pattern Classification | p. 250 |
5.9 Interpretations of the Gaussian Hidden Units | p. 252 |
5.10 Kernel Regression and Its Relation to RBF Networks | p. 255 |
5.11 Summary and Discussion | p. 259 |
Notes and References | p. 261 |
Problems | p. 263 |
Chapter 6 Support Vector Machines | p. 268 |
6.1 Introduction | p. 268 |
6.2 Optimal Hyperplane for Linearly Separable Patterns | p. 269 |
6.3 Optimal Hyperplane for Nonseparable Patterns | p. 276 |
6.4 The Support Vector Machine Viewed as a Kernel Machine | p. 281 |
6.5 Design of Support Vector Machines | p. 284 |
6.6 XOR Problem | p. 286 |
6.7 Computer Experiment: Pattern Classification | p. 289 |
6.8 Regression: Robustness Considerations | p. 289 |
6.9 Optimal Solution of the Linear Regression Problem | p. 293 |
6.10 The Representer Theorem and Related Issues | p. 296 |
6.11 Summary and Discussion | p. 302 |
Notes and References | p. 304 |
Problems | p. 307 |
Chapter 7 Regularization Theory | p. 313 |
7.1 Introduction | p. 313 |
7.2 Hadamard's Conditions for Well-Posedness | p. 314 |
7.3 Tikhonov's Regularization Theory | p. 315 |
7.4 Regularization Networks | p. 326 |
7.5 Generalized Radial-Basis-Function Networks | p. 327 |
7.6 The Regularized Least-Squares Estimator: Revisited | p. 331 |
7.7 Additional Notes of Interest on Regularization | p. 335 |
7.8 Estimation of the Regularization Parameter | p. 336 |
7.9 Semisupervised Learning | p. 342 |
7.10 Manifold Regularization: Preliminary Considerations | p. 343 |
7.11 Differentiable Manifolds | p. 345 |
7.12 Generalized Regularization Theory | p. 348 |
7.13 Spectral Graph Theory | p. 350 |
7.14 Generalized Representer Theorem | p. 352 |
7.15 Laplacian Regularized Least-Squares Algorithm | p. 354 |
7.16 Experiments on Pattern Classification Using Semisupervised Learning | p. 356 |
7.17 Summary and Discussion | p. 359 |
Notes and References | p. 361 |
Problems | p. 363 |
Chapter 8 Principal-Components Analysis | p. 367 |
8.1 Introduction | p. 367 |
8.2 Principles of Self-Organization | p. 368 |
8.3 Self-Organized Feature Analysis | p. 372 |
8.4 Principal-Components Analysis: Perturbation Theory | p. 373 |
8.5 Hebbian-Based Maximum Eigenfilter | p. 383 |
8.6 Hebbian-Based Principal-Components Analysis | p. 392 |
8.7 Case Study: Image Coding | p. 398 |
8.8 Kernel Principal-Components Analysis | p. 401 |
8.9 Basic Issues Involved in the Coding of Natural Images | p. 406 |
8.10 Kernel Hebbian Algorithm | p. 407 |
8.11 Summary and Discussion | p. 412 |
Notes and References | p. 415 |
Problems | p. 418 |
Chapter 9 Self-Organizing Maps | p. 425 |
9.1 Introduction | p. 425 |
9.2 Two Basic Feature-Mapping Models | p. 426 |
9.3 Self-Organizing Map | p. 428 |
9.4 Properties of the Feature Map | p. 437 |
9.5 Computer Experiment I: Disentangling Lattice Dynamics Using SOM | p. 445 |
9.6 Contextual Maps | p. 447 |
9.7 Hierarchical Vector Quantization | p. 450 |
9.8 Kernel Self-Organizing Map | p. 454 |
9.9 Computer Experiment II: Disentangling Lattice Dynamics Using Kernel SOM | p. 462 |
9.10 Relationship Between Kernel SOM and Kullback-Leibler Divergence | p. 464 |
9.11 Summary and Discussion | p. 466 |
Notes and References | p. 468 |
Problems | p. 470 |
Chapter 10 Information-Theoretic Learning Models | p. 475 |
10.1 Introduction | p. 476 |
10.2 Entropy | p. 477 |
10.3 Maximum-Entropy Principle | p. 481 |
10.4 Mutual Information | p. 484 |
10.5 Kullback-Leibler Divergence | p. 486 |
10.6 Copulas | p. 489 |
10.7 Mutual Information as an Objective Function to be Optimized | p. 493 |
10.8 Maximum Mutual Information Principle | p. 494 |
10.9 Infomax and Redundancy Reduction | p. 499 |
10.10 Spatially Coherent Features | p. 501 |
10.11 Spatially Incoherent Features | p. 504 |
10.12 Independent-Components Analysis | p. 508 |
10.13 Sparse Coding of Natural Images and Comparison with ICA Coding | p. 514 |
10.14 Natural-Gradient Learning for Independent-Components Analysis | p. 516 |
10.15 Maximum-Likelihood Estimation for Independent-Components Analysis | p. 526 |
10.16 Maximum-Entropy Learning for Blind Source Separation | p. 529 |
10.17 Maximization of Negentropy for Independent-Components Analysis | p. 534 |
10.18 Coherent Independent-Components Analysis | p. 541 |
10.19 Rate Distortion Theory and Information Bottleneck | p. 549 |
10.20 Optimal Manifold Representation of Data | p. 553 |
10.21 Computer Experiment: Pattern Classification | p. 560 |
10.22 Summary and Discussion | p. 561 |
Notes and References | p. 564 |
Problems | p. 572 |
Chapter 11 Stochastic Methods Rooted in Statistical Mechanics | p. 579 |
11.1 Introduction | p. 580 |
11.2 Statistical Mechanics | p. 580 |
11.3 Markov Chains | p. 582 |
11.4 Metropolis Algorithm | p. 591 |
11.5 Simulated Annealing | p. 594 |
11.6 Gibbs Sampling | p. 596 |
11.7 Boltzmann Machine | p. 598 |
11.8 Logistic Belief Nets | p. 604 |
11.9 Deep Belief Nets | p. 606 |
11.10 Deterministic Annealing | p. 610 |
11.11 Analogy of Deterministic Annealing with Expectation-Maximization Algorithm | p. 616 |
11.12 Summary and Discussion | p. 617 |
Notes and References | p. 619 |
Problems | p. 621 |
Chapter 12 Dynamic Programming | p. 627 |
12.1 Introduction | p. 627 |
12.2 Markov Decision Process | p. 629 |
12.3 Bellman's Optimality Criterion | p. 631 |
12.4 Policy Iteration | p. 635 |
12.5 Value Iteration | p. 637 |
12.6 Approximate Dynamic Programming: Direct Methods | p. 642 |
12.7 Temporal-Difference Learning | p. 643 |
12.8 Q-Learning | p. 648 |
12.9 Approximate Dynamic Programming: Indirect Methods | p. 652 |
12.10 Least-Squares Policy Evaluation | p. 655 |
12.11 Approximate Policy Iteration | p. 660 |
12.12 Summary and Discussion | p. 663 |
Notes and References | p. 665 |
Problems | p. 668 |
Chapter 13 Neurodynamics | p. 672 |
13.1 Introduction | p. 672 |
13.2 Dynamic Systems | p. 674 |
13.3 Stability of Equilibrium States | p. 678 |
13.4 Attractors | p. 684 |
13.5 Neurodynamic Models | p. 686 |
13.6 Manipulation of Attractors as a Recurrent Network Paradigm | p. 689 |
13.7 Hopfield Model | p. 690 |
13.8 The Cohen-Grossberg Theorem | p. 703 |
13.9 Brain-State-In-A-Box Model | p. 705 |
13.10 Strange Attractors and Chaos | p. 711 |
13.11 Dynamic Reconstruction of a Chaotic Process | p. 716 |
13.12 Summary and Discussion | p. 722 |
Notes and References | p. 724 |
Problems | p. 727 |
Chapter 14 Bayesian Filtering for State Estimation of Dynamic Systems | p. 731 |
14.1 Introduction | p. 731 |
14.2 State-Space Models | p. 732 |
14.3 Kalman Filters | p. 736 |
14.4 The Divergence Phenomenon and Square-Root Filtering | p. 744 |
14.5 The Extended Kalman Filter | p. 750 |
14.6 The Bayesian Filter | p. 755 |
14.7 Cubature Kalman Filter: Building on the Kalman Filter | p. 759 |
14.8 Particle Filters | p. 765 |
14.9 Computer Experiment: Comparative Evaluation of Extended Kalman and Particle Filters | p. 775 |
14.10 Kalman Filtering in Modeling of Brain Functions | p. 777 |
14.11 Summary and Discussion | p. 780 |
Notes and References | p. 782 |
Problems | p. 784 |
Chapter 15 Dynamically Driven Recurrent Networks | p. 790 |
15.1 Introduction | p. 790 |
15.2 Recurrent Network Architectures | p. 791 |
15.3 Universal Approximation Theorem | p. 797 |
15.4 Controllability and Observability | p. 799 |
15.5 Computational Power of Recurrent Networks | p. 804 |
15.6 Learning Algorithms | p. 806 |
15.7 Back Propagation Through Time | p. 808 |
15.8 Real-Time Recurrent Learning | p. 812 |
15.9 Vanishing Gradients in Recurrent Networks | p. 818 |
15.10 Supervised Training Framework for Recurrent Networks Using Nonlinear Sequential State Estimators | p. 822 |
15.11 Computer Experiment: Dynamic Reconstruction of Mackey-Glass Attractor | p. 829 |
15.12 Adaptivity Considerations | p. 831 |
15.13 Case Study: Model Reference Applied to Neurocontrol | p. 833 |
15.14 Summary and Discussion | p. 835 |
Notes and References | p. 839 |
Problems | p. 842 |
Bibliography | p. 845 |
Index | p. 889 |