1 …  1 (24)
  1.1 Parallel Processing for Simulating ANNs  1 (4)
    1.1.1 Performance Metrics  2 (1)
    1.1.2 General Aspects of Parallel Processing  2 (3)
  1.2 Classification of ANN Models  5 (1)
  1.3 ANN Models Covered in This Book  5 (20)
    1.3.1 Multilayer Feed-Forward Networks with BP Learning  7 (6)
    1.3.2 …  13 (2)
    1.3.3 Multilayer Recurrent Networks  15 (1)
    1.3.4 Adaptive Resonance Theory (ART) Networks  16 (1)
    1.3.5 Self-Organizing Map (SOM) Networks  17 (1)
    1.3.6 Processor Topologies and Hardware Platforms  18 (7)

2 A Review of Parallel Implementations of Backpropagation Neural Networks  25 (40)
  2.1 …  25 (1)
  2.2 Parallelization of Feed-Forward Neural Networks  25 (29)
    2.2.1 Distributed Computing for Each Degree of BP Parallelism  26 (3)
    2.2.2 A Survey of Different Parallel Implementations  29 (20)
    2.2.3 Neural Network Applications  49 (5)
  2.3 Conclusions on Neural Applications and Parallel Hardware  54 (11)

I Analysis of Parallel Implementations  65 (118)

3 Network Parallelism for Backpropagation Neural Networks on a Heterogeneous Architecture  67 (44)
  3.1 …  67 (2)
  3.2 Heterogeneous Network Topology  69 (1)
  3.3 Mathematical Model for the Parallelized BP Algorithm  70 (6)
    3.3.1 Timing Diagram for the Parallelized BP Algorithm  70 (5)
    3.3.2 Prediction of Iteration Time  75 (1)
  3.4 Experimental Validation of the Model Using Benchmark Problems  76 (3)
    3.4.1 Benchmark Problems Used for Validation  76 (1)
    3.4.2 Validation Setup and Results  76 (3)
  3.5 Optimal Distribution of Neurons Among the Processing Nodes  79 (4)
    3.5.1 Communication Constraints  79 (1)
    3.5.2 Temporal Dependence Constraints  80 (1)
    3.5.3 …  81 (1)
    3.5.4 Feasibility Constraints  82 (1)
    3.5.5 …  82 (1)
  3.6 Methods of Solution to the Optimal Mapping Problem  83 (5)
    3.6.1 Genetic Algorithmic Solution  83 (3)
    3.6.2 Approximate Linear Heuristic (ALH) Solution  86 (1)
    3.6.3 Experimental Results  87 (1)
  3.7 Statistical Validation of the Optimal Mapping  88 (3)
  3.8 …  91 (9)
    3.8.1 Worthwhileness of Finding Optimal Mappings  91 (3)
    3.8.2 Processor Location in a Ring  94 (2)
    3.8.3 Cost-Benefit Analysis  96 (1)
    3.8.4 Optimal Number of Processors for Homogeneous Processor Arrays  97 (3)
  …  100 (1)
  A3.1 Theoretical Expressions for Processes in the Parallel BP Algorithm  101 (4)
    A3.1.1 Computation Processes  101 (3)
    A3.1.2 Communication Processes  104 (1)
  A3.2 …  105 (1)
    A3.2.1 Storing the Training Set  105 (1)
    A3.2.2 Storing the Neural Network Parameters  105 (1)
    A3.2.3 Overall Memory Requirement  106 (1)
  A3.3 Elemental Timings for T805 Transputers  106 (5)

4 Training-Set Parallelism for Backpropagation Neural Networks on a Heterogeneous Architecture  111 (24)
  4.1 …  111 (1)
  4.2 Parallelization of BP Algorithm  112 (7)
    4.2.1 Process Synchronization Graph  114 (3)
    4.2.2 Variable Synchronization Graph  117 (1)
    4.2.3 Predicting the Epoch Time  117 (2)
  4.3 Experimental Validation of the Model Using Benchmark Problems  119 (1)
  4.4 Optimal Distribution of Patterns Among the Processing Nodes  120 (4)
    4.4.1 Communication Constraints  121 (1)
    4.4.2 Temporal Dependence Constraints  121 (1)
    4.4.3 …  122 (1)
    4.4.4 Feasibility Constraints  123 (1)
    4.4.5 Feasibility of Pattern Assignments  123 (1)
    4.4.6 Feasibility of Waiting  123 (1)
    4.4.7 …  123 (1)
  4.5 Genetic Algorithmic Solution to the Optimal Mapping Problem  124 (1)
    4.5.1 Experimental Results  125 (1)
  4.6 Statistical Validation of the Optimal Mapping  125 (1)
  4.7 …  126 (2)
    4.7.1 Worthwhileness of Finding Optimal Distribution  126 (1)
    4.7.2 Processor Location in a Ring  127 (1)
  4.8 …  128 (1)
  4.9 Process Decomposition  129 (1)
  4.10 …  130 (5)
    4.10.1 Storing the Network Parameters  130 (1)
    4.10.2 Storing the Training Set  130 (1)
    4.10.3 Memory Required for the Forward Pass of the Backpropagation  130 (1)
    4.10.4 Memory Required for the Backward Pass of the Backpropagation  131 (1)
    4.10.5 Temporary Memory Storage during Weight Changes Transfer  131 (1)
    4.10.6 Overall Memory Requirement  131 (4)

5 Parallel Real-Time Recurrent Algorithm for Training Large Fully Recurrent Neural Networks  135 (22)
  5.1 …  135 (1)
  5.2 …  136 (4)
    5.2.1 The Real-Time Recurrent Learning Algorithm  136 (3)
    5.2.2 Matrix Formulation of the RTRL Algorithm  139 (1)
  5.3 Parallel RTRL Algorithm Derivation  140 (9)
    5.3.1 The Retrieving Phase  140 (3)
    5.3.2 …  143 (6)
  5.4 Training Very Large RNNs on Fixed-Size Ring Arrays  149 (5)
    5.4.1 Partitioning for the Retrieving Phase  149 (1)
    5.4.2 Partitioning for the Learning Phase  150 (1)
    5.4.3 A Transputer-Based Implementation  150 (4)
  …  154 (3)

6 Parallel Implementation of ART1 Neural Networks on Processor Ring Architectures  157 (26)
  6.1 …  157 (1)
  6.2 ART1 Network Architecture  158 (3)
  6.3 …  161 (3)
  6.4 Parallel Ring Algorithm  164 (6)
    6.4.1 Partitioning Strategy  169 (1)
  6.5 …  170 (3)
    6.5.1 The MEIKO Computing Surface System  170 (1)
    6.5.2 Performance and Scalability Analysis  171 (2)
  …  173 (10)

II Implementations on a Big General-Purpose Parallel Computer  183 (48)

7 Implementation of Backpropagation Neural Networks on Large Parallel Computers  185 (46)
  7.1 …  185 (1)
  7.2 Hardware for Running Neural Networks  186 (2)
    7.2.1 …  186 (1)
    7.2.2 Neural Network Applications Used in This Work  187 (1)
    7.2.3 Experimental Conditions in This Work  188 (1)
  7.3 General Mapping onto 2D-Torus MIMD Computers  188 (13)
    7.3.1 The Proposed Mapping Scheme  189 (7)
    7.3.2 Heuristic for Selection of the Best Mapping  196 (5)
    7.3.3 …  201 (1)
  7.4 Results on the General BP Mapping  201 (24)
    7.4.1 …  201 (15)
    7.4.2 Sonar Target Classification  216 (5)
    7.4.3 Speech Recognition Network  221 (1)
    7.4.4 …  221 (4)
  7.5 Conclusions on the Application Adaptable Mapping  225 (6)

III Special Parallel Architectures and Application Case Studies  231

8 Massively Parallel Architectures for Large-Scale Neural Network Computations  233 (38)
  8.1 …  233 (2)
  8.2 …  235 (2)
  8.3 Toroidal Lattice and Planar Lattice Architectures of Virtual Processors  237 (1)
  8.4 The Simulation of a Hopfield Neural Network  237 (5)
    8.4.1 The Simulation of an HNN on TLA  238 (3)
    8.4.2 The Simulation of an HNN on PLA  241 (1)
  8.5 The Simulation of a Multilayer Perceptron  242 (3)
  8.6 Mapping onto Physical Node Processors from Virtual Processors  245 (5)
  8.7 Load Balancing of Node Processors  250 (1)
  8.8 Estimation of the Performance  251 (4)
  …  255 (4)
  …  259 (2)
  A8.1 Load Balancing Mapping Algorithm  261 (2)
  A8.2 Processing Time of the NP Array  263 (8)

9 Regularly Structured Neural Networks on the DREAM Machine  271 (32)
  9.1 …  271 (1)
  9.2 Mapping Method Preliminaries  272 (7)
    9.2.1 Neural Network Computation and Structure  272 (2)
    9.2.2 Implementing Neural Networks on the Ring Systolic Architecture  274 (2)
    9.2.3 System Utilization Characteristic of the Mapping onto the Ring Systolic Architecture  276 (1)
    9.2.4 Execution Rate Characteristics of the Mapping onto the Ring Systolic Architecture  277 (1)
    9.2.5 Mapping Multilayer Neural Networks onto the Ring Systolic Architecture  278 (1)
    9.2.6 Deficiencies of the Mapping onto the Ring Systolic Architecture  278 (1)
  9.3 DREAM Machine Architecture  279 (4)
    9.3.1 System Level Overview  279 (1)
    9.3.2 Processor-Memory Interface  280 (1)
    9.3.3 Implementing a Table Lookup Mechanism on the DREAM Machine  281 (1)
    9.3.4 Interprocessor Communication Network  282 (1)
  9.4 Mapping Structured Neural Networks onto the DREAM Machine  283 (10)
    9.4.1 General Mapping Problems  283 (1)
    9.4.2 The Algorithmic Mapping Method and Its Applicability  284 (1)
    9.4.3 Using Variable Length Rings to Implement Neural Network Processing  285 (2)
    9.4.4 Implementing Multilayer Networks  287 (1)
    9.4.5 Implementing Backpropagation Learning Algorithms  288 (1)
    9.4.6 Implementing Blocked Connected Networks  289 (2)
    9.4.7 Implementing Neural Networks Larger Than the Processor Array  291 (1)
    9.4.8 Batch-Mode Implementation  291 (1)
    9.4.9 Implementing Competitive Learning  292 (1)
  9.5 Implementation Examples and Performance Evaluation  293 (4)
    9.5.1 …  294 (1)
    9.5.2 Implementing Fully Connected Multilayer Neural Networks  294 (1)
    9.5.3 Implementing a Block-Connected Multilayer Neural Network  295 (1)
    9.5.4 Implementing a Fully Connected Single Layer Network  295 (2)
  …  297 (6)

10 High-Performance Parallel Backpropagation Simulation with On-Line Learning  303 (42)
  10.1 …  303 (1)
  10.2 The MUSIC Parallel Supercomputer  304 (2)
    10.2.1 …  304 (2)
    10.2.2 System Programming  306 (1)
  10.3 Backpropagation Implementation  306 (4)
    10.3.1 The Backpropagation Algorithm  306 (1)
    10.3.2 …  307 (3)
  10.4 Performance Analysis  310 (2)
    10.4.1 …  311 (1)
    10.4.2 …  311 (1)
    10.4.3 Performance Results  312 (1)
  10.5 The NeuroBasic Parallel Simulation Environment  312 (4)
    10.5.1 …  313 (1)
    10.5.2 An Example Program  314 (2)
    10.5.3 Performance versus Programming Time  316 (1)
  10.6 Examples of Practical Research Work  316 (17)
    10.6.1 Neural Networks in Photofinishing  316 (11)
    10.6.2 The Truck Backer-Upper  327 (6)
  10.7 Analysis of RISC Performance for Backpropagation  333 (6)
    10.7.1 …  334 (1)
    10.7.2 Linearization of the Instruction Stream  334 (1)
    10.7.3 Reduction of Load/Store Operations  335 (1)
    10.7.4 Improvement of the Internal Instruction Stream Parallelism  336 (2)
  …  338 (1)
  …  339 (6)

11 Training Neural Networks with SPERT-II  345 (20)
  11.1 …  345 (1)
  11.2 Algorithm Development  346 (1)
  11.3 T0: A Vector Microprocessor  347 (2)
  11.4 The SPERT-II Workstation Accelerator  349 (2)
  11.5 Mapping Backpropagation to SPERT-II  351 (4)
  11.6 Mapping Kohonen Nets to SPERT-II  355 (2)
  …  357 (8)
  …  365 (2)

…  367