-
Publication No.: CA2437036A1
Publication Date: 2002-09-06
Application No.: CA2437036
Filing Date: 2002-02-25
Applicant: IBM
Inventor: VRANAS PAVLOS M , STEINMACHER-BUROW BURKHARD D , GARA ALAN G , CHEN DONG , BHANOT GYAN V , HEIDELBERGER PHILIP , GIAMPAPA MARK E
IPC: G06F11/10 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20
Abstract: The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system (100) comprising a plurality of nodes (Q11-Q33) in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network, thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of the array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
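
The steps named in the abstract (first one-dimensional transform, "all-to-all" re-distribution in random order, second one-dimensional transform) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: the nodes are simulated as list entries, one row or column is assumed per node, a naive DFT stands in for the 1-D FFT, and the names `dft`, `distributed_fft2` and `direct_dft2` are hypothetical.

```python
import cmath
import random


def dft(seq):
    """Naive O(n^2) 1-D discrete Fourier transform (stand-in for a 1-D FFT)."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def distributed_fft2(grid, seed=0):
    """2-D transform of an n x n grid spread over n simulated nodes.

    Assumption: node r initially holds row r of the array.
    """
    n = len(grid)
    rng = random.Random(seed)

    # Step 1: each node transforms its own row (first dimension).
    rows = [dft(list(row)) for row in grid]

    # Step 2: all-to-all re-distribution. Node `src` sends element `dst`
    # of its row to node `dst`, visiting destinations in a random order.
    cols = [[None] * n for _ in range(n)]
    for src in range(n):
        order = list(range(n))
        rng.shuffle(order)
        for dst in order:
            cols[dst][src] = rows[src][dst]

    # Step 3: node c now holds column c and transforms it (second dimension).
    cols = [dft(col) for col in cols]

    # Gather into row-major layout so the result can be inspected.
    return [[cols[c][r] for c in range(n)] for r in range(n)]


def direct_dft2(grid):
    """Reference 2-D DFT computed directly from the definition."""
    n = len(grid)
    return [
        [
            sum(
                grid[r][c] * cmath.exp(-2j * cmath.pi * (u * r + v * c) / n)
                for r in range(n)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]
```

In this simulation the random send order does not change the numerical result; it only models the order in which packets would be injected. On a real network, the abstract attributes the efficiency gain to that ordering.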
-
Publication No.: CA2437035A1
Publication Date: 2002-09-06
Application No.: CA2437035
Filing Date: 2002-02-25
Applicant: IBM
Inventor: CHEN DONG , GARA ALAN G , TAKKEN TODD E , BLUMRICH MATTHIAS A , COTEUS PAUL W , HEIDELBERGER PHILIP , KOPSCAY GERARD V , STEINMACHER-BUROW BURKHARD D , GIAMPAPA MARK E
IPC: G06F11/10 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20 , G06F15/00 , G06F15/76
Abstract: A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network is provided that implements logic for generating global interrupt and barrier signals for controlling global asynchronous operations performed by processing elements at selected processing nodes (12) of the computing structure in accordance with a processing algorithm, and that includes the physical interconnection of the processing nodes (12) for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes (12) at times selected for optimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structure comprising a plurality of processing nodes interconnected by multiple independent networks.
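
The role of a global barrier can be illustrated in software. The sketch below is an analogy only: it uses Python's standard `threading.Barrier` in place of the dedicated hardware network described in the abstract, threads stand in for processing nodes, and the function name `run_with_barrier` is hypothetical.

```python
import threading


def run_with_barrier(num_nodes=4):
    """Run a two-phase computation where phase 2 must not start early."""
    barrier = threading.Barrier(num_nodes)
    partial = [0] * num_nodes   # phase-1 result of each simulated node
    totals = [0] * num_nodes    # phase-2 result of each simulated node

    def node(rank):
        # Phase 1: purely local work.
        partial[rank] = rank + 1
        # Global barrier: no node continues until all nodes have arrived.
        barrier.wait()
        # Phase 2: safe to read every other node's phase-1 result.
        totals[rank] = sum(partial)

    threads = [threading.Thread(target=node, args=(r,)) for r in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

Without the `barrier.wait()` call, a fast thread could read `partial` before slower threads have written to it. Preventing that kind of phase overlap is what a barrier signal is for.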
-
Publication No.: CA2437661A1
Publication Date: 2002-09-06
Application No.: CA2437661
Filing Date: 2002-02-25
Applicant: IBM
Inventor: HOENICKE DIRK , BLUMRICH MATTHIAS A , HEIDELBERGER PHILIP , CHEN DONG , TAKKEN TODD E , GIAMPAPA MARK E , GARA ALAN G , COTEUS PAUL W , VRANAS PAVLOS M , STEINMACHER-BUROW BURKHARD D
IPC: G06F11/10 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/16 , G06F15/173 , G06F15/177 , G06F15/76 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20 , G06F15/00 , H04M1/64
Abstract: A system and method for enabling high-speed, low-latency global tree communications among processing nodes interconnected according to a tree network structure. The global tree network (100) optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices (200) are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations include one or more of: global broadcast operations downstream from a root node (110) to leaf nodes (120) of a virtual tree, global reduction operations upstream from leaf nodes to the root node (110) in the virtual tree, and point-to-point message passing from any node to the root node (110) in the virtual tree. One node of the virtual tree network is coupled to and functions as an I/O node for providing I/O functionality with an external system for each node of the virtual tree. The global tree network (100) may be configured to provide global barrier and interrupt functionality in an asynchronous or synchronized manner. Thus, parallel algorithm processing operations, for example, those employed in parallel computing systems, may be optimally performed in accordance with certain operating phases of the parallel algorithm operations. When implemented in a massively parallel supercomputing structure, the global tree network (100) is physically and logically partitionable according to the needs of a processing algorithm.
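
The two collective operations named in the abstract, reduction upstream to the root and broadcast downstream to the leaves, can be sketched as follows. This is a minimal illustration under stated assumptions: the tree is taken to be binary and stored in array order (children of node i at 2i+1 and 2i+2), which the abstract does not specify, and the names `children`, `reduce_up` and `broadcast_down` are hypothetical.

```python
def children(i, n):
    """Children of node i in an n-node binary tree stored in array order."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]


def reduce_up(values, op, node=0):
    """Combine values upstream: each node merges its subtree's results.

    Returns the value that arrives at `node` (the root by default).
    """
    result = values[node]
    for c in children(node, len(values)):
        result = op(result, reduce_up(values, op, c))
    return result


def broadcast_down(n, payload, node=0, received=None):
    """Send `payload` downstream from `node` to every node in its subtree."""
    received = {} if received is None else received
    received[node] = payload
    for c in children(node, n):
        broadcast_down(n, payload, c, received)
    return received
```

Chaining the two gives an all-reduce: `broadcast_down(len(values), reduce_up(values, max))` leaves every node holding the global maximum.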
-
Publication No.: CA2436413A1
Publication Date: 2002-09-06
Application No.: CA2436413
Filing Date: 2002-02-25
Applicant: IBM
Inventor: CHEN DONG , COTEUS PAUL W , HEIDELBERGER PHILIP , GARA ALAN G , GIAMPAPA MARK E , BLUMRICH MATTHIAS A , BHANOT GYAN V , TAKKEN TODD E , VRANAS PAVLOS M , STEINMACHER-BUROW BURKHARD D
IPC: G06F11/10 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20 , H04L1/18 , H04J3/02
Abstract: Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes (Q00-Q22) thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency of a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed-memory parallel supercomputers (Fig. 1) with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.
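
The message-count difference described in the abstract can be made concrete with a toy model of a row broadcast. This is a sketch under assumptions, not the patented routing logic: nodes are `(row, column)` pairs, the row is assumed to wrap around as a ring, each node the message passes is assumed to keep a copy and forward it, and both function names are hypothetical.

```python
def unicast_row_broadcast(src, width):
    """Baseline: one separate point-to-point message per destination."""
    row, col = src
    return [((row, col), (row, c)) for c in range(width) if c != col]


def class_row_broadcast(src, width):
    """Class-routed sketch: a single injected message serves the whole row.

    Returns (messages_injected, nodes_that_received_a_copy).
    Assumption: the message travels around the row as a ring, and every
    node it passes keeps a copy and forwards it.
    """
    row, col = src
    delivered = []
    c = (col + 1) % width
    while c != col:
        delivered.append((row, c))
        c = (c + 1) % width
    return 1, delivered
```

For a row of width 3, as in the Q00-Q22 grid, the baseline injects two messages while the class-routed sketch injects one. In general, a row of width w needs w - 1 messages in the baseline and one in the class-routed sketch.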
-