Patent search ap:("IBM") AND inv:"BLUMRICH MATTHIAS A" Page 2

11.

发明申请
METHOD AND APPARATUS FOR FILTERING SNOOP REQUESTS USING STREAM REGISTERS 审中-公开
Title translation: 使用流记录器过滤SNOOP请求的方法和装置

公开(公告)号：WO2006104747A3

公开(公告)日：2007-12-21

申请号：PCT/US2006010038

申请日：2006-03-17

Applicant: IBM , BLUMRICH MATTHIAS A , GARA ALAN G , SALAPURA VALENTINA

Inventor： BLUMRICH MATTHIAS A , GARA ALAN G , SALAPURA VALENTINA

IPC: G06F13/28

CPC classification number: G06F12/0831 , G06F12/0822 , G06F2212/507 , Y02D10/13

Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having a local cache memory associated therewith. A snoop filter device is associated with each processing unit and includes at least one snoop filter primitive implementing filtering method based on usage of stream registers sets and associated stream register comparison logic. From the plurality of stream registers sets, at least one stream register set is active, and at least one stream register set is labeled historic at any point in time. In addition, the snoop filter block is operatively coupled with cache wrap detection logic whereby the content of the active stream register set is switched into a historic stream register set upon the cache wrap condition detection, and the content of at least one active stream register set is reset. Each filter primitive implements stream register comparison logic that determines whether a received snoop request is to be forwarded to the processor or discarded.

Abstract translation: 一种用于在具有多个处理单元的多处理器计算环境中支持高速缓存一致性的方法和装置，每个处理单元具有与其相关联的本地高速缓冲存储器。窥探过滤设备与每个处理单元相关联并且包括至少一个基于流寄存器集合和相关流寄存器比较逻辑的使用实现过滤方法的窥探过滤器原语。从多个流寄存器组中，至少一个流寄存器组是有效的，并且至少一个流寄存器集合在任何时间点被标记为历史。另外，监听滤波器块可操作地与高速缓存包检测逻辑耦合，从而将活动流寄存器集合的内容切换到在高速缓存环绕条件检测时设置的历史流寄存器，并且至少一个活动流寄存器集合的内容被复位。每个滤波器基元实现流寄存器比较逻辑，其确定接收的窥探请求是否被转发到处理器或丢弃。

12.

发明专利
CLASS NETWORK ROUTING 未知

公开(公告)号：CA2436413C

公开(公告)日：2011-09-27

申请号：CA2436413

申请日：2002-02-25

Applicant: IBM

Inventor： BHANOT GYAN V , BLUMRICH MATTHIAS A , CHEN DONG , COTEUS PAUL W , GARA ALAN G , GIAMPAPA MARK E , HEIDELBERGER PHILIP , STEINMACHER-BUROW BURKHARD D , TAKKEN TODD E , VRANAS PAVLOS M

IPC: G06F11/10 , H04L1/18 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04J3/02 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20

Abstract: Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast, Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.

13.

发明专利
ARITHMETIC FUNCTIONS IN TORUS AND TREE NETWORKS 未知

公开(公告)号：CA2437629A1

公开(公告)日：2002-09-06

申请号：CA2437629

申请日：2002-02-25

Applicant: IBM

Inventor： VRANAS PAVLOS M , CHEN DONG , BLUMRICH MATTHIAS A , BHANOT GYAN V , STEINMACHER-BUROW BURKHARD D , HEIDELBERGER PHILIP , GIAMPAPA MARK E , GARA ALAN G

IPC: G06F11/10 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20

Abstract: Methods and systems for performing arithmetic functions. In accordance with a first aspect of the invention, methods and apparatus are provided, working i n conjunction of software algorithms and hardware implementation of class network routing, to achieve a very significant reduction in the time require d for global arithmetic operation on the torus. Therefore, it leads to greater scalability of applications running on large parallel machines. The inventio n involves three steps in improving the efficiency and accuracy of global operations: (1) Ensuring, when necessary, that all the nodes do the global operation on the data in the same order and so obtain a unique answer, independent of roundoff error; (2) Using the topology of the torus to minimi ze the number of hops and the bidirectional capabilities of the network to redu ce the number of time steps in the data transfer operation to an absolute minimum; and (3) Using class function routing to reduce latency in the data transfer. With the method of this invention, every single element is injecte d into the network only once and it will be stored and forwarded without any further software overhead. In accordance with a second aspect of the invention, methods and systems are provided to efficiently implement global arithmetic operations on a network that supports the global combining operations. The latency of doing such global operations are greatly reduced by using these methods (Figure 4, node0, node1, node2, node3).

14.

发明专利
GLOBAL INTERRUPT AND BARRIER NETWORKS 未知

公开(公告)号：CA2437035C

公开(公告)日：2009-01-06

申请号：CA2437035

申请日：2002-02-25

Applicant: IBM

Inventor： COTEUS PAUL W , CHEN DONG , BLUMRICH MATTHIAS A , GARA ALAN G , HEIDELBERGER PHILIP , TAKKEN TODD E , GIAMPAPA MARK E , STEINMACHER-BUROW BURKHARD D , KOPSCAY GERARD V

IPC: G06F11/10 , G06F15/173 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/177 , G06F15/76 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20

Abstract: A system and method for generating global asynchronous signals in a computin g structure. Particularly, a global interrupt and barrier network is implement ed that implements logic for generating global interrupt and barrier signals fo r controlling global asynchronous operations perfomed by processing elements a t selected processing nodes (12) of computing structure in accordance with a processing algorithm; and includes the physical interconnecting of the processing nodes (12) for communicating the global interrupt and barrier signals to the elements via low latency paths. The global asynchronous signa ls respectively initiate interrupt and barrier operations at the processing nod es (12) at times selected for otpimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structur e comprising a plurality of processing nodes interconnected by multiple independent networks.

15.

发明专利
MANAGING COHERENCE VIA PUT/GET WINDOWS 未知

公开(公告)号：CA2437663A1

公开(公告)日：2002-09-06

申请号：CA2437663

申请日：2002-02-25

Applicant: IBM

Inventor： CHEN DONG , GIAMPAPA MARK E , HEIDELBERGER PHILIP , OHMACHT MARTIN , HOENICKE DIRK , BLUMRICH MATTHIAS A , COTEUS PAUL W , GARA ALAN G

IPC: G06F11/10 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20 , G06F15/16

Abstract: A method and apparatus for managing coherence between two processors of a tw o processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtua l memory that (a) does not actually exist, and (b) is therefore able to respon d instantly to read and write requests from the processing elements.

16.

发明专利
LOW LATENCY MEMORAY SYSTEM ACCESS 未知

公开(公告)号：CA2436474A1

公开(公告)日：2002-09-06

申请号：CA2436474

申请日：2002-02-25

Applicant: IBM

Inventor： COTEUS PAUL W , BLUMRICH MATTHIAS A , CHEN DONG , GARA ALAN G , HOENICKE DIRK , OHMACHT MARTIN , VRANAS PAVLOS M , TAKKEN TODD E , STEINMARCHER-BUROW BURKHARD D , GIAMPAPA MARK E , HEIDELBERGER PHILIP

IPC: G06F11/10 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F12/14 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20

Abstract: A low latency memory system access is provided in association with a weakly- ordered multiprocessor system(Fig.1). Each processor(12-1, 12-2) in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device(10) that provides support for synchronization between the multiple processors(12-1, 12-2) in the multiprocessor and the orderly sharing of the resources. A processor(12-1, 12-2) only has permissio n to access a resource when it owns the lock associated with that resource, an d an attempt by a processor(12-1, 12-2) to own a l ock requires only a single load operation, rather than a traditional atomic load followed by store, suc h that the processor(12-1, 12-2) only performs a read operation and the hardwa re locking device(10) performs a subsequent write operation rather than the processor(12-1, 12-2).

17.

发明专利
NEUARTIGER MASSIVPARALLELER SUPERCOMPUTER 未知

公开(公告)号：DE60237433D1

公开(公告)日：2010-10-07

申请号：DE60237433

申请日：2002-02-25

Applicant: IBM

Inventor： BLUMRICH MATTHIAS A , CHEN DONG , CHIU GEORGE L , CIPOLLA THOMAS M , COTEUS PAUL W , GARA ALAN G , GIAMPAPA MARK E , HEIDELBERGER PHILIP , KOPSCAY GERALD V , MOK LAWRENCE S , TAKKEN TODD E

IPC: G06F11/10 , G06F15/16 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/00 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20

18.

发明专利
未知

公开(公告)号：AT479147T

公开(公告)日：2010-09-15

申请号：AT02733807

申请日：2002-02-25

Applicant: IBM

Inventor： BLUMRICH MATTHIAS A , CHEN DONG , CHIU GEORGE L , CIPOLLA THOMAS , COTEUS PAUL , GARA ALAN , GIAMPAPA MARK , HEIDELBERGER PHILIP , KOPSCAY GERALD , MOK LAWRENCE , TAKKEN TODD

IPC: G06F11/10 , G06F15/16 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/00 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20

Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that optimally maximizes packet communications throughput and minimizes latency. The multiple networks include three high-speed networks for parallel algorithm message passing including a Torus, Global Tree, and a Global Asynchronous network that provides global barrier and notification functions.

19.

发明专利
OPTIMIZED SCALABALE NETWORK SWITCH 未知

公开(公告)号：CA2438195A1

公开(公告)日：2002-10-24

申请号：CA2438195

申请日：2002-02-25

Applicant: IBM

Inventor： GARA ALAN G , GIAMPAPA MARK E , HEIDELBERGER PHILIP , CHEN DONG , BLUMRICH MATTHIAS A , COTEUS PAUL W , STEINMACHER-BUROW BURKHARD D , TAKKEN TODD E , VRANAS PAVLOS M

IPC: G06F11/10 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20 , G06F15/16 , H04L12/54

Abstract: In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors (115, 154) containing information derived from downstream nodes. A multileve l arbitration process (116, 155) in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers (130, 140), is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors (115, 154). This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.

20.

发明专利
A NOVEL MASSIVELY PARRALLEL SUPERCOMPUTER 未知

公开(公告)号：CA2437039A1

公开(公告)日：2002-10-24

申请号：CA2437039

申请日：2002-02-25

Applicant: IBM

Inventor： BLUMRICH MATTHIAS A , CHIU GEORGE L , TAKKEN TODD E , CIPOLLA THOMAS M , CHEN DONG , MOK LAWRENCE S , COTEUS PAUL W , GARA ALAN G , KOPSCAY GERALD V , HEIDELBERGER PHILIP , GIAMPAPA MARK E

IPC: G06F11/10 , G06F9/46 , G06F9/52 , G06F11/00 , G06F11/20 , G06F12/00 , G06F12/02 , G06F12/08 , G06F12/10 , G06F13/00 , G06F13/24 , G06F13/38 , G06F15/173 , G06F15/177 , G06F15/80 , G06F17/14 , H04L1/00 , H04L7/02 , H04L7/033 , H04L12/28 , H04L12/56 , H04L25/02 , H05K7/20 , G06F15/16 , G06F15/00

Abstract: A novel massively parallel supercomputer of hundreds of teraOPS-scale includ es node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes (20) are interconnected by multiple independent networks (26) that optimally maximizes packet communications throughput and minimizes latency. The multiple networks may include three high-speed networ ks for parallel algorithm message passing including a Torus, Global Tree, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification