Optimized high bandwidth cache coherence mechanism
    Patent application (In force)

    Publication number: US20040162949A1

    Publication date: 2004-08-19

    Application number: US10368090

    Application date: 2003-02-18

    Applicant: Cray Inc.

    CPC classification number: G06F12/082 G06F12/0813

    Abstract: A method and apparatus for a coherence mechanism that supports a distributed memory programming model in which processors each maintain their own memory area and communicate data between them. A hierarchical programming model is supported, which uses distributed memory semantics on top of shared memory nodes. Coherence is maintained globally, but caching is restricted to a local region of the machine (a "node" or "caching domain"). A directory cache is held in an on-chip cache and is multi-banked, allowing very high transaction throughput. Directory associativity allows the directory cache to map the contents of all caches concurrently. Off-node references are converted to non-allocating references, allowing the same access mechanism (a regular load or store) to be used for both intra-node and extra-node references. Stores (Puts) to remote caches automatically update those caches instead of invalidating them, allowing producer/consumer data sharing to occur through cache instead of through main memory.

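    Two of the mechanisms named in this abstract, multi-banked directory lookup and converting off-node references to non-allocating ones, can be sketched as follows. The bank count, line size, and node-lookup function are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch of two mechanisms from the abstract: selecting a directory
# bank by cache-line address (multi-banking for transaction throughput) and
# deciding whether a reference may allocate in cache (only intra-node
# references allocate). All constants here are illustrative assumptions.

NUM_BANKS = 8    # assumed number of directory banks
LINE_BITS = 6    # assumed 64-byte cache lines

def directory_bank(address):
    """Interleave directory entries across banks by cache-line index so
    that transactions to different lines can proceed in parallel."""
    return (address >> LINE_BITS) % NUM_BANKS

def classify_reference(address, local_node, node_of):
    """The same regular load/store is used everywhere; references that
    leave the caching domain simply become non-allocating."""
    if node_of(address) == local_node:
        return "allocating"        # may be cached within the node
    return "non-allocating"        # bypasses cache allocation off node
```

    In this sketch, `node_of` stands in for whatever address-to-node mapping the hardware uses.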

    METHOD FOR CROSS-TALK REDUCTION TECHNIQUE WITH FINE PITCH VIAS

    Publication number: US20200375024A1

    Publication date: 2020-11-26

    Application number: US16882146

    Application date: 2020-05-22

    Applicant: CRAY INC.

    Abstract: Systems and methods are provided for reducing crosstalk between differential signals in a printed circuit board (PCB) using fine-pitch vias. A pair of contact pads on the top surface of the PCB is configured to couple a PCB component to the PCB, the contacts being a first distance from each other. A first via of a plurality of vias is electrically coupled to a first contact of the pair of contacts and a second via is electrically coupled to a second contact, the first via and the second via being a second distance from each other, the second distance being less than current standards for minimum via pitch. Each via comprises a via pad on the top surface and a plated through-hole extending from the top surface to a termination point. A separator gap lies between the first via and the second via.

    Assisting parallelization of a computer program

    Publication number: US10761820B2

    Publication date: 2020-09-01

    Application number: US14978211

    Application date: 2015-12-22

    Applicant: Cray Inc.

    Abstract: A parallelization assistant tool system to assist in parallelization of a computer program is disclosed. The system directs the execution of instrumented code of the computer program to collect performance statistics information relating to execution of loops within the computer program. The system provides a user interface for presenting to a programmer the performance statistics information collected for a loop within the computer program so that the programmer can prioritize efforts to parallelize the computer program. The system generates inlined source code of a loop by aggressively inlining functions substantially without regard to compilation performance, execution performance, or both. The system analyzes the inlined source code to determine the data-sharing attributes of the variables of the loop. The system may generate compiler directives to specify the data-sharing attributes of the variables.
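    The final step described in this abstract, turning inferred data-sharing attributes into compiler directives, might look roughly like the sketch below. The attribute names and the OpenMP output format are assumptions for illustration, not taken from the patent.

```python
# Hypothetical sketch: emit an OpenMP data-sharing directive from a map of
# per-variable attributes, as the abstract's last step describes.

def make_omp_directive(attrs):
    """attrs maps variable name -> 'private' | 'shared' | 'reduction(OP)'.
    Returns an OpenMP parallel-for directive specifying each attribute."""
    clauses = []
    for kind in ("private", "shared"):
        names = sorted(v for v, a in attrs.items() if a == kind)
        if names:
            clauses.append(f"{kind}({', '.join(names)})")
    for var, a in sorted(attrs.items()):
        if a.startswith("reduction"):
            op = a[a.index("(") + 1 : a.index(")")]  # e.g. '+' from 'reduction(+)'
            clauses.append(f"reduction({op}:{var})")
    return "#pragma omp parallel for " + " ".join(clauses)

directive = make_omp_directive({"i": "private", "a": "shared", "sum": "reduction(+)"})
# → "#pragma omp parallel for private(i) shared(a) reduction(+:sum)"
```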

    HIGH-BANDWIDTH PREFETCHER FOR HIGH-BANDWIDTH MEMORY

    Publication number: US20190163637A9

    Publication date: 2019-05-30

    Application number: US15913749

    Application date: 2018-03-06

    Applicant: Cray Inc.

    Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
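    The control flow in this abstract can be sketched in Python. The class and constant names (OutstandingRequestBuffer, DEGREE, MAX_DISTANCE) and all concrete values are illustrative assumptions, not taken from the patent.

```python
# Rough sketch of the prefetch control flow described in the abstract:
# issue batches of DEGREE blocks, advance on response, and pause when a
# maximum distance ahead of the program's reads is reached.

DEGREE = 4          # blocks requested per prefetch batch (assumed)
MAX_DISTANCE = 16   # pause once prefetch is this far ahead of reads (assumed)

class OutstandingRequestBuffer:
    def __init__(self, address, num_blocks):
        self.next_address = address        # next block to prefetch
        self.remaining = num_blocks        # blocks still to be retrieved
        self.last_prefetched = address - 1
        self.last_read = address - 1

    def issue_batch(self):
        """Issue prefetch requests for up to DEGREE blocks; on response
        for the whole batch, adjust the address and remaining count."""
        batch = min(DEGREE, self.remaining)
        issued = [self.next_address + i for i in range(batch)]
        self.last_prefetched = self.next_address + batch - 1
        self.next_address += batch
        self.remaining -= batch
        return issued

    def paused(self):
        """True when the prefetcher is MAX_DISTANCE ahead of the reads."""
        return self.last_prefetched - self.last_read >= MAX_DISTANCE

    def on_read(self, address):
        """A read of a prefetched block may let prefetching resume."""
        self.last_read = max(self.last_read, address)

orb = OutstandingRequestBuffer(address=100, num_blocks=40)
while orb.remaining and not orb.paused():
    orb.issue_batch()
# Prefetching has now paused MAX_DISTANCE blocks ahead of the last read.
```

    A subsequent `orb.on_read(...)` for a prefetched block shrinks the distance, which is one plausible resume criterion.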

    High-bandwidth prefetcher for high-bandwidth memory

    Publication number: US10303610B2

    Publication date: 2019-05-28

    Application number: US15913749

    Application date: 2018-03-06

    Applicant: Cray Inc.

    Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.

    Application ramp rate control in large installations

    Publication number: US10216245B2

    Publication date: 2019-02-26

    Application number: US14978990

    Application date: 2015-12-22

    Applicant: Cray Inc.

    Abstract: To eliminate the adverse effects of power swings in a large-scale computing system during the life cycle of an application or job, control of several operating characteristics for the collective group of processors is provided. By providing certain levels of coordination for the many processors utilized in large-scale computing systems, significant and abrupt changes in power needs can be avoided. In certain circumstances, this may involve limiting the transitions between several C-States of the processors involved so that the overall power transitions for a large-scale system are not detrimental and do not create issues for the data center or local power utility. Some cases will require stepped transitions between C-States, while other cases will include both stepped and modulated transitions. Other cases will incorporate random wait times at the various transitions in order to spread the power consumption involved. In yet further circumstances the C-States can be pinned to a specific setting, thus avoiding power swings caused by C-State transitions. To deal with further issues, the processor P-States can also be overridden.
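    Two ideas from this abstract, stepped C-State transitions and random wait times that spread power draw, can be sketched as follows. The C-State ladder, step order, and delay bound are illustrative assumptions, not from the patent.

```python
import random

# Hedged sketch: step through intermediate C-States instead of jumping,
# and stagger each processor's start time so transitions do not all hit
# the power system at once. All constants here are assumptions.

C_STATES = [6, 3, 1, 0]  # assumed ladder: deepest sleep (C6) ... active (C0)

def stepped_transitions(current, target):
    """Return the intermediate C-States visited when stepping from
    `current` to `target` one rung at a time."""
    i, j = C_STATES.index(current), C_STATES.index(target)
    if i == j:
        return []
    step = 1 if j > i else -1
    return [C_STATES[k] for k in range(i + step, j + step, step)]

def ramp_schedule(num_procs, current, target, max_wait=0.05, rng=None):
    """Assign each processor a random start offset (seconds) so that the
    collective power transition is spread over time."""
    rng = rng or random.Random(0)
    return [(p, rng.uniform(0, max_wait), stepped_transitions(current, target))
            for p in range(num_procs)]

print(stepped_transitions(6, 0))  # → [3, 1, 0]
```

    Pinning a C-State corresponds to an empty transition list: `stepped_transitions(3, 3)` returns `[]`.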

    Increasingly minimal bias routing
    Granted patent

    Publication number: US10142235B2

    Publication date: 2018-11-27

    Application number: US15437201

    Application date: 2017-02-20

    Applicant: Cray Inc.

    Abstract: A system and algorithm configured to generate diversity at the traffic source so that packets are uniformly distributed over all of the available paths, but to increase the likelihood of taking a minimal path with each hop the packet takes. This is achieved by configuring routing biases so as to prefer non-minimal paths at the injection point, but increasingly prefer minimal paths as the packet proceeds, referred to herein as Increasing Minimal Bias (IMB).
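    The bias schedule this abstract describes can be sketched as a weight on the minimal path that grows with hop count. The linear schedule and the injection-time bias of 0.25 are assumptions; the patent does not specify a particular function.

```python
import random

# Illustrative sketch of Increasing Minimal Bias (IMB): prefer non-minimal
# paths at injection, but increasingly prefer the minimal path each hop.

def minimal_bias(hop, max_hops):
    """Weight given to the minimal path, rising from a low value at
    injection (hop 0) to 1.0 by the time max_hops is reached."""
    base = 0.25  # assumed injection-time bias toward the minimal path
    return min(1.0, base + (1.0 - base) * hop / max_hops)

def choose_port(hop, max_hops, minimal_port, nonminimal_ports, rng):
    """Pick the minimal port with probability minimal_bias(...); the
    random choice among non-minimal ports spreads packets uniformly
    over the available paths."""
    if rng.random() < minimal_bias(hop, max_hops):
        return minimal_port
    return rng.choice(nonminimal_ports)

rng = random.Random(0)
route = [choose_port(h, 8, "minimal", ["alt1", "alt2"], rng) for h in range(9)]
```

    By the last hop the bias reaches 1.0, so the minimal port is always chosen.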

    Resiliency to memory failures in computer systems

    Publication number: US10127109B2

    Publication date: 2018-11-13

    Application number: US15625985

    Application date: 2017-06-16

    Applicant: Cray Inc.

    Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
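    The store/trap/re-create flow in this abstract can be modeled in a few lines. Using a simple replica as the "error correction information" is an illustrative assumption; a real system would use a compact code, and the trap here is modeled with an exception.

```python
# Hedged sketch of the resiliency flow: generate correction info on store;
# on a memory error during a load, "trap" to a handler that re-creates the
# data and lets the program continue as if the load completed normally.

class ResilientMemory:
    def __init__(self):
        self.memory = {}       # primary storage (may suffer errors)
        self.correction = {}   # error correction info kept by the system

    def store(self, addr, value):
        self.memory[addr] = value
        self.correction[addr] = value  # correction info generated at store time

    def load(self, addr):
        try:
            value = self.memory[addr]
            if value is None:          # model an uncorrectable memory error
                raise MemoryError(addr)
            return value               # normal completion
        except MemoryError:
            # Trap path: re-create the data from the correction info and
            # store it back, then return as if the load had succeeded.
            value = self.correction[addr]
            self.memory[addr] = value
            return value

mem = ResilientMemory()
mem.store(0x10, 42)
mem.memory[0x10] = None   # simulate a memory error
print(mem.load(0x10))     # → 42, recovered via the correction info
```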
