-
Publication No.: US10324792B2
Publication date: 2019-06-18
Application No.: US15625957
Filing date: 2017-06-16
Applicant: Cray Inc.
Inventor: Laurence S. Kaplan , Preston Pengra Briggs, III , Miles Arthur Ohlrich , Willard Huston Leslie
Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
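The store/load/recover flow in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the abstract does not say what form the error correction information takes, so a single XOR parity word per block is assumed here, and `ResilientBlock`, `MemoryFault`, and `corrupt` are hypothetical names that stand in for the resiliency system, the hardware trap, and a reported memory error.

```python
from functools import reduce
from operator import xor


class MemoryFault(Exception):
    """Stands in for the trap taken when the memory system reports an error."""


class ResilientBlock:
    """Block of non-negative integer words guarded by one XOR parity word.

    The parity scheme is an assumption for illustration; the abstract only
    says that error correction information is generated on each store.
    """

    def __init__(self, size):
        self.words = [0] * size
        self.parity = 0        # the stored error correction information
        self.faulty = set()    # indices the simulated memory reports as bad

    def _raw_load(self, index):
        # Simulated hardware load: faults if the location is marked bad.
        if index in self.faulty:
            raise MemoryFault(index)
        return self.words[index]

    def load(self, index):
        try:
            return self._raw_load(index)          # normal completion
        except MemoryFault:
            # Handler path: re-create the word from the parity word and the
            # surviving words, repair the location, and return the value as
            # if the load had completed normally.
            others = (self._raw_load(i)
                      for i in range(len(self.words)) if i != index)
            value = reduce(xor, others, self.parity)
            self.words[index] = value
            self.faulty.discard(index)
            return value

    def store(self, index, value):
        old = self.load(index)        # recovers the old word if it was lost
        self.parity ^= old ^ value    # swap the old word for the new one
        self.words[index] = value

    def corrupt(self, index):
        # Simulate a reported memory error: the stored bits are lost.
        self.words[index] = 0
        self.faulty.add(index)
```

With one parity word, only a single lost word per block can be re-created; a second simultaneous fault in the same block propagates as `MemoryFault`.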
-
Publication No.: US10296834B2
Publication date: 2019-05-21
Application No.: US14458509
Filing date: 2014-08-13
Applicant: Cray Inc.
Inventor: David Mizell , Christopher Douglas Rickett
Abstract: A method and system for inferring facts in parallel in a multiprocessor computing environment is provided. An inference system infers facts by applying rules to a collection of existing facts. For each existing fact, the inference system schedules a thread to apply the rules to that existing fact. As a thread infers a new fact (i.e., one that is not already in the collection of facts), the thread adds that inferred fact to the collection of facts. When a thread adds a new fact to the collection, the thread also applies the rules to that new fact. After the threads complete execution, the inference system may apply the rules to the facts of the collection, including the newly inferred facts, by again launching a thread for each fact to apply the rules to that fact. The inference system performs this processing iteratively until a termination condition is satisfied.
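The iterative scheme described in this abstract can be sketched roughly as follows. The sketch makes several assumptions that are not in the source: facts are (subject, predicate, object) tuples, rules are plain Python generator functions, a thread pool stands in for scheduling one thread per fact, and the termination condition is "a full round added no new facts". The names `infer`, `parent_implies_ancestor`, and `ancestor_transitive` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def infer(initial_facts, rules, max_workers=4):
    """Apply rules to every fact in parallel until a round adds nothing."""
    facts = set(initial_facts)
    lock = threading.Lock()   # guards the shared collection of facts

    def process(fact):
        added = 0
        pending = [fact]
        while pending:
            current = pending.pop()
            with lock:
                snapshot = frozenset(facts)
            for rule in rules:
                for candidate in rule(current, snapshot):
                    with lock:
                        is_new = candidate not in facts
                        if is_new:
                            facts.add(candidate)
                    if is_new:
                        # The worker that adds a new fact also applies
                        # the rules to that new fact.
                        added += 1
                        pending.append(candidate)
        return added

    while True:
        with lock:
            round_facts = list(facts)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            new_count = sum(pool.map(process, round_facts))
        if new_count == 0:    # assumed termination condition
            return facts


def parent_implies_ancestor(fact, facts):
    s, p, o = fact
    if p == "parent":
        yield (s, "ancestor", o)


def ancestor_transitive(fact, facts):
    s, p, o = fact
    if p == "ancestor":
        for s2, p2, o2 in facts:
            if p2 == "ancestor" and s2 == o:
                yield (s, "ancestor", o2)
```

For example, calling `infer` on three chained `parent` facts (a to b, b to c, c to d) with these two rules yields every `ancestor` pair along the chain, including `("a", "ancestor", "d")`.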
-
Publication No.: US10154581B2
Publication date: 2018-12-11
Application No.: US15428865
Filing date: 2017-02-09
Applicant: Cray Inc.
Inventor: Andy Becker , Hyunjun Kim , Shawn Utz , Paul Wildes
Abstract: The various structures forming communication paths on a printed circuit board can create several undesired effects, especially when high frequency signals are considered. Non-functional pads created during the manufacturing process have the potential to create an undesired effect, but when the overall collection of non-functional pads is carefully configured, an optimized communication path can be formed. More specifically, by selectively removing some of the non-functional pads, the high frequency characteristics of the communication paths can be optimized.
-
Publication No.: US10129329B2
Publication date: 2018-11-13
Application No.: US14881157
Filing date: 2015-10-13
Applicant: Cray Inc.
Inventor: Edwin L. Froese , Eric P. Lundberg , Igor Gorodetsky , Howard Pritchard , Charles Giefer , Robert L. Alverson , Duncan Roweth
IPC: G06F15/167 , H04L29/08 , G06F8/41 , G06F9/52 , G06F9/54 , H04L12/26 , H04L12/751 , H04L12/715
Abstract: An improved method for the prevention of deadlock in a massively parallel processor (MPP) system wherein, prior to a process sending messages to another process running on a remote processor, the process allocates space in a deadlock-avoidance FIFO. The allocated space provides a “landing zone” for requests that the software process (the application software) will subsequently issue using a remote-memory-access function. In some embodiments, the deadlock-avoidance (DLA) function provides two different deadlock-avoidance schemes: controlled discard and persistent reservation. In some embodiments, the software process determines which scheme will be used at the time the space is allocated.
-
Publication No.: US20180165209A1
Publication date: 2018-06-14
Application No.: US15374114
Filing date: 2016-12-09
Applicant: Cray Inc.
Inventor: Heidi Lynn Poxon , William Homer , David W. Oehmke , Luiz DeRose , Clayton D. Andreasen , Sanyam Mehta
IPC: G06F12/0871 , G06F12/02
CPC classification number: G06F12/0871 , G06F8/41 , G06F8/443 , G06F12/023 , G06F12/0284 , G06F12/08 , G06F2201/885 , G06F2212/1016 , G06F2212/1021 , G06F2212/465 , G06F2212/601 , G06F2212/604
Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes low-bandwidth memory (LBM) and high-bandwidth memory (HBM). The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
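One way to picture the final planning step is a greedy placement over the collected statistics. The abstract does not specify how the plan is computed, so everything here is an assumption for illustration: the `DataStructure` record, the `hot_accesses` counter (accesses attributed to high memory utilization portions of the program), the accesses-per-byte ranking, and the `plan_allocation` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructure:
    name: str
    size_bytes: int     # assumed to be greater than zero
    hot_accesses: int   # accesses from high-memory-utilization portions


def plan_allocation(structures, hbm_capacity_bytes):
    """Return {name: "HBM" | "LBM"}, favoring HBM for heavily used data."""
    plan = {}
    remaining = hbm_capacity_bytes
    # Rank by hot accesses per byte so scarce HBM goes where it helps most.
    ranked = sorted(structures,
                    key=lambda d: d.hot_accesses / d.size_bytes,
                    reverse=True)
    for d in ranked:
        if d.hot_accesses > 0 and d.size_bytes <= remaining:
            plan[d.name] = "HBM"
            remaining -= d.size_bytes
        else:
            plan[d.name] = "LBM"   # cold data, or no HBM space left for it
    return plan
```

A structure that does not fit is skipped rather than ending the scan, so smaller hot structures further down the ranking can still use the leftover HBM.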
-
Publication No.: US20180160526A1
Publication date: 2018-06-07
Application No.: US15370498
Filing date: 2016-12-06
Applicant: Cray Inc.
Inventor: Andy Becker
CPC classification number: H01P3/026 , H01P11/003 , H05K1/0219 , H05K1/0237 , H05K1/025 , H05K2203/0307
Abstract: Signal transmission structures within a printed circuit are formed to have reduced loss by making specific accommodations to reduce the surface roughness of an adjacent power plane, thereby reducing the effects of magnetically induced currents. The power plane structure will retain sufficient surface roughness to accommodate manufacturing operations, while also contributing to reduced signal transmission losses in the adjacent signal transmission structure. The transmission structures are thereby capable of more efficiently transmitting high speed signals without undesired attenuation and loss.
-
Publication No.: US09910731B2
Publication date: 2018-03-06
Application No.: US15357448
Filing date: 2016-11-21
Applicant: Cray Inc.
Inventor: Laurence S. Kaplan , Preston Pengra Briggs, III , Miles Arthur Ohlrich , Willard Huston Leslie
CPC classification number: G06F11/1076 , G06F3/0619 , G06F3/064 , G06F3/067 , G06F3/0673 , G06F11/08 , G06F11/10 , G06F11/1004 , G06F11/1008 , G06F11/1016 , G06F11/1068 , G06F11/1088 , G06F11/14 , G06F11/1402 , G06F11/1405 , G06F11/141 , G06F11/1479 , G06F11/1662 , G06F11/202 , G06F11/2023 , G06F11/2035 , G06F2201/805 , G06F2201/82
Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
-
Publication No.: US20170177070A1
Publication date: 2017-06-22
Application No.: US14978990
Filing date: 2015-12-22
Applicant: Cray Inc.
Inventor: Josh Williams , Steve Martin , Clark Snyder , David Rush , Matthew Kappel
IPC: G06F1/32
CPC classification number: G06F1/3228 , G06F1/329 , G06F9/5094 , Y02D10/24
Abstract: To eliminate the adverse effects of power swings in a large scale computing system during the life cycle of an application or job, control of several operating characteristics for the collective group of processors is provided. By providing certain levels of coordination for the many processors utilized in large scale computing systems, significant and abrupt changes in power needs can be avoided. In certain circumstances, this may involve limiting the transitions between several C-States of the processors involved, so that the overall power transitions for a large scale system are not detrimental and do not create issues for the data center or local power utility. Some cases will require stepped transitions between C-States, while other cases will include both stepped and modulated transitions. Other cases will incorporate random wait times at the various transitions in order to spread the power consumption involved. In yet further circumstances, the C-States can be pinned to a specific setting, thus avoiding power swings caused by C-State transitions. To deal with further issues, the processor P-States can also be overridden.
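The combination of stepped transitions and random wait times can be pictured as a schedule generator. This is a sketch only: the abstract gives no timing values or interfaces, so `stepped_schedule`, its parameters, and the use of plain integers for C-State numbers are hypothetical, and nothing here touches real processor power controls.

```python
import random


def stepped_schedule(node_ids, start_state, target_state,
                     step_seconds, max_jitter_seconds, rng=None):
    """Return sorted (time, node, c_state) events for a staggered transition.

    Each node walks one C-State at a time from start_state to target_state.
    Every step waits step_seconds plus a random extra delay, so nodes do not
    all change state at the same instant.
    """
    rng = rng or random.Random()
    step = 1 if target_state > start_state else -1
    # Intermediate states up to and including the target (empty if equal).
    states = list(range(start_state + step, target_state + step, step))
    schedule = []
    for node in node_ids:
        t = 0.0
        for state in states:
            t += step_seconds + rng.uniform(0.0, max_jitter_seconds)
            schedule.append((t, node, state))
    schedule.sort()
    return schedule
```

For example, `stepped_schedule(range(3), 6, 0, 1.0, 0.5)` produces 18 events, six per node, with consecutive steps on the same node at least one second apart.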
-
Publication No.: US20170068596A1
Publication date: 2017-03-09
Application No.: US15357448
Filing date: 2016-11-21
Applicant: Cray Inc.
Inventor: Laurence S. Kaplan , Preston Pengra Briggs, III , Miles Arthur Ohlrich , Willard Huston Leslie
CPC classification number: G06F11/1076 , G06F3/0619 , G06F3/064 , G06F3/067 , G06F3/0673 , G06F11/08 , G06F11/10 , G06F11/1004 , G06F11/1008 , G06F11/1016 , G06F11/1068 , G06F11/1088 , G06F11/14 , G06F11/1402 , G06F11/1405 , G06F11/141 , G06F11/1479 , G06F11/1662 , G06F11/202 , G06F11/2023 , G06F11/2035 , G06F2201/805 , G06F2201/82
Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
-
Publication No.: US09577918B2
Publication date: 2017-02-21
Application No.: US13681058
Filing date: 2012-11-19
Applicant: Cray Inc.
Inventor: Abdulla Bataineh , Thomas Court , Duncan Roweth
IPC: H04L12/26 , H04L12/733
CPC classification number: H04L47/11 , H04L45/12 , H04L45/122 , H04L45/20 , H04L45/54
Abstract: A system and algorithm configured to generate diversity at the traffic source so that packets are uniformly distributed over all of the available paths, but to increase the likelihood of taking a minimal path with each hop the packet takes. This is achieved by configuring routing biases so as to prefer non-minimal paths at the injection point, but increasingly prefer minimal paths as the packet proceeds, referred to herein as Increasing Minimal Bias (IMB).
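The idea behind Increasing Minimal Bias can be sketched as a load comparison whose bias toward the minimal path grows with each hop. The abstract does not give bias values or the comparison rule, so the linear bias, the default constants, and the names `minimal_bias` and `choose_route` are assumptions made for illustration.

```python
def minimal_bias(hops_taken, initial_bias=-2, step=2):
    """Bias toward the minimal path; negative values favor non-minimal.

    Assumed to grow linearly with the number of hops already taken.
    """
    return initial_bias + step * hops_taken


def choose_route(hops_taken, minimal_load, nonminimal_load,
                 initial_bias=-2, step=2):
    """Pick a path class by comparing loads after applying the hop bias."""
    bias = minimal_bias(hops_taken, initial_bias, step)
    # Subtracting the bias makes the minimal path look lighter as the
    # packet proceeds, and heavier at the injection point.
    if minimal_load - bias <= nonminimal_load:
        return "minimal"
    return "non-minimal"
```

With equal loads of 5 on both path classes, this sketch returns "non-minimal" at the injection point (0 hops) and "minimal" from the first hop onward. A lightly loaded minimal path can still win at injection, for example with loads of 2 versus 5.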