Resiliency to memory failures in computer systems

    Publication No.: US10324792B2

    Publication date: 2019-06-18

    Application No.: US15625957

    Filing date: 2017-06-16

    Applicant: Cray Inc.

    Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
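
The store/load flow described in this abstract can be sketched in software. This is a minimal illustration, not the patented implementation: the abstract does not say what form the error correction information takes, so the sketch assumes XOR parity kept per group of memory cells. The names `ResilientMemory`, `UncorrectableMemoryError`, and `inject_failure` are hypothetical, and a Python exception stands in for the hardware trap.

```python
class UncorrectableMemoryError(Exception):
    """Stands in for the trap the memory system raises on a failed load."""


class ResilientMemory:
    """Toy memory with one XOR parity word per group of cells (an assumption)."""

    def __init__(self, size, group_size=4):
        self.cells = [0] * size
        self.group_size = group_size
        self.parity = [0] * ((size + group_size - 1) // group_size)
        self.failed = set()  # addresses whose contents are currently unreadable

    def _reconstruct(self, addr):
        # Re-create a cell from the group's parity and the other cells.
        group = addr // self.group_size
        start = group * self.group_size
        end = min(start + self.group_size, len(self.cells))
        value = self.parity[group]
        for other in range(start, end):
            if other != addr:
                value ^= self.cells[other]
        return value

    def store(self, addr, value):
        # On every store, update the error correction information.
        old = self._reconstruct(addr) if addr in self.failed else self.cells[addr]
        self.parity[addr // self.group_size] ^= old ^ value
        self.cells[addr] = value
        self.failed.discard(addr)

    def inject_failure(self, addr):
        # Test hook: corrupt a cell and mark it as reporting a memory error.
        self.cells[addr] = 0xDEAD
        self.failed.add(addr)

    def _raw_load(self, addr):
        if addr in self.failed:
            raise UncorrectableMemoryError(addr)
        return self.cells[addr]

    def load(self, addr):
        try:
            return self._raw_load(addr)  # normal path: no memory error
        except UncorrectableMemoryError:
            # Handler path: rebuild the data and deliver it as if the load
            # had completed normally, then let the program continue.
            value = self._reconstruct(addr)
            self.cells[addr] = value
            self.failed.discard(addr)
            return value


mem = ResilientMemory(8)
for address, word in enumerate([10, 20, 30, 40, 50, 60, 70, 80]):
    mem.store(address, word)
mem.inject_failure(2)
recovered = mem.load(2)
```

With a single parity word per group, this sketch can recover only one failed cell per group at a time; a stronger code would be needed to tolerate more.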

    Forward inferencing of facts in parallel

    Publication No.: US10296834B2

    Publication date: 2019-05-21

    Application No.: US14458509

    Filing date: 2014-08-13

    Applicant: Cray Inc.

    Abstract: A method and system for inferring facts in parallel in a multiprocessor computing environment is provided. An inference system infers facts by applying rules to a collection of existing facts. For each existing fact, the inference system schedules a thread to apply the rules to that existing fact. As a thread infers a new fact (i.e., one that is not already in the collection of facts), the thread adds that inferred fact to the collection of facts. When a thread adds a new fact to the collection, the thread also applies the rules to that new fact. After the threads complete execution, the inference system may apply the rules to the facts of the collection, including the newly inferred facts, by again launching a thread for each fact to apply the rules to that fact. The inference system performs this processing iteratively until a termination condition is satisfied.
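
The iterative scheme in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: facts are modeled as `(ancestor, descendant)` pairs, the single rule `transitive` is a hypothetical example rule, the termination condition is assumed to be "a full round adds no new fact", and a thread pool stands in for per-fact thread scheduling. The helper names `infer` and `apply_rules` are invented for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def transitive(fact, facts):
    """Example rule: from (a, b) and (b, c) infer (a, c)."""
    a, b = fact
    return [(a, c) for (b2, c) in facts if b2 == b]


def infer(initial_facts, rules, workers=4, max_rounds=100):
    facts = set(initial_facts)
    lock = threading.Lock()  # guards the shared collection of facts

    def apply_rules(fact):
        added = 0
        pending = [fact]
        while pending:
            current = pending.pop()
            with lock:
                snapshot = frozenset(facts)
            for rule in rules:
                for candidate in rule(current, snapshot):
                    with lock:
                        is_new = candidate not in facts
                        if is_new:
                            facts.add(candidate)
                    if is_new:
                        # The same thread also applies the rules to the
                        # fact it has just inferred.
                        added += 1
                        pending.append(candidate)
        return added

    for _ in range(max_rounds):
        with lock:
            round_facts = list(facts)
        # One task per existing fact in this round.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            new_count = sum(pool.map(apply_rules, round_facts))
        if new_count == 0:  # assumed termination condition
            break
    return facts


closure = infer({("a", "b"), ("b", "c"), ("c", "d")}, [transitive])
```

Because the outer loop repeats until a round adds nothing, the result does not depend on how the threads interleave. In CPython the threads mainly illustrate the structure; they do not give a parallel speedup for CPU-bound rules.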

    Method for impedance compensation in printed circuit boards

    Publication No.: US10154581B2

    Publication date: 2018-12-11

    Application No.: US15428865

    Filing date: 2017-02-09

    Applicant: Cray Inc.

    Abstract: The various structures forming communication paths on a printed circuit board can create several undesired effects, especially when high frequency signals are considered. Non-functional pads created during the manufacturing process have the potential to create an undesired effect, but when the overall collection of non-functional pads is carefully configured, an optimized communication path can be formed. More specifically, by selectively removing some of the non-functional pads, the high frequency characteristics of the communication paths can be optimized.

    PCB TRANSMISSION LINES HAVING REDUCED LOSS
    Type: Patent application

    Publication No.: US20180160526A1

    Publication date: 2018-06-07

    Application No.: US15370498

    Filing date: 2016-12-06

    Applicant: Cray Inc.

    Inventor: Andy Becker

    Abstract: Signal transmission structures within a printed circuit are formed to have reduced loss by making specific accommodations to reduce the surface roughness of an adjacent power plane, thereby reducing the effects of magnetically induced currents. The power plane structure retains sufficient surface roughness to accommodate manufacturing operations, while also contributing to reduced signal transmission losses in the adjacent signal transmission structure. The transmission structures are thereby capable of more efficiently transmitting high speed signals without undesired attenuation and loss.

    APPLICATION RAMP RATE CONTROL IN LARGE INSTALLATIONS

    Publication No.: US20170177070A1

    Publication date: 2017-06-22

    Application No.: US14978990

    Filing date: 2015-12-22

    Applicant: Cray Inc.

    CPC classification number: G06F1/3228 G06F1/329 G06F9/5094 Y02D10/24

    Abstract: To eliminate the adverse effects of power swings in a large scale computing system during the life cycle of an application or job, control of several operating characteristics for the collective group of processors is provided. By providing certain levels of coordination for the many processors utilized in large scale computing systems, significant and abrupt changes in power needs can be avoided. In certain circumstances, this may involve limiting the transitions between several C-States of the processors involved, so that the overall power transitions for a large scale system are not detrimental and do not create issues for the data center or local power utility. Some cases will require stepped transitions between C-States, while other cases will include both stepped and modulated transitions. Other cases will incorporate random wait times at the various transitions in order to spread the power consumption involved. In yet further circumstances, the C-States can be pinned to a specific setting, thus avoiding power swings caused by C-State transitions. To deal with further issues, the processor P-States can also be overridden.
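
The effect of stepped transitions combined with random wait times can be illustrated with a small simulation. Everything specific here is an assumption made for the sketch: the per-state power figures in `POWER_W`, the step interval, the jitter range, and the helper names `staggered_wakeup` and `peak_ramp` are hypothetical and do not come from the application.

```python
import random

# Hypothetical per-node power draw in watts for each C-State.
POWER_W = {"C6": 10, "C3": 30, "C1": 60, "C0": 100}


def staggered_wakeup(num_nodes, steps, step_interval, max_jitter, seed=0):
    """Return sorted (time, node, state) events for waking every node.

    Each node walks through `steps` in order (stepped transitions) and
    starts after its own random wait in [0, max_jitter] seconds.
    """
    rng = random.Random(seed)
    events = []
    for node in range(num_nodes):
        offset = rng.uniform(0, max_jitter)
        for index, state in enumerate(steps):
            events.append((offset + index * step_interval, node, state))
    events.sort()
    return events


def peak_ramp(events, num_nodes, window, initial_state="C6"):
    """Largest rise in total power seen within any `window` seconds."""
    state = [initial_state] * num_nodes
    total = POWER_W[initial_state] * num_nodes
    times, before, after = [], [], []
    for time, node, new_state in events:
        before.append(total)
        total += POWER_W[new_state] - POWER_W[state[node]]
        state[node] = new_state
        times.append(time)
        after.append(total)
    worst = 0
    for i in range(len(events)):
        for j in range(i, len(events)):
            if times[j] - times[i] > window:
                break
            worst = max(worst, after[j] - before[i])
    return worst


# All 100 nodes jump straight to C0 at the same instant.
abrupt = peak_ramp(staggered_wakeup(100, ["C0"], 1.0, 0.0), 100, window=0.1)
# Stepped transitions with up to 5 s of random wait per node.
spread = peak_ramp(
    staggered_wakeup(100, ["C3", "C1", "C0"], 1.0, 5.0), 100, window=0.1
)
```

Under these assumed numbers, `abrupt` is the full swing of all nodes at once, while `spread` is bounded by how many single steps happen to fall inside one 0.1 s window.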

    RESILIENCY TO MEMORY FAILURES IN COMPUTER SYSTEMS
    Type: Patent application

    Legal status: In force

    Publication No.: US20170068596A1

    Publication date: 2017-03-09

    Application No.: US15357448

    Filing date: 2016-11-21

    Applicant: Cray Inc.

    Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.

    Increasingly minimal bias routing
    Type: Granted patent

    Legal status: In force

    Publication No.: US09577918B2

    Publication date: 2017-02-21

    Application No.: US13681058

    Filing date: 2012-11-19

    Applicant: Cray Inc.

    CPC classification number: H04L47/11 H04L45/12 H04L45/122 H04L45/20 H04L45/54

    Abstract: A system and algorithm generate diversity at the traffic source so that packets are uniformly distributed over all of the available paths, while increasing the likelihood of taking a minimal path with each hop the packet takes. This is achieved by configuring routing biases to prefer non-minimal paths at the injection point and to increasingly prefer minimal paths as the packet proceeds, an approach referred to herein as Increasing Minimal Bias (IMB).
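
One way to picture a hop-dependent routing bias is the sketch below. It is an assumption-laden illustration, not the patented algorithm: the abstract does not say how the bias enters the routing decision, so the sketch assumes the bias is added to the non-minimal path's load estimate before comparing it with the minimal path's load. The table `BIAS_BY_HOP` and the function `choose_path` are hypothetical.

```python
# Hypothetical bias per hop count: negative favors non-minimal paths,
# positive favors minimal paths. The last entry applies to all later hops.
BIAS_BY_HOP = (-4, 0, 4, 8)


def choose_path(minimal_load, non_minimal_load, hop, bias_by_hop=BIAS_BY_HOP):
    """Pick a path class from load estimates and a hop-dependent bias."""
    bias = bias_by_hop[min(hop, len(bias_by_hop) - 1)]
    if minimal_load <= non_minimal_load + bias:
        return "minimal"
    return "non-minimal"


# Equal load on both path classes: the choice flips as the packet proceeds.
choices = [choose_path(5, 5, hop) for hop in range(4)]
```

With equal loads, the packet takes a non-minimal path at injection (hop 0) and a minimal path from hop 1 onward. At later hops the minimal path is chosen even when it is somewhat more loaded, which is the "increasingly minimal" behavior the abstract describes.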
