-
公开(公告)号:US20190121784A1
公开(公告)日:2019-04-25
申请号:US15886138
申请日:2018-02-01
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson , Stephen Felix , Richard Luke Southwell Osborne , Simon Christian Knowles , Alan Graham Alexander , Ian James Quinn
IPC: G06F15/80 , G06F9/52 , G06F15/173
Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.
-
公开(公告)号:US20190121616A1
公开(公告)日:2019-04-25
申请号:US15886505
申请日:2018-02-01
Applicant: Graphcore Limited
Inventor: Stephen Felix , Godfrey Da Costa
Abstract: The present relates to invention deals with an execution unit configured to execute a computer program instruction to generate random numbers based on a predetermined probability distribution. The execution unit comprises a hardware pseudorandom number generator configured to generate at least randomised bit string on execution of the instruction and adding circuitry which is configured to receive a number of bit sequences of a predetermined bit length selected from the randomised bit string and to sum them to produce a result.
-
公开(公告)号:US12273268B2
公开(公告)日:2025-04-08
申请号:US18159387
申请日:2023-01-25
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles , Stephen Felix , Daniel John Pelham Wilkinson
IPC: H04L45/60 , G06F13/16 , G06F15/163 , G06F15/78 , H04L45/00
Abstract: A memory attachment and routing chip includes a single die having a set of external ports; at least one memory attachment interface comprising a memory controller to attach to external memory, and a fabric core in which routing logic is implemented. The routing logic can (i) receive a first packet of a first type from a first port of the set of ports, the first type of packet being a memory access packet with a memory address which lies in a range of memory addresses associated with the memory attachment and routing chip, detect the memory address and route the packet of the first type to the memory attachment interface. The routing logic can (ii) receive a second packet of a second type, the second type of packet being an inter-processor packet comprising a destination identifier identifying a processing chip external to the memory attachment.
-
公开(公告)号:US12019527B2
公开(公告)日:2024-06-25
申请号:US17447369
申请日:2021-09-10
Applicant: Graphcore Limited
Inventor: Stephen Felix
CPC classification number: G06F11/2002 , G06F11/0724 , G06F11/076 , G06F11/0763 , G06F11/0793 , G06F11/2028 , G06F11/2035
Abstract: A processor comprises a plurality of processing units, wherein there is a fixed transmission time for transmitting a message from a sending processing unit to a receiving processing unit, based on the physical positions of the sending and receiving processing units in the processor. The processing units are arranged in a column, and the fixed transmission time depends on the position of a processing circuit in the column. An exchange fabric is provided for exchanging messages between sending and receiving processing units, the columns being arranged with respect to the exchange fabric such that the fixed transmission time depends on the distances of the processing circuits with respect to the exchange fabric.
-
公开(公告)号:US11625061B2
公开(公告)日:2023-04-11
申请号:US17349488
申请日:2021-06-16
Applicant: Graphcore Limited
Inventor: Simon Douglas Chambers , Stephen Felix , Ian Malcolm King
Abstract: Two clocks, a fast clock and a slow clock are provided for clocking a processing unit. A plurality of frequency settings, referred to as gears, are defined for the two clock. Each of these gears indicates a maximum frequency for the fast clock and a minimum frequency for the slow clock, such that the gap between the two frequencies may be kept to a manageable level so as to reduce transients upon switching between the two clocks. The system switches between the gears as required. In response to a determination to increase the frequency of the clock signal, a higher gear is selected at which the maximum and minimum frequencies defined for that gear are higher than the previous selected gear.
-
公开(公告)号:US11586483B2
公开(公告)日:2023-02-21
申请号:US17320904
申请日:2021-05-14
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson , Simon Christian Knowles , Matthew David Fyles , Alan Graham Alexander , Stephen Felix
Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction cause a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
-
公开(公告)号:US11449117B2
公开(公告)日:2022-09-20
申请号:US16842859
申请日:2020-04-08
Applicant: Graphcore Limited
Inventor: Stephen Felix , Daniel Wilkinson
IPC: G06F1/30 , G06F1/08 , G06F1/3206
Abstract: During normal operation of a processor, voltage droop is likely to occur and there is, therefore, a need for techniques for rapidly addressing this droop so as to reduce the probability of circuit timing failures. This problem is addressed by provided an apparatus that is configured to detect the droop and react to mitigate the droop. The apparatus includes a frequency divider that is configured to receive an output of a clock signal generator (e.g. a phase locked loop) and produce an output signal in which a predefined fraction of the clock pulses in the output of the clock signal generator are removed from the output signal. By reducing the frequency of the clock signal in this way (as may be understood by examining equation 3) VDD is increased, hence mitigating the voltage droop. This technique provides a fast throttling mechanism that prevents excessive VDD droop across the processor.
-
公开(公告)号:US11340868B2
公开(公告)日:2022-05-24
申请号:US16395502
申请日:2019-04-26
Applicant: Graphcore Limited
Inventor: Jonathan Mangnall , Stephen Felix
Abstract: An execution unit for a processor, the execution unit comprising: a look up table having a plurality of entries, each of the plurality of entries comprising an initial estimate for a result of an operation; a preparatory circuit configured to search the look up table using an index value dependent upon the operand to locate an entry comprising a first initial estimate for a result of the operation; a plurality of processing circuits comprising at least one multiplier circuit; and control circuitry configured to provide the first initial estimate to the at least one multiplier circuit of the plurality of processing circuits so as perform processing, by the plurality of processing units, of the first initial estimate to generate the function result, said processing comprising applying one or more Newton Raphson iterations to the first initial estimate.
-
公开(公告)号:US11334320B2
公开(公告)日:2022-05-17
申请号:US16797582
申请日:2020-02-21
Applicant: Graphcore Limited
Inventor: Stephen Felix , Godfrey Da Costa
Abstract: An execution unit configured to execute a computer program instruction to generate random numbers based on a predetermined probability distribution. The execution unit comprises a hardware pseudorandom number generator configured to generate at least randomised bit string on execution of the instruction and adding circuitry which is configured to receive a number of bit sequences of a predetermined bit length selected from the randomised bit string and to sum them to produce a result.
-
公开(公告)号:US11321272B2
公开(公告)日:2022-05-03
申请号:US15886131
申请日:2018-02-01
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles , Daniel John Pelham Wilkinson , Richard Luke Southwell Osborne , Alan Graham Alexander , Stephen Felix , Jonathan Mangnall , David Lacey
IPC: G06F15/173 , G06F8/41 , G06F9/38 , G06F15/80
Abstract: The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instruction which, when executed, implements: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.
-
-
-
-
-
-
-
-
-