-
Publication number: US11748287B2
Publication date: 2023-09-05
Application number: US16831617
Application date: 2020-03-26
Applicant: Graphcore Limited
Inventor: Simon Knowles, Ola Torudbakken, Lars Paul Huse
CPC classification number: G06F13/4027, G06F15/17325, G06F15/8015, G06N3/04, G06N3/063, G06N3/08
Abstract: According to an aspect of the invention, there is provided a computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple stacked layers. Each layer comprises four processing nodes connected by respective links between the processing nodes. In end layers of the stack, the four processing nodes are interconnected in a ring formation by two links between the nodes, the two links adapted to operate simultaneously. Processing nodes in the multiple stacked layers provide four faces, each face comprising multiple layers, each layer comprising a pair of processing nodes. The processing nodes are programmed to operate a configuration to transmit data around embedded one-dimensional rings, each ring formed by processing nodes in two opposing faces.
-
Publication number: US11625356B2
Publication date: 2023-04-11
Application number: US17211232
Application date: 2021-03-24
Applicant: Graphcore Limited
Inventor: Simon Knowles
IPC: G06F15/17, G06F15/173, G06F15/80, G06F13/40
Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least a respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layers by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one-dimensional paths and to transmit data around each of the two embedded one-dimensional paths, each embedded one-dimensional path using all processing nodes of the computer in such a manner that the two embedded one-dimensional paths operate simultaneously without sharing links.
-
Publication number: US11531637B2
Publication date: 2022-12-20
Application number: US17211202
Application date: 2021-03-24
Applicant: Graphcore Limited
Inventor: Simon Knowles
IPC: G06F15/173, G06F15/80, G06F13/40
Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a toroid configuration in which multiple layers of interconnected nodes are arranged along an axis; each layer comprising a plurality of processing nodes connected in a ring in a non-axial plane by at least an intralayer respective set of links between each pair of neighbouring processing nodes, the links in each set adapted to operate simultaneously; wherein each of the processing nodes in each layer is connected to a respective corresponding node in each adjacent layer by an interlayer link to form respective rings along the axis; the computer programmed to provide a plurality of embedded one-dimensional logical paths and to transmit data around each of the embedded one-dimensional paths in such a manner that the plurality of embedded one-dimensional logical paths operate simultaneously, each logical path using all processing nodes of the computer in a sequence.
-
Publication number: US11169956B2
Publication date: 2021-11-09
Application number: US16831590
Application date: 2020-03-26
Applicant: Graphcore Limited
Inventor: Simon Knowles, Ola Torudbakken, Stephen Felix, Lars Paul Huse
Abstract: One aspect of the invention provides a computer comprising a plurality of interconnected processing nodes arranged in a ladder configuration comprising a plurality of facing pairs of processing nodes. The processing nodes of each pair are connected to each other by two links. A processing node in each pair is connected to a corresponding processing node in an adjacent pair by at least one link. The processing nodes are programmed to operate the ladder configuration to transmit data around two embedded one-dimensional rings formed by respective sets of processing nodes and links, each ring using all processing nodes in the ladder once only.
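The ladder arrangement described in this abstract can be illustrated with a small sketch. It is not the routing claimed in the patent. It assumes that every link is bidirectional, that the two rings follow the ladder perimeter in opposite directions, and that each ring uses a different one of the two links in the end pairs. The names `ladder_rings`, `directed_links` and `is_valid_ring` are hypothetical.

```python
def ladder_rings(n_pairs):
    """Return two rings over a ladder of n_pairs facing pairs.

    Nodes are (pair_index, side) with side 0 or 1. Ring A runs down side 0
    and back up side 1; ring B is the same cycle in the opposite direction.
    This is an illustrative assumption, not the patented routing.
    """
    side0 = [(i, 0) for i in range(n_pairs)]
    side1 = [(i, 1) for i in range(n_pairs)]
    ring_a = side0 + side1[::-1]
    ring_b = ring_a[::-1]
    return ring_a, ring_b


def directed_links(ring, rung_link):
    """Set of (src, dst, link_index) hops used by a ring.

    Hops inside a pair use the given one of its two links (0 or 1);
    hops between adjacent pairs use the single link, index 0.
    """
    hops = set()
    for src, dst in zip(ring, ring[1:] + ring[:1]):
        index = rung_link if src[0] == dst[0] else 0
        hops.add((src, dst, index))
    return hops


def is_valid_ring(ring, n_pairs):
    """Check that the ring visits every node once and only hops along links."""
    if len(ring) != 2 * n_pairs or len(set(ring)) != len(ring):
        return False
    for src, dst in zip(ring, ring[1:] + ring[:1]):
        same_pair = src[0] == dst[0] and src[1] != dst[1]
        adjacent_pair = abs(src[0] - dst[0]) == 1 and src[1] == dst[1]
        if not (same_pair or adjacent_pair):
            return False
    return True


ring_a, ring_b = ladder_rings(4)
links_a = directed_links(ring_a, rung_link=0)
links_b = directed_links(ring_b, rung_link=1)
shared = links_a & links_b  # empty: no directed link is used by both rings
```

Under these assumptions both rings visit all eight nodes exactly once, and because they never contend for the same directed link they can carry data at the same time.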
-
Publication number: US12248429B2
Publication date: 2025-03-11
Application number: US18185880
Application date: 2023-03-17
Applicant: Graphcore Limited
Inventor: Simon Knowles
IPC: G06F15/17, G06F13/40, G06F15/173
Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least a respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layers by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one-dimensional paths and to transmit data around each of the two embedded one-dimensional paths, each embedded one-dimensional path using all processing nodes of the computer in such a manner that the two embedded one-dimensional paths operate simultaneously without sharing links.
-
Publication number: US12112164B2
Publication date: 2024-10-08
Application number: US18176034
Application date: 2023-02-28
Applicant: Graphcore Limited
Inventor: Alan Alexander, Simon Knowles, Godfrey Da Costa, Badreddine Noune
IPC: G06F9/30
CPC classification number: G06F9/30014, G06F9/3013
Abstract: A processing device comprising a plurality of operand registers, wherein a first subset of the operand registers are configured to store state information for a plurality of bins, comprising a range of values and a bin count associated with each respective bin, wherein a second subset of the operand registers is configured to store a vector of floating-point values; and an execution unit configured to execute a first instruction taking the state information for the plurality of bins and the vector of floating-point values as operands, and in response to execution of the first instruction, for each of the floating-point values: identify based on an exponent of the respective floating-point value, each one of the plurality of bins for which the respective floating-point value falls within the associated range of values; and increment the bin count associated with the identified bins.
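The binning step in this abstract can be approximated in software as follows. This is a minimal sketch, not the hardware instruction. It assumes that each bin is a half-open range of base-2 exponents, that bins may overlap, and that zeros, infinities and NaNs are skipped. The function name `exponent_histogram` and the bin layout are hypothetical.

```python
import math


def exponent_histogram(values, bins):
    """Count, per bin, the values whose base-2 exponent falls in the bin.

    bins is a list of (lo_exp, hi_exp) half-open exponent ranges. A value is
    counted in every bin whose range contains its exponent, so overlapping
    bins each receive the increment.
    """
    counts = [0] * len(bins)
    for value in values:
        # Assumption: special values carry no usable exponent and are skipped.
        if value == 0.0 or math.isinf(value) or math.isnan(value):
            continue
        # frexp gives value = m * 2**e with 0.5 <= |m| < 1, so the exponent
        # with 2**exp <= |value| < 2**(exp + 1) is e - 1.
        _, e = math.frexp(value)
        exp = e - 1
        for i, (lo, hi) in enumerate(bins):
            if lo <= exp < hi:
                counts[i] += 1
    return counts


bins = [(-4, 0), (0, 2), (0, 11)]
counts = exponent_histogram([1.0, 3.0, 0.25, 1024.0, -6.0], bins)
# counts == [1, 2, 4]
```

Only the exponent is inspected, so the sign and mantissa of each value do not affect which bins are incremented.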
-
Publication number: US11372791B2
Publication date: 2022-06-28
Application number: US16831630
Application date: 2020-03-26
Applicant: Graphcore Limited
Inventor: Simon Knowles
Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple layers, arranged along an axis, comprising first and second endmost layers and at least one intermediate layer between the first and second endmost layers is provided. Each layer comprises a plurality of processing nodes connected in a ring by an intralayer respective set of links between each pair of neighbouring processing nodes, the links adapted to operate simultaneously. Nodes in each layer are connected to respective corresponding nodes in each adjacent layer by an interlayer link. Each processing node in the first endmost layer is connected to a corresponding node in the second endmost layer. Data is transmitted around a plurality of embedded one-dimensional logical rings with an asymmetric bandwidth utilisation, each logical ring using all processing nodes of the computer in such a manner that the plurality of embedded one-dimensional logical rings operate simultaneously.
-
Publication number: US11928523B2
Publication date: 2024-03-12
Application number: US17446681
Application date: 2021-09-01
Applicant: Graphcore Limited
Inventor: Simon Knowles, Daniel John Pelham Wilkinson, Alan Alexander, Stephen Felix, Richard Osborne, David Lacey, Lars Paul Huse
CPC classification number: G06F9/522, G06F9/30087, G06F9/3858, G06F1/12
Abstract: A multi-tile processing unit in which the tiles in the processing unit may be divided between two or more different external sync groups for performing barrier synchronisations. In this way, different sets of tiles of the same processing unit each sync with different sets of tiles external to that processing unit.
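The idea of dividing the tiles of one processing unit between separate external sync groups can be illustrated in software. This sketch uses threads and `threading.Barrier` as a stand-in for hardware barrier synchronisation. The tile names, the group membership and the helper `run_sync_groups` are hypothetical.

```python
import threading


def run_sync_groups(groups):
    """Run one barrier synchronisation per sync group.

    groups maps a group name to the list of participant names (tiles of this
    processing unit plus tiles external to it). Each participant records its
    arrival, waits at its own group's barrier, then records its release.
    """
    logs = {name: [] for name in groups}
    lock = threading.Lock()
    threads = []
    for name, members in groups.items():
        barrier = threading.Barrier(len(members))

        def participant(member, group=name, barrier=barrier):
            with lock:
                logs[group].append(("arrive", member))
            barrier.wait()  # blocks until every member of this group arrives
            with lock:
                logs[group].append(("release", member))

        for member in members:
            threads.append(threading.Thread(target=participant, args=(member,)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return logs


# Tiles 0-1 and tiles 2-3 of the same unit sync with different external tiles.
groups = {
    "group_a": ["unit0.tile0", "unit0.tile1", "unit1.tile0"],
    "group_b": ["unit0.tile2", "unit0.tile3", "unit2.tile0"],
}
logs = run_sync_groups(groups)
```

Each group has its own barrier, so tiles in `group_a` are released as soon as their own members have arrived, regardless of progress in `group_b`.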
-
Publication number: US11720510B2
Publication date: 2023-08-08
Application number: US16831580
Application date: 2020-03-26
Applicant: Graphcore Limited
Inventor: Simon Knowles
CPC classification number: G06F13/4027, G06F15/17325, G06F15/8015, G06N3/04, G06N3/063, G06N3/08
Abstract: A computer comprising a plurality of interconnected processing nodes arranged in multiple stacked layers forming a multi-face prism is provided. Each face of the prism comprises multiple stacked pairs of nodes. Said nodes are connected by at least two intralayer links. Each node is connected to a corresponding node in an adjacent pair by an interlayer link. The corresponding nodes are connected by respective interlayer links to form respective edges. Each pair forms part of a layer, each layer comprising multiple nodes, each node connected to its neighbouring nodes in the layer by at least one of the intralayer links to form a ring. Data is transmitted around paths formed by respective sets of nodes and links, each path having a first portion between first and second endmost layers, and a second portion provided between the second and first endmost layers and comprising one of the edges.
-
Publication number: US11704270B2
Publication date: 2023-07-18
Application number: US17305680
Application date: 2021-07-13
Applicant: Graphcore Limited
Inventor: Simon Knowles, Hachem Yassine
IPC: G06F13/40, H04L49/109
CPC classification number: G06F13/4022, G06F13/4013, G06F13/4063, H04L49/109, G06F2213/0038
Abstract: A network comprising interconnected first and second processors, each processor comprising one or more of: multiple processing units arranged on a chip configured to execute program code; an on-chip interconnect comprising groups of exchange paths connected to receive data from corresponding groups of the processing units; external interfaces configured to communicate data off-chip as packets, each having a destination address, external interfaces of the first and second processors being connected by an external link; multiple exchange blocks, each connected to groups of the exchange paths; a routing bus configured to route packets between the exchange blocks and the external interfaces. Processing units of the first processor generate off-chip packets such that the group of processing units serviced by the first exchange block on the first processor address off-chip packets to the group of processing units on the second processor serviced by the corresponding first exchange block of the second processor.