-
Publication No.: US11037050B2
Publication Date: 2021-06-15
Application No.: US16458020
Application Date: 2019-06-29
Applicant: Intel Corporation
Inventor: Krishna N. Vinod, Sujoyita Kaushikkar, Aniket S. Kakade, Kermin ChoFleming, Ping Zou, Alexey Suprun, Bhavya K. Daya
IPC: G06N3/04, G06F7/53, G06F1/3234, G06F9/22
Abstract: Systems, methods, and apparatuses relating to arbitration among a plurality of memory interface circuits in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for improved memory sub-system design via arbitration and the improvements to arbitration discussed herein.
-
Publication No.: US11385873B2
Publication Date: 2022-07-12
Application No.: US17113185
Application Date: 2020-12-07
Applicant: Intel Corporation
Inventor: Kermin ChoFleming
Abstract: Systems, apparatuses and methods may provide for technology that determines that a control loop is to be executed for an unspecified number of iterations and automatically forces the control loop to be executed for a fixed number of iterations in addition to the unspecified number of iterations, where execution of the control loop for the fixed number of iterations is conducted in parallel. In one example, the technology also removes one or more dataflow tokens associated with the execution of the control loop for the fixed number of iterations.
-
Publication No.: US20210165642A1
Publication Date: 2021-06-03
Application No.: US17113185
Application Date: 2020-12-07
Applicant: Intel Corporation
Inventor: Kermin ChoFleming
Abstract: Systems, apparatuses and methods may provide for technology that determines that a control loop is to be executed for an unspecified number of iterations and automatically forces the control loop to be executed for a fixed number of iterations in addition to the unspecified number of iterations, where execution of the control loop for the fixed number of iterations is conducted in parallel. In one example, the technology also removes one or more dataflow tokens associated with the execution of the control loop for the fixed number of iterations.
-
Publication No.: US20220222177A1
Publication Date: 2022-07-14
Application No.: US17710524
Application Date: 2022-03-31
Applicant: Intel Corporation
Inventor: Kermin ChoFleming, Swapna Raj
IPC: G06F12/0806
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for improving data transfer for heterogeneous programs. An example apparatus includes instructions in the apparatus, and processor circuitry to at least one of execute or instantiate the instructions to determine a runtime associated with executing a code object by a heterogeneous electronic device based on at least one of a location of a memory object or a data transfer penalty, the data transfer penalty associated with access of the memory object in response to execution of the code object, identify a memory operation for the memory object based on the runtime, and generate an executable file based on the memory operation, the executable file, when executed, to cause execution of the code object by at least one of first hardware or second hardware of the heterogeneous electronic device based on the memory operation.
-
Publication No.: US20200310994A1
Publication Date: 2020-10-01
Application No.: US16370928
Application Date: 2019-03-30
Applicant: Intel Corporation
Inventor: Kermin ChoFleming, Yu Bai, Simon C. Steely
IPC: G06F13/16, G06F12/0806, G06F16/901
Abstract: Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.
-
Publication No.: US20200310797A1
Publication Date: 2020-10-01
Application No.: US16370915
Application Date: 2019-03-30
Applicant: Intel Corporation
Inventor: Jesus Corbal, Rohan Sharma, Simon Steely, Jr., Chinmay Ashok, Kent D. Glossop, Dennis Bradford, Paul Caprioli, Louise Huot, Kermin ChoFleming, Barry Tannenbaum
Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA. In one embodiment, a CSA includes a plurality of processing elements, a circuit switched interconnect network between the plurality of processing elements, and a configuration register within each processing element to store a configuration value having a first portion that, when set to a first value that indicates a first mode, causes the processing element to pass an input value to operation circuitry of the processing element without modifying the input value, and, when set to a second value that indicates a second mode, causes the processing element to perform a swizzle operation on the input value to form a swizzled input value before sending the swizzled input value to the operation circuitry of the processing element, and a second portion that causes the processing element to perform an operation indicated by the second portion of the configuration value on the input value in the first mode and the swizzled input value in the second mode with the operation circuitry.
-
Publication No.: US10817291B2
Publication Date: 2020-10-27
Application No.: US16370915
Application Date: 2019-03-30
Applicant: Intel Corporation
Inventor: Jesus Corbal, Rohan Sharma, Simon Steely, Jr., Chinmay Ashok, Kent D. Glossop, Dennis Bradford, Paul Caprioli, Louise Huot, Kermin ChoFleming, Barry Tannenbaum
Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA. In one embodiment, a CSA includes a plurality of processing elements, a circuit switched interconnect network between the plurality of processing elements, and a configuration register within each processing element to store a configuration value having a first portion that, when set to a first value that indicates a first mode, causes the processing element to pass an input value to operation circuitry of the processing element without modifying the input value, and, when set to a second value that indicates a second mode, causes the processing element to perform a swizzle operation on the input value to form a swizzled input value before sending the swizzled input value to the operation circuitry of the processing element, and a second portion that causes the processing element to perform an operation indicated by the second portion of the configuration value on the input value in the first mode and the swizzled input value in the second mode with the operation circuitry.
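The two-portion configuration value described in this abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the one-bit mode field, the bit layout, the `OPERATIONS` table, and the adjacent-lane swap used as the swizzle.

```python
# Hypothetical encodings for the first portion (mode) of the configuration value.
PASS_THROUGH = 0  # first mode: the input reaches the operation circuitry unmodified
SWIZZLE = 1       # second mode: the input is swizzled before the operation

# Hypothetical operation table selected by the second portion.
OPERATIONS = {
    0: lambda lanes: lanes[0] + lanes[1],
    1: lambda lanes: lanes[0] - lanes[1],
}


def swap_adjacent_lanes(lanes):
    """One illustrative swizzle: swap each pair of adjacent packed elements."""
    out = list(lanes)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def run_processing_element(config_value, lanes):
    """Decode the configuration value and apply the selected operation."""
    mode = config_value & 1       # assumed: bit 0 holds the first portion
    operation = config_value >> 1  # assumed: remaining bits hold the second portion
    operand = swap_adjacent_lanes(lanes) if mode == SWIZZLE else list(lanes)
    return OPERATIONS[operation](operand)
```

With the subtract operation selected, the same packed input gives a different result in each mode because the swizzle reorders the lanes before the operation sees them.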
-
Publication No.: US20200210358A1
Publication Date: 2020-07-02
Application No.: US16236423
Application Date: 2018-12-29
Applicant: Intel Corporation
Inventor: Kermin ChoFleming, Simon Steely, Jr., Kent Glossop
IPC: G06F13/24, H04L12/933, H04L29/08
Abstract: Systems, methods, and apparatuses relating to in-network storage for a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a plurality of processing elements; a circuit switched interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the circuit switched interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements; and an in-network storage element of the circuit switched interconnect network comprising a queue coupled to an output queue of a first processing element, and a controller that switches the in-network storage element into a first mode that provides a value stored in the queue of the in-network storage element by the output queue of the first processing element to an input queue of a second processing element when a configuration value is a first value, and into a second mode that bypasses the queue of the in-network storage element and provides a value from the output queue of the first processing element to the input queue of the second processing element when the configuration value is a second value.
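The two modes of the in-network storage element can be modeled with a short Python sketch. This is a behavioral illustration only: the class name, the `BUFFERED` and `BYPASS` encodings, and the one-value-per-step transfer policy are assumptions and do not come from the patent.

```python
from collections import deque

# Hypothetical encodings of the configuration value.
BUFFERED = 0  # first mode: values pass through the element's own queue
BYPASS = 1    # second mode: the element's queue is skipped


class InNetworkStorage:
    """Toy model of a storage element between two processing elements."""

    def __init__(self, config_value):
        self.config_value = config_value
        self.queue = deque()

    def step(self, producer_output, consumer_input):
        """Advance one step, moving at most one value on each hop."""
        if self.config_value == BYPASS:
            # Forward directly from the producer's output queue.
            if producer_output:
                consumer_input.append(producer_output.popleft())
            return
        # Buffered mode: drain one stored value, then accept a new one.
        if self.queue:
            consumer_input.append(self.queue.popleft())
        if producer_output:
            self.queue.append(producer_output.popleft())
```

In this model the buffered mode adds one step of latency but gives the path an extra slot of storage, while the bypass mode delivers a value in the same step it is produced.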
-
Publication No.: US20190317744A1
Publication Date: 2019-10-17
Application No.: US16456953
Application Date: 2019-06-28
Applicant: Intel Corporation
Inventor: Kermin ChoFleming
Abstract: Systems, apparatuses and methods may provide for technology that determines that a control loop is to be executed for an unspecified number of iterations and automatically forces the control loop to be executed for a fixed number of iterations in addition to the unspecified number of iterations, where execution of the control loop for the fixed number of iterations is conducted in parallel. In one example, the technology also removes one or more dataflow tokens associated with the execution of the control loop for the fixed number of iterations.
-
Publication No.: US20220405209A1
Publication Date: 2022-12-22
Application No.: US17352628
Application Date: 2021-06-21
Applicant: Intel Corporation
Inventor: Kermin ChoFleming, Yu Bai, Ping Zou
IPC: G06F12/0895, G06F12/0853, G06F12/0811, G06F12/02
Abstract: An embodiment of an integrated circuit comprises circuitry to generate a cache tag for data to be stored in a cache memory, store a first portion of the cache tag in a primary tag memory, and store a second portion of the cache tag in a secondary tag memory, wherein a size of the first portion is smaller than a size of the second portion. Other embodiments are disclosed and claimed.
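The split-tag idea in this abstract can be sketched in Python as follows. The abstract only states that a smaller first portion goes to a primary tag memory and a larger second portion to a secondary tag memory. The bit widths, the use of the low bits as the primary portion, and the lookup that checks the primary portion before the secondary one are assumptions made for illustration.

```python
PRIMARY_BITS = 4     # assumed width of the smaller first portion
SECONDARY_BITS = 12  # assumed width of the larger second portion


def split_tag(tag):
    """Split a tag into (primary, secondary) portions; low bits are primary."""
    primary = tag & ((1 << PRIMARY_BITS) - 1)
    secondary = (tag >> PRIMARY_BITS) & ((1 << SECONDARY_BITS) - 1)
    return primary, secondary


class SplitTagStore:
    """Toy tag store for one cache set, indexed by way number."""

    def __init__(self):
        self.primary_tags = {}    # way -> first (small) portion
        self.secondary_tags = {}  # way -> second (large) portion

    def store(self, way, tag):
        primary, secondary = split_tag(tag)
        self.primary_tags[way] = primary
        self.secondary_tags[way] = secondary

    def lookup(self, tag):
        """Return the matching way, or None on a miss."""
        primary, secondary = split_tag(tag)
        for way, stored_primary in self.primary_tags.items():
            # Assumed policy: consult the secondary memory only on a primary match.
            if stored_primary == primary and self.secondary_tags[way] == secondary:
                return way
        return None
```

Under this assumed policy, most mismatches are rejected using only the small primary memory, and the larger secondary memory is read just to confirm a candidate hit.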