-
Publication No.: US20220129265A1
Publication date: 2022-04-28
Application No.: US17430574
Application date: 2020-03-14
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Joydeep Ray , Mike Macpherson , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Subramaniam Maiyuran , Vasanth Ranganathan , Jayakrishna P S , K Pattabhiraman , Sudhakar Kamma
Abstract: Methods and apparatus relating to techniques for data compression. In an example, an apparatus comprises a processor to receive a data compression instruction for a memory segment and, in response to the data compression instruction, compress a sequence of identical memory values in response to a determination that the sequence of identical memory values has a length which exceeds a threshold. Other embodiments are also disclosed and claimed.
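The threshold rule in this abstract can be illustrated with a small run-length sketch. This is not the patented encoding: the token format, the function names `compress_runs` and `decompress_runs`, and the use of Python integers as "memory values" are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical token format: a bare int is a literal value,
# a (value, run_length) tuple stands for a compressed run.
Token = Union[int, Tuple[int, int]]


def compress_runs(values: List[int], threshold: int) -> List[Token]:
    """Collapse a run of identical values only when it is longer than threshold."""
    out: List[Token] = []
    i = 0
    while i < len(values):
        # Find the end of the run that starts at index i.
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        run = j - i
        if run > threshold:
            out.append((values[i], run))   # long run: store value and length
        else:
            out.extend(values[i:j])        # short run: keep values as literals
        i = j
    return out


def decompress_runs(tokens: List[Token]) -> List[int]:
    """Inverse of compress_runs."""
    out: List[int] = []
    for token in tokens:
        if isinstance(token, tuple):
            value, run = token
            out.extend([value] * run)
        else:
            out.append(token)
    return out
```

With a threshold of 3, a run of five zeros is collapsed while a run of two sevens is left as literals, so short runs do not pay the overhead of a run token.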
-
Publication No.: US20220114108A1
Publication date: 2022-04-14
Application No.: US17428529
Application date: 2020-03-14
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Elmoustapha Ould-Ahmed-Vall , Abhishek Appu , Aravindh Anantaraman , Valentin Andrei , Durgaprasad Bilagi , Varghese George , Brent Insko , Sanjeev Jahagirdar , Scott Janus , Pattabhiraman K. , SungYe Kim , Subramaniam Maiyuran , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Xinmin Tian
IPC: G06F12/123 , G06F12/0891 , G06F12/0875 , G06T1/60
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received.
-
Publication No.: US20220114096A1
Publication date: 2022-04-14
Application No.: US17428527
Application date: 2020-03-14
Applicant: Intel Corporation
Inventor: Lakshminarayanan Striramassarma , Prasoonkumar Surti , Varghese George , Ben Ashbaugh , Aravindh Anantaraman , Valentin Andrei , Abhishek Appu , Nicolas Galoppo Von Borries , Altug Koker , Mike Macpherson , Subramaniam Maiyuran , Nilay Mistry , Elmoustapha Ould-Ahmed-Vall , Selvakumar Panneer , Vasanth Ranganathan , Joydeep Ray , Ankur Shah , Saurabh Tangri
IPC: G06F12/0802
Abstract: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.
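As a rough illustration of the detection step, the sketch below counts accesses that cross from one tile to another and emits a message once a count reaches a limit. The abstract does not define "frequent", so the fixed `threshold`, the per-page counting, the class name `CrossTileMonitor`, and the message tuple are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple


class CrossTileMonitor:
    """Toy model of a memory controller that watches for cross-tile accesses."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold          # assumed definition of "frequent"
        self.counts: Counter = Counter()    # (src_tile, dst_tile, page) -> count
        self.messages: List[Tuple[str, int, int, int]] = []

    def record_access(self, src_tile: int, dst_tile: int, page: int) -> None:
        if src_tile == dst_tile:
            return                          # local access, nothing to track
        key = (src_tile, dst_tile, page)
        self.counts[key] += 1
        # Emit the message exactly once, when the count first reaches the limit.
        if self.counts[key] == self.threshold:
            self.messages.append(("migrate", page, dst_tile, src_tile))
```

In this model the message only requests a transfer of the page from the remote tile toward the accessing tile; how the transfer itself is carried out (copy, multicast, or page migration) is outside the sketch.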
-
Publication No.: US20210374897A1
Publication date: 2021-12-02
Application No.: US17303654
Application date: 2021-06-03
Applicant: Intel Corporation
Inventor: Joydeep Ray , Scott Janus , Varghese George , Subramaniam Maiyuran , Altug Koker , Abhishek Appu , Prasoonkumar Surti , Vasanth Ranganathan , Andrei Valentin , Ashutosh Garg , Yoav Harel , Arthur Hunter, JR. , SungYe Kim , Mike Macpherson , Elmoustapha Ould-Ahmed-Vall , William Sadler , Lakshminarayanan Striramassarma , Vikranth Vemulapalli
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiments described herein provide techniques to skip computational operations for zero-filled matrices and sub-matrices. Embodiments additionally provide techniques to maintain data compression through to a processing unit. Embodiments additionally provide an architecture for a sparse-aware logic unit.
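The idea of skipping work for zero-filled sub-matrices can be sketched in software as a tiled matrix multiply that tests each tile of the left operand before using it. This is only an analogy for the hardware described; the function name `matmul_skip_zero_tiles`, the square tile size, and the list-of-lists matrix layout are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul_skip_zero_tiles(a: Matrix, b: Matrix, tile: int) -> Tuple[Matrix, int]:
    """Return (a @ b, number of all-zero tiles of `a` that were skipped)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    skipped = 0
    for i0 in range(0, n, tile):
        for k0 in range(0, k, tile):
            rows = range(i0, min(i0 + tile, n))
            inner = range(k0, min(k0 + tile, k))
            # A tile that is entirely zero contributes nothing to the product.
            if all(a[i][kk] == 0 for i in rows for kk in inner):
                skipped += 1
                continue
            for i in rows:
                for kk in inner:
                    for j in range(m):
                        c[i][j] += a[i][kk] * b[kk][j]
    return c, skipped
```

For a 4x4 left operand whose only nonzero entries sit in the top-left 2x2 tile, three of the four tiles are skipped and the result is unchanged.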
-
Publication No.: US20210312697A1
Publication date: 2021-10-07
Application No.: US17304092
Application date: 2021-06-14
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh
Abstract: Described herein is a graphics processing unit (GPU) comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent and process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add a first source operand to a result of the multiply.
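The arithmetic named in this abstract (multiply the second source operand by the third, then add the first) can be modeled on BF16 bit patterns. BF16 is the upper 16 bits of an IEEE 754 single-precision value: one sign bit, eight exponent bits, seven fraction bits. The sketch below assumes round-to-nearest-even on conversion and a single rounding of the final result; the abstract specifies neither, and the helper names are hypothetical. NaN and overflow handling are omitted.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Convert a float to a 16-bit BF16 pattern (round to nearest even, no NaN care)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen a BF16 pattern to a float by placing it in the top half of a float32."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def bf16_mad(src0: int, src1: int, src2: int) -> int:
    """Compute src0 + src1 * src2 on BF16 operands, returning a BF16 pattern."""
    product = bf16_bits_to_float(src1) * bf16_bits_to_float(src2)
    return float_to_bf16_bits(bf16_bits_to_float(src0) + product)
```

For example, with operands encoding 1.0, 2.0, and 3.0, `bf16_mad` returns the pattern for 7.0. Because BF16 keeps the float32 exponent width, its range matches float32 while its precision is reduced to roughly three decimal digits.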
-
Publication No.: US20200294180A1
Publication date: 2020-09-17
Application No.: US16355303
Application date: 2019-03-15
Applicant: Intel Corporation
Inventor: Altug Koker , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Josh Mastronarde , Naveen Matam , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnect logic on the chiplet can then be configured to conform with the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enabling interchangeability of the individual chiplets. With such an interchangeable design, higher or lower density memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.
-
Publication No.: US20200294178A1
Publication date: 2020-09-17
Application No.: US16355250
Application date: 2019-03-15
Applicant: Intel Corporation
Inventor: Aravindh Anantaraman , Altug Koker , Varghese George , Subramaniam Maiyuran , SungYe Kim , Valentin Andrei
Abstract: Apparatuses including general-purpose graphics processing units and graphics multiprocessors that exploit queues or transitional buffers for improved low-latency high-bandwidth on-die data retrieval are disclosed. In one embodiment, a graphics multiprocessor includes at least one compute engine to provide a request, a queue or transitional buffer, and logic coupled to the queue or transitional buffer. The logic is configured to cause a request to be transferred to a queue or transitional buffer for temporary storage without processing the request and to determine whether the queue or transitional buffer has a predetermined amount of storage capacity.
-
Publication No.: US20200293368A1
Publication date: 2020-09-17
Application No.: US16355187
Application date: 2019-03-15
Applicant: Intel Corporation
Inventor: Valentin Andrei , Subramaniam Maiyuran , SungYe Kim , Varghese George , Altug Koker , Aravindh Anantaraman
Abstract: Apparatuses to synchronize lanes that diverge or threads that drift are disclosed. In one embodiment, a graphics multiprocessor includes a queue having an initial state of groups with a first group having threads of first and second instruction types and a second group having threads of the first and second instruction types. A regroup engine (or regroup circuitry) regroups threads into a third group having threads of the first instruction type and a fourth group having threads of the second instruction type.
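The regrouping step can be pictured as sorting threads from mixed groups into groups that each hold a single instruction type. The sketch below is a software analogy, not the regroup circuitry itself; the `(thread_id, instruction_type)` pair representation and the function name `regroup` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Thread = Tuple[int, str]  # (thread_id, instruction_type), a hypothetical layout


def regroup(groups: List[List[Thread]]) -> Dict[str, List[int]]:
    """Merge mixed groups into one group of thread ids per instruction type."""
    by_type: Dict[str, List[int]] = defaultdict(list)
    for group in groups:
        for thread_id, instr_type in group:
            by_type[instr_type].append(thread_id)
    return dict(by_type)
```

Given two initial groups that each mix types "A" and "B", the result has one group containing only the "A" threads and another containing only the "B" threads, which is the third-group and fourth-group arrangement the abstract describes.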
-
Publication No.: US09870044B2
Publication date: 2018-01-16
Application No.: US15280057
Application date: 2016-09-29
Applicant: Intel Corporation
Inventor: Sanjeev Jahagirdar , Varghese George , John B. Conrad , Robert Milstrey , Stephen A. Fischer , Alon Naveh , Shai Rotem
IPC: G06F1/32 , G06F12/08 , G06F11/14 , G06F12/084 , G06F12/0875 , G11C7/10 , G06F9/44
CPC classification number: G06F1/3287 , G06F1/3203 , G06F1/324 , G06F1/3243 , G06F1/3246 , G06F1/3275 , G06F1/3293 , G06F1/3296 , G06F9/4418 , G06F11/1441 , G06F12/084 , G06F12/0875 , G06F2212/281 , G06F2212/305 , G06F2212/314 , G11C7/1072 , Y02B70/123 , Y02B70/126 , Y02B70/32 , Y02D10/152 , Y02D10/172 , Y02D50/20 , Y02P80/11 , Y10T307/305 , Y10T307/406 , Y10T307/582 , Y10T307/826
Abstract: Embodiments of the invention relate to a method and apparatus for a zero voltage processor sleep state. A processor may include a dedicated cache memory. A voltage regulator may be coupled to the processor to provide an operating voltage to the processor. During a transition to a zero voltage power management state for the processor, the operational voltage applied to the processor by the voltage regulator may be reduced to approximately zero and the state variables associated with the processor may be saved to the dedicated cache memory.
-
Publication No.: US09280172B2
Publication date: 2016-03-08
Application No.: US14140875
Application date: 2013-12-26
Applicant: INTEL CORPORATION
Inventor: Jose P. Allarey , Varghese George , Sanjeev S. Jahagirdar , Oren Lamdan
CPC classification number: G06F15/82 , G06F1/08 , G06F1/206 , G06F1/3203 , G06F1/324 , G06F9/06 , G06F9/30145 , G06F15/76 , Y02D10/126 , Y02D10/16
Abstract: With the progress toward multi-core processors, each core cannot readily ascertain the status of the other dies with respect to an idle or active status. A proposal for utilizing an interface to transmit core status among multiple cores in a multi-die microprocessor is discussed. Consequently, this facilitates thermal management by allowing an optimal setting for performance and frequency based on each core's status.
-