-
Publication No.: US20200310993A1
Publication date: 2020-10-01
Application No.: US16370587
Filing date: 2019-03-29
Applicant: INTEL CORPORATION
Inventor: Sanjay Kumar , David Koufaty , Philip Lantz , Pratik Marolia , Rajesh Sankaran , Koen Koning
IPC: G06F13/16 , G06F12/1027 , G06F3/06
Abstract: The present disclosure is directed to systems and methods sharing memory circuitry between processor memory circuitry and accelerator memory circuitry in each of a plurality of peer-to-peer connected accelerator units. Each of the accelerator units includes virtual-to-physical address translation circuitry and migration circuitry. The virtual-to-physical address translation circuitry in each accelerator unit includes pages for each of at least some of the plurality of accelerator units. The migration circuitry causes the transfer of data between the processor memory circuitry and the accelerator memory circuitry in each of the plurality of accelerator circuits. The migration circuitry migrates and evicts data to/from accelerator memory circuitry based on statistical information associated with accesses to at least one of: processor memory circuitry or accelerator memory circuitry in one or more peer accelerator circuits. Thus, the processor memory circuitry and accelerator memory circuitry may be dynamically allocated to advantageously minimize system latency attributable to data access operations.
-
Publication No.: US11379236B2
Publication date: 2022-07-05
Application No.: US16728665
Filing date: 2019-12-27
Applicant: Intel Corporation
Inventor: Pratik Marolia , Rajesh Sankaran
IPC: G11C16/04 , G06F9/30 , G06F9/38 , G06F12/1027
Abstract: An apparatus and method for hybrid software-hardware coherency. An apparatus comprises one or more processing elements to process data; a memory controller to couple the one or more processing elements to a device memory; an interconnect to couple the one or more processing elements to a host processor memory and to couple a host processor to the device memory; one or more device caches to store cache lines read from the host processor memory and/or the device memory; coherency circuitry to manage an ownership indication for each cache line, the ownership indication to be set to a first value to indicate ownership by the host processor and to be set to a second value to indicate ownership by the processing device, wherein the coherency circuitry is to transfer ownership of a first cache line from the processing device to the host processor by updating the ownership indication from the second value to the first value, the coherency circuitry to provide indirect access to the cache line by the processing device while the ownership indication is set to the first value, the coherency circuitry to maintain the ownership indication at the first value until receiving a request to change the ownership indication.
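Note: the ownership handoff in this abstract can be illustrated with a minimal sketch. It is not taken from the patent. The names `Owner`, `CoherencyTracker`, `request_ownership`, `release_to_host`, and `device_access` are hypothetical, and treating untracked lines as host-owned is an assumption made only for the example.

```python
from enum import Enum
from typing import Dict


class Owner(Enum):
    HOST = 0    # the "first value" in the abstract
    DEVICE = 1  # the "second value" in the abstract


class CoherencyTracker:
    """Hypothetical per-cache-line ownership table."""

    def __init__(self) -> None:
        self._owner: Dict[int, Owner] = {}

    def request_ownership(self, line: int, owner: Owner) -> None:
        # The indication only changes on an explicit request.
        self._owner[line] = owner

    def release_to_host(self, line: int) -> None:
        # Transfer from device to host: second value -> first value.
        if self._owner.get(line) is Owner.DEVICE:
            self._owner[line] = Owner.HOST

    def device_access(self, line: int) -> str:
        # Assumption: lines never seen before are treated as host-owned.
        owner = self._owner.get(line, Owner.HOST)
        return "direct" if owner is Owner.DEVICE else "indirect"
```

In this sketch a host-owned line stays host-owned across any number of device accesses, which mirrors the abstract's statement that the indication is maintained until a change is requested.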
-
Publication No.: US11201838B2
Publication date: 2021-12-14
Application No.: US16582224
Filing date: 2019-09-25
Applicant: Intel Corporation
Inventor: Pratik Marolia , Rajesh Sankaran , Ishwar Agarwal , Nitish Paliwal
IPC: H04L12/935 , H04L29/06
Abstract: In one embodiment, an input/output port includes a stateful transmit port having: a history storage to store a value corresponding to a transmit on change field of a prior data packet; a comparator to compare a transmit on change field of the data packet to the value stored in the history storage; and a selection circuit to output the data packet without the transmit on change field when the transmit on change field of the data packet matches the value. Other embodiments are described and claimed.
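Note: the stateful transmit port in this abstract can be sketched in software as follows. This is an illustration under assumptions, not the patented circuit. `Packet`, `toc_field`, and `StatefulTransmitPort` are hypothetical names, and the rule that a mismatch sends the field and updates the history is assumed, since the abstract only states the matching case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Packet:
    toc_field: int   # hypothetical "transmit on change" field
    payload: bytes


class StatefulTransmitPort:
    def __init__(self) -> None:
        # History storage: field value of the prior packet, if any.
        self._history: Optional[int] = None

    def transmit(self, pkt: Packet) -> Tuple[Optional[int], bytes]:
        """Return (field or None, payload) as it would go on the wire."""
        # Comparator: does the field match the stored history value?
        if self._history is not None and pkt.toc_field == self._history:
            # Selection: output the packet without the field.
            return (None, pkt.payload)
        # Assumed behavior on change: send the field and remember it.
        self._history = pkt.toc_field
        return (pkt.toc_field, pkt.payload)
```

A receiver would need matching history state to restore the omitted field; that side is outside this sketch.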
-
Publication No.: US11954062B2
Publication date: 2024-04-09
Application No.: US17310540
Filing date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: Joydeep Ray , Niranjan Cooray , Subramaniam Maiyuran , Altug Koker , Prasoonkumar Surti , Varghese George , Valentin Andrei , Abhishek Appu , Guadalupe Garcia , Pattabhiraman K , Sungye Kim , Sanjay Kumar , Pratik Marolia , Elmoustapha Ould-Ahmed-Vall , Vasanth Ranganathan , William Sadler , Lakshminarayanan Striramassarma
IPC: G06F12/00 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/78 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogeneous processing system having near and far regions of the same level of a cache hierarchy.
-
Publication No.: US10817441B2
Publication date: 2020-10-27
Application No.: US16370587
Filing date: 2019-03-29
Applicant: INTEL CORPORATION
Inventor: Sanjay Kumar , David Koufaty , Philip Lantz , Pratik Marolia , Rajesh Sankaran , Koen Koning
IPC: G06F13/16 , G06F12/1027 , G06F3/06
Abstract: The present disclosure is directed to systems and methods sharing memory circuitry between processor memory circuitry and accelerator memory circuitry in each of a plurality of peer-to-peer connected accelerator units. Each of the accelerator units includes virtual-to-physical address translation circuitry and migration circuitry. The virtual-to-physical address translation circuitry in each accelerator unit includes pages for each of at least some of the plurality of accelerator units. The migration circuitry causes the transfer of data between the processor memory circuitry and the accelerator memory circuitry in each of the plurality of accelerator circuits. The migration circuitry migrates and evicts data to/from accelerator memory circuitry based on statistical information associated with accesses to at least one of: processor memory circuitry or accelerator memory circuitry in one or more peer accelerator circuits. Thus, the processor memory circuitry and accelerator memory circuitry may be dynamically allocated to advantageously minimize system latency attributable to data access operations.
-
Publication No.: US20200021540A1
Publication date: 2020-01-16
Application No.: US16582224
Filing date: 2019-09-25
Applicant: Intel Corporation
Inventor: Pratik Marolia , Rajesh Sankaran , Ishwar Agarwal , Nitish Paliwal
IPC: H04L12/935 , H04L29/06
Abstract: In one embodiment, an input/output port includes a stateful transmit port having: a history storage to store a value corresponding to a transmit on change field of a prior data packet; a comparator to compare a transmit on change field of the data packet to the value stored in the history storage; and a selection circuit to output the data packet without the transmit on change field when the transmit on change field of the data packet matches the value. Other embodiments are described and claimed.
-
Publication No.: US12086082B2
Publication date: 2024-09-10
Application No.: US17026516
Filing date: 2020-09-21
Applicant: Intel Corporation
Inventor: Pratik Marolia , Sanjay Kumar , Rajesh Sankaran , Utkarsh Y. Kakaiya
CPC classification number: G06F13/20 , G06F3/061 , G06F3/0655 , G06F3/0662 , G06F3/0679 , G06F9/45558
Abstract: Methods and apparatus for PASID-based routing extension for Scalable IOV systems. The system may include a Central Processing Unit (CPU) operatively coupled to a scalable Input/Output Virtualization (IOV) device via an in-line device such as a smart controller or accelerator. A Control Process Address Space Identifier (C-PASID) associated with a first memory space is implemented in an Assignable Device Interface (ADI) for the IOV device. The ADI also implements a Data PASID (D-PASID) associated with a second memory space in which data are stored. The C-PASID is used to fetch a descriptor in the first memory space and the D-PASID is employed to fetch data in the second memory space. A hub embedded on the in-line device or implemented as a discrete device is used to steer memory access requests and/or fetches to the CPU or to the in-line device using the C-PASID and D-PASID. IOV devices include multi-PASID helper devices and off-the-shelf devices such as NICs with modified ADIs to support C-PASID and D-PASID usage.
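Note: the PASID-based steering in this abstract can be modeled with a small sketch. It is a simplification under assumptions and not the patented design. `ADI`, `Hub`, `fetch_descriptor`, and `fetch_data` are hypothetical names, and which destination each PASID maps to is set by the caller, because the abstract does not fix that mapping.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ADI:
    c_pasid: int  # Control PASID: memory space holding descriptors
    d_pasid: int  # Data PASID: memory space holding data


class Hub:
    """Hypothetical hub that steers a request by its PASID."""

    def __init__(self, routes: Dict[int, str]) -> None:
        self._routes = dict(routes)  # PASID -> destination name

    def steer(self, pasid: int) -> str:
        return self._routes[pasid]


def fetch_descriptor(hub: Hub, adi: ADI) -> str:
    # Descriptor fetches are tagged with the C-PASID.
    return hub.steer(adi.c_pasid)


def fetch_data(hub: Hub, adi: ADI) -> str:
    # Data fetches are tagged with the D-PASID.
    return hub.steer(adi.d_pasid)
```

For example, with routes `{0x10: "cpu", 0x20: "inline_device"}` and `ADI(0x10, 0x20)`, descriptor fetches are steered to the CPU and data fetches to the in-line device.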
-
Publication No.: US20210200545A1
Publication date: 2021-07-01
Application No.: US16728665
Filing date: 2019-12-27
Applicant: Intel Corporation
Inventor: Pratik Marolia , Rajesh Sankaran
Abstract: An apparatus and method for hybrid software-hardware coherency. For example, one embodiment of an apparatus comprises: one or more processing elements to process data; a memory controller to couple the one or more processing elements to a device memory; an interconnect to couple the one or more processing elements to a host processor memory and to couple a host processor to the device memory; one or more device caches to store cache lines read from the host processor memory and/or the device memory; coherency circuitry to manage an ownership indication for each cache line, the ownership indication to be set to a first value to indicate ownership by the host processor and to be set to a second value to indicate ownership by the processing device, wherein the coherency circuitry is to transfer ownership of a first cache line from the processing device to the host processor by updating the ownership indication from the second value to the first value, the coherency circuitry to provide indirect access to the cache line by the processing device while the ownership indication is set to the first value, the coherency circuitry to maintain the ownership indication at the first value until receiving a request to change the ownership indication.
-
Publication No.: US12229069B2
Publication date: 2025-02-18
Application No.: US17083200
Filing date: 2020-10-28
Applicant: Intel Corporation
Inventor: Pratik Marolia , Andrew Herdrich , Rajesh Sankaran , Rahul Pal , David Puffer , Sayantan Sur , Ajaya Durg
Abstract: Methods and apparatus for an accelerator controller hub (ACH). The ACH may be a stand-alone component or integrated on-die or on package in an accelerator such as a GPU. The ACH may include a host device link (HDL) interface, one or more Peripheral Component Interconnect Express (PCIe) interfaces, one or more high performance accelerator link (HPAL) interfaces, and a router, operatively coupled to each of the HDL interface, the one or more PCIe interfaces, and the one or more HPAL interfaces. The HDL interface is configured to be coupled to a host CPU via an HDL link and the one or more HPAL interfaces are configured to be coupled to one or more HPALs that are used to access high performance accelerator fabrics (HPAFs) such as NVLink fabrics and CCIX (Cache Coherent Interconnect for Accelerators) fabrics. Platforms including ACHs or accelerators with integrated ACHs support RDMA transfers using RDMA semantics to enable transfers between accelerator memory on initiators and targets without CPU involvement.
-
Publication No.: US10762244B2
Publication date: 2020-09-01
Application No.: US16024022
Filing date: 2018-06-29
Applicant: INTEL CORPORATION
Inventor: Joshua Fender , Utkarsh Y. Kakaiya , Mohan Nair , Brian Morris , Pratik Marolia
IPC: G06F21/00 , G06F21/76 , H04L29/08 , G06F11/14 , G06F1/3206 , G06F21/54 , G06F1/324 , G06F21/74 , H04L29/06 , G06F1/20 , G06F1/3287
Abstract: Various embodiments are generally directed to securing systems that include hardware accelerators, such as FPGA-based accelerators, and privileged system components. Some embodiments may provide a security broker. In various embodiments, the security broker may provide interfaces between the hardware accelerator and the privileged component. Some embodiments may receive an instruction from the hardware accelerator targeting the privileged component, and validate the instruction based on a configuration. In some embodiments, upon determining the instruction is not validated, the instruction is restricted from further processing.
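Note: the validate-then-restrict flow in this abstract can be sketched as an allow-list check. This is a hedged illustration, not the patented mechanism. `Instruction`, `SecurityBroker`, and the `(target, operation)` configuration format are hypothetical, since the abstract does not specify what the configuration contains.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Instruction:
    target: str     # privileged component being addressed (hypothetical)
    operation: str  # requested operation (hypothetical)


class SecurityBroker:
    """Sits between the accelerator and privileged components."""

    def __init__(self, allowed: FrozenSet[Tuple[str, str]]) -> None:
        # Assumed configuration: the set of permitted (target, operation) pairs.
        self._allowed = allowed
        self.forwarded: List[Instruction] = []

    def handle(self, instr: Instruction) -> bool:
        # Validate the instruction against the configuration.
        if (instr.target, instr.operation) not in self._allowed:
            # Not validated: restrict it from further processing.
            return False
        self.forwarded.append(instr)
        return True
```

An allow-list is chosen here so that anything not explicitly configured is blocked by default.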