-
1.
Publication No.: US12197601B2
Publication Date: 2025-01-14
Application No.: US17560193
Filing Date: 2021-12-22
Applicant: Intel Corporation
Inventors: Ren Wang, Sameh Gobriel, Somnath Paul, Yipeng Wang, Priya Autee, Abhirupa Layek, Shaman Narayana, Edwin Verplanke, Mrittika Ganguli, Jr-Shian Tsai, Anton Sorokin, Suvadeep Banerjee, Abhijit Davare, Desmond Kirkpatrick, Rajesh M. Sankaran, Jaykant B. Timbadiya, Sriram Kabisthalam Muthukumar, Narayan Ranganathan, Nalini Murari, Brinda Ganesh, Nilesh Jain
Abstract: Examples described herein relate to offload circuitry comprising one or more compute engines that are configurable to perform a workload offloaded from a process executed by a processor based on a descriptor particular to the workload. In some examples, the offload circuitry is configurable to perform the workload, among multiple different workloads. In some examples, the multiple different workloads include one or more of: data transformation (DT) for data format conversion, Locality Sensitive Hashing (LSH) for neural network (NN), similarity search, sparse general matrix-matrix multiplication (SpGEMM) acceleration of hash based sparse matrix multiplication, data encode, data decode, or embedding lookup.
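The descriptor-driven dispatch described in this abstract can be pictured with a small software model. This is a minimal sketch, not the patented circuitry: the `Descriptor` fields, the `Workload` names, and the `submit` function are hypothetical, and only two of the listed workloads (embedding lookup and a simple row-to-column data format conversion) are modeled.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Workload(Enum):
    """Hypothetical workload identifiers carried in a descriptor."""
    EMBEDDING_LOOKUP = auto()
    DATA_TRANSFORM = auto()


@dataclass
class Descriptor:
    """Hypothetical per-workload descriptor: which engine to use, input, and parameters."""
    workload: Workload
    src: list
    args: dict = field(default_factory=dict)


def embedding_lookup(src, args):
    # Gather one embedding row per index in src.
    table = args["table"]
    return [table[i] for i in src]


def data_transform(src, args):
    # Example format conversion: row-major list of rows -> column-major list of columns.
    return [list(col) for col in zip(*src)]


# The "configurable" part: the descriptor selects which engine handles the job.
ENGINES = {
    Workload.EMBEDDING_LOOKUP: embedding_lookup,
    Workload.DATA_TRANSFORM: data_transform,
}


def submit(desc):
    """Dispatch a descriptor to the engine registered for its workload type."""
    return ENGINES[desc.workload](desc.src, desc.args)
```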
-
2.
Publication No.: US10789176B2
Publication Date: 2020-09-29
Application No.: US16059147
Filing Date: 2018-08-09
Applicant: Intel Corporation
Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai, Cristian Florin Dumitrescu, Xiangyang Guo
IPC: G06F12/123, G06F12/126, G06F12/128, G06F12/0864, G06F12/0891, G06F9/30, G06F12/0871
Abstract: Technologies for least recently used (LRU) cache replacement include a computing device with a processor with vector instruction support. The computing device retrieves a bucket of an associative cache from memory that includes multiple entries arranged from front to back. The bucket may be a 256-bit array including eight 32-bit entries. For lookups, a matching entry is located at a position in the bucket. The computing device executes a vector permutation processor instruction that moves the matching entry to the front of the bucket while preserving the order of other entries of the bucket. For insertion, an inserted entry is written at the back of the bucket. The computing device executes a vector permutation processor instruction that moves the inserted entry to the front of the bucket while preserving the order of other entries. The permuted bucket is stored to the memory. Other embodiments are described and claimed.
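The move-to-front step can be modeled in software by treating the 256-bit bucket as eight 32-bit lanes and the permutation as a gather by an index vector. In this sketch, index 0 is assumed to be the front (most recently used) and index 7 the back; the helper names are hypothetical. On x86 with AVX2, a lane gather of this kind is what `_mm256_permutevar8x32_epi32` performs, though the code below only emulates it with Python lists.

```python
BUCKET_SIZE = 8  # eight 32-bit entries in a 256-bit bucket


def permute(vec, idx):
    """Emulate a vector permute: output lane i takes vec[idx[i]]."""
    return [vec[i] for i in idx]


def move_to_front_indices(pos):
    """Index vector moving lane `pos` to lane 0 and shifting lanes 0..pos-1 back by one.

    Lanes after `pos` keep their positions, so the relative order of all
    other entries is preserved.
    """
    return [pos] + list(range(pos)) + list(range(pos + 1, BUCKET_SIZE))


def lookup(bucket, key):
    """Return (hit, new_bucket); on a hit the matching entry moves to the front."""
    if key not in bucket:
        return False, bucket
    pos = bucket.index(key)
    return True, permute(bucket, move_to_front_indices(pos))


def insert(bucket, key):
    """Write `key` over the back (LRU) entry, then permute it to the front."""
    bucket = bucket[:BUCKET_SIZE - 1] + [key]
    return permute(bucket, move_to_front_indices(BUCKET_SIZE - 1))
```

Because the index vector depends only on the hit position, an implementation could precompute the eight possible vectors in a table, so a lookup costs one compare step plus one permute.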
-
3.
Publication No.: US10719442B2
Publication Date: 2020-07-21
Application No.: US16126907
Filing Date: 2018-09-10
Applicant: Intel Corporation
Inventors: Ren Wang, Raanan Sade, Yipeng Wang, Tsung-Yuan Tai, Sameh Gobriel
IPC: G06F12/00, G06F12/0811, G06F9/38, G06F16/18
Abstract: An apparatus and method for prioritizing transactional memory regions. For example, one embodiment of a processor comprises: a plurality of cores to execute threads comprising sequences of instructions, at least some of the instructions specifying a transactional memory region; a cache of each core to store a plurality of cache lines; transactional memory circuitry of each core to manage execution of the transactional memory (TM) regions based on priorities associated with each of the TM regions; and wherein the transactional memory circuitry, upon detecting a conflict between a first TM region having a first priority value and a second TM region having a second priority value, is to determine which of the first TM region or the second TM region is permitted to continue executing and which is to be aborted based, at least in part, on the first and second priority values.
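The priority-based arbitration can be sketched as a pure function over two conflicting regions. The `TMRegion` type and the tie-breaking rule (the second, requesting region aborts on equal priority) are assumptions made for illustration; the abstract only states that the decision is based at least in part on the two priority values.

```python
from dataclasses import dataclass


@dataclass
class TMRegion:
    """Hypothetical record for a transactional memory region."""
    name: str
    priority: int  # assumption: a larger value means higher priority


def resolve_conflict(first, second):
    """Return (continues, aborts) for two conflicting TM regions.

    The higher-priority region keeps executing. On a tie, the second
    (requesting) region is aborted, an assumed requester-loses policy.
    """
    if second.priority > first.priority:
        return second, first
    return first, second
```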
-
4.
Publication No.: US20190042471A1
Publication Date: 2019-02-07
Application No.: US16059147
Filing Date: 2018-08-09
Applicant: Intel Corporation
Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai, Cristian Florin Dumitrescu, Xiangyang Guo
IPC: G06F12/123, G06F12/128, G06F12/126, G06F12/0891, G06F12/0871, G06F12/0864, G06F9/30
Abstract: Technologies for least recently used (LRU) cache replacement include a computing device with a processor with vector instruction support. The computing device retrieves a bucket of an associative cache from memory that includes multiple entries arranged from front to back. The bucket may be a 256-bit array including eight 32-bit entries. For lookups, a matching entry is located at a position in the bucket. The computing device executes a vector permutation processor instruction that moves the matching entry to the front of the bucket while preserving the order of other entries of the bucket. For insertion, an inserted entry is written at the back of the bucket. The computing device executes a vector permutation processor instruction that moves the inserted entry to the front of the bucket while preserving the order of other entries. The permuted bucket is stored to the memory. Other embodiments are described and claimed.
-
5.
Publication No.: US12293231B2
Publication Date: 2025-05-06
Application No.: US17471889
Filing Date: 2021-09-10
Applicant: Intel Corporation
Inventors: Chenmin Sun, Yipeng Wang, Rahul R. Shah, Ren Wang, Sameh Gobriel, Hongjun Ni, Mrittika Ganguli, Edwin Verplanke
IPC: G06F9/50, H04L47/11, H04L47/12, H04L47/125, H04L12/70
Abstract: Examples described herein include a device interface; a first set of one or more processing units; and a second set of one or more processing units. In some examples, the first set of one or more processing units are to perform heavy flow detection for packets of a flow and the second set of one or more processing units are to perform processing of packets of a heavy flow. In some examples, the first set of one or more processing units and second set of one or more processing units are different. In some examples, the first set of one or more processing units is to allocate pointers to packets associated with the heavy flow to a first set of one or more queues of a load balancer and the load balancer is to allocate the packets associated with the heavy flow to one or more processing units of the second set of one or more processing units based, at least in part, on a packet receive rate of the packets associated with the heavy flow.
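A minimal sketch of the two-stage split, assuming a simple per-window packet-count threshold for heavy flow detection and a linear mapping from receive rate to worker count. The function names, the threshold policy, and `per_worker_capacity` are hypothetical and not taken from the patent. Packet pointers (here, plain identifiers) are spread round-robin across the selected queues.

```python
import math
from collections import Counter


def detect_heavy_flows(flow_ids, threshold):
    """First stage: flag flows whose packet count in this window reaches `threshold`."""
    counts = Counter(flow_ids)
    return {flow for flow, count in counts.items() if count >= threshold}


def workers_for_rate(pkt_rate, per_worker_capacity, max_workers):
    """How many second-stage processing units to use, scaled by the flow's receive rate."""
    return max(1, min(max_workers, math.ceil(pkt_rate / per_worker_capacity)))


def distribute(packets, n_workers):
    """Load balancer: round-robin packet pointers across n_workers queues."""
    queues = [[] for _ in range(n_workers)]
    for i, pkt in enumerate(packets):
        queues[i % n_workers].append(pkt)
    return queues
```

Spreading one flow over several workers can reorder its packets; a real load balancer would need to restore order where the application requires it, which this sketch does not model.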
-
6.
Publication No.: US20200042479A1
Publication Date: 2020-02-06
Application No.: US16601137
Filing Date: 2019-10-14
Applicant: Intel Corporation
Inventors: Ren Wang, Yipeng Wang, Andrew Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs
IPC: G06F13/37, G06F12/0811, G06F13/16, G06F9/54, G06F12/0868
Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus includes multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate at which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.
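One common way to bound how fast producers submit requests is a credit scheme, sketched below as an assumption: each core holds a fixed number of credits, an enqueue consumes one, and a dequeue returns it to the producing core. The class and its parameters are hypothetical; the abstract does not specify the mechanism used by its resource management system.

```python
from collections import deque


class QueueManagerModel:
    """Software model of a queue manager with per-core submission credits (assumed scheme)."""

    def __init__(self, capacity, credits_per_core):
        self.queue = deque()
        self.capacity = capacity
        self.credits = dict(credits_per_core)

    def enqueue(self, core, item):
        """Accept a request only if the core has a credit and the queue has room.

        A rejected core is expected to back off and retry, instead of the
        manager silently dropping its request.
        """
        if self.credits.get(core, 0) == 0 or len(self.queue) >= self.capacity:
            return False
        self.credits[core] -= 1
        self.queue.append((core, item))
        return True

    def dequeue(self):
        """Pop the oldest request and return its credit to the producing core."""
        core, item = self.queue.popleft()
        self.credits[core] += 1
        return item
```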
-
7.
Publication No.: US10216668B2
Publication Date: 2019-02-26
Application No.: US15087154
Filing Date: 2016-03-31
Applicant: Intel Corporation
Inventors: Ren Wang, Yipeng Wang, Jr-Shian Tsai, Andrew Herdrich, Tsung-Yuan Tai, Niall McDonnell, Stephen Van Doren, David Sonnier, Debra Bernstein, Hugh Wilkinson, Narender Vangati, Stephen Miller, Gage Eads, Andrew Cunningham, Jonathan Kenny, Bruce Richardson, William Burroughs, Joseph Hasting, An Yan, James Clee, Te Ma, Jerry Pirog, Jamison Whitesell
IPC: G06F13/24, G06F13/36, G06F13/40, G06F12/1027
Abstract: Technologies for a distributed hardware queue manager include a compute device having a processor. The processor includes two or more hardware queue managers as well as two or more processor cores. Each processor core can enqueue or dequeue data from the hardware queue manager. Each hardware queue manager can be configured to contain several queue data structures. In some embodiments, the queues are addressed by the processor cores using virtual queue addresses, which are translated into physical queue addresses for accessing the corresponding hardware queue manager. The virtual queues can be moved from one physical queue in one hardware queue manager to a different physical queue in a different physical queue manager without changing the virtual address of the virtual queue.
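The virtual-to-physical indirection can be modeled with a translation table. In this sketch, each hardware queue manager is a list of physical queues, and `vq_table` (a hypothetical name) maps a virtual queue address to a `(manager, physical queue)` pair. Migration moves the pending items and updates only the table entry, so producers and consumers keep using the same virtual address.

```python
class DistributedQueueManagerModel:
    """Hypothetical model: virtual queue addresses map to (hqm_id, physical_queue) pairs."""

    def __init__(self, n_hqms, queues_per_hqm):
        # hqms[h][q] is physical queue q inside hardware queue manager h.
        self.hqms = [[[] for _ in range(queues_per_hqm)] for _ in range(n_hqms)]
        self.vq_table = {}

    def map(self, vq, hqm, pq):
        self.vq_table[vq] = (hqm, pq)

    def enqueue(self, vq, item):
        hqm, pq = self.vq_table[vq]  # translate virtual -> physical on every access
        self.hqms[hqm][pq].append(item)

    def dequeue(self, vq):
        hqm, pq = self.vq_table[vq]
        return self.hqms[hqm][pq].pop(0)

    def migrate(self, vq, new_hqm, new_pq):
        """Move pending items to another physical queue; the virtual address is unchanged."""
        old_hqm, old_pq = self.vq_table[vq]
        if (old_hqm, old_pq) == (new_hqm, new_pq):
            return  # nothing to move
        self.hqms[new_hqm][new_pq].extend(self.hqms[old_hqm][old_pq])
        self.hqms[old_hqm][old_pq].clear()
        self.vq_table[vq] = (new_hqm, new_pq)
```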
-
8.
Publication No.: US11811660B2
Publication Date: 2023-11-07
Application No.: US17396553
Filing Date: 2021-08-06
Applicant: Intel Corporation
Inventors: Ren Wang, Tsung-Yuan C. Tai, Yipeng Wang, Sameh Gobriel
IPC: H04L45/7453, H04L47/2441, H04L45/745, H04L61/10, H04L61/5046, H04L45/02
CPC classification number: H04L45/7453, H04L45/745, H04L47/2441, H04L61/10, H04L45/02, H04L61/5046
Abstract: Apparatus, methods, and systems for tuple space search-based flow classification using cuckoo hash tables and unmasked packet headers are described herein. A device can communicate with one or more hardware switches. The device can include memory to store hash table entries of a hash table. The device can include processing circuitry to perform a hash lookup in the hash table. The lookup can be based on an unmasked key included in a packet header corresponding to a received data packet. The processing circuitry can retrieve an index pointing to a sub-table, the sub-table including a set of rules for handling the data packet. Other embodiments are also described.
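Classic tuple space search probes one hash sub-table per distinct rule mask. The sketch below adds a first-level lookup keyed by the unmasked header that returns the index of the sub-table that matched previously, so repeat packets need a single sub-table probe. A Python `dict` stands in for the cuckoo hash tables named in the abstract; the class name, first-match semantics (no rule priorities), and clearing the whole index on rule insertion are simplifying assumptions.

```python
def masked(key, mask):
    """Apply a per-field bit mask to a header tuple of integers."""
    return tuple(k & m for k, m in zip(key, mask))


class TupleSpaceClassifier:
    """Hypothetical tuple space search classifier with an unmasked-key index."""

    def __init__(self):
        self.subtables = []    # list of (mask, {masked_key: action}), one per distinct mask
        self.index_cache = {}  # unmasked key -> sub-table index (dict stands in for a cuckoo hash)

    def add_rule(self, mask, match, action):
        self.index_cache.clear()  # cached indices may be stale after a rule change
        for m, rules in self.subtables:
            if m == mask:
                rules[masked(match, mask)] = action
                return
        self.subtables.append((mask, {masked(match, mask): action}))

    def classify(self, key):
        idx = self.index_cache.get(key)
        if idx is not None:
            mask, rules = self.subtables[idx]
            return rules.get(masked(key, mask))  # single sub-table probe on an index hit
        for i, (mask, rules) in enumerate(self.subtables):  # full tuple space search
            action = rules.get(masked(key, mask))
            if action is not None:
                self.index_cache[key] = i
                return action
        return None
```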
-
9.
Publication No.: US11698929B2
Publication Date: 2023-07-11
Application No.: US16207065
Filing Date: 2018-11-30
Applicant: Intel Corporation
Inventors: Ren Wang, Andrew J. Herdrich, Tsung-Yuan C. Tai, Yipeng Wang, Raghu Kondapalli, Alexander Bachmutsky, Yifan Yuan
IPC: G06F7/00, G06F16/901, G06F16/903, G06F16/906
CPC classification number: G06F16/9017, G06F16/906, G06F16/90335
Abstract: A central processing unit can offload table lookup or tree traversal to an offload engine. The offload engine can provide hardware accelerated operations such as instruction queueing, bit masking, hashing functions, data comparisons, a results queue, and a progress tracking. The offload engine can be associated with a last level cache. In the case of a hash table lookup, the offload engine can apply a hashing function to a key to generate a signature, apply a comparator to compare signatures against the generated signature, retrieve a key associated with the signature, and apply the comparator to compare the key against the retrieved key. Accordingly, a data pointer associated with the key can be provided in the result queue. Acceleration of operations in tree traversal and tuple search can also occur.
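The signature-then-key comparison can be sketched as a bucketed hash table in software. Here `zlib.crc32` is used as a stand-in hash: its low bits select the bucket and its upper 16 bits serve as the signature. The class name and the 16-bit signature width are assumptions, not details from the abstract.

```python
import zlib


class SignatureHashTable:
    """Hypothetical bucketed hash table storing (signature, key, data_ptr) entries."""

    def __init__(self, n_buckets=16):
        self.buckets = [[] for _ in range(n_buckets)]

    def _hash(self, key):
        h = zlib.crc32(key)  # 32-bit unsigned hash of the key bytes
        return h % len(self.buckets), h >> 16  # (bucket index, 16-bit signature)

    def insert(self, key, data_ptr):
        bucket, sig = self._hash(key)
        self.buckets[bucket].append((sig, key, data_ptr))

    def lookup(self, key):
        """Return the data pointer for `key`, or None if absent."""
        bucket, sig = self._hash(key)
        for entry_sig, entry_key, data_ptr in self.buckets[bucket]:
            if entry_sig != sig:  # cheap signature compare filters most entries
                continue
            if entry_key == key:  # full key compare confirms the match
                return data_ptr
        return None
```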
-
10.
Publication No.: US11201940B2
Publication Date: 2021-12-14
Application No.: US15862311
Filing Date: 2018-01-04
Applicant: Intel Corporation
Inventors: Yipeng Wang, Ren Wang, Antonio Fischetti, Sameh Gobriel, Tsung-Yuan C. Tai
Abstract: Technologies for flow rule aware exact match cache compression include multiple computing devices in communication over a network. A computing device reads a network packet from a network port and extracts one or more key fields from the packet to generate a lookup key. The key fields are identified by a key field specification of an exact match flow cache. The computing device may dynamically configure the key field specification based on an active flow rule set. The computing device may compress the key field specification to match a union of non-wildcard fields of the active flow rule set. The computing device may expand the key field specification in response to insertion of a new flow rule. The computing device looks up the lookup key in the exact match flow cache and, if a match is found, applies the corresponding action. Other embodiments are described and claimed.
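The key-compression idea can be sketched with rules represented as field-to-value maps, where `None` marks a wildcard. The field list, the rule format, and the policy of clearing the cache when the key field specification grows are assumptions for illustration; the abstract does not say how existing cache entries are handled on expansion.

```python
ALL_FIELDS = ("in_port", "eth_src", "eth_dst", "ip_src", "ip_dst", "ip_proto", "l4_src", "l4_dst")


def key_field_spec(rules):
    """Union of non-wildcard fields over the active rule set, in canonical field order."""
    used = set()
    for rule in rules:
        used.update(f for f, v in rule["match"].items() if v is not None)
    return tuple(f for f in ALL_FIELDS if f in used)


def make_lookup_key(packet, spec):
    """Extract only the fields named in the key field specification."""
    return tuple(packet[f] for f in spec)


class ExactMatchCache:
    """Hypothetical exact match flow cache keyed by a compressed lookup key."""

    def __init__(self, rules):
        self.rules = list(rules)
        self.spec = key_field_spec(self.rules)
        self.cache = {}

    def insert_rule(self, rule):
        self.rules.append(rule)
        new_spec = key_field_spec(self.rules)
        if new_spec != self.spec:  # spec expanded: old keys are no longer comparable
            self.spec = new_spec
            self.cache.clear()

    def lookup(self, packet):
        return self.cache.get(make_lookup_key(packet, self.spec))

    def learn(self, packet, action):
        self.cache[make_lookup_key(packet, self.spec)] = action
```

Two packets that differ only in fields every active rule wildcards share one cache entry, which is what lets the compressed key stay short without changing which action applies.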