Patent search ap:("INTEL CORPORATION") AND inv:"Yipeng Wang" Page 1

1.

Granted invention patent
Hardware offload circuitry (Status: In force)

Publication No.: US12197601B2

Publication date: 2025-01-14

Application No.: US17560193

Application date: 2021-12-22

Applicant: Intel Corporation

Inventor: Ren Wang , Sameh Gobriel , Somnath Paul , Yipeng Wang , Priya Autee , Abhirupa Layek , Shaman Narayana , Edwin Verplanke , Mrittika Ganguli , Jr-Shian Tsai , Anton Sorokin , Suvadeep Banerjee , Abhijit Davare , Desmond Kirkpatrick , Rajesh M. Sankaran , Jaykant B. Timbadiya , Sriram Kabisthalam Muthukumar , Narayan Ranganathan , Nalini Murari , Brinda Ganesh , Nilesh Jain

IPC: G06F15/78 , G06F9/50 , G06F21/62 , G06F21/72

Abstract: Examples described herein relate to offload circuitry comprising one or more compute engines that are configurable to perform a workload offloaded from a process executed by a processor based on a descriptor particular to the workload. In some examples, the offload circuitry is configurable to perform the workload, among multiple different workloads. In some examples, the multiple different workloads include one or more of: data transformation (DT) for data format conversion, Locality Sensitive Hashing (LSH) for neural network (NN), similarity search, sparse general matrix-matrix multiplication (SpGEMM) acceleration of hash based sparse matrix multiplication, data encode, data decode, or embedding lookup.

2.

Granted invention patent
Technologies for a least recently used cache replacement policy using vector instructions (Status: In force)

Publication No.: US10789176B2

Publication date: 2020-09-29

Application No.: US16059147

Application date: 2018-08-09

Applicant: Intel Corporation

Inventor: Ren Wang , Yipeng Wang , Tsung-Yuan Tai , Cristian Florin Dumitrescu , Xiangyang Guo

IPC: G06F12/123 , G06F12/126 , G06F12/128 , G06F12/0864 , G06F12/0891 , G06F9/30 , G06F12/0871

Abstract: Technologies for least recently used (LRU) cache replacement include a computing device with a processor with vector instruction support. The computing device retrieves a bucket of an associative cache from memory that includes multiple entries arranged from front to back. The bucket may be a 256-bit array including eight 32-bit entries. For lookups, a matching entry is located at a position in the bucket. The computing device executes a vector permutation processor instruction that moves the matching entry to the front of the bucket while preserving the order of other entries of the bucket. For insertion, an inserted entry is written at the back of the bucket. The computing device executes a vector permutation processor instruction that moves the inserted entry to the front of the bucket while preserving the order of other entries. The permuted bucket is stored to the memory. Other embodiments are described and claimed.
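The move-to-front behaviour described in this abstract can be sketched in plain Python. This is only an illustration under stated assumptions: a list stands in for the 256-bit bucket, `permute` stands in for a vector permutation instruction, and the function names are hypothetical rather than taken from the patent.

```python
BUCKET_SIZE = 8  # the abstract's example: eight 32-bit entries per bucket


def move_to_front_indices(pos, size=BUCKET_SIZE):
    """Build the index vector that brings slot `pos` to slot 0.

    All other slots keep their relative order.
    """
    return [pos] + [i for i in range(size) if i != pos]


def permute(bucket, indices):
    """Software stand-in for a vector permute: out[i] = bucket[indices[i]]."""
    return [bucket[i] for i in indices]


def lookup(bucket, key):
    """On a hit, move the matching entry to the front (most recently used)."""
    if key not in bucket:
        return bucket, False
    pos = bucket.index(key)
    return permute(bucket, move_to_front_indices(pos, len(bucket))), True


def insert(bucket, key):
    """Write the new entry at the back (evicting the LRU entry), then move it to the front."""
    bucket = bucket[:-1] + [key]
    last = len(bucket) - 1
    return permute(bucket, move_to_front_indices(last, len(bucket)))
```

Because the front of the bucket is always the most recently used entry, the back is always the eviction candidate, so no separate age counters are needed in this sketch.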

3.

Granted invention patent
Apparatus and method for prioritized quality of service processing for transactional memory (Status: In force)

Publication No.: US10719442B2

Publication date: 2020-07-21

Application No.: US16126907

Application date: 2018-09-10

Applicant: Intel Corporation

Inventor: Ren Wang , Raanan Sade , Yipeng Wang , Tsung-Yuan Tai , Sameh Gobriel

IPC: G06F12/00 , G06F12/0811 , G06F9/38 , G06F16/18

Abstract: An apparatus and method for prioritizing transactional memory regions. For example, one embodiment of a processor comprises: a plurality of cores to execute threads comprising sequences of instructions, at least some of the instructions specifying a transactional memory region; a cache of each core to store a plurality of cache lines; transactional memory circuitry of each core to manage execution of the transactional memory (TM) regions based on priorities associated with each of the TM regions; and wherein the transactional memory circuitry, upon detecting a conflict between a first TM region having a first priority value and a second TM region having a second priority value, is to determine which of the first TM region or the second TM region is permitted to continue executing and which is to be aborted based, at least in part, on the first and second priority values.
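The conflict rule in this abstract (decide which TM region continues and which aborts from their priority values) can be illustrated with a small Python sketch. The abstract does not say how priorities are encoded or how ties are broken, so both choices below are assumptions, and the names `TMRegion` and `resolve_conflict` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TMRegion:
    name: str
    priority: int  # assumption: a larger value means a higher priority


def resolve_conflict(first, second):
    """Return (continues, aborted) for two conflicting TM regions.

    Assumption: on equal priority the first region keeps running.
    """
    if second.priority > first.priority:
        return second, first
    return first, second
```

For example, a latency-critical region with priority 5 would win a conflict against a background region with priority 1, regardless of which one touched the cache line first.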

4.

Invention patent application
TECHNOLOGIES FOR A LEAST RECENTLY USED CACHE REPLACEMENT POLICY USING VECTOR INSTRUCTIONS (Status: Under examination, published)

Publication No.: US20190042471A1

Publication date: 2019-02-07

Application No.: US16059147

Application date: 2018-08-09

Applicant: Intel Corporation

Inventor: Ren Wang , Yipeng Wang , Tsung-Yuan Tai , Cristian Florin Dumitrescu , Xiangyang Guo

IPC: G06F12/123 , G06F12/128 , G06F12/126 , G06F12/0891 , G06F12/0871 , G06F12/0864 , G06F9/30

Abstract: Technologies for least recently used (LRU) cache replacement include a computing device with a processor with vector instruction support. The computing device retrieves a bucket of an associative cache from memory that includes multiple entries arranged from front to back. The bucket may be a 256-bit array including eight 32-bit entries. For lookups, a matching entry is located at a position in the bucket. The computing device executes a vector permutation processor instruction that moves the matching entry to the front of the bucket while preserving the order of other entries of the bucket. For insertion, an inserted entry is written at the back of the bucket. The computing device executes a vector permutation processor instruction that moves the inserted entry to the front of the bucket while preserving the order of other entries. The permuted bucket is stored to the memory. Other embodiments are described and claimed.

5.

Granted invention patent
Packet processing load balancer (Status: In force)

Publication No.: US12293231B2

Publication date: 2025-05-06

Application No.: US17471889

Application date: 2021-09-10

Applicant: Intel Corporation

Inventor: Chenmin Sun , Yipeng Wang , Rahul R. Shah , Ren Wang , Sameh Gobriel , Hongjun Ni , Mrittika Ganguli , Edwin Verplanke

IPC: G06F9/50 , H04L47/11 , H04L47/12 , H04L47/125 , H04L12/70

Abstract: Examples described herein include a device interface; a first set of one or more processing units; and a second set of one or more processing units. In some examples, the first set of one or more processing units are to perform heavy flow detection for packets of a flow and the second set of one or more processing units are to perform processing of packets of a heavy flow. In some examples, the first set of one or more processing units and second set of one or more processing units are different. In some examples, the first set of one or more processing units is to allocate pointers to packets associated with the heavy flow to a first set of one or more queues of a load balancer and the load balancer is to allocate the packets associated with the heavy flow to one or more processing units of the second set of one or more processing units based, at least in part, on a packet receive rate of the packets associated with the heavy flow.

6.

Invention patent application
MULTI-CORE COMMUNICATION ACCELERATION USING HARDWARE QUEUE DEVICE (Status: Under examination, published)

Publication No.: US20200042479A1

Publication date: 2020-02-06

Application No.: US16601137

Application date: 2019-10-14

Applicant: Intel Corporation

Inventor: Ren Wang , Yipeng Wang , Andrew Herdrich , Jr-Shian Tsai , Tsung-Yuan C. Tai , Niall D. McDonnell , Hugh Wilkinson , Bradley A. Burres , Bruce Richardson , Namakkal N. Venkatesan , Debra Bernstein , Edwin Verplanke , Stephen R. Van Doren , An Yan , Andrew Cunningham , David Sonnier , Gage Eads , James T. Clee , Jamison D. Whitesell , Jerry Pirog , Jonathan Kenny , Joseph R. Hasting , Narender Vangati , Stephen Miller , Te K. Ma , William Burroughs

IPC: G06F13/37 , G06F12/0811 , G06F13/16 , G06F9/54 , G06F12/0868

Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus includes multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate at which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.

7.

Granted invention patent
Technologies for a distributed hardware queue manager (Status: In force)

Publication No.: US10216668B2

Publication date: 2019-02-26

Application No.: US15087154

Application date: 2016-03-31

Applicant: Intel Corporation

Inventor: Ren Wang , Yipeng Wang , Jr-Shian Tsai , Andrew Herdrich , Tsung-Yuan Tai , Niall McDonnell , Stephen Van Doren , David Sonnier , Debra Bernstein , Hugh Wilkinson , Narender Vangati , Stephen Miller , Gage Eads , Andrew Cunningham , Jonathan Kenny , Bruce Richardson , William Burroughs , Joseph Hasting , An Yan , James Clee , Te Ma , Jerry Pirog , Jamison Whitesell

IPC: G06F13/24 , G06F13/36 , G06F13/40 , G06F12/1027

Abstract: Technologies for a distributed hardware queue manager include a compute device having a processor. The processor includes two or more hardware queue managers as well as two or more processor cores. Each processor core can enqueue or dequeue data from the hardware queue manager. Each hardware queue manager can be configured to contain several queue data structures. In some embodiments, the queues are addressed by the processor cores using virtual queue addresses, which are translated into physical queue addresses for accessing the corresponding hardware queue manager. The virtual queues can be moved from one physical queue in one hardware queue manager to a different physical queue in a different physical queue manager without changing the virtual address of the virtual queue.
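The virtual-to-physical queue translation described in this abstract can be modelled in a few lines of Python. This is a software sketch only: the class names are hypothetical, and carrying pending entries over to the new physical queue during migration is an assumption the abstract does not spell out.

```python
from collections import deque


class HardwareQueueManager:
    """Stand-in for one hardware queue manager holding several queues."""

    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]


class VirtualQueueTable:
    """Maps a virtual queue address to (manager id, physical queue id)."""

    def __init__(self, managers):
        self.managers = managers
        self.mapping = {}

    def bind(self, vaddr, hqm_id, pq_id):
        self.mapping[vaddr] = (hqm_id, pq_id)

    def _physical(self, vaddr):
        hqm_id, pq_id = self.mapping[vaddr]
        return self.managers[hqm_id].queues[pq_id]

    def enqueue(self, vaddr, item):
        self._physical(vaddr).append(item)

    def dequeue(self, vaddr):
        return self._physical(vaddr).popleft()

    def migrate(self, vaddr, new_hqm_id, new_pq_id):
        """Move a virtual queue to another physical queue.

        The virtual address stays the same; pending entries are carried
        over in order (an assumption made for this sketch).
        """
        old = self._physical(vaddr)
        new = self.managers[new_hqm_id].queues[new_pq_id]
        new.extend(old)
        old.clear()
        self.mapping[vaddr] = (new_hqm_id, new_pq_id)
```

Code that enqueues or dequeues through the virtual address does not need to change after `migrate`, which is the property the abstract highlights.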

8.

Granted invention patent
Flow classification apparatus, methods, and systems (Status: In force)

Publication No.: US11811660B2

Publication date: 2023-11-07

Application No.: US17396553

Application date: 2021-08-06

Applicant: Intel Corporation

Inventor: Ren Wang , Tsung-Yuan C. Tai , Yipeng Wang , Sameh Gobriel

IPC: H04L45/7453 , H04L47/2441 , H04L45/745 , H04L61/10 , H04L61/5046 , H04L45/02

CPC classification number: H04L45/7453 , H04L45/745 , H04L47/2441 , H04L61/10 , H04L45/02 , H04L61/5046

Abstract: Apparatus, methods, and systems for tuple space search-based flow classification using cuckoo hash tables and unmasked packet headers are described herein. A device can communicate with one or more hardware switches. The device can include memory to store hash table entries of a hash table. The device can include processing circuitry to perform a hash lookup in the hash table. The lookup can be based on an unmasked key included in a packet header corresponding to a received data packet. The processing circuitry can retrieve an index pointing to a sub-table, the sub-table including a set of rules for handling the data packet. Other embodiments are also described.

9.

Granted invention patent
Offload of data lookup operations (Status: In force)

Publication No.: US11698929B2

Publication date: 2023-07-11

Application No.: US16207065

Application date: 2018-11-30

Applicant: Intel Corporation

Inventor: Ren Wang , Andrew J. Herdrich , Tsung-Yuan C. Tai , Yipeng Wang , Raghu Kondapalli , Alexander Bachmutsky , Yifan Yuan

IPC: G06F7/00 , G06F16/901 , G06F16/903 , G06F16/906

CPC classification number: G06F16/9017 , G06F16/906 , G06F16/90335

Abstract: A central processing unit can offload table lookup or tree traversal to an offload engine. The offload engine can provide hardware accelerated operations such as instruction queueing, bit masking, hashing functions, data comparisons, a results queue, and progress tracking. The offload engine can be associated with a last level cache. In the case of a hash table lookup, the offload engine can apply a hashing function to a key to generate a signature, apply a comparator to compare signatures against the generated signature, retrieve a key associated with the signature, and apply the comparator to compare the key against the retrieved key. Accordingly, a data pointer associated with the key can be provided in the result queue. Acceleration of operations in tree traversal and tuple search can also occur.
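The two-stage comparison in this abstract (signature first, full key second) can be sketched in software as follows. The hash function, signature width, bucket count, and class name are all assumptions chosen for illustration; the abstract describes a hardware offload engine and does not specify them.

```python
import zlib

NUM_BUCKETS = 16  # assumption: small fixed table for illustration


def signature(key: bytes) -> int:
    """Assumption: a 16-bit signature taken from the low bits of CRC32."""
    return zlib.crc32(key) & 0xFFFF


def bucket_index(key: bytes) -> int:
    """Assumption: the bucket is chosen from the high bits of the same hash."""
    return (zlib.crc32(key) >> 16) % NUM_BUCKETS


class SignatureTable:
    def __init__(self):
        # Each bucket holds (signature, key, data_pointer) entries.
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key: bytes, data_pointer: int) -> None:
        entry = (signature(key), key, data_pointer)
        self.buckets[bucket_index(key)].append(entry)

    def lookup(self, key: bytes):
        sig = signature(key)
        for stored_sig, stored_key, ptr in self.buckets[bucket_index(key)]:
            if stored_sig != sig:
                continue  # cheap signature comparison filters most entries
            if stored_key == key:
                return ptr  # full key comparison confirms the match
        return None
```

The signature check exists so that the more expensive full-key comparison only runs for entries that are likely to match.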

10.

Granted invention patent
Technologies for flow rule aware exact match cache compression (Status: In force)

Publication No.: US11201940B2

Publication date: 2021-12-14

Application No.: US15862311

Application date: 2018-01-04

Applicant: Intel Corporation

Inventor: Yipeng Wang , Ren Wang , Antonio Fischetti , Sameh Gobriel , Tsung-Yuan C. Tai

IPC: H04L12/24 , H04L29/08 , H04L29/06 , H04L12/26 , G06F9/455

Abstract: Technologies for flow rule aware exact match cache compression include multiple computing devices in communication over a network. A computing device reads a network packet from a network port and extracts one or more key fields from the packet to generate a lookup key. The key fields are identified by a key field specification of an exact match flow cache. The computing device may dynamically configure the key field specification based on an active flow rule set. The computing device may compress the key field specification to match a union of non-wildcard fields of the active flow rule set. The computing device may expand the key field specification in response to insertion of a new flow rule. The computing device looks up the lookup key in the exact match flow cache and, if a match is found, applies the corresponding action. Other embodiments are described and claimed.
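The key field specification described in this abstract (the union of non-wildcard fields across the active rule set) can be sketched in Python. The field names, the rule layout, and the choice to flush cached entries when the specification expands are assumptions made for this illustration, not details from the patent.

```python
WILDCARD = None  # assumption: None marks a wildcarded field in a rule

# Hypothetical header fields, listed in a fixed order.
ALL_FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "proto")


def compress_spec(rules):
    """Return the union of non-wildcard fields used by the active rules."""
    used = set()
    for rule in rules:
        used.update(f for f, v in rule["match"].items() if v is not WILDCARD)
    return tuple(f for f in ALL_FIELDS if f in used)


def make_key(packet, spec):
    """Extract only the fields named by the specification."""
    return tuple(packet[f] for f in spec)


class ExactMatchCache:
    def __init__(self, rules):
        self.rules = list(rules)
        self.spec = compress_spec(self.rules)
        self.entries = {}

    def add_rule(self, rule):
        """Expand the specification if the new rule matches on new fields."""
        self.rules.append(rule)
        new_spec = compress_spec(self.rules)
        if new_spec != self.spec:
            self.spec = new_spec
            self.entries.clear()  # assumption: old keys no longer apply

    def learn(self, packet, action):
        self.entries[make_key(packet, self.spec)] = action

    def lookup(self, packet):
        return self.entries.get(make_key(packet, self.spec))
```

With a rule set that only matches on `dst_ip` and `proto`, packets that differ only in source port share one cache entry, which is where the compression comes from.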
