-
Publication No.: US12182018B2
Publication Date: 2024-12-31
Application No.: US17133615
Filing Date: 2020-12-23
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Adarsh Chauhan , Vinodh Gopal , Vedvyas Shanbhogue , Sreenivas Subramoney , Wajdi Feghali
IPC: G06F12/0811 , G06F9/38 , G06F12/0862 , G06F12/0895
Abstract: Methods and apparatus relating to instruction and/or micro-architecture support for on-core decompression are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro-operation and a second micro-operation. The first micro-operation causes one or more load operations to fetch data into one or more cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the one or more cachelines of the cache of the processor core in response to the second micro-operation. Other embodiments are also disclosed and claimed.
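The two-micro-operation split described in this abstract can be modeled in a few lines. This is a hypothetical software sketch, not Intel's implementation: the structure names, fields, and addresses are all invented for illustration.

```python
# Toy model of decoding one decompression instruction into two micro-ops:
# a load uop that fetches compressed data into cachelines, and a decompress
# uop that the Decompression Engine (DE) later consumes. Names are invented.

from dataclasses import dataclass

@dataclass
class MicroOp:
    kind: str        # "load" or "decompress"
    src: int         # source address / cacheline id
    dst: int         # destination cacheline / buffer

def decode_decompress(insn_src: int, insn_dst: int) -> list[MicroOp]:
    """Decode one decompression instruction into its two micro-operations."""
    load_uop = MicroOp(kind="load", src=insn_src, dst=insn_dst)
    de_uop = MicroOp(kind="decompress", src=insn_dst, dst=insn_dst)
    return [load_uop, de_uop]

uops = decode_decompress(insn_src=0x1000, insn_dst=0x2000)
print([u.kind for u in uops])  # ['load', 'decompress']
```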
-
Publication No.: US12028094B2
Publication Date: 2024-07-02
Application No.: US17133622
Filing Date: 2020-12-23
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Adarsh Chauhan , Vinodh Gopal , Vedvyas Shanbhogue , Sreenivas Subramoney , Wajdi Feghali
CPC classification number: H03M7/6029 , G06F9/3877 , G06F9/541
Abstract: Methods and apparatus relating to an Application Programming Interface (API) for fine-grained, low-latency decompression within a processor core are described. In an embodiment, a decompression API receives an input handle to a data object, where the data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which the data decompressed by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.
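The four-operand metadata layout in the abstract maps naturally onto a small API sketch. This is an assumed shape only — zlib stands in for the hardware DE, and every name here is illustrative rather than Intel's actual API.

```python
# Sketch of a decompression API matching the abstract's metadata layout:
# four operands describing the compressed source, its size, the output
# location, and the output size. zlib emulates the Decompression Engine.

import zlib
from dataclasses import dataclass

@dataclass
class Metadata:
    src: bytes       # operand 1: location of compressed data (here, the bytes)
    src_size: int    # operand 2: size of compressed data
    dst: bytearray   # operand 3: location for decompressed output
    dst_size: int    # operand 4: size of the decompressed data

def decompress_api(meta: Metadata) -> int:
    """Invoke the (software-emulated) DE; return bytes written to dst."""
    out = zlib.decompress(meta.src[:meta.src_size])
    n = min(len(out), meta.dst_size)
    meta.dst[:n] = out[:n]
    return n

payload = b"fine grained low latency decompression" * 4
comp = zlib.compress(payload)
meta = Metadata(src=comp, src_size=len(comp),
                dst=bytearray(len(payload)), dst_size=len(payload))
n = decompress_api(meta)
assert bytes(meta.dst[:n]) == payload
```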
-
Publication No.: US11874773B2
Publication Date: 2024-01-16
Application No.: US16729344
Filing Date: 2019-12-28
Applicant: Intel Corporation
Inventor: Rahul Bera , Anant Vithal Nori , Sreenivas Subramoney
IPC: G06F12/0862
CPC classification number: G06F12/0862 , G06F2212/602
Abstract: Systems, methods, and apparatuses relating to a dual spatial pattern prefetcher are described. In one embodiment, a prefetch circuit prefetches a cache line into a cache from a memory by: tracking page and cache line accesses to the cache for a single access signature; generating, for the cache line accesses of each of a plurality of pages, a spatial bit pattern that is shifted to the first cache line access of that page; generating a single spatial bit pattern for the single access signature from each group of spatial bit patterns that share the same pattern, to form a plurality of single spatial bit patterns; performing a logical OR operation on the plurality of single spatial bit patterns to create a first modulated bit pattern for the single access signature; performing a logical AND operation on the plurality of single spatial bit patterns to create a second modulated bit pattern for the single access signature; receiving a prefetch request for the single access signature; and performing a prefetch operation for the prefetch request using the first modulated bit pattern when a threshold is not exceeded and the second modulated bit pattern when the threshold is exceeded.
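The OR/AND dual-pattern idea above can be shown with a toy model: OR of the per-page patterns gives a coverage-biased pattern, AND gives an accuracy-biased one, and a threshold selects between them. The page width, patterns, and selection policy here are assumptions for illustration.

```python
# Toy dual spatial pattern model: per-page cache-line access bit patterns,
# rotated to the first access, are merged per signature with OR (coverage)
# and AND (accuracy). The caller picks which modulated pattern to use.

LINES_PER_PAGE = 8  # toy page of 8 cache lines

def shift_to_first(pattern: int) -> int:
    """Rotate pattern right so the first-accessed line sits at bit 0."""
    while pattern and not (pattern & 1):
        pattern >>= 1
    return pattern

def modulate(patterns: list[int]) -> tuple[int, int]:
    """Return (OR-modulated, AND-modulated) bit patterns."""
    or_pat, and_pat = 0, (1 << LINES_PER_PAGE) - 1
    for p in patterns:
        or_pat |= p
        and_pat &= p
    return or_pat, and_pat

def prefetch_lines(patterns: list[int], use_and: bool) -> list[int]:
    or_pat, and_pat = modulate([shift_to_first(p) for p in patterns])
    pat = and_pat if use_and else or_pat
    return [i for i in range(LINES_PER_PAGE) if pat & (1 << i)]

pages = [0b00101100, 0b00101010]
print(prefetch_lines(pages, use_and=False))  # OR / coverage: [0, 1, 2, 3, 4]
print(prefetch_lines(pages, use_and=True))   # AND / accuracy: [0]
```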
-
Publication No.: US11734174B2
Publication Date: 2023-08-22
Application No.: US16576687
Filing Date: 2019-09-19
Applicant: Intel Corporation
Inventor: Huichu Liu , Tanay Karnik , Tejpal Singh , Yen-Cheng Liu , Lavanya Subramanian , Mahesh Kumashikar , Sri Harsha Choday , Sreenivas Subramoney , Kaushik Vaidyanathan , Daniel H. Morris , Uygar E. Avci , Ian A. Young
IPC: G06F12/08 , G06F12/0804 , G06F12/0866 , G06F12/0806 , G06F11/20
CPC classification number: G06F12/0804 , G06F11/2089 , G06F12/0806 , G06F12/0866
Abstract: Described is a low-overhead method and apparatus to reconfigure a pair of buffered interconnect links to operate in one of three modes: a first mode (e.g., bandwidth mode), a second mode (e.g., latency mode), and a third mode (e.g., energy mode). In bandwidth mode, each link in the pair of buffered interconnect links carries a unique signal from source to destination. In latency mode, both links in the pair carry the same signal from source to destination, where one link in the pair is the "primary" and the other is the "assist". Temporal alignment of transitions in this pair of buffered interconnects reduces the effective capacitance of the primary, thereby reducing delay or latency. In energy mode, one link in the pair, the primary, alone carries a signal while the other link in the pair is idle. An idle neighbor on one side reduces the energy consumption of the primary.
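The three link-pair modes lend themselves to a tiny behavioral model. This sketch only captures what each link drives per mode — the capacitance and energy effects are physical and outside its scope; all names are invented.

```python
# Behavioral toy model of the three modes for a pair of buffered links:
# bandwidth (two independent signals), latency (assist mirrors primary so
# transitions align), energy (assist idles next to the primary).

from enum import Enum

class Mode(Enum):
    BANDWIDTH = 1
    LATENCY = 2
    ENERGY = 3

def drive_pair(mode: Mode, sig_a: int, sig_b: int):
    """Return the (primary, assist) values driven in one cycle."""
    if mode is Mode.BANDWIDTH:
        return sig_a, sig_b   # each link carries a unique signal
    if mode is Mode.LATENCY:
        return sig_a, sig_a   # assist mirrors primary: aligned transitions
    return sig_a, None        # ENERGY: assist link is idle

print(drive_pair(Mode.LATENCY, 1, 0))   # (1, 1)
print(drive_pair(Mode.ENERGY, 1, 0))    # (1, None)
```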
-
Publication No.: US20230205699A1
Publication Date: 2023-06-29
Application No.: US17561831
Filing Date: 2021-12-24
Applicant: Intel Corporation
Inventor: Swaraj Sha , Anant Vithal Nori , Sreenivas Subramoney , Stanislav Shwartsman , Pavel I. Kryukov , Lihu Rappoport
IPC: G06F12/0862 , G06F12/0811 , G06F12/0877 , G06F9/38
CPC classification number: G06F12/0862 , G06F12/0811 , G06F12/0877 , G06F9/3816
Abstract: An apparatus includes memory circuitry including a first data structure and prefetch circuitry that is coupled to the memory circuitry. The prefetch circuitry is to store, in the first data structure, a first subregion entry corresponding to a first subregion of a memory region allocated to a program. The first subregion entry is to include a plurality of delta values. A first delta value of the plurality of delta values represents a first distance between two cache lines associated with consecutive memory accesses within a second subregion of the memory region. The prefetch circuitry is further to detect a first memory access of a first cache line in the first subregion, identify prefetch candidates based on the first cache line and the plurality of delta values, and issue at least one prefetch request based on at least two of the prefetch candidates to be prefetched into a cache.
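The delta-value mechanism in this abstract — stored distances between consecutively accessed cache lines used to derive prefetch candidates — can be sketched directly. The sizes, the candidate chain, and the two-candidate issue policy below are assumptions for illustration.

```python
# Sketch of the delta-based subregion prefetcher: a subregion entry stores
# deltas between cache lines of consecutive accesses; on a new access,
# candidates are formed by walking the deltas forward from that line.

def prefetch_candidates(accessed_line: int, deltas: list[int]) -> list[int]:
    """Identify candidate cache lines from the accessed line plus deltas."""
    cands = []
    line = accessed_line
    for d in deltas:
        line += d
        cands.append(line)
    return cands

def issue_prefetches(accessed_line: int, deltas: list[int],
                     min_cands: int = 2) -> list[int]:
    cands = prefetch_candidates(accessed_line, deltas)
    # Issue only when at least two candidates exist, echoing the abstract's
    # "at least two of the prefetch candidates" wording.
    return cands if len(cands) >= min_cands else []

print(issue_prefetches(accessed_line=10, deltas=[2, 1, 4]))  # [12, 13, 17]
```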
-
Publication No.: US11645078B2
Publication Date: 2023-05-09
Application No.: US16729349
Filing Date: 2019-12-28
Applicant: Intel Corporation
Inventor: Adarsh Chauhan , Franck Sala , Jayesh Gaur , Zeev Sperber , Lihu Rappoport , Adi Yoaz , Sreenivas Subramoney
CPC classification number: G06F9/3806 , G06F9/30058 , G06F9/30145
Abstract: Systems, methods, and apparatuses relating to hardware for auto-predication of critical branches are described. In one embodiment, a processor core includes a decoder to decode instructions into decoded instructions, an execution unit to execute the decoded instructions, a branch predictor circuit to predict a future outcome of a branch instruction, and a branch predication manager circuit to disable use of the predicted future outcome for a conditional critical branch comprising the branch instruction.
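The predication manager's policy — suppress the predictor's outcome for branches classified as conditional critical branches — reduces to a small decision function. How criticality is detected is out of scope here; the set of critical branch addresses is given as an assumption.

```python
# Toy sketch of the auto-predication policy: for a conditional critical
# branch, the manager disables use of the predicted outcome (returning None
# to signal "predicate instead"); other branches use the prediction as usual.

def use_prediction(pc: int, predicted_taken: bool,
                   critical_pcs: set[int]):
    """Return the predicted outcome, or None to force predication."""
    if pc in critical_pcs:
        return None           # critical branch: suppress prediction, predicate
    return predicted_taken    # ordinary branch: trust the predictor

critical = {0x400AB0}
print(use_prediction(0x400AB0, True, critical))   # None -> predicate
print(use_prediction(0x400B20, True, critical))   # True -> use prediction
```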
-
Publication No.: US20230103206A1
Publication Date: 2023-03-30
Application No.: US17448806
Filing Date: 2021-09-24
Applicant: Intel Corporation
IPC: G06F9/38 , G06F9/30 , G06F12/0875
Abstract: In an embodiment, a processor may include an execution circuit to execute a plurality of instructions, a cache, and a decode circuit. The decode circuit may be to: detect a branch instruction in a program, the branch instruction to cause execution to follow either a first path or a second path in the program; and in response to a determination that the branch instruction is a hard to predict (HTP) branch, cause first and second sets of instructions to be stored in the cache, where the first set of instructions is included in the first path, and where the second set of instructions is included in the second path. Other embodiments are described and claimed.
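The decode-side behavior above — on a hard-to-predict (HTP) branch, cache instructions from both paths — can be mocked with a dictionary standing in for the cache. Structures and instruction names are invented for illustration.

```python
# Toy model of caching both paths of a hard-to-predict (HTP) branch so that
# either outcome can proceed from the cache; a normal branch caches only the
# single predicted path.

cache: dict[int, list[str]] = {}

def on_branch(pc: int, taken_path: list[str],
              fall_through: list[str], is_htp: bool) -> None:
    if is_htp:
        cache[pc] = taken_path + fall_through   # store both paths
    else:
        cache[pc] = taken_path                  # store one predicted path

on_branch(0x100, ["add", "mul"], ["sub", "ld"], is_htp=True)
on_branch(0x200, ["nop"], ["jmp"], is_htp=False)
print(cache[0x100])  # ['add', 'mul', 'sub', 'ld']
print(cache[0x200])  # ['nop']
```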
-
Publication No.: US20220382514A1
Publication Date: 2022-12-01
Application No.: US17832999
Filing Date: 2022-06-06
Applicant: Intel Corporation
Inventor: Kamlesh Pillai , Gurpreet Singh Kalsi , Sreedevi Ambika , Om Omer , Sreenivas Subramoney
IPC: G06F7/544
Abstract: Systems, apparatuses, and methods include technology that determines whether an operation is a floating-point based computation or an integer-based computation. When the operation is the floating-point based computation, the technology generates a map of the operation to integer-based compute engines to control the integer-based compute engines to execute the floating-point based computation. When the operation is the integer-based computation, the technology controls the integer-based compute engines to execute the integer-based computation.
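One way to make the mapping in this abstract concrete is a fixed-point decomposition: a floating-point multiply is rewritten as scaled integer multiplies that the integer engines can execute. The 16-bit scale and dispatch names below are assumptions; the patent's actual mapping is not specified here.

```python
# Illustrative dispatch: integer operations go straight to an integer
# compute engine; a floating-point multiply is mapped onto the same engine
# via fixed-point scaling (an assumed, simplified mapping).

SCALE = 1 << 16  # assumed fixed-point scale factor

def int_engine_mul(a: int, b: int) -> int:
    return a * b                      # the only primitive the engine provides

def execute(op: str, x: float, y: float):
    if op == "imul":                  # integer-based computation: direct
        return int_engine_mul(int(x), int(y))
    if op == "fmul":                  # FP computation mapped to integer engine
        xi, yi = round(x * SCALE), round(y * SCALE)
        return int_engine_mul(xi, yi) / (SCALE * SCALE)
    raise ValueError(op)

print(execute("imul", 6, 7))                  # 42
print(execute("fmul", 1.5, 2.25))             # 3.375
```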
-
Publication No.: US20220197794A1
Publication Date: 2022-06-23
Application No.: US17130698
Filing Date: 2020-12-22
Applicant: Intel Corporation
Inventor: Prathmesh Kallurkar , Anant Vithal Nori , Sreenivas Subramoney
IPC: G06F12/084 , G06F12/0811 , G06F9/50
Abstract: An embodiment of an integrated circuit may comprise: a core; a first level core cache memory coupled to the core; a shared core cache memory coupled to the core; a first cache controller coupled to the core and communicatively coupled to the first level core cache memory; a second cache controller coupled to the core and communicatively coupled to the shared core cache memory; and circuitry coupled to the core and communicatively coupled to the first and second cache controllers to determine whether a workload has a large code footprint and, if so determined, partition the N ways of the shared core cache memory into two chunks of ways — a first chunk of M ways reserved for code cache lines from the workload and a second chunk of N minus M ways reserved for data cache lines from the workload — where N and M are positive integer values and N minus M is greater than zero. Other embodiments are disclosed and claimed.
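The way-partitioning policy described above reduces to splitting N way indices into a code chunk of M ways and a data chunk of N − M ways. The concrete values and the placement helper below are assumptions for illustration.

```python
# Sketch of partitioning a shared cache's N ways for a large-code-footprint
# workload: M ways reserved for code cache lines, N - M ways for data lines,
# with 0 < M < N so both chunks are non-empty.

def partition_ways(n_ways: int, m_code_ways: int):
    """Split way indices into (code_ways, data_ways); requires 0 < M < N."""
    assert 0 < m_code_ways < n_ways
    ways = list(range(n_ways))
    return ways[:m_code_ways], ways[m_code_ways:]

def ways_for_line(kind: str, code_ways: list[int], data_ways: list[int]):
    """Pick the allowed ways for a cache line of the given kind."""
    return code_ways if kind == "code" else data_ways

code_ways, data_ways = partition_ways(n_ways=12, m_code_ways=4)
print(code_ways)   # [0, 1, 2, 3]
print(data_ways)   # [4, 5, 6, 7, 8, 9, 10, 11]
```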
-
Publication No.: US20220197659A1
Publication Date: 2022-06-23
Application No.: US17133622
Filing Date: 2020-12-23
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Adarsh Chauhan , Vinodh Gopal , Vedvyas Shanbhogue , Sreenivas Subramoney , Wajdi Feghali
Abstract: Methods and apparatus relating to an Application Programming Interface (API) for fine-grained, low-latency decompression within a processor core are described. In an embodiment, a decompression API receives an input handle to a data object, where the data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which the data decompressed by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.
-