-
公开(公告)号:US10423411B2
公开(公告)日:2019-09-24
申请号:US14866921
申请日:2015-09-26
Applicant: Intel Corporation
Inventor: Asit K. Mishra , Edward T. Grochowski , Jonathan D. Pearce , Deborah T. Marr , Ehud Cohen , Elmoustapha Ould-Ahmed-Vall , Jesus Corbal San Adrian , Robert Valentine , Mark J. Charney , Christopher J. Hughes , Milind B. Girkar
IPC: G06F9/30
Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.
-
2.
公开(公告)号:US20180173437A1
公开(公告)日:2018-06-21
申请号:US15384178
申请日:2016-12-19
Applicant: Intel Corporation
Inventor: Asit K. Mishra , Deborah T. Marr , Edward T. Grochowski
IPC: G06F3/06
CPC classification number: G06F3/0614 , G06F3/0646 , G06F3/0683 , G06F9/3001 , G06F9/30036 , G06F9/3877 , G06F2212/1016
Abstract: First elements of a dense vector to be multiplied with first elements of a first row of a sparse array may be determined. The determined first elements of the dense vector may be written into a memory. A dot product for the first elements of the sparse array and the first elements of the dense vector may be calculated in a plurality of increments by multiplying a subset of the first elements of the sparse array and a corresponding subset of the first elements of the dense vector. A sequence number may be updated after each increment is completed to identify a column number and/or a row number of the sparse array for which the dot product calculations have been completed.
-
公开(公告)号:US20180165381A1
公开(公告)日:2018-06-14
申请号:US15376172
申请日:2016-12-12
Applicant: Intel Corporation
Inventor: Ganesh Venkatesh , Nicholas P. Carter , Deborah T. Marr
IPC: G06F17/30
CPC classification number: G06F17/30982 , G06F9/30036
Abstract: A processor may include a gather-update-scatter accelerator, and circuitry to direct an instruction to the accelerator for execution. The instruction may include a search index, an operation to be performed, and a scalar data value. The accelerator may include a content-associative memory (CAM) storing multiple entries, each of which stores a respective index key and a data value associated with the index key. The accelerator may include a CAM controller, including circuitry to select, based on the information in the instruction, one of the plurality of entries in the CAM on which to operate, an arithmetic logic unit (ALU), including circuitry to perform an arithmetic or logical operation on the selected entry, the operation being dependent on the information in the instruction, and circuitry to store a result of the operation in the selected entry in the CAM.
-
公开(公告)号:US20170185415A1
公开(公告)日:2017-06-29
申请号:US14757609
申请日:2015-12-23
Applicant: Intel Corporation
Inventor: Asit K. Mishra , Kshitij A. Doshi , Elmoustapha Ould-Ahmed-Vall , Deborah T. Marr
CPC classification number: G06F9/3889 , G06F9/30 , G06F9/30032 , G06F9/30036 , G06F9/30043 , G06F9/30112 , G06F9/3861 , G06F9/3887
Abstract: A processor comprises a first register to store a plurality of data items at a plurality of positions within the first register, a second register, and an execution unit, operatively coupled to the first register and the second register, the execution unit comprising a logic circuit implementing a sort instruction for sorting the plurality of data items stored in the first register in an order of data item values, and storing, in the second register, a plurality of indices, wherein each index identifies a position associated with a data item stored in the first register prior to the sorting.
-
公开(公告)号:US10452551B2
公开(公告)日:2019-10-22
申请号:US15376242
申请日:2016-12-12
Applicant: Intel Corporation
Inventor: Ganesh Venkatesh , Christopher B. Wilkerson , Seth H. Pugsley , Deborah T. Marr
IPC: G06F12/08 , G06F9/30 , G06F12/0862
Abstract: A processor may include a programmable memory prefetcher that includes a programmable hardware prefetch engine and a prefetch engine control register. The programmable memory prefetcher may include circuitry and may be configured to receive, during execution of an application, a first instruction for configuring the prefetch engine for prefetching multiple cache lines to be accessed in the future, at predictable locations, by the application; to store, in the prefetch engine control register, dependent on information in the first instruction, data representing an amount of prefetching to be performed, and data representing a stride distance between consecutive cache lines to be prefetched; to receive a second instruction for prefetching a single cache line whose location is identified in the second instruction; and to initiate, in response to receiving the second instruction, prefetching of multiple cache lines by the prefetch engine, to be performed in parallel with execution of the application and in accordance with the data stored in the prefetch engine control register. The prefetch engine control register may store multiple entries, each including an identifier of a given operation to prefetch multiple cache lines. An instruction may also be received to disable prefetching of multiple cache lines. The multiple cache lines may be prefetched from a last-level cache (LLC) to a mid-level cache.
-
公开(公告)号:US20170090924A1
公开(公告)日:2017-03-30
申请号:US14866921
申请日:2015-09-26
Applicant: Intel Corporation
Inventor: Asit K. Mishra , Edward T. Grochowski , Jonathan D. Pearce , Deborah T. Marr , Ehud Cohen , Elmoustapha OuId-Ahmed-Vall , Jesus Corbal San Adrian , Robert Valentine , Mark J. Charney , Christopher J. Hughes , Milind B. Girkar
IPC: G06F9/30
Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.
-
公开(公告)号:US12135981B2
公开(公告)日:2024-11-05
申请号:US18207870
申请日:2023-06-09
Applicant: Intel Corporation
Inventor: Rajesh M. Sankaran , Gilbert Neiger , Narayan Ranganathan , Stephen R. Van Doren , Joseph Nuzman , Niall D. McDonnell , Michael A. O'Hanlon , Lokpraveen B. Mosur , Tracy Garrett Drysdale , Eriko Nurvitadhi , Asit K. Mishra , Ganesh Venkatesh , Deborah T. Marr , Nicholas P. Carter , Jonathan D. Pearce , Edward T. Grochowski , Richard J. Greco , Robert Valentine , Jesus Corbal , Thomas D. Fletcher , Dennis R. Bradford , Dwight P. Manley , Mark J. Charney , Jeffrey J. Cook , Paul Caprioli , Koichi Yamada , Kent D. Glossop , David B. Sheffield
Abstract: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
-
公开(公告)号:US11416281B2
公开(公告)日:2022-08-16
申请号:US16474978
申请日:2016-12-31
Applicant: Intel Corporation
Inventor: Rajesh M. Sankaran , Gilbert Neiger , Narayan Ranganathan , Stephen R. Van Doren , Joseph Nuzman , Niall D. McDonnell , Michael A. O'Hanlon , Lokpraveen B. Mosur , Tracy Garrett Drysdale , Eriko Nurvitadhi , Asit K. Mishra , Ganesh Venkatesh , Deborah T. Marr , Nicholas P. Carter , Jonathan D. Pearce , Edward T. Grochowski , Richard J. Greco , Robert Valentine , Jesus Corbal , Thomas D. Fletcher , Dennis R. Bradford , Dwight P. Manley , Mark J. Charney , Jeffrey J. Cook , Paul Caprioli , Koichi Yamada , Kent D. Glossop , David B. Sheffield
Abstract: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
-
公开(公告)号:US10740281B2
公开(公告)日:2020-08-11
申请号:US16103798
申请日:2018-08-14
Applicant: INTEL CORPORATION
Inventor: Varghese George , Sanjeev S. Jahagirdar , Deborah T. Marr
IPC: G06F15/80 , G06F1/3206 , G06F1/3293 , G06F9/50 , G06F1/3296 , G06F13/40
Abstract: A method is described that entails operating enabled cores of a multi-core processor such that both cores support respective software routines with a same instruction set, a first core being higher performance and consuming more power than a second core under a same set of applied supply voltage and operating frequency.
-
公开(公告)号:US10437562B2
公开(公告)日:2019-10-08
申请号:US15394968
申请日:2016-12-30
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Yu Wang , Deborah T. Marr
Abstract: An apparatus and method are described for designing an accelerator for processing sparse data. For example, one embodiment comprises a machine-readable medium having program code stored thereon which, when executed by a processor, causes the processor to perform the operations of: analyzing input graph program code and parameters associated with a target accelerator in view of an accelerator architecture template; responsively mapping the parameters onto the architecture template to implement customizations to the accelerator architecture template; and generating a hardware description representation of the target accelerator based on the determined mapping of the parameters to apply to the accelerator architecture template.
-
-
-
-
-
-
-
-
-