VECTOR MOVE INSTRUCTION CONTROLLED BY READ AND WRITE MASKS
    Invention patent application; status: pending (published)

    Publication No.: WO2014051733A3

    Publication date: 2014-08-21

    Application No.: PCT/US2013045429

    Filing date: 2013-06-12

    Applicant: INTEL CORP

    CPC classification number: G06F15/8084 G06F9/3885

    Abstract: A processor executes a vector move instruction to move data elements from a second vector register to a first vector register under the control of a first mask register and a second mask register. A register file within the processor includes the first vector register, the second vector register, the first mask register and the second mask register. In response to the vector move instruction, execution circuitry in the processor replaces a given number of target data elements in the first vector register with the same number of source data elements from the second vector register. Each source data element corresponds to a mask bit in the second mask register having a second bit value, and each target data element corresponds to a mask bit in the first mask register having a first bit value.
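
A minimal Python model can make the read/write mask semantics of this vector move concrete. It is a sketch under stated assumptions, not Intel's definition: the abstract does not say how the "given number" is chosen or in which order eligible elements are paired, so this model assumes the count defaults to the smaller of the two eligible-lane counts and pairs eligible lanes in ascending order. The function name `masked_vector_move` and its parameters are hypothetical, and registers are modeled as Python lists.

```python
from typing import List, Optional


def masked_vector_move(dst: List[int], src: List[int],
                       k_dst: List[int], k_src: List[int],
                       first_bit: int, second_bit: int,
                       count: Optional[int] = None) -> List[int]:
    """Return a copy of dst with eligible target lanes replaced by eligible source lanes.

    Target lanes: positions where the first mask (k_dst) holds first_bit.
    Source lanes: positions where the second mask (k_src) holds second_bit.
    Assumption (not from the abstract): lanes are paired in ascending order,
    and count defaults to the smaller number of eligible lanes.
    """
    if not (len(dst) == len(src) == len(k_dst) == len(k_src)):
        raise ValueError("registers and masks must have the same length")
    targets = [i for i, b in enumerate(k_dst) if b == first_bit]
    sources = [j for j, b in enumerate(k_src) if b == second_bit]
    limit = min(len(targets), len(sources))
    n = limit if count is None else count
    if not 0 <= n <= limit:
        raise ValueError("count exceeds the number of eligible lanes")
    out = list(dst)
    for t, s in zip(targets[:n], sources[:n]):
        out[t] = src[s]  # replace the target element with the source element
    return out


print(masked_vector_move([10, 11, 12, 13], [20, 21, 22, 23],
                         [1, 0, 1, 0], [0, 0, 1, 0],
                         first_bit=1, second_bit=0))  # [20, 11, 21, 13]
```

Here lanes 0 and 2 of the destination are eligible targets and lanes 0, 1 and 3 of the source are eligible sources, so two elements move. Passing `count=1` would move only the first pair.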

    SYSTEM AND METHOD FOR NON-UNIFORM CACHE IN A MULTI-CORE PROCESSOR
    Invention patent application; status: pending (published)

    Publication No.: WO2006072061A3

    Publication date: 2007-01-18

    Application No.: PCT/US2005047592

    Filing date: 2005-12-27

    Abstract: A system and method for the design and operation of a distributed shared cache in a multi-core processor is disclosed. In one embodiment, the shared cache may be distributed among multiple cache molecules. Each of the cache molecules may be closest, in terms of access latency, to one of the processor cores. In one embodiment, a cache line brought in from memory may initially be placed into a cache molecule that is not closest to the requesting processor core. When the requesting processor core makes repeated accesses to that cache line, the line may be moved either between cache molecules or within a cache molecule. Because cache lines can move within the cache, in various embodiments special search methods may be used to locate a particular cache line.
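
The Python sketch below illustrates the placement-then-migration idea from the abstract in a deliberately simplified form. It assumes a linear arrangement in which molecule i is closest to core i (distance |i - core|), places a missed line in the molecule farthest from the requester, and moves a line one molecule closer after a hypothetical threshold of repeated accesses (`migrate_after`). Lookup searches molecules in order of increasing distance from the requesting core, standing in for the "special search methods" the abstract mentions; movement within a molecule is not modeled. `NucaCache` and all of its members are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NucaCache:
    """Toy model: molecule i is closest to core i; distance is |i - core|."""
    num_cores: int
    migrate_after: int = 2  # hypothetical threshold of repeated accesses
    molecules: List[Dict[int, int]] = field(init=False)  # per molecule: address -> access count

    def __post_init__(self) -> None:
        if self.num_cores < 2:
            raise ValueError("need at least two cores/molecules")
        self.molecules = [{} for _ in range(self.num_cores)]

    def locate(self, addr: int, core: int) -> Optional[int]:
        # Search molecules nearest-first, since lines may have migrated.
        for m in sorted(range(self.num_cores), key=lambda x: abs(x - core)):
            if addr in self.molecules[m]:
                return m
        return None

    def access(self, addr: int, core: int) -> int:
        """Access addr from core; return the molecule holding the line afterwards."""
        m = self.locate(addr, core)
        if m is None:
            # Miss: place the line in the molecule farthest from the requester.
            m = max(range(self.num_cores), key=lambda x: abs(x - core))
            self.molecules[m][addr] = 0
        self.molecules[m][addr] += 1
        if m != core and self.molecules[m][addr] >= self.migrate_after:
            # Repeated accesses: move the line one molecule closer and reset its count.
            del self.molecules[m][addr]
            m += 1 if core > m else -1
            self.molecules[m][addr] = 0
        return m


cache = NucaCache(num_cores=4)
print([cache.access(0x40, core=0) for _ in range(6)])  # [3, 2, 2, 1, 1, 0]
```

With the default threshold, a line requested repeatedly by core 0 starts in molecule 3 and drifts one molecule closer every second access until it reaches molecule 0, where it stays.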

    SYSTEM AND METHOD FOR CACHE COHERENCY IN A CACHE WITH DIFFERENT CACHE LOCATION LENGTHS
    Invention patent application; status: pending (published)

    Publication No.: WO2006072064A3

    Publication date: 2006-09-14

    Application No.: PCT/US2005047595

    Filing date: 2005-12-27

    CPC classification number: G06F12/04 G06F12/0817 G06F12/0886 Y10S707/99952

    Abstract: A system and method for the design and operation of a cache system with differing cache location lengths in level one caches is disclosed. In one embodiment, each level one cache may include groups of cache locations of differing length, capable of holding portions of a level two cache line. A state tree may be created from data in a sharing vector. When a request arrives from a level one cache, the level two cache may examine the nodes of the state tree to determine whether the node of the state tree corresponding to the incoming request is already active. The results of this determination may be used to inhibit or permit the concurrent processing of the request.
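
As an illustration of the concurrency check only, the sketch below assumes an L2 line divided into a power-of-two number of sectors and a binary state tree stored in heap order (root = 1, children of node n are 2n and 2n + 1). A level-one request for an aligned, power-of-two span of sectors maps to exactly one node, and the request is permitted only if no active node overlaps it (the node itself, an ancestor, or a descendant); otherwise it is inhibited. The class and method names are hypothetical, and building the tree from the sharing vector is not modeled.

```python
from typing import Set


class StateTree:
    """Binary state tree over one L2 line, stored in heap order (root = 1)."""

    def __init__(self, sectors: int) -> None:
        if sectors <= 0 or sectors & (sectors - 1):
            raise ValueError("sectors must be a power of two")
        self.sectors = sectors
        self.active: Set[int] = set()  # nodes with a request in flight

    def node_for(self, offset: int, length: int) -> int:
        """Map an aligned, power-of-two span of sectors to its tree node."""
        if (length <= 0 or length & (length - 1) or offset < 0
                or offset % length or offset + length > self.sectors):
            raise ValueError("span must be aligned and a power of two")
        level = (self.sectors // length).bit_length() - 1  # 0 = whole line
        return (1 << level) + offset // length

    @staticmethod
    def _overlaps(a: int, b: int) -> bool:
        # Two nodes overlap iff one is an ancestor of (or equal to) the other.
        da, db = a.bit_length(), b.bit_length()
        if da > db:
            a >>= da - db
        else:
            b >>= db - da
        return a == b

    def try_begin(self, offset: int, length: int) -> bool:
        """Permit the request (marking its node active) only if nothing overlapping is active."""
        node = self.node_for(offset, length)
        if any(self._overlaps(node, other) for other in self.active):
            return False  # inhibit: must wait for the overlapping request
        self.active.add(node)
        return True

    def finish(self, offset: int, length: int) -> None:
        self.active.discard(self.node_for(offset, length))


tree = StateTree(sectors=4)
print(tree.try_begin(0, 1), tree.try_begin(2, 2), tree.try_begin(0, 2))  # True True False
```

The first two requests touch disjoint halves of the line and proceed concurrently; the third covers sector 0, which is already active, so it is held back until `finish(0, 1)` is called.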

    Read and write masks update instruction for vectorization of recursive computations over independent data

    Publication No.: GB2583415A

    Publication date: 2020-10-28

    Application No.: GB202007409

    Filing date: 2013-06-12

    Applicant: INTEL CORP

    Abstract: A processor executes a mask update instruction to perform updates to a first mask register K2 and a second mask register K1. In response to the mask update instruction, execution circuitry inverts a given number of mask bits in the first mask register from a first bit value (e.g. 1) indicating valid data to a second bit value (e.g. 0) indicating an available slot, and inverts the given number of mask bits in the second mask register from the second bit value to the first bit value. The execution circuitry further moves (e.g. merge 320) the given number of elements (e.g. 3, shown with value C0) from a first vector register V2 to a second vector register V1 at the same relative positions as the inverted bits in the second mask register K1.
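
The mask update can be modeled in a few lines of Python. This is a sketch, not the patented circuit: it uses the example bit values from the abstract (1 = valid data, 0 = available slot), takes the "given number" to be the smaller of the number of valid bits in K2 and the number of available slots in K1, assumes the moved V2 elements are those whose K2 bits were inverted, and scans both masks in ascending lane order. These choices and the function name `mask_update` are assumptions.

```python
from typing import List, Tuple


def mask_update(v1: List[str], k1: List[int], v2: List[str], k2: List[int],
                valid: int = 1, empty: int = 0) -> Tuple[List[str], List[int], List[int]]:
    """Move elements from V2 into available slots of V1, updating K1 and K2.

    Assumptions: the 'given number' is min(valid bits in K2, empty bits in K1),
    and both masks are scanned in ascending lane order.
    """
    src = [i for i, b in enumerate(k2) if b == valid]  # lanes of V2 holding valid data
    dst = [i for i, b in enumerate(k1) if b == empty]  # available slots in V1
    n = min(len(src), len(dst))
    v1, k1, k2 = list(v1), list(k1), list(k2)
    for s, d in zip(src[:n], dst[:n]):
        k2[s] = empty  # invert K2 bit: valid -> available
        k1[d] = valid  # invert K1 bit: available -> valid
        v1[d] = v2[s]  # move the element into the newly valid V1 lane
    return v1, k1, k2


print(mask_update(["A0", "A1", "A2", "A3"], [1, 0, 0, 1],
                  ["B0", "B1", "B2", "B3"], [0, 1, 1, 1]))
# (['A0', 'B1', 'B2', 'A3'], [1, 1, 1, 1], [0, 0, 0, 1])
```

Because every bit cleared in K2 is matched by a bit set in K1, the total number of valid bits across the two masks (5 in this example) is unchanged by the update.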

    Methods, apparatuses, instructions and logic for providing vector address conflict detection functionality

    Publication No.: DE112013005416T5

    Publication date: 2015-07-30

    Application No.: DE112013005416

    Filing date: 2013-06-30

    Applicant: INTEL CORP

    Abstract: Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register having a variable number of data fields, each of which stores an offset for a data element in memory. A destination register has corresponding data fields, each of which stores a variable second number of bits holding a conflict mask with one mask bit for each offset. In response to decoding a vector conflict instruction, execution units compare the offset in each data field with every lower-order data field to determine whether they hold a matching offset, and, in the corresponding conflict masks in the destination register, set every mask bit that corresponds to a lower-order data field with a matching offset. Vector address conflict detection can be used with variable-sized elements and to generate conflict masks that resolve dependencies in gather-modify-scatter SIMD operations.
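
The comparison rule in this abstract can be sketched in Python: lane i's conflict mask has bit j set for every lower-order lane j < i that holds the same offset (similar in spirit to the AVX-512 `VPCONFLICTD` instruction). The histogram helper illustrates, under our own assumptions, how such masks can resolve duplicate indices in a gather-modify-scatter loop: only the last lane for each index scatters, and it adds one plus the number of earlier matching lanes. Both function names are hypothetical, and the loops run sequentially as a model of what the hardware does in parallel.

```python
from typing import List


def vector_conflict(offsets: List[int]) -> List[int]:
    """Conflict mask per lane: bit j set if a lower lane j < i holds the same offset."""
    masks = []
    for i, off in enumerate(offsets):
        m = 0
        for j in range(i):  # compare only against lower-order lanes
            if offsets[j] == off:
                m |= 1 << j
        masks.append(m)
    return masks


def histogram_update(hist: List[int], idx: List[int]) -> List[int]:
    """Illustrative gather-modify-scatter: add 1 to hist[idx[i]] for every lane."""
    masks = vector_conflict(idx)
    n = len(idx)
    for i in range(n):
        # Lane i scatters only if no higher lane reports a conflict with it.
        is_last = all(not (masks[k] >> i) & 1 for k in range(i + 1, n))
        if is_last:
            # Its increment covers itself plus every earlier duplicate lane.
            hist[idx[i]] += bin(masks[i]).count("1") + 1
    return hist


print(vector_conflict([5, 7, 5, 5]))            # [0, 0, 1, 5]
print(histogram_update([0] * 8, [5, 7, 5, 5]))  # [0, 0, 0, 0, 0, 3, 0, 1]
```

Lane 3 sees matches in lanes 0 and 2 (mask 0b101 = 5), so it alone writes offset 5 with an increment of 3, and no two lanes ever scatter to the same address.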