APPARATUS AND METHODS TO REDUCE CASTOUTS IN A MULTI-LEVEL CACHE HIERARCHY
    1.
    Invention application
    APPARATUS AND METHODS TO REDUCE CASTOUTS IN A MULTI-LEVEL CACHE HIERARCHY (pending, published)

    Publication number: WO2008095025A1

    Publication date: 2008-08-07

    Application number: PCT/US2008/052507

    Filing date: 2008-01-30

    Abstract: Techniques and methods reduce allocations, in a higher-level cache, of cache lines displaced from a lower-level cache. When a displaced line is determined to have already been allocated at a higher level, its allocation in the next-level cache is prevented, thus reducing castouts. To this end, a line is selected for displacement in a lower-level cache, and information associated with the selected line is identified which indicates that the line is already present in a higher-level cache. Based on that information, allocation of the selected line in the higher-level cache is prevented, saving the power that the allocation would otherwise consume.
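    A minimal sketch of the allocation-filtering idea described in the abstract, modeling caches as Python dictionaries. The `present_in_l2` bit and all names here are illustrative assumptions, not structures from the patent.

```python
class CacheLine:
    def __init__(self, tag, present_in_l2=False):
        self.tag = tag
        # Tracks whether this line was already allocated in the next level.
        self.present_in_l2 = present_in_l2

def evict_from_l1(victim, l2_cache):
    """Displace `victim` from L1; allocate into L2 only when needed."""
    if victim.present_in_l2:
        # Line is already resident in L2: skip the castout allocation
        # and the power that write would consume.
        return False
    l2_cache[victim.tag] = victim   # normal castout allocation
    return True

l2 = {}
assert evict_from_l1(CacheLine(0x40), l2) is True                        # allocated
assert evict_from_l1(CacheLine(0x80, present_in_l2=True), l2) is False   # filtered
```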


    SEGMENTED PIPELINE FLUSHING FOR MISPREDICTED BRANCHES
    2.
    Invention application
    SEGMENTED PIPELINE FLUSHING FOR MISPREDICTED BRANCHES (pending, published)

    Publication number: WO2008092045A1

    Publication date: 2008-07-31

    Application number: PCT/US2008/051966

    Filing date: 2008-01-24

    CPC classification number: G06F9/384 G06F9/3842 G06F9/3863 G06F9/3867

    Abstract: A processor pipeline is segmented into an upper portion, prior to instructions going out of program order, and one or more lower portions beyond the upper portion. The upper pipeline is flushed upon detecting that a branch instruction was mispredicted, minimizing the delay in fetching instructions from the correct branch target address. The lower pipelines may continue execution until the mispredicted branch instruction confirms, at which time all uncommitted instructions are flushed from the lower pipelines. Existing exception pipeline flushing mechanisms may be reused by adding a mispredicted-branch identifier, reducing the complexity and hardware cost of flushing the lower pipelines.
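    The two-phase flush above can be sketched as a toy model. The class layout, the `seq`/`flush_tag` fields, and the "keep instructions no younger than the branch" rule are assumptions made for illustration only.

```python
class SegmentedPipeline:
    def __init__(self):
        self.upper = []         # in-order front end (pre-reorder)
        self.lower = []         # out-of-order back end
        self.flush_tag = None   # identifier of the mispredicted branch

    def detect_misprediction(self, branch_id):
        # Phase 1: flush the upper pipeline immediately so fetch can
        # restart at the correct branch target without waiting.
        self.upper.clear()
        self.flush_tag = branch_id

    def confirm_branch(self, branch_id):
        # Phase 2: once the branch confirms, flush uncommitted
        # instructions younger than it from the lower pipeline.
        if self.flush_tag == branch_id:
            self.lower = [i for i in self.lower if i["seq"] <= branch_id]
            self.flush_tag = None

p = SegmentedPipeline()
p.upper = [{"seq": 5}, {"seq": 6}]
p.lower = [{"seq": 1}, {"seq": 2}, {"seq": 3}]
p.detect_misprediction(branch_id=2)
p.confirm_branch(branch_id=2)
assert p.upper == [] and p.lower == [{"seq": 1}, {"seq": 2}]
```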


METHOD AND APPARATUS FOR POWER REDUCTION UTILIZING HETEROGENEOUSLY-MULTI-PIPELINED PROCESSOR
    3.
    Invention application
    METHOD AND APPARATUS FOR POWER REDUCTION UTILIZING HETEROGENEOUSLY-MULTI-PIPELINED PROCESSOR (pending, published)

    Publication number: WO2006094196A2

    Publication date: 2006-09-08

    Application number: PCT/US2006/007607

    Filing date: 2006-03-03

    Abstract: A processor includes a common instruction decode front end, e.g. fetch and decode stages, and a heterogeneous set of processing pipelines. A lower-performance pipeline has fewer stages and may use lower-speed, lower-power circuitry. A higher-performance pipeline has more stages and uses faster circuitry. The pipelines share other processor resources, such as an instruction cache, a register file stack, a data cache, a memory interface, and other architected registers within the system. In the disclosed examples, the processor is controlled so that processes requiring higher performance run in the higher-performance pipeline, while those requiring lower performance use the lower-performance pipeline, in at least some instances while the higher-performance pipeline is effectively inactive or even shut off to minimize power consumption. The configuration of the processor at any given time, that is, which pipeline(s) are currently operating, may be controlled via several different techniques.
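    One possible control policy can be sketched as follows; the threshold-based routing rule and all names are illustrative assumptions, since the abstract only says the configuration "may be controlled via several different techniques".

```python
def dispatch(required_performance, threshold=5):
    """Route a process to a pipeline based on its performance need (toy policy)."""
    if required_performance >= threshold:
        # High-demand work: use the deeper pipeline with faster circuitry.
        return {"pipeline": "high-performance", "gate_fast_pipeline": False}
    # Low-demand work runs on the shorter, slower pipeline, so the
    # fast pipeline can be clock- or power-gated to save energy.
    return {"pipeline": "low-power", "gate_fast_pipeline": True}

assert dispatch(8)["pipeline"] == "high-performance"
assert dispatch(2)["gate_fast_pipeline"] is True
```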


    METHODS AND APPARATUS TO INSURE CORRECT PREDECODE
    4.
    Invention application
    METHODS AND APPARATUS TO INSURE CORRECT PREDECODE (pending, published)

    Publication number: WO2006091857A1

    Publication date: 2006-08-31

    Application number: PCT/US2006/006677

    Filing date: 2006-02-24

    CPC classification number: G06F9/30149 G06F8/447 G06F9/382

    Abstract: Techniques for ensuring synchronized predecoding of an instruction string are disclosed. The instruction string contains instructions from a variable-length instruction set together with embedded data. One technique defines a granule to be equal to the shortest instruction length in the instruction set, and defines MAX to be the number of granules that compose the longest instruction. When a program is compiled or assembled into the instruction string, the end of each embedded data segment is determined and a padding of length MAX - 1 granules is inserted at the end of the embedded data. Upon predecoding the padded instruction string, a predecoder maintains synchronization with the instructions even if the embedded data is coincidentally encoded to resemble an existing instruction in the variable-length instruction set.
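    The padding rule can be illustrated with a small sketch. The granule size, MAX value, pad byte, and byte-list representation are all assumptions chosen for the example, not values from the patent.

```python
GRANULE = 2        # assumed: 16-bit granules (shortest instruction length)
MAX_GRANULES = 2   # assumed: longest instruction = 2 granules (32 bits)

def pad_after_data(instruction_stream, data_end_index, pad_byte=0x00):
    """Insert (MAX - 1) granules of padding right after the embedded data,
    so the predecoder re-synchronizes before the next real instruction."""
    padding = [pad_byte] * ((MAX_GRANULES - 1) * GRANULE)
    return (instruction_stream[:data_end_index]
            + padding
            + instruction_stream[data_end_index:])

# Bytes 0xAA, 0xBB are embedded data ending at index 4.
stream = [0x10, 0x11, 0xAA, 0xBB, 0x20, 0x21]
assert pad_after_data(stream, data_end_index=4) == \
    [0x10, 0x11, 0xAA, 0xBB, 0x00, 0x00, 0x20, 0x21]
```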


    METHODS AND APPARATUS FOR EMULATING THE BRANCH PREDICTION BEHAVIOR OF AN EXPLICIT SUBROUTINE CALL
    6.
    Invention application
    METHODS AND APPARATUS FOR EMULATING THE BRANCH PREDICTION BEHAVIOR OF AN EXPLICIT SUBROUTINE CALL (pending, published)

    Publication number: WO2008028103A2

    Publication date: 2008-03-06

    Application number: PCT/US2007/077340

    Filing date: 2007-08-31

    Abstract: An apparatus for emulating the branch prediction behavior of an explicit subroutine call is disclosed. The apparatus includes a first input configured to receive an instruction address, and a second input configured to receive predecode information describing the instruction address as being related to an implicit subroutine call. In response to the predecode information, an adder adds a constant to the instruction address to define a return address, which is stored to an explicit subroutine resource, thus facilitating subsequent branch prediction of a return-call instruction.
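    In sketch form, the adder pushes a computed return address onto the return-address stack used for prediction. The constant value (8) and the list-as-stack model are assumptions for illustration.

```python
OFFSET = 8  # assumed constant: e.g. the implicit call spans two 4-byte instructions

def handle_fetch(addr, predecode_implicit_call, return_stack):
    """On an implicit subroutine call, emulate an explicit call by
    pushing addr + OFFSET onto the return-address prediction stack."""
    if predecode_implicit_call:
        return_stack.append(addr + OFFSET)

stack = []
handle_fetch(0x1000, True, stack)    # implicit call detected by predecode
assert stack == [0x1008]             # return address now available for prediction
```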


    TLB LOCK INDICATOR
    7.
    Invention application
    TLB LOCK INDICATOR (pending, published)

    Publication number: WO2007024937A1

    Publication date: 2007-03-01

    Application number: PCT/US2006/032902

    Filing date: 2006-08-22

    CPC classification number: G06F12/1027 G06F12/126 G06F2212/681

    Abstract: A processor includes a hierarchical Translation Lookaside Buffer (TLB) comprising a Level-1 TLB and a small, high-speed Level-0 TLB. Entries in the L0 TLB replicate entries in the L1 TLB. The processor first accesses the L0 TLB during an address translation, and accesses the L1 TLB if the virtual address misses in the L0 TLB. When the virtual address hits in the L1 TLB, the virtual address, physical address, and page attributes are written to the L0 TLB, replacing an existing entry if the L0 TLB is full. The entry may be locked against replacement in the L0 TLB in response to an L0 Lock (L0L) indicator in the L1 TLB entry. Similarly, in a hardware-managed L1 TLB, entries may be locked against replacement in response to an L1 Lock (L1L) indicator in the corresponding page table entry.
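    A toy model of the two-level lookup with the L0 lock bit, assuming dictionaries for both TLB levels; the class and field names are illustrative, not the patent's.

```python
class TinyTLB:
    """Toy TLB: maps virtual page -> (physical page, locked bit)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}

    def insert(self, vpage, ppage, locked=False):
        if len(self.entries) >= self.capacity and vpage not in self.entries:
            # Evict an unlocked entry; locked entries are never replaced.
            victim = next((v for v, (_, lk) in self.entries.items() if not lk), None)
            if victim is not None:
                del self.entries[victim]
            # If every entry is locked, this toy simply grows past capacity.
        self.entries[vpage] = (ppage, locked)

def translate(vpage, l0, l1):
    if vpage in l0.entries:
        return l0.entries[vpage][0]               # fast L0 hit
    if vpage in l1.entries:
        ppage, l0_lock = l1.entries[vpage]        # L0L bit rides in the L1 entry
        l0.insert(vpage, ppage, locked=l0_lock)   # fill L0, honoring the lock
        return ppage
    return None                                   # miss: page-table walk needed
```

Using the lock bit, a critical translation filled into a one-entry L0 survives later fills that would otherwise have evicted it.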


    MEMORY MANAGEMENT UNIT WITH PRE-FILLING CAPABILITY
    10.
    Invention application
    MEMORY MANAGEMENT UNIT WITH PRE-FILLING CAPABILITY (pending, published)

    Publication number: WO2012119148A1

    Publication date: 2012-09-07

    Application number: PCT/US2012/027739

    Filing date: 2012-03-05

    Abstract: Systems and methods for memory management units (MMUs) configured to automatically pre-fill a translation lookaside buffer (TLB) (206, 208) with address translation entries (202-204) expected to be used in the future, thereby reducing the TLB miss rate and improving performance. The TLB may be pre-filled with translation entries whose addresses are selected based on predictions. Predictions may be derived from external devices (214) or from stride values, where a stride value may be a predetermined constant or may be dynamically altered based on access patterns (216). Pre-filling the TLB effectively removes the latency of resolving TLB misses from the critical path.
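    The stride-based variant can be sketched as follows: on a translation for page p, the entry for p + stride is also filled, so a later sequential access avoids a miss on the critical path. The dictionary model and parameter names are assumptions for illustration.

```python
def access_with_prefill(vpage, tlb, page_table, stride=1):
    """Translate `vpage`, then speculatively pre-fill the predicted next page."""
    if vpage not in tlb:
        tlb[vpage] = page_table[vpage]      # demand fill (the miss path)
    nxt = vpage + stride                    # stride prediction of the next access
    if nxt in page_table and nxt not in tlb:
        tlb[nxt] = page_table[nxt]          # speculative pre-fill, off the miss path
    return tlb[vpage]

pt = {0: 100, 1: 101, 2: 102}
tlb = {}
assert access_with_prefill(0, tlb, pt) == 100
assert 1 in tlb   # page 1 was pre-filled before being touched
```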

