Method and device for reducing memory latency in software application
    1.
    Invention patent
    Legal status: Granted

    Publication number: JP2011090705A

    Publication date: 2011-05-06

    Application number: JP2010286087

    Filing date: 2010-12-22

    CPC classification number: G06F9/3851 G06F8/4442 G06F9/383 G06F9/4843 G06F9/52

    Abstract: PROBLEM TO BE SOLVED: To provide a method and a device for reducing a memory latency in a software application.
    SOLUTION: A performance analysis tool 208 profiles the resource usage of the software application 210 and identifies areas of the software application 210 that experience performance bottlenecks. Compiler-runtime instructions are generated in the software application to create and manage a helper thread. The helper thread prefetches data in the identified areas. A counting mechanism is inserted into the helper thread and another into the main thread, to help ensure that the prefetched data is not removed from the cache before the main thread is able to use it.
    COPYRIGHT: (C)2011,JPO&INPIT

    Mechanism for instruction set based on thread execution on plurality of instruction sequencers
    2.
    Invention patent
    Legal status: Granted

    Publication number: JP2011023032A

    Publication date: 2011-02-03

    Application number: JP2010204922

    Filing date: 2010-09-13

    CPC classification number: G06F9/3851 G06F9/4843

    Abstract: PROBLEM TO BE SOLVED: To provide a mechanism for scheduling user-level threads so that the user-level threads can be executed on a processor that is not directly managed by an OS.
    SOLUTION: User-level threads on a first instruction sequencer are managed in response to executing user-level instructions on a second instruction sequencer that is under the control of an application-level program. A first user-level thread runs on the second instruction sequencer and contains one or more user-level instructions. A first user-level instruction has at least one of (1) a field that references one or more instruction sequencers or (2) an implicit reference, through a pointer, to code that specifically addresses one or more instruction sequencers when that code is executed.
    COPYRIGHT: (C)2011,JPO&INPIT

    LOOP PARALLELIZATION BASED ON LOOP SPLITTING OR INDEX ARRAY
    3.
    Invention application
    Legal status: Under examination (published)

    Publication number: WO2012087988A3

    Publication date: 2012-09-27

    Application number: PCT/US2011065948

    Filing date: 2011-12-19

    CPC classification number: G06F8/456 G06F8/4441

    Abstract: Methods and apparatus to provide loop parallelization based on loop splitting and/or an index array are described. In one embodiment, one or more split loops, corresponding to an original loop, are generated based on mis-speculation information. In another embodiment, a plurality of subloops are generated from an original loop based on an index array. Other embodiments are also described.
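
To make the index-array embodiment concrete, here is a minimal sketch of one way subloops could be derived from an index array. The abstract does not say how the subloops are formed, so the grouping rule below (the k-th occurrence of an index value goes to subloop k) and the names `split_by_index_array` and `scatter_add` are assumptions chosen for illustration, not the patented method.

```python
from collections import defaultdict


def split_by_index_array(idx):
    """Group iteration numbers into subloops with no repeated idx value.

    Illustrative rule (an assumption): the k-th occurrence of an index
    value is placed in subloop k.
    """
    seen = defaultdict(int)          # occurrences of each index value so far
    subloops = []
    for i, target in enumerate(idx):
        k = seen[target]
        if k == len(subloops):       # first time any value occurs k+1 times
            subloops.append([])
        subloops[k].append(i)
        seen[target] += 1
    return subloops


def scatter_add(a, idx, b):
    """Same result as: for i in range(len(idx)): a[idx[i]] += b[i]."""
    out = list(a)
    for iterations in split_by_index_array(idx):
        # Iterations in one subloop write distinct elements of `out`,
        # so a parallel runtime could run them in any order.
        for i in iterations:
            out[idx[i]] += b[i]
    return out
```

Because the subloops run one after another and each contains at most one update per array element, the updates to any single element still happen in their original order.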

    METHODS AND APPARATUS FOR REDUCING MEMORY LATENCY IN A SOFTWARE APPLICATION
    5.
    Invention application
    Legal status: Under examination (published)

    Publication number: WO2005033926A3

    Publication date: 2005-12-29

    Application number: PCT/US2004032212

    Filing date: 2004-09-29

    Applicant: INTEL CORP

    CPC classification number: G06F9/3851 G06F8/4442 G06F9/383 G06F9/4843 G06F9/52

    Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or cache misses. A performance analysis tool profiles the software application's resource usage and identifies areas in the software application that experience performance bottlenecks. Compiler-runtime instructions are generated in the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas. A counting mechanism is inserted into the helper thread and another into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure that the prefetched data is not removed from the cache before the main thread is able to use it.
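
The coordination between the two counting mechanisms can be sketched as follows. This is a toy model under stated assumptions, not the disclosed implementation: a Python dictionary stands in for the hardware cache, and `run_with_helper` and its `max_lead` parameter are hypothetical names. The helper keeps a count of items it has prefetched, the main thread keeps a count of iterations it has completed, and the helper pauses whenever it gets more than `max_lead` items ahead.

```python
import threading


def run_with_helper(data, max_lead=4):
    """Sum `data` in the main thread while a helper thread runs ahead.

    `max_lead` (hypothetical) bounds how far the helper may run ahead,
    so "prefetched" items are consumed before they could be evicted.
    """
    cache = {}                       # stand-in for the hardware cache
    cond = threading.Condition()
    main_count = 0                   # iterations completed by the main thread
    helper_count = 0                 # items prefetched by the helper thread
    worst_lead = 0

    def helper():
        nonlocal helper_count, worst_lead
        for i, value in enumerate(data):
            with cond:
                # Throttle the helper using the two counters.
                cond.wait_for(lambda: helper_count - main_count < max_lead)
                cache[i] = value     # "prefetch" item i
                helper_count += 1
                worst_lead = max(worst_lead, helper_count - main_count)
                cond.notify_all()

    worker = threading.Thread(target=helper)
    worker.start()

    total = 0
    for i in range(len(data)):
        with cond:
            # Real hardware would take a cache miss here rather than wait;
            # waiting keeps the toy model deterministic.
            cond.wait_for(lambda: i in cache)
            total += cache.pop(i)    # consume the item, freeing its slot
            main_count += 1
            cond.notify_all()
    worker.join()
    return total, worst_lead
```

Calling `run_with_helper(list(range(100)), max_lead=4)` returns the sum together with the largest lead the helper ever had, which never exceeds `max_lead`.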

    SYSTEM, METHOD AND APPARATUS FOR DEPENDENCY CHAIN PROCESSING
    7.
    Invention application
    Legal status: Under examination (published)

    Publication number: WO2006036504A2

    Publication date: 2006-04-06

    Application number: PCT/US2005032118

    Filing date: 2005-09-12

    Applicant: INTEL CORP

    CPC classification number: G06F8/443 G06F8/433 G06F8/451

    Abstract: Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.
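
As a rough illustration of the mapping step, the sketch below groups the instructions of a trace into independent dependency chains and picks, for each chain, the narrowest cluster whose issue width accommodates it. The abstract does not define chain width or the mapping policy, so both are assumptions here: width is taken as the largest number of instructions at the same dependency depth, and the function names are hypothetical. The splitting of a wide chain into reduced-width chains is not modeled.

```python
def dependency_chains(deps):
    """Return connected groups of instructions as sorted lists.

    `deps` maps every instruction to the instructions it depends on;
    every producer must itself be a key of `deps`.
    """
    parent = {n: n for n in deps}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    for n, producers in deps.items():
        for p in producers:
            parent[find(n)] = find(p)

    groups = {}
    for n in deps:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def chain_width(chain, deps):
    """Largest number of instructions sharing one dependency depth."""
    depth = {}

    def level(n):
        if n not in depth:
            depth[n] = 1 + max((level(p) for p in deps[n]), default=0)
        return depth[n]

    per_level = {}
    for n in chain:
        per_level[level(n)] = per_level.get(level(n), 0) + 1
    return max(per_level.values())


def map_to_clusters(deps, cluster_widths):
    """Assign each chain the narrowest issue width that fits it."""
    mapping = {}
    for chain in dependency_chains(deps):
        width = chain_width(chain, deps)
        fits = [c for c in sorted(cluster_widths) if c >= width]
        # Fall back to the widest cluster if nothing is wide enough.
        mapping[tuple(chain)] = fits[0] if fits else max(cluster_widths)
    return mapping
```

For a trace such as `{"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": [], "f": ["e"]}` with cluster widths `[1, 2, 4]`, the diamond-shaped chain lands on the 2-wide cluster and the straight-line chain on the 1-wide cluster.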

    MULTI-ENTRY THREADING METHOD AND APPARATUS FOR AUTOMATIC AND DIRECTIVE-GUIDED PARALLELIZATION OF A SOURCE PROGRAM
    9.
    Invention application
    Legal status: Under examination (published)

    Publication number: WO0203194A2

    Publication date: 2002-01-10

    Application number: PCT/US0118614

    Filing date: 2001-06-08

    CPC classification number: G06F8/456 G06F8/443

    Abstract: A method and apparatus for compiling a source program are described. Multiple predetermined sequences within the source program are located. A start code is inserted in the source program prior to the first instruction of each predetermined sequence. An invocation code is inserted in the source program prior to the start code; the invocation code addresses the start code and transfers each sequence to a system for execution. Finally, a stop code is inserted in the source program after the last instruction of each sequence; the stop code signals the system to stop execution of the sequence.
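
The insertion order described in the abstract can be shown with a small source-to-source sketch. Everything concrete here is an assumption made for illustration: the program is modeled as a list of instruction strings, each located sequence as an inclusive `(first, last)` index pair, and the `INVOKE`, `START` and `STOP` markers are placeholder spellings rather than the codes a real compiler would emit.

```python
def instrument(program, sequences):
    """Insert invocation, start and stop codes around each sequence.

    `program` is a list of instruction strings; `sequences` is a list of
    non-overlapping (first, last) inclusive index pairs.
    """
    starts = {first: k for k, (first, _last) in enumerate(sequences)}
    stops = {last: k for k, (_first, last) in enumerate(sequences)}
    out = []
    for i, instr in enumerate(program):
        if i in starts:
            k = starts[i]
            out.append(f"INVOKE seq{k}")   # invocation code, placed before the start code
            out.append(f"START seq{k}")    # start code, before the first instruction
        out.append(instr)
        if i in stops:
            out.append(f"STOP seq{stops[i]}")  # stop code, after the last instruction
    return out
```

For example, `instrument(["x = 1", "loop_a", "loop_b", "y = 2"], [(1, 2)])` wraps the two loop instructions and leaves the surrounding instructions untouched.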
