COMPUTER-IMPLEMENTED METHOD AND PROCESSING UNIT FOR PREDICTING BRANCH TARGET ADDRESSES
    Type: Invention application
    Status: Under examination (published)

    Publication number: WO2007042482A2

    Publication date: 2007-04-19

    Application number: PCT/EP2006067155

    Filing date: 2006-10-06

    CPC classification number: G06F9/30058 G06F9/3806

    Abstract: Under the present invention, a branch target address corresponding to a target instruction to be pre-fetched is predicted based on two values. The first value is a "predictor value" that is known for the branch target address. The second value is the address of the branch instruction from which the target instruction is branched to within the program code. Once these two values are provided, they can be processed (e.g., hashed) to yield an index value, which is used to obtain a predicted branch target address from a cache. This technique is generally implemented for branch instructions such as switch statements or polymorphic calls. In the case of the former, the predictor value is a selector operand, while in the case of the latter the predictor value is a class object address (in JAVA) or a virtual function table address (in C++).
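The indexing scheme described in this abstract can be sketched as follows. This is a minimal software model, not the patented hardware design: the class name `BranchTargetCache`, the table size, and the particular mixing function are all assumptions made for illustration. The abstract only says the two values are "processed (e.g., hashed)" to yield an index.

```python
class BranchTargetCache:
    """Toy model of a target cache indexed by hash(predictor value, branch address)."""

    def __init__(self, entries=1024):
        # Assumption: a power-of-two table so the index can be taken with a mask.
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self._mask = entries - 1
        self._table = [None] * entries

    def _index(self, predictor_value, branch_address):
        # Hypothetical hash: multiply the predictor value by an odd constant,
        # XOR in the branch address, fold the high bits down, then mask.
        mixed = (predictor_value * 0x9E3779B1) ^ branch_address
        mixed ^= mixed >> 16
        return mixed & self._mask

    def predict(self, predictor_value, branch_address):
        """Return the cached target address, or None if nothing is cached."""
        return self._table[self._index(predictor_value, branch_address)]

    def update(self, predictor_value, branch_address, actual_target):
        """Record the resolved target so the next lookup can predict it."""
        self._table[self._index(predictor_value, branch_address)] = actual_target


# Usage: one switch statement at address 0x4000, two selector operand values.
cache = BranchTargetCache()
cache.update(0, 0x4000, 0x5000)   # case 0 jumped to 0x5000
cache.update(1, 0x4000, 0x5100)   # case 1 jumped to 0x5100
```

For a polymorphic call, the same lookup would take the class object address or virtual function table address as `predictor_value` instead of the selector operand.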


    Method for making computer for multipath dynamic profiling execute, system and computer program
    Type: Invention patent
    Status: Granted

    Publication number: JP2011022993A

    Publication date: 2011-02-03

    Application number: JP2010109001

    Filing date: 2010-05-11

    Abstract: PROBLEM TO BE SOLVED: To optimize execution of an application in a compiler.
    SOLUTION: In a computer-executed method, a plurality of code regions of an application are instrumented with annotations for generating profile data (S410); executing the application with the instrumented code regions generates profile data for each of the plurality of code regions (S420); a delinquent code region is identified on the basis of the profile data (S430); a plurality of code sub-regions of the delinquent code region are instrumented with annotations for generating profile data (S440); executing the application with the instrumented code sub-regions generates profile data (S450); a delinquent code sub-region is identified on the basis of the generated profile data (S460); and application execution is optimized by using the delinquent code sub-region (S470).
    COPYRIGHT: (C)2011,JPO&INPIT


    COMPILER INSTRUMENTATION INFRASTRUCTURE TO FACILITATE MULTIPLE PASS AND MULTIPLE PURPOSE DYNAMIC ANALYSIS

    Publication number: CA2672337C

    Publication date: 2017-01-03

    Application number: CA2672337

    Filing date: 2009-07-15

    Abstract: Systems, methods and articles of manufacture are disclosed for optimizing execution of an application. A plurality of code regions of the application may be instrumented with annotations for generating profile data for each of the plurality of code regions. Profile data for each of the plurality of code regions may be generated via executing the application having instrumented code regions. A delinquent code region may be identified based on the generated profile data for each of the plurality of code regions. A plurality of code sub-regions of the identified delinquent code region may be instrumented with annotations for generating profile data for each of the plurality of code sub-regions. Profile data for each of the plurality of code sub-regions may be generated via executing the application having instrumented code sub-regions. A delinquent code sub-region may be identified based on the generated profile data for each of the plurality of code sub-regions. Execution of the application may be optimized using the identified delinquent code sub-region.
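The coarse-then-fine profiling flow in this abstract can be sketched as below. This is a simulation under stated assumptions rather than the disclosed compiler infrastructure: an execution is modeled as a list of hypothetical `(region, sub_region, cost)` events, "delinquent" is taken to mean "highest accumulated cost", and the function names are invented for the example.

```python
from collections import Counter


def run_instrumented(trace, granularity):
    """Simulate one profiling run and aggregate cost at the chosen granularity.

    trace: iterable of (region, sub_region, cost) events (assumed format).
    granularity: "region" for the first pass, "sub_region" for the second.
    """
    profile = Counter()
    for region, sub_region, cost in trace:
        key = region if granularity == "region" else (region, sub_region)
        profile[key] += cost
    return profile


def find_delinquent(profile):
    """Assumption: the delinquent unit is the one with the largest cost."""
    return max(profile, key=profile.get)


def two_pass_profile(trace):
    # Pass 1: instrument whole regions and find the delinquent region.
    hot_region = find_delinquent(run_instrumented(trace, "region"))
    # Pass 2: instrument only the sub-regions of that region and rerun.
    narrowed = [event for event in trace if event[0] == hot_region]
    hot_sub = find_delinquent(run_instrumented(narrowed, "sub_region"))[1]
    return hot_region, hot_sub


trace = [
    ("parse", "loop_a", 5),
    ("solve", "loop_b", 40),
    ("solve", "loop_c", 7),
    ("parse", "loop_d", 9),
]
print(two_pass_profile(trace))
```

Restricting the second pass to the delinquent region keeps the fine-grained instrumentation overhead away from code that the first pass already ruled out.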

    LOOP ALLOCATION FOR OPTIMIZING COMPILERS

    Publication number: CA2288614C

    Publication date: 2004-05-11

    Application number: CA2288614

    Filing date: 1999-11-08

    Applicant: IBM CANADA

    Abstract: Loop allocation for optimizing compilers includes the generation of a program dependence graph for a source code segment. Control dependence graph representations of the nested loops, from innermost to outermost, are generated and data dependence graph representations are generated for each level of nested loop as constrained by the control dependence graph. An interference graph is generated with the nodes of the data dependence graph. Weights are generated for the edges of the interference graph reflecting the affinity between statements represented by the nodes joined by the edges. Nodes in the interference graph are given weights reflecting resource usage by the statements associated with the nodes. The interference graph is partitioned using a profitability test based on the weights of edges and nodes and on a correctness test based on the reachability of nodes in the data dependence graph. Code is emitted based on the partitioned interference graph.

    INTERPROCEDURAL DEAD STORE ELIMINATION

    Publication number: CA2321016A1

    Publication date: 2002-03-27

    Application number: CA2321016

    Filing date: 2000-09-27

    Applicant: IBM CANADA

    Abstract: A system for optimizing computer code generation by carrying out interprocedural dead store elimination. The system carries out a top down traversal of a call graph in an intermediate representation of the code being compiled. Live on exit (LOE) sets are defined for variables at call points for functions in the code being compiled. Bit vectors representing the LOE sets for call points for functions are stored in an LOE table indexed or hashed by call graph edges. For each function definition reached in the call graph traversal, a LOE set for the function itself is generated by taking the union of the LOE call point sets. The entries in the LOE table for the LOE call point sets are then removed. The LOE set for each function is used to determine if variables that are the subject of a store operation in a function may be subject to a dead store elimination optimization.
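The LOE bookkeeping described in this abstract can be illustrated with a small sketch. Several simplifications are assumptions of the example, not statements from the patent: the call graph is acyclic, each `(caller, callee)` edge stands for a single call point, the variables live after each call inside the caller are supplied as input (`live_after_call`), nothing is live when a root function exits, and the caller's own LOE set is conservatively added to every one of its call points. All names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def function_loe_sets(call_edges, live_after_call, var_bit):
    """Top-down call graph walk producing one LOE bit vector (int) per function."""
    callers = {}
    for caller, callee in call_edges:
        callers.setdefault(caller, set())
        callers.setdefault(callee, set()).add(caller)

    loe_table = {}   # call graph edge -> LOE bit vector at that call point
    func_loe = {}    # function -> LOE bit vector for the function itself

    # static_order() yields every caller before its callees (top-down).
    for func in TopologicalSorter(callers).static_order():
        loe = 0
        for edge in call_edges:
            if edge[1] == func:
                # Union of the call point sets; the table entry is removed once used.
                loe |= loe_table.pop(edge)
        func_loe[func] = loe

        for edge in call_edges:
            if edge[0] == func:
                bits = loe  # conservative: whatever outlives func outlives the callee
                for var in live_after_call.get(edge, ()):
                    bits |= 1 << var_bit[var]
                loe_table[edge] = bits
    return func_loe


def may_eliminate_store(func, var, func_loe, var_bit):
    """True if var is not live on exit from func, so a final store is a candidate."""
    return ((func_loe[func] >> var_bit[var]) & 1) == 0


var_bit = {"g_count": 0, "g_log": 1, "g_tmp": 2}
edges = [("main", "init"), ("main", "work"), ("work", "helper")]
live = {
    ("main", "init"): {"g_count"},
    ("main", "work"): {"g_log"},
    ("work", "helper"): {"g_count"},
}
loe = function_loe_sets(edges, live, var_bit)
```

A store flagged by `may_eliminate_store` is only a candidate: the function's own intraprocedural liveness still has to confirm that the stored value is not read again before the function returns.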

    COMPILER WITH CACHE UTILIZATION OPTIMIZATIONS

    Publication number: CA2503263A1

    Publication date: 2005-10-30

    Application number: CA2503263

    Filing date: 2005-04-19

    Applicant: IBM

    Abstract: A compiling program with cache utilization optimizations employs an interprocedural global analysis of the data access patterns of compile units to be processed. The global analysis determines sufficient information to allow intelligent application of optimization techniques to be employed to enhance the operation and utilization of the available cache systems on target hardware.

    OPTIMIZING SOURCE CODE FOR ITERATIVE EXECUTION

    Publication number: CA2365375A1

    Publication date: 2003-06-18

    Application number: CA2365375

    Filing date: 2001-12-18

    Applicant: IBM CANADA

    Abstract: An embodiment of the present invention provides an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a primary recurrence element. A computer programmed loop for computing the primary recurrence element and subsequent recurrence elements is an example of a case involving iteratively computing the primary recurrence element. The CPU is operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM). SOM stores the generated optimized source code. The optimized source code includes instructions for instructing said CPU to store a computed value of the primary recurrence element in a storage location of FOM. The instructions also include instructions to consign the computed value of the primary recurrence element from the storage location to another storage location of the FOM.
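One way to picture the transformation in this abstract is the before/after pair below, using a running-sum recurrence `a[i] = a[i-1] + b[i]` as a stand-in. This is only an analogy under stated assumptions: Python has no registers, so the local variables `cur` and `prev` play the role of two FOM storage locations and the list `a` plays the role of slower memory. The function names and the choice of recurrence are illustrative, not taken from the patent.

```python
def running_sum_naive(b):
    """Each iteration reads the previous element back from the array."""
    if not b:
        return []
    a = [0] * len(b)
    a[0] = b[0]
    for i in range(1, len(b)):
        a[i] = a[i - 1] + b[i]   # a[i - 1] is reloaded from the array
    return a


def running_sum_optimized(b):
    """The recurrence value is carried in locals instead of being reloaded."""
    if not b:
        return []
    a = [0] * len(b)
    prev = b[0]                  # first fast location holds a[i - 1]
    a[0] = prev
    for i in range(1, len(b)):
        cur = prev + b[i]        # computed value kept in a fast location
        a[i] = cur               # written out once, never read back
        prev = cur               # moved to the other fast location for the next iteration
    return a


print(running_sum_optimized([3, 1, 4, 1, 5]))
```

Both functions produce the same list; the second simply avoids reading `a[i - 1]` back on every iteration.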

    Type: Invention patent
    Title: Unknown

    Publication number: DE602006012721D1

    Publication date: 2010-04-15

    Application number: DE602006012721

    Filing date: 2006-10-02

    Applicant: IBM

    Abstract: A computer implemented method, system and computer program product for accessing threadprivate memory for threadprivate variables in a parallel program during program compilation. A computer implemented method for accessing threadprivate variables in a parallel program during program compilation includes aggregating threadprivate variables in the program, replacing references of the threadprivate variables by indirect references, moving address load operations of the threadprivate variables, and replacing the address load operations of the threadprivate variables by calls to runtime routines to access the threadprivate memory. The invention enables a compiler to minimize the runtime routines call times to access the threadprivate variables, thus improving program performance.

    OPTIMIZING COMPILATION BY FORWARD STORE MOVEMENT

    Publication number: CA2321018A1

    Publication date: 2002-03-27

    Application number: CA2321018

    Filing date: 2000-09-27

    Applicant: IBM CANADA

    Abstract: An optimizing compiler includes a component for the determination of potential forward movements of store operations in the compilation of the computer software code. An intermediate representation of computer code is generated including a control flow graph, a data flow graph, a dominator tree, and a reaching defs table. These data structures are accessed to determine where in a conditional branch of code a store operation in the code may be moved to potentially increase efficiency in the execution of the compiled code. Tree structures corresponding to store operations are identified for possible movement, either entirely, or partially. Where a movement of a part of a tree structure is identified, temporary registers may be used to retain values of variables to enable the move to be carried out.
