Abstract:
Under the present invention, a branch target address corresponding to a target instruction to be pre-fetched is predicted based on two values. The first value is a "predictor value" that is known for the branch target address. The second value is the address of the branch instruction within the program code that branches to the target instruction. Once these two values are provided, they can be processed (e.g., hashed) to yield an index value, which is used to obtain a predicted branch target address from a cache. This technique is generally implemented for branch instructions such as switch statements or polymorphic calls. For the former, the predictor value is a selector operand; for the latter, it is a class object address (in JAVA) or a virtual function table address (in C++).
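For illustration only, the following C++ sketch shows one way such an indexed branch-target cache could look; the class name, table size, and hash function are assumptions and not taken from the patent.

#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <optional>

// Minimal sketch of a branch-target cache indexed by hashing the branch
// instruction address together with a predictor value (e.g., a switch
// selector operand or a class-object/vtable address).
class BranchTargetCache {
public:
    static std::uint64_t hashIndex(std::uint64_t branchAddr, std::uint64_t predictor) {
        std::uint64_t h = branchAddr ^ (predictor * 0x9e3779b97f4a7c15ULL);
        return (h ^ (h >> 17)) % kEntries;
    }

    std::optional<std::uint64_t> predict(std::uint64_t branchAddr, std::uint64_t predictor) const {
        const Entry& e = table_[hashIndex(branchAddr, predictor)];
        if (e.valid && e.branchAddr == branchAddr && e.predictor == predictor)
            return e.target;        // predicted branch target address
        return std::nullopt;        // no prediction; resolve the branch normally
    }

    void update(std::uint64_t branchAddr, std::uint64_t predictor, std::uint64_t target) {
        table_[hashIndex(branchAddr, predictor)] = {branchAddr, predictor, target, true};
    }

private:
    static constexpr std::size_t kEntries = 256;
    struct Entry { std::uint64_t branchAddr = 0, predictor = 0, target = 0; bool valid = false; };
    std::array<Entry, kEntries> table_{};
};

int main() {
    BranchTargetCache btc;
    btc.update(0x4000, /*selector=*/3, /*target=*/0x40c0);   // learn one switch case
    if (auto t = btc.predict(0x4000, 3))
        std::printf("predicted target: 0x%llx\n", (unsigned long long)*t);
}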
Abstract:
PROBLEM TO BE SOLVED: To optimize execution of an application in a compiler. SOLUTION: In a computer-implemented method, a plurality of code regions of an application are instrumented with annotations for generating profile data (S410); executing the application with the instrumented code regions generates profile data for each of the plurality of code regions (S420); a delinquent code region is identified on the basis of the profile data (S430); a plurality of code sub-regions of the delinquent code region are instrumented with annotations for generating profile data (S440); executing the application with the instrumented code sub-regions generates profile data for each of the code sub-regions (S450); a delinquent code sub-region is identified on the basis of the generated profile data (S460); and execution of the application is optimized using the identified delinquent code sub-region (S470).
Abstract:
Systems, methods and articles of manufacture are disclosed for optimizing execution of an application. A plurality of code regions of the application may be instrumented with annotations for generating profile data for each of the plurality of code regions. Profile data for each of the plurality of code regions may be generated via executing the application having instrumented code regions. A delinquent code region may be identified based on the generated profile data for each of the plurality of code regions. A plurality of code sub-regions of the identified delinquent code region may be instrumented with annotations for generating profile data for each of the plurality of code sub-regions. Profile data for each of the plurality of code sub-regions may be generated via executing the application having instrumented code sub-regions. A delinquent code sub-region may be identified based on the generated profile data for each of the plurality of code sub-regions. Execution of the application may be optimized using the identified delinquent code sub-region.
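As a purely illustrative aid, the following C++ sketch mirrors the two-phase refinement described above: coarse regions are profiled, the delinquent region is picked, and its sub-regions are then profiled and examined. The data structures and profile values are assumptions; a real implementation would gather the profile data (e.g., cache-miss counts) from the instrumented runs.

#include <cstdio>
#include <string>
#include <vector>

struct Region {
    std::string name;
    double profileCost = 0.0;          // e.g., cache-miss count from instrumentation
    std::vector<Region> subRegions;    // filled in when this region is refined
};

// Identify the region with the highest profiled cost (the delinquent one).
const Region* findDelinquent(const std::vector<Region>& regions) {
    const Region* worst = nullptr;
    for (const Region& r : regions)
        if (!worst || r.profileCost > worst->profileCost) worst = &r;
    return worst;
}

int main() {
    // Phase 1: coarse code regions with profile data from a first run.
    std::vector<Region> regions = {
        {"loopA", 120.0, {}},
        {"loopB", 940.0, {{"loopB.header", 40.0, {}}, {"loopB.body", 870.0, {}}}},
        {"init", 15.0, {}},
    };
    const Region* delinquent = findDelinquent(regions);
    std::printf("delinquent region: %s\n", delinquent->name.c_str());

    // Phase 2: the delinquent region's sub-regions were instrumented and
    // profiled on a second run; pick the delinquent sub-region to optimize.
    const Region* sub = findDelinquent(delinquent->subRegions);
    std::printf("delinquent sub-region: %s\n", sub->name.c_str());
}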
Abstract:
Loop allocation for optimizing compilers includes the generation of a program dependence graph for a source code segment. Control dependence graph representations of the nested loops, from innermost to outermost, are generated and data dependence graph representations are generated for each level of nested loop as constrained by the control dependence graph. An interference graph is generated with the nodes of the data dependence graph. Weights are generated for the edges of the interference graph reflecting the affinity between statements represented by the nodes joined by the edges. Nodes in the interference graph are given weights reflecting resource usage by the statements associated with the nodes. The interference graph is partitioned using a profitability test based on the weights of edges and nodes and on a correctness test based on the reachability of nodes in the data dependence graph. Code is emitted based on the partitioned interference graph.
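For illustration, here is a small C++ sketch of the profitability side of such a partitioning: statements are nodes weighted by resource usage, edges carry affinity weights, and high-affinity nodes are merged while a per-partition resource budget holds. The correctness (reachability) test and the graph construction are omitted, and every name and threshold is an assumption rather than the patented algorithm.

#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Node { const char* stmt; int resourceWeight; };   // statement + resource usage
struct Edge { int a, b; int affinity; };                 // affinity between two statements

// Greedy profitability test: merge endpoints of the heaviest-affinity edges
// as long as the combined resource weight stays within the assumed budget.
std::vector<int> partitionGraph(const std::vector<Node>& nodes,
                                std::vector<Edge> edges, int budget) {
    std::vector<int> part(nodes.size());
    std::vector<int> load(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        part[i] = static_cast<int>(i);
        load[i] = nodes[i].resourceWeight;
    }
    std::sort(edges.begin(), edges.end(),
              [](const Edge& x, const Edge& y) { return x.affinity > y.affinity; });
    for (const Edge& e : edges) {
        int pa = part[e.a], pb = part[e.b];
        if (pa != pb && load[pa] + load[pb] <= budget) {
            for (int& p : part)
                if (p == pb) p = pa;      // merge partition pb into pa
            load[pa] += load[pb];
        }
    }
    return part;                           // partition label per statement
}

int main() {
    std::vector<Node> nodes = {{"s1", 2}, {"s2", 3}, {"s3", 4}};
    std::vector<Edge> edges = {{0, 1, 10}, {1, 2, 1}};
    std::vector<int> part = partitionGraph(nodes, edges, /*budget=*/6);
    for (std::size_t i = 0; i < nodes.size(); ++i)
        std::printf("%s -> partition %d\n", nodes[i].stmt, part[i]);
}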
Abstract:
A system for optimizing computer code generation by carrying out interprocedural dead store elimination. The system carries out a top-down traversal of a call graph in an intermediate representation of the code being compiled. Live-on-exit (LOE) sets are defined for variables at call points for functions in the code being compiled. Bit vectors representing the LOE sets for call points for functions are stored in an LOE table indexed or hashed by call graph edges. For each function definition reached in the call graph traversal, an LOE set for the function itself is generated by taking the union of the LOE call point sets. The entries in the LOE table for the LOE call point sets are then removed. The LOE set for each function is used to determine whether variables that are the subject of a store operation in a function may be subject to a dead store elimination optimization.
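The C++ fragment below sketches, under assumed names, the LOE bookkeeping described above: bit vectors keyed by call-graph edge, and a function's own LOE set formed as the union of its call-point sets, after which those table entries are removed. It is an illustration, not the system's actual data structures.

#include <bitset>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

constexpr std::size_t kMaxVars = 64;                  // assumed variable-slot limit
using LoeSet = std::bitset<kMaxVars>;                 // live-on-exit set as a bit vector

// LOE table indexed by call-graph edge (caller -> callee).
using CallEdge = std::pair<std::string, std::string>;
std::map<CallEdge, LoeSet> loeTable;

// Union the LOE sets of every call point targeting `callee`, then remove
// those entries from the table once they have been consumed.
LoeSet loeForFunction(const std::string& callee) {
    LoeSet result;
    for (auto it = loeTable.begin(); it != loeTable.end();) {
        if (it->first.second == callee) {
            result |= it->second;
            it = loeTable.erase(it);
        } else {
            ++it;
        }
    }
    return result;
}

int main() {
    loeTable[{"main", "f"}].set(0);                   // slot 0 live after this call point
    loeTable[{"g", "f"}].set(2);                      // slot 2 live after this call point
    LoeSet loe = loeForFunction("f");

    // A store in f to a slot that is not live on exit (and has no later use
    // inside f) is a candidate for dead store elimination.
    for (std::size_t v : {0, 1, 2})
        std::printf("slot %zu: %s\n", v, loe.test(v) ? "keep store" : "dead-store candidate");
}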
Abstract:
A compiling program with cache utilization optimizations employs an inter-procedural global analysis of the data access patterns of compile units to be processed. The global analysis gathers sufficient information to allow optimization techniques to be applied intelligently, enhancing the operation and utilization of the available cache systems on the target hardware.
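As a rough illustration only (all names and thresholds are assumed, not taken from the disclosure), the C++ sketch below shows the flavor of decision such an analysis can support: aggregating access-stride information across compile units and flagging symbols whose access pattern is hostile to the target's cache.

#include <cstdio>
#include <map>
#include <string>

struct AccessPattern { long accesses = 0; long strideBytes = 0; };

// Aggregated across every compile unit during the global analysis pass.
std::map<std::string, AccessPattern> globalAccess;

void recordAccess(const std::string& symbol, long strideBytes) {
    AccessPattern& p = globalAccess[symbol];
    ++p.accesses;
    p.strideBytes = strideBytes;
}

int main() {
    recordAccess("matrixA", /*strideBytes=*/4096);   // column-major walk of a row-major array
    recordAccess("vectorX", /*strideBytes=*/8);
    const long kCacheLine = 64;                      // assumed target cache line size
    for (const auto& [sym, p] : globalAccess)
        std::printf("%s: %s\n", sym.c_str(),
                    p.strideBytes > kCacheLine ? "candidate for layout/loop transformation"
                                               : "cache-friendly access");
}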
Abstract:
An embodiment of the present invention provides an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a primary recurrence element. A programmed loop that computes the primary recurrence element and subsequent recurrence elements is an example of such iterative computation. The CPU is operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM). The SOM stores the generated optimized source code. The optimized source code includes instructions for instructing said CPU to store a computed value of the primary recurrence element in a storage location of the FOM. The instructions also include instructions to consign the computed value of the primary recurrence element from that storage location to another storage location of the FOM.
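For illustration, the C++ sketch below contrasts a naive loop-carried recurrence with the kind of generated code the abstract points at: the primary recurrence value is held in fast storage (a local the compiler can keep in a register) and consigned forward for the next iteration instead of being re-read from slower memory. The shape of the optimized loop is an assumption, not the patented code generator's output.

#include <cstddef>
#include <cstdio>
#include <vector>

// Unoptimized form: a[i] depends on a[i - 1], which is re-read from memory.
void recurrenceNaive(std::vector<double>& a, const std::vector<double>& b) {
    for (std::size_t i = 1; i < a.size(); ++i)
        a[i] = a[i - 1] + b[i];
}

// Assumed optimized form: the primary recurrence element lives in `carried`;
// after each store, the value is consigned to `carried` for the next iteration.
void recurrenceScalarized(std::vector<double>& a, const std::vector<double>& b) {
    double carried = a[0];                 // fast-memory copy of a[i - 1]
    for (std::size_t i = 1; i < a.size(); ++i) {
        double value = carried + b[i];     // compute the primary recurrence element
        a[i] = value;                      // store the result
        carried = value;                   // consign for the next iteration
    }
}

int main() {
    std::vector<double> a1 = {1, 0, 0, 0}, a2 = a1, b = {0, 2, 3, 4};
    recurrenceNaive(a1, b);
    recurrenceScalarized(a2, b);
    std::printf("results match: %s\n", a1 == a2 ? "yes" : "no");
}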
Abstract:
A computer implemented method, system and computer program product for accessing threadprivate memory for threadprivate variables in a parallel program during program compilation. A computer implemented method for accessing threadprivate variables in a parallel program during program compilation includes aggregating threadprivate variables in the program, replacing references to the threadprivate variables by indirect references, moving address load operations of the threadprivate variables, and replacing the address load operations of the threadprivate variables by calls to runtime routines to access the threadprivate memory. The invention enables a compiler to minimize the number of runtime routine calls needed to access the threadprivate variables, thus improving program performance.
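A small C++ sketch of the idea, with an assumed stand-in for the runtime routine (it is not the compiler's actual runtime API): rather than calling the threadprivate lookup at every reference, the address load is performed once per region and the references go through the cached pointer.

#include <cstdio>

// Assumed aggregation of the threadprivate variables into one block.
struct ThreadPrivateBlock { int counter; double accum; };

// Hypothetical runtime routine returning the calling thread's block.
ThreadPrivateBlock* tpGetBlock() {
    static thread_local ThreadPrivateBlock block{0, 0.0};
    return &block;
}

// Before: one runtime-routine call per reference (what the invention avoids).
void hotLoopNaive(int n) {
    for (int i = 0; i < n; ++i) {
        tpGetBlock()->counter += 1;
        tpGetBlock()->accum += i;
    }
}

// After: the address load is moved out of the loop, so a single call serves
// all of the indirect references to the threadprivate variables.
void hotLoopOptimized(int n) {
    ThreadPrivateBlock* tp = tpGetBlock();   // moved address load
    for (int i = 0; i < n; ++i) {
        tp->counter += 1;                    // indirect reference
        tp->accum += i;                      // indirect reference
    }
}

int main() {
    hotLoopNaive(1000);
    hotLoopOptimized(1000);
    std::printf("counter=%d accum=%.1f\n", tpGetBlock()->counter, tpGetBlock()->accum);
}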
Abstract:
An optimizing compiler includes a component for the determination of potential forward movements of store operations in the compilation of the computer software code. An intermediate representation of computer code is generated including a control flow graph, a data flow graph, a dominator tree, and a reaching defs table. These data structures are accessed to determine where in a conditional branch of code a store operation in the code may be moved to potentially increase efficiency in the execution of the compiled code. Tree structures corresponding to store operations are identified for possible movement, either entirely or partially. Where a movement of a part of a tree structure is identified, temporary registers may be used to retain values of variables to enable the move to be carried out.
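The C++ before/after sketch below illustrates, at source level rather than in the compiler's intermediate representation, the kind of movement described: a store is sunk into the conditional branch that needs it, with a temporary retaining the computed value when only part of the expression tree moves. The example and its validity condition (no other path observes the store) are illustrative assumptions.

#include <cstdio>

int g_cold = 0;   // assumed global that is only read on the taken branch

// Before: the store to g_cold executes on every call, even when cond is false.
int beforeSink(int x, bool cond) {
    g_cold = x * x + 1;            // store executed unconditionally
    if (cond) return g_cold + x;
    return x;
}

// After: the value is computed into a temporary and the store is moved into
// the branch where it is actually needed (valid only if the store is dead on
// the other path), matching the partial tree-movement case.
int afterSink(int x, bool cond) {
    int tmp = x * x + 1;           // temporary register retains the value
    if (cond) {
        g_cold = tmp;              // store sunk into the conditional branch
        return g_cold + x;
    }
    return x;
}

int main() {
    std::printf("%d %d\n", beforeSink(3, true), afterSink(3, false));
}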