Abstract:
A digital system is provided with several processors, a private level one (L1) cache associated with each processor, a shared level two (L2) cache having several segments per entry, and a level three (L3) physical memory. The shared L2 cache architecture is embodied with 4-way associativity, four segments per entry and four valid and dirty bits. When the L2-cache misses, the penalty to access data within the L3 memory is high. The system supports miss under miss to let a second miss interrupt a segment prefetch being done in response to a first miss. Thus, an interruptible SDRAM to L2-cache prefetch system with miss under miss support is provided. A shared translation look-aside buffer (TLB) is provided for L2 accesses, while a private TLB is associated with each processor. A micro TLB (µTLB) is associated with each resource that can initiate a memory transfer. The L2 cache and all of the TLBs and µTLBs have resource ID fields and task ID fields associated with each entry to allow flushing and cleaning based on resource or task. Configuration circuitry is provided to allow the digital system to be configured on a task by task basis in order to reduce power consumption.
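The interruptible prefetch with miss-under-miss support can be pictured with a small software model. This is only a sketch under stated assumptions: the names `Entry`, `service_miss` and `pending_miss`, and the wrap-around prefetch order, are hypothetical and do not come from the abstract, which describes hardware.

```python
SEGMENTS = 4  # four segments per L2 entry, as stated in the abstract


class Entry:
    """One L2 cache entry with a valid bit and a dirty bit per segment."""

    def __init__(self):
        self.tag = None
        self.valid = [False] * SEGMENTS
        self.dirty = [False] * SEGMENTS


def service_miss(entry, tag, segment, pending_miss):
    """Load the missed segment, then prefetch the others until interrupted.

    pending_miss is a hypothetical callback that returns True when a second
    miss is waiting; the prefetch stops at that point (miss under miss).
    Returns the list of segments loaded, in load order.
    """
    if entry.tag != tag:
        # New line: a real design would write back dirty segments first.
        entry.tag = tag
        entry.valid = [False] * SEGMENTS
        entry.dirty = [False] * SEGMENTS

    entry.valid[segment] = True  # the demanded segment is always fetched
    loaded = [segment]

    # Assumed prefetch order: following segments, wrapping around the entry.
    for offset in range(1, SEGMENTS):
        if pending_miss():
            break  # a second miss interrupts the prefetch
        nxt = (segment + offset) % SEGMENTS
        if not entry.valid[nxt]:
            entry.valid[nxt] = True
            loaded.append(nxt)
    return loaded


# Example: the second poll reports a waiting miss, so the prefetch is cut short.
polls = iter([False, True])
example_entry = Entry()
example_loaded = service_miss(example_entry, 0x10, 2, lambda: next(polls))
```

Segments left invalid by an interrupted prefetch are simply fetched on demand later, which is why per-segment valid bits are needed.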
Abstract:
A digital system is provided with a memory (506) shared by several initiator resources (540-550), wherein a portion of the initiator resources are big endian and another portion of the initiator resources are little endian. The memory is segregated into a set of regions by a memory management unit (MMU) (500-510) and an endianism attribute bit is defined for each region. For each memory request to the memory, the endianism attribute bit for the selected region is provided by the MMU. Each memory transaction request is completed in accordance with the endianism attribute of the selected region. Depending on the capability of a given initiator resource, the memory request address is adjusted to agree with the endianism attribute of the selected region, or an access fault is generated (530) if the endianism of the initiating resource does not match the endianism attribute of the selected memory region. A resource identification value (R-ID) provided by each of the initiator resources is used to identify the endianism of each of the initiator resources.
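The adjust-or-fault decision described above can be sketched as follows. Several details are assumptions rather than part of the abstract: a 32-bit data bus (`WORD_BYTES = 4`), the XOR-based address adjustment for sub-word accesses, and the hypothetical names `Region`, `Initiator`, `resolve` and `AccessFault`.

```python
from dataclasses import dataclass

WORD_BYTES = 4  # assumed data-bus width; not stated in the abstract


class AccessFault(Exception):
    """Raised when initiator and region endianness cannot be reconciled."""


@dataclass(frozen=True)
class Region:
    start: int        # first address of the region
    end: int          # one past the last address
    big_endian: bool  # endianism attribute bit supplied by the MMU


@dataclass(frozen=True)
class Initiator:
    big_endian: bool  # endianness implied by the resource ID (R-ID)
    can_adjust: bool  # whether its request address may be adjusted


def adjust_address(addr, size):
    """Swap the byte lane of a sub-word access (assumed XOR scheme)."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    return addr ^ (WORD_BYTES - size)


def resolve(regions, initiators, rid, addr, size):
    """Return the address to use for the access, or raise AccessFault."""
    region = next((r for r in regions if r.start <= addr < r.end), None)
    if region is None:
        raise AccessFault("address is outside every region")
    initiator = initiators[rid]
    if initiator.big_endian == region.big_endian:
        return addr  # endianness already agrees
    if initiator.can_adjust:
        return adjust_address(addr, size)
    raise AccessFault("endianness mismatch for R-ID %d" % rid)


# Example setup: one big endian region and one little endian region.
regions = [Region(0x0000, 0x1000, True), Region(0x1000, 0x2000, False)]
initiators = {
    0: Initiator(big_endian=False, can_adjust=True),
    1: Initiator(big_endian=True, can_adjust=False),
}
```

With this setup, initiator 0 has its byte and halfword addresses adjusted in the big endian region, while initiator 1 faults on any access to the little endian region.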
Abstract:
A digital system and method of operation is provided in which the digital system has at least one processor, with an associated multi-segment cache memory circuit (506(n)). A single global validity circuit (VIG) is connected to the memory circuit and is operable to indicate if any segment of the multiple segments holds valid data. Block transfer circuitry (700, 702) is connected to the memory circuit and is operable to transfer a block of data (1650) to a selected portion of segments (1606) of the cache memory circuit. The block circuitry is operable to transfer data from a pre-selected region of the secondary memory (1650) to a particular segment of the plurality of segments and to assert the global valid bit at the completion of a block transfer. Direct memory access (DMA) circuitry (1610) is connected to the memory cache for transferring data between the memory cache and a selectable region (1650) of a secondary memory and is also operable to assert the global valid bit at the completion of a DMA block transfer. The cache can be operated in a first manner such that when a transfer request from the processor requests a first location in the cache memory that does not hold valid data, valid data is transferred (1652) from a pre-selected location in a secondary memory that corresponds directly to the first location. The cache can then be operated in a second manner such that data is transferred (1662) between the first location and a selectable location in the secondary memory, wherein the selected location need not directly correspond to the first location.
Abstract:
A digital system and method of operation is provided in which the digital system has at least one processor, with an associated multi-segment cache memory circuit (506(n)). Validity circuitry (VI) is connected to the memory circuit and is operable to indicate if each segment of the plurality of segments holds valid data. Block transfer circuitry (700, 702) is connected to the memory circuit and is operable to transfer a block of data (1650) to a selected portion of segments (1606) of the cache memory circuit. Fetch circuitry associated with the memory cache is operable to transfer data from a pre-selected region of the secondary memory (1650) to a particular segment of the plurality of segments and to assert a first valid bit corresponding to the segment when the miss detection circuitry (1610) detects a miss in the segment. Direct memory access (DMA) circuitry (1610) is connected to the memory cache for transferring data between the memory cache and a selectable region (1650) of a secondary memory. The cache can be operated in a first manner such that when a transfer request from the processor requests a first location in the cache memory that does not hold valid data, valid data is transferred (1652) from a pre-selected location in a secondary memory that corresponds directly to the first location. The cache can then be operated in a second manner such that data is transferred (1662) between the first location and a selectable location in the secondary memory, wherein the selected location need not directly correspond to the first location.
Abstract:
The present invention is a method for automating database bufferpool tuning for optimized performance that employs certain heuristic algorithms to achieve its goals. Over a period of time, memory (bufferpool) performance is measured and accumulated in a repository. The repository becomes a knowledge base that is accessed by the algorithms and the ideal memory (bufferpool) configurations, which optimize database performance, are learned and implemented. The sampling of performance continues at regular intervals and the knowledge base continues to grow. As knowledge continues to accumulate, the algorithms are forbidden from becoming complacent. The ideal bufferpool memory configurations are regularly reevaluated to ensure they continue to be optimal given potential changes in the database's use or access patterns.
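The repository-driven loop can be illustrated with a minimal sketch. The abstract does not specify its heuristic algorithms, so this example substitutes a deliberately simple rule (pick the bufferpool size with the best average hit ratio seen so far); the class `TuningRepository`, its methods, and the use of hit ratio as the performance measure are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


class TuningRepository:
    """Accumulates performance samples per bufferpool size (in pages)."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, pool_pages, hit_ratio):
        """Store one measurement taken at a sampling interval."""
        self._samples[pool_pages].append(hit_ratio)

    def best_size(self):
        """Return the size with the highest mean hit ratio, or None."""
        if not self._samples:
            return None
        return max(self._samples, key=lambda size: mean(self._samples[size]))


# Example: early samples favor 2000 pages.
repo = TuningRepository()
for pages, ratio in [(1000, 0.80), (1000, 0.82), (2000, 0.95),
                     (2000, 0.93), (4000, 0.90)]:
    repo.record(pages, ratio)
first_choice = repo.best_size()

# Later samples reflect a changed workload, so the choice is reevaluated.
repo.record(4000, 0.99)
repo.record(4000, 0.99)
second_choice = repo.best_size()
```

Calling `best_size` after every sampling interval, rather than once, is what keeps the recommendation from going stale as access patterns change.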
Abstract:
A method for determining a single reference residency time of a cache comprises the steps of causing test data to be staged to the cache and measuring a response time after a wait time has elapsed. The measuring step is repeated for a plurality of values of wait time. The method also includes the step of determining a boundary value of the wait time. A wait time of less than or equal to the boundary value yields a corresponding response time representing a cache hit and a wait time of greater than the boundary value yields a corresponding response time representing a cache miss. The boundary value is an estimate of the single reference residency time of the cache.
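The boundary determination can be sketched as below, assuming the (wait time, response time) pairs have already been measured. The function name `estimate_residency_time` and the `hit_threshold` parameter used to separate hits from misses are assumptions; the abstract does not say how a response time is classified.

```python
def estimate_residency_time(samples, hit_threshold):
    """Estimate the single reference residency time from measurements.

    samples: iterable of (wait_time, response_time) pairs.
    hit_threshold: response times at or below this value count as cache
        hits (an assumed classification rule).
    Returns the largest wait time such that it and every shorter wait time
    produced a hit, or None if even the shortest wait time missed.
    """
    boundary = None
    for wait_time, response_time in sorted(samples):
        if response_time > hit_threshold:
            break  # first miss: longer waits are past the boundary
        boundary = wait_time
    return boundary


# Example measurements (arbitrary units): hits up to a wait of 4, then misses.
measurements = [(8, 9.0), (1, 0.5), (4, 0.5), (16, 10.0), (2, 0.6)]
estimate = estimate_residency_time(measurements, hit_threshold=2.0)
```

The resolution of the estimate is limited by the spacing of the tested wait times, so a finer sweep around the boundary tightens it.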
Abstract:
A link-time optimization scheme is capable of removing dead code from code fragments in a program, where the dead code arises after the linking of code fragments. The scheme may be applied at runtime to fragments which are linked in a caching dynamic translator or applied when linking fragments subsequent to the compilation of object code. The removal of dead code may be facilitated by the use of epilogs corresponding to exits from a fragment and prologs corresponding to entries into a fragment.
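Why linking exposes dead code can be shown with a backward liveness pass over one straight-line fragment. This is a sketch under simplifying assumptions that are not in the abstract: instructions are `(dest, sources)` tuples with no side effects, and the set of registers live at the fragment exit stands in for the information an epilog or prolog would carry. The name `remove_dead_code` is hypothetical.

```python
def remove_dead_code(fragment, live_at_exit):
    """Drop instructions whose result is never used.

    fragment: list of (dest, sources) tuples in program order, where
        sources is a tuple of register names; instructions are assumed
        to have no side effects.
    live_at_exit: set of registers needed after the fragment's exit.
    """
    live = set(live_at_exit)
    kept = []
    for dest, sources in reversed(fragment):
        if dest not in live:
            continue  # result is never read: dead instruction
        live.discard(dest)
        live.update(sources)
        kept.append((dest, sources))
    kept.reverse()
    return kept


fragment = [
    ("r1", ("r0",)),
    ("r2", ("r0",)),
    ("r3", ("r1",)),
]

# Before linking, the exit target is unknown, so every register is assumed live.
unlinked = remove_dead_code(fragment, {"r1", "r2", "r3"})

# After linking, suppose the successor fragment only reads r3 before writing.
linked = remove_dead_code(fragment, {"r3"})
```

In the unlinked case nothing can be removed, whereas after linking the write to `r2` is recognized as dead.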
Abstract:
The invention is used to balance and control traffic within a single internet site or between multiple sites. A static or dynamic analysis of site usage is performed. When an under-visited portion of a site is identified, a link or message is created on a more-visited portion of the web site to motivate the user visiting the site to explore the under-visited portion. The portion can be a specific page of the web site, or an entire section.
Abstract:
An invention for reassigning data elements of an application to cache lines to decrease the occurrence of cache line faults is described. First, an application is executed and used in a typical manner. While the application is running, data is collected concerning the loading and storing of data elements. This collection process creates a massive volume of data that is then processed to determine correlations between the loading and storing pairs of elements within each of the application's data structures. These correlations provide a mechanism for weighing the probability of pairs of intra-structure data elements being accessed in sequence, which is best accomplished when the data elements are within a single cache line. A set of simultaneous equations describes the probabilities using the data recording the correlations. These equations are then solved using commonly known linear programming techniques to derive a suggested ordering of data structures. An interactive editor is then used to reorder these data elements in the derived preferred order as authorized by the programmer.
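The correlation step can be illustrated with a short sketch that counts how often two fields of one structure are accessed back to back. It covers only the counting; the simultaneous equations and the linear programming solution are not reproduced. The function `pair_counts` and the trace format (a list of field names in access order) are assumptions.

```python
from collections import Counter


def pair_counts(trace):
    """Count consecutive accesses to two different fields.

    trace: list of field names of one data structure, in access order.
    Returns a Counter keyed by frozenset({field_a, field_b}), so the
    order within a pair does not matter.
    """
    counts = Counter()
    for first, second in zip(trace, trace[1:]):
        if first != second:
            counts[frozenset((first, second))] += 1
    return counts


# Example trace of loads and stores to fields x, y and z.
trace = ["x", "y", "x", "y", "z", "x"]
counts = pair_counts(trace)
strongest_pair, strongest_count = counts.most_common(1)[0]
```

Pairs with high counts are the candidates for placement in the same cache line.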
Abstract:
The present invention discloses a method, apparatus, and article of manufacture for monitoring performance of a parallel database in a computer. In accordance with the present invention, the parallel database is stored on a data storage device in the computer. Groups of database nodes are identified. Collection time periods for collecting performance statistics from the identified group of database nodes are determined. Performance statistics are periodically collected from a subset of each identified group of nodes during the collection time periods. The collected performance statistics are stored in a memory connected to the computer and re-used when collecting performance statistics from one or more groups of database nodes in a succeeding collection time period.