Abstract:
A system comprises a first node including data having an associated D-state and a second node operative to provide a source broadcast requesting the data. The first node is operative in response to the source broadcast to provide the data to the second node and transition the state associated with the data at the first node from the D-state to an O-state without concurrently updating memory. An S-state is associated with the data at the second node.
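The transition this abstract describes can be pictured as a small state machine. The following C fragment is a minimal sketch only; the state names follow the abstract's D/O/S labels, but the cache_line type and function signature are invented for illustration (the patent does not specify an implementation):

```c
/* Cache-line states as named in the abstract: D (dirty), O (owned),
 * S (shared); INVALID added for completeness.                        */
typedef enum { INVALID, S_STATE, D_STATE, O_STATE } line_state;

typedef struct {
    line_state state;
    int        data;    /* stand-in for the cached block */
} cache_line;

/* First node answers a source-broadcast read: it supplies the data
 * and moves D -> O without writing the block back to memory; the
 * requesting (second) node ends up in S.                             */
void owner_handle_broadcast_read(cache_line *owner, cache_line *requester)
{
    if (owner->state == D_STATE) {
        requester->data  = owner->data;   /* cache-to-cache transfer  */
        requester->state = S_STATE;       /* requester becomes shared */
        owner->state     = O_STATE;       /* owner retains write-back
                                             responsibility; memory is
                                             deliberately not updated */
    }
}
```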
Abstract:
A retry-based mechanism resolves late race conditions in a computer system between a first processor writing modified data back to main memory and a second processor trying to obtain a copy of the modified data. A low occupancy cache coherency protocol tracks ownership and sharing status of memory blocks. When a memory reference operation forwarded from the second processor results in a miss at the first processor's cache, because the requested memory block was written back to memory, the first processor issues a Retry command to the second processor. In response to the Retry command, the second processor issues another memory reference operation. This time, however, the operation explicitly specifies the version of the memory block being written back to main memory. Once the memory block has been written back to main memory, thereby providing main memory with the desired version, a copy is sent to the second processor.
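A minimal C sketch of the retry exchange follows. The message vocabulary (READ_VERSION, RETRY, FILL) and the version tag are assumptions for the example; the abstract fixes only the behavior, not the encoding:

```c
#include <stdbool.h>

/* Invented message vocabulary for the retry exchange. */
typedef enum { READ, READ_VERSION, RETRY, FILL } msg_type;

typedef struct {
    msg_type type;
    unsigned block;     /* memory block address       */
    unsigned version;   /* version being written back */
} msg;

/* First processor: the forwarded read misses because the block was
 * already written back, so it answers with a Retry that names the
 * version now travelling to main memory.                            */
msg owner_handle_forwarded_read(bool block_present, unsigned block,
                                unsigned writeback_version)
{
    if (!block_present)
        return (msg){ RETRY, block, writeback_version };
    return (msg){ FILL, block, writeback_version };  /* normal fill */
}

/* Second processor: on Retry, reissue the read, this time explicitly
 * naming the version main memory must hold before replying.         */
msg requester_handle_retry(const msg *retry)
{
    return (msg){ READ_VERSION, retry->block, retry->version };
}
```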
Abstract:
A method and apparatus for handling explicit writebacks in a cache coherent multi-node architecture are described. In one embodiment, the invention is a method. The method includes receiving a read request relating to a first line of data in a coherent memory system. The method further includes receiving a write request relating to the first line of data at about the same time as the read request is received. The method further includes detecting that the read request and the write request both relate to the first line. The method also includes determining which of the read and write requests should proceed first. Additionally, the method includes completing whichever of the two requests should proceed first.
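The conflict-detection and ordering steps can be sketched as a simple arbitration, as below. The arrival-timestamp policy is an assumption for illustration; the abstract leaves the ordering rule to the implementation:

```c
/* A read and a write (explicit writeback) arrive for the same line
 * at about the same time; detect the conflict and order them.       */
typedef struct {
    unsigned      line;      /* cache line the request targets */
    unsigned long arrival;   /* invented arrival timestamp     */
} request;

/* Step 1: detect that both requests relate to the first line. */
static int conflicts(const request *rd, const request *wr)
{
    return rd->line == wr->line;
}

/* Step 2: decide which request proceeds first.  Earlier arrival wins
 * here, but that policy is an assumption, not taken from the patent. */
const request *pick_first(const request *rd, const request *wr)
{
    if (!conflicts(rd, wr))
        return rd;                       /* no ordering needed */
    return (wr->arrival <= rd->arrival) ? wr : rd;
}
```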
Abstract:
A processor (300) in a distributed shared memory system (10) has ownership of a cache line. The processor modifies the cache line and wishes to update the home memory (17) of the cache line with the modification. The processor (300) generates a return request for routing by a processor interface (24). Meanwhile, a second processor (400) wishes to obtain ownership of the cache line and sends a read request to a memory directory (22) associated with the home memory (17) of the cache line. The memory directory (22) generates an intervention request towards the processor interface (24) corresponding to the last known location of the cache line. By this point, the processor interface (24) has forwarded the return request to the memory directory (22), but only subsequent to the read request from the second processor (400). Rather than waiting for an acknowledgment from the memory directory (22) that the return request has been processed, the processor interface (24) sends an intervention response to the second processor that includes the modified cache line.
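A rough C sketch of the processor-interface behavior, with an invented proc_interface structure; the point it illustrates is answering the intervention from the pending writeback data instead of waiting for the directory's acknowledgment:

```c
/* Invented processor-interface state: a return (writeback) request is
 * in flight and has not yet been acknowledged by the directory.      */
typedef struct {
    unsigned      addr;
    unsigned char data[64];
} cache_line_t;

typedef struct {
    int          writeback_pending;   /* return request not yet acked */
    cache_line_t pending_line;        /* modified data being returned */
} proc_interface;

/* When the directory's intervention arrives for the line whose
 * writeback is still in flight, reply to the second processor
 * directly from the pending data.  Returns 1 if answered locally.    */
int handle_intervention(proc_interface *pi, unsigned addr,
                        cache_line_t *resp)
{
    if (pi->writeback_pending && pi->pending_line.addr == addr) {
        *resp = pi->pending_line;     /* intervention response with   */
        return 1;                     /* the modified cache line      */
    }
    return 0;                         /* not ours; directory handles  */
}
```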
Abstract:
L1 cache synonyms in a two-level cache system are detected and resolved by logic in the L2 cache. Duplicate copies of the L1 cache tags and state (“Dtags”) are maintained in the L2 cache. After a miss occurs in the L1 cache, the Dtags in the second-level cache that correspond to all possible synonym locations in the L1 cache are searched for synonyms. If a synonym is found, the L2 cache notifies the L1 cache where the requested cache line can be found in the L1 cache. The L1 cache then copies the cache line from the location where the synonym was found to the location where the miss occurred, and it invalidates the cache line at the original location. The Dtags in the second-level cache are updated to reflect the changes made in the L1 cache.
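The Dtag search can be sketched as below; the set count, alias count, and function name are assumptions for the example, not taken from the patent:

```c
/* Duplicate L1 tags ("Dtags") held in the L2 cache; sizes and names
 * are invented for the example.                                      */
#define L1_SETS    256
#define ALIAS_WAYS 4      /* possible synonym locations per line */

typedef struct { unsigned tag; int valid; } dtag;

static dtag dtags[L1_SETS];

/* After an L1 miss, search the Dtags for every L1 set the physical
 * line could alias to; return the set holding the synonym, or -1.
 * On a hit, the L1 copies the line to the missing location and
 * invalidates it at the original one (not shown).                    */
int find_synonym(unsigned phys_tag, const int candidate_sets[ALIAS_WAYS])
{
    for (int i = 0; i < ALIAS_WAYS; i++) {
        int s = candidate_sets[i];
        if (dtags[s].valid && dtags[s].tag == phys_tag)
            return s;    /* synonym found at L1 set s */
    }
    return -1;           /* no synonym: ordinary miss  */
}
```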
Abstract:
One embodiment of the present invention provides a system that facilitates speculative load operations in a multiprocessor system. The system operates by maintaining a record of speculative load operations that have completed at a processor in the multiprocessor system, wherein a speculative load operation is a load operation that is speculatively initiated before a preceding load operation has returned. Next, the system receives an invalidation signal at an L1 cache that is coupled to the processor, wherein the invalidation signal indicates that a specific line in the L1 cache is to be invalidated. In response to this invalidation signal, the system examines the record of speculative load operations to determine if there exists a matching speculative load operation that is completed and is directed to the same location in the L1 cache that the invalidation signal is directed to. If there exists a matching speculative load operation, the system replays the matching speculative load operation so that the matching speculative load operation takes place after an event that caused the invalidation signal completes.
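A compact C sketch of the replay check, with an invented record layout and a stub replay_load; only the matching rule (a completed speculative load to the same L1 line) comes from the abstract:

```c
/* Invented record of speculative loads that have already completed. */
#define MAX_SPEC 32

typedef struct {
    unsigned line;       /* L1 line the load read       */
    int      completed;  /* load has returned its value */
} spec_load;

static spec_load record[MAX_SPEC];
static int n_spec = 0;

/* Stub: reissue the load so it is ordered after the invalidating
 * event completes (the real mechanism lives in the pipeline).       */
static void replay_load(unsigned line) { (void)line; }

/* Called when the L1 receives an invalidation for `line`: any
 * completed speculative load to that line may have read a stale
 * value and is therefore replayed.                                  */
void on_invalidate(unsigned line)
{
    for (int i = 0; i < n_spec; i++)
        if (record[i].completed && record[i].line == line)
            replay_load(record[i].line);
}
```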
Abstract:
In a chip multiprocessor system, the coherence protocol is split into two cooperating protocols implemented by different hardware modules. One protocol is responsible for cache coherence management within the chip, and is implemented by a second-level cache controller. The other protocol is responsible for cache coherence management across chip multiprocessor nodes, and is implemented by separate cache coherence protocol engines. The cache controller and the protocol engine within each node communicate and synchronize memory transactions involving multiple nodes to maintain cache coherence within and across the nodes. The present invention addresses race conditions that arise during this communication and synchronization.
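The split between the two cooperating protocols can be pictured as a dispatch on a block's home location, as in this toy C sketch; all names and the home-lookup rule are invented for illustration:

```c
/* The L2 cache controller covers intra-chip coherence; a separate
 * protocol engine covers coherence across chip-multiprocessor nodes. */
typedef enum { LOCAL_NODE, REMOTE_NODE } home_t;

static void l2_controller_handle(unsigned addr)   { (void)addr; /* intra-chip */ }
static void protocol_engine_handle(unsigned addr) { (void)addr; /* inter-node */ }

/* Whether the block's home memory is on this chip (toy rule). */
static home_t home_of(unsigned addr)
{
    return (addr & 0x1) ? REMOTE_NODE : LOCAL_NODE;
}

void dispatch(unsigned addr)
{
    if (home_of(addr) == LOCAL_NODE)
        l2_controller_handle(addr);    /* stays within the chip      */
    else
        protocol_engine_handle(addr);  /* crosses a node boundary;
                                          the two modules must then
                                          synchronize the transaction */
}
```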
Abstract:
An apparatus for controlling a cache in a computing node, which is located between a node bus and an interconnection network to perform a cache coherence protocol, includes: a node bus interface for interfacing with the node bus; an interconnection network interface for interfacing with the interconnection network; a cache control logic means for controlling the cache to perform the cache coherence protocol; bus-side dual-port transaction buffers coupled between said node bus interface and said cache control logic means for buffering transaction requests and replies from or to local processors contained in the computing node; and network-side dual-port transaction buffers coupled between said interconnection network interface and said cache control logic means for buffering transaction requests and replies from or to remote processors contained in another computing node coupled to the interconnection network.
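The enumerated components map naturally onto a structure; the following C sketch is illustrative only, with the buffer depth and all field names assumed:

```c
/* Invented structure mirroring the enumerated components. */
#define BUF_DEPTH 16

typedef struct { unsigned char payload[64]; } transaction;

/* Dual-port buffer: one port for requests, one for replies. */
typedef struct {
    transaction req[BUF_DEPTH];
    transaction rep[BUF_DEPTH];
} dual_port_buffer;

typedef struct {
    dual_port_buffer bus_side;   /* to/from local processors on the
                                    node bus                          */
    dual_port_buffer net_side;   /* to/from remote processors across
                                    the interconnection network       */
    /* the cache control logic sits between the two buffer pairs and
     * runs the coherence protocol (not modeled here)                 */
} cache_controller;
```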