    Improving processor performance for instruction sequences that contain barrier instructions

    Publication Number: DE112013000891T5

    Publication Date: 2014-10-16

    Application Number: DE112013000891

    Application Date: 2013-01-22

    Applicant: IBM

    Abstract: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved based upon receipt, by the processor core, of the earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. If execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, the technique further includes, in response to determining that the barrier instruction has completed, initiating, by the processor core, execution of the subsequent memory access instruction. If execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, the technique further includes, in response to determining that the barrier instruction has completed, discontinuing, by the processor core, tracking of the subsequent memory access instruction with respect to invalidation.

    Processor performance improvement for instruction sequences that include barrier instructions

    Publication Number: AU2013217351A1

    Publication Date: 2014-08-14

    Application Number: AU2013217351

    Application Date: 2013-01-22

    Applicant: IBM

    Abstract: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved based upon receipt, by the processor core, of the earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes, if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating, by the processor core, in response to determining that the barrier instruction has completed, execution of the subsequent memory access instruction. The technique further includes, if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing, by the processor core, in response to determining that the barrier instruction has completed, tracking of the subsequent memory access instruction with respect to invalidation.
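
    The control flow described in this patent family (the DE and AU entries above) can be illustrated with a small software model. The following Python sketch is an illustration only, not the patented hardware design; all names (BarrierTracker, issue_after_barrier, the event strings) are hypothetical.

```python
# Hypothetical software model of the behavior described above; the patent
# covers a hardware mechanism, and all names here are invented. A load ahead
# of the barrier is resolved by the EARLIEST of its good combined response or
# its data; an access after the barrier may start before the barrier
# completes, in which case it is tracked for invalidation until then.

class BarrierTracker:
    def __init__(self):
        self.load_resolved = False
        self.barrier_complete = False
        self.tracked_for_invalidation = set()

    def on_load_event(self, event):
        # whichever of the two responses arrives first resolves the load
        if event in ("good_combined_response", "data"):
            self.load_resolved = True

    def issue_after_barrier(self, access):
        if not self.barrier_complete:
            # speculative early start: watch the access for invalidation
            self.tracked_for_invalidation.add(access)
        print(f"executing {access}")

    def on_barrier_complete(self):
        self.barrier_complete = True
        # barrier done: stop tracking any early-started access
        self.tracked_for_invalidation.clear()

tracker = BarrierTracker()
tracker.on_load_event("good_combined_response")  # resolves before data arrives
tracker.issue_after_barrier("store X")           # early start, tracked
tracker.on_barrier_complete()                    # tracking discontinued
```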

    Layering cache and architectural specific functions

    Publication Number: GB2325541B

    Publication Date: 2002-04-17

    Application Number: GB9806453

    Application Date: 1998-03-27

    Applicant: IBM

    Abstract: Cache and architecture-specific functions are layered within a controller, simplifying design requirements. Faster performance may be achieved, and individual segments of the overall design may be independently tested and formally verified. Transitions between memory consistency models are also facilitated. Different segments of the overall design may be implemented in distinct integrated circuits, allowing less expensive processes to be employed where suitable.

    Layering cache and architectural-specific functions to permit generic interface definition

    Publication Number: GB2325764A

    Publication Date: 1998-12-02

    Application Number: GB9806536

    Application Date: 1998-03-27

    Applicant: IBM

    Abstract: Cache and architectural functions within a cache controller 202 are layered and provided with generic interfaces 220-230. Layering cache and architectural operations allows the definition of generic interfaces between controller logic 212-218 and bus interface units 204, 208 within the controller. The generic interfaces are defined by extracting the essence of supported operations into a generic protocol. The interfaces themselves may be pulsed or held interfaces, depending on the character of the operation. Because the controller logic is isolated from the specific protocols required by a processor or bus architecture, the design may be transferred directly to new controllers for different protocols or processors by modifying the bus interface units appropriately.
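
    The layering principle in the two abstracts above can be sketched in software: cache logic that speaks only a generic operation protocol, with a swappable bus interface unit translating to a concrete bus. This Python sketch is a hypothetical analogy to the patented hardware design; all class and method names are invented.

```python
# Hypothetical sketch of the layering principle: controller logic speaks only
# a generic operation protocol, and a bus interface unit (BIU) translates it
# to a concrete bus. Retargeting the design to a new bus or processor only
# means writing a new BIU; the cache logic is untouched.

from abc import ABC, abstractmethod

class BusInterfaceUnit(ABC):
    """Generic interface extracted from the essence of supported operations."""

    @abstractmethod
    def read(self, address: int) -> int: ...

    @abstractmethod
    def write(self, address: int, value: int) -> None: ...

class SystemBus(BusInterfaceUnit):
    """One concrete bus protocol behind the generic interface."""

    def read(self, address):
        print(f"bus read  @ {address:#x}")
        return 0

    def write(self, address, value):
        print(f"bus write @ {address:#x} = {value}")

class CacheController:
    """Cache logic layered above the BIU; it never sees bus specifics."""

    def __init__(self, biu: BusInterfaceUnit):
        self.biu = biu
        self.lines = {}

    def load(self, address):
        if address not in self.lines:          # miss: fill via the generic read
            self.lines[address] = self.biu.read(address)
        return self.lines[address]

# Porting to a different bus protocol = supplying a different BIU subclass.
controller = CacheController(SystemBus())
controller.load(0x1000)
```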

    Forward progress mechanism for stores in the presence of load contention in a system favoring loads

    Publication Number: GB2512804B

    Publication Date: 2015-03-04

    Application Number: GB201414384

    Application Date: 2013-01-23

    Applicant: IBM

    Abstract: A multiprocessor data processing system includes a plurality of cache memories. In response to a cache memory detecting a storage-modifying operation specifying the same target address as a first read-type operation it is processing, the cache memory provides a retry response to the storage-modifying operation. In response to completion of the first read-type operation, the cache memory enters a referee mode. While in the referee mode, the cache memory temporarily and dynamically increases the priority of any storage-modifying operation targeting the target address relative to any second read-type operation targeting the target address.
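
    A toy model of the retry and referee-mode behavior follows. It is illustrative only; the patent describes a hardware cache, and the priority boost is modeled here simply as retrying competing reads. All names are hypothetical.

```python
# Toy model of the retry / referee-mode behavior (illustrative only). While a
# read to an address is being processed, a store to that address is retried;
# once the read completes, the cache enters referee mode and briefly favors
# stores to that address over further reads.

class Cache:
    def __init__(self):
        self.read_in_flight = None   # address of the read being processed
        self.referee_address = None  # address protected by referee mode

    def handle(self, op, address):
        if op == "store" and self.read_in_flight == address:
            return "retry"                 # protect the in-flight read
        if op == "read" and self.referee_address == address:
            return "retry"                 # stores win while refereeing
        if op == "read":
            self.read_in_flight = address
        return "accept"

    def read_complete(self):
        self.referee_address = self.read_in_flight  # enter referee mode
        self.read_in_flight = None

cache = Cache()
cache.handle("read", 0x40)
print(cache.handle("store", 0x40))  # retry: read still in flight
cache.read_complete()
print(cache.handle("store", 0x40))  # accept: stores now have priority
print(cache.handle("read", 0x40))   # retry: referee mode favors stores
```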

    Handling of Deallocation Requests in System Having Upper and Lower Level Caches

    Publication Number: GB2502662A

    Publication Date: 2013-12-04

    Application Number: GB201303300

    Application Date: 2013-02-25

    Applicant: IBM

    Abstract: A deallocate request specifying a target address associated with a target cache line is sent from a processor core to a lower level cache. If the request hits, the replacement order of the lower level cache is updated so that the target cache line is more likely to be evicted (e.g. by making it least recently used [LRU]) in response to a subsequent cache miss. The replacement order may not be updated by further accesses to the target cache line prior to eviction. The lower level cache may include load and store pipelines, with deallocate requests sent to the load pipeline. The deallocate instruction may be executed at the completion of dataset processing and may be sent to the lower level cache regardless of whether it hits in the upper level cache. The lower level cache may include state machines for servicing data requests, with the retention and update performed without allocating a state machine to the request. A compiler may insert the deallocate instruction into program code executed by the processor core in response to detecting an end of dataset processing. An interconnect fabric may couple the processing units.
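
    The replacement-order update can be illustrated with a simple LRU cache in Python. This is a minimal sketch of the idea, assuming a plain LRU policy; the class and method names are invented and the patent itself concerns a hardware cache hierarchy.

```python
# Sketch of the deallocation hint (illustrative; names hypothetical). A
# deallocate request that hits in the lower level cache demotes the line to
# least-recently-used, so it becomes the first victim on the next miss,
# without actually evicting it and without tying up a state machine.

from collections import OrderedDict

class LowerLevelCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = OrderedDict()   # front = LRU, back = MRU

    def access(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)       # hit: promote toward MRU
        else:
            if len(self.lines) >= self.capacity:  # miss: evict the LRU line
                victim, _ = self.lines.popitem(last=False)
                print(f"evicting {victim:#x}")
            self.lines[address] = "data"

    def deallocate(self, address):
        if address in self.lines:                 # hit: demote to LRU, keep data
            self.lines.move_to_end(address, last=False)
        # a miss is simply ignored; no state machine is allocated either way

cache = LowerLevelCache()
for addr in (0x10, 0x20, 0x30, 0x40):
    cache.access(addr)
cache.deallocate(0x40)    # 0x40 becomes the preferred victim
cache.access(0x50)        # evicts 0x40 instead of 0x10
```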

    Forward progress mechanism for stores in the presence of load contention in a system favouring loads by state alteration

    Publication Number: GB2500964A

    Publication Date: 2013-10-09

    Application Number: GB201300936

    Application Date: 2013-01-18

    Applicant: IBM

    Abstract: Disclosed is a cache coherency protocol for multiprocessor data processing systems 104. The systems have a set of cache memories 230. A cache memory issues a read-type operation for a target cache line. While waiting for receipt of the target cache line, the cache memory monitors for a competing store-type operation targeting the target cache line. In response to receiving the target cache line, the cache memory installs it and sets its coherency state based on whether the competing store-type operation was detected. The coherence state may be a first state indicating that the cache can source copies of the target cache line to requestors. In response to issuing the read-type operation, the cache memory may receive a coherence message indicating the first state; setting the coherence state for the target cache line then comprises setting it to the first state indicated by the coherence message if no competing store-type operation is detected.
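
    The state-setting rule can be modeled in a few lines of Python. This is a rough sketch under simplifying assumptions (two made-up states, "sourcing" and "shared"; invented class and method names), not the patented protocol itself.

```python
# Toy model of the state-setting rule above (illustrative only). While waiting
# for a requested line, the cache watches for a competing store-type operation
# to the same address; on install, it takes the sourcing ("first") state named
# in the coherence message only if no competitor was seen.

class CoherentCache:
    def __init__(self):
        self.pending = {}   # address -> competing store seen while waiting?
        self.lines = {}     # address -> coherence state

    def issue_read(self, address):
        self.pending[address] = False

    def snoop(self, op, address):
        # monitor the interconnect while the read is outstanding
        if op == "store" and address in self.pending:
            self.pending[address] = True

    def install(self, address, message_state):
        competed = self.pending.pop(address)
        # take the sourcing state only if no competing store was detected;
        # otherwise fall back to a state that will not source copies
        self.lines[address] = "shared" if competed else message_state

cache = CoherentCache()
cache.issue_read(0xA0)
cache.snoop("store", 0xA0)       # competitor detected while waiting
cache.install(0xA0, "sourcing")
print(cache.lines[0xA0])         # "shared", not the sourcing state
```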
