Forward progress mechanism for stores in the presence of load contention in a system favouring loads by state alteration.

    Publication Number: GB2500964A

    Publication Date: 2013-10-09

    Application Number: GB201300936

    Application Date: 2013-01-18

    Applicant: IBM

    Abstract: Disclosed is a cache coherency protocol for multiprocessor data processing systems 104 having a set of cache memories 230. A cache memory issues a read-type operation for a target cache line. While waiting for receipt of the target cache line, the cache memory monitors for a competing store-type operation targeting the same cache line. In response to receiving the target cache line, the cache memory installs the line and sets its coherency state based on whether a competing store-type operation was detected. The coherency state may be a first state indicating that the cache memory can source copies of the target cache line to requestors. In response to issuing the read-type operation, the cache memory receives a coherence message indicating that state; if no competing store-type operation is detected, the cache memory sets the coherency state of the installed line to the first state indicated by the coherence message.
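
    A minimal Python sketch of the state-setting decision this abstract describes: a pending read monitors for a competing store, and the installed line's coherency state is kept or downgraded accordingly. All names here (Cache, STATE_SHARED_SOURCE, snoop, and the state encodings) are illustrative assumptions, not taken from the patent.

    STATE_SHARED_SOURCE = "Ts"   # first state: line may source copies to requestors
    STATE_SHARED = "S"           # downgraded state when a store competes

    class Cache:
        def __init__(self):
            self.lines = {}              # address -> coherency state
            self.pending_reads = {}      # address -> competing-store flag

        def issue_read(self, addr):
            # Begin monitoring for competing store-type operations.
            self.pending_reads[addr] = False

        def snoop(self, addr, op):
            # A competing store-type operation targets a line we are awaiting.
            if op == "store" and addr in self.pending_reads:
                self.pending_reads[addr] = True

        def receive_data(self, addr, coherence_msg_state):
            # Install the line; keep the message-indicated state only if no
            # competing store was detected while the read was pending.
            competing = self.pending_reads.pop(addr)
            self.lines[addr] = STATE_SHARED if competing else coherence_msg_state

    cache = Cache()
    cache.issue_read(0x100)
    cache.snoop(0x100, "store")               # competing store arrives first
    cache.receive_data(0x100, STATE_SHARED_SOURCE)
    print(cache.lines[0x100])                 # "S": state downgraded

    Downgrading on contention is what gives the competing store a path to obtain the line, which is the forward-progress guarantee named in the title.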

    A method and apparatus for adding and removing components of a data processing system without powering down

    Publication Number: GB2334120B

    Publication Date: 2001-05-02

    Application Number: GB9909356

    Application Date: 1997-09-30

    Applicant: IBM

    Abstract: A method and system for adding or removing components of a data processing system without powering the system down ("Hot-plug"). The system includes an arbiter residing within a Host Bridge, Control & Power Logic, and a plurality of in-line switch modules coupled to a bus. Each of the in-line switch modules provides isolation for the load(s) connected to it. The Host Bridge, in combination with the Control & Power Logic, implements the Hot-plug operations, such as ramping the power to a selected slot up or down and activating the appropriate in-line switches for communication to and from a load (target/controlling master).
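
    A minimal Python sketch of the Hot-plug sequence outlined above, under assumed names (HostBridge, InlineSwitch, hot_add): power to a slot is ramped up before its in-line switch connects the load, and the switch isolates the load before power is ramped down.

    class InlineSwitch:
        def __init__(self):
            self.closed = False   # an open switch isolates the load from the bus

    class Slot:
        def __init__(self):
            self.power = 0.0
            self.switch = InlineSwitch()

    class HostBridge:
        def __init__(self, num_slots):
            self.slots = [Slot() for _ in range(num_slots)]

        def hot_add(self, n, steps=4):
            slot = self.slots[n]
            for i in range(1, steps + 1):       # ramp power up gradually
                slot.power = i / steps
            slot.switch.closed = True           # then connect load to the bus

        def hot_remove(self, n, steps=4):
            slot = self.slots[n]
            slot.switch.closed = False          # isolate load before power-down
            for i in range(steps - 1, -1, -1):  # ramp power down gradually
                slot.power = i / steps

    bridge = HostBridge(num_slots=4)
    bridge.hot_add(2)
    print(bridge.slots[2].switch.closed, bridge.slots[2].power)   # True 1.0
    bridge.hot_remove(2)
    print(bridge.slots[2].switch.closed, bridge.slots[2].power)   # False 0.0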

    I/O PAGE KILL DEFINITION FOR IMPROVED DMA AND L1/L2 CACHE PERFORMANCE

    Publication Number: CA2298780A1

    Publication Date: 2000-09-30

    Application Number: CA2298780

    Application Date: 2000-02-16

    Applicant: IBM

    Abstract: A special 'I/O' page is defined as having a large size (e.g., 4K bytes) but distinctive cache line characteristics. For DMA reads, the first cache line in the I/O page may be accessed by a PCI Host Bridge as a cacheable read, while all other lines are accessed noncacheably (DMA read with no intent to cache). For DMA writes, the PCI Host Bridge accesses all cache lines as cacheable. The PCI Host Bridge maintains a cache snoop granularity of the I/O page size, meaning that if the Host Bridge detects a store (invalidate) type system bus operation on any cache line within an I/O page, the cached data within that page is invalidated (the L1/L2 caches continue to treat all cache lines in the page as cacheable). By defining the first line as cacheable, only one cache line need be invalidated on the system bus by the L1/L2 cache to invalidate the whole page of data in the PCI Host Bridge. All stores to the other cache lines in the I/O page can occur directly in the L1/L2 cache without system bus operations, since those lines are left in the 'modified' state in the L1/L2 cache.
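
    The page-granularity snoop behaviour can be sketched as below; this is an illustrative Python model, not the bridge's actual logic, and HostBridgeCache, dma_read, and snoop_store are assumed names. Only the first line of each 4K I/O page is retained as cacheable, and a snooped store anywhere in the page invalidates the whole cached page.

    PAGE_SIZE = 4096
    LINE_SIZE = 128    # assumed cache line size for illustration

    class HostBridgeCache:
        def __init__(self):
            self.valid_pages = set()      # pages holding cached DMA-read data

        def page_of(self, addr):
            return addr // PAGE_SIZE

        def dma_read(self, addr):
            line_in_page = (addr % PAGE_SIZE) // LINE_SIZE
            if line_in_page == 0:
                # Only the first line of an I/O page is a cacheable read.
                self.valid_pages.add(self.page_of(addr))
            # Other lines: noncacheable DMA read, nothing retained.

        def snoop_store(self, addr):
            # A store to ANY line in the page kills the whole cached page.
            self.valid_pages.discard(self.page_of(addr))

    bridge = HostBridgeCache()
    bridge.dma_read(0x2000)            # first line of page 2: cached
    print(2 in bridge.valid_pages)     # True
    bridge.snoop_store(0x2080)         # store to another line in page 2
    print(2 in bridge.valid_pages)     # False: whole page invalidated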

    Processor performance improvement for instruction sequences that include barrier instructions

    Publication Number: AU2013217351B2

    Publication Date: 2016-04-28

    Application Number: AU2013217351

    Application Date: 2013-01-22

    Applicant: IBM

    Abstract: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved upon receipt by the processor core of the earliest of a good combined response for a read operation corresponding to the load instruction and the data for the load instruction. If execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, the processor core initiates execution of the subsequent memory access instruction in response to determining that the barrier instruction completed. If execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, the processor core discontinues tracking the subsequent memory access instruction with respect to invalidation in response to determining that the barrier instruction completed.
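
    The completion logic can be sketched as follows, assuming illustrative names (BarrierTracker, on_good_combined_response): the load counts as resolved at the earliest of its good combined response and its data return, and a speculatively initiated access after the barrier stops being tracked for invalidation once the barrier completes.

    class BarrierTracker:
        def __init__(self):
            self.load_resolved = False
            self.barrier_complete = False
            self.speculative_access_started = False
            self.tracking_invalidation = False

        def on_good_combined_response(self):
            self.load_resolved = True         # earliest event wins

        def on_load_data(self):
            self.load_resolved = True         # earliest event wins

        def start_subsequent_access_early(self):
            # Speculative start before the barrier completes: the touched
            # line must be watched for invalidation until the barrier is done.
            self.speculative_access_started = True
            self.tracking_invalidation = True

        def on_barrier_complete(self):
            assert self.load_resolved         # barrier orders after the load
            self.barrier_complete = True
            if self.speculative_access_started:
                self.tracking_invalidation = False   # discontinue tracking
            else:
                self.speculative_access_started = True  # initiate access now

    t = BarrierTracker()
    t.on_good_combined_response()     # resolves the load before its data arrives
    t.start_subsequent_access_early()
    t.on_barrier_complete()
    print(t.tracking_invalidation)    # False: tracking discontinued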

    Virtual Machine Backup

    Publication Number: GB2516083A

    Publication Date: 2015-01-14

    Application Number: GB201312417

    Application Date: 2013-07-11

    Applicant: IBM

    Abstract: A system comprises a processor running a hypervisor for virtual machines (VMs), a cache (e.g., a write-back cache), and a memory storing VM images for a differential check-pointing failover technique. The cache comprises rows each having a memory address, a cache line, and an image modification flag. The modification flag is set (430) when a cache line is modified (420) by a backed-up VM (425), for which an image is saved in memory, while hypervisor actions in privilege mode do not set the flag. The addresses of flagged cache lines are written to a log in the memory upon eviction (440) or during periodic checkpoints. The VM image can be replicated in another memory by fetching the cache lines stored at the logged addresses. Using the modification flag instead of dirty bit tags ensures that modified cache lines are written to the log without having to be flushed at the same time.
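
    A minimal Python sketch of the logging scheme, with assumed structure and names (CheckpointCache, modified_flag): guest writes set the image modification flag, privileged hypervisor writes do not, and flagged addresses are appended to the log on eviction or at a checkpoint.

    class CacheRow:
        def __init__(self, addr, data):
            self.addr = addr
            self.data = data
            self.modified_flag = False   # image modification flag

    class CheckpointCache:
        def __init__(self):
            self.rows = {}       # addr -> CacheRow
            self.log = []        # logged addresses for the replica to fetch

        def write(self, addr, data, privileged):
            row = self.rows.setdefault(addr, CacheRow(addr, data))
            row.data = data
            if not privileged:            # hypervisor writes do not set the flag
                row.modified_flag = True

        def evict(self, addr):
            row = self.rows.pop(addr)
            if row.modified_flag:
                self.log.append(row.addr)

        def checkpoint(self):
            for row in self.rows.values():
                if row.modified_flag:
                    self.log.append(row.addr)
                    row.modified_flag = False

    cache = CheckpointCache()
    cache.write(0x10, b"guest", privileged=False)
    cache.write(0x20, b"hyper", privileged=True)
    cache.checkpoint()
    print(cache.log)     # [16]: only the guest-modified line is logged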

    Processor performance improvement for instruction sequences that include barrier instructions

    Publication Number: GB2513509A

    Publication Date: 2014-10-29

    Application Number: GB201414381

    Application Date: 2013-01-22

    Applicant: IBM

    Abstract: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved upon receipt by the processor core of the earliest of a good combined response for a read operation corresponding to the load instruction and the data for the load instruction. If execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, the processor core initiates execution of the subsequent memory access instruction in response to determining that the barrier instruction completed. If execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, the processor core discontinues tracking the subsequent memory access instruction with respect to invalidation in response to determining that the barrier instruction completed.

    Improving processor performance for instruction sequences that include barrier instructions

    Publication Number: DE112013000891T5

    Publication Date: 2014-10-16

    Application Number: DE112013000891

    Application Date: 2013-01-22

    Applicant: IBM

    Abstract: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved upon receipt by the processor core of the earliest of a good combined response for a read operation corresponding to the load instruction and the data for the load instruction. If execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, the processor core initiates execution of the subsequent memory access instruction in response to determining that the barrier instruction completed. If execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, the processor core discontinues tracking the subsequent memory access instruction with respect to invalidation in response to determining that the barrier instruction completed.

    Processor performance improvement for instruction sequences that include barrier instructions

    Publication Number: AU2013217351A1

    Publication Date: 2014-08-14

    Application Number: AU2013217351

    Application Date: 2013-01-22

    Applicant: IBM

    Abstract: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved upon receipt by the processor core of the earliest of a good combined response for a read operation corresponding to the load instruction and the data for the load instruction. If execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, the processor core initiates execution of the subsequent memory access instruction in response to determining that the barrier instruction completed. If execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, the processor core discontinues tracking the subsequent memory access instruction with respect to invalidation in response to determining that the barrier instruction completed.

    Handling of Deallocation Requests and Castouts in System Having Upper and Lower Level Caches

    Publication Number: GB2502663A

    Publication Date: 2013-12-04

    Application Number: GB201303302

    Application Date: 2013-02-25

    Applicant: IBM

    Abstract: A deallocate request specifying a target address associated with a target cache line is sent from the processor core to the lower level cache; if the request hits, the replacement order of the lower level cache is updated so that the target cache line is more likely to be evicted in response to a subsequent cache miss (e.g., by making the target line least recently used [LRU]). On a subsequent miss, the target line is cast out to the lower level cache with an indication that the line was the target of a deallocation request (e.g., by setting a field in the directory). The lower level cache may include load and store pipelines, with the deallocation request sent to the load pipeline. The deallocation may be executed at the completion of dataset processing. The lower level cache may include state machines servicing data requests, with the retaining and updating performed without allocating a state machine to the request. A previous coherence state of the target cache line may be retained. An interconnect fabric may connect the processing units.
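
    A minimal Python sketch of the replacement-order update, using assumed names (LowerLevelCache, deallocate) and a plain LRU list in place of the patent's set-associative directory: a deallocate hit moves the target to the LRU position so a later miss casts it out first, marked as a deallocation target.

    from collections import OrderedDict

    class LowerLevelCache:
        def __init__(self, capacity):
            self.capacity = capacity
            # OrderedDict front = LRU, back = MRU; value = (data, dealloc_flag)
            self.lines = OrderedDict()

        def access(self, addr, data=None):
            if addr in self.lines:
                self.lines.move_to_end(addr)           # hit: make line MRU
            else:
                if len(self.lines) >= self.capacity:   # miss: evict LRU victim
                    victim, (vdata, flagged) = self.lines.popitem(last=False)
                    print(f"castout {victim:#x} dealloc_target={flagged}")
                self.lines[addr] = (data, False)

        def deallocate(self, addr):
            # Hit: move the target to the LRU position and remember it was
            # targeted; no state machine is allocated and, in the sketch, the
            # line's data (standing in for its coherence state) is retained.
            if addr in self.lines:
                data, _ = self.lines[addr]
                self.lines[addr] = (data, True)
                self.lines.move_to_end(addr, last=False)

    cache = LowerLevelCache(capacity=2)
    cache.access(0x100, "A")
    cache.access(0x200, "B")
    cache.deallocate(0x100)      # 0x100 becomes the preferred victim
    cache.access(0x300, "C")     # prints: castout 0x100 dealloc_target=True

    Demoting the line instead of invalidating it is the design point: the data stays available if it is re-referenced, yet leaves the cache first once the dataset is done with it.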
