Monitoring multiple memory locations for targeted stores in a shared-memory multiprocessor
    21.
    Invention Grant (In Force)

    Publication No.: US08990503B2

    Publication Date: 2015-03-24

    Application No.: US13754700

    Filing Date: 2013-01-30

    Abstract: A system and method for supporting targeted stores in a shared-memory multiprocessor. A targeted store enables a first processor to push a cache line to be stored in a cache memory of a second processor, eliminating the need for multiple cache-coherence operations to transfer the cache line from the first processor to the second. More specifically, the disclosed embodiments provide a system that notifies a waiting thread when a targeted store is directed to monitored memory locations. During operation, the system receives a targeted store directed to a specific cache in a shared-memory multiprocessor system. In response, the system examines the destination address of the targeted store to determine whether it is directed to a memory location being monitored on behalf of a thread associated with that cache. If so, the system informs the thread about the targeted store.
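
    As a rough illustration of this notification flow, a software analogue might look like the sketch below. The patent itself concerns hardware support; the class and method names here (StoreMonitor, on_targeted_store, and so on) are invented for illustration.

        // Software analogue of the monitoring mechanism; all names illustrative.
        #include <condition_variable>
        #include <cstdint>
        #include <mutex>
        #include <unordered_map>

        class StoreMonitor {
            std::mutex m_;
            std::condition_variable cv_;
            std::unordered_map<uintptr_t, bool> watched_;  // monitored addresses
        public:
            // Register a memory location to monitor for the calling thread.
            void watch(uintptr_t addr) {
                std::lock_guard<std::mutex> g(m_);
                watched_[addr] = false;
            }
            // Invoked when a targeted store arrives at this cache: if its
            // destination is a monitored location, inform the waiting thread.
            void on_targeted_store(uintptr_t dest) {
                std::lock_guard<std::mutex> g(m_);
                auto it = watched_.find(dest);
                if (it != watched_.end()) { it->second = true; cv_.notify_all(); }
            }
            // Block until a targeted store hits the monitored location.
            void wait_for(uintptr_t addr) {
                std::unique_lock<std::mutex> g(m_);
                cv_.wait(g, [&] { return watched_[addr]; });
            }
        };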

Systems and Methods for Adaptive Integration of Hardware and Software Lock Elision Techniques
    22.
    Invention Application (In Force)

    Publication No.: US20150026688A1

    Publication Date: 2015-01-22

    Application No.: US14254758

    Filing Date: 2014-04-16

    Abstract: Particular techniques for improving the scalability of concurrent programs (e.g., lock-based applications) may be effective in some environments and for some workloads, but not others. The systems described herein may automatically choose which of these techniques to apply when executing lock-based applications at runtime, based on observations of the application in the current environment and under the current workload. In one example, two techniques for improving lock scalability (e.g., transactional lock elision using hardware transactional memory, and optimistic software techniques) may be integrated together. A lightweight runtime library built for this purpose may adapt its approach to managing concurrency by dynamically selecting one or more of these techniques (at different times) during execution of a given application. In this Adaptive Lock Elision approach, the techniques may be selected (based on pluggable policies) at runtime to achieve good performance on different platforms and for different workloads.
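
    A minimal sketch of the hardware half of this integration, transactional lock elision with a software fallback, might look as follows. It assumes the x86 RTM intrinsics from <immintrin.h> (compile with -mrtm), and a fixed retry budget stands in for the patent's pluggable, adaptive policies.

        #include <immintrin.h>
        #include <atomic>

        class ElidedLock {
            std::atomic<bool> locked_{false};
            static constexpr int kRetries = 3;   // illustrative budget
        public:
            void lock() {
                for (int i = 0; i < kRetries; ++i) {
                    if (_xbegin() == _XBEGIN_STARTED) {
                        // Subscribe to the lock inside the transaction.
                        if (!locked_.load(std::memory_order_relaxed))
                            return;              // run elided; commit in unlock()
                        _xabort(0xff);           // lock is held: abort, maybe retry
                    }
                }
                // Fallback: actually acquire the lock.
                bool expected = false;
                while (!locked_.compare_exchange_weak(
                           expected, true, std::memory_order_acquire))
                    expected = false;
            }
            void unlock() {
                if (_xtest())
                    _xend();                     // commit the elided critical section
                else
                    locked_.store(false, std::memory_order_release);
            }
        };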

SUPPORTING TARGETED STORES IN A SHARED-MEMORY MULTIPROCESSOR SYSTEM
    23.
    Invention Application (In Force)

    Publication No.: US20140089591A1

    Publication Date: 2014-03-27

    Application No.: US13625700

    Filing Date: 2012-09-24

    CPC classification number: G06F9/50 G06F9/5066 G06F9/544 G06F12/0888

    Abstract: The present embodiments provide a system for supporting targeted stores in a shared-memory multiprocessor. A targeted store enables a first processor to push a cache line to be stored in a cache memory of a second processor in the shared-memory multiprocessor. This eliminates the need for multiple cache-coherence operations to transfer the cache line from the first processor to the second processor. The system includes an interface, such as an application programming interface (API), a system call interface, or an instruction-set architecture (ISA), that provides access to a number of mechanisms for supporting targeted stores. These mechanisms include a thread-location mechanism that determines a location near where a thread is executing in the shared-memory multiprocessor, and a targeted-store mechanism that targets a store to a location (e.g., a cache memory) in the shared-memory multiprocessor.
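
    A hypothetical rendering of such an interface is sketched below. The names, signatures, and software fallbacks are invented for illustration and are not taken from the patent; real support would come from the OS or the ISA.

        #include <atomic>
        #include <cstdint>

        typedef int mem_location_t;  // opaque: names a cache/location in the machine

        // Thread-location mechanism: a location near where the caller is running.
        // (Stubbed here as a single-location fallback.)
        mem_location_t thread_location() { return 0; }

        // Targeted-store mechanism: push the cache line holding *addr toward
        // `target` while storing `value`. Without hardware support this degrades
        // to an ordinary store that leaves the line where it is.
        void targeted_store(std::atomic<uint64_t>* addr, uint64_t value,
                            mem_location_t target) {
            (void)target;            // ignored by the software fallback
            addr->store(value, std::memory_order_release);
        }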

Compact synchronization in managed runtimes
    24.
    Invention Grant

    Publication No.: US12045670B2

    Publication Date: 2024-07-23

    Application No.: US17245820

    Filing Date: 2021-04-30

    CPC classification number: G06F9/526 G06F9/30087 G06F9/5016 G06F9/541 G06F9/542

    Abstract: A computer including multiple processors and memory implements a managed runtime providing a synchronization application programming interface (API) for threads that perform synchronized accesses to shared objects. A standardized header of objects includes a memory word storing an object identifier. To lock the object for synchronized access, the memory word may be converted to store the tail of a first-in-first-out (FIFO) linked list of synchronization structures for threads waiting to acquire the lock, with the object identifier relocated to the list structure. The list structure may further include a stack of threads waiting on events related to the object, with the synchronization API additionally providing wait, notify, and related synchronization operations. Upon determining that no threads hold or desire to hold the lock for the object and that no threads are waiting on events related to the object, the memory word may be restored to contain the object identifier.
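
    A minimal sketch of the header-word conversion follows, assuming a 64-bit word whose low bit tags whether it currently holds the object identifier or a pointer to the list tail. The tag scheme and all names are illustrative, not the patent's.

        #include <atomic>
        #include <cstdint>

        struct WaitNode {              // one per thread waiting for the lock
            WaitNode* next = nullptr;
            uint64_t saved_id = 0;     // object identifier displaced from the header
        };

        struct ObjectHeader {
            std::atomic<uint64_t> word{0};  // object id, or tagged WaitNode* tail
            static constexpr uint64_t kListTag = 1;

            bool holds_list() const { return word.load() & kListTag; }

            // On first contention, convert the header from "object id" to
            // "list tail", relocating the id into the list structure.
            bool install_list(WaitNode* n, uint64_t expected_id) {
                n->saved_id = expected_id;
                uint64_t want = reinterpret_cast<uint64_t>(n) | kListTag;
                return word.compare_exchange_strong(expected_id, want);
            }

            // Once no thread holds, wants, or waits on the lock, restore the id.
            void restore_id(WaitNode* tail) { word.store(tail->saved_id); }
        };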

Systems and Methods for Safely Subscribing to Locks Using Hardware Extensions
    25.
    Invention Application

    Publication No.: US20240028424A1

    Publication Date: 2024-01-25

    Application No.: US18478820

    Filing Date: 2023-09-29

    CPC classification number: G06F9/526 G06F9/467 G06F9/3851 G06F9/30087

    Abstract: Transactional Lock Elision allows hardware transactions to execute unmodified critical sections protected by the same lock concurrently, by subscribing to the lock and verifying that it is available before committing the transaction. A “lazy subscription” optimization, which delays lock subscription, can potentially cause behavior that cannot occur when the critical sections are executed under the lock. Hardware extensions may provide mechanisms to ensure that lazy subscriptions are safe (e.g., that they result in correct behavior). Prior to executing a critical section transactionally, its lock and subscription code may be identified (e.g., by writing their locations to special registers). Prior to committing the transaction, the thread executing the critical section may verify that the correct lock was correctly subscribed to. If not, or if locations identified by the special registers have been modified, the transaction may be aborted. Nested critical sections associated with different lock types may invoke different subscription code.
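
    The pattern at issue can be sketched with the same RTM intrinsics used for ordinary lock elision, as below. The hardware extensions the abstract proposes (special registers identifying the lock and the subscription code) have no software equivalent here and appear only in comments.

        #include <immintrin.h>
        #include <atomic>

        std::atomic<bool> lock_word{false};

        bool run_lazily_subscribed(void (*body)()) {
            if (_xbegin() != _XBEGIN_STARTED)
                return false;          // caller falls back to taking the lock
            body();                    // critical section runs unsubscribed
            // Lazy subscription: read the lock only now, just before commit.
            // The proposed extensions would verify that this is the correct
            // lock and that the subscription code itself was not modified,
            // aborting the transaction otherwise.
            if (lock_word.load(std::memory_order_relaxed))
                _xabort(0x01);
            _xend();
            return true;
        }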

    SCALABLE RANGE LOCKS
    26.
    Invention Publication

    Publication No.: US20230252081A1

    Publication Date: 2023-08-10

    Application No.: US18183891

    Filing Date: 2023-03-14

    CPC classification number: G06F16/9024 G06F11/3006 G06F16/1774

    Abstract: A computer comprising one or more processors and memory may implement multiple threads performing mutually exclusive lock-acquisition operations on disjoint ranges of a shared resource, each using atomic compare-and-swap (CAS) operations. A linked list of currently locked ranges is maintained and, upon entry to a lock-acquisition operation, a thread waits for all locked ranges overlapping the desired range to be released, then inserts a descriptor for the desired range into the linked list using a single CAS operation. To release a locked range, a thread executes a single fetch-and-add (FAA) operation. The operation may be extended to support simultaneous exclusive and non-exclusive access by allowing overlapping ranges to be locked for non-exclusive access, and by performing an additional validation after locking to resolve any conflict detected.
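
    A simplified sketch of the CAS-insert / FAA-release shape described above follows. Node reclamation and the non-exclusive extension are omitted, and the published algorithm's list maintenance is more involved than this.

        #include <atomic>
        #include <cstdint>

        struct RangeNode {
            uint64_t start, end;                // locked range [start, end)
            std::atomic<uint64_t> released{0};  // made nonzero by the FAA release
            RangeNode* next{nullptr};           // older entries follow this node
        };

        std::atomic<RangeNode*> head{nullptr};

        RangeNode* range_lock(uint64_t start, uint64_t end) {
            RangeNode* n = new RangeNode;
            n->start = start;
            n->end = end;
            n->next = head.load();
            // A single CAS publishes the descriptor at the head of the list.
            while (!head.compare_exchange_weak(n->next, n)) {}
            // Wait for every older, overlapping range to be released.
            for (RangeNode* p = n->next; p; p = p->next)
                while (p->start < end && start < p->end &&
                       p->released.load() == 0) { /* spin */ }
            return n;
        }

        void range_unlock(RangeNode* n) {
            n->released.fetch_add(1);           // the single FAA release
        }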

    Ticket locks with enhanced waiting
    27.
    Invention Grant

    Publication No.: US11442730B2

    Publication Date: 2022-09-13

    Application No.: US16572532

    Filing Date: 2019-09-16

    Abstract: A computer comprising one or more processors and memory may implement multiple threads that perform a lock operation using a data structure comprising an allocation field and a grant field. Upon entry to a lock operation, a thread allocates a ticket by atomically copying the ticket value contained in the allocation field and incrementing the allocation field. The thread compares the allocated ticket to the grant field. If they are unequal, the thread determines the number of waiting threads. If that number is above a threshold, the thread enters a long-term wait operation, which comprises determining a location for a long-term wait value and waiting on changes to that value. If the number is below the threshold, or once the long-term wait operation is complete, the thread waits for the grant value to equal its ticket, indicating that the lock has been allocated to it.
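
    A sketch of this two-phase waiting is shown below. The slot array, its size, and the threshold value are illustrative stand-ins for the patent's long-term-wait locations.

        #include <atomic>
        #include <cstdint>

        class TicketLock {
            static constexpr uint64_t kThreshold = 4;    // illustrative
            static constexpr uint64_t kSlots = 64;       // illustrative
            std::atomic<uint64_t> next_ticket{0};        // allocation field
            std::atomic<uint64_t> grant{0};              // grant field
            std::atomic<uint64_t> slots[kSlots] = {};    // long-term wait slots
        public:
            void lock() {
                uint64_t t = next_ticket.fetch_add(1);   // allocate a ticket
                if (t - grant.load() >= kThreshold) {
                    // Long-term wait: spin on a private slot instead of the
                    // hot grant word, rechecking grant when the slot changes.
                    std::atomic<uint64_t>& s = slots[t % kSlots];
                    uint64_t seen = s.load();
                    while (t - grant.load() >= kThreshold) {
                        while (s.load() == seen) {}      // wait for a nudge
                        seen = s.load();
                    }
                }
                while (grant.load() != t) {}             // short-term wait
            }
            void unlock() {
                uint64_t g = grant.load() + 1;
                grant.store(g);
                // Nudge the waiter whose ticket just crossed the threshold.
                slots[(g + kThreshold - 1) % kSlots].fetch_add(1);
            }
        };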

    Generic concurrency restriction
    28.
    Invention Grant

    Publication No.: US11221891B2

    Publication Date: 2022-01-11

    Application No.: US16791178

    Filing Date: 2020-02-14

    Abstract: Generic Concurrency Restriction (GCR) may divide a set of threads waiting to acquire a lock into two sets: an active set currently able to contend for the lock, and a passive set waiting for an opportunity to join the active set and contend for the lock. The number of threads in the active set may be limited to a predefined maximum or even a single thread. Generic Concurrency Restriction may be implemented as a wrapper around an existing lock implementation. Generic Concurrency Restriction may, in some embodiments, be unfair (e.g., to some threads) over the short term, but may improve the overall throughput of the underlying multithreaded application via passivation of a portion of the waiting threads.
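
    A minimal sketch of the wrapper idea follows, with a single-thread active set as the simplest admission policy; the underlying lock can be any existing implementation. The busy-wait in the passive set is a placeholder where a real implementation would spin politely or park the thread.

        #include <atomic>
        #include <mutex>

        template <typename Lock = std::mutex>   // wraps an existing lock
        class GCRLock {
            Lock inner_;
            std::atomic<int> active_{0};
            static constexpr int kMaxActive = 1;  // size of the active set
        public:
            void lock() {
                // Passive set: wait for a chance to join the active set.
                int cur = active_.load();
                while (cur >= kMaxActive ||
                       !active_.compare_exchange_weak(cur, cur + 1)) {
                    if (cur >= kMaxActive) cur = active_.load();
                }
                inner_.lock();   // active set: contend for the real lock
            }
            void unlock() {
                inner_.unlock();
                active_.fetch_sub(1);  // let a passive thread join the active set
            }
        };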

    Compact NUMA-aware locks
    29.
    Invention Grant

    Publication No.: US10949264B2

    Publication Date: 2021-03-16

    Application No.: US16573863

    Filing Date: 2019-09-17

    Abstract: A computer comprising multiple processors and non-uniform memory implements multiple threads that perform a lock operation using a shared lock structure that includes a pointer to the tail of a first-in-first-out (FIFO) queue of threads waiting to acquire the lock. To acquire the lock, a thread allocates and appends a data structure to the FIFO queue. The lock is released by selecting and notifying a waiting thread to which control is transferred, preferring a thread executing on the same processor socket as the thread releasing the lock. A secondary queue is maintained for threads deferred during this selection process; it is kept within the data structures of the waiting threads themselves, so that no additional memory is required within the lock structure. If no threads executing on the same processor socket are waiting for the lock, entries in the secondary queue are transferred back to the FIFO queue, preserving FIFO order.
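
    The sketch below shows the node layout and the same-socket successor scan only; the MCS-style acquire/release paths and the FIFO-preserving transfer of the secondary queue back to the main queue are omitted, and all names are illustrative.

        #include <atomic>

        struct CNANode {
            std::atomic<CNANode*> next{nullptr};
            CNANode* secondary = nullptr;  // deferred remote-socket waiters
            int socket = -1;               // filled in at acquire time
        };

        // On release, walk the main queue for a waiter on the releasing
        // thread's socket; skipped waiters move to a secondary list kept in
        // the releasing thread's own node, so the lock word itself needs no
        // extra memory.
        CNANode* pick_successor(CNANode* self) {
            CNANode* cand = self->next.load();
            while (cand && cand->socket != self->socket) {
                CNANode* after = cand->next.load();
                if (!after) break;             // keep the last waiter queued
                cand->next.store(self->secondary);
                self->secondary = cand;        // defer the remote waiter
                cand = after;
            }
            return cand;  // same-socket waiter, last waiter, or nullptr
        }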

    Generic Concurrency Restriction
    30.
    Invention Application

    Publication No.: US20200183759A1

    Publication Date: 2020-06-11

    Application No.: US16791178

    Filing Date: 2020-02-14

    Abstract: Generic Concurrency Restriction (GCR) may divide a set of threads waiting to acquire a lock into two sets: an active set currently able to contend for the lock, and a passive set waiting for an opportunity to join the active set and contend for the lock. The number of threads in the active set may be limited to a predefined maximum or even a single thread. Generic Concurrency Restriction may be implemented as a wrapper around an existing lock implementation. Generic Concurrency Restriction may, in some embodiments, be unfair (e.g., to some threads) over the short term, but may improve the overall throughput of the underlying multithreaded application via passivation of a portion of the waiting threads.
